DVR: no need to reschedule_router if router gateway update

Bug #1496204 reported by shihanzhang
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Oleg Bondarev
Milestone: (none)

Bug Description

With a non-DVR router, if the router gateway changes, the router should be rescheduled (reschedule_router) to the proper L3 agents. The reason is given below:

" When external_network_bridge is set, each L3 agent can be associated
        with at most one external network. If router's new external gateway
        is on other network then the router needs to be rescheduled to the
        proper l3 agent."

But with a DVR router, I think there is no need to call reschedule_router (there are no other L3 agents), and a serious problem is that communication through this router is broken while reschedule_router runs.
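The condition in the quoted text can be sketched roughly as follows. This is only an illustration of the reasoning, not Neutron's actual code: `AgentConfig`, its `external_network_id` field, and `needs_reschedule` are hypothetical names, and the sketch assumes an agent with no configured external network can serve any gateway network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    """Hypothetical, simplified view of one L3 agent's configuration."""
    host: str
    # External network the agent is tied to; None means "not restricted"
    # (an assumption made for this sketch).
    external_network_id: Optional[str] = None


def needs_reschedule(agent: AgentConfig, new_gateway_network_id: str) -> bool:
    """Return True if the agent cannot serve the router's new gateway network."""
    if agent.external_network_id is None:
        return False
    return agent.external_network_id != new_gateway_network_id


# An agent tied to ext-a cannot host a router whose gateway moved to ext-b.
agent = AgentConfig(host="network-1", external_network_id="ext-a")
print(needs_reschedule(agent, "ext-b"))
print(needs_reschedule(agent, "ext-a"))
```

In this simplified model, only a change of external network forces a move; the open question in this report is whether that move should also touch the agents on compute nodes.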

Revision history for this message
Oleg Bondarev (obondarev) wrote :

Not sure what you mean by "there is no other l3 agents". Can you please clarify why we shouldn't reschedule a DVR router from a dvr_snat agent if it's not associated with the target external network?

Changed in neutron:
status: New → Incomplete
Revision history for this message
Oleg Bondarev (obondarev) wrote :

I guess the bug is about rescheduling from l3 dvr agents on compute nodes. Hopefully this will be covered by blueprint https://review.openstack.org/#/c/175237/

Changed in neutron:
status: Incomplete → Confirmed
Changed in neutron:
importance: Undecided → High
Ryan Moats (rmoats)
tags: added: l3-dvr-backlog
Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

The bug description seems odd here. The one discussed above is about changing the external network based on what the L3 agent is associated with.

This happened in the past, when more than one L3 agent was running on the same node to handle multiple external networks.

Now even a single L3 agent can handle multiple external networks. That being said, there is no relation here between a DVR and a non-DVR router; they both behave the same.

Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

https://review.openstack.org/#/c/143567/

This was one of the patches I had earlier to get rid of the 'reschedule required' for DVR routers with a single L3 agent.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Swami: patch is abandoned, care to elaborate?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This is a puzzle to me. I am failing to understand the issue at all. I'd be tempted to mark this Incomplete.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Can someone rephrase the bug description in a way that we can understand:

* what the user is trying to achieve
* what he/she sees
* what he/she would expect to happen

So that someone can look into this bug report and file an appropriate fix?

Changed in neutron:
status: Confirmed → Won't Fix
status: Won't Fix → Incomplete
importance: High → Undecided
Revision history for this message
shihanzhang (shihanzhang) wrote :

The real issue is this:
1. There are 3 compute nodes and 2 network nodes.
2. Create a DVR router, set its router_gateway, and add one subnet to the router.
3. Create some VMs in that subnet (we assume every compute node has a VM running on it).
4. Update the router's gateway_ip.

Now neutron-server re-schedules the router. The re-schedule method first unbinds the router from all L3 agents (unbind_router) and then binds it again. At large scale this takes a long time, and during the re-schedule the VMs on the compute nodes can't communicate with their gateway.
I think that when we update the router gateway IP, we should not unbind the router from the L3 agents running on compute nodes; we only need to re-schedule the L3 agents on network nodes.
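A minimal sketch of the proposed behaviour, assuming the router's bindings can be viewed as a mapping from agent host to agent mode. The function `split_bindings` and the host names are hypothetical and are not part of Neutron; only the mode names `dvr` and `dvr_snat` come from the fix's commit message.

```python
from typing import Dict, List, Tuple


def split_bindings(bindings: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split router bindings into (keep, reschedule) lists of agent hosts.

    `bindings` maps agent host -> agent mode. Agents in 'dvr' mode
    (compute nodes) keep their binding; every other agent, such as a
    'dvr_snat' agent on a network node, is a candidate for rescheduling.
    """
    keep: List[str] = []
    reschedule: List[str] = []
    for host, mode in sorted(bindings.items()):
        if mode == "dvr":
            keep.append(host)
        else:
            reschedule.append(host)
    return keep, reschedule


# Hypothetical deployment: three compute nodes and one bound network node.
bindings = {
    "compute-1": "dvr",
    "compute-2": "dvr",
    "compute-3": "dvr",
    "network-1": "dvr_snat",
}
keep, reschedule = split_bindings(bindings)
print(keep)        # ['compute-1', 'compute-2', 'compute-3']
print(reschedule)  # ['network-1']
```

With this split, the bindings on compute nodes are never removed, so VM traffic to the distributed gateway would not be interrupted while the network-node binding is moved.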

Changed in neutron:
status: Incomplete → New
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/245631

Changed in neutron:
assignee: nobody → shihanzhang (shihanzhang)
status: New → In Progress
Revision history for this message
Ryan Moats (rmoats) wrote :

First, I would modify comment #8 to say:

4. update router gateway_ip to an IP address that can't be handled by the external network the router(s) are already attached to.

I think there needs to be some more explanation of how the routers on the compute node behave because it's not entirely clear to me that they will pick up the new default route.

Changed in neutron:
assignee: shihanzhang (shihanzhang) → Oleg Bondarev (obondarev)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/245631
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=13ce7c85552ee6af9856bf701b55d2293ba5610d
Submitter: Jenkins
Branch: master

commit 13ce7c85552ee6af9856bf701b55d2293ba5610d
Author: Oleg Bondarev <email address hidden>
Date: Wed Nov 25 15:14:18 2015 +0300

    DVR:don't reschedule the l3 agent running on compute node

    For a DVR router, when it updates router gateway_ip, it should not
    reschedule the l3 agents running on compute nodes whose mode is dvr,
    it just need to reschedule the l3 agents running on network nodes
    whose mode is dvr_snat.

    Change-Id: Ib8ea6797c88cefb473eff9a8a7b2517a6aa90ca4
    Closes-bug: #1496204
    Co-Authored-By: Oleg Bondarev <email address hidden>

Changed in neutron:
status: In Progress → Fix Committed
tags: added: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/252855

Changed in neutron:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/252855
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=12634d6040cbf6a3f8599688b9b30c47e097b974
Submitter: Jenkins
Branch: stable/liberty

commit 12634d6040cbf6a3f8599688b9b30c47e097b974
Author: Oleg Bondarev <email address hidden>
Date: Wed Nov 25 15:14:18 2015 +0300

    DVR:don't reschedule the l3 agent running on compute node

    For a DVR router, when it updates router gateway_ip, it should not
    reschedule the l3 agents running on compute nodes whose mode is dvr,
    it just need to reschedule the l3 agents running on network nodes
    whose mode is dvr_snat.

    Closes-bug: #1496204

    Conflicts:

     neutron/tests/unit/db/test_agentschedulers_db.py

    Change-Id: Ib8ea6797c88cefb473eff9a8a7b2517a6aa90ca4
    (cherry picked from commit 13ce7c85552ee6af9856bf701b55d2293ba5610d)

tags: added: in-stable-liberty
tags: removed: liberty-backport-potential