[OVN] GW rescheduling logic is broken

Bug #1861509 reported by Daniel Alvarez
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Maciej Jozefczyk

Bug Description

When a Chassis event happens in the SB database, we attempt to reschedule any possible unhosted gateways [0] *always* due to a problem with the existing logic:

    def get_unhosted_gateways(self, port_physnet_dict, chassis_physnets,
                              gw_chassis):
        unhosted_gateways = []
        for lrp in self._tables['Logical_Router_Port'].rows.values():
            if not lrp.name.startswith('lrp-'):
                continue
            physnet = port_physnet_dict.get(lrp.name[len('lrp-'):])
            chassis_list = self._get_logical_router_port_gateway_chassis(lrp)
            is_max_gw_reached = len(chassis_list) < ovn_const.MAX_GW_CHASSIS
            for chassis_name, prio in chassis_list:
                # TODO(azbiswas): Handle the case when a chassis is no
                # longer valid. This may involve moving conntrack states,
                # so it needs to discussed in the OVN community first.
                if is_max_gw_reached or utils.is_gateway_chassis_invalid(
                        chassis_name, gw_chassis, physnet, chassis_physnets):
                    unhosted_gateways.append(lrp.name)
        return unhosted_gateways

1) is_max_gw_reached is always going to be True (as normally the possible candidates are less than the maximum)

2) unhosted_gateways.append(lrp.name) is executed inside a loop where lrp doesn't change meaning that if there's 3 candidates in the chassis_list, lrp.name is added 3 times to the list!!!

3) Later on, in the caller, we're iterating over the returned list [1] so as it has all the LRPs N times (N being the names of gw chassis), it will do a lot of extra and unnecessary work.

This is almost harmless in the sense that it's not breaking any functionality but it creates unnecessary updates on the logical router port:

2020-01-31 15:54:04.669 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): UpdateLRouterPortCommand(name=lrp-93b49ece-2dbc-4fcc-84cb-e7afd482a12e, columns={'gateway_chassis': ['0444b1f1-e9a9-4a73-ba78-997c87e61795', '43d98571-ccd6-48ce-bf4f-08f24aeed522', 'fe8f9887-27ef-4724-8cfc-50ec6e3d4a98']}, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2020-01-31 15:54:04.670 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121

[0] https://github.com/openstack/neutron/blob/4689564fa29915b042547bdeb3dcb44bca54e20c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py#L449
[1] https://github.com/openstack/neutron/blob/858d7f33950a80c73501377a4b2cd36b915d0f40/neutron/services/ovn_l3/plugin.py#L324

Changed in neutron:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
status: New → Confirmed
Changed in neutron:
importance: Undecided → High
tags: added: ovn
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/705660

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/705660
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a1735c46d8aa3a3df50818ce176e0632592924a7
Submitter: Zuul
Branch: master

commit a1735c46d8aa3a3df50818ce176e0632592924a7
Author: Maciej Józefczyk <email address hidden>
Date: Fri Mar 20 11:11:28 2020 +0000

    Don't reschedule hosts unless we need to

    Only reschedule gateways/update segments when things have changed
    that would require those actions.

    Co-Authored-By: Terry Wilson <email address hidden>

    Change-Id: I62f53dbd862c0f38af4a1434d453e97c18777eb4
    Closes-bug: #1861510
    Closes-bug: #1861509

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
Revision history for this message
Jakub Libosvar (libosvar) wrote :

The fix has been backported all the way to Queens under different Change ID: https://review.opendev.org/#/q/I62f53dbd862c0f38af4a1434d453e97c18777eb4

tags: added: in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri
removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn queens-eol

This issue was fixed in the openstack/networking-ovn queens-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn rocky-eol

This issue was fixed in the openstack/networking-ovn rocky-eol release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.