neutron

[OVN] GW rescheduling logic is broken

Bug #1861509 reported by Daniel Alvarez on 2020-01-31

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	neutron	Fix Released	High	Maciej Jozefczyk

Bug Description

When a Chassis event happens in the SB database, we attempt to reschedule any possible unhosted gateways [0] *always* due to a problem with the existing logic:

    def get_unhosted_gateways(self, port_physnet_dict, chassis_physnets,
                              gw_chassis):
        unhosted_gateways = []
        for lrp in self._tables['Logical_Router_Port'].rows.values():
            if not lrp.name.startswith('lrp-'):
                continue
            physnet = port_physnet_dict.get(lrp.name[len('lrp-'):])
            chassis_list = self._get_logical_router_port_gateway_chassis(lrp)
            is_max_gw_reached = len(chassis_list) < ovn_const.MAX_GW_CHASSIS
            for chassis_name, prio in chassis_list:
                # TODO(azbiswas): Handle the case when a chassis is no
                # longer valid. This may involve moving conntrack states,
                # so it needs to discussed in the OVN community first.
                if is_max_gw_reached or utils.is_gateway_chassis_invalid(
                        chassis_name, gw_chassis, physnet, chassis_physnets):
                    unhosted_gateways.append(lrp.name)
        return unhosted_gateways

1) is_max_gw_reached is always going to be True (as normally the possible candidates are less than the maximum)

2) unhosted_gateways.append(lrp.name) is executed inside a loop where lrp doesn't change meaning that if there's 3 candidates in the chassis_list, lrp.name is added 3 times to the list!!!

3) Later on, in the caller, we're iterating over the returned list [1] so as it has all the LRPs N times (N being the names of gw chassis), it will do a lot of extra and unnecessary work.

This is almost harmless in the sense that it's not breaking any functionality but it creates unnecessary updates on the logical router port:

2020-01-31 15:54:04.669 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): UpdateLRouterPortCommand(name=lrp-93b49ece-2dbc-4fcc-84cb-e7afd482a12e, columns={'gateway_chassis': ['0444b1f1-e9a9-4a73-ba78-997c87e61795', '43d98571-ccd6-48ce-bf4f-08f24aeed522', 'fe8f9887-27ef-4724-8cfc-50ec6e3d4a98']}, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2020-01-31 15:54:04.670 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121

[0] https://github.com/openstack/neutron/blob/4689564fa29915b042547bdeb3dcb44bca54e20c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py#L449
[1] https://github.com/openstack/neutron/blob/858d7f33950a80c73501377a4b2cd36b915d0f40/neutron/services/ovn_l3/plugin.py#L324

Tags:

Maciej Jozefczyk (maciejjozefczyk) on 2020-01-31

Changed in neutron:
assignee:	nobody → Maciej Jozefczyk (maciej.jozefczyk)
status:	New → Confirmed

Lucas Alvares Gomes (lucasagomes) on 2020-01-31

Changed in neutron:
importance:	Undecided → High

Jakub Libosvar (libosvar) on 2020-02-04

tags:

added: ovn

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-02-04: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/705660

Changed in neutron:
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-04-08: Fix merged to neutron (master)

Reviewed: https://review.opendev.org/705660
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a1735c46d8aa3a3df50818ce176e0632592924a7
Submitter: Zuul
Branch: master

commit a1735c46d8aa3a3df50818ce176e0632592924a7
Author: Maciej Józefczyk <email address hidden>
Date: Fri Mar 20 11:11:28 2020 +0000

Don't reschedule hosts unless we need to

Only reschedule gateways/update segments when things have changed
that would require those actions.

Co-Authored-By: Terry Wilson <email address hidden>

    Change-Id: I62f53dbd862c0f38af4a1434d453e97c18777eb4
    Closes-bug: #1861510
    Closes-bug: #1861509

Changed in neutron:
status:	In Progress → Fix Released

Bernard Cafarelli (bcafarel) on 2020-04-21

tags:

added: neutron-proactive-backport-potential

Revision history for this message

Jakub Libosvar (libosvar) wrote on 2020-05-29:

The fix has been backported all the way to Queens under different Change ID: https://review.opendev.org/#/q/I62f53dbd862c0f38af4a1434d453e97c18777eb4

tags:

added: in-stable-queens in-stable-rocky in-stable-stein in-stable-train in-stable-ussuri
removed: neutron-proactive-backport-potential

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2022-06-22: Fix included in openstack/networking-ovn queens-eol

This issue was fixed in the openstack/networking-ovn queens-eol release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2022-06-22: Fix included in openstack/networking-ovn rocky-eol

This issue was fixed in the openstack/networking-ovn rocky-eol release.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.