Comment 9 for bug 1880389

Revision history for this message
sean mooney (sean-k-mooney) wrote :

regarding the VM Resumimg before the 'brctl addif'.

the libvirt xml we generate contains the name of the linux bridge the tap should be added
too. so the linux bridge agent does not need to actully do the brctl addif command.
the tap should already be a member of that bridge when the vm is resumed.

it looks like the vm paused on the source compute node at
020-05-26 14:56:05.246 7 INFO nova.compute.manager [req-fe4495ae-f1a7-4e93-871e-4d034098babd - - - - -] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] VM Paused (Lifecycle Event)

and it resumed on the dest at
2020-05-26 14:56:05.303 6 INFO nova.compute.manager [req-8cd4842d-4783-42b3-9a8e-c4e757b8e6f0 - - - - -] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] VM Resumed (Lifecycle Event)

previously at 2020-05-26 14:55:54.639 6 DEBUG nova.virt.libvirt.driver [req-28bba2f2-89d9-4cbb-8b6a-7a9690469c86 b114d7969c0e465fbd15c2911ca4bb23 28e6517b7d6d4064be1bc878b590c40c - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Plugging VIFs before live migration. pre_live_migration /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7621

on the dest node we have started per pluging the network backend which successfully completed at

2020-05-26 14:55:58.769 6 INFO os_vif [req-28bba2f2-89d9-4cbb-8b6a-7a9690469c86 b114d7969c0e465fbd15c2911ca4bb23 28e6517b7d6d4064be1bc878b590c40c - default default] Successfully plugged vif VIFBridge(active=True,address=fa:16:3e:e1:50:ac,bridge_name='brq49b34298-a8',has_traffic_filtering=True,id=b3526533-dc6a-4174-bd3b-c300e78eda62,network=Network(49b34298-a85a-42a9-b264-b3a9242fef8f),plugin='linux_bridge',port_profile=,preserve_on_delete=False,vif_name='tapb3526533-dc')

this happens before we call libvirt to migrate the instance so at this point os-vif has ensured the linux bridge "brq49b34298-a8" is created so that when the vm starts the the tap is directly created in the correct bridge.

at 2020-05-26 14:56:02.172 7 DEBUG nova.compute.manager [req-40866f62-6362-4e9a-910e-365c3452d29f 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received event network-changed-b3526533-dc6a-4174-bd3b-c300e78eda62 external_instance_event /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:8050

we received a network-changed

and then at 2020-05-26 14:56:04.314 7 DEBUG nova.compute.manager [req-379d2410-5959-45ae-89ce-649dca3ed666 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received event network-vif-plugged-b3526533-dc6a-4174-bd3b-c300e78eda62 external_instance_event /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:8050

we recive a network-vif-plugged event which should ideally only be sent by the ml2 driver when the l2 agent has finished wireing up the networking on the destintaiton node.

as you pointed out the l2 agent does not finish adding the vlan subport to the correct bridge until

2020-05-26 14:56:25.743 6 DEBUG neutron.agent.linux.utils [req-fcca2dcc-5578-4827-ae05-d10935d35223 - - - - -] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'brctl', 'addif', 'brq49b34298-a8', 'p2p2.64'] create_process /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/utils.py:87

19 seconds later

so i think the issue is that the linux bridge ml2 driver is not sending plug time network-vif-plugged events but is instead sending bind time events.

we wait for the netwroking to be configured here
https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L6420-L6425
which wait for the network-vif-plugged event i showed in the log
https://github.com/openstack/nova/blob/bea91b8d58d909852949726296149d93f2c639d5/nova/compute/manager.py#L6352-L6362

before actully starting the migration here
https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L6467-L6470

the linux bridge l2 agent should only notify nova that the iterface is plugged when the tap is fully wired up

https://github.com/openstack/neutron/blob/4acc6843e849e98cd04a6d01861555c3e120f081/neutron/plugins/ml2/drivers/agent/_common_agent.py#L303-L306
but as the comment suggest this behavior is racy

https://github.com/openstack/neutron/blob/4acc6843e849e98cd04a6d01861555c3e120f081/neutron/plugins/ml2/drivers/agent/_common_agent.py#L259-L296

in this it started ensurign the bridge had connectiyt to the physical network at
2020-05-26 14:56:17.576 6 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-fcca2dcc-5578-4827-ae05-d10935d35223 - - - - -] Creating subinterface p2p2.64 for VLAN 64 on interface p2p2 ensure_vlan /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:303

which is long after the vm resumed at 2020-05-26 14:56:05.303

so with that in mind i think this is a neutron bug and its is reporting the network is pluged before it actully is.

i should not the netwrok-vif-plugged event i noted in the log was "unexpected"

2020-05-26 14:56:04.316 7 WARNING nova.compute.manager [req-379d2410-5959-45ae-89ce-649dca3ed666 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received unexpected event network-vif-plugged-b3526533-dc6a-4174-bd3b-c300e78eda62 for instance with vm_state active and task_state migrating.

meaning that nova had already recived a network-vif-plugged event previously when it was waiting for the networking on the destintaiotn to be completed but i think that unexpect plug event was the actual event that should have start the migration.

that is still signifcantly before neutron actully finished wiring up the networking which completed at 2020-05-26 14:56:21.061 6 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-fcca2dcc-5578-4827-ae05-d10935d35223 - - - - -] Done creating subinterface p2p2.64 ensure_vlan /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:317

neutron should not have sent any netwrok-vif-plugged event untill at least the physical interface vlan subport was added to the bridge.

i dont think this is actully a nova bug but suspect its a neutron bug.
likely due to the cahgne that were intoduced for multiple port binding.