regarding the VM Resumimg before the 'brctl addif'.
the libvirt xml we generate contains the name of the linux bridge the tap should be added
too. so the linux bridge agent does not need to actully do the brctl addif command.
the tap should already be a member of that bridge when the vm is resumed.
it looks like the vm paused on the source compute node at
020-05-26 14:56:05.246 7 INFO nova.compute.manager [req-fe4495ae-f1a7-4e93-871e-4d034098babd - - - - -] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] VM Paused (Lifecycle Event)
and it resumed on the dest at
2020-05-26 14:56:05.303 6 INFO nova.compute.manager [req-8cd4842d-4783-42b3-9a8e-c4e757b8e6f0 - - - - -] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] VM Resumed (Lifecycle Event)
previously at 2020-05-26 14:55:54.639 6 DEBUG nova.virt.libvirt.driver [req-28bba2f2-89d9-4cbb-8b6a-7a9690469c86 b114d7969c0e465fbd15c2911ca4bb23 28e6517b7d6d4064be1bc878b590c40c - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Plugging VIFs before live migration. pre_live_migration /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7621
on the dest node we have started per pluging the network backend which successfully completed at
this happens before we call libvirt to migrate the instance so at this point os-vif has ensured the linux bridge "brq49b34298-a8" is created so that when the vm starts the the tap is directly created in the correct bridge.
at 2020-05-26 14:56:02.172 7 DEBUG nova.compute.manager [req-40866f62-6362-4e9a-910e-365c3452d29f 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received event network-changed-b3526533-dc6a-4174-bd3b-c300e78eda62 external_instance_event /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:8050
we received a network-changed
and then at 2020-05-26 14:56:04.314 7 DEBUG nova.compute.manager [req-379d2410-5959-45ae-89ce-649dca3ed666 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received event network-vif-plugged-b3526533-dc6a-4174-bd3b-c300e78eda62 external_instance_event /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:8050
we recive a network-vif-plugged event which should ideally only be sent by the ml2 driver when the l2 agent has finished wireing up the networking on the destintaiton node.
as you pointed out the l2 agent does not finish adding the vlan subport to the correct bridge until
in this it started ensurign the bridge had connectiyt to the physical network at
2020-05-26 14:56:17.576 6 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-fcca2dcc-5578-4827-ae05-d10935d35223 - - - - -] Creating subinterface p2p2.64 for VLAN 64 on interface p2p2 ensure_vlan /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:303
which is long after the vm resumed at 2020-05-26 14:56:05.303
so with that in mind i think this is a neutron bug and its is reporting the network is pluged before it actully is.
i should not the netwrok-vif-plugged event i noted in the log was "unexpected"
2020-05-26 14:56:04.316 7 WARNING nova.compute.manager [req-379d2410-5959-45ae-89ce-649dca3ed666 030ec97d13dd4d9698209595a7ac01c4 ef210c7d6b2146139a9c94ef790081d8 - default default] [instance: 7f050d9a-413c-4143-849b-75f931a2c07d] Received unexpected event network-vif-plugged-b3526533-dc6a-4174-bd3b-c300e78eda62 for instance with vm_state active and task_state migrating.
meaning that nova had already recived a network-vif-plugged event previously when it was waiting for the networking on the destintaiotn to be completed but i think that unexpect plug event was the actual event that should have start the migration.
that is still signifcantly before neutron actully finished wiring up the networking which completed at 2020-05-26 14:56:21.061 6 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-fcca2dcc-5578-4827-ae05-d10935d35223 - - - - -] Done creating subinterface p2p2.64 ensure_vlan /var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:317
neutron should not have sent any netwrok-vif-plugged event untill at least the physical interface vlan subport was added to the bridge.
i dont think this is actully a nova bug but suspect its a neutron bug.
likely due to the cahgne that were intoduced for multiple port binding.
regarding the VM Resumimg before the 'brctl addif'.
the libvirt xml we generate contains the name of the linux bridge the tap should be added
too. so the linux bridge agent does not need to actully do the brctl addif command.
the tap should already be a member of that bridge when the vm is resumed.
it looks like the vm paused on the source compute node at manager [req-fe4495ae- f1a7-4e93- 871e-4d034098ba bd - - - - -] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] VM Paused (Lifecycle Event)
020-05-26 14:56:05.246 7 INFO nova.compute.
and it resumed on the dest at manager [req-8cd4842d- 4783-42b3- 9a8e-c4e757b8e6 f0 - - - - -] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] VM Resumed (Lifecycle Event)
2020-05-26 14:56:05.303 6 INFO nova.compute.
previously at 2020-05-26 14:55:54.639 6 DEBUG nova.virt. libvirt. driver [req-28bba2f2- 89d9-4cbb- 8b6a-7a9690469c 86 b114d7969c0e465 fbd15c2911ca4bb 23 28e6517b7d6d406 4be1bc878b590c4 0c - default default] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] Plugging VIFs before live migration. pre_live_migration /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ nova/virt/ libvirt/ driver. py:7621
on the dest node we have started per pluging the network backend which successfully completed at
2020-05-26 14:55:58.769 6 INFO os_vif [req-28bba2f2- 89d9-4cbb- 8b6a-7a9690469c 86 b114d7969c0e465 fbd15c2911ca4bb 23 28e6517b7d6d406 4be1bc878b590c4 0c - default default] Successfully plugged vif VIFBridge( active= True,address= fa:16:3e: e1:50:ac, bridge_ name='brq49b342 98-a8', has_traffic_ filtering= True,id= b3526533- dc6a-4174- bd3b-c300e78eda 62,network= Network( 49b34298- a85a-42a9- b264-b3a9242fef 8f),plugin= 'linux_ bridge' ,port_profile= ,preserve_ on_delete= False,vif_ name='tapb35265 33-dc')
this happens before we call libvirt to migrate the instance so at this point os-vif has ensured the linux bridge "brq49b34298-a8" is created so that when the vm starts the the tap is directly created in the correct bridge.
at 2020-05-26 14:56:02.172 7 DEBUG nova.compute. manager [req-40866f62- 6362-4e9a- 910e-365c3452d2 9f 030ec97d13dd4d9 698209595a7ac01 c4 ef210c7d6b21461 39a9c94ef790081 d8 - default default] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] Received event network- changed- b3526533- dc6a-4174- bd3b-c300e78eda 62 external_ instance_ event /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ nova/compute/ manager. py:8050
we received a network-changed
and then at 2020-05-26 14:56:04.314 7 DEBUG nova.compute. manager [req-379d2410- 5959-45ae- 89ce-649dca3ed6 66 030ec97d13dd4d9 698209595a7ac01 c4 ef210c7d6b21461 39a9c94ef790081 d8 - default default] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] Received event network- vif-plugged- b3526533- dc6a-4174- bd3b-c300e78eda 62 external_ instance_ event /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ nova/compute/ manager. py:8050
we recive a network-vif-plugged event which should ideally only be sent by the ml2 driver when the l2 agent has finished wireing up the networking on the destintaiton node.
as you pointed out the l2 agent does not finish adding the vlan subport to the correct bridge until
2020-05-26 14:56:25.743 6 DEBUG neutron. agent.linux. utils [req-fcca2dcc- 5578-4827- ae05-d10935d352 23 - - - - -] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/ rootwrap. conf', 'brctl', 'addif', 'brq49b34298-a8', 'p2p2.64'] create_process /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ neutron/ agent/linux/ utils.py: 87
19 seconds later
so i think the issue is that the linux bridge ml2 driver is not sending plug time network-vif-plugged events but is instead sending bind time events.
we wait for the netwroking to be configured here /github. com/openstack/ nova/blob/ stable/ queens/ nova/compute/ manager. py#L6420- L6425 /github. com/openstack/ nova/blob/ bea91b8d58d9098 52949726296149d 93f2c639d5/ nova/compute/ manager. py#L6352- L6362
https:/
which wait for the network-vif-plugged event i showed in the log
https:/
before actully starting the migration here /github. com/openstack/ nova/blob/ stable/ queens/ nova/compute/ manager. py#L6467- L6470
https:/
the linux bridge l2 agent should only notify nova that the iterface is plugged when the tap is fully wired up
https:/ /github. com/openstack/ neutron/ blob/4acc6843e8 49e98cd04a6d018 61555c3e120f081 /neutron/ plugins/ ml2/drivers/ agent/_ common_ agent.py# L303-L306
but as the comment suggest this behavior is racy
https:/ /github. com/openstack/ neutron/ blob/4acc6843e8 49e98cd04a6d018 61555c3e120f081 /neutron/ plugins/ ml2/drivers/ agent/_ common_ agent.py# L259-L296
in this it started ensurign the bridge had connectiyt to the physical network at plugins. ml2.drivers. linuxbridge. agent.linuxbrid ge_neutron_ agent [req-fcca2dcc- 5578-4827- ae05-d10935d352 23 - - - - -] Creating subinterface p2p2.64 for VLAN 64 on interface p2p2 ensure_vlan /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ neutron/ plugins/ ml2/drivers/ linuxbridge/ agent/linuxbrid ge_neutron_ agent.py: 303
2020-05-26 14:56:17.576 6 DEBUG neutron.
which is long after the vm resumed at 2020-05-26 14:56:05.303
so with that in mind i think this is a neutron bug and its is reporting the network is pluged before it actully is.
i should not the netwrok-vif-plugged event i noted in the log was "unexpected"
2020-05-26 14:56:04.316 7 WARNING nova.compute. manager [req-379d2410- 5959-45ae- 89ce-649dca3ed6 66 030ec97d13dd4d9 698209595a7ac01 c4 ef210c7d6b21461 39a9c94ef790081 d8 - default default] [instance: 7f050d9a- 413c-4143- 849b-75f931a2c0 7d] Received unexpected event network- vif-plugged- b3526533- dc6a-4174- bd3b-c300e78eda 62 for instance with vm_state active and task_state migrating.
meaning that nova had already recived a network-vif-plugged event previously when it was waiting for the networking on the destintaiotn to be completed but i think that unexpect plug event was the actual event that should have start the migration.
that is still signifcantly before neutron actully finished wiring up the networking which completed at 2020-05-26 14:56:21.061 6 DEBUG neutron. plugins. ml2.drivers. linuxbridge. agent.linuxbrid ge_neutron_ agent [req-fcca2dcc- 5578-4827- ae05-d10935d352 23 - - - - -] Done creating subinterface p2p2.64 ensure_vlan /var/lib/ kolla/venv/ lib/python2. 7/site- packages/ neutron/ plugins/ ml2/drivers/ linuxbridge/ agent/linuxbrid ge_neutron_ agent.py: 317
neutron should not have sent any netwrok-vif-plugged event untill at least the physical interface vlan subport was added to the bridge.
i dont think this is actully a nova bug but suspect its a neutron bug.
likely due to the cahgne that were intoduced for multiple port binding.