Comment 2 for bug 1784579

Matt Riedemann (mriedem) wrote:

Well, clearly at some point vif binding failed, and that's why you now have a "binding_failed" vif type in the info cache for the instance. Did you check the neutron agent logs on the source and/or dest hosts to see why binding failed? That's the root issue. I've also seen cases where the vif type is "unbound". It seems nova should probably *not* store that "binding_failed" vif type in the info cache if there is already another value in the cache, but that might not be trivial to determine given how the code is structured. One option would be to store off the vif types already in the cache *before* getting the latest information from neutron, do the comparison afterward, and filter out any vif type change from something like "ovs" to "binding_failed", since we know that cached value won't work if we later try to plug/unplug the vif.
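
Rough sketch of what I mean - untested, the helper name is made up, and I'm assuming the refresh path has both the previously cached network_info and the freshly built one in hand:

# vif types that indicate binding never succeeded; the values match what
# shows up in the info cache in this bug
FAILED_VIF_TYPES = ('binding_failed', 'unbound')


def _filter_failed_bindings(old_nw_info, new_nw_info):
    """Hypothetical helper (not existing nova code): keep the previously
    cached vif type when the fresh data from neutron reports a failed or
    unbound binding for a vif that already had a working type, e.g. don't
    let "ovs" get clobbered by "binding_failed".

    Both arguments are treated as lists of vif dicts with at least 'id'
    and 'type' keys, which is how the network info cache models vifs.
    """
    old_types = {vif['id']: vif['type'] for vif in old_nw_info or []}
    for vif in new_nw_info:
        old_type = old_types.get(vif['id'])
        if (vif['type'] in FAILED_VIF_TYPES and
                old_type and old_type not in FAILED_VIF_TYPES):
            # keep the last known-good type instead of the failed one
            vif['type'] = old_type
    return new_nw_info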

To summarize, it looks like the pre_live_migration method on the destination host fails to plug vifs and you end up with the "binding_failed" error, which is raised and makes the source live_migration method fail as expected. The failure is on the dest host. As a result, the info cache is updated with the "binding_failed" vif type, which then causes restarting the source compute service to fail here:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L958

Note that we're already handling VirtualInterfacePlugException but not the more generic:

"NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'"

We should (1) fix the _init_instance logic to also handle that error so the compute service doesn't fail to start, and (2) once that's done you should be able to reboot the instance to fix its networking. You should also investigate the vif plugging failures on the destination host (compute008?).
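
For (1), the idea would be something along these lines - just an untested sketch, not a drop-in patch; the callables passed in are stand-ins for the real manager/driver code around that spot, not actual nova functions:

from nova import exception


def _resume_guest_networking(instance, network_info, plug_vifs,
                             set_instance_error_state):
    """Untested sketch of fix (1): don't let a bad cached vif type kill
    compute startup; mark the instance as errored instead.
    """
    try:
        plug_vifs(instance, network_info)
    except exception.NovaException:
        # VirtualInterfacePlugException subclasses NovaException, so this
        # also covers the case we already handle today, plus the
        # "Unsupported VIF type binding_failed ..." error coming from a
        # stale info cache. Put the instance into ERROR instead of letting
        # init_host blow up.
        set_instance_error_state(instance)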