Comment 1 for bug 1912310

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Lee on IRC mentioned he hit something similar today again; copy/pasting his comment from this review[1]:

Weird one, n-cpu (well the libvirt module) on the source (compute1) reports that libvirtd on the dest (controller) has refused the connection but the migration had already failed on the dest.

This smells like a python-libvirt bug tbh, surely the migrateToURI3 call on the source should be able to detect the failure on the dest and return a useful error to the caller?

-----------------------------------------------------------------------
https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/compute1/logs/screen-n-cpu.txt#11964

Jun 10 06:41:59.477301 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: DEBUG nova.virt.libvirt.driver [-] [instance: 0a929010-7361-4372-a785-f9eb052aca46] Migration operation thread notification {{(pid=67549) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:9921}}
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: Traceback (most recent call last):
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: timer()
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: cb(*args, **kw)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/event.py", line 175, in _do_send
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: waiter.switch(result)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/greenthread.py", line 221, in main
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = function(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/utils.py", line 654, in context_wrapper
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: return func(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9576, in _live_migration_operation
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: LOG.error("Live Migration failure: %s", e, instance=instance)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self.force_reraise()
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise self.value
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9564, in _live_migration_operation
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: guest.migrate(self._live_migration_uri(dest),
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 617, in migrate
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self._domain.migrateToURI3(
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = proxy_call(self._autowrap, f, *args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = execute(f, *args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: six.reraise(c, e, tb)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise value
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = meth(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 2099, in migrateToURI3
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise libvirtError('virDomainMigrateToURI3() failed')
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: libvirt.libvirtError: unable to connect to server at 'ubuntu-focal-rax-dfw-0025050041:49152': Connection refused
-----------------------------------------------------------------------

Anyway, the reason for the underlying migration failure isn't clear as QEMU and libvirtd on the dest log practically nothing aside from `failed`. Noice.

-----------------------------------------------------------------------
https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/controller/logs/libvirt/libvirtd_log.txt

121258 2021-06-10 06:41:58.961+0000: 73381: info : qemuMonitorIOWrite:450 : QEMU_MONITOR_IO_WRITE: mon=0x7f5818059ad0 buf={"execute":"query-status","id":"libvirt-377"}^M
[..]
121267 2021-06-10 06:41:58.962+0000: 73381: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7f5818059ad0 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-377"}
[..]
121270 2021-06-10 06:41:58.962+0000: 73383: debug : qemuProcessStop:7279 : Shutting down vm=0x7f5818022c40 name=instance-0000001d id=22 pid=131702, reason=failed, asyncJob=migration in, flags=0x1
[..]
121273 2021-06-10 06:41:58.963+0000: 73383: debug : qemuDomainLogAppendMessage:10691 : Append log message (vm='instance-0000001d' message='2021-06-10 06:41:58.963+0000: shutting down, reason=failed
121274 ) stdioLogD=1
[..]
121278 2021-06-10 06:41:58.963+0000: 73383: debug : qemuProcessKill:7197 : vm=0x7f5818022c40 name=instance-0000001d pid=131702 flags=0x5

https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/controller/logs/libvirt/qemu/instance-0000001d_log.txt

2021-06-10 06:41:58.963+0000: shutting down, reason=failed
-----------------------------------------------------------------------

[1]https://review.opendev.org/c/openstack/nova/+/795533/5#message-3e7b1dc4d3f8d22b2c9a55637c7366199d00eb56