Lee on IRC mentioned he hit something similar today again; copy/pasting his comment from this review[1]:
Weird one, n-cpu (well the libvirt module) on the source (compute1) reports that libvirtd on the dest (controller) has refused the connection but the migration had already failed on the dest.
This smells like a python-libvirt bug tbh, surely the migrateToURI3 call on the source should be able to detect the failure on the dest and return a useful error to the caller?
Jun 10 06:41:59.477301 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: DEBUG nova.virt.libvirt.driver [-] [instance: 0a929010-7361-4372-a785-f9eb052aca46] Migration operation thread notification {{(pid=67549) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:9921}}
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: Traceback (most recent call last):
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: timer()
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: cb(*args, **kw)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/event.py", line 175, in _do_send
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: waiter.switch(result)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/greenthread.py", line 221, in main
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = function(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/utils.py", line 654, in context_wrapper
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: return func(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9576, in _live_migration_operation
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: LOG.error("Live Migration failure: %s", e, instance=instance)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self.force_reraise()
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise self.value
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9564, in _live_migration_operation
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: guest.migrate(self._live_migration_uri(dest),
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 617, in migrate
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self._domain.migrateToURI3(
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = proxy_call(self._autowrap, f, *args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = execute(f, *args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: six.reraise(c, e, tb)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise value
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = meth(*args, **kwargs)
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 2099, in migrateToURI3
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise libvirtError('virDomainMigrateToURI3() failed')
Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: libvirt.libvirtError: unable to connect to server at 'ubuntu-focal-rax-dfw-0025050041:49152': Connection refused
-----------------------------------------------------------------------
Anyway, the reason for the underlying migration failure isn't clear as QEMU and libvirtd on the dest log practically nothing aside from `failed`. Noice.
Lee on IRC mentioned he hit something similar today again; copy/pasting his comment from this review[1]:
Weird one, n-cpu (well the libvirt module) on the source (compute1) reports that libvirtd on the dest (controller) has refused the connection but the migration had already failed on the dest.
This smells like a python-libvirt bug tbh, surely the migrateToURI3 call on the source should be able to detect the failure on the dest and return a useful error to the caller?
------- ------- ------- ------- ------- ------- ------- ------- ------- ------- - /zuul.opendev. org/t/openstack /build/ f3b829801901417 c9310ad5cc5a0e8 86/log/ compute1/ logs/screen- n-cpu.txt# 11964
https:/
Jun 10 06:41:59.477301 ubuntu- focal-rax- dfw-0025050045 nova-compute[ 67549]: DEBUG nova.virt. libvirt. driver [-] [instance: 0a929010- 7361-4372- a785-f9eb052aca 46] Migration operation thread notification {{(pid=67549) thread_finished /opt/stack/ nova/nova/ virt/libvirt/ driver. py:9921} } focal-rax- dfw-0025050045 nova-compute[ 67549]: Traceback (most recent call last): focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ hubs/hub. py", line 476, in fire_timers focal-rax- dfw-0025050045 nova-compute[ 67549]: timer() focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ hubs/timer. py", line 59, in __call__ focal-rax- dfw-0025050045 nova-compute[ 67549]: cb(*args, **kw) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ event.py" , line 175, in _do_send focal-rax- dfw-0025050045 nova-compute[ 67549]: waiter. switch( result) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ greenthread. py", line 221, in main focal-rax- dfw-0025050045 nova-compute[ 67549]: result = function(*args, **kwargs) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/opt/stack/ nova/nova/ utils.py" , line 654, in context_wrapper focal-rax- dfw-0025050045 nova-compute[ 67549]: return func(*args, **kwargs) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/opt/stack/ nova/nova/ virt/libvirt/ driver. py", line 9576, in _live_migration _operation focal-rax- dfw-0025050045 nova-compute[ 67549]: LOG.error("Live Migration failure: %s", e, instance=instance) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ oslo_utils/ excutils. py", line 227, in __exit__ focal-rax- dfw-0025050045 nova-compute[ 67549]: self.force_ reraise( ) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ oslo_utils/ excutils. py", line 200, in force_reraise focal-rax- dfw-0025050045 nova-compute[ 67549]: raise self.value focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/opt/stack/ nova/nova/ virt/libvirt/ driver. py", line 9564, in _live_migration _operation focal-rax- dfw-0025050045 nova-compute[ 67549]: guest.migrate( self._live_ migration_ uri(dest) , focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/opt/stack/ nova/nova/ virt/libvirt/ guest.py" , line 617, in migrate focal-rax- dfw-0025050045 nova-compute[ 67549]: self._domain. migrateToURI3( focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ tpool.py" , line 190, in doit focal-rax- dfw-0025050045 nova-compute[ 67549]: result = proxy_call( self._autowrap, f, *args, **kwargs) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ tpool.py" , line 148, in proxy_call focal-rax- dfw-0025050045 nova-compute[ 67549]: rv = execute(f, *args, **kwargs) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ tpool.py" , line 129, in execute focal-rax- dfw-0025050045 nova-compute[ 67549]: six.reraise(c, e, tb) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ six.py" , line 719, in reraise focal-rax- dfw-0025050045 nova-compute[ 67549]: raise value focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ eventlet/ tpool.py" , line 83, in tworker focal-rax- dfw-0025050045 nova-compute[ 67549]: rv = meth(*args, **kwargs) focal-rax- dfw-0025050045 nova-compute[ 67549]: File "/usr/local/ lib/python3. 8/dist- packages/ libvirt. py", line 2099, in migrateToURI3 focal-rax- dfw-0025050045 nova-compute[ 67549]: raise libvirtError( 'virDomainMigra teToURI3( ) failed') focal-rax- dfw-0025050045 nova-compute[ 67549]: libvirt. libvirtError: unable to connect to server at 'ubuntu- focal-rax- dfw-0025050041: 49152': Connection refused ------- ------- ------- ------- ------- ------- ------- ------- ------- -
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
Jun 10 06:41:59.486684 ubuntu-
-------
Anyway, the reason for the underlying migration failure isn't clear as QEMU and libvirtd on the dest log practically nothing aside from `failed`. Noice.
------- ------- ------- ------- ------- ------- ------- ------- ------- ------- - /zuul.opendev. org/t/openstack /build/ f3b829801901417 c9310ad5cc5a0e8 86/log/ controller/ logs/libvirt/ libvirtd_ log.txt
https:/
121258 2021-06-10 06:41:58.961+0000: 73381: info : qemuMonitorIOWr ite:450 : QEMU_MONITOR_ IO_WRITE: mon=0x7f5818059ad0 buf={"execute" :"query- status" ,"id":" libvirt- 377"}^M IOProcessLine: 239 : QEMU_MONITOR_ RECV_REPLY: mon=0x7f5818059ad0 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-377"} :7279 : Shutting down vm=0x7f5818022c40 name=instance- 0000001d id=22 pid=131702, reason=failed, asyncJob=migration in, flags=0x1 pendMessage: 10691 : Append log message (vm='instance- 0000001d' message='2021-06-10 06:41:58.963+0000: shutting down, reason=failed :7197 : vm=0x7f5818022c40 name=instance- 0000001d pid=131702 flags=0x5
[..]
121267 2021-06-10 06:41:58.962+0000: 73381: info : qemuMonitorJSON
[..]
121270 2021-06-10 06:41:58.962+0000: 73383: debug : qemuProcessStop
[..]
121273 2021-06-10 06:41:58.963+0000: 73383: debug : qemuDomainLogAp
121274 ) stdioLogD=1
[..]
121278 2021-06-10 06:41:58.963+0000: 73383: debug : qemuProcessKill
https:/ /zuul.opendev. org/t/openstack /build/ f3b829801901417 c9310ad5cc5a0e8 86/log/ controller/ logs/libvirt/ qemu/instance- 0000001d_ log.txt
2021-06-10 06:41:58.963+0000: shutting down, reason=failed ------- ------- ------- ------- ------- ------- ------- ------- ------- -
-------
[1]https:/ /review. opendev. org/c/openstack /nova/+ /795533/ 5#message- 3e7b1dc4d3f8d22 b2c9a55637c7366 199d00eb56