Bug #1714254 “libvirtd hangs after # instance creation events” : Bugs : libvirt package : Ubuntu

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#1

This was detected using tempest smoke testing; however its reproducable using the reboot --hard nova operation:

nova reboot --hard <uuid>

may work a few times but in my test cloud, 1-2 times max.

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#2

Tried a few things:

No change rebuild of libvirt-python 3.5.0
New version of libvirt-python 3.6.0

I see the same issue under both python module versions.

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#3

Stacktrace:

http://paste.ubuntu.com/25438609/

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#4

Download full text (3.5 KiB)

Thread 3 (Thread 0x7f07e941e700 (LWP 30193)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f07f4891dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55e8a5ae4aa0) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f07f51d9c45 in virMutexLock (m=m@entry=0x55e8a5ae4aa0) at ../../../src/util/virthread.c:89
#3 0x00007f07f51ba0ca in virObjectLock (anyobj=anyobj@entry=0x55e8a5ae4a90) at ../../../src/util/virobject.c:388
#4 0x00007f07f52f7558 in virNetDaemonGotInhibitReply (pending=0x7f07b8018fc0, opaque=opaque@entry=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:449
#5 0x00007f07f52f83e7 in virNetDaemonCallInhibit (what=0x7f07f53a6419 "shutdown", mode=0x7f07f53d8ce2 "delay", why=0x7f07f53f99a0 "Virtual machines need to be saved", who=0x7f07f53f9b46 "Libvirt", dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:519
#6 virNetDaemonAddShutdownInhibition (dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:541
#7 0x00007f07b2bbe4d2 in qemuProcessInit (driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migration=migration@entry=false, flags=flags@entry=17) at ../../../src/qemu/qemu_process.c:4726
#8 0x00007f07b2bc1313 in qemuProcessStart (conn=conn@entry=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../../src/qemu/qemu_process.c:5850
#9 0x00007f07b2c1f7ff in qemuDomainObjStart (conn=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=0x7f07ac003360, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../../../src/qemu/qemu_driver.c:7121
#10 0x00007f07b2c1fef6 in qemuDomainCreateWithFlags (dom=0x7f07b8000b20, flags=0) at ../../../src/qemu/qemu_driver.c:7175
#11 0x00007f07f5295ef1 in virDomainCreateWithFlags (domain=domain@entry=0x7f07b8000b20, flags=0) at ../../../src/libvirt-domain.c:6595
#12 0x000055e8a4e3227a in remoteDispatchDomainCreateWithFlags (server=0x55e8a5ae1200, msg=0x55e8a5ae4b90, ret=0x7f07b8015fe0, args=0x7f07b8000af0, rerr=0x7f07e941dbc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:4307
#13 remoteDispatchDomainCreateWithFlagsHelper (server=0x55e8a5ae1200, client=<optimized out>, msg=0x55e8a5ae4b90, rerr=0x7f07e941dbc0, args=0x7f07b8000af0, ret=0x7f07b8015fe0) at ../../../daemon/remote_dispatch.h:4282
#14 0x00007f07f52fe599 in virNetServerProgramDispatchCall (msg=0x55e8a5ae4b90, client=0x55e8a5b03840, server=0x55e8a5ae1200, prog=0x55e8a5b27070) at ../../../src/rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x55e8a5b27070, server=server@entry=0x55e8a5ae1200, client=0x55e8a5b03840, msg=0x55e8a5ae4b90) at ../../../src/rpc/virnetserverprogram.c:307
#16 0x000055e8a4e53268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55e8a5ae1200) at ../.....

Thread 3 (Thread 0x7f07e941e700 (LWP 30193)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f07f4891dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55e8a5ae4aa0) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f07f51d9c45 in virMutexLock (m=m@entry=0x55e8a5ae4aa0) at ../../../src/util/virthread.c:89
#3  0x00007f07f51ba0ca in virObjectLock (anyobj=anyobj@entry=0x55e8a5ae4a90) at ../../../src/util/virobject.c:388
#4  0x00007f07f52f7558 in virNetDaemonGotInhibitReply (pending=0x7f07b8018fc0, opaque=opaque@entry=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:449
#5  0x00007f07f52f83e7 in virNetDaemonCallInhibit (what=0x7f07f53a6419 "shutdown", mode=0x7f07f53d8ce2 "delay", why=0x7f07f53f99a0 "Virtual machines need to be saved", who=0x7f07f53f9b46 "Libvirt", dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:519
#6  virNetDaemonAddShutdownInhibition (dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:541
#7  0x00007f07b2bbe4d2 in qemuProcessInit (driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migration=migration@entry=false, flags=flags@entry=17) at ../../../src/qemu/qemu_process.c:4726
#8  0x00007f07b2bc1313 in qemuProcessStart (conn=conn@entry=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../../src/qemu/qemu_process.c:5850
#9  0x00007f07b2c1f7ff in qemuDomainObjStart (conn=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=0x7f07ac003360, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../../../src/qemu/qemu_driver.c:7121
#10 0x00007f07b2c1fef6 in qemuDomainCreateWithFlags (dom=0x7f07b8000b20, flags=0) at ../../../src/qemu/qemu_driver.c:7175
#11 0x00007f07f5295ef1 in virDomainCreateWithFlags (domain=domain@entry=0x7f07b8000b20, flags=0) at ../../../src/libvirt-domain.c:6595
#12 0x000055e8a4e3227a in remoteDispatchDomainCreateWithFlags (server=0x55e8a5ae1200, msg=0x55e8a5ae4b90, ret=0x7f07b8015fe0, args=0x7f07b8000af0, rerr=0x7f07e941dbc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:4307
#13 remoteDispatchDomainCreateWithFlagsHelper (server=0x55e8a5ae1200, client=<optimized out>, msg=0x55e8a5ae4b90, rerr=0x7f07e941dbc0, args=0x7f07b8000af0, ret=0x7f07b8015fe0) at ../../../daemon/remote_dispatch.h:4282
#14 0x00007f07f52fe599 in virNetServerProgramDispatchCall (msg=0x55e8a5ae4b90, client=0x55e8a5b03840, server=0x55e8a5ae1200, prog=0x55e8a5b27070) at ../../../src/rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x55e8a5b27070, server=server@entry=0x55e8a5ae1200, client=0x55e8a5b03840, msg=0x55e8a5ae4b90) at ../../../src/rpc/virnetserverprogram.c:307
#16 0x000055e8a4e53268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:169
#18 0x00007f07f51da71b in virThreadPoolWorker (opaque=opaque@entry=0x55e8a5ad6f60) at ../../../src/util/virthreadpool.c:167
#19 0x00007f07f51d9ac8 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#20 0x00007f07f488f6ba in start_thread (arg=0x7f07e941e700) at pthread_create.c:333
#21 0x00007f07f45c53dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-08-31:

#5

With a local reproducer like [1] I could trigger it 3/3 times overall.

Then I enabled daemon Debug like:
log_outputs="1:file:/tmp/libvirtd.log"
And it no more occurred in my test environment.

I disabled the logging, and hit it again - maybe my system just isn't beefy enough to hit the race with logging enablerd.

Formerly I also tried to go without uvtool (which uses libvirt-python like Openstack) and switched to virsh only (which is C vs libvirt0.so).
That [2] did not trigger the bug, but since I now struggle to get [1] to trigger I'm not sure.

But even with a non 100% repro rate, this confirmed that is "can" be hit without Openstack.
The resulting livbirtd backtrace [3] looks just like the one I saw from the Openstack team [4].

[1]: http://paste.ubuntu.com/25438499/
[2]: http://paste.ubuntu.com/25438591/
[3]: http://paste.ubuntu.com/25438517/
[4]: http://paste.ubuntu.com/25438609/

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-08-31:

#6

In the above [3] Thread 20 is in a lock via:

virNetDaemonCallInhibit(virNetDaemonPtr dmn
# That seems to be one of the guest creations
490 VIR_DEBUG("dmn=%p what=%s who=%s why=%s mode=%s",
491 dmn, NULLSTR(what), NULLSTR(who), NULLSTR(why), NULLSTR(mode));
[...]
519 virNetDaemonGotInhibitReply(pendingReply, dmn);

And at the same time Thread 1 is via
virNetDaemonRun
# That seems to be the main loop
804 VIR_DEBUG("dmn=%p quit=%d", dmn, dmn->quit);
[...]
829 virObjectLock(dmn);

Both lock against each other.

@James - could you take a look at the owners and/or there should be even without gdb VIR_DEBUG statements with dmn=%p in the log. Especially who is the current owner e.g. in your live case?

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#7

More gdb info

Both threads are trying to lock:

p *(pthread_mutex_t*)0x55e8a5ae4aa0
$3 = {__data = {__lock = 2, __count = 0, __owner = 30193, __nusers = 1, __kind = 512, __spins = 0, __elision = 0, __list = {
__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000\361u\000\000\001\000\000\000\000\002", '\000' <repeats 21 times>, __align = 2}

owner is 30193 which is the thread attempting to start the qemu process; possible that its trying to re-lock when it already owns the lock?

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-08-31:

#8

The post above refers to [1].

In reference to that:

virMutexInit
55 pthread_mutexattr_t attr;
56 pthread_mutexattr_init(&attr);
57 pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
58 ret = pthread_mutex_init(&m->lock, &attr);

[...]
87 void virMutexLock(virMutexPtr m)
88 {
89 pthread_mutex_lock(&m->lock);

That is:
PTHREAD_MUTEX_NORMAL
This type of mutex does not detect deadlock. A thread attempting to relock this mutex without first unlocking it will deadlock. Attempting to unlock a mutex locked by a different thread results in undefined behaviour. Attempting to unlock an unlocked mutex results in undefined behaviour.

And from man pthread_mutex_lock:
If the mutex is already locked by the calling thread, the behavior of pthread_mutex_lock depends on the kind of the mutex. If the mutex is of the ``fast'' kind, the calling thread is suspended until the mutex is unlocked, thus effectively causing the calling thread to deadlock.

[1]: http://paste.ubuntu.com/25438609/

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-08-31:

#9

Should be broken since
https://libvirt.org/git/?p=libvirt.git;a=commit;h=ace45e67abbd9a033be54602db71a9dbc71408d7

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#10

Specifically this behavioural change we think is the cause:

        if (dbus_pending_call_get_completed(pendingReply)) {
            virNetDaemonGotInhibitReply(pendingReply, dmn);
        } else {
            dbus_pending_call_set_notify(pendingReply,
                                         virNetDaemonGotInhibitReply,
                                         dmn, NULL);
        }

if dbus_pending_call_get_completed(pendingReply) is true, then virNetDaemonGotInhibitReply is called directly which attempts to re-lock the same mutex from the same thread (its locked one frame above) resulting in the thread dead-locking itself.

Obviously this is not always true, so sometimes the call to virNetDaemonGotInhibitReply is deferred, where the mutex lock *is* appropriate in virNetDaemonGotInhibitReply.

This is new in 3.6.0.

Revision history for this message

In Red Hat Bugzilla #1487322, Christian (christian-redhat-bugs) wrote on 2017-08-31:

#18

Download full text (4.7 KiB)

Description of problem:

Version-Release number of selected component (if applicable):
- Current Head (3.6 and 3.7-rc*)

How reproducible:
- takes a while but close to 100%

Steps to Reproduce:
- Run concurrent create/start/stop/destroy loops
- Openstack reboot --hard
- probably others
- Different details are at pad.lv/1714254

Actual results:
- libvirtd hangs up on a a deadlock

Expected results:
- libvirtd continues to work

Additional info:
The change introducing the issue is
https://libvirt.org/git/?p=libvirt.git;a=commit;h=ace45e67abbd9a033be54602db71a9dbc71408d7
Author: Michal Privoznik <email address hidden>
Date: Tue Aug 1 11:04:22 2017 +0200

virNetDaemonCallInhibit: Call virNetDaemonGotInhibitReply properly

Since that there is a race that can happen and cause the execution to lock the same object twice.
Since pthread_mutex lock on the default locks behaves works without error check - the second lock is a deadlock.

On a stacktrace like the following:
Thread 3 (Thread 0x7f07e941e700 (LWP 30193)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f07f4891dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55e8a5ae4aa0) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f07f51d9c45 in virMutexLock (m=m@entry=0x55e8a5ae4aa0) at ../../../src/util/virthread.c:89
#3 0x00007f07f51ba0ca in virObjectLock (anyobj=anyobj@entry=0x55e8a5ae4a90) at ../../../src/util/virobject.c:388
#4 0x00007f07f52f7558 in virNetDaemonGotInhibitReply (pending=0x7f07b8018fc0, opaque=opaque@entry=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:449
#5 0x00007f07f52f83e7 in virNetDaemonCallInhibit (what=0x7f07f53a6419 "shutdown", mode=0x7f07f53d8ce2 "delay", why=0x7f07f53f99a0 "Virtual machines need to be saved", who=0x7f07f53f9b46 "Libvirt", dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:519
#6 virNetDaemonAddShutdownInhibition (dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:541
#7 0x00007f07b2bbe4d2 in qemuProcessInit (driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migration=migration@entry=false, flags=flags@entry=17) at ../../../src/qemu/qemu_process.c:4726
#8 0x00007f07b2bc1313 in qemuProcessStart (conn=conn@entry=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../../src/qemu/qemu_process.c:5850
#9 0x00007f07b2c1f7ff in qemuDomainObjStart (conn=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=0x7f07ac003360, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../../../src/qemu/qemu_driver.c:7121
#10 0x00007f07b2c1fef6 in qemuDomainCreateWithFlags (dom=0x7f07b8000b20, flags=0) at ../../../src/qemu/qemu_driver.c:7175
#11 0x00007f07f5295ef1 in virDomainCreateWithFlags (domain=domain@entry=0x7f07b8000b20, flags=0) at ../../../src/libvirt-domain.c:6595
#12 0x000055e8a4e3227a in remoteDispatchDomainCreateWithFlags (server=0x55e8a5ae1200, msg=0x55e8a5ae4b...

Description of problem:

Version-Release number of selected component (if applicable):
- Current Head (3.6 and 3.7-rc*)

How reproducible:
- takes a while but close to 100%

Steps to Reproduce:
- Run concurrent create/start/stop/destroy loops
- Openstack reboot --hard
- probably others
- Different details are at pad.lv/1714254

Actual results:
- libvirtd hangs up on a a deadlock

Expected results:
- libvirtd continues to work

Additional info:
The change introducing the issue is
https://libvirt.org/git/?p=libvirt.git;a=commit;h=ace45e67abbd9a033be54602db71a9dbc71408d7
Author: Michal Privoznik <mprivozn@redhat.com>
Date:   Tue Aug 1 11:04:22 2017 +0200

virNetDaemonCallInhibit: Call virNetDaemonGotInhibitReply properly

Since that there is a race that can happen and cause the execution to lock the same object twice.
Since pthread_mutex lock on the default locks behaves works without error check - the second lock is a deadlock.

On a stacktrace like the following:
Thread 3 (Thread 0x7f07e941e700 (LWP 30193)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f07f4891dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55e8a5ae4aa0) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f07f51d9c45 in virMutexLock (m=m@entry=0x55e8a5ae4aa0) at ../../../src/util/virthread.c:89
#3  0x00007f07f51ba0ca in virObjectLock (anyobj=anyobj@entry=0x55e8a5ae4a90) at ../../../src/util/virobject.c:388
#4  0x00007f07f52f7558 in virNetDaemonGotInhibitReply (pending=0x7f07b8018fc0, opaque=opaque@entry=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:449
#5  0x00007f07f52f83e7 in virNetDaemonCallInhibit (what=0x7f07f53a6419 "shutdown", mode=0x7f07f53d8ce2 "delay", why=0x7f07f53f99a0 "Virtual machines need to be saved", who=0x7f07f53f9b46 "Libvirt", dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:519
#6  virNetDaemonAddShutdownInhibition (dmn=0x55e8a5ae4a90) at ../../../src/rpc/virnetdaemon.c:541
#7  0x00007f07b2bbe4d2 in qemuProcessInit (driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migration=migration@entry=false, flags=flags@entry=17) at ../../../src/qemu/qemu_process.c:4726
#8  0x00007f07b2bc1313 in qemuProcessStart (conn=conn@entry=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=vm@entry=0x7f07ac003360, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at ../../../src/qemu/qemu_process.c:5850
#9  0x00007f07b2c1f7ff in qemuDomainObjStart (conn=0x7f07bc000c70, driver=driver@entry=0x7f07a01e1b70, vm=0x7f07ac003360, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at ../../../src/qemu/qemu_driver.c:7121
#10 0x00007f07b2c1fef6 in qemuDomainCreateWithFlags (dom=0x7f07b8000b20, flags=0) at ../../../src/qemu/qemu_driver.c:7175
#11 0x00007f07f5295ef1 in virDomainCreateWithFlags (domain=domain@entry=0x7f07b8000b20, flags=0) at ../../../src/libvirt-domain.c:6595
#12 0x000055e8a4e3227a in remoteDispatchDomainCreateWithFlags (server=0x55e8a5ae1200, msg=0x55e8a5ae4b90, ret=0x7f07b8015fe0, args=0x7f07b8000af0, rerr=0x7f07e941dbc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:4307
#13 remoteDispatchDomainCreateWithFlagsHelper (server=0x55e8a5ae1200, client=<optimized out>, msg=0x55e8a5ae4b90, rerr=0x7f07e941dbc0, args=0x7f07b8000af0, ret=0x7f07b8015fe0) at ../../../daemon/remote_dispatch.h:4282
#14 0x00007f07f52fe599 in virNetServerProgramDispatchCall (msg=0x55e8a5ae4b90, client=0x55e8a5b03840, server=0x55e8a5ae1200, prog=0x55e8a5b27070) at ../../../src/rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x55e8a5b27070, server=server@entry=0x55e8a5ae1200, client=0x55e8a5b03840, msg=0x55e8a5ae4b90) at ../../../src/rpc/virnetserverprogram.c:307
#16 0x000055e8a4e53268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55e8a5ae1200) at ../../../src/rpc/virnetserver.c:169
#18 0x00007f07f51da71b in virThreadPoolWorker (opaque=opaque@entry=0x55e8a5ad6f60) at ../../../src/util/virthreadpool.c:167
#19 0x00007f07f51d9ac8 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#20 0x00007f07f488f6ba in start_thread (arg=0x7f07e941e700) at pthread_create.c:333
#21 0x00007f07f45c53dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There was a lock at frame #6:
534     virObjectLock(dmn);
But under certain conditions (the race) it later on tried to lock the same in frame #4
449     virObjectLock(dmn);

Christian Ehrhardt  (paelzer) on 2017-08-31

no longer affects:

libvirt

Revision history for this message

James Page (james-page) wrote on 2017-08-31:

#11

avoid-double-locking.patch Edit (1.4 KiB, text/plain)

Patch that reverts back to deferred notification callback avoiding in-band double locking.

Changed in libvirt (Ubuntu):
status:	New → Triaged
importance:	Undecided → Critical

Ubuntu Foundations Team Bug Bot (crichton) on 2017-08-31

tags:

added: patch

Revision history for this message

James Page (james-page) wrote on 2017-09-01:

#12

Patch in #11 tested by myself and Corey; full pass on tempest smoke testing and unable to reproduce the problem using repeated "nova reboot --hard".

Revision history for this message

In Red Hat Bugzilla #1487322, Christian (christian-redhat-bugs) wrote on 2017-09-01:

#19

There is a suggested patch we will use as hot,fix but wait for a proper discussion/solution to be picked up eventually.

See: https://launchpadlibrarian.net/335315545/avoid-double-locking.patch

Revision history for this message

In Red Hat Bugzilla #1487322, Christian (christian-redhat-bugs) wrote on 2017-09-01:

#20

Hmm that is actually just the "partial revert" hotfix, the suggestion I actually meant is at http://paste.ubuntu.com/25439049/

Revision history for this message

In Red Hat Bugzilla #1487322, Michal (michal-redhat-bugs) wrote on 2017-09-01:

#21

I've proposed patch on the upstream list:

https://www.redhat.com/archives/libvir-list/2017-September/msg00015.html

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-09-01:

#13

While I was testing builds on the hotfix fix suggested upstream:
https://www.redhat.com/archives/libvir-list/2017-September/msg00015.html

Revision history for this message

In Red Hat Bugzilla #1487322, Christian (christian-redhat-bugs) wrote on 2017-09-01:

#22

Change builds fine and seems to fix the issue.
Survived 20 minutes in my stress loop, which it never did before.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-09-01:

#14

Testing with that fix seems good.
Pushing now as it is a hot-fix, will re-roll if upstream decides to modify and/or if more in depth regression checks will uncover anything.

Revision history for this message

In Red Hat Bugzilla #1487322, Michal (michal-redhat-bugs) wrote on 2017-09-01:

#23

And I've pushed the patch:

commit 054c6d27214ef90feab3897b052f450492d5ef3f
Author: Michal Privoznik <email address hidden>
AuthorDate: Fri Sep 1 10:33:12 2017 +0200
Commit: Michal Privoznik <email address hidden>
CommitDate: Fri Sep 1 13:21:33 2017 +0200

virnetdaemon: Don't deadlock when talking to D-Bus

https://bugzilla.redhat.com/show_bug.cgi?id=1487322

    In ace45e67abbd I tried to fix a problem that we get the reply to
    a D-Bus call while we were sleeping. In that case the callback
    was never set. So I changed the code that the callback is called
    directly in this case. However, I hadn't realized that since the
    callback is called out of order it locks the virNetDaemon.
    Exactly the very same virNetDaemon object that we are dealing
    with right now and that we have locked already (in
    virNetDaemonAddShutdownInhibition())

Signed-off-by: Michal Privoznik <email address hidden>

v3.7.0-rc2-5-g054c6d272

Revision history for this message

James Page (james-page) wrote on 2017-09-01:

#15

Proposed fix upstream resolves the libvirtd deadlock issue in my test environment.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2017-09-01:

#16

Things are in migration but test resources seem pretty busy, so dep8's take a while :-/

Revision history for this message

Launchpad Janitor (janitor) wrote on 2017-09-02:

#17

This bug was fixed in the package libvirt - 3.6.0-1ubuntu4

---------------
libvirt (3.6.0-1ubuntu4) artful; urgency=medium

* d/p/avoid-double-locking.patch: fix a deadlock that could occur when
libvirtd interactions raced with dbus causing a deadlock (LP: #1714254).

-- Christian Ehrhardt <email address hidden> Fri, 01 Sep 2017 10:29:35 +0200

Changed in libvirt (Ubuntu):
status:	Triaged → Fix Released

Bug Watch Updater (bug-watch-updater) on 2017-10-26

Changed in libvirt:
importance:	Unknown → Undecided
status:	Unknown → Fix Released

Affects		Status	Importance	Assigned to	Milestone
	libvirt	Fix Released	Undecided	redhat-bugs #1487322
	libvirt (Ubuntu)	Fix Released	Critical	Unassigned

Ubuntu
libvirt package

libvirtd hangs after # instance creation events

Bug Description

Other bug subscribers

Patches

Remote bug watches

Ubuntulibvirt package

libvirtd hangs after # instance creation events

Bug Description

Other bug subscribers

Patches

Remote bug watches

Ubuntu
libvirt package