libvirt-bin upstart job considers service "started" before it is ready

Bug #1228210 reported by Robie Basak
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
uvtool
Fix Released
High
Unassigned
libvirt (Ubuntu)
Won't Fix
Undecided
Unassigned
uvtool (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Steps to reproduce:

1. This is a race condition. I observed it without needing this step, but to ensure that it reproduces, you can patch a sleep into libvirt. See reproduce.patch attachment.
2. apt-get install libvirt-bin

Actual results:

Note that /var/run/libvirt/libvirt-sock doesn't appear until some time after this command is done.

Expected results:

The daemon should be ready and answering queries at the time that the libvirt-bin package postinst has completed.

Impact:

It seems that libvirt's postinst finishes before the socket is available, and uvtool's postinst assumes that the socket is available. This causes a race. Workaround: "apt-get -f install".

Perhaps the libvirt postinst should wait for the socket to be available, or our postinst should wait. I'm not sure which.

Log:

Setting up ubuntu-cloud-utils-libvirt (0.1~alpha5) ...
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
Failed to define libvirt pool 'ubuntu-cloud'
dpkg: error processing ubuntu-cloud-utils-libvirt (--configure):
 subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin ...
Errors were encountered while processing:
 ubuntu-cloud-utils-libvirt
E: Sub-process /usr/bin/dpkg returned an error code (1)
ubuntu@ubuntu-cloud-saucy:~/repo$ sudo apt-get -f install
sudo: unable to resolve host ubuntu-cloud-saucy
Reading package lists... Done
Building dependency tree
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 76 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up ubuntu-cloud-utils-libvirt (0.1~alpha5) ...
Pool ubuntu-cloud defined from /tmp/tmp.0R9GjioWBq
Pool ubuntu-cloud marked as autostarted
Pool ubuntu-cloud started

Tags: patch
Robie Basak (racb)
Changed in uvtool:
importance: Undecided → High
Robie Basak (racb)
summary: - postinst fails to connect to libvirt
+ libvirt-bin upstart job considers service "started" when it is not
Revision history for this message
Robie Basak (racb) wrote : Re: libvirt-bin upstart job considers service "started" when it is not

Root cause analysis:

libvirtd creates and starts listening on the socket after forking to daemonize. However, it also ensures that the original process only exits after the daemon process has started to listen on the socket (using a pipe to signal readiness internally).

In the Sys V init case, this means that the init.d script would not exit until the daemon is ready.

In the Upstart case, it seems to me that Upstart's documented "expect daemon" behaviour means that Upstart treats the daemon as started as soon as the second (daemon) fork occurs, and there is no provision to wait() on the original process.

description: updated
Revision history for this message
Robie Basak (racb) wrote :

Here is the patch I'm using to reproduce the race.

Revision history for this message
Robie Basak (racb) wrote :

cjwatson suggested that I consider "expect stop" like this openssh patch: http://anonscm.debian.org/loggerhead/pkg-ssh/openssh/trunk/revision/3578

However, I also wonder if it would be a problem for upstart to wait() on the original process anyway in the cases of "expect daemon" and "expect fork". Is there any reason why this would break anything? It would seem closer to Sys V's behaviour to me.

summary: - libvirt-bin upstart job considers service "started" when it is not
+ libvirt-bin upstart job considers service "started" before it is ready
Revision history for this message
Robie Basak (racb) wrote :

So, I think there are three things we can do:

1) Make uvtool's postinst wait for libvirtd's unix socket to exist and be connectable. I think I should do this anyway as a workaround, as it's not particularly intrusive, will get us by for now and will help with the no-change backport to Precise I'd like to be possible without having to worry about a libvirt SRU. But it doesn't seem like a clean solution to me. I have a debdiff ready for this that uses socat, and another that uses a Python script to avoid a universe dependency in case we want to MIR uvtool in the future.

2) Modify libvirt's packaging to use "expect stop". I have an experimental debdiff that appears to work fine with this option and solves the original issue with no modification to uvtool's postinst needed. But it seems a bit ugly to me, since libvirt's behaviour in not existing the original called process before it is ready seems perfectly reasonable to me.

3) Make upstart wait on the original called process after it detects the fork with "expect daemon". Presumably this applies to "expect fork" as well. Is there any use case for which this isn't reasonable or would create a regression? Are there any other problems with this approach?

Revision history for this message
Robie Basak (racb) wrote :

Patch to make uvtool-libvirt.postinst wait for libvirtd's socket to be ready with socat.

Revision history for this message
Robie Basak (racb) wrote :

Patch to make uvtool-libvirt.postinst wait for libvirtd's socket to be ready with Python, to avoid a universe dependency.

Revision history for this message
Robie Basak (racb) wrote :

Experimental patch on libvirt. This introduces a sleep to help with demonstrating and testing the race, converts the packaging to use "expect stop" and patches libvirt to support sending itself a SIGSTOP when it is ready.

tags: added: patch
Revision history for this message
Robie Basak (racb) wrote :

Further discussion here: https://lists.ubuntu.com/archives/upstart-devel/2013-October/002772.html

Following that thread, I've come to the following conclusions:

libvirtd is perfectly acceptable in its current behaviour. upstart plans to support this edge case in the future. The only impact of this bug at the moment is in uvtool's postinst. A simple workaround can be used in uvtool's postinst. Although we can patch libvirtd to support "expect stop", it seems pretty invasive for something that we might want to drop in the future, and a the workaround in uvtool's postinst seems like the easier option.

So I'll mark this as Won't Fix in libvirt for now, to reflect the existence of this problem but my current intention to leave it as is. And I'll work around the problem in uvtool.

Changed in libvirt (Ubuntu):
status: New → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package uvtool - 0~bzr36-0ubuntu1

---------------
uvtool (0~bzr36-0ubuntu1) saucy; urgency=low

  * New upstream snapshot:
    - Workaround to make sure that libvirtd is running and ready before
      attempting to create the volume pool (LP: #1228210).
      + d/uvtool-libvirt-postinst: wait logic.
      + d/control: add dependency on socat (used by the wait logic).
 -- Robie Basak <email address hidden> Wed, 02 Oct 2013 11:35:06 +0100

Changed in uvtool (Ubuntu):
status: New → Fix Released
Revision history for this message
Robie Basak (racb) wrote :

Fixed in bzr rev 36.

Changed in uvtool:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.