virsh start domain sometimes fail
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
libvirt (Ubuntu) | Fix Released | High | Unassigned | |
Precise | Confirmed | Medium | Unassigned | |
Bug Description
I have at least four domains that I control from Jenkins in a matrix job. When I start the Jenkins job, all four domains are started, each in its own process (since this is a matrix job, Jenkins expands and starts all instances at the same time).
Most of the time, at least one of the domains fails to start. The failure is not tied to one particular domain; any of them can fail.
When it fails, I see this:
+ virsh -d 5 -l visrh_start.log start clean-oneiric-
error: failed to get domain 'clean-
error: server closed connection:
start: domain(optdata): clean-oneiric-
start: <domain> trying as domain NAME
+ '[' 1 '!=' 0 ']'
+ echo 'virsh start clean-oneiric-
When I chatted with Serge, he suggested bug 903212 as a possible match and asked me to run libvirt with debug=1 (and cross my fingers against a possible heisenbug). I am opening this bug as a placeholder until I am able to run libvirt under debug.
ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: libvirt-bin 0.9.2-4ubuntu15.2
ProcVersionSign
Uname: Linux 3.0.0-16-server x86_64
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
Date: Wed Mar 21 14:05:42 2012
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile.
mtime.conffile.
mtime.conffile.
mtime.conffile.
WORKAROUND:
retry the virsh start command; try *not* to run concurrent 'virsh list'.
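
The retry part of the workaround could be wrapped in a small shell helper along these lines. This is only a sketch: the `retry` function, its argument order, and the attempt count and delay are assumptions made for illustration, not something taken from the Jenkins job or from libvirt.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or MAX_ATTEMPTS
# attempts have been made. Returns 0 on success, otherwise the exit
# status of the last failed attempt.
# Note: plain POSIX sh has no "local", so these variables are global.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # On failure, $? after the AND list is the command's own status.
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit $status), retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

With this in place, the job step would call something like `retry 5 2 virsh start "$DOMAIN"`, where `DOMAIN` is a placeholder for the domain name. It only papers over the failure; it does not address the underlying race.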
summary:
- virsh start domain sometime fail
+ virsh start domain sometime fail in oneiric
Changed in libvirt (Ubuntu): | |
importance: | Undecided → High |
summary:
- virsh start domain sometime fail in oneiric
+ virsh start domain sometimes fail in oneiric
description: | updated |
summary:
- virsh start domain sometimes fail in oneiric
+ virsh start domain sometimes fail
Changed in libvirt (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in libvirt (Ubuntu Precise): | |
status: | New → Confirmed |
In bug 903212 it's suggested that a specific patchset, which can't be cleanly cherry-picked, might fix it. However, the rationale given for the patchset doesn't match what the patchset does: asctime_r etc. *are* thread-safe. The only thing I can find in the source that isn't is one use of localtime (rather than localtime_r). It may be worth fixing that and seeing if it helps.