resume_state_on_host_boot fails on instances in error state

Bug #1092108 reported by James Troup
This bug affects 1 person
Affects        Status   Importance  Assigned to  Milestone
nova (Ubuntu)  Expired  High        Unassigned

Bug Description

After an unexpected host reboot, all the guests went away. I added
'--start_guests_on_host_boot=true' to /etc/nova/nova.conf and started
up nova-compute. It started some instances but then died on:

2012-12-19 11:11:47 CRITICAL nova [-] Domain not found: no domain with matching name 'instance-000000bb'
2012-12-19 11:11:47 TRACE nova Traceback (most recent call last):
2012-12-19 11:11:47 TRACE nova File "/usr/bin/nova-compute", line 49, in <module>
2012-12-19 11:11:47 TRACE nova service.wait()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 413, in wait
2012-12-19 11:11:47 TRACE nova _launcher.wait()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 131, in wait
2012-12-19 11:11:47 TRACE nova service.wait()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2012-12-19 11:11:47 TRACE nova return self._exit_event.wait()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2012-12-19 11:11:47 TRACE nova return hubs.get_hub().switch()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2012-12-19 11:11:47 TRACE nova return self.greenlet.switch()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2012-12-19 11:11:47 TRACE nova result = function(*args, **kwargs)
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 101, in run_server
2012-12-19 11:11:47 TRACE nova server.start()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in start
2012-12-19 11:11:47 TRACE nova self.manager.init_host()
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 269, in init_host
2012-12-19 11:11:47 TRACE nova block_device_info)
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
2012-12-19 11:11:47 TRACE nova return f(*args, **kw)
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 852, in resume_state_on_host_boot
2012-12-19 11:11:47 TRACE nova block_device_info=block_device_info)
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 790, in _hard_reboot
2012-12-19 11:11:47 TRACE nova virt_dom = self._conn.lookupByName(instance['name'])
2012-12-19 11:11:47 TRACE nova File "/usr/lib/python2.7/dist-packages/libvirt.py", line 2370, in lookupByName
2012-12-19 11:11:47 TRACE nova if ret is None:raise libvirtError('virDomainLookupByName() failed', conn=self)
2012-12-19 11:11:47 TRACE nova libvirtError: Domain not found: no domain with matching name 'instance-000000bb'
2012-12-19 11:11:47 TRACE nova

This instance is in an error state:

RESERVATION r-n1d0t747 c519923c921a404c96ebc8210a4ec67a juju-canonistack2, juju-canonistack2-10
INSTANCE i-000000bb ami-000000bf server-187 server-187 error None (c519923c921a404c96ebc8210a4ec67a, alce) 0 m1.small 2012-07-02T02:12:56.000Z nova monitoring-disabled instance-store

And it no longer exists on alce. I couldn't find any reasonable way to
kill the instance entirely (ec2-terminate-instances as an admin user
had no effect) or to trivially remove it from the database. I ended up
modifying the nova libvirt driver to skip instances it can't find, using
the attached patch.

(For the avoidance of doubt, I'm attaching the patch mostly to
 illustrate the problem and our workaround, not necessarily for use
 as-is in the packages or upstream.)
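
For readers without access to the attachment, the idea behind the workaround can be sketched as follows. This is not the attached patch and not nova's actual code: DomainNotFound, FakeConnection and resume_instances are hypothetical names used only so the example runs without libvirt installed. With the real Python bindings, the equivalent check would presumably catch libvirt.libvirtError and compare get_error_code() against libvirt.VIR_ERR_NO_DOMAIN.

```python
import logging

LOG = logging.getLogger(__name__)


class DomainNotFound(Exception):
    """Hypothetical stand-in for libvirt's 'Domain not found' error."""


class FakeConnection:
    """Hypothetical stand-in for a libvirt connection with a fixed domain set."""

    def __init__(self, known_domains):
        self._known = set(known_domains)

    def lookupByName(self, name):
        # Mirrors the failure in the traceback: unknown names raise.
        if name not in self._known:
            raise DomainNotFound("no domain with matching name %r" % name)
        return name


def resume_instances(conn, instances, reboot):
    """Resume each instance, skipping any whose domain is missing.

    Returns a (resumed, skipped) pair of lists of instance names.
    """
    resumed, skipped = [], []
    for instance in instances:
        name = instance["name"]
        try:
            domain = conn.lookupByName(name)
        except DomainNotFound:
            # Log and carry on instead of letting the service die.
            LOG.warning("Skipping %s: domain not found on this host", name)
            skipped.append(name)
            continue
        reboot(domain)
        resumed.append(name)
    return resumed, skipped


if __name__ == "__main__":
    conn = FakeConnection({"instance-000000ba", "instance-000000bc"})
    instances = [
        {"name": "instance-000000ba"},
        {"name": "instance-000000bb"},  # missing, as in this report
        {"name": "instance-000000bc"},
    ]
    print(resume_instances(conn, instances, reboot=lambda dom: None))
```

The point of the design is that one stale database record no longer prevents the remaining guests from being started; whether a missing domain should instead be redefined from the stored instance data is a separate question this sketch does not address.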

This is all with current Ubuntu 12.04 packages (including
precise-proposed).

James Troup (elmo) wrote :
tags: added: canonistack
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Skip instances which can't be found in hard_reboot" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Changed in nova (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Chuck Short (zulcss) wrote :

James, which version is this with?

James Page (james-page)
Changed in nova (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for nova (Ubuntu) because there has been no activity for 60 days.]

Changed in nova (Ubuntu):
status: Incomplete → Expired