libvirt: Attempting to resume VMs with a corrupt save file is irrecoverable

Bug #1094398 reported by Rafi Khardalian
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: none listed

Bug Description

If a VM is suspended and its save file becomes corrupt, whether because it was never saved properly or for some other reason, Nova cannot recover. The user is still limited to "resume" or "terminate", with no option to simply reboot without trying to load the save file. The VM stays in this state until an admin manually intervenes. We can do better, as the condition is completely recoverable.

The fix is to add proper exception handling to the driver.resume() call such that it catches this case, so that it can clear the saved file (equivalent of 'virsh managedsave-remove') and simply start the VM normally. The user should probably be notified when this occurs, to the effect of the resume having failed but the VM being rebooted anyway.
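A minimal sketch of the fallback described above, not the actual Nova patch. It assumes a libvirt-python style domain object exposing create(), hasManagedSaveImage() and managedSaveRemove(); the function name resume_or_cold_boot and the restore_errors parameter are hypothetical. With libvirt-python, the caller would presumably pass restore_errors=(libvirt.libvirtError,).

```python
import logging

LOG = logging.getLogger(__name__)


def resume_or_cold_boot(domain, restore_errors=(Exception,)):
    """Start `domain` from its managed save image, or boot it cleanly.

    Hypothetical helper: `domain` is assumed to behave like a
    libvirt-python virDomain. Returns "resumed" if the normal start
    worked, or "rebooted" if the save image had to be discarded.
    """
    try:
        # With a managed save image present, libvirt restores from it
        # when the domain is started.
        domain.create()
        return "resumed"
    except restore_errors as exc:
        # Only fall back when a save image exists; any other start
        # failure is not the case this bug is about.
        if not domain.hasManagedSaveImage(0):
            raise
        LOG.warning("Resume failed (%s); discarding saved state and "
                    "booting the instance normally", exc)
        # Equivalent of 'virsh managedsave-remove'.
        domain.managedSaveRemove(0)
        domain.create()
        return "rebooted"
```

The return value gives the caller something to act on when notifying the user that the resume failed but the VM was rebooted anyway.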

Changed in nova:
assignee: nobody → Rafi Khardalian (rkhardalian)
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
Dan Genin (daniel-genin) wrote :

No activity for 575 days, so changing state from In Progress to New.

Changed in nova:
status: In Progress → New
Dan Genin (daniel-genin) wrote :

Could not verify the bug.

-----------
Setup:
-----------
* DevStack with Nova e0fbb747059bc296a94382739f3b3eddfc2baa9e
* libvirt virtualization driver

--------------------
Steps taken:
--------------------
1. Created an instance
2. Created and launched a "ticking" script (to verify that the state is not restored)
3. Suspended the instance with nova suspend
4. Removed the libvirt domain save file in /var/lib/libvirt/qemu/save
5. Resumed the instance with nova resume
6. Verified that the "ticking" script is no longer running

Also tried overwriting the saved state file with garbage instead of removing it.
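
Overwriting the saved state file with garbage can be sketched roughly as follows. This is an illustration only, not the exact command used in the test; the helper name overwrite_with_garbage is hypothetical, and the save file's name under /var/lib/libvirt/qemu/save is left to the caller.

```python
import os


def overwrite_with_garbage(path):
    """Replace the contents of `path` with random bytes of the same length.

    Hypothetical helper for reproducing a corrupt save file. Returns the
    number of bytes written.
    """
    size = os.path.getsize(path)
    # Open for in-place update so the file keeps its size and ownership.
    with open(path, "r+b") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())
    return size
```

Writing under /var/lib/libvirt/qemu/save normally requires root.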
-----------
Result:
-----------

The instance resumes normally whether the saved state file is missing or corrupted.

Changed in nova:
status: New → Incomplete
Sean Dague (sdague) wrote :

Assuming this is fixed now, as it can't be reproduced. Please reopen if that's not true.

Changed in nova:
status: Incomplete → Invalid
assignee: Rafi Khardalian (rkhardalian) → nobody