Rebooting not allowed for instances stuck in rebooting_hard state

Bug #982108 reported by Peng Yong
This bug affects 7 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Rafi Khardalian

Bug Description

I rebooted an instance on one compute node, and the reboot failed.

The task_state is now stuck at rebooting_hard, and I can't recover the instance.

I can't reboot it from the dashboard or nova-client:

# nova reboot tt
ERROR: Cannot 'reboot' while instance is in task_state rebooting (HTTP 409)

Finally, I updated the database directly, and the instance works again:

mysql> update instances set task_state=NULL where uuid='2aef3845-55d3-40b3-b325-879a5d104496';
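A slightly safer form of this manual workaround can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for nova's MySQL database, the two-column table is a simplification, and `clear_stuck_reboot` is a hypothetical helper, not part of nova. The extra condition on task_state keeps the update from clobbering an instance that has since moved to another state.

```python
import sqlite3

# In-memory SQLite stands in for nova's MySQL database (assumption);
# the table here has only the two columns used in the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, task_state TEXT)")
conn.execute("INSERT INTO instances VALUES (?, ?)",
             ("2aef3845-55d3-40b3-b325-879a5d104496", "rebooting_hard"))


def clear_stuck_reboot(conn, uuid):
    """Clear task_state only if the instance is still stuck in rebooting_hard.

    Returns the number of rows changed (0 or 1).
    """
    cur = conn.execute(
        "UPDATE instances SET task_state = NULL "
        "WHERE uuid = ? AND task_state = 'rebooting_hard'",
        (uuid,))
    conn.commit()
    return cur.rowcount
```

Checking the returned row count tells the operator whether the instance was actually in the stuck state when the update ran.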

Peng Yong (ppyy) wrote :

I checked commit 26227b79e9246a87eeb83766cfcc8e96d294d28b; it removes many uses of scheduler_api.

 git log -p 26227b79

- @exception.novaclient_converter
- @scheduler_api.redirect_handler
     def _action_reboot(self, req, id, body):
         if 'reboot' in body and 'type' in body['reboot']:
             valid_reboot_types = ['HARD', 'SOFT']

And I can't find any scheduler API in OpenStack any more.

Dan Prince (dan-prince)
Changed in nova:
importance: Undecided → Low
status: New → Triaged
Mark McLoughlin (markmc)
summary: - rebooting server failed when it's in rebooting_hard state
+ Rebooting not allowed for instances stuck in rebooting_hard state
Changed in nova:
importance: Low → High
Mark McLoughlin (markmc) wrote :

Saw this recently in a real deployment: a nova-compute service locked up for 12 hours (looks like that was a kernel bug), and during that time a user tried to hard reboot their instance.

The reboot cast message was lost, so the instance stayed in task_state=REBOOTING_HARD. After the compute node came back, the user wasn't able to reboot the instance because:

    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
                                    vm_states.RESCUED],
                          task_state=[None, task_states.REBOOTING])
    def reboot(self, context, instance, reboot_type):

i.e. reboot is not allowed while in task_state=REBOOTING_HARD
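To illustrate why a second reboot request is rejected, here is a minimal, self-contained sketch of a state-checking decorator. It is not nova's actual `check_instance_state` implementation; the `Instance` class, the `InstanceInvalidState` exception, and the string constants are hypothetical stand-ins. Only the allowed-state lists are taken from the snippet above.

```python
import functools

# Hypothetical stand-ins for nova's vm_states / task_states constants.
ACTIVE, STOPPED, RESCUED = "active", "stopped", "rescued"
REBOOTING, REBOOTING_HARD = "rebooting", "rebooting_hard"


class InstanceInvalidState(Exception):
    """Raised when an action is not permitted in the current state."""


class Instance:
    def __init__(self, vm_state, task_state=None):
        self.vm_state = vm_state
        self.task_state = task_state


def check_instance_state(vm_state=None, task_state=None):
    """Reject the call unless the instance is in one of the allowed states."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance, *args, **kwargs):
            if vm_state is not None and instance.vm_state not in vm_state:
                raise InstanceInvalidState(
                    "vm_state %s not allowed" % instance.vm_state)
            if task_state is not None and instance.task_state not in task_state:
                raise InstanceInvalidState(
                    "task_state %s not allowed" % instance.task_state)
            return func(self, context, instance, *args, **kwargs)
        return wrapper
    return decorator


class API:
    @check_instance_state(vm_state=[ACTIVE, STOPPED, RESCUED],
                          task_state=[None, REBOOTING])
    def reboot(self, context, instance, reboot_type):
        # A real API would cast a message to the compute node here.
        instance.task_state = (REBOOTING_HARD if reboot_type == "HARD"
                               else REBOOTING)
        return instance.task_state
```

In this sketch, a first hard reboot sets task_state to rebooting_hard. If the cast is then lost and nothing resets the state, every later call to `reboot` raises, because rebooting_hard is not in the allowed task_state list.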

Looking at https://review.openstack.org/5090 and https://review.openstack.org/12368, I'm reading into Vish's comments that there can be problems if you attempt a hard reboot while a hard reboot is already in progress.

ISTM that if that's a concern, the compute manager should just take a lock on the instance while it's rebooting.
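A rough sketch of the per-instance lock idea, under stated assumptions: this uses the standard `threading` module rather than whatever synchronization nova's compute manager actually uses, and `reboot_instance` and the lock registry are hypothetical names. It also chooses to skip a concurrent reboot rather than wait for the running one, which is only one possible reading of the suggestion.

```python
import threading
from collections import defaultdict

# Hypothetical registry of one lock per instance UUID; not nova code.
_registry_guard = threading.Lock()
_instance_locks = defaultdict(threading.Lock)


def _lock_for(instance_uuid):
    # Guard the registry itself so two threads never create separate locks
    # for the same instance.
    with _registry_guard:
        return _instance_locks[instance_uuid]


def reboot_instance(instance_uuid, do_reboot):
    """Run do_reboot() unless a reboot of this instance is already running.

    Returns True if the reboot ran, False if it was skipped.
    """
    lock = _lock_for(instance_uuid)
    if not lock.acquire(blocking=False):
        return False  # another reboot holds the lock
    try:
        do_reboot()
        return True
    finally:
        lock.release()
```

With protection of this kind on the compute side, the API layer would not need to reject reboot requests for an instance that is already in a rebooting task_state.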

Rafi Khardalian (rkhardalian) wrote :

There are a lot more cases where hard reboot resolves a user's issue. I've proposed this patch to relax the restrictions in the API:

https://review.openstack.org/20009

Changed in nova:
assignee: nobody → Rafi Khardalian (rkhardalian)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/20009
Committed: http://github.com/openstack/nova/commit/a69251804e62f871f93ffa420978f9b61a36df9c
Submitter: Jenkins
Branch: master

commit a69251804e62f871f93ffa420978f9b61a36df9c
Author: Rafi Khardalian <email address hidden>
Date: Fri Jan 18 07:01:50 2013 +0000

    Relax API restrictions around the use of reboot

    Fixes bug 1101082 bug 982108

    Reboots are the only sledge hammer users have available to resolve
    issues with their instances without administrative intervention.
    This patch modifies the API policy to allow reboot calls to be made
    in many more power and task states.

    Change-Id: Ia8702448f6b7b863da40e4d498f2e2ee0a12882e

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1