cannot detach volume from terminated instance

Bug #954692 reported by Paul Collins
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Adam Gandelman  2012.1
nova (Ubuntu)             Fix Released  Medium      Unassigned

Bug Description

Version: 2012.1~e4~20120217.12709-0ubuntu1

I attached a volume to an instance via iSCSI, shut down the instance, and then attempted to detach the volume. The result is the following in nova-compute.log, and the volume remains "in-use". I also tried "euca-detach-volume --force" with the same result.

VOLUME vol-00000009 10 nova in-use (pjdc_project, zucchini, i-00001a05[ankaa], /dev/vdc) 2012-03-13T20:49:28Z

2012-03-14 03:04:08,486 DEBUG nova.rpc.common [-] received {u'_context_roles': [u'cloudadmin', u'netadmin', u'projectmanager', u'admin'], u'_context_request_id': u'req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3', u'_context_read_deleted': u'no', u'args': {u'instance_uuid': u'f7620968-686d-4a3d-a1b3-2d0881e1656d', u'volume_id': 9}, u'_context_auth_token': None, u'_context_strategy': u'noauth', u'_context_is_admin': True, u'_context_project_id': u'pjdc_project', u'_context_timestamp': u'2012-03-14T03:03:59.303517', u'_context_user_id': u'pjdc', u'method': u'detach_volume', u'_context_remote_address': u'XXX.XXX.XXX.XXX'} from (pid=8590) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:144
2012-03-14 03:04:08,487 DEBUG nova.rpc.common [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] unpacked context: {'request_id': u'req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3', 'user_id': u'pjdc', 'roles': [u'cloudadmin', u'netadmin', u'projectmanager', u'admin'], 'timestamp': '2012-03-14T03:03:59.303517', 'is_admin': True, 'auth_token': None, 'project_id': u'pjdc_project', 'remote_address': u'XXX.XXX.XXX.XXX', 'read_deleted': u'no', 'strategy': u'noauth'} from (pid=8590) unpack_context /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:186
2012-03-14 03:04:08,515 INFO nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] check_instance_lock: decorating: |<function detach_volume at 0x1f591b8>|
2012-03-14 03:04:08,516 INFO nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x1c6b1d0>| |<nova.rpc.amqp.RpcContext object at 0x4987a50>| |f7620968-686d-4a3d-a1b3-2d0881e1656d|
2012-03-14 03:04:08,516 DEBUG nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] instance f7620968-686d-4a3d-a1b3-2d0881e1656d: getting locked state from (pid=8590) get_lock /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1508
2012-03-14 03:04:08,668 ERROR nova.rpc.common [-] Exception during message handling
(nova.rpc.common): TRACE: Traceback (most recent call last):
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 250, in _process_data
(nova.rpc.common): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 112, in wrapped
(nova.rpc.common): TRACE: return f(*args, **kw)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 139, in decorated_function
(nova.rpc.common): TRACE: locked = self.get_lock(context, instance_uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 112, in wrapped
(nova.rpc.common): TRACE: return f(*args, **kw)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 168, in decorated_function
(nova.rpc.common): TRACE: return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1509, in get_lock
(nova.rpc.common): TRACE: instance_ref = self.db.instance_get_by_uuid(context, instance_uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 586, in instance_get_by_uuid
(nova.rpc.common): TRACE: return IMPL.instance_get_by_uuid(context, uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 119, in wrapper
(nova.rpc.common): TRACE: return f(*args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1452, in instance_get_by_uuid
(nova.rpc.common): TRACE: raise exception.InstanceNotFound(instance_id=uuid)
(nova.rpc.common): TRACE: InstanceNotFound: Instance f7620968-686d-4a3d-a1b3-2d0881e1656d could not be found.
(nova.rpc.common): TRACE:

Changed in nova (Ubuntu):
importance: Undecided → Medium
Changed in nova (Ubuntu):
status: New → Confirmed
Adam Gandelman (gandelman-a) wrote :

Steps to reproduce:
1. create volume
2. boot an instance (tested with cirros)
3. attach volume
4. in instance: 'sudo poweroff'
5. after the KVM machine has stopped on the compute node, terminate the instance; this results in a traceback in nova-compute.log:

  (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 719, in _delete_instance
  (nova.rpc.amqp): TRACE: self._shutdown_instance(context, instance, 'Terminating')
  (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 681, in _shutdown_instance
  (nova.rpc.amqp): TRACE: raise exception.Invalid(_msg % instance_uuid)
  (nova.rpc.amqp): TRACE: Invalid: trying to destroy already destroyed instance: b929bf81-65ee-46e0-8c07-5aae49a0213c

6. the instance is destroyed, but the volume is still 'in-use'
7. detach the volume; this results in the traceback from the original bug report
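
For convenience, a rough scripted version of the steps above (a sketch only: it assumes euca2ools is set up with working EC2 credentials, and the AMI, instance and volume IDs are placeholders to substitute from your own deployment):

    # Sketch only: drives the manual steps above with euca2ools via subprocess.
    # Assumes EC2_* credentials are exported; the IDs below are placeholders.
    import subprocess

    def run(cmd):
        print('+ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    run(['euca-create-volume', '-s', '1', '-z', 'nova'])          # step 1
    run(['euca-run-instances', 'ami-00000001', '-t', 'm1.tiny'])  # step 2 (cirros image)
    run(['euca-attach-volume', '-i', 'i-00001a05',
         '-d', '/dev/vdc', 'vol-00000009'])                       # step 3
    # step 4: inside the guest, run 'sudo poweroff' and wait for the
    # kvm process to exit on the compute node
    run(['euca-terminate-instances', 'i-00001a05'])               # step 5: raises the
                                                                  # Invalid traceback above
    run(['euca-detach-volume', 'vol-00000009'])                   # step 7: fails, volume
                                                                  # stays 'in-use'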

Changed in nova:
assignee: nobody → Adam Gandelman (gandelman-a)
status: New → In Progress
Thierry Carrez (ttx) wrote :

FWIW, the following recent fix might have changed the issue:

commit 22b484a6d0b65c2a41fd4c730a5ebddf98c70c84
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Mar 14 09:26:40 2012 -0700

    Allow errored volumes to be deleted

     * Allow volumes with no host set to be deleted from db
     * Allow volumes in state error to be deleted
     * Replicates code from nova-manage command
     * Fixes bug 953594

Vish Ishaya (vishvananda) wrote :

We discussed this on IRC; it is a separate bug.

Changed in nova:
importance: Undecided → Medium
tags: added: essex-rc-potential
Adam Gandelman (gandelman-a) wrote :

I'm bogged down with other things ATM, but I spent some time looking at this last night.

nova.compute.manager._shutdown_instance() raises an exception if the instance is already in the SHUTOFF power state. It looks like this conditional has existed forever:

        if current_power_state == power_state.SHUTOFF:
            self.db.instance_destroy(context, instance_id)
            _msg = _('trying to destroy already destroyed instance: %s')
            raise exception.Invalid(_msg % instance_uuid)

It currently does nothing to clean up the BDMs or to inform nova-volume that the volume is free. We could certainly do that from the compute manager when the condition is met, so the volumes are freed up to be used elsewhere; the problem there is that the iSCSI sessions are never cleaned up on the compute host. Reattaching the volume to another instance on the same compute node works okay since https://review.openstack.org/#change,4611, but having dangling iSCSI sessions hanging around seems dirty.
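
To illustrate that first option, here is a rough sketch of what the cleanup might look like inside the SHUTOFF branch (the BDM and volume API calls below are approximations of the compute manager internals, not necessarily the exact names):

        if current_power_state == power_state.SHUTOFF:
            # Sketch only: free the volumes before bailing out, so nova-volume
            # marks them 'available' again. Helper names are approximate.
            for bdm in self._get_instance_volume_bdms(context, instance_id):
                volume = self.volume_api.get(context, bdm['volume_id'])
                self.volume_api.detach(context, volume)
                self.db.block_device_mapping_destroy(context, bdm['id'])
            # Note: nothing here tears down the iSCSI session on the compute
            # host, which is the "dangling sessions" problem mentioned above.
            self.db.instance_destroy(context, instance_id)
            _msg = _('trying to destroy already destroyed instance: %s')
            raise exception.Invalid(_msg % instance_uuid)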

Looking at the libvirt compute driver, it appears that _shutdown_instance()'s later call to driver.destroy() handles terminating an already-SHUTOFF instance just fine, and also properly cleans up its iSCSI connections, among other things. It would appear that, in the case of libvirt, the check raised above is obsolete. But I'm unsure whether this is true for other compute drivers, and I'm hesitant to propose dropping it without confirmation.

Changed in nova:
milestone: none → essex-rc1
Chris Behrens (cbehrens) wrote :

I think removing that block of code makes sense.

However, a few lines below, I would add a try/except around the call to self.driver.destroy(). It looks like it could raise if the instance was never created at the driver level, which could be the case for an instance in the ERROR state.
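
Something roughly like this is what I have in mind (a sketch only; the exact destroy() argument list and the exception types worth catching would need checking against the driver interface):

        try:
            self.driver.destroy(instance, network_info, block_device_info)
        except Exception:
            # The instance may never have been created at the driver level
            # (e.g. it went straight to ERROR), so log and keep going with
            # the rest of the cleanup rather than aborting.
            LOG.exception(_('driver.destroy() failed during cleanup; '
                            'continuing with instance teardown'))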

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/5437

Adam Gandelman (gandelman-a) wrote :

Chris-

I'm a little uncomfortable wrapping self.driver.destroy() in a try/except block until there is a common exception language virt drivers can use to raise specific, fatal exceptions. I'm thinking of a scenario where libvirt/xs cannot shut down the instance and raises, but compute continues on with the volume disassociation. The stuck instance may still be powered on and scribbling all over its disk while the volume has been attached to another instance.

Chris Behrens (cbehrens) wrote :

Makes sense. Thanks.

Changed in nova:
assignee: Adam Gandelman (gandelman-a) → Vish Ishaya (vishvananda)
Changed in nova:
assignee: Vish Ishaya (vishvananda) → Adam Gandelman (gandelman-a)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/5437
Committed: http://github.com/openstack/nova/commit/6657f70ee3e792b39e45a2a96fb5d4b380f0ae91
Submitter: Jenkins
Branch: master

commit 6657f70ee3e792b39e45a2a96fb5d4b380f0ae91
Author: Adam Gandelman <email address hidden>
Date: Thu Mar 15 18:08:35 2012 -0700

    Allow proper instance cleanup if state == SHUTOFF

    Removes an obsolete check for instance's power_state
    on shutdown_instance(). With it in place, volume detachment
    and disassociation never takes place. Compute should instead rely
    on virt drivers to handle this case and raise accordingly. libvirt's
    destroy() currently handles powered off instances fine, and properly
    detaches any existing volume connections.

    Fixes bug 954692

    Change-Id: I200d5b2073e5b52a9733d8324d016b14bdc96067

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: Confirmed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1