cannot detach volume from terminated instance

Bug #954692 reported by Paul Collins
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Adam Gandelman  2012.1
nova (Ubuntu)             Fix Released  Medium      Unassigned

Bug Description

Version: 2012.1~e4~20120217.12709-0ubuntu1

I attached a volume to an instance via iSCSI, shut down the instance, and then attempted to detach the volume. The result is the following in nova-compute.log, and the volume remains "in-use". I also tried "euca-detach-volume --force" with the same result.

VOLUME vol-00000009 10 nova in-use (pjdc_project, zucchini, i-00001a05[ankaa], /dev/vdc) 2012-03-13T20:49:28Z

2012-03-14 03:04:08,486 DEBUG nova.rpc.common [-] received {u'_context_roles': [u'cloudadmin', u'netadmin', u'projectmanager', u'admin'], u'_context_request_id': u'req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3', u'_context_read_deleted': u'no', u'args': {u'instance_uuid': u'f7620968-686d-4a3d-a1b3-2d0881e1656d', u'volume_id': 9}, u'_context_auth_token': None, u'_context_strategy': u'noauth', u'_context_is_admin': True, u'_context_project_id': u'pjdc_project', u'_context_timestamp': u'2012-03-14T03:03:59.303517', u'_context_user_id': u'pjdc', u'method': u'detach_volume', u'_context_remote_address': u'XXX.XXX.XXX.XXX'} from (pid=8590) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:144
2012-03-14 03:04:08,487 DEBUG nova.rpc.common [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] unpacked context: {'request_id': u'req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3', 'user_id': u'pjdc', 'roles': [u'cloudadmin', u'netadmin', u'projectmanager', u'admin'], 'timestamp': '2012-03-14T03:03:59.303517', 'is_admin': True, 'auth_token': None, 'project_id': u'pjdc_project', 'remote_address': u'XXX.XXX.XXX.XXX', 'read_deleted': u'no', 'strategy': u'noauth'} from (pid=8590) unpack_context /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:186
2012-03-14 03:04:08,515 INFO nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] check_instance_lock: decorating: |<function detach_volume at 0x1f591b8>|
2012-03-14 03:04:08,516 INFO nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x1c6b1d0>| |<nova.rpc.amqp.RpcContext object at 0x4987a50>| |f7620968-686d-4a3d-a1b3-2d0881e1656d|
2012-03-14 03:04:08,516 DEBUG nova.compute.manager [req-da8a3819-70b2-4d0b-a9fb-0dfefa85f9f3 pjdc pjdc_project] instance f7620968-686d-4a3d-a1b3-2d0881e1656d: getting locked state from (pid=8590) get_lock /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1508
2012-03-14 03:04:08,668 ERROR nova.rpc.common [-] Exception during message handling
(nova.rpc.common): TRACE: Traceback (most recent call last):
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 250, in _process_data
(nova.rpc.common): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 112, in wrapped
(nova.rpc.common): TRACE: return f(*args, **kw)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 139, in decorated_function
(nova.rpc.common): TRACE: locked = self.get_lock(context, instance_uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 112, in wrapped
(nova.rpc.common): TRACE: return f(*args, **kw)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 168, in decorated_function
(nova.rpc.common): TRACE: return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1509, in get_lock
(nova.rpc.common): TRACE: instance_ref = self.db.instance_get_by_uuid(context, instance_uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 586, in instance_get_by_uuid
(nova.rpc.common): TRACE: return IMPL.instance_get_by_uuid(context, uuid)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 119, in wrapper
(nova.rpc.common): TRACE: return f(*args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1452, in instance_get_by_uuid
(nova.rpc.common): TRACE: raise exception.InstanceNotFound(instance_id=uuid)
(nova.rpc.common): TRACE: InstanceNotFound: Instance f7620968-686d-4a3d-a1b3-2d0881e1656d could not be found.
(nova.rpc.common): TRACE:

Changed in nova (Ubuntu):
importance: Undecided → Medium
Changed in nova (Ubuntu):
status: New → Confirmed
Adam Gandelman (gandelman-a) wrote :

Steps to reproduce:
1. create volume
2. boot an instance (tested with cirros)
3. attach volume
4. in instance: 'sudo poweroff'
5. after the KVM machine has stopped on the compute node, terminate the instance; this results in a traceback in nova-compute.log:

  (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 719, in _delete_instance
  (nova.rpc.amqp): TRACE: self._shutdown_instance(context, instance, 'Terminating')
  (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 681, in _shutdown_instance
  (nova.rpc.amqp): TRACE: raise exception.Invalid(_msg % instance_uuid)
  (nova.rpc.amqp): TRACE: Invalid: trying to destroy already destroyed instance: b929bf81-65ee-46e0-8c07-5aae49a0213c

6. the instance is destroyed, but the volume is still 'in-use'
7. detach the volume; this results in the traceback from the original bug report
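
For convenience, a rough scripted version of the steps above (a sketch only: it assumes euca2ools is set up with working EC2 credentials, and the AMI, instance and volume IDs are placeholders to substitute from your own deployment):

    # Sketch only: drives the manual steps above with euca2ools via subprocess.
    # Assumes EC2_* credentials are exported; the IDs below are placeholders.
    import subprocess

    def run(cmd):
        print('+ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    run(['euca-create-volume', '-s', '1', '-z', 'nova'])          # step 1
    run(['euca-run-instances', 'ami-00000001', '-t', 'm1.tiny'])  # step 2 (cirros image)
    run(['euca-attach-volume', '-i', 'i-00001a05',
         '-d', '/dev/vdc', 'vol-00000009'])                       # step 3
    # step 4: inside the guest, run 'sudo poweroff' and wait for the
    # kvm process to exit on the compute node
    run(['euca-terminate-instances', 'i-00001a05'])               # step 5: raises the
                                                                  # Invalid traceback above
    run(['euca-detach-volume', 'vol-00000009'])                   # step 7: fails, volume
                                                                  # stays 'in-use'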

Changed in nova:
assignee: nobody → Adam Gandelman (gandelman-a)
status: New → In Progress
Thierry Carrez (ttx) wrote :

FWIW, the following recent fix might have changed the issue:

commit 22b484a6d0b65c2a41fd4c730a5ebddf98c70c84
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Mar 14 09:26:40 2012 -0700

    Allow errored volumes to be deleted

     * Allow volumes with no host set to be deleted from db
     * Allow volumes in state error to be deleted
     * Replicates code from nova-manage command
     * Fixes bug 953594

Vish Ishaya (vishvananda) wrote :

We discussed this on IRC; it is a separate bug.

Changed in nova:
importance: Undecided → Medium
tags: added: essex-rc-potential
Adam Gandelman (gandelman-a) wrote :

I'm bogged down with other things ATM, but I spent some time looking at this last night.

nova.compute.manager._shutdown_instance() raises an exception if the instance is already in the SHUTOFF power state. It looks like this conditional has existed forever:

        if current_power_state == power_state.SHUTOFF:
            self.db.instance_destroy(context, instance_id)
            _msg = _('trying to destroy already destroyed instance: %s')
            raise exception.Invalid(_msg % instance_uuid)

It currently does nothing to clean up the BDMs or to inform nova-volume that the volume is free. We could certainly do that from the compute manager when the condition is met, so the volumes are freed up to be used elsewhere; the problem there is that the iSCSI sessions are never cleaned up on the compute host. Reattaching the volume to another instance on the same compute node works okay since https://review.openstack.org/#change,4611, but having dangling iSCSI sessions hanging around seems dirty.
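
To illustrate that first option, here is a rough sketch of what the cleanup might look like inside the SHUTOFF branch (the BDM and volume API calls below are approximations of the compute manager internals, not necessarily the exact names):

        if current_power_state == power_state.SHUTOFF:
            # Sketch only: free the volumes before bailing out, so nova-volume
            # marks them 'available' again. Helper names are approximate.
            for bdm in self._get_instance_volume_bdms(context, instance_id):
                volume = self.volume_api.get(context, bdm['volume_id'])
                self.volume_api.detach(context, volume)
                self.db.block_device_mapping_destroy(context, bdm['id'])
            # Note: nothing here tears down the iSCSI session on the compute
            # host, which is the "dangling sessions" problem mentioned above.
            self.db.instance_destroy(context, instance_id)
            _msg = _('trying to destroy already destroyed instance: %s')
            raise exception.Invalid(_msg % instance_uuid)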

Looking at the libvirt compute driver, it appears that _shutdown_instance()'s later call to driver.destroy() handles terminating an already-SHUTOFF instance just fine, and also properly cleans up its iSCSI connections, among other things. It would appear that, in the case of libvirt, the check raised above is obsolete. But I'm unsure whether this is true for other compute drivers, and I'm hesitant to propose dropping it without confirmation.

Changed in nova:
milestone: none → essex-rc1
Chris Behrens (cbehrens) wrote :

I think removing that block of code makes sense.

However, a few lines below, I would add a try/except around the call to self.driver.destroy(). It looks like it could raise if the instance was never created at the driver level, which could be the case for an instance in the ERROR state.
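
Something roughly like this is what I have in mind (a sketch only; the exact destroy() argument list and the exception types worth catching would need checking against the driver interface):

        try:
            self.driver.destroy(instance, network_info, block_device_info)
        except Exception:
            # The instance may never have been created at the driver level
            # (e.g. it went straight to ERROR), so log and keep going with
            # the rest of the cleanup rather than aborting.
            LOG.exception(_('driver.destroy() failed during cleanup; '
                            'continuing with instance teardown'))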

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/5437

Adam Gandelman (gandelman-a) wrote :

Chris-

I'm a little uncomfortable wrapping self.driver.destroy() in a try/except block until there is a common exception language virt drivers can use to raise specific, fatal exceptions. I'm thinking of a scenario where libvirt/xs cannot shut down the instance and raises, but compute continues on with the volume disassociation. The stuck instance may still be powered on and scribbling all over its disk while the volume has been attached to another instance.

Chris Behrens (cbehrens) wrote :

Makes sense. Thanks.

Changed in nova:
assignee: Adam Gandelman (gandelman-a) → Vish Ishaya (vishvananda)
Changed in nova:
assignee: Vish Ishaya (vishvananda) → Adam Gandelman (gandelman-a)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/5437
Committed: http://github.com/openstack/nova/commit/6657f70ee3e792b39e45a2a96fb5d4b380f0ae91
Submitter: Jenkins
Branch: master

commit 6657f70ee3e792b39e45a2a96fb5d4b380f0ae91
Author: Adam Gandelman <email address hidden>
Date: Thu Mar 15 18:08:35 2012 -0700

    Allow proper instance cleanup if state == SHUTOFF

    Removes an obsolete check for instance's power_state
    on shutdown_instance(). With it in place, volume detachment
    and disassociation never takes place. Compute should instead rely
    on virt drivers to handle this case and raise accordingly. libvirt's
    destroy() currently handles powered off instances fine, and properly
    detaches any existing volume connections.

    Fixes bug 954692

    Change-Id: I200d5b2073e5b52a9733d8324d016b14bdc96067

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: Confirmed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1