failed attach leaves stale iSCSI session on compute host

Bug #955510 reported by Paul Collins
This bug affects 1 person
Affects: nova (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Version: 2012.1~e4~20120217.12709-0ubuntu1

I attempted to attach an iSCSI volume to one of my instances. This
failed because I specified /dev/vdb as the device, which was in use.
Any further attempts to attach the volume then also failed. When I
inspected nova-compute.log, I discovered the following at the end
(full log attached):

(nova.rpc.common): TRACE: Stdout: 'Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-00000009, portal: YY.YY.YY.15,3260]\n'
(nova.rpc.common): TRACE: Stderr: 'iscsiadm: Could not login to [iface: default, target: iqn.2010-10.org.openstack:volume-00000009, portal: YY.YY.YY.15,3260]: \niscsiadm: initiator reported error (15 - already exists)\n'

I guessed from this that the previous failed attach had left the iSCSI
session up and that nova-compute wasn't able to deal with this. I logged
into the compute node, removed it with "iscsiadm --mode node
--targetname iqn.2010-10.org.openstack:volume-00000009 --portal
YY.YY.YY.15:3260 --logout" and was then able to attach the volume
to my instance.
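
For reference, the cleanup can be scripted. Below is just a sketch of what I did by hand (it is not anything in nova itself), driving iscsiadm from Python; the target and portal values are the example ones from my log above.

#!/usr/bin/env python3
"""Log out a stale iSCSI session left behind by a failed attach.

Sketch of the manual recovery described above; run it on the compute host.
The target IQN and portal below are the example values from this bug.
"""
import subprocess

TARGET = "iqn.2010-10.org.openstack:volume-00000009"
PORTAL = "YY.YY.YY.15:3260"

def session_exists(target):
    # "iscsiadm -m session" lists active sessions; it exits non-zero
    # when there are none at all.
    proc = subprocess.run(["iscsiadm", "-m", "session"],
                          capture_output=True, text=True)
    return proc.returncode == 0 and target in proc.stdout

if session_exists(TARGET):
    # The same command I ran by hand: log the stale session out so the
    # next attach attempt can log in cleanly.
    subprocess.check_call(["iscsiadm", "--mode", "node",
                           "--targetname", TARGET,
                           "--portal", PORTAL,
                           "--logout"])
    print("Logged out stale session for %s" % TARGET)
else:
    print("No session found for %s; nothing to do." % TARGET)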

Tags: canonistack
Revision history for this message
Adam Gandelman (gandelman-a) wrote :

I believe this has since been fixed: https://review.openstack.org/#change,4611

This may be a duplicate of bug #914974.

Paul, can you test on a more recent version of nova if you have access to such a deployment? Using the latest trunk I was unable to reproduce.

Revision history for this message
Paul Collins (pjdc) wrote :

I can no longer reproduce the problem with 2012.1~rc1~20120309.13261-0ubuntu1, so I reckon this is indeed fixed.

Revision history for this message
James Page (james-page) wrote :

Marking 'Fix Released'.

Changed in nova (Ubuntu):
status: New → Fix Released
Revision history for this message
Tom Hite (tdhite) wrote :

Hi,

I am convinced this bug is partially fixed, though not completely. I can recreate the issue very simply:

1) Start with KVM/libvirt and the Essex release on Ubuntu 12.04 (up to date as of 5/11/2012);

2) Create a volume (pick a size);

3) Attach that volume to a VM, supplying /dev/vdz as the device;

On my hardware (Dell) this reproduces precisely the bug described above. Logs are below; those are the last nova-compute log entries that will appear until the service is restarted.

After this log, virsh *will* still respond to "virsh list --all", but nova-compute is no longer responsive. If I then stop and restart nova-compute, it never comes back up, because at that point libvirt is hung (even virsh cannot connect, though it could before the nova-compute restart).

Attaching under normal circumstances (pick /dev/vdc for your first attach) works fine. However, I am not raising the 'stupid device name' case (e.g. /dev/vdz) just to make a point about this bug. I'm working with another team using an HP ProLiant DL380 G6 with BIOS v. P62, and it exhibits precisely this error in *all* cases with /dev/vdc, but not with higher device names (e.g., /dev/vdx); at that point the VM has only /dev/vda.

In any event, this is a pretty bad situation.
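
To make concrete what I think is still missing, here is a rough sketch (my own illustration, not nova's actual driver code) of the kind of defensive cleanup the attach path could do: if plugging the device into the guest fails, log the iSCSI session back out so the next attempt does not hit "already exists".

import subprocess

def iscsi_login(target, portal):
    subprocess.check_call(["iscsiadm", "--mode", "node",
                           "--targetname", target,
                           "--portal", portal, "--login"])

def iscsi_logout(target, portal):
    # Best-effort logout; ignore failures so we never mask the original error.
    subprocess.call(["iscsiadm", "--mode", "node",
                     "--targetname", target,
                     "--portal", portal, "--logout"])

def attach_volume(attach_to_guest, target, portal, mountpoint):
    """attach_to_guest is whatever actually plugs the disk into the VM
    (libvirt's attachDevice in nova's case) and may raise on bad input,
    e.g. a device name like /dev/vdz."""
    iscsi_login(target, portal)
    try:
        attach_to_guest(mountpoint)
    except Exception:
        # Undo the login before re-raising, so the host is left clean and a
        # corrected retry can log in again instead of failing with
        # "iscsiadm: ... (15 - already exists)".
        iscsi_logout(target, portal)
        raise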

2012-05-14 15:13:16 DEBUG nova.rpc.amqp [-] received {u'_context_roles': [u'Member', u'admin', u'developer'], u'_context_request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', u'_context_read_deleted': u'no', u'args': {u'instance_uuid': u'6c87ccb4-3bd9-4e5e-88e9-905af0c919f3', u'mountpoint': u'/dev/vdz', u'volume_id': u'7'}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'be344db2784445da9415d19c2bb31ac1', u'_context_timestamp': u'2012-05-14T20:13:15.966217', u'_context_user_id': u'c081868089a34ca2a4a64f1af779b0c8', u'method': u'attach_volume', u'_context_remote_address': u'127.0.0.1'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 DEBUG nova.rpc.amqp [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] unpacked context: {'user_id': u'c081868089a34ca2a4a64f1af779b0c8', 'roles': [u'Member', u'admin', u'developer'], 'timestamp': '2012-05-14T20:13:15.966217', 'auth_token': '<SANITIZED>', 'remote_address': u'127.0.0.1', 'is_admin': True, 'request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', 'project_id': u'be344db2784445da9415d19c2bb31ac1', 'read_deleted': u'no'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: decorating: |<function attach_volume at 0x1dcce60>|
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x7fbf0832ab90>| |<nova.rpc.amqp.RpcContext object at 0x3dd83d0>| |6c87ccb4-3bd9-4e5e-88e9-905af0c919f3|
2012-05-14 15:13:16 DEBUG nova.compute.manager [req...
