Bug #955510 “failed attach leaves stale iSCSI session on compute...” : Bugs : nova package : Ubuntu

Revision history for this message

Paul Collins (pjdc) wrote on 2012-03-14:

#1

failed-attach-stale-session.log Edit (12.5 KiB, text/plain)

Revision history for this message

Adam Gandelman (gandelman-a) wrote on 2012-03-15:

#2

I believe this has since been fixed: https://review.openstack.org/#change,4611

This may be a duplicate of bug #914974.

Paul, can you test on a more recent version of nova if you have access to such a deployment? Using the latest trunk I was unable to reproduce.

Revision history for this message

Paul Collins (pjdc) wrote on 2012-03-15:

#3

I can no longer reproduce the problem with 2012.1~rc1~20120309.13261-0ubuntu1, so I reckon this is indeed fixed.

Revision history for this message

James Page (james-page) wrote on 2012-03-15:

#4

Marking 'Fix Released'.

Changed in nova (Ubuntu):
status:	New → Fix Released

Revision history for this message

Tom Hite (tdhite) wrote on 2012-05-14:

#5

Download full text (8.2 KiB)

Hi,

I am convinced this bug is partially fixes, though not completely. I can recreate the issue very simply:

1) As a first point, let's use KVM/libvirt/Essex release on Ubuntu 12.04 (up to date as of 5/11/2012);

2) Create a volume (pick a size);

3) Attach that volume to a VM, supplying, /dev/vdz as your attach;

On my hardware (Dell) that creates precisely the bug given above. Logs are below. Those are the last nova-computes that will occur until it's restarted.

After this log, virsh *will* respond to virsh list--all, but nova-compute is no longer responsive. If I thereafter stop nova-compute, and restart, it will never start because at that point libvirt is hung (even virsh cannot connect, though it could prior to the nova-compute restart).

As for attaching under normal circumstances (pick /dev/vdc for your first attach), and all goes well. However, I do not (re)raise the 'stupid device' thing, for instance /dev/vdz, just to raise this bug. I'm working with another team using HP Proliant DL380G6 with BIOS v. P62, and it exhibits precisely the error in *all* cases of /dev/vdc, but not using higher device names (e.g., /dev/vdx) -- in that case, the VM has only /dev/vda at that point.

In any event, this is a pretty bad situation.

2012-05-14 15:13:16 DEBUG nova.rpc.amqp [-] received {u'_context_roles': [u'Member', u'admin', u'developer'], u'_context_request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', u'_context_read_deleted': u'no', u'args': {u'instance_uuid': u'6c87ccb4-3bd9-4e5e-88e9-905af0c919f3', u'mountpoint': u'/dev/vdz', u'volume_id': u'7'}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'be344db2784445da9415d19c2bb31ac1', u'_context_timestamp': u'2012-05-14T20:13:15.966217', u'_context_user_id': u'c081868089a34ca2a4a64f1af779b0c8', u'method': u'attach_volume', u'_context_remote_address': u'127.0.0.1'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 DEBUG nova.rpc.amqp [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] unpacked context: {'user_id': u'c081868089a34ca2a4a64f1af779b0c8', 'roles': [u'Member', u'admin', u'developer'], 'timestamp': '2012-05-14T20:13:15.966217', 'auth_token': '<SANITIZED>', 'remote_address': u'127.0.0.1', 'is_admin': True, 'request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', 'project_id': u'be344db2784445da9415d19c2bb31ac1', 'read_deleted': u'no'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: decorating: |<function attach_volume at 0x1dcce60>|
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x7fbf0832ab90>| |<nova.rpc.amqp.RpcContext object at 0x3dd83d0>| |6c87ccb4-3bd9-4e5e-88e9-905af0c919f3|
2012-05-14 15:13:16 DEBUG nova.compute.manager [req...

Hi,

I am convinced this bug is partially fixes, though not completely. I can recreate the issue very simply:

1) As a first point, let's use KVM/libvirt/Essex release on Ubuntu 12.04 (up to date as of 5/11/2012);

2) Create a volume (pick a size);

3) Attach that volume to a VM, supplying, /dev/vdz as your attach;

On my hardware (Dell) that creates precisely the bug given above. Logs are below. Those are the last nova-computes that will occur until it's restarted.

After this log, virsh *will* respond to virsh list--all, but nova-compute is no longer responsive. If I thereafter stop nova-compute, and restart, it will never start because at that point libvirt is hung (even virsh cannot connect, though it could prior to the nova-compute restart).

As for attaching under normal circumstances (pick /dev/vdc for your first attach), and all goes well. However, I do not (re)raise the 'stupid device' thing, for instance /dev/vdz, just to raise this bug. I'm working with another team using HP Proliant DL380G6 with BIOS v. P62, and it exhibits precisely the error in *all* cases of /dev/vdc, but not using higher device names (e.g., /dev/vdx) -- in that case, the VM has only /dev/vda at that point.

In any event, this is a pretty bad situation.

2012-05-14 15:13:16 DEBUG nova.rpc.amqp [-] received {u'_context_roles': [u'Member', u'admin', u'developer'], u'_context_request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', u'_context_read_deleted': u'no', u'args': {u'instance_uuid': u'6c87ccb4-3bd9-4e5e-88e9-905af0c919f3', u'mountpoint': u'/dev/vdz', u'volume_id': u'7'}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'be344db2784445da9415d19c2bb31ac1', u'_context_timestamp': u'2012-05-14T20:13:15.966217', u'_context_user_id': u'c081868089a34ca2a4a64f1af779b0c8', u'method': u'attach_volume', u'_context_remote_address': u'127.0.0.1'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 DEBUG nova.rpc.amqp [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] unpacked context: {'user_id': u'c081868089a34ca2a4a64f1af779b0c8', 'roles': [u'Member', u'admin', u'developer'], 'timestamp': '2012-05-14T20:13:15.966217', 'auth_token': '<SANITIZED>', 'remote_address': u'127.0.0.1', 'is_admin': True, 'request_id': u'req-ac084e00-207c-43d1-a856-f5c15e2fffc3', 'project_id': u'be344db2784445da9415d19c2bb31ac1', 'read_deleted': u'no'} from (pid=22262) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: decorating: |<function attach_volume at 0x1dcce60>|
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: arguments: |<nova.compute.manager.ComputeManager object at 0x7fbf0832ab90>| |<nova.rpc.amqp.RpcContext object at 0x3dd83d0>| |6c87ccb4-3bd9-4e5e-88e9-905af0c919f3|
2012-05-14 15:13:16 DEBUG nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] instance 6c87ccb4-3bd9-4e5e-88e9-905af0c919f3: getting locked state from (pid=22262) get_lock /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1597
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: locked: |False|
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: admin: |True|
2012-05-14 15:13:16 INFO nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] check_instance_lock: executing: |<function attach_volume at 0x1dcce60>|
2012-05-14 15:13:16 AUDIT nova.compute.manager [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] [instance: 6c87ccb4-3bd9-4e5e-88e9-905af0c919f3] Attaching volume 7 to /dev/vdz
2012-05-14 15:13:16 DEBUG nova.rpc.amqp [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Making asynchronous call on volume.essexfe.momentumsoftware.com ... from (pid=22262) multicall /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:321
2012-05-14 15:13:16 DEBUG nova.rpc.amqp [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] MSG_ID is 10794236679d45bcba5490301706d441 from (pid=22262) multicall /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:324
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Attempting to grab semaphore "connect_volume" for method "connect_volume"... from (pid=22262) inner /usr/lib/python2.7/dist-packages/nova/utils.py:927
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Got semaphore "connect_volume" for method "connect_volume"... from (pid=22262) inner /usr/lib/python2.7/dist-packages/nova/utils.py:931
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Running cmd (subprocess): sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 172.31.254.13:3260 from (pid=22262) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Result was 255 from (pid=22262) execute /usr/lib/python2.7/dist-packages/nova/utils.py:235
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Running cmd (subprocess): sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 172.31.254.13:3260 --op new from (pid=22262) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219
2012-05-14 15:13:17 DEBUG nova.virt.libvirt.volume [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] iscsiadm ('--op', 'new'): stdout=New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 172.31.254.13,3260,-1 iqn.2010-10.org.openstack:volume-00000007] added
 stderr= from (pid=22262) _run_iscsiadm /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py:112
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Running cmd (subprocess): sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 172.31.254.13:3260 --login from (pid=22262) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219
2012-05-14 15:13:17 DEBUG nova.virt.libvirt.volume [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] iscsiadm ('--login',): stdout=Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-00000007, portal: 172.31.254.13,3260]
Login to [iface: default, target: iqn.2010-10.org.openstack:volume-00000007, portal: 172.31.254.13,3260]: successful
 stderr= from (pid=22262) _run_iscsiadm /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py:112
2012-05-14 15:13:17 DEBUG nova.utils [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] Running cmd (subprocess): sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000007 -p 172.31.254.13:3260 --op update -n node.startup -v automatic from (pid=22262) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219
2012-05-14 15:13:17 DEBUG nova.virt.libvirt.volume [req-ac084e00-207c-43d1-a856-f5c15e2fffc3 c081868089a34ca2a4a64f1af779b0c8 be344db2784445da9415d19c2bb31ac1] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'automatic'): stdout= stderr= from (pid=22262) _run_iscsiadm /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume.py:112

Ubuntu
nova package

failed attach leaves stale iSCSI session on compute host

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntunova package

failed attach leaves stale iSCSI session on compute host

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
nova package