Duplicate iSCSI initiators causing live migration failures

Bug #1945983 reported by Lee Yarwood
Affects   Status        Importance  Assigned to  Milestone
devstack  Fix Released  Undecided   Unassigned

Bug Description

Description
===========

See comment #2 for the actual issue here.

Steps to reproduce
==================

LiveAutoBlockMigrationV225Test:test_live_migration_with_trunk or any live migration test with trunk ports fails during cleanup.

Expected result
===============

Both the test and cleanup pass without impacting libvirtd.

Actual result
=============

The test passes, but cleanup locks up the single thread handling the libvirtd event loop in libvirt 6.0.0.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   stable/xena and master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   libvirt (6.0.0) and QEMU

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   Trunk ports.

Logs & Configs
==============

Initially discovered and discussed as part of https://bugs.launchpad.net/nova/+bug/1912310, where the lockup within libvirtd then causes other tests to fail.

sean mooney (sean-k-mooney) wrote :

I don't see how this can be related to trunk ports.

With ml2/ovs, trunk ports are implemented using OVS bridges and patch ports in OVS.
From a libvirt point of view, the tap device is just attached to a different bridge, with VLAN tagging handled entirely in OVS.

For ml2/ovn there is no visible difference at the libvirt level between a trunk port and a non-trunk port. In both cases it's just a tap device added directly to br-int. In the trunk case, OpenFlow rules installed in br-int handle the VLAN tagging and stripping, so it's entirely transparent to libvirt/QEMU.

I suspect that this is just an unrelated failure that happens to occur in those tests rather than being caused by the use of trunk ports. I don't know of any mechanism by which the use of trunk ports could impact libvirtd in any way, so I'm setting this to Incomplete until we have a theory as to how this could be related.

Changed in nova:
status: New → Incomplete
Lee Yarwood (lyarwood)
tags: added: gate-failure
Lee Yarwood (lyarwood) wrote :

*sigh* I had this totally wrong earlier. This has nothing to do with trunk ports; it is just duplicate iSCSI initiators yet again (I feel like we've fixed this many times in the past), which cause the connection to the volume to be cut during post_live_migration when we unmap the volumes from the source.
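
A minimal sketch of why a shared initiator name breaks the migration, assuming the target grants access purely by initiator IQN. The `ToyTarget` class and `live_migrate` helper are hypothetical illustrations, not nova or cinder code:

```python
class ToyTarget:
    """Toy iSCSI target whose access list is keyed only by initiator IQN."""

    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.allowed = set()

    def add_initiator(self, iqn):
        # Adding the same IQN twice leaves a single entry.
        self.allowed.add(iqn)

    def remove_initiator(self, iqn):
        self.allowed.discard(iqn)

    def can_access(self, iqn):
        return iqn in self.allowed


def live_migrate(target, source_iqn, dest_iqn):
    """Return True if the destination can still reach the volume afterwards."""
    target.add_initiator(source_iqn)     # volume mapped when the instance is spawned
    target.add_initiator(dest_iqn)       # volume mapped on the destination before migrating
    target.remove_initiator(source_iqn)  # source unmapped during post_live_migration
    return target.can_access(dest_iqn)
```

With distinct names the destination keeps its entry after the source is unmapped. With a shared name, the source's cleanup removes the only entry and the destination loses the volume.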

First of all, the instances are clearly being used by the test_volume_backed_live_migration tests:

2021-10-01 11:10:18.262 118292 INFO tempest.lib.common.rest_client [req-43af85da-1678-4d4d-950a-69062abb6aae ] Request (LiveMigrationTest:test_volume_backed_live_migration): 202 POST https://198.72.124.126/compute/v2.1/servers/45adbb55-491d-418b-ba68-7db43d1c235b/action 1.164s

2021-10-01 11:10:24.911 118290 INFO tempest.lib.common.rest_client [req-14fb6fc1-9d0a-44f7-9fbc-d24befa6ffeb ] Request (LiveAutoBlockMigrationV225Test:test_volume_backed_live_migration): 202 POST https://198.72.124.126/compute/v2.1/servers/cfdf210a-f2bf-46a4-85f2-56e7d58893c2/action 1.221s

45adbb55-491d-418b-ba68-7db43d1c235b is being live migrated to compute1 from controller with volume ba95db3d-c2ff-40c6-ba7c-563c3070c85d attached that is itself served from c-vol on controller.

cfdf210a-f2bf-46a4-85f2-56e7d58893c2 is being live migrated to controller from compute1 with volume 867090f6-2e48-4b0b-97e3-afdbc4cc946b attached that is itself served from c-vol on compute1.

ubuntu-focal-iweb-mtl01-0026751351 == controller
ubuntu-focal-iweb-mtl01-0026751352 == compute1

We see the same initiator used to map both volumes to each compute when these instances are initially spawned:

Oct 01 11:10:11 ubuntu-focal-iweb-mtl01-0026751351 sudo[121117]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool add-initiator iqn.2010-10.org.openstack:volume-ba95db3d-c2ff-40c6-ba7c-563c3070c85d MNgH9PG7eU8gt7QLdi9n Vo78NSpxj9JAFfs4 iqn.1993-08.org.debian:01:ef12d882804f

Oct 01 11:10:18 ubuntu-focal-iweb-mtl01-0026751352 sudo[73346]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool add-initiator iqn.2010-10.org.openstack:volume-867090f6-2e48-4b0b-97e3-afdbc4cc946b h6kS7Yf39w3Xwp4qk63J i9EckeRwVHst5CKD iqn.1993-08.org.debian:01:ef12d882804f

Later, during the live migration attempts, we again see the same initiator used to map the volumes to the destination hosts:

Oct 01 11:10:21 ubuntu-focal-iweb-mtl01-0026751351 sudo[121346]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool add-initiator iqn.2010-10.org.openstack:volume-ba95db3d-c2ff-40c6-ba7c-563c3070c85d MNgH9PG7eU8gt7QLdi9n Vo78NSpxj9JAFfs4 iqn.1993-08.org.debian:01:ef12d882804f

Oct 01 11:10:28 ubuntu-focal-iweb-mtl01-0026751352 sudo[73576]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf cinder-rtstool add-initiator iqn.2010-10.org.openstack:volume-867090f6-2e48-4b0b-97e3-afdbc4cc946b h6kS7Yf39w3Xwp4qk63J i9EckeRwVHst5CKD iqn.1993-08.org.debian:01:ef12d882804f
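
The comparison made by eye in the log lines above can be sketched as a small script. It assumes syslog lines in the format shown, with `cinder-rtstool add-initiator` taking the target IQN, CHAP user, CHAP password, and initiator IQN in that order. The function names and the regular expression are illustrative, and the sample lines are trimmed with the credentials replaced by placeholders:

```python
import re
from collections import defaultdict

# Matches "<Mon> <day> <time> <host> sudo[pid]: ... cinder-rtstool add-initiator
# <target> <user> <password> <initiator>"
ADD_INITIATOR_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+sudo\[\d+\]:.*"
    r"cinder-rtstool add-initiator\s+(?P<target>\S+)\s+\S+\s+\S+\s+"
    r"(?P<initiator>\S+)\s*$"
)


def hosts_by_initiator(lines):
    """Map each initiator IQN to the set of hosts whose logs mention it."""
    seen = defaultdict(set)
    for line in lines:
        match = ADD_INITIATOR_RE.match(line)
        if match:
            seen[match.group("initiator")].add(match.group("host"))
    return dict(seen)


def duplicate_initiators(lines):
    """Return only the initiators that appear in the logs of several hosts."""
    return {
        iqn: hosts
        for iqn, hosts in hosts_by_initiator(lines).items()
        if len(hosts) > 1
    }


SAMPLE = [
    "Oct 01 11:10:11 ubuntu-focal-iweb-mtl01-0026751351 sudo[121117]: stack : "
    "COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf "
    "cinder-rtstool add-initiator "
    "iqn.2010-10.org.openstack:volume-ba95db3d-c2ff-40c6-ba7c-563c3070c85d "
    "<user> <password> iqn.1993-08.org.debian:01:ef12d882804f",
    "Oct 01 11:10:18 ubuntu-focal-iweb-mtl01-0026751352 sudo[73346]: stack : "
    "COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf "
    "cinder-rtstool add-initiator "
    "iqn.2010-10.org.openstack:volume-867090f6-2e48-4b0b-97e3-afdbc4cc946b "
    "<user> <password> iqn.1993-08.org.debian:01:ef12d882804f",
]
```

Calling `duplicate_initiators(SAMPLE)` reports the Debian initiator name against both hosts, which is the condition that matters here.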

45adbb55-491d-418b-ba68-7db43d1c235b wins the race to migrate first and we then see the ba95db3d-c2ff-40c6-ba7c...


summary: - Instances with trunk ports attached taking a long time to delete
+ Duplicate iSCSI initiators causing live migration failures
Lee Yarwood (lyarwood)
description: updated
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "Lee Yarwood <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/812266

Changed in devstack:
status: New → In Progress
Balazs Gibizer (balazs-gibizer) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812391
Committed: https://opendev.org/openstack/devstack/commit/714826d1a27085ba2384ca495c876588d77f0d27
Submitter: "Zuul (22348)"
Branch: master

commit 714826d1a27085ba2384ca495c876588d77f0d27
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
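
A rough sketch of the approach the commit message describes, written in Python rather than as the actual devstack change. `new_initiator_name` is a hypothetical stand-in for `iscsi-iname` that mimics the Debian-style names seen in the logs (prefix plus 12 hex digits), and the file path is passed in by the caller; on open-iscsi systems the initiator name usually lives in /etc/iscsi/initiatorname.iscsi, which is an assumption here and not taken from the report:

```python
import secrets


def new_initiator_name(prefix="iqn.1993-08.org.debian:01"):
    """Stand-in for iscsi-iname: the prefix plus 12 random hex digits."""
    return f"{prefix}:{secrets.token_hex(6)}"


def render_initiator_file(iqn):
    """Contents of an initiator name file holding a single name."""
    return f"InitiatorName={iqn}\n"


def overwrite_initiator_name(path, iqn=None):
    """Replace whatever name is stored at `path` and return the new name."""
    iqn = iqn or new_initiator_name()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_initiator_file(iqn))
    return iqn
```

Running this once per host during setup would give each compute its own name, provided the iSCSI initiator daemon picks up the file afterwards (typically via a service restart).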

Changed in devstack:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/devstack/+/812925

OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/devstack/+/812926

OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/devstack/+/812928

OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/devstack/+/812929

OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/devstack/+/812930

Lee Yarwood (lyarwood)
no longer affects: nova
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812925
Committed: https://opendev.org/openstack/devstack/commit/ee629cc77554b18cfca77506d1531f99523d2a58
Submitter: "Zuul (22348)"
Branch: stable/xena

commit ee629cc77554b18cfca77506d1531f99523d2a58
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
    (cherry picked from commit 714826d1a27085ba2384ca495c876588d77f0d27)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812929
Committed: https://opendev.org/openstack/devstack/commit/d927b6017880983cf873e199bb50332fe2e3f254
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit d927b6017880983cf873e199bb50332fe2e3f254
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    NOTE(lyarwood): Conflict due to
    If2f74f146a166b9721540aaf3f1f9fce3030525c not being present on
    stable/wallaby.

    Conflicts:
        lib/nova

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
    (cherry picked from commit 714826d1a27085ba2384ca495c876588d77f0d27)
    (cherry picked from commit ee629cc77554b18cfca77506d1531f99523d2a58)
    (cherry picked from commit a41fff99b35a2723348c933dc6c37ccf218e280f)
    (cherry picked from commit 43364b7198a90b16a53f6e89d79238bd40a76953)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812928
Committed: https://opendev.org/openstack/devstack/commit/43364b7198a90b16a53f6e89d79238bd40a76953
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 43364b7198a90b16a53f6e89d79238bd40a76953
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    NOTE(lyarwood): Conflict due to
    If2f74f146a166b9721540aaf3f1f9fce3030525c not being present on
    stable/wallaby.

    Conflicts:
        lib/nova

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
    (cherry picked from commit 714826d1a27085ba2384ca495c876588d77f0d27)
    (cherry picked from commit ee629cc77554b18cfca77506d1531f99523d2a58)
    (cherry picked from commit a41fff99b35a2723348c933dc6c37ccf218e280f)

tags: added: in-stable-victoria
tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812926
Committed: https://opendev.org/openstack/devstack/commit/a41fff99b35a2723348c933dc6c37ccf218e280f
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit a41fff99b35a2723348c933dc6c37ccf218e280f
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    NOTE(lyarwood): Conflict due to
    If2f74f146a166b9721540aaf3f1f9fce3030525c not being present on
    stable/wallaby.

    Conflicts:
        lib/nova

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
    (cherry picked from commit 714826d1a27085ba2384ca495c876588d77f0d27)
    (cherry picked from commit ee629cc77554b18cfca77506d1531f99523d2a58)

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (stable/train)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/812930
Committed: https://opendev.org/openstack/devstack/commit/c5f7cbfabe249498b45668c6838f284729fedc75
Submitter: "Zuul (22348)"
Branch: stable/train

commit c5f7cbfabe249498b45668c6838f284729fedc75
Author: Lee Yarwood <email address hidden>
Date: Mon Oct 4 18:07:17 2021 +0100

    nova: Ensure each compute uses a unique iSCSI initiator

    The current initiator name embedded in our CI images is not unique at
    present and can often cause failures during live migrations with
    attached volumes. This change ensures the name is unique by running
    iscsi-iname again and overwriting the existing name.

    We could potentially do this during the image build process itself but
    given that devstack systems are not supposed to be multi-purpose this
    should be safe to do during the devstack run.

    NOTE(lyarwood): Conflict due to
    If2f74f146a166b9721540aaf3f1f9fce3030525c not being present on
    stable/wallaby.

    Conflicts:
        lib/nova

    Closes-Bug: #1945983
    Change-Id: I9ed26a17858df96c04be9ae52bf2e33e023869a5
    (cherry picked from commit 714826d1a27085ba2384ca495c876588d77f0d27)
    (cherry picked from commit ee629cc77554b18cfca77506d1531f99523d2a58)
    (cherry picked from commit a41fff99b35a2723348c933dc6c37ccf218e280f)
    (cherry picked from commit 43364b7198a90b16a53f6e89d79238bd40a76953)

tags: added: in-stable-train