Attach/Detach encrypted volume problems with real paths

Bug #1703954 reported by Gorka Eguileor
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Undecided
Gorka Eguileor
OpenStack Compute (nova)
Incomplete
Undecided
Unassigned
os-brick
Fix Released
High
Unassigned

Bug Description

OS-Brick on 1.14 and 1.15 returns real paths instead of returning symbolic links, which results in the encryption attach_volume call replacing the real device with a link to the crypt dm.

The issue comes from the Nova flow when attaching an encrypted volume:

1- Attach volume
2- Generate libvirt configuration with path from step 1
3- Encrypt attach volume

Since step 2 has already generated the config with the path from step 1 then step 3 must preserve this path.

When step 1 returns a symbolic link we just forcefully replace it with a link to the crypt dm and everything is OK, but when we return a real path it does the same thing, which means we'll be replacing for example /dev/sda with a symlink, which will then break the detach process, and all future attachments.

If flow order was changed to be 1, 3, 2 then the encrypt attach volume could give a different path to be used for the libvirt config generation.

Gorka Eguileor (gorka)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/483068

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/483069

Eric Harney (eharney)
Changed in os-brick:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (master)

Reviewed: https://review.openstack.org/483069
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=f341e9c3ed86d15b599c4547e783d1f9da011cdb
Submitter: Jenkins
Branch: master

commit f341e9c3ed86d15b599c4547e783d1f9da011cdb
Author: Gorka Eguileor <email address hidden>
Date: Wed Jul 12 19:55:20 2017 +0200

    Return symlinks for encrypted volumes

    When connecting encrypted volumes we need to return a symbolink link or
    we will break all future attachments after detaching the volume.

    OS-Brick on 1.14 and 1.15 returns real paths instead of returning symbolic
    links, which results in the encryption attach_volume call replacing the
    real device with a link to the crypt dm.

    The issue comes from the Nova flow when attaching an encrypted volume:

    1- Attach volume
    2- Generate libvirt configuration with path from step 1
    3- Encrypt attach volume

    Since step 2 has already generated the config with the path from step 1 then
    step 3 must preserve this path.

    When step 1 returns a symbolic link we just forcefully replace it with a link
    to the crypt dm and everything is OK, but when we return a real path it
    does the same thing, which means we'll be replacing for example /dev/sda
    with a symlink, which will then break the detach process, and all future
    attachments.

    Until Nova, Cinder, and OS-Brick are changed to have a different flow
    (1, 3, 2) we need a workaround to make it work.

    The workaround this patch introduces is to return a symbolic link when
    the volume is encrypted.

    It will try to return the symlink that always exists, but if it's not
    there it will just look for ANY link to the device in '/dev/disk/by-id'.

    Related-Bug: #1703954
    Change-Id: If4461c3dcd67e5d948be9d9a3643c1eb81aaace9

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/483068
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=66520dcf6c1815f555eed4a241edf475d6ed0fee
Submitter: Jenkins
Branch: master

commit 66520dcf6c1815f555eed4a241edf475d6ed0fee
Author: Gorka Eguileor <email address hidden>
Date: Wed Jul 12 20:04:57 2017 +0200

    Return WWN in multipath_id

    When we refactored the iSCSI connect mechanism we inadvertently changed
    the value returned for the "multipath_id" key.

    This patch fixes this and return the WWN as the value again.

    This value is used by the encryption mechanism and expects it to be the
    WWN.

    Related-Bug: #1703954
    Change-Id: Ia6d96a1e3a71488b44b3ca2323610a8f0a7cf675

Revision history for this message
Sean Dague (sdague) wrote :

The patch was recently merged, is this still an actionable Nova bug that is going to require a Nova patch?

Changed in nova:
status: New → Incomplete
Revision history for this message
Gorka Eguileor (gorka) wrote :

The code in os-brick has a workaround for the unexpected flow that Nova seems to have, so it will work, but the optimal solution would be to fix this flow in Nova.

Gorka Eguileor (gorka)
Changed in os-brick:
status: New → Fix Released
Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/528694

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/528694
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=3b01eb78ce1c1ce388521f8122b3d6b8f2686fda
Submitter: Zuul
Branch: master

commit 3b01eb78ce1c1ce388521f8122b3d6b8f2686fda
Author: Gorka Eguileor <email address hidden>
Date: Fri Dec 15 12:09:41 2017 +0100

    Fix create encrypted volume from image

    If we run two create encrypted volume from image serially (waiting for
    one operation to complete before the making the next request) the first
    operation succeeds while the second one will fail during the attach
    phase.

    The failure we see in the logs indicate that OS-Brick is unable to get
    the WWN from the connected volume.

    In reality the failure to attach the volume on the second operation is
    caused by a failed cleanup of the first create, even if there's no error
    in the logs.

    The reason why OS-Brick didn't correctly do the cleanup of the first
    volume is because didn't know that it was attaching an encrypted volume
    and returned a real path (ie: /dev/sdf) that then go overwritten by the
    unencrypted device mapper.

    The solution is letting OS-Brick know that it's an encrypted volume
    during the attachment phase so it will return a symlink to the real
    device instead of the real device itself.

    The manager's "initialize_connection" does this correctly, but the
    method creating the volume from an image doesn't use the method of that
    class and calls the driver's "initialize_connection" method instead,
    which is missing the "encrypted" field.

    Closes-Bug: #1703954
    Change-Id: I35db18f36f86683f2ab16694c9787c908251a382

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/529045

Revision history for this message
Jonathan Mills (jonmills-t) wrote :

I can 100% confirm this is still not working. I am running fully patched CentOS 7.4 with all of the very latest available Pike RPMs from the CentOS Cloud Pike repos. I can reproduce this again and again.

# rpm -qa |egrep '(nova|cinder|os-brick)'|sort
openstack-cinder-11.0.2-1.el7.noarch
openstack-cinder-doc-11.0.2-1.el7.noarch
openstack-nova-common-16.0.3-2.el7.noarch
openstack-nova-compute-16.0.3-2.el7.noarch
python2-cinderclient-3.1.0-1.el7.noarch
python2-novaclient-9.1.1-1.el7.noarch
python2-os-brick-1.15.4-1.el7.noarch
python-cinder-11.0.2-1.el7.noarch
python-nova-16.0.3-2.el7.noarch

Revision history for this message
Jonathan Mills (jonmills-t) wrote :

Here is an empty, encrypted volume prior to attachment:

-rw-rw----. 1 root qemu 4294967296 Jan 12 12:37 volume-14c92c19-5c3d-480e-8c19-c70c9304ebce

Then, during attachment, nova-compute does the following:

2018-01-12 12:39:36.357 215450 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ln --symbolic --force /dev/mapper/crypt-volume-14c92c19-5c3d-480e-8c19-c70c9304ebce /gpfs/fs_adapt2/openstack/cinder/volume-14c92c19-5c3d-480e-8c19-c70c9304ebce execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:367

Results end up looking like:

lrwxrwxrwx. 1 root qemu 61 Jan 12 12:39 volume-14c92c19-5c3d-480e-8c19-c70c9304ebce -> /dev/mapper/crypt-volume-14c92c19-5c3d-480e-8c19-c70c9304ebce

Thus, after volume detachment, your /dev/mapper device is gone, and so is your volume and data

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/529045
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=cfe0cf93d7de384f81597214336c8d0d92dca233
Submitter: Zuul
Branch: stable/pike

commit cfe0cf93d7de384f81597214336c8d0d92dca233
Author: Gorka Eguileor <email address hidden>
Date: Fri Dec 15 12:09:41 2017 +0100

    Fix create encrypted volume from image

    If we run two create encrypted volume from image serially (waiting for
    one operation to complete before the making the next request) the first
    operation succeeds while the second one will fail during the attach
    phase.

    The failure we see in the logs indicate that OS-Brick is unable to get
    the WWN from the connected volume.

    In reality the failure to attach the volume on the second operation is
    caused by a failed cleanup of the first create, even if there's no error
    in the logs.

    The reason why OS-Brick didn't correctly do the cleanup of the first
    volume is because didn't know that it was attaching an encrypted volume
    and returned a real path (ie: /dev/sdf) that then go overwritten by the
    unencrypted device mapper.

    The solution is letting OS-Brick know that it's an encrypted volume
    during the attachment phase so it will return a symlink to the real
    device instead of the real device itself.

    The manager's "initialize_connection" does this correctly, but the
    method creating the volume from an image doesn't use the method of that
    class and calls the driver's "initialize_connection" method instead,
    which is missing the "encrypted" field.

    Closes-Bug: #1703954
    Change-Id: I35db18f36f86683f2ab16694c9787c908251a382
    (cherry picked from commit 3b01eb78ce1c1ce388521f8122b3d6b8f2686fda)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.0.0b3

This issue was fixed in the openstack/cinder 12.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.1.0

This issue was fixed in the openstack/cinder 11.1.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.