[SRU] Parallel VM creation fails when nova-computes share the disks and each nova-compute node has no cached images.

Bug #973194 reported by Mandar Vaze
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Medium      Mandar Vaze
  Essex                   Fix Released  Undecided   Unassigned
nova (Ubuntu)             Fix Released  Undecided   Unassigned
  Precise                 Fix Released  Undecided   Chuck Short

Bug Description

Scenario:

1. There are two nova-compute hosts: HostA and HostB
2. NFS server: HostC exports "/shared_instances_path"
3. Both HostA and HostB mount HostC:/shared_instances_path at /shared_instances_path
4. Both HostA and HostB have "instances_path=/shared_instances_path" in their nova.conf
5. HostC:/shared_instances_path is empty to begin with
6. Two VMs are launched from the same image at the same time (parallel VM creation)
7. Since there are no cached images in the <instances_path>/_base folder, both nova-compute hosts try to download the kernel and ramdisk images to the same location, under the same filename.

This looks like a file locking problem.
Since these are two different compute hosts, @utils.synchronized does not help: it only serializes callers on a single host.
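
To illustrate why a host-local lock cannot serialize the two downloads, here is a minimal sketch. It assumes the lock lives in the memory of one process; `LockRegistry` and `try_lock` are hypothetical names made up for this example and are not nova's implementation of @utils.synchronized.

```python
import threading


class LockRegistry:
    """Hypothetical stand-in for the named locks held in ONE process's memory."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def get(self, name):
        # Return the lock registered under `name`, creating it on first use.
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())


def try_lock(registry, name):
    """Try to take the named lock without blocking; True means acquired."""
    return registry.get(name).acquire(blocking=False)


# Each compute host runs its own process, so each has its own registry.
host_a_locks = LockRegistry()
host_b_locks = LockRegistry()

a_first = try_lock(host_a_locks, "fetch-kernel")  # HostA takes the lock
a_again = try_lock(host_a_locks, "fetch-kernel")  # a second caller on HostA is refused
b_first = try_lock(host_b_locks, "fetch-kernel")  # HostB is not blocked at all

print(a_first, a_again, b_first)
```

The second caller on HostA is refused, but HostB acquires its own copy of the lock, so both hosts can still write to the same file on the NFS share at the same time.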

Expected Response:

Parallel VM Creation as explained above should not fail.

Actual Response:

Using the same disk area from multiple nova-compute hosts corrupts the instance images.

Reported on :
Branch: master
git commit : d9019f7aa6e1817d2aabcd59e7dde3d212b4e092

Revision history for this message
Mandar Vaze (mandarvaze) wrote :

Proposed fix :

Instead of using the _base folder to store cached images, use "_base_<compute_hostname>". In the scenario above, images would then be cached in _base_HostA and _base_HostB respectively, which prevents the file locking issue.
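
A minimal sketch of the naming scheme proposed above. The helper `cache_dir` is a hypothetical name for illustration only and is not part of nova; the paths and host names are taken from the scenario in the bug description.

```python
import posixpath


def cache_dir(instances_path, hostname=None):
    """Return the image cache directory; per-host when a hostname is given."""
    name = "_base" if hostname is None else "_base_" + hostname
    return posixpath.join(instances_path, name)


shared = "/shared_instances_path"

# Shared default: both hosts resolve the same directory and collide.
default_a = cache_dir(shared)
default_b = cache_dir(shared)

# Proposed scheme: each host gets its own cache directory.
per_host_a = cache_dir(shared, "HostA")
per_host_b = cache_dir(shared, "HostB")

print(default_a, default_b)
print(per_host_a, per_host_b)
```

The trade-off is that each host downloads and stores its own copy of every image, so the cache is no longer shared between hosts.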

Changed in nova:
assignee: nobody → Mandar Vaze (mandarvaze)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/6262

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/6262
Committed: http://github.com/openstack/nova/commit/647e4584773716d4b0e3fc114cc5db9c550ec078
Submitter: Jenkins
Branch: master

commit 647e4584773716d4b0e3fc114cc5db9c550ec078
Author: Mandar Vaze <email address hidden>
Date: Thu Apr 5 01:33:34 2012 -0700

    Introduced flag base_dir_name. Fixes bug 973194

    rebased from master.

    If user faces locking related problem when two nova-compute hosts
    sharing same disk area via nfs, try to download same image into
    cache concurrently - Then base_dir_name can be set to "_base_$my_ip" in
    nova.conf

    Default value for base_dir_name is "_base" thus retaining existing
    behavior.

    Change-Id: Icff10ed75ba83f7256731614dc9e01e578b347a4
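
As a rough illustration of what the commit message describes, the sketch below expands "_base_$my_ip" into a per-host directory name. It uses Python's string.Template as a stand-in; how nova itself expands $my_ip is not shown in this report, and `resolve_base_dir_name` and the IP addresses are assumptions made up for the example.

```python
from string import Template


def resolve_base_dir_name(template, my_ip):
    """Expand $my_ip in a base_dir_name value (illustrative stand-in only)."""
    return Template(template).substitute(my_ip=my_ip)


# Default value: no placeholder, so every host resolves the same name.
default_name = resolve_base_dir_name("_base", "192.0.2.10")

# Value suggested in the commit message: one name per host IP.
host_a_name = resolve_base_dir_name("_base_$my_ip", "192.0.2.10")
host_b_name = resolve_base_dir_name("_base_$my_ip", "192.0.2.11")

print(default_name, host_a_name, host_b_name)
```

With the default value the two hosts keep sharing one cache directory, which is the existing behavior; with the templated value each host resolves a distinct directory.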

Changed in nova:
status: In Progress → Fix Committed
Mandar Vaze (mandarvaze)
tags: added: essex-backport
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/7269

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/essex)

Reviewed: https://review.openstack.org/7269
Committed: http://github.com/openstack/nova/commit/7028d66ae97c68f888a2bbf2d3b431702f72b4c5
Submitter: Jenkins
Branch: stable/essex

commit 7028d66ae97c68f888a2bbf2d3b431702f72b4c5
Author: Mandar Vaze <email address hidden>
Date: Thu Apr 5 01:33:34 2012 -0700

    Introduced flag base_dir_name. Fixes bug 973194

    rebased from master.

    If user faces locking related problem when two nova-compute hosts
    sharing same disk area via nfs, try to download same image into
    cache concurrently - Then base_dir_name can be set to "_base_$my_ip" in
    nova.conf

    Default value for base_dir_name is "_base" thus retaining existing
    behavior.

    Change-Id: Icff10ed75ba83f7256731614dc9e01e578b347a4

tags: added: in-stable-essex
Devin Carlen (devcamcar)
Changed in nova:
milestone: none → folsom-1
importance: Undecided → Medium
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Chuck Short (zulcss)
affects: glance (Ubuntu) → nova (Ubuntu)
Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: New → In Progress
Changed in nova (Ubuntu Precise):
status: New → In Progress
Revision history for this message
Chuck Short (zulcss) wrote : Re: Parallel VM creation fails when nova-computes share the disks and each nova-compute node has no cached images.

** Impact **

If users hit locking-related problems when two nova-compute hosts that share the same disk area via NFS try to download the same image into the cache concurrently, base_dir_name can be set to "_base_$my_ip" in nova.conf.

The default value of base_dir_name is "_base", which retains the existing behavior.

** Development Fix **

This issue has been addressed in the development trunk: https://review.openstack.org/6262 and fixed in quantal.

** Stable Fix **

This issue has been addressed in the stable tree: https://review.openstack.org/7269

** Test Case **

1. There are two nova-compute hosts: HostA and HostB
2. NFS server: HostC exports "/shared_instances_path"
3. Both HostA and HostB mount HostC:/shared_instances_path at /shared_instances_path
4. Both HostA and HostB have "instances_path=/shared_instances_path" in their nova.conf
5. HostC:/shared_instances_path is empty to begin with
6. Two VMs are launched from the same image at the same time (parallel VM creation)
7. Since there are no cached images in the <instances_path>/_base folder, both nova-compute hosts try to download the kernel and ramdisk images to the same location, under the same filename.

This looks like a file locking problem.
Since these are two different compute hosts, @utils.synchronized does not help: it only serializes callers on a single host.

Expected Response:

Parallel VM Creation as explained above should not fail.

Actual Response:

Using the same disk area from multiple nova-compute hosts corrupts the instance images.

** Regression **

Minimal; this is geared to small systems like ARM machines.

summary: - Parallel VM creation fails when nova-computes share the disks and each
- nova-compute node has no cached images.
+ [SRU] Parallel VM creation fails when nova-computes share the disks and
+ each nova-compute node has no cached images.
Chuck Short (zulcss)
Changed in nova (Ubuntu Precise):
assignee: nobody → Chuck Short (zulcss)
milestone: none → ubuntu-12.04.1
Revision history for this message
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Mandar, or anyone else affected,

Accepted nova into precise-proposed. The package will build now and be available in a few hours. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Please find the attached Jenkins job results from the Ubuntu Server Team's CI
infrastructure. As part of the verification process for this bug, Nova has
been deployed and configured across multiple nodes using precise-proposed as
an installation source. After successful bring-up and configuration of the
cluster, a number of exercises and smoke tests have been invoked to ensure the
updated package did not introduce any regressions. A number of test iterations
were carried out to catch any possible transient errors.

Note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the
Jenkins links in the comments of the relevant upstream code-review:

https://review.openstack.org/7269

As per the provisional Micro Release Exception granted to this package by
the Technical Board, we hope this contributes toward verification of this
update.

Dave Walker (davewalker)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1+stable~20120612-3ee026e-0ubuntu1

---------------
nova (2012.1+stable~20120612-3ee026e-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1010473)
  * Dropped, superseded by new snapshot:
    - debian/patches/upstream/0001-fix-bug-where-nova-ignores-glance-host-in-imageref.patch
    - debian/patches/upstream/0002-Stop-libvirt-test-from-deleting-instances-dir.patch
    - debian/patches/upstream/0003-Allow-unprivileged-RADOS-users-to-access-rbd-volumes.patch
    - debian/patches/upstream/0004-Fixed-bug-962840-added-a-test-case.patch
    - debian/patches/upstream/0005-Populate-image-properties-with-project_id-again.patch
    - debian/patches/upstream/0006-Use-project_id-in-ec2.cloud._format_image.patc
    - debian/patches/CVE-2012-2101.patch
    - debian/patches/CVE-2012-2654.patch
  * Resynchronize with stable/essex:
    - 3ee026e Only invoke .lower() on non-None protocols. (LP: #1010514)
    - f0a9f47 Create a utf8 version of the dns_domains table. (LP: #993663)
    - 84a43e1 Report memory correctly on Xen. (LP: #997014)
    - 8c72924 Add libvirt get_console_output tests: pty and file. (LP: #990237)
    - 4e423cd Fix Multi_Scheduler to process host capabilities. (LP: #1000403)
    - 4aea7f1 Nail pep8 dependencies to 1.0.1
    - 2b3bbc4 handle updated qemu-img info output. (LP: #1000261)
    - 2d7d51c Fix type of snapshot_id column to match db. (LP: #962615)
    - ec70c69 Generate a Changelog for Nova
    - e5e890f Fix nova.tests.test_nova_rootwrap on Fedora 17. (LP: #992916)
    - 9e9a554 Ec2 handle strings with "0x" (LP: #983206)
    - 26dc6b7 QuantumManager will start dnsmasq during startup. Fixes (LP: #977759)
    - 7028d66 Introduced flag base_dir_name. (LP: #973194)
    - 76b525a Get unit tests functional in OS X.
    - facb936 Update KillFilter to handle 'deleted' exe's. (LP: #967931)
    - 1209af4 Checks if value is string or not before decode. (LP: #952176)
    - 1209af4 Fix timeout in EC2 CloudController.create_image(). (LP: #989764)
    - 108e74b Re-add console_log from console_console_output(). (LP: #987335)
    - 48a0768 Don't leak RPC connections on timeouts or other exceptions. (LP: #968843)
    - 7c64de9 Cloudpipe tap vpn not always working. (LP: #975043)
    - 5ab5051 add libvirt_inject_key flag fix (LP: #971640)
    - 6c68ef5 Xen: Pass session to destroy_vdi. (LP: #988615)
    - 015744e Delete fixed_ips when network is deleted. (LP: #754900)
  * Add debian/scripts/changelog.sh to help generate the changelog.
  * Add debian/nova-common.docs:
    - Include changelog and README.rst
  * debian/rules: Generate a tarball from git snapshot.
  * debian/patches/fix-pep8-errors.patch: Fix pep8 errors due to pep8 upstream
    migration.
 -- Chuck Short <email address hidden> Tue, 05 Jun 2012 09:50:59 -0400

Changed in nova (Ubuntu Precise):
status: Fix Committed → Fix Released
Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: In Progress → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-1 → 2012.2