VMwareVCDriver: snapshot failure when host in maintenance mode

Bug #1229994 reported by Vui Lam
This bug affects 1 person
Affects                    Status         Importance   Assigned to    Milestone
OpenStack Compute (nova)   Fix Released   High         Vui Lam
Havana                     Fix Released   High         Yaguang Tang
VMwareAPI-Team             In Progress    High         Vui Lam

Bug Description

An image snapshot taken through the VC cluster driver may fail if the datacenter containing the cluster managed by the driver has one or more hosts in maintenance mode with access to the datastore that holds the disk image snapshot.

A sign that this situation has occurred is the appearance in the nova compute log of an error similar to the following:

2013-08-02 07:10:30.036 WARNING nova.virt.vmwareapi.driver [-] Task [DeleteVirtualDisk_Task] (returnval){
value = "task-228"
_type = "Task"
} status: error The operation is not allowed in the current state.
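
To check a compute log for this signature, a small scanner along the following lines may help. It is only a sketch: it assumes the multi-line entry format shown above, and the function name is made up for illustration and is not part of nova.

```python
import re

# Matches a DeleteVirtualDisk_Task entry that ended in an error, in the
# multi-line format shown in the log excerpt above. [^}]* keeps the match
# inside a single "(returnval){ ... }" block.
FAILED_DELETE = re.compile(
    r'Task \[DeleteVirtualDisk_Task\] \(returnval\)\{[^}]*\} '
    r'status: error (?P<message>[^\n]*)'
)


def find_failed_disk_deletes(log_text):
    """Return the error message of each failed DeleteVirtualDisk_Task entry.

    Hypothetical helper for illustration; not part of nova.
    """
    return [m.group('message').strip() for m in FAILED_DELETE.finditer(log_text)]
```

A message of "The operation is not allowed in the current state." in the result is the symptom described in this report.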

This means that even if all hosts in the cluster are running normally, a host outside the cluster entering maintenance mode may lead to snapshot failure.

The root cause is an issue in VC's handling of the VirtualDiskManager.DeleteVirtualDisk_Task API, which may incorrectly pick a host in maintenance mode to service the disk deletion, even though a host under maintenance will reject such an operation.

Tags: vmware
Vui Lam (vui) wrote :

I am looking into a more reliable means to perform the disk deletion.

Changed in nova:
assignee: nobody → Vui Lam (vui)
importance: Undecided → High
description: updated
tags: added: vmware
Changed in openstack-vmwareapi-team:
importance: Undecided → High
Vui Lam (vui)
summary: - VMwareVCDriver: host in maintenance mode may cause snapshot failure
+ VMwareVCDriver: snapshot failure when host in maintenance mode
Tracy Jones (tjones-i)
Changed in openstack-vmwareapi-team:
assignee: nobody → Vui Lam (vui)
Mathew Odden (locke105)
Changed in nova:
status: New → Triaged
Mathew Odden (locke105)
Changed in nova:
status: Triaged → Confirmed
status: Confirmed → In Progress
Changed in openstack-vmwareapi-team:
status: New → In Progress
Mark McLoughlin (markmc) wrote :
Tracy Jones (tjones-i)
tags: added: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/49305
Committed: http://github.com/openstack/nova/commit/7910385825ccfa705785af360fcd5717656e3557
Submitter: Jenkins
Branch: master

commit 7910385825ccfa705785af360fcd5717656e3557
Author: Vui Lam <email address hidden>
Date: Mon Sep 30 11:25:41 2013 -0700

    VMware: fix snapshot failure when host in maintenance mode

    The root cause is a bug in the VC's handling of the
    VirtualDiskManager.DeleteVirtualDisk_Task API, which allows any host in
    a datacenter with access to the datastore participating in the disk
    deletion to be picked to perform the operation, even when the host is
    in maintenance mode and hence will always reject the call when sent.

    The fix uses an alternative API (FileManager.DeleteDatastoreFile_Task)
    to delete the vmdk and -flat vmdk files separately. This API does not
    suffer from the above-mentioned failure mode.

    Closes-Bug: #1229994

    Change-Id: I786365847673e5192a21b654cba951b2e7a6f291
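
The approach described in the commit message above, deleting the descriptor and the -flat file with separate file-level calls, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the helper names are made up, the datastore path format in the usage note is assumed, the deletion order is arbitrary, and `delete_datastore_file` is a hypothetical callable standing in for whatever invokes FileManager.DeleteDatastoreFile_Task and waits for the task to finish.

```python
def vmdk_file_pair(descriptor_path):
    """Return the descriptor path and its companion -flat file path."""
    suffix = ".vmdk"
    if not descriptor_path.endswith(suffix):
        raise ValueError("expected a .vmdk descriptor path: %r" % descriptor_path)
    base = descriptor_path[:-len(suffix)]
    return [descriptor_path, base + "-flat" + suffix]


def delete_virtual_disk_files(delete_datastore_file, descriptor_path):
    """Delete both files of a virtual disk, one file-level call per file.

    delete_datastore_file is a hypothetical callable (an assumption for
    this sketch) that deletes a single datastore file, for example by
    invoking FileManager.DeleteDatastoreFile_Task.
    """
    deleted = []
    for path in vmdk_file_pair(descriptor_path):
        delete_datastore_file(path)
        deleted.append(path)
    return deleted
```

For example, with an assumed path of "[datastore1] vmware-tmp/abc.vmdk", the callable would be invoked once for that path and once for "[datastore1] vmware-tmp/abc-flat.vmdk".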

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/58703

Changed in nova:
milestone: none → icehouse-1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Alan Pevec (apevec)
tags: removed: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/58703
Committed: http://github.com/openstack/nova/commit/96257fd7d5ecd00e0eb84b110e52eb811d268d55
Submitter: Jenkins
Branch: stable/havana

commit 96257fd7d5ecd00e0eb84b110e52eb811d268d55
Author: Vui Lam <email address hidden>
Date: Mon Sep 30 11:25:41 2013 -0700

    VMware: fix snapshot failure when host in maintenance mode

    The root cause is a bug in the VC's handling of the
    VirtualDiskManager.DeleteVirtualDisk_Task API, which allows any host in
    a datacenter with access to the datastore participating in the disk
    deletion to be picked to perform the operation, even when the host is
    in maintenance mode and hence will always reject the call when sent.

    The fix uses an alternative API (FileManager.DeleteDatastoreFile_Task)
    to delete the vmdk and -flat vmdk files separately. This API does not
    suffer from the above-mentioned failure mode.

    Closes-Bug: #1229994

    (cherry picked from commit 7910385825ccfa705785af360fcd5717656e3557)

    Conflicts:
     nova/virt/vmwareapi/fake.py
     nova/virt/vmwareapi/vmops.py

    Change-Id: I786365847673e5192a21b654cba951b2e7a6f291

Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-1 → 2014.1