Resizing a pinned VM results in inconsistent state

Bug #1545675 reported by Stephen Finucane
This bug affects 5 people
Affects                    Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)   Fix Released  High        Stephen Finucane
Mitaka                     Fix Released  Undecided   Stephen Finucane

Bug Description

It appears that executing certain resize operations on a pinned instance results in inconsistencies in the "state machine" that Nova uses to track instances. This was identified using Tempest and manifests itself as failures in follow-up shelve/unshelve operations.

---

# Steps

Testing was conducted on a host containing a single-node, Fedora 23-based (4.3.5-300.fc23.x86_64) OpenStack instance (built with DevStack). The '12d224e' commit of Nova was used. The Tempest tests (commit 'e913b82') were run using modified flavors, as seen below:

    nova flavor-create m1.small_nfv 420 2048 0 2
    nova flavor-create m1.medium_nfv 840 4096 0 4
    nova flavor-key 420 set "hw:numa_nodes=2"
    nova flavor-key 840 set "hw:numa_nodes=2"
    nova flavor-key 420 set "hw:cpu_policy=dedicated"
    nova flavor-key 840 set "hw:cpu_policy=dedicated"

    cd $TEMPEST_DIR
    cp etc/tempest.conf etc/tempest.conf.orig
    sed -i "s/flavor_ref = .*/flavor_ref = 420/" etc/tempest.conf
    sed -i "s/flavor_ref_alt = .*/flavor_ref_alt = 840/" etc/tempest.conf

Tests were run in the order given below.

1. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
2. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server
3. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
4. tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
5. tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server

Like so:

    ./run_tempest.sh -- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance

# Expected Result

The tests should pass.

# Actual Result

    +---+--------------------------------------+--------+
    | # | test id                              | status |
    +---+--------------------------------------+--------+
    | 1 | 1164e700-0af0-4a4c-8792-35909a88743c | ok     |
    | 2 | 77eba8e0-036e-4635-944b-f7a8f3b78dc9 | ok     |
    | 3 | c03aab19-adb1-44f5-917d-c419577e9e68 | ok     |
    | 4 | 1164e700-0af0-4a4c-8792-35909a88743c | FAIL   |
    | 5 | c03aab19-adb1-44f5-917d-c419577e9e68 | ok*    |
    +---+--------------------------------------+--------+

* this test reports as passing but is actually generating errors. Bad test! :)

One test fails while the other "passes" but raises errors. The failures, where raised, are CPUPinningInvalid exceptions:

    CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following pinned set [0, 25]
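
For context, the exception is raised when the NUMA cell object is asked to unpin CPUs it does not currently track as pinned. A simplified sketch of that check (a hypothetical reduction, not the exact code in nova/objects/numa.py):

    class CPUPinningInvalid(Exception):
        pass

    class NUMACellUsage(object):
        """Minimal stand-in for the per-cell CPU pinning bookkeeping."""

        def __init__(self, cpuset):
            self.cpuset = set(cpuset)     # CPUs belonging to this cell
            self.pinned_cpus = set()      # CPUs currently claimed by instances

        def pin_cpus(self, cpus):
            cpus = set(cpus)
            if not cpus <= (self.cpuset - self.pinned_cpus):
                raise CPUPinningInvalid(
                    'Cannot pin/unpin cpus %s from the following pinned set %s'
                    % (sorted(cpus), sorted(self.pinned_cpus)))
            self.pinned_cpus |= cpus

        def unpin_cpus(self, cpus):
            cpus = set(cpus)
            # If the usage was already dropped (or never recorded), the
            # requested CPUs are missing from pinned_cpus and the unpin is
            # rejected -- which is exactly the failure seen above.
            if not cpus <= self.pinned_cpus:
                raise CPUPinningInvalid(
                    'Cannot pin/unpin cpus %s from the following pinned set %s'
                    % (sorted(cpus), sorted(self.pinned_cpus)))
            self.pinned_cpus -= cpus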

**NOTE:** I also think there are issues with the non-reverted resize test, though I've yet to investigate this:

* tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm

What's worse, this error "snowballs" on successive runs. Because of the nature of the failure (a failure to pin/unpin CPUs), we're left with a list of CPUs that Nova thinks are pinned but which are no longer actually in use. This is reflected by the resource tracker, which reports allocated vCPUs even though no instances are running:

    $ openstack server list

    $ cat /opt/stack/logs/screen/n-cpu.log | grep 'Total usable vcpus' | tail -1
    *snip* INFO nova.compute.resource_tracker [*snip*] Total usable vcpus: 40, total allocated vcpus: 8

The error messages for both are given below, along with examples of this "snowballing" CPU list:

{0} tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance [36.713046s] ... FAILED

 Setting instance vm_state to ERROR
 Traceback (most recent call last):
   File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
     self._delete_instance(context, instance, bdms, quotas)
   File "/opt/stack/nova/nova/hooks.py", line 149, in inner
     rv = f(*args, **kwargs)
   File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
     quotas.rollback()
   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
     self.force_reraise()
   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
     six.reraise(self.type_, self.value, self.tb)
   File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
     self._update_resource_tracker(context, instance)
   File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
     rt.update_usage(context, instance)
   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
     return f(*args, **kwargs)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 376, in update_usage
     self._update_usage_from_instance(context, instance)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 863, in _update_usage_from_instance
     self._update_usage(instance, sign=sign)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 705, in _update_usage
     self.compute_node, usage, free)
   File "/opt/stack/nova/nova/virt/hardware.py", line 1441, in get_host_numa_usage_from_instance
     host_numa_topology, instance_numa_topology, free=free))
   File "/opt/stack/nova/nova/virt/hardware.py", line 1307, in numa_usage_from_instances
     newcell.unpin_cpus(pinned_cpus)
   File "/opt/stack/nova/nova/objects/numa.py", line 93, in unpin_cpus
     pinned=list(self.pinned_cpus))
 CPUPinningInvalid: Cannot pin/unpin cpus [0] from the following pinned set [1]

{0} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_shelve_unshelve_server [29.131132s] ... ok

 Traceback (most recent call last):
   File "/opt/stack/nova/nova/compute/manager.py", line 2474, in do_terminate_instance
     self._delete_instance(context, instance, bdms, quotas)
   File "/opt/stack/nova/nova/hooks.py", line 149, in inner
     rv = f(*args, **kwargs)
   File "/opt/stack/nova/nova/compute/manager.py", line 2437, in _delete_instance
     quotas.rollback()
   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
     self.force_reraise()
   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
     six.reraise(self.type_, self.value, self.tb)
   File "/opt/stack/nova/nova/compute/manager.py", line 2432, in _delete_instance
     self._update_resource_tracker(context, instance)
   File "/opt/stack/nova/nova/compute/manager.py", line 751, in _update_resource_tracker
     rt.update_usage(context, instance)
   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
     return f(*args, **kwargs)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 376, in update_usage
     self._update_usage_from_instance(context, instance)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 863, in _update_usage_from_instance
     self._update_usage(instance, sign=sign)
   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 705, in _update_usage
     self.compute_node, usage, free)
   File "/opt/stack/nova/nova/virt/hardware.py", line 1441, in get_host_numa_usage_from_instance
     host_numa_topology, instance_numa_topology, free=free))
   File "/opt/stack/nova/nova/virt/hardware.py", line 1307, in numa_usage_from_instances
     newcell.unpin_cpus(pinned_cpus)
   File "/opt/stack/nova/nova/objects/numa.py", line 93, in unpin_cpus
     pinned=list(self.pinned_cpus))
 CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following pinned set [0, 25]

The nth run (n ~= 6):

    CPUPinningInvalid: Cannot pin/unpin cpus [24] from the following pinned set [0, 1, 9, 8, 25]

The nth+1 run:

    CPUPinningInvalid: Cannot pin/unpin cpus [27] from the following pinned set [0, 1, 24, 25, 8, 9]

The nth+2 run:

    CPUPinningInvalid: Cannot pin/unpin cpus [2] from the following pinned set [0, 1, 24, 25, 8, 9, 27]

Changed in nova:
status: New → In Progress
assignee: nobody → Stephen Finucane (sfinucan)
description: updated
description: updated
description: updated
summary: - Shelve/unshelve fails for pinned instance
+ Resizing a pinned VM leaves system in inconsistent state
summary: - Resizing a pinned VM leaves system in inconsistent state
+ Resizing a pinned VM leaves Nova in inconsistent state
summary: - Resizing a pinned VM leaves Nova in inconsistent state
+ Resizing a pinned VM results in inconsistent state
description: updated
description: updated
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

So just by looking at the code, I can see that we probably want to call resource_tracker.update_usage() in the shelve_offload method, so that we immediately unpin CPUs once the offload happens rather than waiting for the RT periodic task, which in the case of a speedy Tempest test likely never runs.

This _might_ be the cause of the stack trace in the cleanup delete when, for example, the spawn failed during unshelving.

Here's what would happen:

https://github.com/openstack/nova/blob/7616c88ad3a2769e9c9ee8a51ac55ddeed0bfd84/nova/compute/manager.py#L4368

When unshelving a shelve-offloaded instance, we would first run the claim_instance call, which does a new claim against the current state of the NumaTopology of the compute host (keep in mind that we never dropped the usage when we did the shelve-offload, so we are actually leaking resources here). A successful claim here means the instance and compute node are updated with the new pinning information (and we leak resources on the compute node until the next RT periodic update).

Now suppose the spawn on the next line fails for whatever reason, which may or may not be related to a CPU pinning bug. This calls the __exit__ method of the claim, which in turn calls the claim's abort() method, unpinning the CPUs that were pinned during the claim (see: https://github.com/openstack/nova/blob/7616c88ad3a2769e9c9ee8a51ac55ddeed0bfd84/nova/compute/claims.py#L121).

This will unpin the CPUs as tracked by the host NumaTopology, but it will not clear the mapping to host CPUs in the Instance object.

Finally - the test cleanup attempts to delete the instance, which attempts to unpin the already unpinned CPUs and fails.
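
Expressed as a rough simulation, with bare sets standing in for the real NUMATopology objects (purely illustrative, not actual Nova code):

    host_pinned = set()      # CPUs the host cell believes are pinned
    instance_pins = None     # CPU mapping stored on the Instance object

    # 1. Unshelve the shelve-offloaded instance: the claim succeeds, pinning
    #    CPUs on the host and recording the mapping on the instance. (The
    #    usage from before the offload was never dropped, so resources are
    #    already being leaked at this point.)
    host_pinned |= {0, 1}
    instance_pins = {0, 1}

    # 2. The spawn fails; the claim abort unpins the CPUs on the host but
    #    leaves the instance's own mapping in place.
    host_pinned -= {0, 1}

    # 3. Test cleanup deletes the instance. The resource tracker tries to
    #    unpin the instance's recorded CPUs, which are no longer in the
    #    host's pinned set, producing CPUPinningInvalid.
    if not instance_pins <= host_pinned:
        print('Cannot pin/unpin cpus %s from the following pinned set %s'
              % (sorted(instance_pins), sorted(host_pinned)))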

If the above makes sense, I think we need to do two things:

* Make sure offloading an instance immediately updates the resource_tracker, in the same manner that deleting it does.
* Make sure that aborting the claim clears both the host field and the NUMA information of the Instance (host is problematic here, as it is the reason the delete request ends up being RPCed to the host instead of being handled locally in the API, even though the instance is clearly not there since the claim failed and is being aborted).

I propose we start there and see if it fixes things.

PS - it is worth noting (if it was not clear from the text) that the above bugs mostly impact shelve functionality; nova spawn clears these things as part of the retry process, so it is not affected.

Revision history for this message
Andrew Laski (alaski) wrote :

As someone with familiarity with the shelve/unshelve process, that analysis makes sense to me. Updating resource usage on offload is something that should be done, since resources are not in use, or "reserved", at that point. Clearing the host on a failed claim makes sense as well since, as was pointed out, there is nothing on the host at that point and therefore no real association with that host.

Something else to note is that the shelving code leverages much of what resize did at the time it was written, so any issues discovered in that path may also affect resize. The Tempest tests may not be resizing such instances, though, so the shelving tests tend to find these issues first.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/280914

Changed in nova:
assignee: Stephen Finucane (sfinucan) → Nikola Đipanov (ndipanov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/281482

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/281483

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/280914
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a54997c62f1422c592dbabc28ea71d0a34e70b74
Submitter: Jenkins
Branch: master

commit a54997c62f1422c592dbabc28ea71d0a34e70b74
Author: Nikola Dipanov <email address hidden>
Date: Tue Feb 16 19:14:50 2016 +0000

    RT: Decrese usage for offloaded instances

    Allow for update_usage to consider SHELVED_OFFLOADED instances as
    removed and update the resource usage accordingly. This means we want to
    make sure that the stats class does the same.

    We can now make sure that RT is updated immediately once the instance
    has been shelved offloaded.

    Change-Id: Ia22963021995c71758a18b21070dfdf6a950da09
    Partial-bug: #1545675
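
In essence, the resource tracker now treats shelved-offloaded instances like deleted ones when recomputing usage. A hedged sketch of that kind of check (constants inlined as strings; the real change lives in nova's resource tracker and stats code):

    # Simplified illustration of the "is this instance still consuming host
    # resources?" decision after the change.
    DELETED = 'deleted'
    SHELVED_OFFLOADED = 'shelved_offloaded'

    def is_removed_instance(vm_state):
        """Deleted *and* shelved-offloaded instances no longer consume host
        resources, so the tracker should drop their usage immediately
        rather than waiting for the periodic task."""
        return vm_state in (DELETED, SHELVED_OFFLOADED)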

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Set to High because it seems to be impacting the NFV CI.

Changed in nova:
importance: Undecided → Medium
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/281482
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=521decbe9f79e337fb14571367c6008969a7133a
Submitter: Jenkins
Branch: master

commit 521decbe9f79e337fb14571367c6008969a7133a
Author: Nikola Dipanov <email address hidden>
Date: Wed Feb 17 19:24:06 2016 +0000

    objects: Allow instance to reset the NUMA topology

    This will be used when aborting claims, since the claim will (if
    successful) update the instance with the claimed topology,

    In case of an abort - we want to make sure it's clear for consistency
    sake. This will be done in the follow up patch.

    Change-Id: I2989446fdaa44a30d90ae2ab29fc27fb2ad03c4a
    Partial-bug: 1545675

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → John Garbutt (johngarbutt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/281483
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c7a6673fd5621d1c121c20376634ec49644fae59
Submitter: Jenkins
Branch: master

commit c7a6673fd5621d1c121c20376634ec49644fae59
Author: Nikola Dipanov <email address hidden>
Date: Wed Feb 17 19:27:36 2016 +0000

    RT: aborting claims clears instance host and NUMA info

    When the claim is aborted, this information is no longer correct for the
    instance, so we clear it to avoid inconsistencies.

    Change-Id: I83a5f06adb22c21392d5fc867728181ea4b0454d
    Resolves-bug: 1545675
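
Put differently, abort() now resets the instance-side fields that a successful claim populated. A minimal runnable sketch with stub objects (hypothetical names, not the exact claims.py code):

    class FakeInstance(object):
        """Stand-in carrying only the fields relevant to this fix."""
        def __init__(self):
            self.host = 'compute-1'
            self.node = 'compute-1'
            self.numa_topology = {'cell0': {'pinned': [0, 1]}}

        def save(self):
            pass  # would persist the changes in real Nova

    class FakeTracker(object):
        def drop_claim(self, instance):
            pass  # would release the claimed host resources

    class Claim(object):
        """Simplified claim: abort() must undo both the host-side usage and
        the instance-side fields that the claim populated."""
        def __init__(self, instance, tracker):
            self.instance, self.tracker = instance, tracker

        def abort(self):
            self.tracker.drop_claim(self.instance)   # release host usage
            self.instance.host = None                # no longer tied to a host
            self.instance.node = None
            self.instance.numa_topology = None       # forget claimed pinning
            self.instance.save()

    Claim(FakeInstance(), FakeTracker()).abort()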

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/289342

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/289972

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/289972
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d969169022b32fc98a0a7af4891e75bc072269c6
Submitter: Jenkins
Branch: master

commit d969169022b32fc98a0a7af4891e75bc072269c6
Author: Stephen Finucane <email address hidden>
Date: Tue Mar 8 15:24:18 2016 +0000

    Address nits in Ia2296302

    Be consistent in which functions are called in tests to reinforce
    what's being checked.

    Change-Id: I54f0abda8db65d633189b31684467bae87016738
    Related-Bug: #1545675

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/289342
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0469f71181c724480f3b3f12cb7eea97321ca79e
Submitter: Jenkins
Branch: master

commit 0469f71181c724480f3b3f12cb7eea97321ca79e
Author: Stephen Finucane <email address hidden>
Date: Mon Mar 7 13:40:21 2016 +0000

    Address nits in I83a5f06ad

    Make functions a little more coherent by ensuring they do one thing and
    one thing only. Similarly ensure we don't use "private" variables in
    other files.

    Change-Id: Ib916e01f655d59bcc7cd0932d909e56bb1d4af8a
    Related-Bug: #1545675

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/332243

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/323269
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f1320a7c2debf127a93773046adffb80563fd20b
Submitter: Jenkins
Branch: master

commit f1320a7c2debf127a93773046adffb80563fd20b
Author: Stephen Finucane <email address hidden>
Date: Mon May 30 16:03:35 2016 +0100

    Evaluate 'task_state' in resource (de)allocation

    There are two types of VM states associated with shelving. The first,
    'shelved' indicates that the VM has been powered off but the resources
    remain allocated on the hypervisor. The second, 'shelved_offloaded',
    indicates that the VM has been powered off and the resources freed.
    When "unshelving" VMs in the latter state, the VM state does not change
    from 'shelved_offloaded' until some time after the VM has been
    "unshelved".

    Change I83a5f06 introduced a change that allowed for deallocation of
    resources when they were set to the 'shelved_offloaded' state. However,
    the resource (de)allocation code path assumes any VM with a state of
    'shelved_offloaded' should have resources deallocated from it, rather
    than allocated to it. As the VM state has not changed when this code
    path is executed, resources are incorrectly deallocated from the
    instance twice.

    Enhance the aformentioned check to account for task state in addition to
    VM state. This ensures a VM that's still in 'shelved_offloaded' state,
    but is in fact being unshelved, does not trigger deallocation.

    Change-Id: Ie2e7b91937fc3d61bb1197fffc3549bebc65e8aa
    Signed-off-by: Stephen Finucane <email address hidden>
    Resolves-bug: #1587386
    Related-bug: #1545675
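
A hedged sketch of the enhanced check (state names written out as strings; the exact task-state condition used is an assumption here and is spelled out in the change itself):

    DELETED = 'deleted'
    SHELVED_OFFLOADED = 'shelved_offloaded'

    def is_removed_instance(vm_state, task_state):
        """Only treat a shelved-offloaded instance as removed when no task
        (e.g. unshelving) is in flight; otherwise its resources would be
        deallocated twice."""
        if vm_state == DELETED:
            return True
        return vm_state == SHELVED_OFFLOADED and task_state is None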

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/332243
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=59f55a14b5f5396e653d9eb1e5ded31db72f620b
Submitter: Jenkins
Branch: master

commit 59f55a14b5f5396e653d9eb1e5ded31db72f620b
Author: Stephen Finucane <email address hidden>
Date: Tue Jun 21 15:47:50 2016 +0100

    Don't immediately null host/node when shelving

    When offloading a shelved instance, resources should be freed. However,
    the ability to free resources is dependant on being able to find the
    resource tracker for an instance's node. At present, the instance node
    and host are nulled before attempting to update the resource tracker,
    meaning the resources are never actually freed. Fix this by nullifying
    these values *after* resources updates.

    Change-Id: I8f91367aacca0c7c673b28b3c844c70c0d12f0a5
    Related-bug: #1545675
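
The fix is purely an ordering change; a minimal sketch of the corrected flow (hypothetical helper names, illustrating the commit message rather than quoting the diff):

    def shelve_offload(instance, resource_tracker):
        """Free the offloaded instance's resources, then detach it from the
        host. The order matters: the resource tracker is found via
        instance.host/node, so those must still be set when usage is
        dropped."""
        # ...power-off and snapshot happen before this point...

        # Update the tracker while host/node still identify it.
        resource_tracker.update_usage(instance)

        # Only now detach the instance from the host.
        instance.host = None
        instance.node = None
        instance.save()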

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337108

Matt Riedemann (mriedem)
Changed in nova:
assignee: John Garbutt (johngarbutt) → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/337107
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2703a3d80bcbd49eaafaae624289d00c521b5192
Submitter: Jenkins
Branch: stable/mitaka

commit 2703a3d80bcbd49eaafaae624289d00c521b5192
Author: Stephen Finucane <email address hidden>
Date: Mon May 30 16:03:35 2016 +0100

    Evaluate 'task_state' in resource (de)allocation

    There are two types of VM states associated with shelving. The first,
    'shelved' indicates that the VM has been powered off but the resources
    remain allocated on the hypervisor. The second, 'shelved_offloaded',
    indicates that the VM has been powered off and the resources freed.
    When "unshelving" VMs in the latter state, the VM state does not change
    from 'shelved_offloaded' until some time after the VM has been
    "unshelved".

    Change I83a5f06 introduced a change that allowed for deallocation of
    resources when they were set to the 'shelved_offloaded' state. However,
    the resource (de)allocation code path assumes any VM with a state of
    'shelved_offloaded' should have resources deallocated from it, rather
    than allocated to it. As the VM state has not changed when this code
    path is executed, resources are incorrectly deallocated from the
    instance twice.

    Enhance the aformentioned check to account for task state in addition to
    VM state. This ensures a VM that's still in 'shelved_offloaded' state,
    but is in fact being unshelved, does not trigger deallocation.

    Change-Id: Ie2e7b91937fc3d61bb1197fffc3549bebc65e8aa
    Signed-off-by: Stephen Finucane <email address hidden>
    Resolves-bug: #1587386
    Related-bug: #1545675
    (cherry picked from commit f1320a7c2debf127a93773046adffb80563fd20b)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/337108
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c455c82dac32f5a0313077ec3bd8aca1f7db1bcb
Submitter: Jenkins
Branch: stable/mitaka

commit c455c82dac32f5a0313077ec3bd8aca1f7db1bcb
Author: Stephen Finucane <email address hidden>
Date: Tue Jun 21 15:47:50 2016 +0100

    Don't immediately null host/node when shelving

    When offloading a shelved instance, resources should be freed. However,
    the ability to free resources is dependant on being able to find the
    resource tracker for an instance's node. At present, the instance node
    and host are nulled before attempting to update the resource tracker,
    meaning the resources are never actually freed. Fix this by nullifying
    these values *after* resources updates.

    Change-Id: I8f91367aacca0c7c673b28b3c844c70c0d12f0a5
    Related-bug: #1545675
    (cherry picked from commit 59f55a14b5f5396e653d9eb1e5ded31db72f620b)
