Resizing a pinned VM results in inconsistent state
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | High | Stephen Finucane | |
| Mitaka | Fix Released | Undecided | Stephen Finucane | |
Bug Description
It appears that executing certain resize operations on a pinned instance results in inconsistencies in the "state machine" that Nova uses to track instances. This was identified using Tempest and manifests as failures in follow-up shelve/unshelve operations.
---
# Steps
Testing was conducted on a single-node, Fedora 23-based host (kernel 4.3.5-).
nova flavor-create m1.small_nfv 420 2048 0 2
nova flavor-create m1.medium_nfv 840 4096 0 4
nova flavor-key 420 set "hw:numa_nodes=2"
nova flavor-key 840 set "hw:numa_nodes=2"
nova flavor-key 420 set "hw:cpu_
nova flavor-key 840 set "hw:cpu_
cd $TEMPEST_DIR
cp etc/tempest.conf etc/tempest.
sed -i "s/flavor_ref = .*/flavor_ref = 420/" etc/tempest.conf
sed -i "s/flavor_ref_alt = .*/flavor_ref_alt = 840/" etc/tempest.conf
Tests were run in the order given below.
1. tempest.
2. tempest.
3. tempest.
4. tempest.
5. tempest.
Like so:
./run_
# Expected Result
The tests should pass.
# Actual Result
+---+-----------+--------+
| # | test id   | status |
+---+-----------+--------+
| 1 | 1164e700- |        |
| 2 | 77eba8e0- |        |
| 3 | c03aab19- |        |
| 4 | 1164e700- |        |
| 5 | c03aab19- |        |
+---+-----------+--------+
* this test reports as passing but is actually generating errors. Bad test! :)
One test fails while the other "passes" but raises errors. The failures, where raised, are CPUPinningInvalid exceptions (the full error messages are given below).
**NOTE:** I also think there are issues with the non-reverted resize test, though I've yet to investigate this:
* tempest.
What's worse, this error "snowballs" on successive runs. Because of the nature of the failure (a failure to pin/unpin CPUs), we're left with a list of CPUs that Nova thinks are pinned but which are no longer actually in use. This is reflected by the resource tracker:
$ openstack server list
$ cat /opt/stack/
*snip* INFO nova.compute.
The error messages for both are given below, along with examples of this "snowballing" CPU list:
{0} tempest.
Setting instance vm_state to ERROR
Traceback (most recent call last):
File "/opt/stack/
self.
File "/opt/stack/
rv = f(*args, **kwargs)
File "/opt/stack/
quotas.
File "/usr/lib/
self.
File "/usr/lib/
six.
File "/opt/stack/
self.
File "/opt/stack/
rt.
File "/usr/lib/
return f(*args, **kwargs)
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
host_
File "/opt/stack/
newcell.
File "/opt/stack/
pinned=
CPUPinningInvalid: Cannot pin/unpin cpus [0] from the following pinned set [1]
{0} tempest.
Traceback (most recent call last):
File "/opt/stack/
self.
File "/opt/stack/
rv = f(*args, **kwargs)
File "/opt/stack/
quotas.
File "/usr/lib/
self.
File "/usr/lib/
six.
File "/opt/stack/
self.
File "/opt/stack/
rt.
File "/usr/lib/
return f(*args, **kwargs)
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
host_
File "/opt/stack/
newcell.
File "/opt/stack/
pinned=
CPUPinningInvalid: Cannot pin/unpin cpus [1] from the following pinned set [0, 25]
The nth run (n ~= 6):
CPUPinningInvalid: Cannot pin/unpin cpus [24] from the following pinned set [0, 1, 9, 8, 25]
The nth+1 run:
CPUPinningInvalid: Cannot pin/unpin cpus [27] from the following pinned set [0, 1, 24, 25, 8, 9]
The nth+2 run:
CPUPinningInvalid: Cannot pin/unpin cpus [2] from the following pinned set [0, 1, 24, 25, 8, 9, 27]
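To illustrate the "snowballing" behaviour, here is a minimal, self-contained model of the pinned-set bookkeeping. This is not Nova's code; the `HostPinnedSet` class and its `pin`/`unpin` methods are invented for illustration only. Each failed unpin leaves stale entries in the host's pinned set, so every subsequent run stacks fresh pins on top of the leaked ones and then fails to release them:

```python
class CPUPinningInvalid(Exception):
    pass


class HostPinnedSet:
    """Toy model of a host's pinned-CPU bookkeeping (illustration only)."""

    def __init__(self):
        self.pinned = set()

    def pin(self, cpus):
        if self.pinned & cpus:
            raise CPUPinningInvalid(
                "Cannot pin/unpin cpus %s from the following pinned set %s"
                % (sorted(self.pinned & cpus), sorted(self.pinned)))
        self.pinned |= cpus

    def unpin(self, cpus):
        if not cpus <= self.pinned:
            raise CPUPinningInvalid(
                "Cannot pin/unpin cpus %s from the following pinned set %s"
                % (sorted(cpus - self.pinned), sorted(self.pinned)))
        self.pinned -= cpus


host = HostPinnedSet()

# Run 1: the resize pins CPUs on the host, but the instance's own pinning
# record is out of sync, so the later unpin targets CPUs the host never
# pinned and fails, leaving the host-side entries behind.
host.pin({0, 25})
try:
    host.unpin({1})
except CPUPinningInvalid as exc:
    print(exc)   # pinned set [0, 25] is now leaked

# Run 2: new pins stack on top of the leaked ones and the cycle repeats,
# which is why the reported pinned set keeps growing across tempest runs.
host.pin({8, 9})
try:
    host.unpin({24})
except CPUPinningInvalid as exc:
    print(exc)   # pinned set has "snowballed" to [0, 8, 9, 25]
```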
Changed in nova:
status: New → In Progress
assignee: nobody → Stephen Finucane (sfinucan)
description: updated
description: updated
description: updated
summary: Shelve/unshelve fails for pinned instance → Resizing a pinned VM leaves system in inconsistent state
summary: Resizing a pinned VM leaves system in inconsistent state → Resizing a pinned VM leaves Nova in inconsistent state
summary: Resizing a pinned VM leaves Nova in inconsistent state → Resizing a pinned VM results in inconsistent state
description: updated
description: updated

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → John Garbutt (johngarbutt)

Changed in nova:
assignee: John Garbutt (johngarbutt) → Stephen Finucane (stephenfinucane)
So just by looking at the code, I can see that we probably want to call resource_tracker.update_usage() in the shelve_offload method so that we immediately unpin CPUs when the instance is offloaded, rather than waiting for the RT periodic task, which in the case of a speedy tempest run likely never fires.
This _might_ be the cause of the stack trace in the cleanup delete when, for example, the spawn fails during unshelving.
Here's what would happen:
https://github.com/openstack/nova/blob/7616c88ad3a2769e9c9ee8a51ac55ddeed0bfd84/nova/compute/manager.py#L4368
When unshelving a shelve-offloaded instance, we would first run the claim_instance step, which does a new claim against the current state of the NumaTopology of the compute host (keep in mind that we never dropped the usage when we shelve-offloaded, so we are actually leaking resources here). A successful claim here means the instance and compute node are updated with the new pinning information (and we leak resources on the compute node until the next RT periodic update).
Now suppose the spawn on the next line fails for whatever reason, which may or may not be related to a CPU pinning bug. This calls the __exit__ method of the claim, which in turn calls the abort() method of the claim, unpinning the CPUs that were pinned during the claim (see: https://github.com/openstack/nova/blob/7616c88ad3a2769e9c9ee8a51ac55ddeed0bfd84/nova/compute/claims.py#L121).
This will unpin the CPUs as tracked by the host NumaTopology, but it will not clear the mapping to host CPUs in the Instance object.
Finally, the test cleanup attempts to delete the instance, which tries to unpin the already-unpinned CPUs and fails.
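A rough sketch of that asymmetry follows. The `Claim` class and the two sets below are invented stand-ins for this illustration, not Nova's actual objects; the point is only that abort() rolls back the host-side bookkeeping while the instance keeps its (now stale) CPU mapping:

```python
class Claim:
    """Toy claim context manager (illustration only)."""

    def __init__(self, host_pinned, instance_pinning, cpus):
        self.host_pinned = host_pinned            # host NUMA cell bookkeeping
        self.instance_pinning = instance_pinning  # instance's own CPU mapping
        self.cpus = cpus

    def __enter__(self):
        # instance_claim: pin on the host and record the mapping on the instance
        self.host_pinned |= self.cpus
        self.instance_pinning |= self.cpus
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.abort()
        return False

    def abort(self):
        # The behaviour being described: only the host-side bookkeeping is
        # rolled back; the instance keeps its stale mapping to host CPUs.
        self.host_pinned -= self.cpus


host_pinned, instance_pinning = set(), set()

try:
    with Claim(host_pinned, instance_pinning, {0, 1}):
        raise RuntimeError("spawn failed during unshelve")
except RuntimeError:
    pass

# Test cleanup now deletes the instance, which tries to unpin the CPUs the
# instance still claims to hold -- but the host no longer tracks them.
stale = instance_pinning - host_pinned
print("delete would try to unpin", sorted(stale), "-> CPUPinningInvalid")
```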
If the above makes sense, I think we need to do two things:
* Make sure offloading an instance immediately updates the resource_tracker, in the same manner that deleting it does.
* Make sure that aborting the claim clears both the host field and the NUMA information of the Instance (host is problematic here, as it is why the delete request ends up RPCed to the host instead of being handled locally in the API, even though the instance is clearly not there since the claim failed and is being aborted).
I propose we start there and see if it fixes things.
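Staying with the toy model from the sketches above (none of these names are Nova's real API; in Nova the changes would touch the resource tracker's usage-update path and the Instance's host/NUMA fields), the two proposed fixes amount to:

```python
def shelve_offload(host_pinned, instance_pinning):
    # Fix 1: drop the instance's usage from the host bookkeeping immediately
    # on offload, instead of waiting for the RT periodic update.
    host_pinned.difference_update(instance_pinning)


def abort_claim(host_pinned, instance_pinning, cpus):
    # Fix 2: roll back *both* sides -- the host bookkeeping and the
    # instance's own CPU mapping -- so a later delete has nothing stale
    # left to unpin.
    host_pinned.difference_update(cpus)
    instance_pinning.difference_update(cpus)  # the part the current abort() skips


host_pinned, instance_pinning = {0, 1}, {0, 1}
abort_claim(host_pinned, instance_pinning, {0, 1})
assert host_pinned == set() and instance_pinning == set()
```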
PS: it is worth noting (if it was not clear from the text) that the above bugs mostly impact shelve functionality; nova spawn clears these things as part of the retry process and so is not affected.