Comment 3 for bug 1714924

Matt Riedemann (mriedem) wrote:

Looking at jobs where this shows up in CI:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20clean%20allocation%20of%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

For example:

http://logs.openstack.org/26/577926/1/check/legacy-grenade-dsvm-neutron-multinode-live-migration/13d3684/logs/new/screen-n-cpu.txt#_2018-06-25_21_31_21_929

It only shows up in stable/pike live migration + grenade jobs, and it is likely due to a race: the resource tracker runs, sees that there is an Ocata compute in the deployment, and therefore auto-heals allocations, removing the allocations from the other host:

http://logs.openstack.org/26/577926/1/check/legacy-grenade-dsvm-neutron-multinode-live-migration/13d3684/logs/new/screen-n-cpu.txt#_2018-06-25_21_31_21_201

2018-06-25 21:31:21.201 21212 DEBUG nova.compute.resource_tracker [req-e64b2af6-e08b-49c6-b9c5-82d6b9f2454a tempest-LiveMigrationRemoteConsolesV26Test-1813726258 tempest-LiveMigrationRemoteConsolesV26Test-1813726258] We're on a compute host from Nova version >=16 (Pike or later) in a deployment with at least one compute host version <16 (Ocata or earlier). Will auto-correct allocations to handle Ocata-style assumptions. _update_usage_from_instances /opt/stack/new/nova/nova/compute/resource_tracker.py:1204
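To make the race concrete, here is a hypothetical sketch (not Nova's actual code; the function, data shapes, and version constant are illustrative only). During a live migration the instance briefly holds allocations against both the source and destination hosts; if any compute in the deployment reports a service version below 16 (pre-Pike), the Pike resource tracker falls back to Ocata-style "one allocation per instance, owned by the compute" healing and deletes allocations it does not think belong to its host:

```python
# Hypothetical sketch of the Pike-era auto-heal race (illustrative, not Nova code).
PIKE_SERVICE_VERSION = 16  # assumed threshold for "Pike or later" computes


def auto_heal_allocations(my_host, min_compute_version, allocations):
    """Drop allocations for other hosts, Ocata-style.

    allocations maps (instance_id, host) -> resource amount.
    """
    if min_compute_version >= PIKE_SERVICE_VERSION:
        # All computes are Pike+: the scheduler owns allocations, do nothing.
        return allocations
    # At least one pre-Pike compute: assume each instance should only have
    # an allocation on the host tracking it, and delete the rest.
    return {
        (instance, host): amount
        for (instance, host), amount in allocations.items()
        if host == my_host  # silently removes the migration's other-host allocation
    }


# Mid-live-migration: the instance has allocations on both hosts.
allocs = {("inst-1", "src"): 1, ("inst-1", "dst"): 1}

# The destination's resource tracker "heals" and drops the source allocation,
# which the source later fails to clean up.
healed = auto_heal_allocations("dst", 15, allocs)
print(healed)
```

Under this sketch, whichever host runs the healing pass last wins, and the other host's subsequent cleanup of its own allocation fails, matching the "Failed to clean allocation" error above.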

The only time the error shows up in master branch jobs is in actual failures, so it is probably not worth investigating this on master for live migration, since Queens introduced migration-based allocations:

https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html

And in stable/queens jobs we would not have any pre-Pike computes to trigger the auto-heal code, because in Pike the scheduler handles allocations, not the computes/resource tracker.

I'm not sure if this is still an issue for evacuate and restoring an evacuated node.