Intermittent "KeyError: 'allocations'" in functional tests

Bug #1696834 reported by Matt Riedemann
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Chris Dent

Bug Description

Seen here:

http://logs.openstack.org/87/472287/1/gate/gate-nova-tox-functional-py35-ubuntu-xenial/79cb96d/console.html#_2017-06-08_17_42_12_495403

2017-06-08 17:42:12.494766 | b'2017-06-08 17:38:24,561 ERROR [nova.compute.manager] Error updating resources for node fake-mini.'
2017-06-08 17:42:12.494781 | b'Traceback (most recent call last):'
2017-06-08 17:42:12.494821 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/compute/manager.py", line 6594, in update_available_resource_for_node'
2017-06-08 17:42:12.494840 | b' rt.update_available_resource(context, nodename)'
2017-06-08 17:42:12.494881 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/compute/resource_tracker.py", line 626, in update_available_resource'
2017-06-08 17:42:12.494900 | b' self._update_available_resource(context, resources)'
2017-06-08 17:42:12.494946 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/.tox/functional-py35/lib/python3.5/site-packages/oslo_concurrency/lockutils.py", line 271, in inner'
2017-06-08 17:42:12.494960 | b' return f(*args, **kwargs)'
2017-06-08 17:42:12.495012 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/compute/resource_tracker.py", line 667, in _update_available_resource'
2017-06-08 17:42:12.495062 | b' self._update_usage_from_instances(context, instances, nodename)'
2017-06-08 17:42:12.495111 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/compute/resource_tracker.py", line 1047, in _update_usage_from_instances'
2017-06-08 17:42:12.495133 | b' self._remove_deleted_instances_allocations(context, cn)'
2017-06-08 17:42:12.495182 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/compute/resource_tracker.py", line 1055, in _remove_deleted_instances_allocations'
2017-06-08 17:42:12.495194 | b' cn.uuid) or {}'
2017-06-08 17:42:12.495234 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/scheduler/client/__init__.py", line 37, in __run_method'
2017-06-08 17:42:12.495256 | b' return getattr(self.instance, __name)(*args, **kwargs)'
2017-06-08 17:42:12.495294 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/scheduler/client/report.py", line 55, in wrapper'
2017-06-08 17:42:12.495318 | b' return f(self, *a, **k)'
2017-06-08 17:42:12.495362 | b' File "/home/jenkins/workspace/gate-nova-tox-functional-py35-ubuntu-xenial/nova/scheduler/client/report.py", line 914, in get_allocations_for_resource_provider'
2017-06-08 17:42:12.495390 | b" return resp.json()['allocations']"
2017-06-08 17:42:12.495403 | b"KeyError: 'allocations'"

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22KeyError%5C%22%20AND%20message%3A%5C%22allocations%5C%22%20AND%20tags%3A%5C%22console%5C%22%20AND%20project%3A%5C%22openstack%2Fnova%5C%22&from=7d

Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Chris Dent (cdent) wrote :

I spent a bit of time looking into this, and though I couldn't figure out why it was happening, I got closer. Elsewhere in the logs <http://logs.openstack.org/87/472287/1/gate/gate-nova-tox-functional-py35-ubuntu-xenial/79cb96d/console.html#_2017-06-08_17_42_12_498469> we get things like this:

2017-06-08 17:42:12.498469 | b'2017-06-08 17:38:27,167 ERROR [nova.scheduler.client.report] Failed to retrieve filtered list of resource providers from placement API for filters {\'resources\': \'DISK_GB:20,MEMORY_MB:2048,VCPU:1\'}. Got 300: {"choices": [{"id": "v2.0", "status": "SUPPORTED", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.compute+json;version=2"}], "links": [{"rel": "self", "href": "http://127.0.0.1:41392/v2/resource_providers"}]}, {"id": "v2.1", "status": "CURRENT", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.compute+json;version=2.1"}], "links": [{"rel": "self", "href": "http://127.0.0.1:41392/v2.1/resource_providers"}]}]}.'

That's the output from the nova/api/openstack/compute/versions.py Versions.multi method, which a request to the placement service should never see, so somehow things have gotten confused in the fixtures such that requests that are supposed to be destined for the placement fixture are ending up on the compute fixture. You can see from elsewhere in the logs <http://logs.openstack.org/87/472287/1/gate/gate-nova-tox-functional-py35-ubuntu-xenial/79cb96d/console.html#_2017-06-08_17_42_12_488625> that a request gets a 200 and is logged as nova.placement.wsgi.server, but is immediately followed by a 300 logged as nova.osapi_compute.wsgi.server.

wat?

So I'm recording this for posterity in case anyone has some ideas. The wsgi service fixtures aren't something I'm too familiar with. If I had to guess, eventlet is being odd.

Revision history for this message
Chris Dent (cdent) wrote :

Oh, I forgot to mention: the KeyError happens because there is a JSON body (the unexpected version document), but it doesn't contain the 'allocations' key...
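
For illustration, a minimal standalone sketch of that failure mode, using a trimmed copy of the 300 "choices" document from the log above; the defensive .get() lookup at the end is hypothetical and is not the fix nova merged:

import json

# Trimmed version of the 300 "choices" document the placement client
# actually received when the request was misrouted to the compute API.
misrouted_body = json.loads('{"choices": [{"id": "v2.1", "status": "CURRENT"}]}')

# What report.py line 914 effectively did -- this raises KeyError: 'allocations':
#     allocations = misrouted_body['allocations']

# A hypothetical defensive lookup (illustrative only): the body is valid JSON
# but has no 'allocations' key, so .get() returns the default instead of raising.
allocations = misrouted_body.get('allocations', {})
print(allocations)  # {}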

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/483972

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/483972
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d812c7eb3c7f0ce80c3548e9aec10b9fdc650a67
Submitter: Jenkins
Branch: master

commit d812c7eb3c7f0ce80c3548e9aec10b9fdc650a67
Author: Matt Riedemann <email address hidden>
Date: Fri Jul 14 11:40:13 2017 -0400

    Set wsgi.keep_alive=False globally for tests

    This was previously set for all functional API samples
    tests to try and fix a weird issue where requests and responses
    seemed to get mixed up between the compute API wsgi service
    fixture used in the API samples functional tests and the
    Placement API functional tests.

    However, we're also seeing these weird mixup issues in
    non-API samples tests, and there are other wsgi server fixtures
    that should be setting this, like the OSMetadataServer
    fixture. And there are functional tests that don't inherit from
    _IntegratedTestBase so we should just set this globally to be
    safe.

    Related-Bug: #1686109
    Related-Bug: #1696834

    Change-Id: Ie307e4d43be4b3e40dd75384ea6bb77598811713
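
As a rough sketch of what setting wsgi.keep_alive=False "globally" for tests can look like: the option name and group come from the commit title above, but the surrounding test-case class and the explicit option registration are illustrative (nova registers the option itself), so this is not the literal change that merged:

import testtools
from oslo_config import cfg

CONF = cfg.CONF
# Registered here only so the sketch runs standalone; in nova the
# [wsgi]/keep_alive option is already registered via nova.conf.
CONF.register_opts([cfg.BoolOpt('keep_alive', default=True)], group='wsgi')


class TestCase(testtools.TestCase):
    """Illustrative base class applying the override to every test."""

    def setUp(self):
        super(TestCase, self).setUp()
        # Disable HTTP keep-alive on the wsgi server fixtures so a pooled
        # connection cannot be reused against the wrong fixture
        # (compute API vs. placement API).
        CONF.set_override('keep_alive', False, group='wsgi')
        self.addCleanup(CONF.clear_override, 'keep_alive', group='wsgi')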

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is likely fixed by these changes to use wsgi_intercept in our compute API and Placement API fixtures:

https://github.com/openstack/nova/commit/fdf27abf7db233ca51f12e2926d78c272b54935b

https://github.com/openstack/nova/commit/eed6ced78776a3b9a7ada7b0c8ff74eaa376efaf
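
For context, wsgi_intercept hands requests for a chosen host/port directly to a WSGI app in-process instead of going over a real socket, which removes the connection-reuse confusion entirely. A minimal standalone sketch of the idea (not nova's fixture code; the fake app, host name, and exact wsgi_intercept API shown are assumptions and may differ between versions):

import requests
from wsgi_intercept.interceptor import RequestsInterceptor


def fake_placement_app(environ, start_response):
    # Trivial stand-in for the placement API: always answers with an
    # empty allocations document.
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"allocations": {}}']


# RequestsInterceptor takes an app factory; while the context manager is
# active, requests to the intercepted host/port never touch a real socket
# and go straight to the WSGI app.
with RequestsInterceptor(lambda: fake_placement_app,
                         host='placement.example.test', port=80) as url:
    resp = requests.get(url)
    print(resp.json()['allocations'])  # {}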

Changed in nova:
status: Confirmed → Fix Released
assignee: nobody → Chris Dent (cdent)