test_list_migrations_in_flavor_resize_situation fails with NoValidHost - AvailabilityZoneFilter returned 0 hosts

Bug #1675607 reported by Matt Riedemann
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Dan Smith

Bug Description

Seen here:

http://logs.openstack.org/62/449362/1/check/gate-tempest-dsvm-py35-ubuntu-xenial/ec959b4/console.html#_2017-03-24_00_27_47_508109

2017-03-24 00:27:47.508109 | tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_list_migrations_in_flavor_resize_situation[id-1b512062-8093-438e-b47a-37d2f597cd64]
2017-03-24 00:27:47.508149 | ------------------------------------------------------------------------------------------------------------------------------------------------------
2017-03-24 00:27:47.508155 |
2017-03-24 00:27:47.508165 | Captured traceback:
2017-03-24 00:27:47.508175 | ~~~~~~~~~~~~~~~~~~~
2017-03-24 00:27:47.508214 | b'Traceback (most recent call last):'
2017-03-24 00:27:47.508256 | b' File "/opt/stack/new/tempest/tempest/api/compute/admin/test_migrations.py", line 47, in test_list_migrations_in_flavor_resize_situation'
2017-03-24 00:27:47.508277 | b' self.resize_server(server_id, self.flavor_ref_alt)'
2017-03-24 00:27:47.508303 | b' File "/opt/stack/new/tempest/tempest/api/compute/base.py", line 364, in resize_server'
2017-03-24 00:27:47.508327 | b' cls.servers_client.resize_server(server_id, new_flavor_id, **kwargs)'
2017-03-24 00:27:47.508358 | b' File "/opt/stack/new/tempest/tempest/lib/services/compute/servers_client.py", line 279, in resize_server'
2017-03-24 00:27:47.508377 | b" return self.action(server_id, 'resize', **kwargs)"
2017-03-24 00:27:47.508405 | b' File "/opt/stack/new/tempest/tempest/lib/services/compute/servers_client.py", line 191, in action'
2017-03-24 00:27:47.508416 | b' post_body)'
2017-03-24 00:27:47.508442 | b' File "/opt/stack/new/tempest/tempest/lib/common/rest_client.py", line 277, in post'
2017-03-24 00:27:47.508466 | b" return self.request('POST', url, extra_headers, headers, body, chunked)"
2017-03-24 00:27:47.508496 | b' File "/opt/stack/new/tempest/tempest/lib/services/compute/base_compute_client.py", line 48, in request'
2017-03-24 00:27:47.508515 | b' method, url, extra_headers, headers, body, chunked)'
2017-03-24 00:27:47.508541 | b' File "/opt/stack/new/tempest/tempest/lib/common/rest_client.py", line 666, in request'
2017-03-24 00:27:47.508557 | b' self._error_checker(resp, resp_body)'
2017-03-24 00:27:47.508585 | b' File "/opt/stack/new/tempest/tempest/lib/common/rest_client.py", line 777, in _error_checker'
2017-03-24 00:27:47.508605 | b' raise exceptions.BadRequest(resp_body, resp=resp)'
2017-03-24 00:27:47.508622 | b'tempest.lib.exceptions.BadRequest: Bad request'
2017-03-24 00:27:47.508649 | b"Details: {'code': 400, 'message': 'No valid host was found. No valid host found for resize'}"
2017-03-24 00:27:47.508656 | b''

From the scheduler logs:

http://logs.openstack.org/62/449362/1/check/gate-tempest-dsvm-py35-ubuntu-xenial/ec959b4/logs/screen-n-sch.txt.gz#_2017-03-24_00_00_55_499

2017-03-24 00:00:55.498 24173 DEBUG nova.scheduler.filters.availability_zone_filter [req-5060a632-af41-477e-aa2c-bf991703f8db tempest-MigrationsAdminTest-780851299 tempest-MigrationsAdminTest-780851299] Availability Zone 'tempest-test_az-1317288057' requested. (ubuntu-xenial-ovh-bhs1-8052498, ubuntu-xenial-ovh-bhs1-8052498) ram: 6960MB disk: 50176MB io_ops: 5 instances: 5 has AZs: nova host_passes /opt/stack/new/nova/nova/scheduler/filters/availability_zone_filter.py:59
2017-03-24 00:00:55.499 24173 INFO nova.filters [req-5060a632-af41-477e-aa2c-bf991703f8db tempest-MigrationsAdminTest-780851299 tempest-MigrationsAdminTest-780851299] Filter AvailabilityZoneFilter returned 0 hosts
2017-03-24 00:00:55.499 24173 DEBUG nova.filters [req-5060a632-af41-477e-aa2c-bf991703f8db tempest-MigrationsAdminTest-780851299 tempest-MigrationsAdminTest-780851299] Filtering removed all hosts for the request with instance ID 'a0b65d74-e1c4-4c70-ae65-5ee577872919'. Filter results: [('RetryFilter', [('ubuntu-xenial-ovh-bhs1-8052498', 'ubuntu-xenial-ovh-bhs1-8052498')]), ('AvailabilityZoneFilter', None)] get_filtered_objects /opt/stack/new/nova/nova/filters.py:129
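For context, the AZ check the log reports can be sketched roughly like this (illustrative names, not nova's actual internals):

```python
# Minimal sketch of what an availability-zone filter decides, matching the
# log above. host_azs / requested_az are illustrative names, not nova code.
def host_passes(host_azs, requested_az):
    """Return True if the host belongs to the requested availability zone."""
    if requested_az is None:        # no AZ requested: any host passes
        return True
    return requested_az in host_azs

# In the failing run the request carried the concurrent test's AZ, while the
# lone compute only advertised the default 'nova' AZ, so it was filtered out:
assert host_passes({'nova'}, 'tempest-test_az-1317288057') is False
```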

Revision history for this message
Matt Riedemann (mriedem) wrote :

Logstash isn't helping me figure out when this started; if it did, we'd know whether this is a regression.

Revision history for this message
Matt Riedemann (mriedem) wrote :
tags: added: availability-zones scheduler
Changed in nova:
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

The current thinking is that https://github.com/openstack/nova/commit/03b4c67b22f49d325386bc3ebd2ade79b44fa699 might be a contributing factor: it sets an AZ on the instance based on the compute it lands on, whether or not the user requested an AZ.

So what's probably happening: this is a single-compute environment, and a concurrently running test (call it test A) creates an AZ containing the single compute. When test_list_migrations_in_flavor_resize_situation runs, its instance lands on that compute, which is now in test A's AZ. When test A cleans up, it deletes the AZ. When the instance is later resized, it still carries the now-deleted AZ, and the scheduler cannot find any compute in that AZ (because it is gone).
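The suspected sequence can be sketched as a timeline (all names illustrative, not actual tempest/nova code):

```python
# Hedged sketch of the cross-test race described above.
azs = {'nova'}                    # AZs known to the single-node deployment

# Test A creates an aggregate with an AZ containing the only compute.
azs.add('test-a-az')

# The migrations test boots a server; it lands on the compute, and nova
# records the compute's AZ on the instance (the 03b4c67b behavior).
instance_az = 'test-a-az'

# Test A finishes and deletes its aggregate, removing the AZ.
azs.discard('test-a-az')

# Resize re-runs the scheduler with the recorded AZ; no host advertises it,
# so filtering removes every host and the resize fails with NoValidHost.
valid_hosts = [h for h in ['compute1'] if instance_az in azs]
assert valid_hosts == []
```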

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at:

http://logs.openstack.org/62/449362/1/check/gate-tempest-dsvm-py35-ubuntu-xenial/ec959b4/console.html#_2017-03-24_00_00_55_563868

There is an AZ test right before the failed test:

2017-03-24 00:00:55.563868 | {2} tempest.api.compute.admin.test_aggregates.AggregatesAdminTestJSON.test_aggregate_create_update_with_az [0.578035s] ... ok
2017-03-24 00:00:55.673002 | {0} tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_list_migrations_in_flavor_resize_situation [7.590200s] ... FAILED

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/449640

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/449690

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/449640
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8c3aa8df249f02b6e14079f073870f8f0b6816cb
Submitter: Jenkins
Branch: master

commit 8c3aa8df249f02b6e14079f073870f8f0b6816cb
Author: Dan Smith <email address hidden>
Date: Fri Mar 24 07:02:29 2017 -0700

    Remove legacy regeneration of RequestSpec in MigrationTask

    Previously we regenerated the RequestSpec from details in the Instance
    before we had the full original object available to us. That is no
    longer necessary and means we will not honor some of the original
    request. Remove that now.

    Change-Id: I195d389ac59574724a5e7202ba1a17d92c53a676
    Closes-Bug: #1675607
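The idea of the fix, very roughly (not actual nova code): reuse the RequestSpec persisted at boot instead of rebuilding one from the instance's current, host-stamped state:

```python
# Illustrative sketch only; the real change lives in nova's MigrationTask.
def build_resize_spec(original_spec, instance):
    # Post-fix behavior: honor the spec saved at boot time. The pre-fix
    # code regenerated the spec from the instance, picking up the
    # host-assigned AZ even when the user never requested one.
    return original_spec

boot_spec = {'availability_zone': None}        # user requested no AZ
instance = {'availability_zone': 'test-a-az'}  # AZ stamped by the compute
assert build_resize_spec(boot_spec, instance)['availability_zone'] is None
```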

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b1

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/449690
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/571265

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/571265
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a638685c469da18dcf8f4bc4763763d90e50be17
Submitter: Zuul
Branch: master

commit a638685c469da18dcf8f4bc4763763d90e50be17
Author: Matt Riedemann <email address hidden>
Date: Wed May 30 13:40:35 2018 -0400

    Add functional test for AggregateMultiTenancyIsolation + migrate

    A bug was reported against Ocata where a non-admin user
    creates a server and the user's project is isolated to a
    set of hosts via the AggregateMultiTenancyIsolation filter.

    The admin, with a different project, cold migrates the server
    and the filter rejects the request because before change
    I195d389ac59574724a5e7202ba1a17d92c53a676 the cold migrate
    task would re-generate the RequestSpec using the request context
    which was from the admin, not the owner of the instance.

    Even though this is not a problem past Ocata, we did not have
    functional test coverage for this scenario so it is added here.

    This will also be used to backport the fix to Ocata to show
    the regression and fix in that branch.

    Change-Id: I97559607fc720fb98c3543ff3dd6095281752cd4
    Related-Bug: #1774205
    Related-Bug: #1675607
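The isolation check described in this scenario can be sketched as (illustrative, not nova's actual filter code):

```python
# Hedged sketch of the AggregateMultiTenancyIsolation idea referenced above.
def tenant_passes(host_allowed_tenants, request_project_id):
    """Hosts in a tenant-isolated aggregate only accept listed projects."""
    if not host_allowed_tenants:    # host not isolated: anyone may land
        return True
    return request_project_id in host_allowed_tenants

# Pre-fix (Ocata) bug: the regenerated RequestSpec carried the admin's
# project rather than the instance owner's, so the owner's isolated host
# was filtered out during the cold migrate:
assert tenant_passes({'owner-project'}, 'admin-project') is False
assert tenant_passes({'owner-project'}, 'owner-project') is True
```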
