OpenStack Compute (nova)

Bug #1790204
Comment #17

Comment 17 for bug 1790204

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-02-24: Related fix merged to nova (master)

#17

Reviewed: https://review.openstack.org/638700
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4363b10f5b9eaa7be2df36a94b6bbad5f4674c57
Submitter: Zuul
Branch: master

commit 4363b10f5b9eaa7be2df36a94b6bbad5f4674c57
Author: Matt Riedemann <email address hidden>
Date: Fri Feb 22 11:04:14 2019 -0500

Remove misleading code from _move_operation_alloc_request()

    Change I1c9442eed850a3eb7ac9871fafcb0ae93ba8117c in Pike
    made the scheduler "double up" the old_flavor and new_flavor
    allocations when resizing to the same host. If there were
    Ocata computes still in the deployment at the time, which would
    be normal in Pike for rolling upgrades, the ResourceTracker
    would overwrite the instance allocations in placement based on
    the instance.flavor when the update_available_resource periodic
    task would run.

    If that periodic ran after scheduling but before finish_resize(),
    the old_flavor would be used to report allocations. If the periodic
    ran after finish_resize(), the new_flavor would be used to report
    allocations.

    That Ocata-compute auto-heal code was removed in Rocky with change
    I39d93dbf8552605e34b9f146e3613e6af62a1774, but should have effectively
    been vestigial since Queens when nova-compute should be at most N-1 so
    there should be no Ocata compute services.

    This change removes the misleading Pike-era code in the
    _move_operation_alloc_request() which sums the allocations for the
    old and new flavor when resizing to the same host since:

    1. The compute service no longer does what the comment says.
    2. Since Queens, conductor swaps the instance-held allocations
       on the source node to the migration record and the
       _move_operation_alloc_request method is only called if the
       instance has allocations, which it won't during resize. So the
       only time _move_operation_alloc_request is called now is during
       an evacuate because conductor doesn't do the allocation swap in
       that case. And since you can't evacuate to the same host, the
       elif block in _move_operation_alloc_request is dead code.

    Note that change I1c9442eed850a3eb7ac9871fafcb0ae93ba8117c was
    effectively a change in behavior for resize to the same host
    because the scheduler sums the old/new flavor resource allocations
    which could result in a false NoValidHost error when in reality
    the total allocation for the instance is going to be the maximum
    of the resource class allocations between the old and new flavor,
    e.g. if the compute has 8 VCPUs total and the instance is using
    6 VCPUs with the old flavor, and then resized to a new flavor with
    8 VCPUs, the scheduler is trying to "claim" 14 VCPUs when really the
    instance will only use 8 and that causes a NoValidHost error.
    Comparing to the pre-Pike scheduling and ResourceTracker.resize_claim
    behavior for resize to the same host, the scheduler would only filter
    on the new flavor resources and the resize_claim would only claim
    based on the new_flavor resources as well. The update_available_resource
    periodic would account for old_flavor usage in
    _update_usage_from_migration() after the instance was resized and
    before it was confirmed/reverted.

Change-Id: I8c6b6c46b2587ee727653dafadbcb08b99ed7d35
Related-Bug: #1790204

Reviewed:  https://review.openstack.org/638700
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4363b10f5b9eaa7be2df36a94b6bbad5f4674c57
Submitter: Zuul
Branch:    master

commit 4363b10f5b9eaa7be2df36a94b6bbad5f4674c57
Author: Matt Riedemann <mriedem.os@gmail.com>
Date:   Fri Feb 22 11:04:14 2019 -0500

Remove misleading code from _move_operation_alloc_request()
    
    Change I1c9442eed850a3eb7ac9871fafcb0ae93ba8117c in Pike
    made the scheduler "double up" the old_flavor and new_flavor
    allocations when resizing to the same host. If there were
    Ocata computes still in the deployment at the time, which would
    be normal in Pike for rolling upgrades, the ResourceTracker
    would overwrite the instance allocations in placement based on
    the instance.flavor when the update_available_resource periodic
    task would run.
    
    If that periodic ran after scheduling but before finish_resize(),
    the old_flavor would be used to report allocations. If the periodic
    ran after finish_resize(), the new_flavor would be used to report
    allocations.
    
    That Ocata-compute auto-heal code was removed in Rocky with change
    I39d93dbf8552605e34b9f146e3613e6af62a1774, but should have effectively
    been vestigial since Queens when nova-compute should be at most N-1 so
    there should be no Ocata compute services.
    
    This change removes the misleading Pike-era code in the
    _move_operation_alloc_request() which sums the allocations for the
    old and new flavor when resizing to the same host since:
    
    1. The compute service no longer does what the comment says.
    2. Since Queens, conductor swaps the instance-held allocations
       on the source node to the migration record and the
       _move_operation_alloc_request method is only called if the
       instance has allocations, which it won't during resize. So the
       only time _move_operation_alloc_request is called now is during
       an evacuate because conductor doesn't do the allocation swap in
       that case. And since you can't evacuate to the same host, the
       elif block in _move_operation_alloc_request is dead code.
    
    Note that change I1c9442eed850a3eb7ac9871fafcb0ae93ba8117c was
    effectively a change in behavior for resize to the same host
    because the scheduler sums the old/new flavor resource allocations
    which could result in a false NoValidHost error when in reality
    the total allocation for the instance is going to be the maximum
    of the resource class allocations between the old and new flavor,
    e.g. if the compute has 8 VCPUs total and the instance is using
    6 VCPUs with the old flavor, and then resized to a new flavor with
    8 VCPUs, the scheduler is trying to "claim" 14 VCPUs when really the
    instance will only use 8 and that causes a NoValidHost error.
    Comparing to the pre-Pike scheduling and ResourceTracker.resize_claim
    behavior for resize to the same host, the scheduler would only filter
    on the new flavor resources and the resize_claim would only claim
    based on the new_flavor resources as well. The update_available_resource
    periodic would account for old_flavor usage in
    _update_usage_from_migration() after the instance was resized and
    before it was confirmed/reverted.
    
    Change-Id: I8c6b6c46b2587ee727653dafadbcb08b99ed7d35
    Related-Bug: #1790204