OpenStack Compute (nova)

Bug #1781710
Comment 15 for bug 1781710

OpenStack Infra (hudson-openstack) wrote on 2018-07-24: Fix merged to nova (master)

Reviewed: https://review.openstack.org/583347
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c22b53c2481bac518a6b32cdee7b7df23d91251e
Submitter: Zuul
Branch: master

commit c22b53c2481bac518a6b32cdee7b7df23d91251e
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 17 17:43:37 2018 -0400

Update RequestSpec.instance_uuid during scheduling

    Before change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb
    the ServerGroupAntiAffinityFilter did not rely on the
    HostState.instances dict to determine **within the same
    request** how many members of the same anti-affinity
    group were on a given host. In fact, at that time, the
    HostState.instances dict wasn't updated between filter
    runs in the same multi-create request. That was fixed with
    change Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b so that
    as we select a host for each group member being created
    within a single request, we also update the HostState.instances
    dict so the ServerGroupAntiAffinityFilter would have
    accurate tracking of which group members were on which
    hosts.

    However, that did not account for a wrinkle in the filter
    added in change Ie016f59f5b98bb9c70b3e33556bd747f79fc77bd
    which is needed to allow resizing a server to the same host
    when that server is in an anti-affinity group. That wrinkle,
    combined with the fact that the RequestSpec the filter is acting
    upon for a given instance in a multi-create request might
    not actually have that instance's instance_uuid, can cause the
    filter to think it's in a resize situation and accept a host which
    already has another member from the same anti-affinity group
    on it, which breaks the anti-affinity policy.

    For background, during a multi-create request, we create a
    RequestSpec per instance being created, but conductor only
    sends the first RequestSpec for the first instance to the
    scheduler. This means RequestSpec.num_instances can be >1
    and we can be processing the Nth instance in the list during
    scheduling but working on a RequestSpec for the first instance.
    That is what breaks the resize check in ServerGroupAntiAffinityFilter
    with regard to multi-create.
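
To make the failure mode concrete, here is a minimal sketch of the resize check described above. It is not the nova code: HostState and RequestSpec are hypothetical stand-ins that carry only the fields this illustration needs, and anti_affinity_host_passes is an assumed simplification of what the filter does.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    # Hypothetical stand-in: maps instance UUID -> instance on this host.
    name: str
    instances: dict = field(default_factory=dict)


@dataclass
class RequestSpec:
    # Hypothetical stand-in with only the fields used in this sketch.
    instance_uuid: str
    group_members: list


def anti_affinity_host_passes(host, spec):
    # Resize-to-same-host allowance: if the instance being scheduled is
    # already on this host, accept the host.
    if spec.instance_uuid in host.instances:
        return True
    # Otherwise reject any host that already holds a member of the group.
    return not (set(host.instances) & set(spec.group_members))


# host1 already received the first group member earlier in the same request.
host1 = HostState("host1", instances={"uuid-1": object()})

# Scheduling the second instance, but the spec still names the first one.
stale_spec = RequestSpec("uuid-1", ["uuid-1", "uuid-2"])
# The same check with the spec pointing at the instance being scheduled.
correct_spec = RequestSpec("uuid-2", ["uuid-1", "uuid-2"])

stale_result = anti_affinity_host_passes(host1, stale_spec)
correct_result = anti_affinity_host_passes(host1, correct_spec)
```

With the stale spec the check mistakes the second instance for a resize of the first and accepts host1. With the correct instance_uuid the host is rejected because it already holds a group member.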

    To resolve this, we update the RequestSpec.instance_uuid when
    filtering hosts for a given instance but we don't persist that
    change to the RequestSpec.
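
The approach can be sketched as a per-instance loop, again under stated assumptions. The classes and select_hosts below are hypothetical, and the sketch models "don't persist" by building a throwaway copy of the spec with dataclasses.replace, which is a modeling choice for this illustration rather than the mechanism nova itself uses.

```python
from dataclasses import dataclass, field, replace


@dataclass
class HostState:
    # Hypothetical stand-in: maps instance UUID -> instance on this host.
    name: str
    instances: dict = field(default_factory=dict)


@dataclass(frozen=True)
class RequestSpec:
    # Hypothetical stand-in; frozen so the original cannot be mutated.
    instance_uuid: str
    group_members: tuple


def host_passes(host, spec):
    # Simplified anti-affinity check, including the resize allowance.
    if spec.instance_uuid in host.instances:
        return True
    return not (set(host.instances) & set(spec.group_members))


def select_hosts(spec, instance_uuids, hosts):
    placements = {}
    for uuid in instance_uuids:
        # Per-instance view of the spec; the original is never modified.
        per_instance_spec = replace(spec, instance_uuid=uuid)
        candidates = [h for h in hosts if host_passes(h, per_instance_spec)]
        if not candidates:
            raise RuntimeError("no valid host for instance %s" % uuid)
        chosen = candidates[0]
        # Record the placement so later iterations see it on the host.
        chosen.instances[uuid] = object()
        placements[uuid] = chosen.name
    return placements


hosts = [HostState("host1"), HostState("host2")]
# Only the first instance's spec reaches the scheduler in a multi-create.
spec = RequestSpec(instance_uuid="uuid-1", group_members=("uuid-1", "uuid-2"))
placements = select_hosts(spec, ["uuid-1", "uuid-2"], hosts)
```

In this sketch the two group members land on different hosts, and the spec that was passed in still carries the first instance's UUID afterwards.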

    This is a bit clunky, but it kind of comes with the territory of
    how we hack scheduling requests together using a single RequestSpec
    for multi-create requests. An alternative to this is found in change
    I0dd1fa5a70ac169efd509a50b5d69ee5deb8deb7 where we rely on the
    RequestSpec.num_instances field to determine if we're in a multi-create
    situation, but that has its own flaws because num_instances is
    persisted with the RequestSpec which might cause us to re-introduce
    bug 1558532 if we're not careful. In that case we'd have to either
    (1) stop persisting RequestSpec.num_instances or (2) temporarily
    unset it like we do using RequestSpec.reset_forced_destinations()
    during move operations.

Co-Authored-By: Sean Mooney <email address hidden>

Closes-Bug: #1781710

Change-Id: Icba22060cb379ebd5e906981ec283667350b8c5a
