Placement APIs with missing conflict detection

Bug #1746373 reported by Eric Fried
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Eric Fried

Bug Description

Placement has a few APIs which affect resource provider generation, but which do not accept the current resource provider generation and therefore cannot ensure consistency. They are as follows:

DELETE /resource_providers/{u}/inventories
DELETE /resource_providers/{u}/traits

POST /allocations

PUT /allocations/{c}
DELETE /allocations/{c}

As an example of how this is broken:

- X wants to remove all of provider {u}'s inventory in resource class VGPU. He GETs inventory at generation 1, which happens to contain *only* VGPU.
- Y wants to add some SRIOV_NET_VF inventory to {u}. He GETs inventory at generation 1, adds the SRIOV_NET_VF inventory, and PUTs it back. The server increments the generation to 2, and the provider now has both VGPU and SRIOV_NET_VF inventory.
- X, thinking the provider only has VGPU resource, invokes DELETE /resource_providers/{u}/inventories, which succeeds, thereby blowing away the SRIOV_NET_VF inventory.

Note that in the case of DELETE /resource_providers/{u}/inventories and .../traits, there is an alternative, PUT <empty>, which *does* accept the provider generation.

For the allocations APIs, there is no alternative. Though it could be argued that it should not be necessary to send generation with the allocations APIs.

Tags: placement
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/539712

Matt Riedemann (mriedem)
Changed in nova:
status: New → In Progress
assignee: nobody → Eric Fried (efried)
Eric Fried (efried)
Changed in nova:
status: In Progress → New
Revision history for this message
Eric Fried (efried) wrote :

cdent and edleafe convinced me [1] that we don't need generation handling for allocations in Nova.

As for the DELETE APIs, we're fine as long as the consumer has *some* generation-supporting mechanism by which to perform removals (in the case of inventories & traits, PUT with {}). Any consumer that knows it can be one of many threads managing a single provider just needs to be aware of this caveat and behave accordingly.

So perhaps it would be prudent to enhance the documentation for those DELETE APIs to mention this point.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-02-05.log.html#t2018-02-05T15:42:37

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/540919

Changed in nova:
status: New → In Progress
Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → Critical
importance: Critical → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/540919
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f3b0cf39750f455f33d2e259690ce9050cd05236
Submitter: Zuul
Branch: master

commit f3b0cf39750f455f33d2e259690ce9050cd05236
Author: Eric Fried <email address hidden>
Date: Mon Feb 5 10:27:24 2018 -0600

    placement doc: Conflict caveat for DELETE APIs

    Since the DELETE /resource_providers/{u}/inventories and .../traits APIs
    don't have a way to accept generation, they're not "threadsafe" in the
    sense of multiple client threads managing traits/inventories for the
    same provider. This change adds a note to the documentation for these
    APIs to this effect, suggesting the use of PUT with empty
    traits/inventories instead.

    Change-Id: Icfd79cc1f5a912131845a22b4fe900147b19f934
    Closes-Bug: #1746373

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc1

This issue was fixed in the openstack/nova 17.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/539712
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26c593c91f008caab92ed52156dfe2d898955d3f
Submitter: Zuul
Branch: master

commit 26c593c91f008caab92ed52156dfe2d898955d3f
Author: Eric Fried <email address hidden>
Date: Tue Jan 30 21:53:52 2018 -0600

    Avoid inventory DELETE API (no conflict detection)

    SchedulerReportClient._delete_inventory used the DELETE
    /resource_providers/{u}/inventories API, which does not provide a way to
    send the generation down (see bug 1746373), and is therefore subject to
    concurrency errors.

    This change set removes the _delete_inventory method and uses PUT
    /resource_providers/{u}/inventories with an empty 'inventories' dict
    instead, because that API *does* take the generation in the payload.

    Doing this makes moot part of bug 1746075 by removing one of the code
    paths that exploits undocumented knowledge of how placement uses the
    generation counter.

    Change-Id: Ia8131b32debd56b28315ef430ee4e8888b6f56e7
    Closes-Bug: #1746374
    Related-Bug: #1746373
    Related-Bug: #1746075

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.