remove devices before min-part-hours requires --force

Bug #1558754 reported by clayg
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Medium
Jethro Sun

Bug Description

This script fails:

cd /tmp
rm test.builder
swift-ring-builder test.builder create 10 3 1
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/sdb 1
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/sdc 1
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/sdd 1
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/sde 1
swift-ring-builder test.builder rebalance
swift-ring-builder test.builder remove d0
swift-ring-builder test.builder rebalance

The error is:

No partitions could be reassigned.
The time between rebalances must be at least min_part_hours: 1 hours (0:59:59 remaining)

Which seems contrary to the statement from the removal:

d0r1z1-127.0.0.1:6000R127.0.0.1:6000/sdb_"" marked for removal and will be removed next rebalance.

You can work around the issue, and get second rebalance that fails to remove the device with the --force option.

There seems to be a pre-rebalance check in the cli code, which seems to circumvent expected behavior (regression). At a minimum we could probably detect the builder has devices marked for removal and allow the rebalance to proceed. But it'd be better to leave this kind of validation in the Builder code and not have pre-rebalance checks like this in the cli code - i'm sure existing tests for the builder would have detected this sort of mistake but not all scenarios like this get hit through the rinbuilder cli tests.

clayg (clay-gerrard)
description: updated
Jethro Sun (shwsun)
Changed in swift:
assignee: nobody → Jethro Sun (shwsun)
Revision history for this message
Jethro Sun (shwsun) wrote :

I just reproduced the bug. Working on it

Revision history for this message
Jethro Sun (shwsun) wrote :

I am a newbie so please point out anything that is not following the requirement. Basically I followed the suggestion and modified the swift/cli/ringbuilder.py and swift/common/ring/builder.py to fix the bug.

I already tested it on my SAIO and it is fine.

Jethro Sun (shwsun)
Changed in swift:
status: New → Fix Released
Revision history for this message
Thiago da Silva (thiagodasilva) wrote :

Jethro, thanks for your patch. In openstack, patches are always sent to gerrit for code review. You can learn more about the development process here: http://docs.openstack.org/infra/manual/developers.html

Revision history for this message
Jethro Sun (shwsun) wrote :

Thanks for the info, I believe now the patch is correctly submitted.

Revision history for this message
clayg (clay-gerrard) wrote :

> But it'd be better to leave this kind of validation in the Builder code and
> not have pre-rebalance checks like this in the cli code - i'm sure existing
> tests for the builder would have detected this sort of mistake but not all
> scenarios like this get hit through the rinbuilder cli tests.

This is still broken, there's other situations besides device removal that might make it *look* like rebalance isn't going to do anything - but if we would just let it *try* it would work as expected without the --force flag.

Remove the check in the cli code, and expose the error by detecting if

1) no parts have moved
2) the min part hours looks locked (same check as we have already sans the bit about the remove_devices)

If we just let the Builder do it's job this bug would be fixed.

Changed in swift:
status: Fix Released → Confirmed
Revision history for this message
clayg (clay-gerrard) wrote :
Changed in swift:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/326967
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=47901ea4d7fe5ffbbb826d561ad4a02dc4ea8f4a
Submitter: Jenkins
Branch: master

commit 47901ea4d7fe5ffbbb826d561ad4a02dc4ea8f4a
Author: Christian Schwede <email address hidden>
Date: Wed Jun 8 09:20:26 2016 +0000

    Rebalance with min_part_seconds_left > 0

    As described in bug #1558754 a rebalance after removing a device fails
    if min_part_seconds_left > 0, despite the note that a rebalance should
    remove partitions from removed devices on the next run.

    This patch skips the early exit, and lets the builder itself handling
    the rebalance. Partitions that shouldn't move now are still not moved
    (except for removed devices), and there is still a warning if no
    partition has been moved due to the fact that min_part_hours did not yet
    pass.

    A small test has been added to ensure rebalancing after removing a
    device works without using the --force option (tests fails on current
    master). Another test ensures that a rebalance after a recent change
    (for example increasing a device's weight) does not move partitions and
    still reports the former warning message.

    Closes-Bug: 1558754
    Change-Id: I083022d066338cbe6234bab491c7a8e8e0a7b517

Changed in swift:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/433567

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/newton)

Reviewed: https://review.openstack.org/433567
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=9c8c7e8a55d30a84407a7c55ec83292c2b836df6
Submitter: Jenkins
Branch: stable/newton

commit 9c8c7e8a55d30a84407a7c55ec83292c2b836df6
Author: Christian Schwede <email address hidden>
Date: Wed Jun 8 09:20:26 2016 +0000

    Rebalance with min_part_seconds_left > 0

    As described in bug #1558754 a rebalance after removing a device fails
    if min_part_seconds_left > 0, despite the note that a rebalance should
    remove partitions from removed devices on the next run.

    This patch skips the early exit, and lets the builder itself handling
    the rebalance. Partitions that shouldn't move now are still not moved
    (except for removed devices), and there is still a warning if no
    partition has been moved due to the fact that min_part_hours did not yet
    pass.

    A small test has been added to ensure rebalancing after removing a
    device works without using the --force option (tests fails on current
    master). Another test ensures that a rebalance after a recent change
    (for example increasing a device's weight) does not move partitions and
    still reports the former warning message.

    Closes-Bug: 1558754
    Change-Id: I083022d066338cbe6234bab491c7a8e8e0a7b517

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.2

This issue was fixed in the openstack/swift 2.10.2 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.