min_part_hours should not be larger than 255

Bug #1605301 reported by Cheng Li
Affects: OpenStack Object Storage (swift)
Status: In Progress
Importance: Medium
Assigned to: Cheng Li
Milestone: (none)

Bug Description

    def _update_last_part_moves(self):
        """
        Updates how many hours ago each partition was moved based on the
        current time. The builder won't move a partition that has been moved
        more recently than min_part_hours.
        """
        elapsed_hours = int(time() - self._last_part_moves_epoch) / 3600
        if elapsed_hours <= 0:
            return
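        # note: self._last_part_moves is an array('B'), one unsigned byte
        # per partition, which is why the sum below is capped at 0xff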
        for part in range(self.parts):
            # The "min(self._last_part_moves[part] + elapsed_hours, 0xff)"
            # which was here showed up in profiling, so it got inlined.
            last_plus_elapsed = self._last_part_moves[part] + elapsed_hours
            if last_plus_elapsed < 0xff:
                self._last_part_moves[part] = last_plus_elapsed
            else:
                self._last_part_moves[part] = 0xff
        self._last_part_moves_epoch = int(time())

This function keeps self._last_part_moves[part] always <= 0xff, so
if min_part_hours is larger than 0xff, rebalance will never work.
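
A minimal sketch (toy partition count and elapsed time; not swift's actual gather code) shows why nothing ever becomes eligible once min_part_hours exceeds the one-byte cap:

    from array import array

    min_part_hours = 300                    # the bogus value in question
    last_part_moves = array('B', [0] * 8)   # toy ring: 8 partitions

    def update_last_part_moves(elapsed_hours):
        # mirrors the capping behavior quoted above
        for part in range(len(last_part_moves)):
            last_part_moves[part] = min(
                last_part_moves[part] + elapsed_hours, 0xff)

    update_last_part_moves(10000)  # pretend over a year passes

    # the gather step only considers parts whose counter has reached
    # min_part_hours
    movable = [p for p in range(8)
               if last_part_moves[p] >= min_part_hours]
    print(movable)  # [] -- 255 < 300, so no part is ever eligible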

Cheng Li (shcli)
Changed in swift:
assignee: nobody → Cheng Li (shcli)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/345524

Changed in swift:
status: New → In Progress
clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
Revision history for this message
clayg (clay-gerrard) wrote :

I can confirm we do a crap job validating that value, and it's potentially not what you would expect if it's bogus.

Ideally we can do more checks in the RingBuilder code itself - preventing you from even creating new rings with bogus values, or setting one too high via the set_min_part_hours command - so that even if you're using the RingBuilder instance directly (not via the CLI utility) you still get the same protection.
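
A hypothetical guard (names are illustrative; this is not the proposed patch) could be shared by create, set_min_part_hours, and rebalance:

    MAX_MIN_PART_HOURS = 0xff  # last_part_moves is stored in one byte

    def validate_min_part_hours(hours):
        # reject negative values and values the one-byte move counter
        # can never reach
        if not 0 <= hours <= MAX_MIN_PART_HOURS:
            raise ValueError(
                'min_part_hours must be between 0 and %d, not %r'
                % (MAX_MIN_PART_HOURS, hours))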

I will say that in some rare cases, because of other issues in a cluster, I have set min_part_hours to 99999 to *prevent* any additional unneeded rebalance going on in a ring. Like if you want to fail a device w/o having any *other* parts get moved off. I think that *use case* is worth considering - although it could be serviced by other means (a --dont-move-parts-you-dont-have-to option to rebalance? or maybe a set_min_part_hours "infinity"? or ... w/e)

Honestly I had *assumed* that after a decade or so it *would* actually start moving parts again, but of course I always change it back to a sane value before that ;)

Revision history for this message
Cheng Li (shcli) wrote :

Not to my knowledge; min_part_hours affects nothing except rebalance.
My scenario is:
1. People can create a ring with a bogus min_part_hours; this will not affect anything.
   But they can't rebalance with this bogus min_part_hours.
   If they try, we give them a message telling them to set a sane value.

2. People can set min_part_hours to a bogus value to prevent future rebalance.

Revision history for this message
clayg (clay-gerrard) wrote :

Well, the use case I was describing was not setting min_part_hours to a bogus value so that you *can not* rebalance a ring - rather setting it really high to prevent moving parts that don't *need* to move.

There are some types of _gather_parts_* that *ignore* min_part_hours - so even with min_part_hours set to 9999 you *can* still successfully gather and place parts with a rebalance command, even when they've moved more recently than min_part_hours, e.g. when a device has failed!
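
Schematically (a sketch of that distinction, not swift's actual gather code):

    def can_gather(part, on_removed_device, last_part_moves, min_part_hours):
        # parts on failed/removed devices skip the time gate entirely
        if on_removed_device:
            return True
        return last_part_moves[part] >= min_part_hours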

I think it's a use case worth considering. But I don't buy that we should let you set min_part_hours to something that would prevent the rebalance command from even running - we should just not let you set that min_part_hours. If OTOH we *are* going to let the rebalance continue, then a super high min_part_hours has a known (but unfortunately arcane and undocumented) utility!

Negative min_part_hours is never useful - and we should prevent it via set_min_part_hours same as in create - we could check it *again* in rebalance to deal with legacy rings. Little to argue about there I think.

But *only* checking min_part_hours in rebalance, i.e. letting the user set a bad value that will later prevent a rebalance seems actively user-hostile.

Revision history for this message
clayg (clay-gerrard) wrote :

An easy way to understand this behavior is to create a ring with min_part_hours of 300, add some devices and rebalance, then add another device, run pretend_min_part_hours_passed, and rebalance again.

... nothing moves

With min_part_hours > 255 nothing will ever get gathered if it requires a last_part_moves check, no matter how far you are from the epoch apparently.

This is probably bad; it's a confusing semantic. Unfortunately I've leveraged this in the past to minimize unneeded partition movement during a gradual capacity increase that gets interrupted by a failed device:

    #!/bin/bash
    rm forever.builder || true
    swift-ring-builder forever.builder create 8 3 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d0 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d1 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d2 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d3 10
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d4 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d5 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d6 1
    echo "*** no parts can get reassigned!"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    echo "*** ... time passes"
    swift-ring-builder forever.builder pretend_min_part_hours_passed
    echo "*** gradually parts move"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    swift-ring-builder forever.builder set_weight d4 10
    swift-ring-builder forever.builder set_weight d5 10
    swift-ring-builder forever.builder set_weight d6 10
    echo "*** now it's un-balanced"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    echo "*** ... time passes"
    swift-ring-builder forever.builder pretend_min_part_hours_passed
    echo "*** and a device *fails!*"
    swift-ring-builder forever.builder remove d2
    cp forever.builder fast.builder
    echo "*** here can we do a little trick to prevent unneeded movement"
    swift-ring-builder forever.builder set_min_part_hours 300
    echo "*** ... and failed deivce parts reassign no matter what!"
    swift-ring-builder forever.builder rebalance | grep -i Reassigned
    echo "*** compared to not doing the trick"
    swift-ring-builder forever.builder set_min_part_hours 254  # back to a sane value
    swift-ring-builder fast.builder rebalance | grep -i Reassigned

In this scenario it's not unreasonable I think to want to ask rebalance to reassign *only* the parts it *has* to for the failed device. With the plan of waiting for that replication cycle to finish before continuing the normal gradual capacity adjustment and rebalance cycle.

It's possible this use-case is just an example of https://xkcd.com/1172/ ;)

But I think there's a subtle mistake in the bug report. It says "if min_part_hours is larger than 0xff, rebalance will never work" - but that's *only* true for some types of gather parts - the common ones people deal with when "playing" with rings. But in the real world there's...

clayg (clay-gerrard)
tags: added: fixed-in-gerrit
removed: min-part-hours