min_part_hours should not be larger than 255

Bug #1605301 reported by Cheng Li
Affects: OpenStack Object Storage (swift)
Status: In Progress
Importance: Medium
Assigned to: Cheng Li
Milestone: (none)

Bug Description

    def _update_last_part_moves(self):
        """
        Updates how many hours ago each partition was moved based on the
        current time. The builder won't move a partition that has been moved
        more recently than min_part_hours.
        """
        elapsed_hours = int(time() - self._last_part_moves_epoch) / 3600
        if elapsed_hours <= 0:
            return
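        # note: self._last_part_moves is an array('B'), one unsigned byte
        # per partition, which is why the sum below is capped at 0xff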
        for part in range(self.parts):
            # The "min(self._last_part_moves[part] + elapsed_hours, 0xff)"
            # which was here showed up in profiling, so it got inlined.
            last_plus_elapsed = self._last_part_moves[part] + elapsed_hours
            if last_plus_elapsed < 0xff:
                self._last_part_moves[part] = last_plus_elapsed
            else:
                self._last_part_moves[part] = 0xff
        self._last_part_moves_epoch = int(time())

This function keeps self._last_part_moves[part] always <= 0xff, so
if min_part_hours is larger than 0xff, rebalance will never work.
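
A minimal sketch (toy partition count and elapsed time; not swift's actual gather code) shows why nothing ever becomes eligible once min_part_hours exceeds the one-byte cap:

    from array import array

    min_part_hours = 300                    # the bogus value in question
    last_part_moves = array('B', [0] * 8)   # toy ring: 8 partitions

    def update_last_part_moves(elapsed_hours):
        # mirrors the capping behavior quoted above
        for part in range(len(last_part_moves)):
            last_part_moves[part] = min(
                last_part_moves[part] + elapsed_hours, 0xff)

    update_last_part_moves(10000)  # pretend over a year passes

    # the gather step only considers parts whose counter has reached
    # min_part_hours
    movable = [p for p in range(8)
               if last_part_moves[p] >= min_part_hours]
    print(movable)  # [] -- 255 < 300, so no part is ever eligible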

Cheng Li (shcli)
Changed in swift:
assignee: nobody → Cheng Li (shcli)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/345524

Changed in swift:
status: New → In Progress
clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
Revision history for this message
clayg (clay-gerrard) wrote :

I can confirm we do a crap job validating that value, and it's potentially not what you would expect if it's bogus.

Ideally we can do more checks in the RingBuilder code itself - preventing you from even creating new rings with bogus values, or setting one too high via the set_min_part_hours command - so that even if you're using the RingBuilder instance directly (not via the CLI utility) you still get the same protection.
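
A hypothetical guard (names are illustrative; this is not the proposed patch) could be shared by create, set_min_part_hours, and rebalance:

    MAX_MIN_PART_HOURS = 0xff  # last_part_moves is stored in one byte

    def validate_min_part_hours(hours):
        # reject negative values and values the one-byte move counter
        # can never reach
        if not 0 <= hours <= MAX_MIN_PART_HOURS:
            raise ValueError(
                'min_part_hours must be between 0 and %d, not %r'
                % (MAX_MIN_PART_HOURS, hours))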

I will say that in some rare cases, because of other issues in a cluster, I have set min_part_hours to 99999 to *prevent* any additional unneeded rebalance going on in a ring. Like if you want to fail a device w/o having any *other* parts get moved off. I think that *use case* is worth considering - although it could be serviced by other means (a --dont-move-parts-you-dont-have-to option to rebalance? or maybe a set_min_part_hours "infinity"? or ... w/e)

Honestly I had *assumed* that after a decade or so it *would* actually start moving parts again, but of course I always change it back to a sane value before that ;)

Revision history for this message
Cheng Li (shcli) wrote :

Not to my knowledge; min_part_hours affects nothing except rebalance.
My scenario is:
1. People can create a ring with a bogus min_part_hours; this will not affect anything.
   But they can't rebalance with this bogus min_part_hours.
   If they try, we give them a message telling them to set a sane value.

2. People can set min_part_hours to a bogus value to prevent future rebalance.

Revision history for this message
clayg (clay-gerrard) wrote :

Well, the use case I was describing was not setting min_part_hours to a bogus value so that you *can not* rebalance a ring - rather setting it really high to prevent moving parts that don't *need* to move.

There are some types of _gather_parts_* that *ignore* min_part_hours - so even with min_part_hours set to 9999 you *can* still successfully gather and place parts with a rebalance command, even when they've moved more recently than min_part_hours, e.g. when a device has failed!
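
Schematically (a sketch of that distinction, not swift's actual gather code):

    def can_gather(part, on_removed_device, last_part_moves, min_part_hours):
        # parts on failed/removed devices skip the time gate entirely
        if on_removed_device:
            return True
        return last_part_moves[part] >= min_part_hours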

I think it's a use case worth considering. But I don't buy that we should let you set min_part_hours to something that would prevent the rebalance command from even running - we should just not let you set that min_part_hours. If OTOH we *are* going to let the rebalance continue, then a super high min_part_hours has a known (but unfortunately arcane and undocumented) utility!

Negative min_part_hours is never useful - and we should prevent it via set_min_part_hours same as in create - we could check it *again* in rebalance to deal with legacy rings. Little to argue about there I think.

But *only* checking min_part_hours in rebalance, i.e. letting the user set a bad value that will later prevent a rebalance seems actively user-hostile.

Revision history for this message
clayg (clay-gerrard) wrote :

An easy way to understand this behavior is to create a ring with min_part_hours of 300, add some devices and rebalance, then add another device, run pretend_min_part_hours_passed, and rebalance again.

... nothing moves

With min_part_hours > 255 nothing will ever get gathered if it requires a last_part_moves check, no matter how far you are from the epoch apparently.

This is probably bad; it's a confusing semantic. Unfortunately I've leveraged this in the past to minimize unneeded partition movement during a gradual capacity increase that gets interrupted by a failed device:

    #!/bin/bash
    rm forever.builder || true
    swift-ring-builder forever.builder create 8 3 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d0 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d1 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d2 10
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d3 10
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d4 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d5 1
    swift-ring-builder forever.builder add r1z1-127.0.0.1:6001/d6 1
    echo "*** no parts can get reassigned!"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    echo "*** ... time passes"
    swift-ring-builder forever.builder pretend_min_part_hours_passed
    echo "*** gradually parts move"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    swift-ring-builder forever.builder set_weight d4 10
    swift-ring-builder forever.builder set_weight d5 10
    swift-ring-builder forever.builder set_weight d6 10
    echo "*** now it's un-balanced"
    swift-ring-builder forever.builder rebalance
    swift-ring-builder forever.builder
    echo "*** ... time passes"
    swift-ring-builder forever.builder pretend_min_part_hours_passed
    echo "*** and a device *fails!*"
    swift-ring-builder forever.builder remove d2
    cp forever.builder fast.builder
    echo "*** here can we do a little trick to prevent unneeded movement"
    swift-ring-builder forever.builder set_min_part_hours 300
    echo "*** ... and failed deivce parts reassign no matter what!"
    swift-ring-builder forever.builder rebalance | grep -i Reassigned
    echo "*** compared to not doing the trick"
    swift-ring-builder forever.builder set_min_part_hours 254  # back to a sane value
    swift-ring-builder fast.builder rebalance | grep -i Reassigned

In this scenario it's not unreasonable I think to want to ask rebalance to reassign *only* the parts it *has* to for the failed device. With the plan of waiting for that replication cycle to finish before continuing the normal gradual capacity adjustment and rebalance cycle.

It's possible this use-case is just an example of https://xkcd.com/1172/ ;)

But I think there's a subtle mistake in the bug report. It says "if min_part_hours is larger than 0xff, rebalance will never work" - but that's *only* true for some types of gather parts - the common ones people deal with when "playing" with rings. But in the real world there's...

clayg (clay-gerrard)
tags: added: fixed-in-gerrit
removed: min-part-hours