some rings won't rebalance

Bug #1699636 reported by clayg
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

The problem is documented here:

https://github.com/openstack/swift/blob/2cb74b1b843eef3894e61ed3480ad07d4b16a06a/test/unit/common/ring/test_builder.py#L963

I saw it again recently and it looks like this:

https://gist.github.com/charz/5d9d9fb62d531eebef03bb98b5fec6cb

I kicked up some better rebalance --debug logging to make it easier to see:

https://review.openstack.org/#/c/476301/1

Basically you see a bunch of this:

DEBUG: Gathered 5814/0 from dev r3z5-127.0.0.58/d737 [weight disperse]
DEBUG: Gathered 5855/1 from dev r3z5-127.0.0.60/d898 [weight disperse]
DEBUG: Gathered 5916/0 from dev r3z5-127.0.0.60/d913 [weight disperse]
DEBUG: Gathered 5956/1 from dev r3z5-127.0.0.57/d638 [weight disperse]
DEBUG: Gathered 5962/2 from dev r3z5-127.0.0.57/d655 [weight disperse]
DEBUG: Gathered 5971/0 from dev r3z5-127.0.0.58/d713 [weight disperse]
DEBUG: Gathered 5983/0 from dev r3z5-127.0.0.60/d915 [weight disperse]
DEBUG: Gathered 5987/0 from dev r3z5-127.0.0.60/d891 [weight disperse]
DEBUG: Gathered 5988/1 from dev r3z5-127.0.0.57/d653 [weight disperse]
DEBUG: Gathered 6029/1 from dev r3z5-127.0.0.58/d719 [weight disperse]
DEBUG: Gathered 6041/0 from dev r3z5-127.0.0.57/d642 [weight disperse]

Followed by this:

DEBUG: Placed 5586/2 onto dev r3z5-127.0.0.57/d655
DEBUG: Placed 5597/0 onto dev r3z5-127.0.0.58/d707
DEBUG: Placed 5752/0 onto dev r3z5-127.0.0.60/d889
DEBUG: Placed 4970/0 onto dev r3z5-127.0.0.59/d777
DEBUG: Placed 5916/0 onto dev r3z5-127.0.0.57/d639
DEBUG: Placed 5472/0 onto dev r3z5-127.0.0.58/d711
DEBUG: Placed 5358/0 onto dev r3z5-127.0.0.60/d916
DEBUG: Placed 5962/2 onto dev r3z5-127.0.0.59/d800
DEBUG: Placed 5056/0 onto dev r3z5-127.0.0.57/d648
DEBUG: Placed 5235/2 onto dev r3z5-127.0.0.58/d726

but meanwhile this isn't the zone that needs the parts!

r1z1 49152 0.00 1 16384 49152 0 0
r1z2 49152 0.00 1 16384 49152 0 0
r3z5 49321 0.00 1 16215 49321 0 0
r3z6 48983 0.00 1 16553 48983 0 0

IIRC the issue is that by weight we want to pull parts out of r3z5, and we tend to take the ones that are over-represented in r3 (two replicas). When we go to set them down, we see both regions have one copy (we're holding the third), but r3z6 is hungry, so we head back into r3. Only we can't put the part on r3z6, so we land back on r3z5 rather than putting extra parts in r1.

The problem is in the implementation phase, not the planning phase - we know each server should hold ~0.75 replicanths - but we can't seem to notice that the parts we want to move from r3z5 need to swap places with other parts in r3z6. I think I had some idea about overloading gather so that we pick up some extra % of parts... but I think there are only a few hundred out of ~50K parts that *could* move from r3z5 to r3z6, and we find a few every time we rebalance - but we have to move thousands of parts to do it. NOT GREAT!
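
The planning math above can be sketched in a few lines. This is a minimal illustration, not Swift's actual RingBuilder code; the zone weights are assumed equal, and the partition count is the 2**16 implied by the report's per-zone totals:

```python
# Minimal sketch (not Swift's RingBuilder) of the planning math above:
# a zone's target share of replicas ("replicanths") is its weight
# fraction times the replica count.
replicas = 3
zone_weights = {"r1z1": 1.0, "r1z2": 1.0, "r3z5": 1.0, "r3z6": 1.0}

total = sum(zone_weights.values())
target_replicanths = {z: replicas * w / total for z, w in zone_weights.items()}
# With four equal-weight zones, every zone wants 0.75 replicanths.

# Against 2**16 = 65536 partitions, that target is 49152 parts per
# zone; compare the actual per-zone counts from the table above:
parts = 2 ** 16
actual = {"r1z1": 49152, "r1z2": 49152, "r3z5": 49321, "r3z6": 48983}
for zone, count in actual.items():
    surplus = count - target_replicanths[zone] * parts
    print(zone, surplus)  # r3z5 is +169 parts, r3z6 is -169
```

So the planner knows exactly which zone is over and which is under; the trouble described above is purely in getting the gather/place machinery to execute that swap.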

Tim Burke (1-tim-z)
tags: added: ring
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/503152
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Submitter: Jenkins
Branch: master

commit 23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Author: Kota Tsuyuzaki <email address hidden>
Date: Fri Jun 30 02:03:48 2017 -0700

    Accept a trade off of dispersion for balance

    ... but only if we *have* to!

    During the initial gather for balance we prefer to avoid replicas on
    over-weight devices that are already under-represented in any of their
    tiers (i.e. if a zone has to have at least one replica, but may have
    as many as two, don't take the only one). Instead, by going for
    replicas on over-weight devices that are at the limits of their
    dispersion, we hope to have a better-than-even chance of finding a
    better place for them during placement!

    This normally works out - and especially so for rings which can
    disperse and balance. But for existing rings where we'd have to
    sacrifice dispersion to improve balance, the existing optimistic
    gather will end up refusing to trade dispersion for balance - and
    instead gets stuck without solving either!

    You should always be able to solve for *either* dispersion or balance.
    But if you can't solve *both* - we bail out on our optimistic gather
    much more quickly and instead just focus on improving balance. With
    this change, the ring can get into balanced (and un-dispersed) states
    much more quickly!

    Change-Id: I17ac627f94f64211afaccad15596a9fcab2fada2
    Related-Change-Id: Ie6e2d116b65938edac29efa6171e2470bb3e8e12
    Closes-Bug: 1699636
    Closes-Bug: 1701472
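
The fallback the commit describes boils down to a tiny decision rule. A conceptual sketch only; the names are illustrative, not Swift's actual builder API:

```python
def rebalance_step(moves_preserving_dispersion, moves_improving_balance):
    """Toy decision rule from the commit message above: prefer moves
    that keep dispersion; when none exist, trade dispersion for
    balance rather than getting stuck solving neither.
    (Illustrative names, not Swift's actual builder API.)"""
    if moves_preserving_dispersion:
        return "disperse+balance", moves_preserving_dispersion
    return "balance-only", moves_improving_balance

# A "stuck" ring: every balance-improving move would hurt dispersion,
# so pre-fix the builder refused to move anything; post-fix it falls
# back to balance-only moves.
mode, moves = rebalance_step([], [("part 5916", "r3z5 -> r3z6")])
print(mode)
```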

Changed in swift:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/508700

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)

Reviewed: https://review.openstack.org/508700
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0c75ddf6fe5a4843fe60836b402f27cb1b83d8c5
Submitter: Zuul
Branch: feature/deep

commit 93fc9d2de86f37f62b1d6768600d0551e1b72fb6
Author: Alistair Coles <email address hidden>
Date: Wed Sep 27 16:35:27 2017 +0100

    Add cautionary note re delay_reaping in account-server.conf-sample

    Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c
    Related-Bug: #1514528

commit c6aea4b3730c937c41815831a7b4d60ff2899fcb
Author: Tim Burke <email address hidden>
Date: Wed Sep 27 19:19:53 2017 +0000

    Fix intermittent failure in test_x_delete_after

    X-Delete-After: 1 is known to be flaky; use 2 instead.

    When the proxy receives an X-Delete-After header, it automatically
    converts it to an X-Delete-At header based on the current time. So far,
    so good. But in normalize_delete_at_timestamp we convert our

        time.time() + int(req.headers['X-Delete-After'])

    to a string representation of an integer and in the process always round
    *down*. As a result, we lose up to a second worth of object validity,
    meaning the object server can (rarely) respond 400, complaining that the
    X-Delete-At is in the past.

    Change-Id: Ib5e5a48f5cbed0eade8ba3bca96b26c82a9f9d84
    Related-Change: I643be9af8f054f33897dd74071027a739eaa2c5c
    Related-Change: I10d3b9fcbefff3c415a92fa284a1ea1eda458581
    Related-Change: Ifdb1920e5266aaa278baa0759fc0bfaa1aff2d0d
    Related-Bug: #1597520
    Closes-Bug: #1699114
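
The truncation that commit describes is easy to reproduce. A minimal sketch with hypothetical timestamps (the fractional "now" is made up for illustration):

```python
# Hypothetical timestamps reproducing the truncation described above:
# the proxy stores X-Delete-At as an integer, always rounding *down*.
now = 1000.9                  # request arrives nine-tenths into a second
delete_after = 1              # client header: X-Delete-After: 1
delete_at = int(now + delete_after)

# The object should be valid until 1001.9, but the stored timestamp
# expires it at 1001 - up to a second of validity is lost, which is
# how the object server can (rarely) 400 an X-Delete-After: 1 request.
lost = (now + delete_after) - delete_at
print(delete_at, round(lost, 1))
```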

commit 5c76b9e691166acc1f7b8483aaa3980ebc70bd3a
Author: Alistair Coles <email address hidden>
Date: Wed Sep 27 14:11:14 2017 +0100

    Add concurrent_gets to proxy.conf man page

    Change-Id: Iab1beff4899d096936c0e5915f3ec32364b3e517
    Closes-Bug: #1559347

commit b4f08b6090057897ac647ba6331a4ec867b8e3b8
Author: Jens Harbott <email address hidden>
Date: Wed Sep 27 09:10:54 2017 +0000

    Fix functest for IPv6 endpoints

    Currently the functional tests fail if the storage_url contains a quoted
    IPv6 address because we try to split on ':'.

    But actually we don't need to split hostname and port only in order
    to combine them back together later on. Use the standard urlparse()
    function instead and work with the 'netloc' part of the URL, which
    keeps hostname and port together.

    Change-Id: I64589e5f2d6fb3cebc6768dc9e4de6264c09cbeb
    Partial-Bug: 1656329
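
The failure mode and the fix are easy to see side by side. A small sketch with a made-up IPv6 storage URL:

```python
from urllib.parse import urlparse

# Hypothetical storage_url with a bracketed IPv6 address:
storage_url = "http://[fe80::dead:beef]:8080/v1/AUTH_test"

# The naive approach - splitting on ':' - chops the address to pieces:
scheme, rest = storage_url.split("://", 1)
print(rest.split(":"))

# urlparse keeps hostname and port together in netloc, as the fix does:
parsed = urlparse(storage_url)
print(parsed.netloc)  # '[fe80::dead:beef]:8080'
print(parsed.path)    # '/v1/AUTH_test'
```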

commit 53ab6f2907eff2bb90528010d881f2f87ee02505
Author: Alistair Coles <email address hidden>
Date: Tue Sep 26 11:43:53 2017 +0100

    Assert memcached connection error is logged

    Follow up to [1] - change logger mocking so that we can
    assert the memcached connection error is logged.

    [1] Related-Change: I97c5420b4b4ecc127e9e94e9d0f91fbe92a5f623

    Change-Id: I87cf4245082c5e0f0705c2c14ddfc0b5d5d89c06

commit e501ac7d2be5c11b2ed0005885c84023054ec041
Author: Matthew Oliver <email address hidden>
Date: Thu Sep 3 12:19:05 2015 +1000

    Fix memcached exception out of range stacktrace

    When a memcached server goes offline in the middle of a
    MemcacheRing...


tags: added: in-feature-deep
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/512277

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/512283

OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/s3api)

Change abandoned by Alistair Coles (<email address hidden>) on branch: feature/s3api
Review: https://review.openstack.org/512283
Reason: I was just trying to get sensible topic

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)

Reviewed: https://review.openstack.org/512277
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f94d6567a7e2e8b3ca1168b4a41c42c1ee371af5
Submitter: Zuul
Branch: feature/s3api

commit 24188beb81d39790034fa0902246163a7bf54c91
Author: Samuel Merritt <email address hidden>
Date: Thu Oct 12 16:13:25 2017 -0700

    Remove some leftover threadpool cruft.

    Change-Id: I43a1a428bd96a2e18aac334c03743a9f94f7d3e1

commit 1d67485c0b935719e0c8999eb353dfd84713add6
Author: Samuel Merritt <email address hidden>
Date: Fri Apr 15 12:43:44 2016 -0700

    Move all monkey patching to one function

    Change-Id: I2db2e53c50bcfa17f08a136581cfd7ac4958ada2

commit 407f5394f0f5cb422c06b4e5b2f9fbfdb07782d1
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Oct 12 08:12:38 2017 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

    Change-Id: I628cb09aa78d8e339b4762a3c9ed8aed43941261

commit 45ca39fc68cdb42b382c1638a92cc8d3cec5529a
Author: Clay Gerrard <email address hidden>
Date: Tue Oct 10 11:47:50 2017 -0700

    add mangle_client_paths to example config

    Change-Id: Ic1126fc95e8152025fccf25356c253facce3e3ec

commit 94bac4ab2fe65104d602378e8e49c37b8187a75d
Author: Tim Burke <email address hidden>
Date: Fri May 12 10:55:21 2017 -0400

    domain_remap: stop mangling client-provided paths

    The root_path option for domain_remap seems to serve two purposes:
     - provide the first component (version) for the backend request
     - be an optional leading component for the client request, which
       should be stripped off

    As a result, we have mappings like:

       c.a.example.com/v1/o -> /v1/AUTH_a/c/o

    instead of

       c.a.example.com/v1/o -> /v1/AUTH_a/c/v1/o

    which is rather bizarre. Why on earth did we *ever* start doing this?

    Now, this second behavior is managed by a config option
    (mangle_client_paths) with the default being to disable it.

    Upgrade Consideration
    =====================

    If for some reason you *do* want to drop some parts of the
    client-supplied path, add

       mangle_client_paths = True

    to the [filter:domain_remap] section of your proxy-server.conf. Do this
    before upgrading to avoid any loss of availability.

    UpgradeImpact
    Change-Id: I87944bfbf8b767e1fc36dbc7910305fa1f11eeed

commit a4a5494fd2fe8a43a5d50a21a1951266cc7c4212
Author: Alistair Coles <email address hidden>
Date: Mon Oct 9 11:33:28 2017 +0100

    test account autocreate listing format

    Related-Change: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
    Change-Id: I50c22225bbebff71600bea9158bda1edd18b48b0

commit 8b7f15223cde4c19fd9cbbd97e8ad79a1b4afa8d
Author: Alistair Coles <email address hidden>
Date: Mon Oct 9 10:06:19 2017 +0100

    Add example to container-sync-realms.conf.5 man page

    Related-Change: I0760ce149e6d74f2b3f1badebac3e36da1ab7e77

    Change-Id: I129de42f91d7924c7bcb9952f17fe8a1a10ae219

commit 816331155c624c444ed123bcab412...

tags: added: in-feature-s3api
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.16.0

This issue was fixed in the openstack/swift 2.16.0 release.
