Reconstructor should not hash suffixes after failure

Bug #1665141 reported by clayg
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Medium
Unassigned

Bug Description

During a big rebalance, the SSYNC replication server can suffer a lot from eventlet hub starvation (i.e. a blocking stat/write in one coro can cause a timeout in another) - especially when there is no IO limits per device.

The SSYNC protocol *is* limited per device by default (replication_one_per_device) but the REPLICATE suffix hashing requests are *unlimited*.

Even when running in handoffs_first - any reconstructor ssync request (even ones that get rejected for concurrency) will trigger a rehash [1]

In the happy path, this unbounded IO is optimistic at best; at worst its maybe only marginally helpful - but unlike rsync; ssync invalidates hashes as it goes - so it is *not* required. In the error path, it is actively harmful committing valuable IOps to zero-value work and activly contributing to the very IO conention that is causing the problem. It there a word for a positive feedback loop with a negative outcome? Downward death spiral?

Fix is trivial - attached.

1. https://github.com/openstack/swift/blob/cdd72dd34f879c7340d4c379db955ea82db3cfe2/swift/obj/reconstructor.py#L668

Revision history for this message
clayg (clay-gerrard) wrote :
clayg (clay-gerrard)
Changed in swift:
importance: Undecided → High
clayg (clay-gerrard)
summary: - Reconstructor should not sync after failure
+ Reconstructor should not hash suffixes after failure
Revision history for this message
clayg (clay-gerrard) wrote :
clayg (clay-gerrard)
Changed in swift:
importance: High → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/435152
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a0fcca1e0576a0dba7a61f05c86aba23d6ddd27f
Submitter: Jenkins
Branch: master

commit a0fcca1e0576a0dba7a61f05c86aba23d6ddd27f
Author: Clay Gerrard <email address hidden>
Date: Thu Feb 16 14:14:09 2017 -0800

    Do not sync suffixes when remote rejects reconstructor revert

    SSYNC is designed to limit concurrent incoming connections in order to
    prevent IO contention. The reconstructor should expect remote
    replication servers to fail ssync_sender when the remote is too busy.
    When the remote rejects SSYNC - it should avoid forcing additional IO
    against the remote with a REPLICATE request which causes suffix
    rehashing.

    Suffix rehashing via REPLICATE verbs takes two forms:

    1) a initial pre-flight call to REPLICATE /dev/part will cause a remote
    primary to rehash any invalid suffixes and return a map for the local
    sender to compare so that a sync can be performed on any mis-matched
    suffixes.

    2) a final call to REPLICATE /dev/part/suf1-suf2-suf3[-sufX[...]] will
    cause the remote primary to rehash the *given* suffixes even if they are
    *not* invalid. This is a requirement for rsync replication because
    after a suffix is synced via rsync the contents of a suffix dir will
    likely have changed and the remote server needs to update it hashes.pkl
    to reflect the new data.

    SSYNC does not *need* to send a post-sync REPLICATE request. Any
    suffixes that are modified by the SSYNC protocol will call _finalize_put
    under the hood as it is syncing. It is however not harmful and
    potentially useful to go ahead refresh hashes after an SSYNC while the
    inodes of those suffixes are warm in the cache.

    However, that only makes sense if the SSYNC conversation actually synced
    any suffixes - if SSYNC is rejected for concurrency before it ever got
    started there is no value in the remote performing a rehash. It may be
    that *another* reconstructor is pushing data into that same partition
    and the suffixes will become immediately invalidated.

    If a ssync_sender does not successful finish a sync the reconstructor
    should skip the REPLICATE call entirely and move on to the next
    partition without causing any useless remote IO.

    Closes-Bug: #1665141

    Change-Id: Ia72c407247e4525ef071a1728750850807ae8231

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/464980

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/464982

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/ocata)

Reviewed: https://review.openstack.org/464980
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e127f2277c4436a97f5b2d74307a31af2c98297f
Submitter: Jenkins
Branch: stable/ocata

commit e127f2277c4436a97f5b2d74307a31af2c98297f
Author: Clay Gerrard <email address hidden>
Date: Thu Feb 16 14:14:09 2017 -0800

    Do not sync suffixes when remote rejects reconstructor revert

    SSYNC is designed to limit concurrent incoming connections in order to
    prevent IO contention. The reconstructor should expect remote
    replication servers to fail ssync_sender when the remote is too busy.
    When the remote rejects SSYNC - it should avoid forcing additional IO
    against the remote with a REPLICATE request which causes suffix
    rehashing.

    Suffix rehashing via REPLICATE verbs takes two forms:

    1) a initial pre-flight call to REPLICATE /dev/part will cause a remote
    primary to rehash any invalid suffixes and return a map for the local
    sender to compare so that a sync can be performed on any mis-matched
    suffixes.

    2) a final call to REPLICATE /dev/part/suf1-suf2-suf3[-sufX[...]] will
    cause the remote primary to rehash the *given* suffixes even if they are
    *not* invalid. This is a requirement for rsync replication because
    after a suffix is synced via rsync the contents of a suffix dir will
    likely have changed and the remote server needs to update it hashes.pkl
    to reflect the new data.

    SSYNC does not *need* to send a post-sync REPLICATE request. Any
    suffixes that are modified by the SSYNC protocol will call _finalize_put
    under the hood as it is syncing. It is however not harmful and
    potentially useful to go ahead refresh hashes after an SSYNC while the
    inodes of those suffixes are warm in the cache.

    However, that only makes sense if the SSYNC conversation actually synced
    any suffixes - if SSYNC is rejected for concurrency before it ever got
    started there is no value in the remote performing a rehash. It may be
    that *another* reconstructor is pushing data into that same partition
    and the suffixes will become immediately invalidated.

    If a ssync_sender does not successful finish a sync the reconstructor
    should skip the REPLICATE call entirely and move on to the next
    partition without causing any useless remote IO.

    Closes-Bug: #1665141

    Change-Id: Ia72c407247e4525ef071a1728750850807ae8231

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.1

This issue was fixed in the openstack/swift 2.13.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (stable/newton)

Change abandoned by John Dickinson (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/464982
Reason: This backport depends on a feature that was landed after newton, so we're not going to backport this to newton.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.15.0

This issue was fixed in the openstack/swift 2.15.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/662735

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/662735
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=37fa12cd83849a3ae8374ff07861d1d710d53174
Submitter: Zuul
Branch: master

commit 37fa12cd83849a3ae8374ff07861d1d710d53174
Author: Kuan-Lin Chen <email address hidden>
Date: Mon Jun 3 18:39:51 2019 +0800

    Do not sync suffixes when remote rejects reconstructor sync

    The commit a0fcca1e makes reconstructor not sync suffixes when remote
    reject reconstructor revert. However, the exact same logic should
    be applied to SYNC job as well. REPLICATE requests aren't generally
    needed when using SSYC (which the reconstructor always does).

    If a ssync_sender fails to finish a sync the reconstructor should skip
    the REPLICATE call entirely and move on to the next partition without
    causing any useless remote IO.

    Change-Id: Ida50539e645ea7e2950ba668c7f031a8d10da787
    Closes-Bug: #1665141

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/losf)

Fix proposed to branch: feature/losf
Review: https://review.opendev.org/665170

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/losf)
Download full text (3.5 KiB)

Reviewed: https://review.opendev.org/665170
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=b1ad1bcec95dbf898764b61d063490a36ae75c29
Submitter: Zuul
Branch: feature/losf

commit aa2f1db1b71c1b2bf746b72515e3efd15598b6aa
Author: Tim Burke <email address hidden>
Date: Tue Jun 11 14:50:49 2019 -0700

    Ensure get_*_info keys are native strings

    Change-Id: I29bbea48ae38cfabf449a9f4cca1f5f27769405a

commit b7b92b97b12f2a5c0e1beed59b6ffd4791cec896
Author: Tim Burke <email address hidden>
Date: Mon Jun 10 15:40:58 2019 -0700

    Bump up minimum cryptography version

    ...not because we strictly *need* newer cryptography, but rather because
    distro packages have moved forward to the point where the 1.x series
    won't compile from source and PyPI doesn't have wheels for them.

    See changes like:

    - https://github.com/pyca/cryptography/commit/6e7ea2e
    - https://github.com/pyca/cryptography/commit/f88aea5

    Change-Id: I1ff5b61873cf382c7a89873ed4ba6153f299262a

commit dca658103a63d212bdf9195fcde6038557c13401
Author: Clay Gerrard <email address hidden>
Date: Thu Jun 6 14:25:22 2019 -0500

    Fix swift with python <2.7.9

    Closes-Bug: #1831932

    Change-Id: I0d33864f4bffa401082548ee9a52f6eb50cb1f39

commit d9cafca246bb15e706d9f7546e1f4bedda1b6c8b
Author: Tim Burke <email address hidden>
Date: Tue May 21 18:04:05 2019 -0700

    py3: port ssync

    Change-Id: I63a502be13f5dcda2a457d38f2fc5f1ca469d562

commit 98637dc1e7a6ef5641079a6226d12bf106436b35
Author: 翟小君 <email address hidden>
Date: Wed Jun 5 12:35:00 2019 +0800

    Bump openstackdocstheme to 1.30.0

    ...to pick up many improvements, including the return of table borders.

    Change-Id: I166211b690b08521171b489582fa419d756b1972

commit 37fa12cd83849a3ae8374ff07861d1d710d53174
Author: Kuan-Lin Chen <email address hidden>
Date: Mon Jun 3 18:39:51 2019 +0800

    Do not sync suffixes when remote rejects reconstructor sync

    The commit a0fcca1e makes reconstructor not sync suffixes when remote
    reject reconstructor revert. However, the exact same logic should
    be applied to SYNC job as well. REPLICATE requests aren't generally
    needed when using SSYC (which the reconstructor always does).

    If a ssync_sender fails to finish a sync the reconstructor should skip
    the REPLICATE call entirely and move on to the next partition without
    causing any useless remote IO.

    Change-Id: Ida50539e645ea7e2950ba668c7f031a8d10da787
    Closes-Bug: #1665141

commit 2e35376c6d6afb5aa2a36081861bab011c8c95c3
Author: Tim Burke <email address hidden>
Date: Thu May 30 11:55:58 2019 -0700

    py3: symlink follow-up

    - Have the unit tests use WSGI strings, like a real system.
    - Port the func tests.

    Change-Id: I3a6f409208de45ebf9f55f7f59e4fe6ac6fbe163

commit 82e446a8a0c0fd6a81f06717b76ed3d1be26a281
Author: Tim Burke <email address hidden>
Date: Mon May 20 11:44:21 2019 -0700

    s3api: Allow clients to upload with UNSIGNED-PAYLOAD

    (Some versions of?) awscli/boto3 will do v4 signatures but send a
    Content-MD5 for end-to-end validation. Sin...

Read more...

tags: added: in-feature-losf
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.22.0

This issue was fixed in the openstack/swift 2.22.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.