expiring objects warning if replica > 3

Bug #1733588 reported by Christopher Bartz
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Christopher Bartz

Bug Description

When using a storage policy whose underlying object ring has a replica count > 3, together with a container ring whose replica count is 3, the object server logs the following warning:

Nov 20 12:20:04 server5 object-server: X-Delete-At-Container header must be specified for expiring objects background PUT to work properly. Making best guess as to the container name for now. (txn: txc70lshsaklhdalkhlkhlkhl)

on (object replica count - container replica count) object servers. The cause lies in lines https://github.com/openstack/swift/blob/master/swift/proxy/controllers/obj.py#L441-L443 and https://github.com/openstack/swift/blob/master/swift/proxy/controllers/obj.py#L302-L312 : if the container ring has n replicas, only n sets of headers containing x-delete-at-container are generated. If the object ring has more replicas than that (k > n), k - n backend requests lack the header, and therefore k - n object servers log the warning.
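
The mismatch can be illustrated with a small sketch. This is not the proxy's code: `assign_delete_at_headers`, the node dictionaries and the host addresses are hypothetical, and the function only models the behaviour described above, where one set of headers is produced per container replica.

```python
def assign_delete_at_headers(n_object_requests, delete_at_nodes, container_name):
    """Model the reported behaviour: one header set per container replica."""
    headers = [{} for _ in range(n_object_requests)]
    for i, node in enumerate(delete_at_nodes):
        # Only len(delete_at_nodes) backend requests ever receive the headers.
        target = headers[i % n_object_requests]
        target["X-Delete-At-Container"] = container_name
        target["X-Delete-At-Host"] = node["host"]
    return headers


# Hypothetical 4+2 EC policy (k = 6 backend requests), n = 3 container replicas.
nodes = [{"host": "10.0.0.%d:6201" % i} for i in range(1, 4)]
headers = assign_delete_at_headers(6, nodes, "1510185559")
missing = [i for i, h in enumerate(headers) if "X-Delete-At-Container" not in h]
print(len(missing))  # k - n = 3 requests go out without the header
```

Each request listed in `missing` corresponds to an object server that would log the warning.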

The bug is critical because the code at https://github.com/openstack/swift/blob/master/swift/obj/server.py#L398-L399 notes that the warning will be replaced by an exception in the future.

description: updated
Revision history for this message
Alistair Coles (alistair-coles) wrote :

This is straightforward to reproduce in a dev SAIO by putting an object to a 4+2 EC policy:

```
swift@vm-15:~/swift$ tail -f /var/log/swift/storage4.error
Nov 9 10:07:38 vm-15 object-server: STDERR: 127.0.0.1 - - [09/Nov/2017 10:07:38] "HEAD /sdb4/30/AUTH_test/c1/LICENSE HTTP/1.1" 404 141 0.000864 (txn: tx82fc8e48cd5a4e848d339-005a0428ea)
Nov 9 10:07:38 vm-15 object-server: STDERR: (23538) accepted ('127.0.0.1', 59694)
Nov 9 10:07:38 vm-15 container-server: STDERR: (23519) accepted ('127.0.0.1', 43880)
Nov 9 10:07:38 vm-15 container-server: STDERR: 127.0.0.1 - - [09/Nov/2017 10:07:38] "PUT /sdb8/509/AUTH_test/c1/LICENSE HTTP/1.1" 201 120 0.000587 (txn: tx20d4a7a566f940d4849cb-005a0428ea)
Nov 9 10:07:38 vm-15 container-server: STDERR: (23519) accepted ('127.0.0.1', 43882)
Nov 9 10:07:38 vm-15 container-server: STDERR: 127.0.0.1 - - [09/Nov/2017 10:07:38] "PUT /sdb4/734/.expiring_objects/1510185559/1510222068-AUTH_test/c1/LICENSE HTTP/1.1" 201 120 0.004376 (txn: tx20d4a7a566f940d4849cb-005a0428ea)
Nov 9 10:07:38 vm-15 object-server: X-Delete-At-Container header must be specified for expiring objects background PUT to work properly. Making best guess as to the container name for now. (txn: tx20d4a7a566f940d4849cb-005a0428ea)
Nov 9 10:07:38 vm-15 object-server: STDERR: 127.0.0.1 - - [09/Nov/2017 10:07:38] "PUT /sdb8/30/AUTH_test/c1/LICENSE HTTP/1.1" 201 181 0.055915 (txn: tx20d4a7a566f940d4849cb-005a0428ea)
Nov 9 10:07:38 vm-15 container-server: STDERR: (23519) accepted ('127.0.0.1', 43894)
Nov 9 10:07:38 vm-15 container-server: STDERR: 127.0.0.1 - - [09/Nov/2017 10:07:38] "PUT /sdb8/509/AUTH_test/c1/LICENSE HTTP/1.1" 201 120 0.001048 (txn: tx20d4a7a566f940d4849cb-005a0428ea)

```

Changed in swift:
status: New → Confirmed
importance: Undecided → Medium
Alistair Coles (alistair-coles) wrote :

The warning log was introduced in [1] (back in 2013). Reading the commentary on that review, it seems to have been intended to warn about a transient version mismatch when proxies had not been upgraded to the same version as the object servers. That is, in [1] object servers started to expect X-Delete-At-Container to be sent by the proxy rather than calculating the container name themselves; if an older proxy server does not send the header, the object server falls back to calculating the container name locally and logs the warning.

It's now unlikely that the proxy vs object server version mismatch exists, but the warning manifests as reported above when the number of backend object requests is greater than the number of container replicas, which is likely to occur with many EC policies. So it is just an annoying warning that has no significance.

Possible solutions:
1. drop the warning altogether
2. make the warning conditional on the other delete-at headers such as X-Delete-At-Partition being present (which would be the case if the proxy was older). AFAICT the object server only expects X-Delete-At-Container when X-Delete-At-Partition and X-Delete-At-Host are also present.

I prefer solution #2

[1] https://review.openstack.org/#/c/31584/
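
Solution 2 above could be sketched roughly as follows. The function name `should_warn_about_missing_container` is hypothetical and this is not the object server's code; it only encodes the condition described in the list, namely that the warning is relevant only when the other delete-at headers arrived without X-Delete-At-Container.

```python
def should_warn_about_missing_container(headers):
    """Return True only when the request looks like it came from an older proxy.

    An older proxy would send X-Delete-At-Partition and X-Delete-At-Host
    but not X-Delete-At-Container.
    """
    has_routing = ("X-Delete-At-Partition" in headers
                   and "X-Delete-At-Host" in headers)
    return has_routing and "X-Delete-At-Container" not in headers


# Request from an older proxy: routing headers present, container missing.
old_proxy = {"X-Delete-At-Partition": "734", "X-Delete-At-Host": "127.0.0.1:6041"}
# Request that was simply not assigned any delete-at headers.
no_update = {}

print(should_warn_about_missing_container(old_proxy))
print(should_warn_about_missing_container(no_update))
```

With this condition, the extra backend requests of an EC policy, which carry none of the delete-at headers, would stay silent.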

Alistair Coles (alistair-coles) wrote :

What I missed on my earlier reading of the object server code is that when there is no X-Delete-At-Partition, an async update is still written for the expiring objects account. So these warnings are not completely insignificant: they indicate that those async updates will be written [1] using a delete_at_container calculated from the object server's expiring_objects_container_divisor value, which is precisely what the original bug fix [2] was trying to avoid.

So a better solution might be to ensure all backend object requests actually have the x-delete-at-* headers. That will result in some duplicated updates, but the duplication is already happening async, and some duplication is necessary to avoid a bug similar to [3][4].

[1] https://gist.github.com/alistairncoles/19cab1495654c9c0e02f7222c5d1ce3e
[2] https://review.openstack.org/#/c/31584/
[3] https://bugs.launchpad.net/swift/+bug/1460920
[4] https://github.com/openstack/swift/commit/3f943cfcf2de26e51f1ace96f2c28a36ab105887
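
To show why a locally calculated container name is risky, here is a simplified sketch. It assumes the name is the delete-at time rounded down to a multiple of the divisor, minus a small offset derived from a hash of the object path; `guess_expirer_container` is a hypothetical name, and the SHA-256 hash is a stand-in rather than the hash Swift uses.

```python
import hashlib


def guess_expirer_container(delete_at, divisor, account, container, obj):
    """Derive a queue container name from the delete-at time (simplified)."""
    path = "/%s/%s/%s" % (account, container, obj)
    # Assumption: a stand-in hash; a real cluster would use its own path hash.
    offset = int(hashlib.sha256(path.encode("utf-8")).hexdigest(), 16) % 100
    # Round down to a multiple of the divisor, then spread by the offset.
    return int(delete_at) // divisor * divisor - offset


# Two object servers configured with different divisor values.
name_a = guess_expirer_container(1510222068, 86400, "AUTH_test", "c1", "LICENSE")
name_b = guess_expirer_container(1510222068, 3600, "AUTH_test", "c1", "LICENSE")
print(name_a != name_b)  # the same object lands in two different containers
```

When the proxy sends X-Delete-At-Container, every object server uses the same name regardless of its own divisor setting.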

Changed in swift:
assignee: nobody → Christopher Bartz (bartz)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/524548

Changed in swift:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/524548
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=d8f9045518035cbd1a40d0a94227952a384143ec
Submitter: Zuul
Branch: master

commit d8f9045518035cbd1a40d0a94227952a384143ec
Author: Christopher Bartz <email address hidden>
Date: Fri Dec 1 11:13:10 2017 +0100

    Send correct number of X-Delete-At-* headers

    Send just as many requests with X-Delete-At-* as we do X-Container-* to
    the object server. Furthermore, stop the object server on making an
    update to the expirer queue when it wasn't told to do so and remove the
    log warning which would have been produced.

    Reason:

    It can be the case that the number of object replicas (OR) is larger
    than the number of container replicas (CR) for a given storage policy
    (most likely in case of EC). Before this commit, only CR object servers
    received the x-delete-at-* headers, which means that OR - CR object
    servers did not receive the headers. The servers missing the header
    would produce a log warning and create the x-delete-at-container header
    and async update on their own, which could lead to a bug, if the
    expiring_objects_container_divisor option was misconfigured.

    Change-Id: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
    Closes-Bug: #1733588
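
The proxy-side idea in this commit message can be sketched as below. This is an illustration under assumptions, not the merged change: `spread_delete_at_headers` and its parameters are hypothetical, and `n_updates` stands for however many backend requests carry X-Container-* headers.

```python
import itertools


def spread_delete_at_headers(n_object_requests, n_updates, delete_at_nodes,
                             container_name):
    """Give X-Delete-At-* headers to as many requests as there are updates."""
    headers = [{} for _ in range(n_object_requests)]
    node_iter = itertools.cycle(delete_at_nodes)  # reuse nodes when OR > CR
    for update in range(n_updates):
        target = headers[update % n_object_requests]
        target["X-Delete-At-Container"] = container_name
        host = next(node_iter)["host"]
        # Append the host if this request already carries an update.
        previous = target.get("X-Delete-At-Host")
        target["X-Delete-At-Host"] = host if previous is None else previous + "," + host
    return headers


# Hypothetical case: 6 object requests, 3 queue container nodes, 6 updates.
nodes = [{"host": "10.0.0.%d:6201" % i} for i in range(1, 4)]
headers = spread_delete_at_headers(6, 6, nodes, "1510185559")
print(all("X-Delete-At-Container" in h for h in headers))
```

Any request that still receives no X-Delete-At-* headers is, per the commit message, no longer supposed to trigger an expirer queue update on the object server.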

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/535623

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)

Reviewed: https://review.openstack.org/535623
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=271b80d0f51078719de35bf6f75b7e06ac3e5b91
Submitter: Zuul
Branch: feature/s3api

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit d924fa759967b7cdca0d91f21112725f6099a254
Author: Samuel Merritt <email address hidden>
Date: Tue Jan 16 22:19:09 2018 -0800

    Remove old post-as-copy leftovers from tests.

    Since commit 1e79f828, we don't need to test with post_as_copy=True
    any more since we haven't got post_as_copy at all.

    Change-Id: I9c96ce0b812d877bbe11bdb50eb160d6ffa5933d

commit dfa0c4e604fb931d232395599bd0e7b0f11441ee
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 12:04:45 2018 +0000

    Preserve expiring object behaviour with old proxy-server

    The related change [1] causes expiring object records to no longer be
    created if the X-Delete-At-Container header is not sent to the object
    server, but old proxies prior to [2] (i.e. releases prior to 1.9.0)
    did not send this header.

    The goal of [1] can be alternatively achieved by making expiring
    object record creation be conditional on the X-Delete-At-Host header.

    [1] Related-Change: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
    [2] Related-Change: Id0873a3f2198ce285fe0b0c777738eff38bc2438

    Change-Id: Ia0081693f01631d3f2a59612308683e939ced76a

commit d707fc7b6d0ceb4556dddfc258c5de8c4baff05c
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 16:30:13 2018 -0800

    DRY out tests until the stone bleeds

    Can we go deeper!?

    Change-Id: Ibd3b06542aa1bfcbcb71cc98e6bb21a6a67c12f4

commit ba8f1b1c3786df4e79fc3f9e4747d7cfb9072b6f
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 15:25:33 2018 +0000

    Fix intermittent unit test failure

    test_check_delete_headers_removes_delete_after was
    failing intermittently due to rounding of float time
    values.

    Change-Id: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b
    Closes-Bug: #1743804

commit e747f94313f315fdf8d8fc01fb0c5aac60c33897
Author: Kota Tsuyuzaki <email address hidden>
Date: Wed Dec 27 14:37:29 2017 +0900

    Fix InternalClient to drain response body if the request fails

    If we don't drain the body, the proxy logging in the internal client
    pipeline will log 499 client disconnect instead of actual error response
    code.

    For error responses, we try to do the most helpful thing using swob's
    closing and caching response body attribute. For non-error responses
    which are returned to the client, we endeavour to keep the app_iter
    intact and unconsumed, trusting expecting the caller to do the right
    thing is the only reasonable interface. We must cleanly close any WSGI
    app_iter which we do not return to the client rega...

tags: added: in-feature-s3api
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/535990

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)

Reviewed: https://review.openstack.org/535990
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3122895118111b2b11f5ef9d0b3410b337624b1b
Submitter: Zuul
Branch: feature/deep

commit ddb13aa5eab03b6993887eb02260b4bc0b256922
Author: vxlinux <email address hidden>
Date: Sat Jan 20 17:23:35 2018 +0800

    Remove redundant blank space in README.rst

    Change-Id: If347476e3b9185921ff174d3f8170a1c4d0622e8

commit 12f874534925b52f9d1c91580794eb9e5e9a4589
Author: vxlinux <email address hidden>
Date: Fri Jan 19 16:54:26 2018 +0800

    Add Docstrings to validate_replicas_by_tier

    New common functions should have Docstrings

    Change-Id: Icbb3cdf38509fd6d034cbb2271786559780a7b68

commit d2034cd7b6946829a7d95c4d2c71d4322f80e855
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 17:03:38 2018 -0800

    Keep object-updater stats logging consistent

    If we're going to encapsulate the stats tracking it seems reasonable if
    we ever add any more metrics we can reduce the number of places we need
    to update log messages.

    Change-Id: I187cf6cfec1e0a9138b709fa298e1991aa809ec4

commit cd2c73fd955317a3f40758cef45ee48bef8fbc79
Author: Tim Burke <email address hidden>
Date: Tue Jan 16 01:07:35 2018 +0000

    internal_client: Don't retry when we expect the same reponse

    This boils down to 404, 412, or 416; or 409 when we provided an
    X-Timestamp.

    This means, among other things, that the expirer won't issue 3 DELETEs
    every cycle for every stale work item.

    Related-Change: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
    Change-Id: Ie5f2d3824e040bbc76d511a54d1316c4c2503732

commit 222df9185782f59ffdc96c3534afaa2fb1361235
Author: chengebj5238 <email address hidden>
Date: Thu Jan 18 17:03:11 2018 +0800

    Modify redirection URL and broken URL

    Change-Id: I9a04cb2fbe61e1fbd8185ab2fac9abbcea4d55cc

commit d1656e334959e09d13eea98c2696e58c77e4ab91
Author: Tim Burke <email address hidden>
Date: Fri Jan 12 13:17:45 2018 -0800

    slo: Send ETag header in 206 responses

    Why weren't we doing that before?? The etag should be the same as for
    GET/HEAD, and by sending it, we can assure resuming clients that they're
    downlading the same object even if they didn't include an If-Match
    header.

    Change-Id: I4ccbd1ae3a909ecb4606ef18211d1b868f5cad86
    Related-Change: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit f64c00b00aa8df31a937448917421891904abdc8
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 12 07:17:18 2018 -0800

    Improve object-updater's stats logging

    The object updater has five different stats, but its logging only told
    you two of them (successes and failures), and it only told you after
    finishing all the async_pendings for a device. If y...

tags: added: in-feature-deep
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.17.0

This issue was fixed in the openstack/swift 2.17.0 release.
