OpenStack Object Storage (swift)

Bug #1655608
Comment #4

clayg (clay-gerrard) wrote on 2017-02-27:

!? although (2) *would* fix a lot of problems - no one has a time machine?

I think my sarcasm was lost.

To work around or avoid this issue today, you have to (1) operationally avoid reintroducing hardware that's been out for longer than a reclaim age. Unfortunately, you are probably also stuck with this issue if you've been following along since early versions of EC (2).
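As a rough illustration of the operational check in (1), here is a minimal sketch that compares how long a device has been out against the reclaim age. The function name and the `offline_since` timestamp are hypothetical and not part of Swift; the one-week figure is Swift's default `reclaim_age`, and your cluster may be configured differently.

```python
import time

# Swift's default reclaim_age is one week (604800 seconds).
# Assumption: substitute whatever your cluster actually configures.
DEFAULT_RECLAIM_AGE = 7 * 24 * 60 * 60


def safe_to_reintroduce(offline_since, now=None, reclaim_age=DEFAULT_RECLAIM_AGE):
    """Return True if the device has been out for less than reclaim_age.

    offline_since: Unix timestamp when the device left the cluster
                   (hypothetical input, tracked by the operator).
    """
    if now is None:
        now = time.time()
    return (now - offline_since) < reclaim_age
```

A device that fails this check may still hold data whose tombstones have already been reclaimed elsewhere, so it is safer to wipe it before putting it back.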

Because we try to have some safeguards in place to avoid dark data, I was able to pull off a manual cleanup by feeding some parsed logs into a bulk delete, and I consider the issue mostly solved.

At the design summit in ATL (2/2017) it was pointed out that the I/O consumption pattern of reintroducing dark data in Replicated is very different from EC. The noisy log messages and the continual attempts to rebuild data that can't be rebuilt can be more annoying than just having wasted bytes on disk, depending on your use case.

The best idea to date is, for un-reconstructable fragments older than a reclaim age, to aggressively search [1] for other potential fragments and to quarantine the local fragments if there are too many 404s.

1. Very aggressively: request_node_counts easily in the 100s.
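The search-and-quarantine idea above could be sketched as a decision function like the one below. This is only an illustration under stated assumptions, not how the reconstructor is implemented: `should_quarantine`, the `responses` list of `(status, frag_index)` pairs, and the `min_404s` threshold are all hypothetical names, and `ndata` stands for the number of fragments the EC policy needs to rebuild an object.

```python
def should_quarantine(frag_age, reclaim_age, local_index, responses, ndata, min_404s):
    """Decide whether a lonely local fragment looks like dark data.

    frag_age:    age of the local fragment in seconds
    reclaim_age: configured reclaim age in seconds
    local_index: fragment index of the fragment held locally
    responses:   (status, frag_index) pairs from probing many nodes;
                 frag_index is None unless a fragment was found
    ndata:       distinct fragments needed to reconstruct the object
    min_404s:    hypothetical threshold of 404s required to quarantine
    """
    # Young fragments may belong to an upload still in flight: leave them.
    if frag_age <= reclaim_age:
        return False

    found = {local_index}  # distinct fragment indexes seen so far
    not_found = 0
    for status, frag_index in responses:
        if status == 200 and frag_index is not None:
            found.add(frag_index)
        elif status == 404:
            not_found += 1
        # Timeouts and 5xx responses prove nothing either way, so skip them.

    # Enough fragments exist to rebuild, so this is not dark data.
    if len(found) >= ndata:
        return False

    # Only quarantine when enough nodes answered definitively with a 404.
    return not_found >= min_404s
```

Counting only definitive 404s means a widespread outage (lots of timeouts or 503s) never triggers a quarantine, which is why the search has to reach so many nodes before it can conclude anything.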