XFS corruption can create zero-byte partition files instead of dirs
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Object Storage (swift) | Fix Released | Undecided | Darrell Bishop | | |
Bug Description
In the past, I had to run
# xfs_repair /dev/sdc
Later, after dropping that device's weight in the object ring to 0, I noticed the object-replicator was poisoned by a zero-byte file where a directory should have been:
# ll /srv/node/
...
-rw-r--r-- 1 swift swift 0 2012-08-17 11:26 188978
This causes the object-replicator to fail trying to handle this "partition" with the following traceback. Note that this is "benign" in the sense that all data (that didn't otherwise get screwed up when my XFS filesystem got a little mucked up) did get replicated off the drive. However, this still results in log spew and continuous incrementing of the object-
Sep 4 10:57:39 swift-test-01 object-replicator Error syncing handoff partition: #012Traceback (most recent call last):#012 File "/usr/lib/
This means there are some portions of Swift which are not robust to zero-byte files being where they normally shouldn't. That node's regular and zero-byte-file object-auditor processes are not reporting any errors (nor are they fixing this zero-byte-
I think collect_jobs() should verify that the partition paths it puts into jobs are directories and not zero-byte files. I think if collect_jobs notices a zero-byte file where a partition directory should be, it should log this (WARNING level?), remove the zero-byte file, and then move on, not creating a job for that partition path.
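A minimal sketch of the check proposed above, assuming partitions are the entries directly under a device's objects directory. The function name usable_partition_paths and its arguments are hypothetical, chosen for illustration only; this is not Swift's actual collect_jobs() code, and the released fix may differ.

```python
import logging
import os


def usable_partition_paths(objects_dir, logger=None):
    """Return partition directories under objects_dir.

    Hypothetical helper: a zero-byte regular file found where a partition
    directory should be is logged at WARNING level, removed, and skipped,
    so no replication job would be created for it.
    """
    logger = logger or logging.getLogger(__name__)
    usable = []
    for name in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, name)
        if os.path.isdir(path):
            usable.append(path)
            continue
        is_plain_file = os.path.isfile(path) and not os.path.islink(path)
        if is_plain_file and os.path.getsize(path) == 0:
            logger.warning(
                'Removing zero-byte file found in place of partition '
                'directory: %s', path)
            try:
                os.unlink(path)
            except OSError as err:
                logger.error('Could not remove %s: %s', path, err)
            continue
        # Anything else (for example a non-empty file) is left in place
        # for an operator to inspect, but is still skipped.
        logger.warning('Skipping non-directory partition path: %s', path)
    return usable
```

Only empty files are deleted in this sketch; a non-empty file in the same position is left alone on the assumption that it might hold something worth inspecting.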
Changed in swift: | |
milestone: | none → 1.7.5 |
status: | Fix Committed → Fix Released |
Fix proposed to branch: master
Review: https://review.openstack.org/12378