Data corruption during parallel file copying with interruptions
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
eCryptfs | Fix Released | High | Unassigned |
Bug Description
When performing many I/O operations on an ecryptfs file system and interrupting them, the files involved can become corrupted, even files that are only supposed to be *read* (not written) during the operation.
During a parallel build ("make -j8" or "make -j4") of a big C++ project that uses precompiled headers, I noticed that interrupting the build (CTRL+C) corrupted the PCH files fairly reliably. Since both GCC and GNU Make have safety mechanisms for dealing with interruptions (each independently deletes partially written output files when interrupted), I investigated the issue, grew suspicious of ecryptfs, and tried to reproduce the problem in a simple environment.
Attached are an example Makefile, which creates a big file and copies it several times, and two test runs of it. When the Makefile is executed in parallel ("make -j*") and interrupted at the right moments, files can get corrupted, as verified with "md5sum". Note that the corrupted files are not output files that were only partially written (GNU Make deletes those reliably) but *input* files of copy operations; they were never supposed to be opened for writing at all, only for reading.
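For readers without the attachment, here is a minimal sketch of the kind of Makefile described above; it is not the attached file, and the file names, size, and copy count are assumptions:

```make
# Hypothetical reproduction sketch, not the attached Makefile.
# One big input file is created once, then copied several times;
# with "make -j4" the copies run in parallel.
COPIES := copy1.dat copy2.dat copy3.dat copy4.dat

all: $(COPIES)

# A big input file (the size is an assumption; a bigger file widens
# the window in which CTRL+C can hit an in-flight copy).
big.dat:
	dd if=/dev/urandom of=$@ bs=1M count=512

# Each copy only *reads* big.dat. GNU Make deletes a partially
# written target on interruption, so any corruption that remains
# afterwards is in the input file.
$(COPIES): big.dat
	cp $< $@

clean:
	rm -f big.dat $(COPIES)
```

Checksumming the input before the run ("md5sum big.dat") and again after an interrupted "make -j4" on an ecryptfs mount is how a corrupted input file would show up.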
I'm not entirely sure if ecryptfs is responsible, but none of the effects presented here could be reproduced without ecryptfs.
The parallel execution is important to increase the probability of a corruption. It also seems to be important to have slow I/O (an HDD instead of an SSD; small or disabled caches) for the corruption to occur. Even so, several test runs might be necessary to reproduce the problem.
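One way to approximate the slow-I/O condition between test runs is to flush the Linux page cache so the next run actually hits the disk; this uses the standard kernel interface and is not taken from the attachments:

```sh
# Write out dirty pages, then drop clean page, dentry and inode
# caches; the following run must read big.dat from disk again.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
```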
Attachment: First run