e2fsck forces reboot when clearing LARGE_FILE flag, confusing user

Bug #83982 reported by rvjcallanan
Affects                        Status        Importance  Assigned to     Milestone
e2fsprogs (Ubuntu)             Fix Released  Wishlist    Kyle McMartin
linux-source-2.6.15 (Ubuntu)   Invalid       Undecided   Unassigned
linux-source-2.6.17 (Ubuntu)   Invalid       Undecided   Unassigned

Bug Description

This is a VERY SERIOUS problem somewhere in the EXT3 codebase. I find it hard to believe that it has not yet been discovered. I have not had a chance to test it on other distros.

Steps to recreate problem:

Use a reliable system, e.g. a Dell PowerEdge 400SC with 512MB RAM and a 120GB HDD.
Install Ubuntu Server Edition, 6.06 LTS Standard LAMP installation.
All installation options are defaults.
Use apt-get update and apt-get upgrade to get latest releases.

Login and carry out following steps:

1. Create a large file using an arbitrary utility such as dd, cp or tar
   e.g. # dd if=/dev/zero of=0bits bs=10M count=220
   This example generates a 2.3GB file containing zeros.

2. Delete the file immediately
   e.g. # rm 0bits

3. Schedule an fsck on next reboot
   e.g. # touch /forcefsck

4. Reboot immediately
   e.g. # reboot

5. Observe boot process

You will notice that fsck fails on reboot, forces a second reboot and fixes itself.
The file size threshold above which this problem occurs appears to be approximately 2.2GB.
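As a quick arithmetic check (an editorial aside, not part of the original report): GNU dd reads the "M" suffix in bs=10M as 1024 x 1024 bytes, so the example writes 220 x 10 MiB = 2,306,867,200 bytes. The sketch below compares that against 2^31 bytes (2 GiB, about 2.15 GB), assuming that is the size at which a file starts to need the large_file feature; that assumption fits both the ~2.2GB estimate above and the "> 2GB" explanation given later in the thread. The helper names dd_output_size and needs_large_file are illustrative only.

```python
# GNU dd treats the "M" suffix as 1024 * 1024 bytes (one MiB).
MIB = 1024 * 1024

# Assumed threshold: files of 2**31 bytes (2 GiB) or more need large_file.
LARGE_FILE_THRESHOLD = 2**31


def dd_output_size(block_size: int, count: int) -> int:
    """Bytes written by 'dd if=/dev/zero bs=<block_size> count=<count>'."""
    return block_size * count


def needs_large_file(size: int) -> bool:
    """True if a file of this size no longer fits a signed 32-bit size."""
    return size >= LARGE_FILE_THRESHOLD


size = dd_output_size(10 * MIB, 220)
print(size)                    # 2306867200
print(needs_large_file(size))  # True
```

With bs=10M, the crossover falls between count=204 (2040 MiB, still below 2 GiB) and count=205 (2050 MiB, just above it).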

This problem does NOT occur on a ReiserFS installation (assuming that the ReiserFS checker is sufficiently thorough). I tested ReiserFS with file sizes up to 50GB and it worked reliably. I might point out that, with ReiserFS, issuing a "touch /forcefsck" does not force a full check (this is a distro bug). It was therefore necessary to force an fsck by booting into recovery mode, unmounting hda1 and running "fsck -f" manually.

The problem is NOT hard drive or partition size dependent, as I have replicated it with smaller partitions and other hard drive models.

Kyle McMartin (kyle) wrote :

You didn't include the error messages fsck returned. Please include that.

Kyle McMartin (kyle) wrote :

Affects dapper kernel, and assigned to me.

rvjcallanan (vincent-callanan) wrote : RE: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Thanks for quick reply Kyle,

fsck just shows a RED fail during the boot check and the machine restarts within 5 seconds; the fsck on the second boot passes, showing that the first one fixed the problem. I couldn't find a log. Isn't there supposed to be an fsck.log somewhere?

Anyway, I just replicated the problem and, instead of issuing a touch /forcefsck, ran fsck -f manually in recovery mode with the filesystem mounted read-only (I don't know how safe this is or how closely it matches a clean fsck on a fully unmounted filesystem).

Here's what I got...

Pass 1: ...
Pass 2: ...
Pass 3: ...
Pass 4: ...
Pass 5: Checking group summary information

/dev/hda1: ***** FILE SYSTEM WAS MODIFIED ******
/dev/hda1: ***** REBOOT LINUX *****
/dev/hda1: 24129/610432 (1.0% non-contiguous), 181696/1220932

I have manually typed this from server screen as accurately as possible.

As I said, this happens with an EXT3 partition but not with ReiserFS (I didn't check EXT2).

Rgds,

Vincent

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Kyle McMartin
Sent: 08 February 2007 15:39
To: <email address hidden>
Subject: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

--
Deleting large files corrupts EXT3 file system
https://launchpad.net/bugs/83982

Ben Collins (ben-collins) wrote : Re: Deleting large files corrupts EXT3 file system

I'm able to reproduce this even on 2.6.20, and as a normal user.

No idea why fsck is returning an error status; it reports nothing wrong with the filesystem itself. This may just be a bug in libext3/fsck.ext3.

Will investigate further.

Changed in linux-source-2.6.15:
importance: Undecided → High
status: Needs Info → Confirmed
Ben Collins (ben-collins) wrote :

Even more interesting, I can't reproduce this except on the rootfs, and even then only when rebooting.

Doing this from busybox for the rootfs (mount, dd, rm, umount, fsck), or with a separate filesystem, doesn't reproduce it.

rvjcallanan (vincent-callanan) wrote : RE: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Also affects the current stable Debian release.
It can in fact be reproduced without a reboot. Try booting into recovery mode, then:
dd ...
rm ...
umount /dev/hda1
e2fsck -fv /dev/hda1

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Ben Collins
Sent: 09 February 2007 02:48
To: <email address hidden>
Subject: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Kyle McMartin (kyle) wrote : Re: Deleting large files corrupts EXT3 file system

I'm inclined to agree with you, Ben; this does seem like a bug in the userspace tools, especially since it doesn't appear that anything changes.

Vincent, can you clarify that to replicate, there must be a fsck on the next reboot? If you don't force a fsck, does the filesystem remount alright? Does the problem occur if you force a fsck after that reboot?

I'll see if I can ask some knowledgeable EXT people to shed a bit of light on this for us, because I'm quite stumped.

Kyle McMartin (kyle) wrote :

Ben, the only time you'll see the "REBOOT LINUX" message is when the filesystem is already mounted as /, since init is running there and, well, remounting properly would be tricky I imagine. This is mostly speculation, but makes sense from the bits of code I just read.

rvjcallanan (vincent-callanan) wrote : RE: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

This is my first bug report; I am replying to these emails on the assumption that they actually reach you.
Please kindly confirm that you have read this message (I sent previous replies, but they appear to have fallen on deaf ears).

IMPORTANT
You don't actually have to do a reboot to see the problem. The reboot is just to facilitate an fsck.
If you start up in GRUB recovery mode, run the dd..rm combination (to trigger the problem), then IMMEDIATELY unmount and run fsck -f, you will get the same result, i.e.:

Pass 1: ...
Pass 2: ...
Pass 3: ...
Pass 4: ...
Pass 5: Checking group summary information

/dev/hda1: ***** FILE SYSTEM WAS MODIFIED ******
/dev/hda1: ***** REBOOT LINUX *****
/dev/hda1: 24129/610432 (1.0% non-contiguous), 181696/1220932

It is obvious that the problem is NOT in the fsck (i.e. e2fsck), since it does actually fix something (if it didn't, it would keep failing fsck). The question is "What does it fix?".

I have confirmed this on the latest Debian. A quick test on a Red Hat installation would tell us whether or not it is a Debian/Debian-derivative problem.

If Red Hat also has the bug, then I think you need to get the EXT3 people in the loop ASAP.

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Kyle McMartin
Sent: 10 February 2007 14:58
To: <email address hidden>
Subject: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Theodore Ts'o (tytso) wrote : Re: Deleting large files corrupts EXT3 file system

What's happening is that if e2fsck does a complete scan of the filesystem (which in your case you are forcing by touching /forcefsck), and e2fsck notices that there are no more files > 2GB while the large_file feature flag is set, e2fsck will clear the large_file feature flag. Currently it doesn't print a message when it does this, which is causing the confusion.
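The decision described in the paragraph above can be modelled roughly as follows. This is a hypothetical Python sketch of the behaviour as described for e2fsprogs before 1.40.7, not e2fsck's actual C code; the function name, its parameters, and the exact 2**31-byte threshold are assumptions made for illustration.

```python
from typing import Iterable

# Assumed threshold: a file of 2**31 bytes or more counts as "large".
LARGE_FILE_THRESHOLD = 2**31


def should_clear_large_file(full_scan: bool,
                            large_file_feature_set: bool,
                            file_sizes: Iterable[int]) -> bool:
    """Hypothetical model of the old e2fsck large_file check.

    Only a complete scan (e.g. one forced via /forcefsck) sees every
    inode, so only then can it conclude that no large files remain.
    """
    if not (full_scan and large_file_feature_set):
        return False
    large_files = sum(1 for size in file_sizes if size >= LARGE_FILE_THRESHOLD)
    return large_files == 0


# After the 2.3GB file is deleted, only small files remain:
print(should_clear_large_file(True, True, [4096, 1_000_000]))       # True
# While the large file still exists, the flag is kept:
print(should_clear_large_file(True, True, [4096, 2_306_867_200]))   # False
```

Clearing the flag is itself a write to the superblock, which is why the check then reports that the filesystem was modified even though nothing was actually wrong with it.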

Since it has modified the filesystem, it will return an exit status of 1 (filesystem modified) or 3 (filesystem modified | system should be rebooted) if the filesystem in question is the root filesystem. This is because if parts of the root filesystem were modified, it's not safe to remount the filesystem read-write, since some of the inconsistent parts of the filesystem could have been cached in memory, and so when the filesystem is re-mounted read/write, the broken bits of the filesystem could get written back to disk, undoing e2fsck's work.
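It helps to read the exit status as a bit mask: 1 means errors were corrected and 2 means the system should be rebooted, so 3 is both at once. The bit meanings below are paraphrased from the fsck(8) and e2fsck(8) manual pages; the decoder and the e2fsck_status helper (a hypothetical model of the rule in the paragraph above, not e2fsck's code) are illustrative only.

```python
from typing import List

# Exit status bits, paraphrased from the fsck(8) / e2fsck(8) man pages.
FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> List[str]:
    """Return the meaning of each bit set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]


def e2fsck_status(modified: bool, is_root: bool) -> int:
    """Hypothetical model of the rule described above."""
    if not modified:
        return 0
    # Modifying the mounted root filesystem also requests a reboot.
    return (1 | 2) if is_root else 1


print(decode_fsck_status(e2fsck_status(modified=True, is_root=True)))
# ['file system errors corrected', 'system should be rebooted']
```

Seen this way, the boot-time "REBOOT LINUX" message follows directly from bit 2 being set for the root filesystem, even when the only change was clearing a feature flag.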

So it's not that deleting a large file corrupts the EXT3 filesystem; it's that e2fsck noticed that the filesystem no longer had any large files, and did you a favor by clearing an RO compat bit, which would allow the filesystem to be mounted read/write on really old Linux kernels (which didn't support large files). It probably should have written a message to explain what it was doing, and I could perhaps accept an argument that there should be a config parameter to suppress clearing the large_file flag, on the argument that very few filesystems are likely to be mounted on Linux 1.0 or 1.2 systems these days. On the other hand, this happens rarely enough and the time cost of the forced reboot isn't that great, so maybe it's not worth the config parameter. In any case, that's what's going on.

rvjcallanan (vincent-callanan) wrote : RE: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Thank you for the quick diagnosis. Call me a nit-picker, but I would argue that...

No fundamental operation (such as a large file delete) should invalidate the file system "state" (even if this particular state flag is legacy-related and not that critical). Surely, when a large file is deleted, the EXT3 delete primitive should check if this is the last large file and clear the flag accordingly...or more likely, rather than using a flag, it would maintain a large file count which is incremented when a large file is created and decremented when a large file is deleted, etc, etc.
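The large-file count proposed in the paragraph above could look something like the following. This is a hypothetical, in-memory Python sketch of that suggestion, not how ext3 behaves; the class name and the 2**31-byte threshold are assumptions. Treating creation as a resize from 0 bytes and deletion as a resize to 0 bytes lets one method also cover files that grow past, or are truncated below, the limit.

```python
# Assumed threshold: a file of 2**31 bytes or more counts as "large".
LARGE_FILE_THRESHOLD = 2**31


class LargeFileCounter:
    """Hypothetical bookkeeping for the proposed large-file count."""

    def __init__(self) -> None:
        self.count = 0

    def resize(self, old_size: int, new_size: int) -> None:
        """Update the count when a file changes size.

        Creation is resize(0, n); deletion is resize(n, 0).
        """
        was_large = old_size >= LARGE_FILE_THRESHOLD
        is_large = new_size >= LARGE_FILE_THRESHOLD
        if is_large and not was_large:
            self.count += 1
        elif was_large and not is_large:
            self.count -= 1

    @property
    def large_file_flag(self) -> bool:
        """The feature flag would be set exactly while the count is positive."""
        return self.count > 0


counter = LargeFileCounter()
counter.resize(0, 2_306_867_200)   # dd creates the 2.3GB file
print(counter.large_file_flag)     # True
counter.resize(2_306_867_200, 0)   # rm deletes it
print(counter.large_file_flag)     # False
```

The sketch keeps the count only in memory; an on-disk filesystem would also need to store it and keep it consistent after a crash, a cost the proposal does not discuss.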

In other words, it is a bug.

You need to be able to depend on fsck to give you correct information e.g. you might assume an fsck error was due to a large file removal (based on past form) but in fact you may have another problem i.e. "boy who cried wolf" syndrome.

I think this bug should at least be notified to the EXT3 project.

Rgds,

Vincent Callanan

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Theodore Ts'o
Sent: 11 February 2007 01:19
To: <email address hidden>
Subject: [Bug 83982] Re: Deleting large files corrupts EXT3 file system

Kyle McMartin (kyle) wrote :

Arguably we could make the default behaviour not to clear the flag, and add a command-line argument that would enable extreme legacy compatibility. Patch pending.

Changed in e2fsprogs:
assignee: nobody → kyle
importance: Undecided → Wishlist
status: Unconfirmed → Confirmed
Changed in linux-source-2.6.15:
assignee: kyle → nobody
importance: High → Undecided
status: Confirmed → Rejected
Changed in linux-source-2.6.17:
status: Unconfirmed → Rejected
Theodore Ts'o (tytso) wrote :

E2fsprogs will no longer clear the LARGE_FILE feature, starting with e2fsprogs 1.40.7.

Theodore Ts'o (tytso) wrote :

The fix for this problem was released in e2fsprogs 1.40.7, and Hardy is shipping e2fsprogs 1.40.8. Hence this bug is fixed in the latest shipping version of Ubuntu.

Changed in e2fsprogs:
status: Confirmed → Fix Released