kernel lockups unmounting ext4 lvm2 snapshots

Bug #536994 reported by Kees Cook
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
Linux            Invalid  Unknown
linux (Ubuntu)   Triaged  High        Unassigned
  Lucid          Triaged  High        Unassigned

Bug Description

I have experienced I/O lock-ups when unmounting LVM2 snapshot partitions containing ext4. All I/O, sound, etc. hang, though Alt+SysRq remains responsive (for example, Alt+SysRq+B still reboots the machine).

This problem is intermittent, unfortunately, but I see it most often when performing a package build via sbuild in an LVM-based snapshot schroot.

Steps to reproduce (probably not the minimal test case...):
Terminal 1:
- create an MD device across two physical drives
- sudo -s
- vgcreate testvg /dev/md0
- lvcreate -L10G -n testlv testvg
- mkfs.ext4 /dev/testvg/testlv
- mkdir /mnt/test
- mount /dev/testvg/testlv /mnt/test
- apt-get install ubuntu-dev-tools
- while :; do cp -a /usr /mnt/test; sleep 1; rm -rf /mnt/test/usr; done

Then in Terminal 2:
- mk-sbuild --vg=VG karmic
- cd /tmp
- dget https://launchpad.net/ubuntu/+archive/primary/+files/dpkg_1.15.4ubuntu2.dsc
- while :; do sbuild -d karmic dpkg*.dsc; done &
- while :; do sbuild -d karmic dpkg*.dsc; done
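The sbuild loops above reach the snapshot code only indirectly, through schroot. As an untested sketch, the shell function below cycles an LVM snapshot of testlv directly (create, mount, write, umount, remove) and prints how long each umount takes. It assumes root, the testvg/testlv created above, and at least 2G free in testvg. The function name snapshot_cycle, the "-snap" suffix and the /mnt/testsnap mount point are hypothetical, and whether this exercises the same path as an LVM-type schroot is an assumption.

```shell
#!/bin/sh
# Untested sketch: cycle an LVM snapshot of an ext4 LV and time each umount.
# Assumes root, an existing origin LV (default testvg/testlv) and >= 2G free
# in the VG. snapshot_cycle, the "-snap" suffix and /mnt/testsnap are
# hypothetical names chosen for this example.
snapshot_cycle() {
    vg=${1:-testvg}
    lv=${2:-testlv}
    snap="${lv}-snap"
    mnt=/mnt/testsnap

    mkdir -p "$mnt" || return 1
    while :; do
        # Create a 2G copy-on-write snapshot of the origin LV.
        lvcreate -s -L2G -n "$snap" "/dev/$vg/$lv" || return 1
        mount "/dev/$vg/$snap" "$mnt" || return 1

        # Dirty the snapshot a little, as a build inside the schroot would.
        dd if=/dev/zero of="$mnt/fill" bs=1M count=256 2>/dev/null
        sync

        # Time the unmount; a hang here is the reported symptom.
        start=$(date +%s)
        umount "$mnt" || return 1
        echo "umount of /dev/$vg/$snap took $(( $(date +%s) - start ))s"

        lvremove -f "/dev/$vg/$snap" || return 1
    done
}
```

Run it as root, e.g. `snapshot_cycle testvg testlv`, while the cp/rm loop from Terminal 1 keeps writing to the origin. If the bug triggers, the script blocks at the umount instead of printing a time.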

ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: kees 3851 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xe0420000 irq 22'
   Mixer name : 'Realtek ALC268'
   Components : 'HDA:10ec0268,80860000,00100003'
   Controls : 17
   Simple ctrls : 11
Date: Wed Mar 10 15:28:56 2010
DistroRelease: Ubuntu 10.04
HibernationDevice: RESUME=/dev/md1
Package: linux-image-2.6.32-16-generic 2.6.32-16.24
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-16-generic root=/dev/mapper/systemvg-root2lv ro quiet splash
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-16.24-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.32
Reproducible: Yes
RfKill:

SourcePackage: linux
TestedUpstream: No
Uname: Linux 2.6.32-16-generic x86_64
WpaSupplicantLog:

dmi.bios.date: 09/22/2008
dmi.bios.vendor: Intel Corp.
dmi.bios.version: JOQ3510J.86A.0954.2008.0922.2331
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: DQ35JO
dmi.board.vendor: Intel Corporation
dmi.board.version: AAD82085-800
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCorp.:bvrJOQ3510J.86A.0954.2008.0922.2331:bd09/22/2008:svn:pn:pvr:rvnIntelCorporation:rnDQ35JO:rvrAAD82085-800:cvn:ct3:cvr:

Kees Cook (kees) wrote :

Sometimes I'll survive a lock-up if I just wait long enough. The load average sits at 1 for a long time, and ps shows these processes in D state:

$ ps auwwx | grep " D "
root 7143 0.0 0.0 0 0 ? D 15:51 0:00 [kdmflush]
root 7146 0.1 0.0 0 0 ? D 15:51 0:00 [kcopyd]
root 7239 0.0 0.0 0 0 ? D 15:51 0:00 [jbd2/dm-33-8]
root 7518 0.0 0.0 0 0 ? D 15:52 0:00 [flush-252:33]

Kees Cook (kees) wrote :

Oops, missed a line:

root 8547 0.0 0.0 8208 712 pts/3 D+ 15:52 0:00 umount /var/lib/schroot/mount/hardy-abfa271e-3dbb-4462-90b5-635912bd55c1
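The missed line comes from the filter itself: `grep " D "` only matches a STAT column that is exactly `D`, so the umount process in state `D+` slipped through. A minimal sketch that matches any state beginning with D and also shows the kernel wait channel, assuming procps `ps` (the function name list_d_state is hypothetical):

```shell
#!/bin/sh
# List tasks in uninterruptible sleep. Matching the STAT field against ^D
# also catches "D+", "Ds", etc., which grep " D " misses.
list_d_state() {
    # Reads ps output on stdin: keeps the header row plus every row whose
    # second column (STAT) starts with D.
    awk 'NR == 1 || $2 ~ /^D/'
}

# wchan shows, where available, the kernel function each task sleeps in.
ps -eo pid,stat,wchan:32,args | list_d_state
```

Because it tests the STAT field instead of a literal string, the same filter also picks up states such as `Ds` or `Dl`.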

Kees Cook (kees) wrote :

Here's the view of latencytop when the umount actually finishes (when I'm lucky).

Changed in linux (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-2
Kees Cook (kees)
description: updated
Chase Douglas (chasedouglas) wrote :

@Kees:

I found an upstream bug that seems related to this. It's currently marked closed because it was unreproducible. If you have a scenario that you can reproduce, I suggest reopening the bug with more information.

Changed in linux (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → Low
Changed in linux:
status: Unknown → Invalid
Surbhi Palande (csurbhi) wrote :

@Kees, can you please do the following:

1) echo 1 > /proc/sys/kernel/lock_stat
   Then run the umount command.
2) less /proc/lock_stat
If you attach the output of that command, we can check whether lock contention is occurring.

Afterwards, you can disable lock_stat and clear the statistics as follows:
3) echo 0 > /proc/sys/kernel/lock_stat
4) echo 0 > /proc/lock_stat

Thanks!
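Steps 1 to 4 can be wrapped around a single unmount as in the sketch below. It assumes root and a kernel built with CONFIG_LOCK_STAT (otherwise /proc/lock_stat does not exist); the helper name capture_lock_stat and its output-file argument are hypothetical. It also clears the statistics once before enabling collection, an extra step not in the list above, so the capture covers only the unmount.

```shell
#!/bin/sh
# Sketch: collect lock statistics around one umount.
# Assumes root and a kernel built with CONFIG_LOCK_STAT.
# capture_lock_stat is a hypothetical helper, not an existing tool.
capture_lock_stat() {
    mnt=$1   # mount point to unmount
    out=$2   # file that receives a copy of /proc/lock_stat

    if [ ! -w /proc/sys/kernel/lock_stat ] || [ ! -e /proc/lock_stat ]; then
        echo "lock_stat unavailable (need root and CONFIG_LOCK_STAT)" >&2
        return 1
    fi

    echo 0 > /proc/lock_stat               # extra: start from empty statistics
    echo 1 > /proc/sys/kernel/lock_stat    # step 1: enable collection
    umount "$mnt"                          # step 1: run the unmount
    status=$?
    cat /proc/lock_stat > "$out"           # step 2: save the output to attach
    echo 0 > /proc/sys/kernel/lock_stat    # step 3: disable collection
    echo 0 > /proc/lock_stat               # step 4: clear the statistics
    return "$status"
}
```

If the umount hangs, the helper hangs with it; in that case /proc/lock_stat can likely still be read from another terminal during the lock-up, provided that terminal stays responsive.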

Surbhi Palande (csurbhi)
Changed in linux (Ubuntu Lucid):
importance: Low → High