[public] armadaxp kernel slow to umount

Bug #939240 reported by mahmoh
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux-armadaxp (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Build Version/Date: Linux armadaxp2 3.0.0-1500-armadaxp #3-Ubuntu SMP Wed Feb 22 18:12:33 UTC 2012 armv7l armv7l armv7l GNU/Linux, DISTRIB_ID=Ubuntu, DISTRIB_RELEASE=12.04, DISTRIB_CODENAME=precise, DISTRIB_DESCRIPTION="Ubuntu precise (development branch)"
Environment used for testing: armadaxp2

Summary: armadaxp kernel slow to unmount

Steps to Reproduce: mount /ext2fs, perform operations within the fs (ls -al /extfs, etc.), umount /ext2fs

Expected result: umount should be quick (unloaded system, no open file handles, etc.)

Actual result: umount takes 10s or longer (60s during shutdown)

***

/dev/sda2 on /boot type ext2 (rw)

/dev/sda2 4805760 89464 4472176 2% /boot

strace -t umount /boot:
...
04:21:51 rt_sigprocmask(SIG_BLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
04:21:51 umount("/boot", 0) = 0
04:22:01 getuid32() = 0
04:22:01 geteuid32() = 0
...

***

...
[ 73.583710] init: plymouth-upstart-bridge main process (1602) terminated with status 1
 * Stopping web server apache2 apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName
 ... waiting . [ OK ]
Checking for running unattended-upgrades:
 * Asking all remaining processes to terminate... [ OK ]
 * All processes ended within 1 seconds.... [ OK ]
 * Deconfiguring network interfaces... [ 77.828405] eth0: stopped
                                                                         [ OK ]
 * Deactivating swap... [ OK ]
 * Unmounting local filesystems... [ OK ]
[ 140.182382] EXT4-fs (sda3): re-mounted. Opts: (null)
...

mahmoh (mahmoh)
description: updated
Revision history for this message
mahmoh (mahmoh) wrote :

Update: I thought initially this was related to ext2fs fs only but it appears that one system with ext3 & ext4 only appears to have the same problem:

/dev/sda3 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
cgroup on /sys/fs/cgroup type tmpfs (rw,uid=0,gid=0,mode=0755)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/sda2 on /boot type ext3 (rw)

u@armadaxp1:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 72093008 11032700 57398180 17% /
udev 1512272 12 1512260 1% /dev
tmpfs 607148 204 606944 1% /run
cgroup 1517864 0 1517864 0% /sys/fs/cgroup
none 5120 0 5120 0% /run/lock
none 1517864 0 1517864 0% /run/shm
/dev/sda2 4805760 230144 4331496 6% /boot

ls -al /boot ; ls -la /boot ; strace -t umount /boot
...
05:14:37 getegid32() = 0
05:14:37 prctl(PR_GET_DUMPABLE) = 1
05:14:37 lstat64("/etc/mtab", {st_mode=S_IFREG|0644, st_size=550, ...}) = 0
05:14:37 open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 3
05:14:37 fstat64(3, {st_mode=S_IFREG|0644, st_size=550, ...}) = 0
05:14:37 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6c87000
05:14:37 read(3, "/dev/sda3 / ext4 rw,errors=remou"..., 4096) = 550
05:14:37 read(3, "", 4096) = 0
05:14:37 close(3) = 0
05:14:37 munmap(0xb6c87000, 4096) = 0
05:14:37 stat64("/sbin/umount.ext3", 0xbe9d44c0) = -1 ENOENT (No such file or directory)
05:14:37 rt_sigprocmask(SIG_BLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
05:14:37 umount("/boot", 0) = 0
05:14:41 getuid32() = 0
05:14:41 geteuid32() = 0
05:14:41 getgid32() = 0
05:14:41 getegid32() = 0
05:14:41 prctl(PR_GET_DUMPABLE) = 1
05:14:41 lstat64("/etc/mtab", {st_mode=S_IFREG|0644, st_size=550, ...}) = 0
05:14:41 open("/etc/mtab", O_RDWR|O_CREAT|O_LARGEFILE, 0644) = 3
05:14:41 close(3) = 0
...

Revision history for this message
mahmoh (mahmoh) wrote :

(goes with above)

armadaxp1: ls -al /boot ; ls -la /boot ; reboot
...
[ 587.707576] init: Handling deconfiguring-networking event
 * Deconfiguring network interfaces... [ 587.777705] init: Connection from private client
[ 587.790240] init: Handling net-device-down event
[ 588.107208] eth0: stopped
                                                                         [ OK ]
 * Deactivating swap... [ OK ]
[ 920.136713] EXT4-fs (sda3): re-mounted. Opts: (null)
 * Will now restart
[ 920.287518] md: stopping all md devices.
[ 921.291469] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 921.313803] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 921.319243] Shutting Down Marvell Ethernet Driver
[ 921.323987] Restarting system.
[ 921.339168] Reseting...

summary: - armadaxp kernel slow to unmount ext2
+ armadaxp kernel slow to unmount
Revision history for this message
mahmoh (mahmoh) wrote : Re: armadaxp kernel slow to unmount
Download full text (19.3 KiB)

Another shutdown hang with full strace below (from system above)- note that I believe the root fs was also slow to dismount after this partition (/boot):

root@armadaxp1:/home/u# reboot

Broadcast message from u@armadaxp1
        (/dev/ttyS0) at 5:40 ...

The system is going down for reboot NOW!
root@armadaxp1:/home/u# No apache MPM package installed
Checking for running unattended-upgrades:
 * Asking all remaining processes to terminate... [ OK ]
 * All processes ended within 1 seconds.... [ OK ]
 * Deconfiguring network interfaces... [ OK ]
 * Deactivating swap... [ OK ]
 * Unmounting local filesystems... 05:40:51 execve("/sbin/fstab-decode", ["fstab-decode", "umount", "-f", "-r", "-v", "-d", "/boot"], [/* 11 vars */]) = 0
05:40:51 brk(0) = 0x10c7000
05:40:51 uname({sys="Linux", node="armadaxp1", ...}) = 0
05:40:51 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
05:40:51 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6fc3000
05:40:51 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
05:40:51 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
05:40:51 fstat64(3, {st_mode=S_IFREG|0644, st_size=24184, ...}) = 0
05:40:51 mmap2(NULL, 24184, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb6fa1000
05:40:51 close(3) = 0
05:40:51 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
05:40:51 open("/lib/arm-linux-gnueabihf/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
05:40:51 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0Mp\1\0004\0\0\0"..., 512) = 512
05:40:51 lseek(3, 876468, SEEK_SET) = 876468
05:40:51 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1360) = 1360
05:40:51 lseek(3, 876028, SEEK_SET) = 876028
05:40:51 read(3, "A4\0\0\0aeabi\0\1*\0\0\0\0057-A\0\6\n\7A\10\1\t\2\n\4\22"..., 53) = 53
05:40:51 fstat64(3, {st_mode=S_IFREG|0755, st_size=877828, ...}) = 0
05:40:51 mmap2(NULL, 918904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6ec0000
05:40:51 mprotect(0xb6f93000, 32768, PROT_NONE) = 0
05:40:51 mmap2(0xb6f9b000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd3) = 0xb6f9b000
05:40:51 mmap2(0xb6f9e000, 9592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6f9e000
05:40:51 close(3) = 0
05:40:51 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6fc2000
05:40:51 set_tls(0xb6fc24c0, 0xb6fc2b98, 0xb6fc6048, 0xb6fc24c0, 0xb6fc4568) = 0
05:40:51 mprotect(0xb6f9b000, 8192, PROT_READ) = 0
05:40:51 mprotect(0x10000, 4096, PROT_READ) = 0
05:40:51 mprotect(0xb6fc5000, 4096, PROT_READ) = 0
05:40:51 munmap(0xb6fa1000, 24184) = 0
05:40:51 execve("/sbin/umount", ["umount", "-f", "-r", "-v", "-d", "/boot"], [/* 11 vars */]) = -1 ENOENT (No such file or directory)
05:40:51 execve("/usr/sbin/umount", ["umount", "-f", "-r", "-v", "-d", "/boot"], [/* 11 vars */]) = -1 ENOENT (No such file or di...

mahmoh (mahmoh)
no longer affects: eilt
visibility: private → public
mahmoh (mahmoh)
Changed in eilt:
assignee: nobody → Bryan Wu (cooloney)
mahmoh (mahmoh)
summary: - armadaxp kernel slow to unmount
+ [public] armadaxp kernel slow to umount
Revision history for this message
dann frazier (dannf) wrote :

To successfully reproduce this, mahmoh and I discovered that you seem to have to run it from a root shell - running it under sudo produces reasonable (i.e. immediate) umount times. I therefore suspected some obscure capability thing, but I don't see any such differences in the kernel code. Further, I've found that this only seems to be a problem when there's one runnable process on the system. If you start another program, the umount will immediately exit.

Profiling turned up no cpu obvious cpu intensive kernel functions. I took a few samples of a hung umount using sysrq+t, and found that we seem to always be stuck w/ the following backtrace:

[ 5093.862717] umount D c0557538 0 11174 2118 0x00000000
[ 5093.862729] Backtrace:
[ 5093.862741] [<c0557214>] (__schedule+0x0/0x5bc) from [<c0557aec>] (schedule+0x50/0x68)
[ 5093.862756] [<c0557a9c>] (schedule+0x0/0x68) from [<c0558070>] (schedule_timeout+0x1d4/0x254)
[ 5093.862770] [<c0557e9c>] (schedule_timeout+0x0/0x254) from [<c0557938>] (wait_for_common+0xc4/0x17c)
[ 5093.862785] [<c0557874>] (wait_for_common+0x0/0x17c) from [<c0557a98>] (wait_for_completion+0x18/0x1c)
[ 5093.862802] [<c0557a80>] (wait_for_completion+0x0/0x1c) from [<c00e8140>] (_rcu_barrier.isra.31+0x98/0xb8)
[ 5093.862817] [<c00e80a8>] (_rcu_barrier.isra.31+0x0/0xb8) from [<c00e8178>] (rcu_barrier_sched+0x18/0x1c)
[ 5093.862827] r6:c06eb748 r5:c06fb8c4 r4:ed68c000 r3:c01397b4
[ 5093.862847] [<c00e8160>] (rcu_barrier_sched+0x0/0x1c) from [<c00e818c>] (rcu_barrier+0x10/0x14)
[ 5093.862864] [<c00e817c>] (rcu_barrier+0x0/0x14) from [<c0139d58>] (deactivate_locked_super+0x54/0x68)
[ 5093.862879] [<c0139d04>] (deactivate_locked_super+0x0/0x68) from [<c013a4a8>] (deactivate_super+0x68/0x70)
[ 5093.862888] r5:ed68c000 r4:ed68c000
[ 5093.862904] [<c013a440>] (deactivate_super+0x0/0x70) from [<c01548cc>] (mntput_no_expire+0xc8/0x11c)
[ 5093.862913] r4:ed6c6780 r3:db28e34c
[ 5093.862927] [<c0154804>] (mntput_no_expire+0x0/0x11c) from [<c0155a74>] (sys_umount+0x68/0xc8)
[ 5093.862936] r7:00000034 r6:00efe8b0 r5:00000000 r4:00000000
[ 5093.862955] [<c0155a0c>] (sys_umount+0x0/0xc8) from [<c0047c80>] (ret_fast_syscall+0x0/0x30)
[ 5093.862964] r5:0001b320 r4:00efe8b0

Reading up on RCU, it looks like we are waiting for all rcu callbacks to complete on all CPUs before moving forward - but I'm not sure how to identify why that has not occurred.

Revision history for this message
mahmoh (mahmoh) wrote :

Tested Precise userspace with 3.0.0.-12-generic-pae i686 kernel as sudo against an ext2 (/boot) umount and it was faster than the same umount using the 3.2 kernel (0.079s vs. 0.089s), all in a VM.

mahmoh (mahmoh)
Changed in eilt:
importance: Undecided → Low
Revision history for this message
Tobin Davis (gruemaster) wrote :

This appears to be fixed with the 3.2 kernel. Marking as fix-committed. Will change once the 3.2 kernel hits the pool.

Changed in linux-armadaxp (Ubuntu):
status: New → Fix Committed
Revision history for this message
mahmoh (mahmoh) wrote :

Build Version/Date: Linux armadaxp1 3.2.0-1600-armadaxp #3-Ubuntu SMP Fri Mar 16 19:14:51 UTC 2012 armv7l armv7l armv7l GNU/Linux
Release: 12.04
Environment used for testing: armadaxp1

# mount | tail -1
/dev/sda6 on /mnt type ext2 (rw)
# cp /boot/* /mnt/ ; ls -la /mnt/ ; time umount /mnt

...

real 0m1.509s
user 0m0.000s
sys 0m0.110s

Changed in eilt:
status: New → Fix Committed
status: Fix Committed → Fix Released
Changed in linux-armadaxp (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.