6.06.02 2.6.15-51-server soft lockup on cpu#0 on shutdown/umount xfs partition on /dev/md0

Bug #191182 reported by Mark Carey
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Binary package hint: linux-image-2.6.15-51-server

/home is on an XFS partition on /dev/md0, a RAID1 array, and is shared via NFS.

I get sporadic soft lockups when trying to shut down my 6.06.02 machine. The console shows: BUG: Soft lockup detected on CPU#0!

root@jersey:/var/cache/apt/archives# uname -a
Linux jersey 2.6.15-51-server #1 SMP Thu Dec 6 21:37:18 UTC 2007 i686 GNU/Linux

root@jersey:/var/cache/apt/archives# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 2000+
stepping : 2
cpu MHz : 1674.468
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3351.78

root@jersey:/var/cache/apt/archives# cat /proc/meminfo
MemTotal: 1035672 kB
MemFree: 331988 kB
Buffers: 9960 kB
Cached: 621828 kB
SwapCached: 0 kB
Active: 73628 kB
Inactive: 565964 kB
HighTotal: 131008 kB
HighFree: 168 kB
LowTotal: 904664 kB
LowFree: 331820 kB
SwapTotal: 979956 kB
SwapFree: 979956 kB
Dirty: 152 kB
Writeback: 0 kB
Mapped: 14332 kB
Slab: 54516 kB
CommitLimit: 1497792 kB
Committed_AS: 20124 kB
PageTables: 480 kB
VmallocTotal: 118776 kB
VmallocUsed: 5136 kB
VmallocChunk: 113164 kB

I can sometimes reproduce it if I umount /home from the terminal rather than letting the shutdown scripts do it for me, e.g. by running sync, then /etc/init.d/nfs-kernel-server stop, then umount /home.

However, sometimes the machine manages to shut down on its own.

The system appears responsive to Alt+SysRq+... but doesn't always give the expected output.

I have twice gotten a backtrace out of the machine, which looks like this (written down from photos):

Pid: 6345, comm: umount
EIP: 0060:[<f8c67217>] CPU: 0
EIP is at xfs_iflush_all+0x57/0x90 [xfs]
 EFLAGS: 00000202 Not tainted (2.6.15-51-server)
EAX: f3fd9104 EBX: c1bf7130 ECX: 00000000 EDX: f3fd9104
ESI: c1bf7000 EDI: f79b4de8 EBP: 00000001 DS: 007b ES: 007b
CR0: 8005003b CR2: 08056400 CR3: 1fec72c0 CR4: 000006b0
 [<f8c78414>] cfs_umountfs+0x14/0xf0 [xfs]
 [<f8c9573b>] linvfs_destroy_inode+0x1b/0x20 [xfs]
 [<f8c7ff22>] xfs_umount+0x132/0x1c0 [xfs]
 [<f8c95e6b>] linvfs_put_super+0x4b/0x90 [xfs]
 [<c0179a63>] generic_shutdown_super+0x93/0x140
 [<c017a69e>] kill_block_super+0x2e/0x50
 [<c017991a>] deactivate_super+0x7a/0xa0
 [<c019253f>] sys_umount+0x3f/0xa0
 [<c01925b5>] sys_oldumount+0x15/0x20
 [<c0103313>] sysenter_past_esp+0x54/0x75

I am not sure whether this is because I tried to unmount the filesystem with Alt+SysRq+U to help the machine shut down.

Once I managed to get a list of running tasks, which showed that only:

xfsbufd
rc
S40umountfs
S40umountfs
umount

were running.

Any more information I can supply to help?

Revision history for this message
Mark Carey (careym) wrote :

I have been having some success lately with:

sync
/etc/init.d/nfs-kernel-server stop
sync
umount /home
halt

I don't recall getting the crash when shutting down this way. The crucial element (I suspect) is the sync between stopping NFS and unmounting /home.
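The sequence above could be wrapped in a small script so the extra sync is never forgotten. This is only a sketch of the workaround, not a fix for the lockup itself. The function names `run` and `shutdown_sequence` and the `DRY_RUN` variable are hypothetical, introduced here for illustration. The sketch defaults to printing the commands rather than running them, since the last step halts the machine.

```shell
#!/bin/sh
# Sketch of the manual shutdown workaround described above.
# run, shutdown_sequence and DRY_RUN are hypothetical names.
# DRY_RUN defaults to 1: commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: only show what would be executed.
        echo "$*"
    else
        "$@"
    fi
}

shutdown_sequence() {
    # Stop at the first failing step so halt is never reached
    # with /home still mounted.
    run sync &&
    run /etc/init.d/nfs-kernel-server stop &&
    run sync &&
    run umount /home &&
    run halt
}
```

After sourcing the script, calling `shutdown_sequence` prints the five steps for review. Calling `DRY_RUN=0 shutdown_sequence` as root would actually execute them, ending with halt.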

Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Mark,
There is a possible kernel race condition which may occur on this kernel with nfs mounts.
See Bug #58170
Compare your kernel log with the one in this bug.

Changed in linux-source-2.6.15:
status: New → Incomplete
Mark Carey (careym) wrote :

Looking at comment 0 of that bug, the backtrace appears to be quite
different. The backtrace at the link
http://www.uwsg.iu.edu/hypermail/linux/kernel/0603.0/0378.html in
comment 12 appears to be similar.

Is this fix going to appear in a 6.06 kernel, or is it too close to
8.04, which I am planning on upgrading to when released anyway?

Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Mark,
I can't say. It's for the Kernel Team to make that assessment.
If you do move to Hardy, can you report here whether this bug is still an issue on Hardy?
Thanks.

Mark Carey (careym) wrote :

This does not appear to occur in Hardy Heron Server.

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Michele Mangili (mangilimic) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in linux:
status: Incomplete → Invalid