xen virtual Machines and Dom0 crashes BUG: soft lockup - CPU#0 stuck for 11s! [savelog:]; EIP is at _spin_lock+0x7/0x10

Bug #259487 reported by Maiquel
This bug affects 10 people
Affects              Status        Importance  Assigned to
linux (Ubuntu)       Incomplete    Undecided   Unassigned
linux-meta (Debian)  Fix Released  Unknown

Bug Description

Ubuntu 8.04
uname -r
2.6.24-19-xen

syslog:
[...]
kernel: BUG: soft lockup - CPU#0 stuck for 11s! [savelog:]
kernel: BUG: soft lockup - CPU#1 stuck for 11s! [postgres:]
kernel: BUG: soft lockup - CPU#2 stuck for 11s! [mysql:]
kernel: BUG: soft lockup - CPU#3 stuck for 11s! [syslog:]
Pid: 11194, comm: savelog Tainted: G B D (2.6.24-19-xen #2)
EIP: 0061:[dm_mod:_spin_lock+0x7/0x10] EFLAGS: 00000282 CPU: 0
EIP is at _spin_lock+0x7/0x10
EAX: c1daf2ec EBX: 00000000 ECX: 17097000 EDX: 00000000

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
stepping : 7
cpu MHz : 2400.029
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4803.37
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
stepping : 7
cpu MHz : 2400.029
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4800.11
clflush size : 64

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
stepping : 7
cpu MHz : 2400.029
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4800.11
clflush size : 64

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
stepping : 7
cpu MHz : 2400.029
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 4800.12
clflush size : 64

# lspci
00:00.0 Host bridge: nVidia Corporation C55 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a2)
00:00.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:00.7 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:01.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:02.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
00:03.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
00:07.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
01:00.0 VGA compatible controller: ATI Technologies Inc RV380 0x3e50 [Radeon X600]
01:00.1 Display controller: ATI Technologies Inc RV380 [Radeon X600] (Secondary)
03:08.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev c0)

The system freezes under high I/O load: the virtual machines lock up first, and Dom0 follows.

Revision history for this message
Bastian Mäuser (mephisto-mephis) wrote :

I have the same problem with Xen/Hardy/i386:

dom0: Linux dom0-1 2.6.24-19-xen #1 SMP Thu Aug 21 03:09:02 UTC 2008 i686 GNU/Linux
domU: Linux nmail.XXXX 2.6.24-19-xen #1 SMP Thu Aug 21 03:09:02 UTC 2008 i686 GNU/Linux

It crashes every night.

kern.log:Aug 28 11:33:56 nmail kernel: [55657.356419] BUG: soft lockup - CPU#1 stuck for 11s! [courierpop3logi:29319]
kern.log:Aug 28 11:34:08 nmail kernel: [55668.995642] BUG: soft lockup - CPU#1 stuck for 11s! [courierpop3logi:29319]
kern.log:Aug 28 11:34:19 nmail kernel: [55680.677142] BUG: soft lockup - CPU#1 stuck for 11s! [courierpop3logi:29319]

Meanwhile I have reinstalled Xen several times, reinstalled the domUs, and used three different HP servers (one brand new), so it must be a problem with Hardy.

Obviously the Xen kernel shipped with Hardy is unusable for production.

I have plenty of other Xen systems running reliably, but none on Hardy.

Revision history for this message
Bastian Mäuser (mephisto-mephis) wrote :

Additional Crash Info (domU):

Aug 28 12:14:08 nmail kernel: [ 1109.970322] smtpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Aug 28 12:14:09 nmail kernel: [ 1109.970328] Pid: 9500, comm: smtpd Not tainted 2.6.24-19-xen #1
Aug 28 12:14:09 nmail kernel: [ 1109.970335] [<c01606ca>] oom_kill_process+0x10a/0x120
Aug 28 12:14:09 nmail kernel: [ 1109.970344] [<c0160ac7>] out_of_memory+0x167/0x1a0
Aug 28 12:14:09 nmail kernel: [ 1109.970348] [<c016313c>] __alloc_pages+0x35c/0x390
Aug 28 12:14:09 nmail kernel: [ 1109.970352] [<c016528d>] __do_page_cache_readahead+0x11d/0x250
Aug 28 12:14:09 nmail kernel: [ 1109.970355] [<c015d370>] sync_page+0x0/0x40
Aug 28 12:14:09 nmail kernel: [ 1109.970359] [<c01657cc>] do_page_cache_readahead+0x4c/0x70
Aug 28 12:14:09 nmail kernel: [ 1109.970362] [<c015fbc4>] filemap_fault+0x2f4/0x420
Aug 28 12:14:09 nmail kernel: [ 1109.970366] [<c016b9cf>] __do_fault+0x6f/0x6b0
Aug 28 12:14:09 nmail kernel: [ 1109.970372] [<c0170c69>] handle_mm_fault+0x249/0x1350
Aug 28 12:14:09 nmail kernel: [ 1109.970377] [<c0162456>] __pagevec_free+0x26/0x30
Aug 28 12:14:09 nmail kernel: [ 1109.970381] [<c0329346>] do_page_fault+0x366/0xe90
Aug 28 12:14:09 nmail kernel: [ 1109.970387] [<c01165fb>] check_pgt_cache+0x1b/0x20
Aug 28 12:14:09 nmail kernel: [ 1109.970391] [<c0173667>] unmap_region+0x107/0x120
Aug 28 12:14:09 nmail kernel: [ 1109.970395] [<c0174250>] do_munmap+0x180/0x1f0
Aug 28 12:14:09 nmail kernel: [ 1109.970398] [<c0328fe0>] do_page_fault+0x0/0xe90
Aug 28 12:14:09 nmail kernel: [ 1109.970401] [<c0327c85>] error_code+0x35/0x40
Aug 28 12:14:09 nmail kernel: [ 1109.970405] [<c0320000>] vcc_getsockopt+0xc0/0x170
Aug 28 12:14:09 nmail kernel: [ 1109.970409] =======================
Aug 28 12:14:09 nmail kernel: [ 1109.970410] Mem-info:
Aug 28 12:14:09 nmail kernel: [ 1109.970412] DMA per-cpu:
Aug 28 12:14:09 nmail kernel: [ 1109.970414] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Aug 28 12:14:09 nmail kernel: [ 1109.970416] CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Aug 28 12:14:09 nmail kernel: [ 1109.970418] Normal per-cpu:
Aug 28 12:14:09 nmail kernel: [ 1109.970420] CPU 0: Hot: hi: 186, btch: 31 usd: 96 Cold: hi: 62, btch: 15 usd: 57
Aug 28 12:14:09 nmail kernel: [ 1109.970423] CPU 1: Hot: hi: 186, btch: 31 usd: 130 Cold: hi: 62, btch: 15 usd: 50
Aug 28 12:14:09 nmail kernel: [ 1109.970424] HighMem per-cpu:
Aug 28 12:14:09 nmail kernel: [ 1109.970426] CPU 0: Hot: hi: 90, btch: 15 usd: 16 Cold: hi: 30, btch: 7 usd: 23
Aug 28 12:14:09 nmail kernel: [ 1109.970428] CPU 1: Hot: hi: 90, btch: 15 usd: 82 Cold: hi: 30, btch: 7 usd: 9
Aug 28 12:14:09 nmail kernel: [ 1109.970432] Active:141460 inactive:105254 dirty:0 writeback:2 unstable:0
Aug 28 12:14:09 nmail kernel: [ 1109.970433] free:4339 slab:2328 mapped:10 pagetables:1966 bounce:0
Aug 28 12:14:09 nmail kernel: [ 1109.970436] DMA free:4088kB min:72kB low:88kB high:108kB active:4056kB inactive:3180kB present:16256kB pages_scanned:12705 all_unreclaimable? yes
Aug 28 12:14:09 nmail kernel: [ 1109.970438] lowmem_reserve[]: 0 ...

Revision history for this message
forall (forall-stalowka) wrote :

Hi

I looked at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478765, which says this bug is fixed in kernel 2.6.25. But when will the Xen-capable kernel be fixed? Only 2.6.24-X with Xen support is currently available in Ubuntu.

Changed in linux-meta:
status: Unknown → Fix Released
Revision history for this message
forall (forall-stalowka) wrote :

Hi

To everybody having problems with the 2.6.24 Xen kernel: I suggest installing the kernel from the Debian Lenny repository:
http://packages.debian.org/lenny/xen-linux-system-2.6.26-1-xen-686

Today I installed this kernel from the Debian repository and so far I have had no problems; the system did not crash when I upgraded the installed packages. I will see, after a longer period of use and load, whether it stays up.
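
A minimal sketch of the steps involved (the mirror hostname is an assumption; substitute your usual Debian mirror):

echo 'deb http://ftp.debian.org/debian lenny main' >> /etc/apt/sources.list
apt-get update
apt-get install xen-linux-system-2.6.26-1-xen-686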

Revision history for this message
AlexKent (lbf-dragonrising) wrote :

Hi Forall,

This is one annoying bug!

Just wondering how you installed the package from lenny into etch?

I added the repository to my etch sources.list and that generated 'merge' errors.

I've downloaded the .deb file itself from the URL you gave, but when I went to install it (with dpkg -i) it produced a lot of dependency errors. Am I meant to just keep downloading .deb files and working through dependencies until they eventually resolve, or is there a better way?

Ta,

Alex
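
A sketch of the usual answer here (assuming the Lenny repository line is present in sources.list, so apt can see the dependencies):

dpkg -i xen-linux-system-2.6.26-1-xen-686_*.deb   # reports the missing dependencies
apt-get -f install                                # then let apt fetch and install them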

Revision history for this message
whs (wolfram-heinen) wrote :

Hi,

I have been seeing this bug since the 2.6.24-16-xen kernel, mostly in some of my running domUs. Today one domU with the 2.6.24-22-xen kernel stopped running while executing 'apt-get update'. This bug is not CPU specific: my systems run on ML110 Xeon DualCore, ML110 P4, ML115 Opteron and DL160 QuadCore systems.
I noticed that increasing the assigned memory size reduces the chance of running into this bug.
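
For reference, the assigned memory is the memory= line in the guest's Xen config file; an illustrative fragment (file name and value are examples only):

# /etc/xen/mydomu.cfg (illustrative)
memory = 1024   # MiB given to the domU; more headroom reportedly lowers the chance of hitting this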

Revision history for this message
Lily (starlily) wrote :

I have a Dell 6650 running whatever the latest xen server image is (and I run update/upgrade/dist-upgrade frequently). One DomU locks up pretty regularly with "BUG: soft lockup - CPU#1 stuck for 11s!", usually during large file transfers. DomU Kernel version is 2.6.24-19. The bug *requires* destroying the DomU and restarting it.
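
Concretely, the recovery cycle with the classic xm toolstack looks like this (the guest name and config path are placeholders):

xm destroy mydomu                # hard-stop the unresponsive guest
xm create /etc/xen/mydomu.cfg    # boot it again from its config file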

It's pretty clear, after reading many bug reports about this, that the problem is in the kernel somewhere (and the kernel team has responded by changing their policy about bug reporting). It is clearly NOT hardware or application specific, as this is reported on many platforms and appears to have no consistent trigger.

Potentially this is related to SMP or PAE, although listing those as the problem area feels like an easy scapegoat, even if it may be true.

I'd really like someone who KNOWS what causes this bug to provide a definitive answer somewhere visible to the public, and if possible a workaround or a target date for the release of a fix.

Thanks!
Lily

Revision history for this message
John Leach (johnleach) wrote :

I managed to reproduce this quite reliably so did some trials to find out how to improve things.

This is a 64-bit dom0 on Xen 3.3.0 (on CentOS) on Dell 2940s. The domU is a 32-bit Hardy box with 1 GB of RAM. With all the available Hardy Xen kernels, this soft lockup kept happening. I also tried "clocksource=jiffies". Then I tried the latest Intrepid kernels and the problem was solved.

As I understand it, the Intrepid kernel has the proper kernel.org upstream Xen support (rather than the forward-ported 2.6.18 patch set that Hardy uses, IIRC). So whilst this solves the problem (for me), it's a pretty big change and isn't something I'd expect to see "backported" to Hardy.

The new upstream Xen support changes the way block devices and the console are handled, so you can't switch without some tweaks to your Xen configs (and guest OS config), but other than that it seemed to work fine with Hardy.
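
For example, the guest config tweaks amount to switching the disk entries to the xvd* naming and pointing the console at hvc0; an illustrative fragment with made-up volume and path names:

disk  = ['phy:vg0/guest-root,xvda,w']   # xvd* naming instead of the old sda/hda
root  = '/dev/xvda ro'
extra = 'console=hvc0'                  # pvops kernels expose the PV console as hvc0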

Incidentally, I replaced this domU with a 64-bit Hardy install, with the standard 64-bit Hardy kernel, and that also solved the soft lockups.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

[This is an automated message. Apologies if it has reached you inappropriately.]

This bug was reported against the linux-meta package when it likely should have been reported against the linux package instead. We are automatically transitioning this to the linux kernel package so that the appropriate teams are notified and made aware of this issue. Thanks.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
tags: added: xen
Revision history for this message
Vikram Dhillon (dhillon-v10) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm it still exists with the most recent Lucid Lynx 10.04 release (http://cdimage.ubuntu.com/releases/lucid/alpha-2/)? If the issue remains in Lucid, please test the latest 2.6.32 upstream kernel build (https://wiki.ubuntu.com/KernelMainlineBuilds). Let us know your results. Thanks.
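
For reference, testing a mainline build boils down to downloading the .deb files for your architecture from the build directory and installing them with dpkg; a sketch (the filename is illustrative, take the real names from the page):

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32/linux-image-2.6.32-020632-generic_2.6.32-020632_i386.deb
sudo dpkg -i linux-image-2.6.32-020632-generic_2.6.32-020632_i386.deb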

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Emmanuel Kasper (emmanuel-kasper) wrote :

Got hit by this bug as well on an 8.04 server:

root@zimbra:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 8.04.4 LTS
Release: 8.04
Codename: hardy

root@zimbra:~# uname -a
Linux zimbra 2.6.24-29-xen #1 SMP Tue Oct 11 15:58:37 UTC 2011 i686 GNU/Linux

As a workaround I disabled SMP in the domU config as suggested here: https://bugs.launchpad.net/ubuntu/hardy/+source/linux/+bug/240071/comments/14

Now it seems to run stably.
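
For the record, that workaround is a one-line change in the domU config file, e.g.:

vcpus = 1   # single virtual CPU; per the linked comment this avoids the SMP-related lockups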

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
tags: added: hardy needs-upstream-testing
removed: xen
tags: added: xen
Revision history for this message
penalvch (penalvch) wrote :

Maiquel, thank you for reporting this bug and helping make Ubuntu better. Please execute the following command, as it will automatically gather debugging information, in a terminal:
apport-collect -p linux 259487

As well, could you please capture the oops following https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Capturing_OOPs ? In addition, according to this report, you are not using the most recent version of this package for your Ubuntu release. Please upgrade to the most recent version as per https://launchpad.net/ubuntu/hardy/+source/linux and let us know if you are still having this issue.
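
If the machine locks up before anything reaches the disk, netconsole is one common way to capture the oops on a second machine; a sketch (all IP addresses and the MAC are placeholders):

modprobe netconsole netconsole=6666@192.168.0.10/eth0,6666@192.168.0.2/00:16:3e:aa:bb:cc
# then, on the receiving host (192.168.0.2):
nc -u -l 6666    # some netcat variants want: nc -u -l -p 6666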

Thanks!

summary: - xen virtual Machines and Dom0 crashes with "BUG: soft lockup -
- CPU#0,CPU#1,CPU#2,CPU#3 stuck for 11s!"
+ xen virtual Machines and Dom0 crashes BUG: soft lockup - CPU#0 stuck for
+ 11s! [savelog:]; EIP is at _spin_lock+0x7/0x10
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-bug