system freeze ("hard LOCKUP"), warn_slowpath_common+0x7f in sock_aio_read

Bug #917668 reported by Jani Uusitalo
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

I've had sporadic freezes on this system since upgrading to Precise, but until now I haven't had the means to try to ssh in during one, so I can't tell whether this particular lockup has occurred before. Now that I have an additional laptop I can hopefully catch it again if it recurs.

I screwed up with the logs first, but then managed to salvage dmesg with a Trace from terminal buffer. I'll attach it separately below.

I wasn't active on the desktop when the freeze happened (had my hands on the laptop). At least Chromium, Transmission and Google Music were running at the time.

(I'm not familiar with Traces so I'll just pick something from the first line as title. Feel free to change it if needed.)

[19199.961223] ------------[ cut here ]------------
[19199.961223] WARNING: at /build/buildd/linux-3.2.0/kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0()
[19199.961223] Hardware name: System Product Name
[19199.961223] Watchdog detected hard LOCKUP on cpu 0
[19199.961223] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat sit tunnel4 bnep rfcomm binfmt_misc dm_crypt snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi usbhid snd_rawmidi snd_seq_midi_event btusb bluetooth hid snd_seq snd_timer snd_seq_device snd sp5100_tco soundcore snd_page_alloc i2c_piix4 edac_core shpchp dm_multipath ppdev asus_atk0110 edac_mce_amd k10temp mac_hid parport_pc lp parport usb_storage uas dm_raid45 xor dm_mirror dm_region_hash dm_log radeon ttm drm_kms_helper drm r8169 wmi i2c_algo_bit pata_atiixp
[19199.961223] Pid: 1387, comm: Xorg Not tainted 3.2.0-9-generic #16-Ubuntu
[19199.961223] Call Trace:
[19199.961223] <NMI> [<ffffffff810651df>] warn_slowpath_common+0x7f/0xc0
[19199.961223] [<ffffffff810652d6>] warn_slowpath_fmt+0x46/0x50
[19199.961223] [<ffffffff8101adf9>] ? sched_clock+0x9/0x10
[19199.961223] [<ffffffff810d5fba>] watchdog_overflow_callback+0x9a/0xc0
[19199.961223] [<ffffffff81110736>] __perf_event_overflow+0x96/0x1e0
[19199.961223] [<ffffffff8110dcc1>] ? perf_event_update_userpage+0x11/0xc0
[19199.961223] [<ffffffff81110c84>] perf_event_overflow+0x14/0x20
[19199.961223] [<ffffffff81023fc7>] x86_pmu_handle_irq+0xd7/0x120
[19199.961223] [<ffffffff816559c1>] perf_event_nmi_handler+0x21/0x30
[19199.961223] [<ffffffff81655289>] default_do_nmi+0x69/0x220
[19199.961223] [<ffffffff816554c0>] do_nmi+0x80/0x90
[19199.961223] [<ffffffff816548b0>] nmi+0x20/0x30
[19199.961223] [<ffffffff8103ba95>] ? __ticket_spin_lock+0x25/0x30
[19199.961223] <<EOE>> [<ffffffff8103bb29>] default_spin_lock_flags+0x9/0x10
[19199.961223] [<ffffffff816540fe>] _raw_spin_lock_irqsave+0x2e/0x40
[19199.961223] [<ffffffff81088cc0>] remove_wait_queue+0x20/0x70
[19199.961223] [<ffffffff811890d8>] poll_freewait+0x98/0xe0
[19199.961223] [<ffffffff81189c17>] do_select+0x557/0x600
[19199.961223] [<ffffffff8105d99e>] ? try_to_wake_up+0x18e/0x200
[19199.961223] [<ffffffff81189120>] ? poll_freewait+0xe0/0xe0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189210>] ? __pollwait+0xf0/0xf0
[19199.961223] [<ffffffff81189eac>] core_sys_select+0x1ec/0x370
[19199.961223] [<ffffffff815241ad>] ? sock_aio_read+0x2d/0x40
[19199.961223] [<ffffffff81175c02>] ? do_sync_read+0xd2/0x110
[19199.961223] [<ffffffff8130fc24>] ? timerqueue_del+0x34/0x90
[19199.961223] [<ffffffff8108c6a0>] ? __remove_hrtimer+0x60/0xc0
[19199.961223] [<ffffffff8108cc80>] ? hrtimer_try_to_cancel+0x50/0xc0
[19199.961223] [<ffffffff8101a6f9>] ? read_tsc+0x9/0x20
[19199.961223] [<ffffffff81092f3d>] ? ktime_get_ts+0xad/0xe0
[19199.961223] [<ffffffff8118a280>] sys_select+0xc0/0x100
[19199.961223] [<ffffffff8165c3c2>] system_call_fastpath+0x16/0x1b
[19199.961223] ---[ end trace 104ea42b28f227b7 ]---

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: xserver-xorg-video-radeon 1:6.14.99~git20110811.g93fc084-0ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-9.16-generic 3.2.1
Uname: Linux 3.2.0-9-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 1.90-0ubuntu2
Architecture: amd64
CheckboxSubmission: 09ae689090491ca53449589269e4bfd8
CheckboxSystem: edda5d4f616ca792bf437989cb597002
CompizPlugins: [core,bailer,detection,composite,opengl,decor,resize,compiztoolbox,place,gnomecompat,regex,grid,mousepoll,snap,move,wall,imgpng,vpswitch,session,unitymtgrabhandles,animation,fade,workarounds,expo,scale,ezoom,unityshell]
CompositorRunning: None
Date: Tue Jan 17 16:09:07 2012
DistUpgraded: Log time: 2011-11-20 23:52:49.660633
DistroCodename: precise
DistroVariant: ubuntu
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GraphicsCard:
 ATI Technologies Inc RS780 [Radeon HD 3200] [1002:9610] (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Device [1043:82f1]
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: System manufacturer System Product Name
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-9-generic root=UUID=bf0f4d7e-d08c-4aad-8143-b6e3b16049d6 ro quiet splash radeon.audio=1 vt.handoff=7
SourcePackage: xserver-xorg-video-ati
UpgradeStatus: Upgraded to precise on 2012-01-17 (0 days ago)
dmi.bios.date: 08/12/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2101
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M4A78-EM
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2101:bd08/12/2010:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM4A78-EM:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
version.compiz: compiz 1:0.9.6+bzr20110929-0ubuntu8
version.ia32-libs: ia32-libs 20090808ubuntu33
version.libdrm2: libdrm2 2.4.30-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu4
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu4
version.xserver-xorg-core: xserver-xorg-core 2:1.10.4-1ubuntu6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1

Jani Uusitalo (uusijani) wrote :

I can't tell if you can see it from the Apport-attached files, so I'll add that I'm running Unity 2D here.

Bryce Harrington (bryce)
description: updated
summary: - Xorg freeze ("hard LOCKUP"), warn_slowpath_common+0x7f
+ Xorg freeze ("hard LOCKUP"), warn_slowpath_common+0x7f in sock_aio_read
summary: - Xorg freeze ("hard LOCKUP"), warn_slowpath_common+0x7f in sock_aio_read
+ system freeze ("hard LOCKUP"), warn_slowpath_common+0x7f in
+ sock_aio_read
Bryce Harrington (bryce) wrote :

While it may feel like an Xorg freeze, actually from your logs I'm not seeing anything that indicates X is complicit. Retargeting to the kernel.

sock_aio_read() sounds like it might be happening in some asynchronous I/O. However, the dmesg you posted only starts at 19199 sec, so it might be missing some pertinent information. I would suggest reproducing it again and collecting dmesg quickly after the problem starts, so that it shows the period both before and after the fault started.
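
When reading a trace like the one in the description, it helps to remember that frames marked with `?` are addresses the unwinder found on the stack but could not confirm as part of the active call chain, so `sock_aio_read` (which is marked `?`) may simply be a stale entry. A minimal sketch that separates the two kinds of frames, assuming the dmesg format shown in this report; the helper name `split_frames` is hypothetical:

```python
import re

# Matches frames such as "[<ffffffff81189c17>] ? do_select+0x557/0x600".
# Group 1 is set when the frame carries the "?" (unreliable) marker.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def split_frames(trace_text):
    """Return (reliable, unreliable) lists of symbol names, in trace order."""
    reliable, unreliable = [], []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue
        (unreliable if m.group(1) else reliable).append(m.group(2))
    return reliable, unreliable

# A few lines copied from the trace in the bug description.
SAMPLE = """\
[19199.961223] <<EOE>> [<ffffffff8103bb29>] default_spin_lock_flags+0x9/0x10
[19199.961223] [<ffffffff816540fe>] _raw_spin_lock_irqsave+0x2e/0x40
[19199.961223] [<ffffffff81088cc0>] remove_wait_queue+0x20/0x70
[19199.961223] [<ffffffff811890d8>] poll_freewait+0x98/0xe0
[19199.961223] [<ffffffff81189c17>] do_select+0x557/0x600
[19199.961223] [<ffffffff8105d99e>] ? try_to_wake_up+0x18e/0x200
[19199.961223] [<ffffffff81189eac>] core_sys_select+0x1ec/0x370
[19199.961223] [<ffffffff815241ad>] ? sock_aio_read+0x2d/0x40
[19199.961223] [<ffffffff8118a280>] sys_select+0xc0/0x100
"""

reliable, unreliable = split_frames(SAMPLE)
print("reliable:  ", " <- ".join(reliable))
print("unreliable:", ", ".join(unreliable))
```

On this sample the reliable chain runs from sys_select down to the spin lock taken in remove_wait_queue, which suggests the CPU was spinning on a wait-queue lock during select() cleanup rather than inside sock_aio_read itself.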

affects: xserver-xorg-video-ati (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Incomplete
Jani Uusitalo (uusijani) wrote :

Alright, I will. Thanks Bryce.

Jani Uusitalo (uusijani) wrote :

While waiting for this to reoccur, now that I came to think of it: I have radeon.audio=1 on my kernel parameters due to Bug #864735. Radeon audio is considered too buggy by developers to be enabled by default, so my force-enabling it definitely makes it a suspect here.
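
For anyone wanting to confirm which options a kernel was actually booted with, the command line can be split into name/value pairs. A minimal sketch using the ProcKernelCmdLine value from this report as sample input (on a live system the string would be read from /proc/cmdline); the helper name `parse_cmdline` is hypothetical, and the simple whitespace split assumes no quoted values:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=..." keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

# ProcKernelCmdLine as attached by Apport in the bug description.
SAMPLE = ("BOOT_IMAGE=/boot/vmlinuz-3.2.0-9-generic "
          "root=UUID=bf0f4d7e-d08c-4aad-8143-b6e3b16049d6 "
          "ro quiet splash radeon.audio=1 vt.handoff=7")

params = parse_cmdline(SAMPLE)
print("radeon.audio =", params.get("radeon.audio"))
```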

Joseph Salisbury (jsalisbury) wrote :

Do you know if this issue happened in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.3 kernel[1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.3-rc2 fixed the issue, the tag would be: 'kernel-fixed-upstream-v3.3-rc2'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-rc2-precise/
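
The tagging rules in this comment can be summarised as a small decision helper. This is only an illustration of the convention described here; the function name `upstream_tag` and its parameters are hypothetical and not part of any Launchpad tooling:

```python
def upstream_tag(tested, fixed=None, version=None):
    """Return the tag to add after a mainline kernel test.

    tested  -- False if the mainline kernel could not be tested (e.g. no boot)
    fixed   -- True if the bug is gone on the mainline kernel, False if not
    version -- mainline version string such as "3.3-rc2" (needed when fixed)
    """
    if not tested:
        return "kernel-unable-to-test-upstream"
    if fixed:
        if not version:
            raise ValueError("version is required when the bug is fixed")
        return f"kernel-fixed-upstream-v{version}"
    return "kernel-bug-exists-upstream"

print(upstream_tag(tested=True, fixed=True, version="3.3-rc2"))
print(upstream_tag(tested=True, fixed=False))
print(upstream_tag(tested=False))
```

In every case the 'needs-upstream-testing' tag is removed once testing is done.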

tags: added: needs-upstream-testing
Jani Uusitalo (uusijani) wrote :

@jsalisbury: On this setup, there were seemingly similar freezes before Precise (I was using Lucid until then), but being so far apart and without a reliable recipe for reproducing, I mostly just ignored the issue. To give a clue as to the rarity, an (again seemingly) similar freeze happened just now, for the first time since I reported the bug, so if it's the same issue, it's been in hiding for over a month.

Unfortunately the logs didn't have anything about this crash, and I couldn't ssh in either. As I have yet to gather any substantial data apart from the little I posted above, there's no way of knowing whether it's always been the same issue or not. The symptom on the surface has always been very similar, but I guess that's true for most freezes, even if brought on by unrelated causes.

I'm not afraid of testing the mainline kernel per se, but I'm hesitant because with this occurrence rate, wouldn't I be trying to prove a negative? Would 2 months without the issue constitute a 'kernel-fixed-upstream'? 6 months? Also, should I install v3.3-rc2-precise as you suggested, or the more recent 3.3-rc4-precise now? If the more recent one, should I then stick to it, or keep upgrading as new mainline kernels are built?
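
The "proving a negative" concern can be put into rough numbers. A minimal sketch, assuming freezes arrive as a Poisson process at about one per month; that rate is purely an illustrative guess based on the "in hiding for over a month" remark, not a measurement. Under that assumption, the chance of seeing no freeze in t months on a kernel that still has the bug is exp(-rate * t):

```python
import math

def p_quiet(months, rate_per_month=1.0):
    """Probability of zero freezes in `months` if the bug is still present.

    Assumes a Poisson process; rate_per_month is an illustrative guess.
    """
    return math.exp(-rate_per_month * months)

for months in (2, 6):
    print(f"{months} months without a freeze: "
          f"{p_quiet(months):.1%} chance this is just luck")
```

With the assumed rate, two quiet months would still happen by chance roughly 13% of the time, while six quiet months would be very unlikely unless the bug were actually gone. A lower true rate would make any quiet period correspondingly weaker evidence.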

Joseph Salisbury (jsalisbury) wrote :

It would be great if you could test with 3.3-rc4-precise. This could also test if the issue in bug 938894 is resolved upstream.

Ming Lei (tom-leiming) wrote : Re: [Bug 917668] Re: system freeze ("hard LOCKUP"), warn_slowpath_common+0x7f in sock_aio_read

I think the bug has been fixed in 3.3-rc4, and the commit is below:

commit bced76aeaca03b45e3b4bdb868cada328e497847
Author: Peter Zijlstra <email address hidden>
Date: Wed Jan 11 13:11:12 2012 +0100

    sched: Fix lockup by limiting load-balance retries on lock-break

Bryce, could you report back the test result with -rc4?


Jani Uusitalo (uusijani) wrote :

@jsalisbury: Alright, thanks. I'm running 3.3.0-030300rc4-generic now. Let's see how it works out!

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Confirmed
tags: added: kernel-da-key
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-17.26)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-17.26
Jani Uusitalo (uusijani) wrote :

I've just had a GPU lockup with 3.3.0-030300rc4-generic. Would this be yet another issue (i.e. not this bug nor bug #938894)? If so, where (if anywhere) should I file it? I'm pasting below here what was in syslog about it.

Feb 26 19:53:38 saegusa kernel: [ 8031.040107] radeon 0000:01:05.0: GPU lockup CP stall for more than 419884msec
Feb 26 19:53:38 saegusa kernel: [ 8031.040113] GPU lockup (waiting for 0x00103023 last fence id 0x00103022)
Feb 26 19:53:38 saegusa kernel: [ 8031.040125] [drm] Disabling audio support
Feb 26 19:53:38 saegusa kernel: [ 8031.041548] radeon 0000:01:05.0: GPU softreset
Feb 26 19:53:38 saegusa kernel: [ 8031.041550] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Feb 26 19:53:38 saegusa kernel: [ 8031.041551] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Feb 26 19:53:38 saegusa kernel: [ 8031.041553] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20000040
Feb 26 19:53:38 saegusa kernel: [ 8031.041559] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb 26 19:53:38 saegusa kernel: [ 8031.056444] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb 26 19:53:38 saegusa kernel: [ 8031.072335] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
Feb 26 19:53:38 saegusa kernel: [ 8031.072337] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
Feb 26 19:53:38 saegusa kernel: [ 8031.072339] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20008040
Feb 26 19:53:38 saegusa kernel: [ 8031.073333] radeon 0000:01:05.0: GPU reset succeed
Feb 26 19:53:38 saegusa kernel: [ 8031.094096] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
Feb 26 19:53:38 saegusa kernel: [ 8031.094153] radeon 0000:01:05.0: WB enabled
Feb 26 19:53:38 saegusa kernel: [ 8031.094156] [drm] fence driver on ring 0 use gpu addr 0xa0000c00 and cpu addr 0xffff88020f609c00
Feb 26 19:53:38 saegusa kernel: [ 8031.127631] [drm] ring test on 0 succeeded in 1 usecs
Feb 26 19:53:38 saegusa kernel: [ 8031.127656] [drm] ib test on ring 0 succeeded in 1 usecs
Feb 26 19:53:38 saegusa kernel: [ 8031.127659] [drm] Enabling audio support

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test with RC6 disabled?

To disable RC6, do the following:
Hold down Left-Shift key to enter the Grub menu on boot
Hit 'e' to edit the kernel command line
Append i915.i915_enable_rc6=0 as a kernel boot parameter
For example: linux /boot/vmlinuz-3.2.0-17-generic <...> quiet splash vt.handoff=7 i915.i915_enable_rc6=0
Press Ctrl-x to boot

Joseph Salisbury (jsalisbury) wrote :

This is a test for the GPU lockup you saw.

Joseph Salisbury (jsalisbury) wrote :

Actually no need to test the RC6 option. It doesn't look like you have Intel graphics.

Jani Uusitalo (uusijani) wrote :

@jsalisbury: Yeah, my Intel hardware's got its own set of problems. :) I'll get back to doing tests on those later this week.

Bryn Hughes (linux-nashira) wrote :

I've been experiencing the same thing on my Lenovo W510. It has an nVidia card as opposed to the Radeon above, but I've had the same symptoms and the same type of dmesg errors listed. My dmesg is attached below.

It got REALLY bad after the last Ubuntu kernel update (3.2.0-20.32) - like the system is almost unusable. I am updating to 3.2.0-20.33 now, will see if it is any better.

Jani Uusitalo (uusijani) wrote :

@d3mia7, maybe you could try 3.3 too? FWIW, I haven't seen this since installing 3.3.0-030300rc4-generic a month ago now. If your system's prone to freeze this way, testing 3.3 could provide further evidence as to whether the issue's fixed upstream.

Bryn Hughes (linux-nashira) wrote :

I tried 3.2.0-20.33 and it was just as bad as 3.2.0-20.32 - couldn't run for more than an hour or so before I got a lockup.

I then went to 3.3.0-030300-generic (the released one as opposed to the RC) and I haven't had an issue yet. Been over 2 hours now, though with some of the earlier 3.2.0's I could run this long without issue.

I will update in a few days if things are continuing to work!

Joseph Salisbury (jsalisbury) wrote :

Would it be possible to test the latest 3.2 upstream stable kernel? That kernel can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.13-precise/

tags: added: kernel-fixed-upstream
removed: needs-upstream-testing
Bryn Hughes (linux-nashira) wrote :

@Joseph - sure thing. I've loaded up 3.2.13-030213-generic and will run with it for a while.

Bryn Hughes (linux-nashira) wrote :

After 17 hours on 3.2.13-030213-generic I have had no issues. Whatever the problem was I suspect it is fixed. If I have any further issues with 3.2.13-030213 I'll report back.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-20.33)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-20.33
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Bryn Hughes (linux-nashira) wrote :

I just got a hard lock with 3.2.13-030213-generic... I'm back on 3.3 now.

Changed in linux (Ubuntu):
status: Fix Released → Triaged
Bryn Hughes (linux-nashira) wrote :

While stability is much better on 3.3.0, I did experience this same hard lock today. The machine had been online for only a few hours. I ran several days previously without issue though.

DMESG output after the crash is attached.

Joseph Salisbury (jsalisbury) wrote :

In your latest dmesg output, I see the following:
RIP: 0010:[<ffffffff8103ec22>] [<ffffffff8103ec22>] __ticket_spin_lock+0x22/0x30

This appears to be a duplicate of bug 922906

Bryn Hughes (linux-nashira) wrote :

Hi Joseph, are you sure they are related? The other bug appears to be focused around removal of USB devices.

Joseph Salisbury (jsalisbury) wrote :

It could be that you are hitting both bugs. I will un-mark this as a duplicate to further investigate.

Jani Uusitalo (uusijani) wrote :

Just to note here also that I've today switched to apw's build of 3.2.0-23 [1] linked to from bug 922906. I had been running the upstream 3.3 for more than 7 weeks without hitting this (#917668). The GPU lockup I mentioned in #13 also never reoccurred since that one time.

I'll report back, should #917668 resurface with the kernel I'm currently running.

* [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/922906/comments/11

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Jani. Please mark this bug "Fix released" if you no longer hit the bug with the new kernel.

Jani Uusitalo (uusijani) wrote :

Sorry, I had forgotten about this one after it vanished and was only reminded by a private email from someone suffering something similar.

I went through my collection of panic photos and (as my recollection also was) there seem to have been none of this 'warn_slowpath_common' kind since I last commented.

Except for one just a week ago, on completely new hardware: this one with 3.8.0 rc2 when I was testing it wrt Bug #1096802, which turned out to be caused by bad card reader firmware. It was tied to usb-storage as most if not all of the panics caused by the firmware problem, so it was most likely another symptom of that, but I'm posting that one here too just in case it still contains a hint of the conditions under which 'warn_slowpath_common' can occur.

Meanwhile, I'm marking this as fixed as per Joseph's request above. For the record, as far as I'm concerned, installing a 3.3 or newer series kernel was a definite fix for this issue.

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Roman (m01brv) wrote :

I have experienced apparently the same hang-up today with kernel 3.5.0-22 of KUbuntu 12.10
I must note it happened previously from time to time, but I assumed this is some hardware issue
until I looked in the linux logs.
The relevant part of the ``syslog'' file looks like:

Jan 21 17:33:27 Eridanus kernel: [957585.389930] ------------[ cut here ]------------
Jan 21 17:33:27 Eridanus kernel: [957585.389939] WARNING: at /build/buildd/linux-3.5.0/kernel/watchdog.c:242 watchdog_overflow_callback+0x9a/0xc0()
Jan 21 17:33:27 Eridanus kernel: [957585.389941] Hardware name: To be filled by O.E.M.
Jan 21 17:33:27 Eridanus kernel: [957585.389942] Watchdog detected hard LOCKUP on cpu 4
Jan 21 17:33:27 Eridanus kernel: [957585.389943] Modules linked in: xt_recent btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 parport_pc ppdev bnep rfcomm bluetooth binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype arc4 xt_state ip6table_filter ip6_tables nf_conntrack_netbios_ns eeepc_wmi asus_wmi mxm_wmi sparse_keymap nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables kvm_amd x_tables kvm ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_codec_realtek psmouse edac_core edac_mce_amd serio_raw snd_seq_midi k10temp fam15h_power snd_rawmidi snd_hda_intel snd_hda_codec snd_seq_midi_event snd_hwdep rtl8192ce rtl8192c_common snd_seq rtlwifi snd_pcm nvidia(PO) snd_seq_device mac80211 snd_timer sp5100_tco snd i2c_piix4 cfg80211 soundcore snd_page_alloc wmi mac_hid it87 hwmon_vid lp parport
uas usb_storage hid_generic microcode r8169 usbhid hid
Jan 21 17:33:27 Eridanus kernel: [957585.390004] Pid: 1517, comm: wpa_supplicant Tainted: P W O 3.5.0-21-generic #32-Ubuntu
Jan 21 17:33:27 Eridanus kernel: [957585.390005] Call Trace:
Jan 21 17:33:27 Eridanus kernel: [957585.390006] <NMI> [<ffffffff81051c1f>] warn_slowpath_common+0x7f/0xc0
Jan 21 17:33:27 Eridanus kernel: [957585.390015] [<ffffffff81051d16>] warn_slowpath_fmt+0x46/0x50
Jan 21 17:33:27 Eridanus kernel: [957585.390018] [<ffffffff8108a255>] ? sched_clock_cpu+0xc5/0x120
Jan 21 17:33:27 Eridanus kernel: [957585.390021] [<ffffffff810de320>] ? touch_nmi_watchdog+0x80/0x80
Jan 21 17:33:27 Eridanus kernel: [957585.390024] [<ffffffff810de3ba>] watchdog_overflow_callback+0x9a/0xc0
Jan 21 17:33:27 Eridanus kernel: [957585.390028] [<ffffffff8111a63d>] __perf_event_overflow+0x9d/0x230
Jan 21 17:33:27 Eridanus kernel: [957585.390030] [<ffffffff81117b04>] ? perf_event_update_userpage+0x24/0x110
Jan 21 17:33:27 Eridanus kernel: [957585.390033] [<ffffffff8111b154>] perf_event_overflow+0x14/0x20
Jan 21 17:33:27 Eridanus kernel: [957585.390037] [<ffffffff81024643>] x86_pmu_handle_irq+0xe3/0x130
Jan 21 17:33:27 Eridanus kernel: [957585.390041] [<ffffffff8168539d>] perf_event_nmi_handler+0x1d/0x20
Jan 21 17:33:27 Eridanus kernel: [957585.390043] [<ffffffff81684b41>] nmi_handle.isra.0+0x51/0x80
Jan 21 17:33:27 Eridanus kernel: [957585.390142] [<ffffffffa0746a20>] ?...


Roman (m01brv) wrote :

Please reopen this bug since it does not seem fixed.

mejean (jerome-bussiere) wrote :

I think I have the same issue :
[ 77761.6005] WARNING : at /build/buildd/Linux-3.2.0/kernel/watchdog:241 watchdog-overflow-callback +0x9a/0xc0()
[ 77761.6005] Hardware name : System Product Name
[ 77761.6005] Watchdog detected hard LOCK on cpu 2
[ 77761.6005] Module linked in: btrfs zlibdeflate libcrc32c ufc qnx4 ...

My configuration :
Ubuntu 12.04.1 Precise Pangolin (64bits)
Asus M4A89GTD PRO (with integrated Radeon HD 4290)
AMD Phenom II X4 955 processor

mejean (jerome-bussiere) wrote :

I confirm that the bug comes from the ATI/AMD drivers.

With the generic drivers I can shut down the PC!

Thanks.

Roman (m01brv) wrote :

The freezes still occur from time to time on my computer, although my kernel is from the 3.5 series.
The reports in kern.log are similar to the one in my post #33 above, though they are often
truncated at some early position after the words "Modules linked in...".
The system hangs without any response and needs to be rebooted.
My system is KUbuntu 12.10 with all current updates.
MB Asus M5A990FX Pro R2.0, CPU AMD FX 8150, GPU GeForce 210

I've stress-tested the system with prime95 several times, and it looked stable.
Besides, it does not matter whether the CPU is under load or idle: the freezes may occur in either case.
So I believe it is not a hardware failure/instability.
