[regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Bug #1140716 reported by luca
This bug affects 393 people
Affects                       Status         Importance   Assigned to   Milestone
DRI                           Won't Fix      Medium
Mesa                          Unknown        Unknown
linux (Debian)                Fix Released   Unknown
linux (Fedora)                Won't Fix      Undecided
linux (Ubuntu)                Invalid        Critical     Unassigned
  Precise                     Fix Released   Critical     Unassigned
  Quantal                     Invalid        Critical     Unassigned
  Raring                      Invalid        Critical     Unassigned
linux-lts-quantal (Ubuntu)    Invalid        Critical     Unassigned
  Precise                     Fix Released   Critical     Unassigned
  Quantal                     Invalid        Critical     Unassigned
  Raring                      Invalid        Critical     Unassigned
linux-lts-raring (Ubuntu)     Invalid        Critical     Unassigned
  Precise                     Invalid        Critical     Unassigned
mesa (Ubuntu)                 Fix Released   Critical     Unassigned
  Precise                     Fix Released   Critical     Unassigned

Bug Description

I'm getting GPU hang errors every minute or so (usually only when using Firefox and scrolling a webpage or similar). I also get an annoying Ubuntu dialog saying there is a "system error".

This didn't happen with 3.5.0-24-generic.


Here is the dmesg:
[15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15628.480570] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15844.231372] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15844.231773] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[20173.232593] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[20173.233211] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.650393] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26285.650980] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.658405] ------------[ cut here ]------------
[26285.658472] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26285.658474] Hardware name: SATELLITE Z830
[26285.658476] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26285.658537] Pid: 23433, comm: kworker/u:0 Not tainted 3.5.0-26-generic #40-Ubuntu
[26285.658539] Call Trace:
[26285.658549] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26285.658553] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26285.658569] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26285.658584] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26285.658595] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26285.658601] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26285.658612] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26285.658618] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26285.658629] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26285.658632] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26285.658636] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26285.658640] [<ffffffff81076023>] kthread+0x93/0xa0
[26285.658644] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26285.658649] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26285.658652] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26285.658654] ---[ end trace 59c6162fdfcbffee ]---
[26756.021167] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26756.021426] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26766.014093] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26766.014397] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.376233] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26932.376544] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.384285] ------------[ cut here ]------------
[26932.384354] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26932.384356] Hardware name: SATELLITE Z830
[26932.384358] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26932.384421] Pid: 24262, comm: kworker/u:2 Tainted: G W 3.5.0-26-generic #40-Ubuntu
[26932.384422] Call Trace:
[26932.384431] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26932.384436] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26932.384451] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26932.384466] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26932.384476] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26932.384482] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26932.384493] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26932.384500] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26932.384511] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26932.384514] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26932.384517] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26932.384521] [<ffffffff81076023>] kthread+0x93/0xa0
[26932.384526] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26932.384531] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26932.384534] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26932.384536] ---[ end trace 59c6162fdfcbffef ]---

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-26-generic 3.5.0-26.40
ProcVersionSignature: Ubuntu 3.5.0-26.40-generic 3.5.7.6
Uname: Linux 3.5.0-26-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: luca 2084 F.... pulseaudio
CheckboxSubmission: f8b82cd9bc23fe075e5068a9824afda5
CheckboxSystem: b1865df84255b8716d3bcc269ff410d1
Date: Sat Mar 2 22:25:14 2013
HibernationDevice: RESUME=UUID=20fe6da8-7d68-4660-953f-6e4ae1d348a7
InstallationDate: Installed on 2012-04-26 (310 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MachineType: TOSHIBA SATELLITE Z830
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-26-generic root=UUID=36929bf3-a158-44d9-a80d-3adac2840fa8 ro quiet splash acpi_backlight=vendor i915.i915_enable_rc6=1 i915.lvds_downclock=1 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-26-generic N/A
 linux-backports-modules-3.5.0-26-generic N/A
 linux-firmware 1.95
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-10-28 (125 days ago)
dmi.bios.date: 07/31/2012
dmi.bios.vendor: TOSHIBA
dmi.bios.version: Version 1.70
dmi.board.asset.tag: 0000000000
dmi.board.name: Portable PC
dmi.board.vendor: TOSHIBA
dmi.board.version: Version A0
dmi.chassis.asset.tag: 0000000000
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: Version 1.0
dmi.modalias: dmi:bvnTOSHIBA:bvrVersion1.70:bd07/31/2012:svnTOSHIBA:pnSATELLITEZ830:pvrPT22LE-00300GGR:rvnTOSHIBA:rnPortablePC:rvrVersionA0:cvnTOSHIBA:ct10:cvrVersion1.0:
dmi.product.name: SATELLITE Z830
dmi.product.version: PT22LE-00300GGR
dmi.sys.vendor: TOSHIBA

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 66289
dmesg output

From time to time the interface freezes, and these records appear in dmesg: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blitter ring idle

$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. Device 0612 (rev 01)

Revision history for this message
In , Chris Wilson (ickle) wrote :

If you can easily reproduce this error, can you please build a kernel using http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=xv-overlay which has some revised memory barriers.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Can you help me build an RPM for Fedora?

Revision history for this message
In , Chris Wilson (ickle) wrote :

On second thoughts, I think this should be fixed by the slight robustification in more recent hangcheck.

Please try the latest kernel for your distribution (should be 3.6.7 atm) and reopen if it still occurs.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

I am using Fedora 18 with the 3.6.7-5.fc18.i686 kernel, and this message still appears in the dmesg output:
[22826.654365] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22826.654369] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Revision history for this message
In , Chris Wilson (ickle) wrote :

That is not the same bug, so you need to attach a fresh set of debug info (please remember the i915_error_state)...

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Please explain how to get the needed debug info. Thanks.

Revision history for this message
In , Chris Wilson (ickle) wrote :

http://intellinuxgraphics.org/how_to_report_bug.html

From which we need the i915_error_state, so

$ sudo mount -tdebugfs debug /sys/kernel/debug
$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
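For anyone scripting this capture, here is a minimal C sketch of the same `cat` step. It assumes debugfs is already mounted at /sys/kernel/debug and that the GPU is DRM minor 0, as in the commands above; `copy_error_state` is a hypothetical helper, not part of any i915 tool. It returns the errno of the first failure, so an error such as the "Cannot allocate memory" reported later in this thread is surfaced instead of silently leaving an empty or truncated file.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy a debugfs file such as /sys/kernel/debug/dri/0/i915_error_state
 * into a regular file. Returns 0 on success, or the errno value of the
 * first failure (pass it to strerror() to print a message).
 */
int copy_error_state(const char *src_path, const char *dst_path)
{
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
        return errno;

    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        int err = errno;
        fclose(src);
        return err;
    }

    char buf[4096];
    size_t n;
    int err = 0;

    errno = 0;
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            err = errno ? errno : EIO;
            break;
        }
    }
    /* A failed read (e.g. ENOMEM from the kernel) also ends the loop. */
    if (err == 0 && ferror(src))
        err = errno ? errno : EIO;

    fclose(src);
    if (fclose(dst) != 0 && err == 0)
        err = errno;
    return err;
}
```

Run it as root, e.g. `copy_error_state("/sys/kernel/debug/dri/0/i915_error_state", "i915_error_state")`, and print `strerror(err)` if the result is non-zero.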

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 70518
i915_error_state

Revision history for this message
In , Chris Wilson (ickle) wrote :

It looks like that corresponds to the bug that

commit 1c8b46fc8c865189f562c9ab163d63863759712f
Author: Chris Wilson <email address hidden>
Date: Wed Nov 14 09:15:14 2012 +0000

    drm/i915: Use LRI to update the semaphore registers

    The bspec was recently updated to remove the ability to update the
    semaphore using the MI_SEMAPHORE_BOX command, the ability to wait upon
    the semaphore value remained. Instead the advice is to update the
    register using the MI_LOAD_REGISTER_IMM command. In cursory testing,
    semaphores continue to function - the question is whether this fixes
    some of the deadlocks where the semaphore registers contained stale
    values?

hopefully addresses.

That patch is currently only in drm-intel-next, which is available either at http://cgit.freedesktop.org/~danvet/drm-intel or as drm-intel-experimental in the Ubuntu kernel-ppa.
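To make the commit message concrete, the sketch below emits the end-of-request sequence with one MI_LOAD_REGISTER_IMM (LRI) per mailbox register instead of a semaphore update command. The encoding is an assumption inferred from the ring dumps posted later in this bug (header `0x11000001` for a one-register LRI, `0x10800001` for MI_STORE_DATA_INDEX with index `0x80`, `0x01000000` for MI_USER_INTERRUPT). `struct ring_sketch`, `emit` and `emit_add_request` are hypothetical stand-ins, not the driver's `intel_ring_emit` API, and nothing here touches real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* MI command header: opcode in bits 28:23, flags/length in the low bits. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_LOAD_REGISTER_IMM(nregs) MI_INSTR(0x22, 2 * (nregs) - 1)
#define MI_STORE_DATA_INDEX MI_INSTR(0x21, 1)
#define MI_USER_INTERRUPT MI_INSTR(0x02, 0)

#define RING_DWORDS 64

/* Hypothetical in-memory ring: just an array and a tail index. */
struct ring_sketch {
    uint32_t dw[RING_DWORDS];
    size_t tail;
};

static void emit(struct ring_sketch *r, uint32_t v)
{
    r->dw[r->tail++] = v;
}

/*
 * Signal the two other rings by writing our seqno into their mailbox
 * registers with LRI, then store the seqno in the status page and raise
 * a user interrupt. Returns the number of dwords emitted, or 0 if the
 * ring is too full (a real ring would wait for space instead).
 */
size_t emit_add_request(struct ring_sketch *r, uint32_t mbox1,
                        uint32_t mbox2, uint32_t seqno)
{
    if (RING_DWORDS - r->tail < 10)
        return 0;

    size_t start = r->tail;
    emit(r, MI_LOAD_REGISTER_IMM(1));
    emit(r, mbox1);
    emit(r, seqno);
    emit(r, MI_LOAD_REGISTER_IMM(1));
    emit(r, mbox2);
    emit(r, seqno);
    emit(r, MI_STORE_DATA_INDEX);
    emit(r, 0x80);            /* status-page index, as seen in the dumps */
    emit(r, seqno);
    emit(r, MI_USER_INTERRUPT);
    return r->tail - start;
}
```

Calling `emit_add_request(&r, 0x12044, 0x22040, 0x0043b625)` on an empty ring should produce the same ten dwords as the first dump bwidawsk quotes further down.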

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

The problem recurred with the patched kernel.

[118637.439016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[118637.439020] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[mikhail@localhost ~]$ uname -a
Linux localhost.localdomain 3.6.9-4.1.fc18.i686.PAE #1 SMP Wed Dec 5 15:16:33 UTC 2012 i686 i686 i386 GNU/Linux
[mikhail@localhost ~]$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
[sudo] password for mikhail:
[mikhail@localhost ~]$

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71192
i915_error_state (new)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state-8
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

What does it mean?

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71199
i915_error_state (new)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71200
dmesg output (new)

Revision history for this message
In , Chris Wilson (ickle) wrote :

Lalalalala.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 58057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 58212 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

We can confirm the synopsis by disabling semaphores (i915.semaphores=0), but can we also test whether this is an rc6 side-effect (i915.i915_enable_rc6=0)?

Revision history for this message
In , Chris Wilson (ickle) wrote :

Also, maybe it's time for 'git revert 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f'.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71549
i915_error_state

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71550
dmesg

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71629
i915_error_state

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71630
dmesg

Revision history for this message
In , Chris Wilson (ickle) wrote :

Mikhail, for the time being you can set i915.semaphores=0 (or, as root, echo 0 > /sys/module/i915/parameters/semaphores) to prevent this hang.
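Since the workaround and the rc6 test above both hinge on boot parameters, it helps to confirm what the running kernel actually received. A minimal sketch, assuming you pass it the contents of /proc/cmdline; `cmdline_int_param` is a hypothetical helper. It only reads the boot command line, so it will not see a value changed later through /sys/module/i915/parameters.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "<name>=<integer>" among the whitespace-separated tokens of a
 * kernel command line. If the parameter appears more than once, the last
 * occurrence is returned. Returns false when it is absent.
 */
bool cmdline_int_param(const char *cmdline, const char *name, long *value)
{
    size_t name_len = strlen(name);
    const char *p = cmdline;
    bool found = false;

    while (*p != '\0') {
        /* Skip separators, then find the end of the current token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *tok = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;

        size_t tok_len = (size_t)(p - tok);
        if (tok_len > name_len && strncmp(tok, name, name_len) == 0 &&
            tok[name_len] == '=') {
            char *end;
            long v = strtol(tok + name_len + 1, &end, 10);
            if (end != tok + name_len + 1) {
                *value = v;
                found = true;
            }
        }
    }
    return found;
}
```

For example, with the ProcKernelCmdLine from the original report, `cmdline_int_param(line, "i915.i915_enable_rc6", &v)` returns true with `v == 1`, while a lookup of `"i915.semaphores"` returns false because that option was not set.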

The only interesting patch I can suggest atm is

commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
Author: Ben Widawsky <email address hidden>
Date: Wed Sep 26 10:34:01 2012 -0700

    drm/i915: Workaround to bump rc6 voltage to 450

    BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
    buggy BIOSen may not be doing this, so we correct it for them. Ideally
    customers should update the BIOS as only it would know the optimal
    values for the platform, so we leave that fact as a DRM_ERROR for the
    user to see.

in 3.8-rc1; otherwise, look for a BIOS update.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 58986 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 72766
Read back semaphore mboxes after update

Can you please try this patch, enable semaphores and see if the bug persists?

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #24)
> Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> /sys/modules/i915/parameters/semaphores) to prevent this hang.

What are the consequences?

> The only interesting patch I can suggest atm is
>
> commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> Author: Ben Widawsky <email address hidden>
> Date: Wed Sep 26 10:34:01 2012 -0700
>
> drm/i915: Workaround to bump rc6 voltage to 450
>
> BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> buggy BIOSen may not be doing this, so we correct it for them. Ideally
> customers should update the BIOS as only it would know the optimal
> values for the platform, so we leave that fact as a DRM_ERROR for the
> user to see.
>
> in 3.8-rc1 or look for a BIOS update.

I have an H61M/U3S3 motherboard with the latest BIOS, version 2.20 from 8/15/2012:
ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
How can I check whether the problem persists?

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #27)
> (In reply to comment #24)
> > Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> > /sys/modules/i915/parameters/semaphores) to prevent this hang.
>
> What are the consequences?

Rendering throughput drops by 10% with SNA, or by as much as 3x with UXA. OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, which increases power consumption.

Revision history for this message
In , bwidawsk (bwidawsk) wrote :

(In reply to comment #27)

> > The only interesting patch I can suggest atm is
> >
> > commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> > Author: Ben Widawsky <email address hidden>
> > Date: Wed Sep 26 10:34:01 2012 -0700
> >
> > drm/i915: Workaround to bump rc6 voltage to 450
> >
> > BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> > buggy BIOSen may not be doing this, so we correct it for them. Ideally
> > customers should update the BIOS as only it would know the optimal
> > values for the platform, so we leave that fact as a DRM_ERROR for the
> > user to see.
> >
> > in 3.8-rc1 or look for a BIOS update.
>
> I have H61M/U3S3 motherboard and you latest BIOS ver 2.20 from 8/15/2012
> ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
> How to check problem persists or not?

The easiest way is to apply the patch and look for DRM_DEBUG_DRIVER messages. This is unlikely to fix the problem, but also can't hurt.

We've only assumed a new BIOS will fix the problem, but who knows, especially if it's a third-party BIOS.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 59786 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 73560
write mbox regs twice on snb

Another piece of magic which might help. Please test this patch and the one from Chris ("Read back semaphore mboxes after update") separately and report back whether anything changes.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 73577
write mbox regs twice on snb, v2

This time the right patch is actually attached; the old one didn't compile...

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Which patch do I need to apply to fix this issue?

I see that the patches from comments 26 and 32 have similar logic...

@@ -596,6 +606,16 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);
 
+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		read_mboxes(ring, mbox1_reg, 1024);
+		read_mboxes(ring, mbox2_reg, 1028);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

@@ -598,6 +598,19 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);
 
+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		mbox1_reg = ring->signal_mbox[0];
+		mbox2_reg = ring->signal_mbox[1];
+
+		update_mboxes(ring, mbox1_reg);
+		update_mboxes(ring, mbox2_reg);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

> --- Comment #33 from <email address hidden> ---
> Which patch I need applied for fix this issue?

We can't reproduce the bug, so those are just patches to test
different ideas. Please test each of them individually (i.e. remove
the first before testing the 2nd patch) and then report whether
anything changes (i.e. whether the issue becomes harder or easier to hit).

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Can't compile kernel with patch above:

drivers/gpu/drm/i915/intel_ringbuffer.c: In function 'gen6_add_request':
drivers/gpu/drm/i915/intel_ringbuffer.c:611:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
drivers/gpu/drm/i915/intel_ringbuffer.c:612:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74087
kernel.spec

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74561
i915_error_state

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74566
i915_error_state (kernel 3.8 Ubuntu)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74779
i915_error_state (kernel 3.7 Fedora)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74781
i915_error_state (kernel 3.7 Fedora)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 74850
i915_error_state (kernel 3.7 Fedora)

Revision history for this message
In , Norman Yarvin (yarvin-yarchive) wrote :

I'm seeing this bug, or something like it, on an older chip (G965, desktop version):

Feb 19 22:05:56 muttonhead kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Feb 19 22:05:56 muttonhead kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Feb 19 22:05:56 muttonhead kernel: [drm:kick_ring] *ERROR* Kicking stuck wait on render ring
Feb 19 22:05:57 muttonhead kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

after which the mouse pointer sticks in one spot (with most other things working), and then when I shut down X, the console fails to appear, requiring a reboot. Not knowing that the given file path was under /sys/kernel, I failed to capture the error state, but will do so next time this happens (which is maybe every other day). This is with a 3.7 kernel (Gentoo); before 3.7, the driver was stable. I don't know what the 'generation' numbers in the driver mean, but I'm guessing that generation 6 is later, so many of the suggested fixes would not make any difference on this machine.

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #42)
> I'm seeing this bug, or something like it, on an older chip (G965, desktop
> version):

Good news, it is not this bug. Please make sure you have the latest stable driver (a Gentoo user not using 3.8 already! ;-) and the latest xf86-video-intel, then file a fresh bug report, attaching your dmesg, Xorg.0.log and i915_error_state.

Revision history for this message
In , gneman (luis6674) wrote :

I subscribed to this bug because I was seeing this hang too. It happened randomly several times, without a specific cause or way to reproduce it.

This was around December, and it happened maybe 4-5 times over a month. The GPU would hang with that error in dmesg, and everything continued to work, though very slowly.

However, it hasn't happened again since then, for maybe almost 2 months. I use Arch Linux, which means I always update to the latest stable packages of everything, so it seems that for me it got solved at some point (or at least became much harder to reproduce).

This is an Ironlake / HD 2000 based Dell laptop. I did update the BIOS when I found this bug report, but it didn't solve the problem; the hang still happened after the update.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 61310 has been marked as a duplicate of this bug. ***

Revision history for this message
luca (llucax) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
luca (llucax) wrote : Re: [regression] 3.5.0-26-generic CPU hangs

Another note: when these hangs happen, I get graphics corruption (usually in the fonts/text).

Revision history for this message
luca (llucax) wrote :

kernel 3.5.0-25-generic also seems to work fine.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 75818
i915_error_state (kernel 3.8.1 Fedora)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Today Fedora 18 updated the kernel to 3.8.1, and the message "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung" is still there. Please look at my last log. Any updates?

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: regression-update
Revision history for this message
luca (llucax) wrote :

After resuming from suspend with kernel 3.5.0-25-generic, I again got the annoying dialogs saying a GPU hang was detected and asking me to report a bug (I have no idea where that report goes), but looking at dmesg I can't see anything strange [1]. How can I find out why those dialogs are being opened, to see if something is wrong?

I got the annoying dialog several times in a very short period, about 10 times in 5 minutes, and then it stopped. After that I suspended and resumed my laptop a couple of times, and it hasn't happened again so far.

[1] Except for messages like these, but I have been getting them since I bought this computer about a year ago and never had those annoying dialogs about any GPU hang:
[52682.020386] CPU1: Package power limit notification (total events = 5770)
[52682.020389] CPU3: Package power limit notification (total events = 5769)
[52682.020391] CPU2: Package power limit notification (total events = 5761)
[52682.020393] CPU0: Package power limit notification (total events = 5746)
[52682.021517] CPU3: Package power limit normal
[52682.021520] CPU1: Package power limit normal
[52682.021521] CPU2: Package power limit normal
[52682.021526] CPU0: Package power limit normal

Revision history for this message
luca (llucax) wrote :

Also, I couldn't see any graphic corruption this last time with kernel 3.5.0-25

Revision history for this message
Craig McQueen (cmcqueen1975) wrote :

This affects me, but in my case I'm running Ubuntu 12.04, and the problem seems to be with kernel 3.2.0-39. Booting to kernel 3.2.0-38 seems to have fixed it.

Revision history for this message
In , bwidawsk (bwidawsk) wrote :

This looks weird to me:

0x00005a58: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a5c: 0x00012044: dword 1
0x00005a60: 0x0043b625: dword 2
0x00005a64: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a68: 0x00022040: dword 1
0x00005a6c: 0x0043b625: dword 2
0x00005a70: 0x10800001: MI_STORE_DATA_INDEX
0x00005a74: 0x00000080: index
0x00005a78: 0x0043b625: dword
0x00005a7c: 0x01000000: MI_USER_INTERRUPT
0x00005a80: 0x0b160001: MI_SEMAPHORE_MBOX compare semaphore, use compare reg 2
0x00005a84: 0x0043b625: value
0x00005a88: 0x00000000: address
0x00005a8c: 0x00000000: MI_NOOP

Chris?

Revision history for this message
In , Chris Wilson (ickle) wrote :

Weird? Did you just forget that the hw does a strictly greater-than comparison?
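Chris's point can be modelled in a few lines. Assuming, as he states, that the hardware only releases an MI_SEMAPHORE_MBOX wait once the mailbox register holds a value strictly greater than the value encoded in the command, a waiter that needs the signalling ring to have reached seqno S must encode S - 1. Under that model, the wait on 0x0043b625 in the dump above is released once the other ring has signalled 0x0043b626 or later. The function names are hypothetical and the sketch ignores seqno wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* The waiting ring proceeds only once the mailbox value is strictly
 * greater than the value encoded in MI_SEMAPHORE_MBOX. */
bool semaphore_passes(uint32_t mbox_value, uint32_t wait_value)
{
    return mbox_value > wait_value;
}

/* To wait until the signalling ring has reached seqno S (mbox >= S),
 * the waiter therefore encodes S - 1. */
uint32_t wait_value_for(uint32_t seqno)
{
    return seqno - 1;
}
```

With these definitions, a mailbox equal to the encoded value does not release the wait; one step higher does.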

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #47)
> Today Fedora 18 updated kernel to 3.8.1 and message
> "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
> still here. Please look at my last log. Any updates?

We're still waiting for you to apply the patches and report back.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

*** Bug 61925 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76196
i915_error_state (kernel 3.8.1 Fedora) with patch (write mbox regs twice on snb, v2)

I applied the patch "write mbox regs twice on snb, v2" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76208
i915_error_state (kernel 3.8.1 Fedora) with patch (Read back semaphore mboxes after update)

I also applied the patch "Read back semaphore mboxes after update" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #52)
> Created attachment 76196 [details]
> i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> I am applied patch "write mbox regs twice on snb, v2" but still have problem
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

0x00052cc8: 0x18800100: MI_BATCH_BUFFER_START
0x00052ccc: 0x0d59b000: dword 1
0x00052cd0: 0x13000001: MI_FLUSH_DW post_sync_op='no write'
0x00052cd4: 0x000000c4: address
0x00052cd8: 0x00000000: dword
0x00052cdc: 0x00000000: MI_NOOP
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
0x00052cf8: 0x10800001: MI_STORE_DATA_INDEX
0x00052cfc: 0x00000080: index
0x00052d00: 0x0007a582: dword
0x00052d04: 0x01000000: MI_USER_INTERRUPT

That's only a single LRI per semaphore, so the patch wasn't actually tested.
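Chris's check can be automated. The sketch below walks a list of decoded ring dwords, such as the dump quoted above, and counts how many single-register MI_LOAD_REGISTER_IMM commands target a given mailbox register; with the "write mbox regs twice" patch running, one would expect 2 per register for each request. The length rules in `mi_cmd_len` are a simplifying assumption that happens to fit every command in this dump (MI opcodes below 0x10 are one dword; longer MI commands store their dword count minus 2 in bits 7:0), not a full command decoder, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in dwords of an MI command, under the simplifying rules above.
 * Returns 0 for anything that is not an MI command. */
static size_t mi_cmd_len(uint32_t header)
{
    uint32_t client = header >> 29;          /* 0 = MI client */
    uint32_t opcode = (header >> 23) & 0x3f;

    if (client != 0)
        return 0;
    if (opcode < 0x10)
        return 1;
    return (size_t)(header & 0xff) + 2;
}

/* Count one-register MI_LOAD_REGISTER_IMM writes to `reg` in a dump. */
size_t count_lri_writes(const uint32_t *dw, size_t n, uint32_t reg)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        size_t len = mi_cmd_len(dw[i]);
        if (len == 0 || i + len > n)
            break;                           /* unknown or truncated: stop */
        if (dw[i] == 0x11000001 && dw[i + 1] == reg)
            count++;
        i += len;
    }
    return count;
}
```

Fed the sixteen dwords above (from MI_BATCH_BUFFER_START to MI_USER_INTERRUPT), this reports one LRI each for 0x2044 and 0x12040, matching Chris's reading.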

Revision history for this message
In , Chris Wilson (ickle) wrote :

I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your patched version.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76215
kernel.spec

(In reply to comment #55)
> I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your
> patched version.

That's impossible. The distro kernel is 3.8.1-201.fc18.i686.PAE; 3.8.1-202.fc18.i686.PAE and 3.8.1-203.fc18.i686.PAE are kernels I patched myself.

You can verify this by looking at my build spec file.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76239
i915_error_state (kernel 3.8.1 Fedora) with patch (Read back semaphore mboxes after update)

I'm sorry. It seems I forgot to add "ApplyPatch" to the spec. I rebuilt the kernel with the "0001-drm-i915-Read-back-semaphore-mboxes-after-updating-t.patch" patch, but the problem seems to still be here.

Does it make sense to check the "0001-write-mbox-regs-twice-on-gen6.patch" patch?

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76243
i915_error_state (kernel 3.8.1 Fedora) with patch (Read back semaphore mboxes after update)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76261
i915_error_state (kernel 3.8.2 Fedora) with patch (write mbox regs twice on snb, v2)

The "write mbox regs twice on snb, v2" patch does not solve the problem either.

[ 1399.270341] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1399.270345] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1399.277331] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76293
i915_error_state (kernel 3.8.2 Fedora) with patch (write mbox regs twice on snb, v2)

Revision history for this message
Hans (old-man999) wrote :
Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76448
i915_error_state (kernel 3.8.2 Fedora) with patch (write mbox regs twice on snb, v2)

Any updates?

Revision history for this message
luca (llucax) wrote :

Seems to be fixed in linux-image-3.5.0-26-generic 3.5.0-26.42

Revision history for this message
luca (llucax) wrote :

Nope, still getting it with linux-image-3.5.0-26-generic 3.5.0-26.42

[32861.907463] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[32861.907470] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[32861.911988] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
...
[39199.903510] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[39199.903846] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

I get the above message from time to time - once every 1-2 weeks. After that the desktop is extremely laggy, no vsync (moving windows moves several blocks), scrolling is slow as hell. I only get normal behaviour by restarting X.

Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

Created attachment 76656
dmesg

Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

Created attachment 76657
i915_error_state

Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

Created attachment 76658
Xorg.0.log

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** This bug has been marked as a duplicate of bug 54226 ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 62443 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

Hmm, it really shouldn't be that noticeable after a GPU hang... unless you are using a compositor? Certain operations will be unavailable (accelerated GL, vsync, etc.), but everything else should fall back to a shadow buffer, and although typical rendering may be an order of magnitude slower, it shouldn't actually impact latency. Moving windows and scrolling should still be crisp. So if you can, please grab a 'sudo perf record -f -g -a' after such an event.

Revision history for this message
In , Chris Wilson (ickle) wrote :

As a workaround, this

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

    drm/i915: Resurrect ring kicking for semaphores, selectively

should improve the recovery from the hangs.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

Same here: the previous kernel 3.5.0-25-generic works without problems, but 3.5.0-26.42 hung just now:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 1.667363] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667367] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667652] i915 0000:00:02.0: setting latency timer to 64
[ 1.687950] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 2.429882] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 330.684154] i915 0000:00:02.0: power state changed by ACPI to D3
[ 331.826825] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826829] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826830] i915 0000:00:02.0: setting latency timer to 64
[ 1677.075872] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1677.075876] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Content of i915_error_state attached.

I will disable rc6 and see what happens then.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

With rc6 off the hangup happened 2 minutes after booting:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.857239] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857242] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857458] i915 0000:00:02.0: setting latency timer to 64
[ 0.877771] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 1.619983] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 128.787009] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 128.787013] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 254.699283] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Seems it's time to return to 3.5.0-25-generic.

Revision history for this message
Laurent (l-perlat) wrote :

Same problem here:

Visual corruptions + "GPU hang" error when scrolling in Firefox with 3.5.0-26.

Everything back to normal on 3.5.0-25 (Linux 3.5.0-25-generic #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

Revision history for this message
tobyS (tobias-schlitt) wrote :

Besides visual corruption and hangs, I also experience complete system hangs (no response until a hard reboot) and occasional kernel panics. So I wonder why this report isn't getting a higher priority.

czigor (czigor)
summary: - [regression] 3.5.0-26-generic CPU hangs
+ [regression] 3.5.0-26-generic GPU hangs
Revision history for this message
In , cbrnr (cbrnr) wrote :

OK, I've been experiencing this bug from time to time on my Arch Linux box. There is no apparent reason; the last time it happened I was watching a YouTube video, and it also seems to happen more often when I'm running VirtualBox. However, this might just be a coincidence.

Revision history for this message
shuerhaaken (shkn) wrote : Re: [regression] 3.5.0-26-generic GPU hangs

Same issue here. It happens very often while using Firefox, but also on other occasions.

[51278.392895] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[51278.392901] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
shuerhaaken (shkn) wrote :

This is really getting annoying, is anybody taking care of this?

Revision history for this message
czigor (czigor) wrote :

@shkn:
Using 3.5.0-25-generic made my PC usable again. I get an error message only at login.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

It's one of these commits (from the Quantal kernel), likely the top one, since it's happening on Sandy Bridge:

817e8fdee14b05d drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
4c443ec9afe7f6f drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
f534135423c7028 drm/i915: Disable AsyncFlip performance optimisations
c0c1fd8a18479f0 drm/i915: Invalidate the relocation presumed_offsets along the slow path

Changed in linux (Ubuntu):
assignee: nobody → Ubuntu Kernel Team (ubuntu-kernel-team)
importance: Medium → Critical
Changed in linux (Ubuntu Quantal):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux (Ubuntu Precise):
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Note that I'm not sure it's affecting Raring; maybe not.

Adam Conrad (adconrad)
Changed in linux (Ubuntu Precise):
status: Confirmed → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-quantal (Ubuntu Quantal):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Raring):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux (Ubuntu Precise):
importance: Critical → Undecided
tags: added: performing-bisect
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a kernel bisect to identify the exact commit that introduced this regression. However, it would be good to test the latest mainline and a test kernel with commit 817e8fdee14b05d reverted.

The latest mainline kernel can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc3-raring/

Can folks affected by this bug test the v3.9-rc3 kernel?

One thing to note, you will need to install both the linux-image and linux-image-extra .deb packages.

I will also build a Quantal test kernel with commit 817e8fdee14b05d reverted and post a link shortly.

Thanks in advance!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Quantal test kernel with commit 817e8fdee14b05d reverted. The kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1140716/

Can folks affected by this bug test this kernel and report back if it fixes the issue?

Revision history for this message
Gard Spreemann (gspreemann) wrote :

@jsalisbury: I could not successfully test the kernel you linked to in comment #22, as it rendered my system unusable. X started at 640x480, there was no working keyboard/mouse, and I could not SSH in.

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

@jsalisbury: Thanks for pointing to this kernel version. I have been able to successfully test kernel v3.9-rc3 on Precise 12.04.2 (I am running with quantal-lts xorg stack).
I have been running on it for ~ 3 hours so far with success. It usually took some time for me to hit this bug on my Dell V131, so some more testing might be required.
I will report back if I hit the bug again. So far so good.

Revision history for this message
luca (llucax) wrote :

Also initial success for now. I'm still getting the annoying dialog at startup, though (but no signs of GPU hangs in dmesg).

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

@jsalisbury: I tested your kernel 3.5.0-27-generic #45~lp1140716v1 (from comment #22), and it was no improvement for my system. I got two hangups within the first hour (with one S3 cycle at 1985); the second one forced me to turn off the system:

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.804805] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.804809] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.805030] i915 0000:00:02.0: setting latency timer to 64
[ 0.824988] i915 0000:00:02.0: irq 43 for MSI/MSI-X
[ 1.563280] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 1894.853449] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1896.202702] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202708] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202720] i915 0000:00:02.0: setting latency timer to 64
[ 1984.429241] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1985.767157] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767160] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767168] i915 0000:00:02.0: setting latency timer to 64
[ 2132.278551] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2132.278555] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 3504.895781] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
luca (llucax) wrote :

Torsten, maybe you are having a different issue. Note that your hang doesn't look related to the RC6 state:

 [51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

BTW, my system is still surviving without hangs with the patched 3.5 kernel.

Revision history for this message
luca (llucax) wrote :

I just had a burst of dialogs reporting non-existent GPU hangs (with the patched 3.5 kernel). The GPU hangs are not reported in dmesg though, so I don't know where it's getting them from. There is also no corruption or anything. It seems the dialog madness starts when an unrelated program crashes. Maybe it's just an apport bug? How should I proceed to see what's really going on?

Revision history for this message
Alexis Lauthier (alx7539-launchpad) wrote :

@jsalisbury: I've been running your 3.5.0-27-generic #45~lp1140716v1 for 5 hours and I've already had 3 hangs. No improvement here.

[ 5733.121323] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5733.121330] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5733.124957] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
luca (llucax) wrote :

OK, it took a while, but I finally got the GPU hang with kernel 3.5.0-27-generic #45~lp1140716v1:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
luca (llucax) wrote :

It always happens with Firefox, and only with certain sites (consistently).

Revision history for this message
luca (llucax) wrote :

I got this with a second hang:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.138382] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[23652.138898] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.146420] ------------[ cut here ]------------
[23652.146491] WARNING: at /home/jsalisbury/bugs/lp1140716/ubuntu-quantal/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[23652.146495] Hardware name: SATELLITE Z830
[23652.146497] Modules linked in: sdhci_pci sdhci snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel arc4 cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm uvcvideo videobuf2_core videodev snd_seq_midi videobuf2_vmalloc videobuf2_memops snd_rawmidi microcode snd_seq_midi_event iwlwifi snd_seq snd_timer snd_seq_device i915 bnep rfcomm mac80211 toshiba_acpi sparse_keymap drm_kms_helper wmi toshiba_bluetooth snd pcspkr bluetooth drm i2c_algo_bit cfg80211 soundcore psmouse mac_hid snd_page_alloc video serio_raw mei lpc_ich parport_pc ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl lp sunrpc parport ahci libahci e1000e [last unloaded: sdhci]
[23652.146578] Pid: 3451, comm: kworker/u:0 Not tainted 3.5.0-27-generic #45~lp1140716v1
[23652.146581] Call Trace:
[23652.146592] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[23652.146599] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[23652.146621] [<ffffffffa03f6316>] gen6_enable_rps+0x706/0x710 [i915]
[23652.146640] [<ffffffffa03e2446>] intel_modeset_init_hw+0x66/0xa0 [i915]
[23652.146655] [<ffffffffa03b84b4>] i915_reset+0x1a4/0x6e0 [i915]
[23652.146663] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[23652.146679] [<ffffffffa03bd943>] i915_error_work_func+0xc3/0x110 [i915]
[23652.146688] [<ffffffff8107098a>] process_one_work+0x12a/0x420
[23652.146701] [<ffffffffa03bd880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[23652.146707] [<ffffffff8107153e>] worker_thread+0x12e/0x2f0
[23652.146712] [<ffffffff81071410>] ? manage_workers.isra.26+0x200/0x200
[23652.146719] [<ffffffff81076033>] kthread+0x93/0xa0
[23652.146726] [<ffffffff8168ab24>] kernel_thread_helper+0x4/0x10
[23652.146732] [<ffffffff81075fa0>] ? kthread_freezable_should_stop+0x70/0x70
[23652.146737] [<ffffffff8168ab20>] ? gs_change+0x13/0x13
[23652.146740] ---[ end trace 2153106cc632835c ]---

Revision history for this message
Gard Spreemann (gspreemann) wrote :

I'm confused as to where the commits referenced by tjaalton in comment #19 live, but for what it's worth, I seem to have a stable system after applying reverse diffs of the following commits from the linux-3.5.y branch of git://kernel.ubuntu.com/ubuntu/linux.git to the 3.5.0-27.45 sources:

2964148 - drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
899b550 - drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Just reverting the first, or using jsalisbury's kernel from comment #22 (ignore my comment #23, I was being an idiot and forgot the modules) gives me a GPU hang and/or graphics corruption within minutes, especially quickly if opening Firefox. After reverting both of the above, I haven't been able to hang the system yet.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

Kernel 3.9.0-030900rc3-generic from comment #21 is much more stable for me, no problems so far after 4h of operation.

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

@jsalisbury: After a few days of use and many suspend-resume cycles, I am yet to encounter a problem with kernel 3.9-rc3 (as indicated in comment #21). No problems whatsoever on my Dell v131 (i5 Sandybridge)...

Revision history for this message
Peter Saunderson (peteasa) wrote :

I got this a lot with kernel 3.5.0-26-generic and used a quick workaround to avoid the problem: http://askubuntu.com/questions/225356/how-can-i-enable-the-sna-acceleration-method-for-intel-cards-under-ubuntu-12-04

SNA does not seem to have the same issue, only UXA. If I have time I can try a new kernel, but I have spent so much time on this already that it may be a few days before I get to try it.

Revision history for this message
Max Rameau (afrimax-e) wrote :

I had the problem right after updating (not upgrading) on Saturday using 12.04.

I was able to control it by logging into 2D, immediately opening the System Monitor and shutting down the three instances of Ubuntu One (login, sync and launch), because the machine would freeze upon login to Ubuntu One. I then had to shut down zeitgeist-fts, because that would start eating up resources (up to 300 MB of memory at one point).

At that point, I just decided to reinstall with 12.10. I did that and it worked fine for an hour, so I started transferring my backed-up files and logged into Ubuntu One while running the updates. The problems began immediately, including resource use going up to 100% for long periods of time, mainly through the multiplication of the gkts (?) service. It only used 3.8MB at a time, but at one point there were 20 instances of it open. I concluded it was Ubuntu One causing the problem, so I reinstalled again, this time without logging into Ubuntu One. No problems for 4 hours, even as I installed software. Then I ran the automatic software update, and the problems began again immediately.

Constant crashing, crazy graphics corruption and other issues. I checked the system log and got a similar error:

kernel [224.243459] [drm: Enable RC6 States: RC6 off, RC6p off, RC6p off]
kernal [246.465377] [drm: i95_hangcheck_hung] *ERROR* Hangcheck timer elapsed GPU hung

etc., etc.

This is a nightmare. Need a fix.

Revision history for this message
Matthew Eaton (meaton) wrote :

Test kernel did not fix the issue for me.

Linux matt-work 3.5.0-27-generic #45~lp1140716v1 SMP Fri Mar 22 15:50:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Mar 25 08:12:15 matt-work kernel: [ 158.302349] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:15 matt-work kernel: [ 158.302353] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 25 08:12:15 matt-work kernel: [ 158.305230] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
Mar 25 08:12:36 matt-work kernel: [ 179.663557] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:36 matt-work kernel: [ 179.663780] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks, everyone, for testing. So it sounds like my test kernel did not fix this bug. However, it sounds like this bug is fixed in the v3.9 mainline kernel, at least in rc3.

I can perform a "Reverse" kernel bisect to identify the commit that fixes this bug. It will first require us to identify the first v3.9 release candidate that does not exhibit this bug.

We know that it is fixed in rc3, so it would be good to test rc1 and rc2. Can folks affected by this bug test those two release candidates:

v3.9-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-raring/
v3.9-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc2-raring/
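The reverse bisect described above is an ordinary binary search with the roles of "good" and "bad" swapped: the oldest candidate still shows the hang, the newest does not, and the goal is the first commit that makes it go away. A minimal sketch, assuming a linear commit history and a hypothetical `is_fixed(commit)` callback that stands in for building, booting and testing one kernel:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_fixed(commits: Sequence[T], is_fixed: Callable[[T], bool]) -> T:
    """Return the first commit for which is_fixed() is True.

    Assumes commits are ordered oldest -> newest, commits[0] still has the
    bug, commits[-1] is fixed, and once fixed the bug stays fixed.
    Each is_fixed() call stands for one kernel build-and-test cycle.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Usage with a fake predicate: pretend commit 63 of 100 carries the fix.
calls = []


def fake_is_fixed(commit):
    calls.append(commit)
    return commit >= 63


print(first_fixed(list(range(100)), fake_is_fixed), len(calls))  # 63 7
```

This is why narrowing the range first (rc1/rc2 before bisecting) pays off: each halving costs one test kernel, so even 100 candidate commits need only about seven test kernels.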

Revision history for this message
Matthew Eaton (meaton) wrote :

I've been on the rc1 kernel for about 3 hours with no problem.

Linux matt-work 3.9.0-030900rc1-generic #201303060659 SMP Wed Mar 6 12:00:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

Same for me. Kernel 3.9-rc1 for many hours already, no problems at all.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

3.9.0-030900rc1-generic has been stable for me too for several hours.

Revision history for this message
PaulW (paulw) wrote :

This is also affecting me, running 12.04 LTS (Precise Pangolin) Server. Kernel 3.2.0-38-generic x86_64 has no issues, but since I updated to 3.2.0-39, Xorg keeps hanging a few seconds after logging in. The following is in kern.log:

Mar 26 09:10:48 D064 kernel: [ 73.020664] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:10:48 D064 kernel: [ 73.020674] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 26 09:10:48 D064 kernel: [ 73.023775] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 7849 at 7844, next 7850)
Mar 26 09:11:16 D064 kernel: [ 101.368881] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:11:16 D064 kernel: [ 101.368908] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 8023 at 8021, next 8024)
Mar 26 09:11:23 D064 kernel: [ 108.015059] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:11:23 D064 kernel: [ 108.015116] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 8026 at 8021, next 8027)
Mar 26 09:11:30 D064 kernel: [ 114.661230] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:11:30 D064 kernel: [ 114.661254] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 8029 at 8027, next 8030)
Mar 26 09:11:32 D064 kernel: [ 116.800638] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:11:32 D064 kernel: [ 116.800660] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 8032 at 8030, next 8033)
Mar 26 09:11:32 D064 kernel: [ 116.800724] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Mar 26 09:11:32 D064 kernel: [ 116.800727] [drm:i915_reset] *ERROR* Failed to reset chip.

I haven't tried any alternate kernels yet. I am running GNOME 3. The full package list is attached, as is the output from lspci.
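When comparing kernels in this thread, the most useful number is how often the hangs occur. A rough sketch for extracting it from a log like the one above: it collects the kernel timestamps of the hangcheck error lines (the 3.2 kernels log `i915_hangcheck_elapsed`, the 3.5 kernels `i915_hangcheck_hung`, as seen in this thread) and returns the gaps between consecutive hangs. The function names are hypothetical, and this only reads logs; it does not reproduce the kernel's own "GPU hanging too fast" heuristic.

```python
import re

# Kernel timestamp followed by either hangcheck function name seen in this thread
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_(?:hung|elapsed)\] "
    r"\*ERROR\* Hangcheck timer elapsed"
)


def hang_times(log: str) -> list:
    """Kernel timestamps (seconds since boot) of every GPU hang report."""
    return [float(m.group(1)) for m in HANG_RE.finditer(log)]


def hang_intervals(log: str) -> list:
    """Seconds between consecutive hang reports, rounded to 0.1 s."""
    times = hang_times(log)
    return [round(b - a, 1) for a, b in zip(times, times[1:])]


SAMPLE = """\
Mar 26 09:10:48 D064 kernel: [ 73.020664] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:10:48 D064 kernel: [ 73.023775] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 7849 at 7844, next 7850)
Mar 26 09:11:16 D064 kernel: [ 101.368881] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 26 09:11:23 D064 kernel: [ 108.015059] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
"""

print(hang_intervals(SAMPLE))  # [28.3, 6.6]
```

The `i915_wait_request` lines are deliberately skipped, so each hang is counted once.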

Revision history for this message
Daniel Sebastião (dmse) wrote :

I think this is the same as the bug reported in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1157786

I am running 12.04 with the 3.2 kernel. As soon as it upgraded to 3.2.0-39, the system became highly unstable (on my Sandy Bridge machine with integrated graphics). I tried the 3.5 kernel for Precise and the result was the same.

I have another machine with the 3.2.0-39 kernel with no problems (laptop with Core2Duo).

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing everyone. So it looks like I should perform a reverse bisect between v3.8 and v3.9-rc1.

Just for completeness, can you also test the latest v3.8 stable kernel, to confirm the bug still exists there?

v3.8.4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.4-raring/

Revision history for this message
Matthew Eaton (meaton) wrote :

I've been on 3.8.4 for several hours with no problems.

Linux matt-work 3.8.4-030804-generic #201303201832 SMP Wed Mar 20 22:33:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

The kernel 3.8.4-030804-generic has been stable on my system as well.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

That's good news that v3.8.4 appears to resolve this bug. That means the fix will make its way into Raring when the kernel is rebased.

However, this bug was also reported to exist in Quantal. Can the latest 3.5 stable kernel also be tested:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5.7.8-quantal/

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

I tested with kernel 3.5.7-03050708-generic and got one hangup so far:

[ 4398.561288] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 4398.561294] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 4398.572743] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
Ulf Rehmann (rehmann) wrote :

This kernel seems good for me. I have installed:
linux-headers-3.5.7-03050708
linux-headers-3.5.7-03050708-generic
linux-image-3.5.7-03050708-generic
linux-image-extra-3.5.7-03050708-generic

Now, after 15 hours, I have had no GPU hung messages.

Before, with linux-image-3.5.0-26-generic, I usually got GPU hung messages about 20 minutes after booting; the graphics console and Xorg were then dead, and only remote login was possible.

The funny thing is that the problem only happened on my Lenovo X121e notebook.

The 3.5.0-26 kernel works well for me on a Dell notebook and on an Asus Eee 4GB PC.

Revision history for this message
Alexis Lauthier (alx7539-launchpad) wrote :

On quantal with kernel 3.5.7-03050708-generic #201303180635 SMP, I already got 2 hangups in 2 hours. Not fixed.

[ 1618.193319] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1618.193335] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1618.200630] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

[ 5016.818088] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5016.818342] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Gard: could you try reverting just '899b550' from the kernel, to see whether that alone is enough to fix the regression?

Revision history for this message
Matthew Eaton (meaton) wrote :

It took a lot longer than normal but I confirmed the bug is still present in 3.5.7.

Linux matt-work 3.5.7-03050708-generic #201303180635 SMP Mon Mar 18 10:36:03 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Mar 28 16:42:33 matt-work kernel: [13712.881340] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 28 16:42:33 matt-work kernel: [13712.881344] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 28 16:42:33 matt-work kernel: [13712.884466] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
harpreet bhatia (bluepicaso) wrote :

I am on Ubuntu 12.10 with a Dell Inspiron and an external monitor.
The hangs happen with 3.5.0-26-generic.
I switched back to 3.5.0-25-generic and it works perfectly fine.

I don't feel like moving to 3.5.7; I can't put time into experimenting with this.

tags: added: kernel-key
Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

Created attachment 77229
perf.data (xz compressed)

It happened again, so here is the requested perf record.
And yes, I am using a compositor (kwin).

Revision history for this message
MorrisseyJ (morrissey-james1) wrote :

I am on a Lenovo X131e running 64-bit Ubuntu 12.10, with a Sandy Bridge i3 using the Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller.

The system hangs when running 3.5.0-26-generic, and I also get occasional screen distortions. The hang happens every 5-15 minutes.

I had problems with my webcam using the 3.5.0-25 and 3.5.0-24 kernels, which were fixed by the update to 3.5.0-26 (see here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1147729).

As such, I have had to revert to 3.5.0-23 to get a working system: a usable webcam and no freezing.

Revision history for this message
Gard Spreemann (gspreemann) wrote :

@tjaalton: Reverting only 899b550 seems also to work. I've only done some light testing, though.

Revision history for this message
Éric Piel (pieleric) wrote :

How can a regression introduced a month ago be a duplicate of a bug reported more than a year ago?!

Both bugs seem to cause an Intel GPU hang, but that doesn't mean they are the same bug.

Revision history for this message
In , Chris Wilson (ickle) wrote :

You have to parse the perf.data locally so that it can resolve the symbols etc. Can you please do 'perf report -i /path/to/perf.data | head -1500'? Sorry for skipping that detail before.

Revision history for this message
In , Franz Fellner (alpine-art-de) wrote :

Created attachment 77242
perf report

85.26% X libc-2.15.so [.] __memcpy_ssse3_back

weird...

I also recorded just now where everything is fine (Should I post that, too?). That is the top line:

5.09% swapper [kernel.kallsyms] [k] mwait_idle_with_hints

Revision history for this message
In , Longerdev (longerdev) wrote :

I have this bug too.

Gentoo 64bit
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Samsung Electronics Co Ltd Device c0a0
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f5c00000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at e000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915

Kernel 3.8.0 gentoo-sources

I tried patch a24a11e6b4e96bca817f854e0ffcce75d3eddd13, but nothing changed:
Mar 31 15:14:37 localhost kernel: [64379.291736] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 31 15:14:37 localhost kernel: [64379.291742] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Revision history for this message
In , Chris Wilson (ickle) wrote :

I was hoping for just a little more information from the stacktraces. The most obvious cause for the GTT reads would be the DRI2 copies, but having that confirmed would have been useful. However, those cannot be eliminated due to the API constraints. So other than working around the broken hw, we also need to prevent the false positive EIO - which should be fixed in v3.9.

Revision history for this message
Aymeric (mulx) wrote :

At work we have computers running Ubuntu 12.04 with kernel 3.2.0-29 and 12.10 with kernel 3.5.0-26.
All computers have been getting random bugs with Unity, or even with unity-2d or gnome-panel, since we started upgrading the kernel on 20 March (kernel 3.2.0-29 was installed on that date).

By merging and sorting the changelogs of both kernels, I found 5 changes applied to both, so the regression was probably introduced by one of them.

  * drm/i915: Disable AsyncFlip performance optimisations
    - LP: #1117693
  * drm/i915: dump UTS_RELEASE into the error_state
    - LP: #1117693
  * drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
    - LP: #1117693
  * drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
    - LP: #1117693
  * drm/i915: Invalidate the relocation presumed_offsets along the slow path
    - LP: #1117693

On all of our computers, simply running glxgears makes the GPU hang occur instantly with the affected kernel.
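The changelog comparison described above can be sketched in Python. This is a minimal, hypothetical helper, not a tool used in the thread: it assumes both changelogs have been saved as plain text and that each change appears on a line starting with `*`, as in the excerpt above. The "precise-only" and "quantal-only" entries in the demo are made up for illustration.

```python
import re

# Matches a Debian-style changelog entry such as
#   "  * drm/i915: Disable AsyncFlip performance optimisations"
# and captures the text after the bullet. Sub-lines such as
# "    - LP: #1117693" do not match.
ENTRY_RE = re.compile(r"^\s*\*\s+(.*\S)")


def changelog_entries(text: str) -> set:
    """Return the set of '* ...' entry titles found in a changelog excerpt."""
    entries = set()
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.add(match.group(1))
    return entries


def common_changes(changelog_a: str, changelog_b: str) -> list:
    """Return, sorted, the entries that appear in both changelogs."""
    return sorted(changelog_entries(changelog_a) & changelog_entries(changelog_b))


if __name__ == "__main__":
    precise = """
  * drm/i915: Disable AsyncFlip performance optimisations
    - LP: #1117693
  * hypothetical precise-only change
"""
    quantal = """
  * hypothetical quantal-only change
  * drm/i915: Disable AsyncFlip performance optimisations
    - LP: #1117693
"""
    print(common_changes(precise, quantal))
```

A set intersection ignores ordering, so changes that were backported in a different order in each kernel still line up. Entries whose wording differs slightly between the two changelogs would be missed.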

Revision history for this message
harpreet bhatia (bluepicaso) wrote : Re: [Bug 1140716] Re: [regression] 3.5.0-26-generic GPU hangs

I have switched to GNOME Classic; it works fine.

Robert Hooker (sarvatt)
Changed in linux (Ubuntu Precise):
status: Invalid → Confirmed
importance: Undecided → Critical
Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: [regression] 3.5.0-26-generic GPU hangs

The Raring kernel has been rebased to upstream 3.8.4 in the 3.8.0-14.24 kernel. Can folks affected by this bug in Raring upgrade to the latest 3.8 kernel and confirm it has been fixed?

Changed in linux (Ubuntu Raring):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Chris Wilson (ickle) wrote :

Note that the stale TLB bug, which is what this bug has become, is not present in Raring.

Patches required:

commit 3ac7831314eba873d60b58718123c503f6961337
Author: Jesse Barnes <email address hidden>
Date: Thu Oct 25 12:15:47 2012 -0700

    drm/i915: PIPE_CONTROL TLB invalidate requires CS stall

commit 9a28977181724ebbd9bdc45291cf29da55a729ee
Author: Jesse Barnes <email address hidden>
Date: Fri Oct 26 09:42:42 2012 -0700

    drm/i915: TLB invalidation with MI_FLUSH_DW requires a post-sync op v3

commit 7d54a904285b6e780291b91a518267bec5591913
Author: Chris Wilson <email address hidden>
Date: Fri Aug 10 10:18:10 2012 +0100

    drm/i915: Apply post-sync write for pipe control invalidates

Revision history for this message
Robert Hooker (sarvatt) wrote :

"drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits" caused the problem to hit stable kernels though, would it not be more prudent to revert that?

summary: - [regression] 3.5.0-26-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
Robert Hooker (sarvatt)
summary: - [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on
+ Sandybridge
Revision history for this message
Robert Hooker (sarvatt) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

No one in this bug has mentioned 3.8 having a problem; it was a stable kernel update in 3.2 and 3.5 that was broken.

Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Precise):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Revision history for this message
Brad Figg (brad-figg) wrote :

I have Quantal kernels built with 899b550 "drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits" reverted.

Please test the appropriate kernel for your configuration and let us know if they resolve this issue for you.

They can be found at:
    http://people.canonical.com/~bradf/lp1140716/

Revision history for this message
Ger van der Kamp (y-ubuntu-0) wrote :

Hmmm,

I'm not used to updating Ubuntu manually, but I used to do it a lot on Debian systems. There seems to be a missing dependency when I try to install your kernel headers:

linux-headers-3.5.0-27-generic depends on linux-headers-3.5.0-27

Forgot to upload a .deb?

Revision history for this message
DAVID (gron-h) wrote :

Hello
I installed the kernel without problems.
$ uname -a
Linux marcel 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I use LightDM. After booting, the first splash screen for choosing
user/password is in low resolution (800x600, maybe).
After logging in, the resolution is OK (mine is 1920x1277).

Apart from this, it seems stable for now.
It is evening here, so I will tell you tomorrow evening whether a whole day
passes without problems.

Revision history for this message
luh3417 (raen) wrote : Re: [Bug 1140716] Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Hi

Have run 3.5.0.37 for a few days now on my Quantal install with i7 and it is fine. No freezes at all.
Thank you so much.

Sent from my mobile

Revision history for this message
luh3417 (raen) wrote :

PS on AMD64.
Sent from my mobile

Revision history for this message
Aymeric (mulx) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

@Brad Figg, as Ger van der Kamp said, you forgot to upload linux-headers-3.5.0-27, which breaks the dependency (and left me unable to use wireless on the affected laptop).

BTW, your x86-64 kernel seems stable; I can't test x86.

Revision history for this message
Chris Wilson (ickle) wrote :

@Sarvatt, re 'GFX_MODE Flush TLB Invalidate Mode': it is probably indeed the trigger of some of the bugs. The bspec would imply that it does not necessarily apply to all render engines. The three patches identified definitely fix the root cause.

Revision history for this message
Alexis Lauthier (alx7539-launchpad) wrote :

Running 3.5.0-27-generic #47~lp1140716 SMP on quantal, no more hangups. Fixed for me.

Revision history for this message
Laurynas Riliskis (laurynas-riliskis-gmail) wrote :

I needed a usable laptop, so I moved to Ubuntu 13.04 with the 3.8.0-16-generic kernel, and this problem does not exist there.

Revision history for this message
DAVID (gron-h) wrote :

Ran my system for about 8 hours without problem.
(3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

However, I don't know if it's related, but I have a problem with the resolution of the LightDM splash screen when booting.

Revision history for this message
Brad Figg (brad-figg) wrote :

The linux-headers-3.5.0-27_3.5.0-27.47~lp1140716_all.deb has been added to the same location as the other deps.

Revision history for this message
luca (llucax) wrote :

Hi, I just wanted to let you know that I've been using the problematic kernel but using SNA acceleration as indicated in:
http://askubuntu.com/questions/225356/how-can-i-enable-the-sna-acceleration-method-for-intel-cards-under-ubuntu-12-04
(posted here some time ago), and I never got a hang again. I do have problems with the backlight after suspending, but I had that problem before too. Without SNA I can fix it by adding acpi_backlight=vendor to the kernel boot parameters, but with SNA this doesn't work anymore (only the "raw" intel backlight works, not the acpi or the toshiba one, but stupid gnome-settings-daemon insists on using the broken toshiba driver).

Anyway, with SNA, at least my laptop (Toshiba Z830) is much faster.

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

No problems here when running 3.5.0-27-generic_3.5.0-27.47~lp1140716 (provided by Brad Figg). I don't have any backlight problem and I haven't experienced any weird resolution during the boot-up.

Revision history for this message
Ger van der Kamp (y-ubuntu-0) wrote :

Thanks Brad. My system runs your kernel without any problems now. Much better than 3.5.0-26 :)

# uname -r
3.5.0-27-generic
# uptime
 23:32:18 up 1:39, 3 users, load average: 0,45, 0,34, 0,46

Revision history for this message
shuerhaaken (shkn) wrote :

I also run the 3.5.0-27-generic kernel but the issue is still present!

This is really annoying. I have hangs every 5 minutes.

Revision history for this message
Manolis Kapernaros (kapcom01) wrote :

I installed Brad's kernel:

uname -a
Linux ProBook 4530s 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:47 UTC 2013 i686 i686 i686 GNU/Linux

Now the resolution is 1024x768 which is wrong and there is no transparency on Unity.
In System Settings the driver says VESA: Intel®Sandybridge Mobile Graphics

Also, it just hung...

Revision history for this message
luh3417 (raen) wrote : Re: [Bug 1140716] Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

I spoke too soon just had another freeze :-(
Seems to be less frequent but still there.

3.5.0-27-generic quantal AMD 64 Intel i7

If you want logs please advise which ones.

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

Created attachment 77475
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

(In reply to comment #61)
> Created attachment 76448 [details]
> i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> Any updates?

Mikhail,

Could you please try patch:
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Revision history for this message
Robert Hooker (sarvatt) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Emmanouel: You need to install linux-image-extra as well as linux-image

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

Patch is also included in latest drm-intel-nightly, linux-next. So you can test it by grabbing a distro-build of one of those.

Revision history for this message
Kamil (lampshade-t) wrote :

I have a similar issue; I wrote about it in this bug report:
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1165065

Revision history for this message
Manolis Kapernaros (kapcom01) wrote :

uname -a
Linux ProBook 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:47 UTC 2013 i686 i686 i686 GNU/Linux

OK, I installed linux-image-extra as well and I haven't seen any hang so far. I think it works now :)

Revision history for this message
Roman Shipovskij (roman-shipovskij) wrote :

What about a 3.2 test kernel for Precise? I can test it.

Revision history for this message
Kamil (lampshade-t) wrote : apport information

ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lukasz 2031 F.... pulseaudio
DistroRelease: Ubuntu 12.10
InstallationDate: Installed on 2013-02-18 (47 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.5)
Lsusb:
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 04f2:b374 Chicony Electronics Co., Ltd
MachineType: Acer Aspire E1-531G
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=pl_PL.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-26-generic root=UUID=63262429-0c63-48cc-a4b6-d3907e268362 ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcVersionSignature: Ubuntu 3.5.0-26.42-generic 3.5.7.6
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/lukasz not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-26-generic N/A
 linux-backports-modules-3.5.0-26-generic N/A
 linux-firmware 1.95
Tags: quantal quantal
Uname: Linux 3.5.0-26-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 10/15/2012
dmi.bios.vendor: Acer
dmi.bios.version: V2.07
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: EA50_HC_HR
dmi.board.vendor: Acer
dmi.board.version: Type2 - Board Version
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: V2.07
dmi.modalias: dmi:bvnAcer:bvrV2.07:bd10/15/2012:svnAcer:pnAspireE1-531G:pvrV2.07:rvnAcer:rnEA50_HC_HR:rvrType2-BoardVersion:cvnAcer:ct10:cvrV2.07:
dmi.product.name: Aspire E1-531G
dmi.product.version: V2.07
dmi.sys.vendor: Acer

tags: added: apport-collected
Revision history for this message
Kamil (lampshade-t) wrote : AlsaInfo.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : BootDmesg.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : CRDA.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : IwConfig.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : Lspci.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : ProcModules.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : RfKill.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : UdevDb.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : UdevLog.txt

apport information

Revision history for this message
Kamil (lampshade-t) wrote : WifiSyslog.txt

apport information

Revision history for this message
luca (llucax) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

I've been using this kernel for a couple of days now, doing several suspend/resume cycles, and everything is working well:
Linux nibbler 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
harpreet bhatia (bluepicaso) wrote :

Luca, dumb question, but would downloading the same kernel from http://www.ubuntuupdates.org/package/canonical_kernel_team/quantal/main/base/linux-image-3.5.0-27-generic
help me?
Or how should I update my kernel?

Revision history for this message
Aymeric (mulx) wrote :

Harpreet bhatia, download the files for your architecture from Brad's post (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1140716/comments/64).
Then, in a terminal, run: sudo dpkg -i *.deb

Revision history for this message
harpreet bhatia (bluepicaso) wrote : Re: [Bug 1140716] Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

I have never installed a kernel explicitly, so here is another dumb question:
does AMD64 mean any 64-bit?

And how can I be sure that the kernels listed there will not break other
stuff apart from the freezes?

Revision history for this message
Kamil (lampshade-t) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

uname -a
Linux grsecurity 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
and I have the bug again:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1165500

Revision history for this message
Robert Hooker (sarvatt) wrote :

Lukasz: we know 3.5.0-27.46 in the distro is broken. In comment #64, test kernels with the fix were provided, and those are what we asked people to test before the fix goes into 3.5.0-28.

Revision history for this message
Robert Hooker (sarvatt) wrote :

And those are 3.5.0-27.47, which, judging by that uname, you aren't using.
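The distinction between the distro kernel and the test kernel can be read off `uname -a`. A minimal sketch, assuming the Ubuntu convention that the number after `#` in the build string is the package upload number, so `#46-Ubuntu` corresponds to package 3.5.0-27.46 and `#47~lp1140716` to the test build 3.5.0-27.47~lp1140716. The function names are hypothetical.

```python
import re

# Kernel ABI ("3.5.0-27"), flavour ("generic"), then the build tag after '#'.
UNAME_RE = re.compile(r"(\d+\.\d+\.\d+-\d+)-(\w+)\s+#(\S+)")


def ubuntu_kernel_version(uname_a: str) -> str:
    """Map `uname -a` output to an Ubuntu kernel package version string.

    Relies on the Ubuntu naming convention assumed above; raises ValueError
    if no Ubuntu-style release/build pair is found.
    """
    match = UNAME_RE.search(uname_a)
    if match is None:
        raise ValueError("no Ubuntu-style kernel release found")
    abi, _flavour, build = match.groups()
    upload = build.removesuffix("-Ubuntu")  # "46-Ubuntu" -> "46" (Python 3.9+)
    return f"{abi}.{upload}"


def is_lp1140716_test_kernel(uname_a: str) -> bool:
    """True if the running kernel is one of the lp1140716 test builds."""
    return ubuntu_kernel_version(uname_a).endswith("~lp1140716")


if __name__ == "__main__":
    distro = "Linux grsecurity 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux"
    test = "Linux nibbler 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux"
    for line in (distro, test):
        print(ubuntu_kernel_version(line), is_lp1140716_test_kernel(line))
```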

Revision history for this message
linrunner (linrunner) wrote :

I have been testing 3.5.0-27.47~lp1140716 for a few days now (12.04.2, ThinkPad X220).

It doesn't completely eliminate the hangups, but it is a *huge* improvement:
- the hangups occur less frequently, around one or two per day
- the hangups are not fatal anymore, i.e. the system recovers so quickly that I didn't even notice them so far; I had to look in the syslog for them

Revision history for this message
Aymeric (mulx) wrote :

Running 2 days with kernel from comment #64, no hangups, no graphical bug :)

$ uname -a
Linux amo-ThinkPad-Edge-E330 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
f00fbug (topolm5678) wrote :

Hi

I can trigger this bug very easily: change Appearance/Theme from Ambiance to Radiance and back, and the GPU hangs. This is on Ubuntu 12.04.2 (3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

Aymeric (mulx)
tags: added: precise
Revision history for this message
psychok7 (nunok7) wrote :

I have the same problem on Ubuntu 12.04 x64.
3.2.0-38 works fine, but the 3.2.0-39 update and the new 3.2.0-40 freeze my system after a while, and I have to turn off my computer with the power button.
I also have an Intel graphics card.

Revision history for this message
Lorant Nemeth (loci) wrote :

Do the libc-dev and linux-tools packages next to Brad's kernel need to be installed as well?

Revision history for this message
harpreet bhatia (bluepicaso) wrote :

Today I got an update and my kernel was updated to 3.5.0-27.
I am on Ubuntu 12.10 using GNOME Classic.
There were slight hangs, but the system recovered by itself. As part of my schedule, I put my system to sleep for about an hour.
When I resumed it, I got a freeze; I waited in the hope that it would resolve, but I had to force a restart.

So the problem is still there.

Revision history for this message
Jussi Mikkola (jussi-mikkola) wrote :

I too have had this issue for quite some time. Today a new kernel came in; I updated and hoped there had been some improvement. No. Then I found this and used the kernel from #64. That works fine now. It's really nice to use the computer without the continuous popups.

uname -a
Linux xx-ThinkPad-X1 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks to Brad Figg for help.

Revision history for this message
Aymeric (mulx) wrote :

@psychok7: The package from -updates (3.2.0-40) doesn't revert "drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits" (see LP: #1160017 for the changelog).

@Lorant Nemeth, check whether libc-dev and linux-tools are installed on your computer with 'apt-cache policy <package>'.
If they are, install Brad's versions too; otherwise there is no need, just install the image, headers and image-extra packages.

Brad, or anyone from the Canonical Kernel Team, could you build a package for 3.2 (Precise) with commit 899b550 reverted, so we can confirm this also fixes the GPU hangs happening on Ubuntu 12.04 LTS?
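The `apt-cache policy` check suggested above can be scripted. A minimal sketch, assuming untranslated (`LC_ALL=C`) output in which an uninstalled package shows `Installed: (none)`. The helper names and the script name in the usage line are hypothetical, and the thread does not spell out the exact package names behind "libc-dev" and "linux-tools", so pass whichever names you see next to Brad's kernel.

```python
import os
import subprocess
from typing import Optional


def installed_version(policy_output: str) -> Optional[str]:
    """Return the installed version from `apt-cache policy` output, or None.

    None means the package is not installed ("Installed: (none)") or
    apt-cache printed nothing for it (for example, an unknown package).
    """
    for line in policy_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Installed":
            value = value.strip()
            return None if value == "(none)" else value
    return None


def apt_policy(package: str) -> str:
    """Run `apt-cache policy <package>` with C-locale (English) output."""
    result = subprocess.run(
        ["apt-cache", "policy", package],
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python3 check_installed.py <package> [<package> ...]
    for pkg in sys.argv[1:]:
        version = installed_version(apt_policy(pkg))
        status = f"installed ({version})" if version else "not installed"
        print(f"{pkg}: {status}")
```

Forcing the C locale matters because `apt-cache` translates its field labels, so a localized "Installed:" line would not be recognized.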

Revision history for this message
Jason (reeot) wrote :

Is it possible that the same bug exists in 3.8.5, or that the changes in this bug are the cause of my crashes at boot? I also have the GPU hang in 3.5.

Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426870] ------------[ cut here ]------------
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426916] WARNING: at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:1028 intel_wait_for_pipe_off+0x1aa/0x1c0 [i915]()
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426917] Hardware name: Latitude E6520
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426917] pipe_off wait timed out
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426946] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo snd_timer videobuf2_core snd_seq_device coretemp lib80211_crypt_tkip i915 videodev rfcomm kvm_intel videobuf2_vmalloc bnep wl(POF) bluetooth psmouse kvm drm_kms_helper snd drm mei dell_laptop dcdbas dell_wmi soundcore sparse_keymap ghash_clmulni_intel videobuf2_memops snd_page_alloc cryptd ppdev wmi i2c_algo_bit joydev lib80211 lp video serio_raw parport_pc lpc_ich microcode mac_hid parport binfmt_misc hid_logitech_dj usbhid hid ahci libahci firewire_ohci e1000e firewire_core sdhci_pci sdhci crc_itu_t [last unloaded: ipmi_msghandler]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426949] Pid: 61, comm: kworker/3:1 Tainted: PF O 3.8.5-030805-generic #201303281651
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426950] Call Trace:
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426957] [<ffffffff8105990f>] warn_slowpath_common+0x7f/0xc0
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426959] [<ffffffff81059a06>] warn_slowpath_fmt+0x46/0x50
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426969] [<ffffffffa034265a>] intel_wait_for_pipe_off+0x1aa/0x1c0 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426978] [<ffffffffa03426fe>] intel_disable_pipe+0x8e/0xa0 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426986] [<ffffffffa0342e5d>] ironlake_crtc_disable+0xbd/0x260 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.426996] [<ffffffffa0347f36>] intel_set_mode+0x1c6/0x3f0 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427006] [<ffffffffa03483c5>] intel_crtc_set_config.part.45+0x265/0x310 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427015] [<ffffffffa03484ac>] intel_crtc_set_config+0x3c/0x50 [i915]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427019] [<ffffffffa01a0879>] drm_fb_helper_pan_display+0x89/0xd0 [drm_kms_helper]
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427022] [<ffffffff813aac6d>] fb_pan_display+0xbd/0x170
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427025] [<ffffffff813bba99>] bit_update_start+0x29/0x60
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427027] [<ffffffff813bb292>] fbcon_switch+0x3b2/0x560
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427030] [<ffffffff814331e9>] redraw_screen+0x179/0x240
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427032] [<ffffffff8142880b>] complete_change_console+0x4b/0xf0
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427034] [<ffffffff8142a053>] change_console+0x63/0xb0
Apr 9 08:06:44 jason-ubuntu kernel: [ 37.427036]...

Revision history for this message
Smot (smot-msn) wrote :

I updated to 3.5.0-27 this morning, and the error has happened twice since.

Linux PowerTrev 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Running on a Dell Latitude E6320, Core i7, 8 GB RAM.

tags: removed: kernel-key
tags: removed: performing-bisect
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Quantal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Committed
Revision history for this message
Miklos Juhasz (mjuhasz) wrote :

@mulx: I have 3.2.0-40.64 in my ppa with "GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits" reverted. ppa:mjuhasz/backports

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #67)
> (In reply to comment #61)
> > Created attachment 76448 [details]
> > i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> > snb, v2)
> >
> > Any updates?
>
> Mikhail,
>
> Could you please try patch:
> [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Hm, it seems better, but the problem is still here:

[59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[59120.008802] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[59120.012173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
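Several reports in this thread compare how often the hangs occur (every 5-15 minutes, or once or twice a day). A minimal sketch for measuring that from the kernel log, assuming the `[drm:i915_hangcheck_hung] ... GPU hung` message format quoted here, either as plain `dmesg` output or with a syslog prefix. The timestamps are seconds since boot, so intervals are only meaningful within a single boot; the script name in the usage comment is hypothetical.

```python
import re
import sys

# "[59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
# The pattern is not anchored, so a syslog prefix such as
# "Mar 31 15:14:37 localhost kernel: " in front of it is tolerated.
HANG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\].*GPU hung")


def hang_timestamps(log_text: str) -> list:
    """Return the seconds-since-boot timestamp of every hangcheck message."""
    stamps = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def hang_intervals(log_text: str) -> list:
    """Return the gaps, in seconds, between consecutive hangs."""
    stamps = hang_timestamps(log_text)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg > hangs.txt; python3 hang_intervals.py hangs.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    print(f"{len(hang_timestamps(text))} hangs")
    for gap in hang_intervals(text):
        print(f"{gap / 60:.1f} min since previous hang")
```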

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 77692
i915_error_state (kernel 3.8.5 Fedora) with patch (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 77693
dmesg (kernel 3.8.5 Fedora) with patch (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message
In , Chris Wilson (ickle) wrote :

\o/ It kicked the right ring.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #72)
> \o/ It kicked the right ring.

So is this normal?

Revision history for this message
In , Chris Wilson (ickle) wrote :

It's the expected 'improved' recovery behaviour for this bug.

Revision history for this message
Carlos Correia (carlos-m16e) wrote :

Still having refresh problems with kernel 3.5.0-27 on 12.10.

Revision history for this message
Christophe CATARINA (christophe.catarina) wrote :

I have the same problem on Xubuntu 12.04 x64.
3.2.0-38 works fine.
3.2.0-39 and the new 3.2.0-40 freeze my system after a few minutes or hours.

Extracted from 'sudo lshw' :
MB: H67MA-USB3-B3 (Gigabyte Technology Co., Ltd.)
BIOS : Award Software International, Inc. version: F3 03/31/2011
CPU : Intel(R) Core(TM) i3-2125 CPU @ 3.30GHz
Memory : 8GiB

Display : VGA compatible controller
product: 2nd Generation Core Processor Family Integrated Graphics Controller
version: 09
bits: 64 bits
configuration: driver=i915 latency=0
resources: irq:42 memory:fb800000-fbbfffff memory:e0000000-efffffff ioport:ff00(size=64)

Nicolas Krzywinski (nsk7even) wrote :

Confirming 3.2.0-40 does not fix the freeze. Switched back to 3.2.0-38 which works fine.

Trigger: hover over Docky (the screen freezes immediately; only the mouse remains movable)

Ubuntu LTS 12.04
Compiz 0.9.7.12
Docky version: 2.2.0 bzr docky r1835 ppa
Kernels: 3.2.0-40, 3.2.0-39 affected (3.2.0-38 and prior not affected)

ThinkPad T420
Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz
2nd Generation Core Processor Family Integrated Graphics Controller
driver=i915 latency=0

IanG (ian-usts) wrote :

Immediate lockup on 3.2.0-39 and 3.2.0-40 (12.04 LTS); I can only use the mouse. Even the terminal/console gets stuck after a short while.

Yes, this is happening to me on a Sandy Bridge machine. I also tried the suggested temporary workaround of switching to the GNOME Classic desktop, but this failed too.

Unfortunately I have no 3.2.0-38 to fall back on, which was working sweetly before this regression.

Unilogic Networks Package Master (unnet-pkg-master) wrote :

I'm still experiencing issues with 3.5.0-27 (but possibly less frequently):

optiplex:~$ dmesg
[ 5693.816748] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5693.816752] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5693.819735] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

optiplex:~$ uname -a
Linux optiplex 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

optiplex:~$ lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

optiplex:~$ cat /proc/cpuinfo | grep i3
model name : Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz

Dabo Ross (daboross) wrote :

I am getting this error on 12.10, usually when running some programs on the Sandy Bridge mobile GPU, and also others with Bumblebee.
I have an Optimus laptop.
I haven't gotten this error while running only with Bumblebee.

When it happens I get no graphics glitches or anything other than my computer suddenly not responding for around 5 seconds.
Then I get around 10-15 windows asking me if my system has hung recently.
After I say "yes" and then "around once a day" (I think that is the option), I get a system report saying that apport-gpu-error.py has crashed, with the "bug title":
[sandybridge-m-gt2] GPU lockup IPEHR: 0x42180000 IPEHR: 0x010000000
Next time this happens I will record everything the bug report says.

Günther Fröhlich (kuddel-mail) wrote :

Running Brad Figg's kernel now for a whole day (i386) on HP Elitebook 2760p...
No problems at all. Thank you!

uname -a
Linux workpad 3.5.0-27-generic #47~lp1140716 SMP Wed Apr 3 00:11:47 UTC 2013 i686 i686 i686 GNU/Linux

franglais.125 (franglais.125-deactivatedaccount) wrote :

I have installed kernel 3.5.0-28 from the -proposed repositories (12.04.2) and have been using it for over a day, with no problems.
There is no "Fix committed" for linux-lts-quantal, so I am not sure if the kernel in proposed is meant as a fix, or if it is simply the next kernel update to come out in a few weeks.
In any case, so far so good.

What gave me a hint was the linked branch: lp:ubuntu/precise-proposed/linux-lts-quantal... but again, no fix committed tag for linux-lts-quantal.

I will post again if I encounter the bug with this kernel.

Alessio (alessio) wrote :

I installed the 3.5.0-28-generic (linux-image-3.5.0-28-generic_3.5.0-28.47~precise1_amd64.deb) from precise-proposed repositories and the problem is not fixed

dmesg shows:
[ 854.728692] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 854.728696] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 855.232276] [drm:i915_reset] *ERROR* Failed to reset chip.

I also attached the /sys/kernel/debug/dri/0/i915_error_state file

Jason (reeot) wrote :

I am running 3.5.0-27 on 12.10 and got the error although it does not seem to happen nearly as much as -26 did.

 CRON[4235]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
 kernel: [ 9221.738324] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
 kernel: [ 9221.738329] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Ralf Hersel (ralf.hersel) wrote :

Even with 3.5.0-27 it happens as frequently as before (every 2-3 minutes):

[ 2280.147145] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2280.147361] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Working with this kernel is almost impossible.

linrunner (linrunner) wrote :

I've been testing 3.5.0-28.47 (from proposed) for 3 days now (12.04.2, ThinkPad X220) and my assessment remains the same as with 3.5.0-27.47~lp1140716:

3.5.0-28.47 doesn't completely eliminate the hangups, but it is a *huge* improvement compared to -26 and -27:
- the hangups occur less frequently, around one or two per day
- the hangups are not fatal anymore, i.e. the system recovers so quickly that I haven't even noticed them so far; I had to look in the syslog to find them
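To count hangs the way described above (by searching the kernel log rather than noticing them on screen), a short script can pull the timestamps out of dmesg or syslog text. This is a minimal sketch, assuming lines in the format quoted throughout this thread: a `[seconds.micros]` prefix, optionally preceded by a syslog `kernel:` tag, followed by the message. The helper names `hang_times` and `intervals` are hypothetical. It matches on the message text `Hangcheck timer elapsed... GPU hung`, which the reports here show under both `i915_hangcheck_hung` (3.5 kernels) and `i915_hangcheck_elapsed` (3.2 kernels).

```python
import re
from typing import List

# Assumed log format: "[  5693.816748] [drm:...] message", optionally with a
# syslog prefix such as " kernel: " before the timestamp bracket.
_TIMESTAMP_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")
_HANG_TEXT = "Hangcheck timer elapsed... GPU hung"


def hang_times(log_text: str) -> List[float]:
    """Return the seconds-since-boot timestamps of every GPU hang message."""
    times = []
    for line in log_text.splitlines():
        # search() rather than match(), so a leading syslog prefix is skipped;
        # "[drm:...]" brackets never match because they contain no digits/dot.
        m = _TIMESTAMP_LINE.search(line)
        if m and _HANG_TEXT in m.group(2):
            times.append(float(m.group(1)))
    return times


def intervals(times: List[float]) -> List[float]:
    """Seconds between consecutive hangs (empty if there are fewer than two)."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Fed the dmesg excerpt from the bug description, the first two hangs come out roughly 459 seconds (about 7.7 minutes) apart. Note that dmesg timestamps restart at zero on every boot, so intervals are only meaningful within one boot.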

IanG (ian-usts) wrote :

Getting in a right pickle with this. I can't seem to roll back to any Linux image that works now.

Syslog.eu (syslog) wrote :

I can confirm this also with quantal kernel 3.5.0-27 on my lenovo E530:

uname:
Linux toruk 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 20:00:05 UTC 2013 i686 i686 i686 GNU/Linux

lspci:
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device 5000
 Flags: bus master, fast devsel, latency 0, IRQ 43
 Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 5000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915
 Kernel modules: i915

dmesg:
[19832.492304] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[19832.492309] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[72569.343576] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[72569.343583] [drm:drm_helper_resume_force_mode] *ERROR* failed to set mode on crtc f6671800

It happens randomly 3-5 times per day: the display completely freezes and only the mouse moves. A restart is then necessary. It is impossible to do any serious work on the laptop.

Tristan Schmelcher (tschmelcher) wrote :

Happening for me too after recent updates. GPU hangs would happen up to multiple times per minute. Originally I was running precise and using the precise kernel/X11 stack. I reinstalled from 12.04.2 media with the quantal kernel/X11 stack, but after getting the latest updates it started happening again.

I've now installed the proposed 3.5.0-28 kernel and it seems to have fixed the problem.

In , Chris Wilson (ickle) wrote :

*** Bug 63542 has been marked as a duplicate of this bug. ***

Steve Conklin (sconklin)
tags: added: verification-needed-precise
tags: added: verification-needed-quantal
Steve Conklin (sconklin) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Roman Shipovskij (roman-shipovskij) wrote :

No more problems on Precise i386 after upgrading from -proposed

root@WS:~# uname -a
Linux WS 3.2.0-41-generic-pae #65-Ubuntu SMP Wed Apr 10 18:45:42 UTC 2013 i686 i686 i386 GNU/Linux

root@WS:~# lspci
...
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
...

tags: added: verification-done-precise
removed: verification-needed-precise
Éric Piel (pieleric) wrote :

I've been running the proposed kernel (on 12.04.2) since this morning and it indeed seems to work fine. No GPU hangs so far (same as with 3.5.0-27.47~lp1140716).

# uname -a
Linux pieleric 3.5.0-28-generic #47~precise1-Ubuntu SMP Wed Apr 10 15:10:23 UTC 2013 i686 i686 i386 GNU/Linux

Alessio (alessio) wrote :

As I said before, with the 3.5.0-28.47 kernel from precise-proposed on a 64-bit system, the bug isn't fixed for me.

My PC has an Intel i3-2120 CPU with integrated HD 2000 graphics.

Daniel Sebastião (dmse) wrote :

I've updated to the proposed kernel (3.2.0-41), and the system is better, but after a while it completely freezes and I have to do a hard shutdown to recover...

So I went back to an older kernel version.
x64 system with a Core i5

Pat McGowan (pat-mcgowan) wrote :

Working well on Quantal with xserver-xorg-video-intel 2:2.20.9-0ubuntu2.1 and
Linux 3.5.0-28-generic #47-Ubuntu SMP Tue Apr 9 18:58:12 UTC 2013 i686 i686 i686 GNU/Linux
No freezes or crash files for several days.

tags: added: verification-done-quantal
removed: verification-needed-quantal
shuerhaaken (shkn) wrote :

Same issue is still there with Linux 3.5.0-28-generic.
I have hangs all the time.

PshhPshh (kcpi9000) wrote :

12.04 LTS, kernel 3.2.0-40. Having lots of trouble, spiralling downward. Things got worse after I tried to reinstall xserver-xorg-video-intel. I had to purge it again immediately, because the system was crashing instantly, even faster than with 3.2.0-39.

I wonder if there's an alternative to xserver-xorg-video-intel? I don't really need 3D that much, but my mouse pointer disappears when hovered and stopped over some windows. It's quite annoying...

Gonzalo Palarea (gpalarea) wrote :

No problems since yesterday with the proposed kernel:

uname -a
Linux escritorio-u 3.2.0-41-generic #65-Ubuntu SMP Wed Apr 10 18:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lspci
...
VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
...

Roman Shipovskij (roman-shipovskij) wrote :

shuerhaaken, PshhPshh what is your system architecture, amd64 or i386?

linrunner (linrunner) wrote :

Verified with remarks, see #128.

Guy Van Sanden (gvs) wrote :

I got an update of the Xorg and Intel drivers today, booted 3.2.0-40 and had a crash nearly instantly.
Back on -38, which seems to be the only thing that works.

Roman Shipovskij (roman-shipovskij) wrote :

I got an error after a day of using the 3.2.0-41 i386 kernel:

[35124.212413] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[35124.212423] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[35124.715457] [drm:i915_reset] *ERROR* Failed to reset chip.

The problem is not completely fixed.

tags: added: verification-failed-precise
removed: verification-done-precise
Brad Figg (brad-figg) wrote :

@roman,

It may be that the problem is not completely fixed, and it may be that there is more than one issue that people are running into. I don't think changing the tag is the answer here. I think we let this bug close out and open a new one for people still experiencing issues. Given everything that is being reported in this bug, there are most likely multiple issues here. So, please open a new bug and come back here and add a comment with that bug number.

Brad Figg (brad-figg) wrote :

@guy

Please see comment #145 and open a new bug.

Éric Piel (pieleric) wrote :

@Guy @PshhPshh: Note that kernel 3.2.0-40 doesn't contain the fix. You need to try 3.2.0-41, which is currently only in -proposed.

PshhPshh, there is no other driver for the Intel card. The only thing you can try is the "hardware enablement stack", by doing something like this (but in my experience, it doesn't change much):
sudo apt-get install linux-generic-lts-quantal xserver-xorg-lts-quantal

IanG (ian-usts) wrote :

@Brad-figg.

What if there are client end-users? Do we just cut 'em loose? Because you can't stand over other people's machines in the hope that 3.2.0-41 works or leave them to their own devices if they haven't a clue how to mess around with -proposed kernels.

The only thing we'd be able to do in that case would be to lock down their updates to an older kernel and keep them there until the next LTS.

That's the way I see it.

Brad Figg (brad-figg) wrote :

@ian,

The original kernel is already in -updates. There is a fix in the current -proposed kernel that seems to fix the issue for some people and not others. That's a likely sign that there are multiple issues. We intend to continue to work to find solutions to all the issues.

Steve Conklin (sconklin) wrote :

There are 3.2.0 bisection kernels we're using in an attempt to locate this problem; they are available here:

http://people.canonical.com/~sconklin/precise-bisection/

The most recent one is in the top directory, and all older ones are in the old/ directory.

Please test them in order, and report in this bug for each one whether you still have the problem or not. Include the kernel version in the report.

Thanks, this testing is valuable.

Roman Shipovskij (roman-shipovskij) wrote :
Steve Conklin (sconklin) wrote :

Roman, I'll have one there in a few minutes

Steve Conklin (sconklin) wrote :

ok, they're up, both the first and second bisection. The first one is in the old/ directory.

Thanks!

Stephan Springer (geryon) wrote :

Changing back to verification-done-precise since I agree with Brad here.

Personally, I have been trying out this new 3.5 kernel from -proposed on precise for three days now, and I have not seen another GPU hang yet.

tags: added: verification-done-precise
removed: verification-failed-precise
Roman Shipovskij (roman-shipovskij) wrote :

spcbisect01 from http://people.canonical.com/~sconklin/precise-bisection/old/

root@WS:~# uname -a
Linux WS 3.2.0-40-generic #63~spcbisect01 SMP Thu Apr 18 19:41:05 UTC 2013 i686 i686 i386 GNU/Linux

I got an error after a few logins/logouts.

[ 260.773892] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 260.773902] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 260.777171] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 97091 at 96758, next 97213)

Roman Shipovskij (roman-shipovskij) wrote :

spcbisect02 from http://people.canonical.com/~sconklin/precise-bisection/

root@WS:~# uname -a
Linux WS 3.2.0-40-generic #63~spcbisect02 SMP Thu Apr 18 19:22:24 UTC 2013 i686 i686 i386 GNU/Linux

I again got an error after a few logins/logouts.

[ 561.086415] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 561.086425] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 561.089716] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1020148 at 1020145, next 1020149)

Steve Conklin (sconklin) wrote :

Hang on and I'll get the next set of kernels built. THANKS

Steve Conklin (sconklin) wrote :

ok, round three is up.

info to reproduce:
# bad: [985689ad1c3211f4f3a9ce0e2371847320ba873f] UBUNTU: Ubuntu-3.2.0-40.64
# good: [ba89d2a7ca8233e29c9fdeabefb7fdbb6775626e] UBUNTU: Ubuntu-3.2.0-39.62
git bisect start 'Ubuntu-3.2.0-40.64' 'Ubuntu-3.2.0-39.62'
# bad: [d07543725cf6aabcb077501ad296aa57c76341e5] ftrace: Call ftrace cleanup module notifier after all other notifiers
git bisect bad d07543725cf6aabcb077501ad296aa57c76341e5
# bad: [c7588e84db4c69868ed9889b95129050d6463715] x86: Do not leak kernel page mapping locations
git bisect bad c7588e84db4c69868ed9889b95129050d6463715
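The `git bisect` log above records each test kernel as good or bad; git then picks a commit roughly halfway between the newest known-good and oldest known-bad one, so the number of test kernels grows only logarithmically with the number of commits between Ubuntu-3.2.0-39.62 and Ubuntu-3.2.0-40.64. The sketch below models that search on a simplified, linear list of hypothetical commit names. Real kernel history contains merges, which `git bisect` handles and this sketch does not. It also assumes a single culprit commit after which every build is bad, an assumption that may not hold if, as suggested in this thread, more than one problem is involved.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    Returns (first bad commit, number of kernels that had to be tested).
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # one build-install-reboot-test cycle per midpoint
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tested
```

With 101 hypothetical commits `c0` to `c100` and the regression introduced at `c37`, this finds `c37` after 7 test kernels, which is why each round of bisection kernels above roughly halves the remaining range.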

Roman Shipovskij (roman-shipovskij) wrote :

Steve, I think you have a mistake:
good kernel: 3.2.0-28 and older
bad kernel: 3.2.0-29 and latest

Roman Shipovskij (roman-shipovskij) wrote :

Excuse me, the correct versions are:
good kernel: 3.2.0-38 and older
bad kernel: 3.2.0-39 and later

Roman Shipovskij (roman-shipovskij) wrote :

spcbisect03 from http://people.canonical.com/~sconklin/precise-bisection/

uname -a
Linux WS 3.2.0-40-generic #63~spcbisect03 SMP Fri Apr 19 00:39:14 UTC 2013 i686 i686 i386 GNU/Linux

I got an error in /var/log/Xorg.0.log:

[ 3583.655] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 3583.655]
Backtrace:
[ 3583.679] 0: /usr/bin/X (xorg_backtrace+0x37) [0xba87a7]
[ 3583.679] 1: /usr/bin/X (mieqEnqueue+0x223) [0xb86af3]
[ 3583.679] 2: /usr/bin/X (0xa20000+0x4ccd5) [0xa6ccd5]
[ 3583.679] 3: /usr/bin/X (xf86PostMotionEventM+0xf9) [0xaadd99]
[ 3583.679] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x1c2000+0x3c3d) [0x1c5c3d]
[ 3583.679] 5: /usr/lib/xorg/modules/input/evdev_drv.so (0x1c2000+0x524f) [0x1c724f]
[ 3583.679] 6: /usr/bin/X (0xa20000+0x78381) [0xa98381]
[ 3583.679] 7: /usr/bin/X (0xa20000+0x9fce8) [0xabfce8]
[ 3583.679] 8: (vdso) (__kernel_sigreturn+0x0) [0x1e2400]
[ 3583.679] 9: (vdso) (__kernel_vsyscall+0x2) [0x1e2416]
[ 3583.679] 10: /lib/i386-linux-gnu/libc.so.6 (ioctl+0x19) [0x63bad9]
[ 3583.679] 11: /usr/lib/i386-linux-gnu/libdrm.so.2 (drmIoctl+0x34) [0xc5e9a4]
[ 3583.679] 12: /usr/lib/i386-linux-gnu/libdrm_intel.so.1 (0xf3d000+0x899d) [0xf4599d]
[ 3583.679] 13: /usr/lib/i386-linux-gnu/libdrm_intel.so.1 (drm_intel_bo_mrb_exec+0x4a) [0xf3f30a]
[ 3583.679] 14: /usr/lib/xorg/modules/drivers/intel_drv.so (0x2bb000+0x7e18) [0x2c2e18]
[ 3583.679] 15: /usr/lib/xorg/modules/drivers/intel_drv.so (0x2bb000+0xca25) [0x2c7a25]
[ 3583.679] 16: /usr/bin/X (_CallCallbacks+0x3c) [0xa5c28c]
[ 3583.679] 17: /usr/bin/X (FlushAllOutput+0x36) [0xbabd36]
[ 3583.679] 18: /usr/bin/X (FlushIfCriticalOutputPending+0x1e) [0xbabeae]
[ 3583.679] 19: /usr/bin/X (0xa20000+0x3773d) [0xa5773d]
[ 3583.679] 20: /usr/bin/X (0xa20000+0x2535a) [0xa4535a]
[ 3583.679] 21: /lib/i386-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x56e4d3]
[ 3583.679] 22: /usr/bin/X (0xa20000+0x25699) [0xa45699]
[ 3583.679] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 3583.679] [mi] mieq is *NOT* the cause. It is a victim.
[ 3584.207] [mi] EQ overflow continuing. 100 events have been dropped.
[ 3584.207]

luca (llucax) wrote :

Working great for me:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.10
Release: 12.10
Codename: quantal
$ uname -a
Linux nibbler 3.5.0-28-generic #47-Ubuntu SMP Tue Apr 9 19:03:54 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
(kernel from proposed)

Renê Barbosa (renebarbosa) wrote :

I'm also affected. Both 3.2.0-39-generic and 3.2.0-40-generic have the same issue.

Bryce Harrington (bryce) wrote :

This is important:

GPU hangs always have basically the same symptoms so are very hard to tell apart. It is _quite likely_ many people subbed to this bug are having unrelated actual bugs. That is going to make verification of any fix really hard.

I would strongly recommend anyone that believes they have this bug to collect the following file while their system is hung, and post it here:

   /debug/dri/0/i915_error_state

This must be collected while the system is hung (e.g. by ssh'ing into it). The registers get zeroed out once the machine reboots, so don't bother collecting it when your system *isn't* frozen.

To compare two people's hangs, compare the top twenty lines or so. Especially look at the values of PGTBL_ER, EIR and IPEHR. The actual codes may vary from person to person, but look for the 0x00000000s vs. the non-0x00000000s.
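The comparison described above can be scripted. This is a minimal sketch, assuming that the registers appear in i915_error_state as lines of the form `NAME: 0xVALUE` (for example `EIR: 0x00000000` or `IPEHR: 0x42180000`); the exact layout differs between kernel versions, and a register such as IPEHR may appear more than once (for instance per ring), so every occurrence within the first twenty lines is kept in order. As advised, it compares only which values are zero and which are non-zero, not the values themselves. The function names are hypothetical.

```python
import re
from typing import Dict, List

REGISTERS = ("PGTBL_ER", "EIR", "IPEHR")
# Assumed line format: optional indentation, register name, colon, hex value.
_REGISTER_LINE = re.compile(r"^\s*([A-Z0-9_]+):\s*0x([0-9a-fA-F]+)")


def register_values(error_state: str, max_lines: int = 20) -> Dict[str, List[int]]:
    """Collect PGTBL_ER, EIR and IPEHR values from the top of an error state."""
    values: Dict[str, List[int]] = {name: [] for name in REGISTERS}
    for line in error_state.splitlines()[:max_lines]:
        m = _REGISTER_LINE.match(line)
        if m and m.group(1) in values:
            values[m.group(1)].append(int(m.group(2), 16))
    return values


def zero_pattern(values: Dict[str, List[int]]) -> Dict[str, List[bool]]:
    """Reduce each value to True (non-zero) or False (zero)."""
    return {name: [v != 0 for v in vals] for name, vals in values.items()}


def same_signature(a: str, b: str, max_lines: int = 20) -> bool:
    """True if two error states share the same zero/non-zero register pattern."""
    return zero_pattern(register_values(a, max_lines)) == zero_pattern(
        register_values(b, max_lines)
    )
```

Two dumps that agree here are not proven to be the same bug, but two that disagree are a strong hint that they are different problems.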

Roman Shipovskij (roman-shipovskij) wrote :
Roman Shipovskij (roman-shipovskij) wrote :
Roman Shipovskij (roman-shipovskij) wrote :

I uploaded two /debug/dri/0/i915_error_state files, for comments #155 and #156.

hillofbeans (francis-p-jones) wrote :

@Bryce: where is this file? I have ssh'ed into my hung machine and can't see a /debug directory. I am running

Linux atticus 3.2.0-40-generic-pae #64-Ubuntu SMP Mon Mar 25 21:44:41 UTC 2013 i686 i686 i386 GNU/Linux

and a quad-core i5:

mrnx@atticus:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
[...cut...]

I'd like to help out if I can.

hillofbeans (francis-p-jones) wrote :

@Bryce, sorry for the previous comment. I found the location and am attaching /debug/dri/0/i915_error_state
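For anyone else who cannot find the file: the kernel messages point to `/debug/dri/0/i915_error_state`, which assumes debugfs is mounted at `/debug`, while on Ubuntu it is commonly found under `/sys/kernel/debug` (the path used in an earlier comment in this thread). Reading it normally requires root. This is a minimal sketch that checks both locations and recognises an empty capture, assuming the driver writes `no error state collected` when there is nothing to report (the text one reporter in this thread got back). The candidate list and the function names are hypothetical.

```python
import os
from typing import Iterable, Optional

# Hypothetical candidate list: the common debugfs mount point first,
# then the /debug path printed in the kernel log messages.
CANDIDATES = (
    "/sys/kernel/debug/dri/0/i915_error_state",
    "/debug/dri/0/i915_error_state",
)
EMPTY_MARKER = "no error state collected"


def find_error_state(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate path that exists as a regular file, or None."""
    for path in candidates:
        # os.path.isfile() returns False instead of raising on permission
        # errors, so run this as root to actually see files under debugfs.
        if os.path.isfile(path):
            return path
    return None


def has_error_state(text: str) -> bool:
    """False if the file is empty or only reports that nothing was captured."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(EMPTY_MARKER)
```

Read the file found this way while the machine is still hung (for example over ssh), since the captured state is lost on reboot.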

Alessio (alessio) wrote :

Attached is the /debug/dri/0/i915_error_state from the 3.5.0-26 kernel.

With the 3.5.0-26 and 3.5.0-27 kernels, Xorg gets temporary lockups after a few minutes, and after about an hour I can only move the mouse pointer. I can kill Xorg with alt+print+backspace, but after that the graphics are corrupted.

dmesg with 3.5.0-26:

[ 175.345408] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 175.345412] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 175.348230] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
[ 200.244562] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 200.244844] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off

Alessio (alessio) wrote :

In comment #125 there is the /debug/dri/0/i915_error_state from the 3.5.0-28 kernel. With this kernel, compiz and unity crash immediately after login and the graphics quickly become corrupted. I didn't wait for Xorg to lock up, since it isn't possible to work with that kernel anyway.

With all these kernels I can kill Xorg and log into a console to grab /debug/dri/0/i915_error_state, without needing to ssh in from another PC.

Robert Hooker (sarvatt) wrote :

Guys, we know 3.5.0-26-generic, 3.5.0-27-generic, 3.2.0-39-generic, and 3.2.0-40-generic all contain the problem, it is making it harder to focus on the bug with so many people reporting that they are broken.

Steve: This has already been bisected on both 3.2 and 3.5, http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=commit;h=30ae292ec68402c773ddc8c80f83f7cd84289a39 was the cause

hillofbeans (francis-p-jones) wrote :

OK, so the cause is identified and I see a fix is committed. Can anyone comment on what a course of action could be for someone who just needs to use their machine? It's been nearly a week since the update that caused my machine to become unusable. I'm guessing: either an updated kernel in the regular repository will be coming out in the next few days or it won't... in which case getting a new kernel from -proposed might be the right idea. Comments? Thanks -

IanG (ian-usts) wrote :

@hillofbeans #173,

This is exactly my stance, although I think you'll find it's been well over a week since this problem started. The reason I'm keeping people on 12.04 LTS is that if you go onto the Ubuntu forums with such kernel problems on a non-LTS release, you can get heavily criticized for "experimenting" when what you need is stability. So you can guess my general annoyance at having "sold" LTS to people with such a promise. Anyway, there are people out there like me actively pushing others to leave mainstream systems for Linux derivatives, but those end-users just want their systems to work, while I can't drive everywhere sorting out several Sandy Bridge PCs. It's just not practical, especially if people are not that confident with CLI or Synaptic instructions given to them over the phone.

Anyway, my temporary solution has been to lock down previously working kernels, and I won't be asking people to update their machines until I know a fix is ready and stable.

Kevin Krumwiede (kjkrum) wrote :

@hillofbeans and @IanG:

I don't know if the problem that led me to this launchpad issue is exactly the same; there seem to be a whole slew of problems related to Intel graphics. But my problem is described in comment 1, and my solution in comment 8 of this thread: http://ubuntuforums.org/showthread.php?t=2136772 The problem seems not to have been in the kernel, per se, but in the kernel's incompatibility with the "-lts-quantal" packages in Precise.

Sergio (sergio-otero) wrote :

For me:

3.2.0-38: ok
3.2.0-39: hang in less than 2 minutes
3.2.0-40: hang in less than 2 minutes
3.2.0-40 spcbisect03 from http://people.canonical.com/~sconklin/precise-bisection/ hang in less than 2 minutes

I have collected info when hanged via SSH:

uname -a
Linux sergio-System-Product-Name 3.2.0-40-generic #63~spcbisect03 SMP Fri Apr 19 00:55:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

i915_error_state
no error state collected

Xorg.0.log attached (no obvious error, but I'm not an expert)

I have no "error state". Am I doing something wrong?

Maybe it's not related, but I can add that I've been having a similar bug since the initial release of 12.04 a year ago.
This bug has a similar consequence (the mouse moves but I cannot do anything), but for me it triggers only about once a week, and always after login and before the desktop is fully loaded (never after the desktop is fully loaded, unlike the bug reported in 3.2.0-39 and 3.2.0-40).

Steve Conklin (sconklin) wrote :

A patch has been identified which is known to cause problems in both precise and quantal kernels. Since people running both series are piling onto this bug, I'm posting the links for each.

Please test the appropriate kernel for your installation, and report whether you still have the problem. This is important, as there are some indications that we may be chasing multiple problems with these bugs.

Please report both success and failure, and include your kernel version when you make your report.

Precise test kernels are located here:

http://people.canonical.com/~sconklin/precise-revert-4c443ec/

Quantal test kernels are located here:

http://people.canonical.com/~sconklin/quantal-revert-4c443ec/

Thank you for your help!

Sergio (sergio-otero) wrote :

I've been using http://people.canonical.com/~sconklin/precise-revert-4c443ec/ for an hour and everything seems perfect now

I can do more tests or collect more info if needed:

uname -a
Linux sergio-System-Product-Name 3.2.0-41-generic #65~spcreverted30ae292 SMP Mon Apr 22 17:00:01 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
stepping : 7
microcode : 0x14
cpu MHz : 1600.000
....

lspci -v -s `lspci | awk '/VGA/{print $1}'`

VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Device 844d
    Flags: bus master, fast devsel, latency 0, IRQ 53
    Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    I/O ports at f000 [size=64]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: <access denied>
    Kernel driver in use: i915
    Kernel modules: i915

In , B-harrington (b-harrington) wrote :

Chris, what is the upstream status for the ring kicker patch? Is that likely to get incorporated upstream, or do you feel it needs further polish before it's ready? Would this patch incur some risk of regressions in other areas were it to be backported for inclusion in Ubuntu?

Günther Fröhlich (kuddel-mail) wrote :

I've also been using this kernel since yesterday with no problems on hp 2760p.
http://people.canonical.com/~sconklin/quantal-revert-4c443ec/

Linux workpad 3.5.0-28-generic #47~spcrevertedb777ab9 SMP Mon Apr 22 16:24:09 UTC 2013 i686 i686 i686 GNU/Linux

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Hewlett-Packard Company Device 162a
 Flags: bus master, fast devsel, latency 0, IRQ 45
 Memory at 94000000 (64-bit, non-prefetchable) [size=4M]
 Memory at 80000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 4000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #76)
> Chris, what is the upstream status for the ring kicker patch? Is that
> likely to get incorporated upstream, or do you feel it needs further polish
> before it's ready? Would this patch incur some risk of regressions in other
> areas were it be backported for inclusion in Ubuntu?

Merged for 3.10 as

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

    drm/i915: Resurrect ring kicking for semaphores, selectively

Nothing else planned for now, but I think we can just keep this bug here open in case we stumble across a new idea. And it seems to be good honey to attract all the me-too reports ;-)

Sergio (sergio-otero) wrote :

Sorry, I clicked "Fix Released" to see a description :-(
I didn't know I had permission to change it and cannot go back now...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
Changed in linux (Ubuntu Precise):
status: Fix Released → Fix Committed
In , Tomwij-1 (tomwij-1) wrote :

(In reply to comment #65)
> Kernel 3.8.0 gentoo-sources

Did you report this at the Gentoo Bugzilla?

When you do, please attach /debug/dri/0/i915_error_state
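Attaching that file means reading it from debugfs, which normally requires root. Below is a minimal sketch. It assumes debugfs is mounted at /sys/kernel/debug, the usual Ubuntu location; /debug is a mount point some users choose themselves. The helpers i915_error_state_path and save_i915_error_state are hypothetical names used only for illustration:

```shell
#!/bin/bash
# Sketch: save the i915 GPU error state so it can be attached to a bug.
# Assumption: debugfs is mounted at /sys/kernel/debug unless another root
# (such as /debug) is passed as the first argument. Helper names are
# hypothetical.

# Print the error-state path below a debugfs mount point.
i915_error_state_path() {
    local debugfs_root="${1:-/sys/kernel/debug}"
    printf '%s/dri/0/i915_error_state\n' "$debugfs_root"
}

# Copy the error state to a file; return 1 if it cannot be read.
save_i915_error_state() {
    local debugfs_root="${1:-/sys/kernel/debug}"
    local out="${2:-i915_error_state.txt}"
    local src
    src="$(i915_error_state_path "$debugfs_root")"
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted? running as root?)" >&2
        return 1
    fi
    cat "$src" > "$out"
}
```

In a root shell, after sourcing these functions, save_i915_error_state /sys/kernel/debug /tmp/i915_error_state.txt copies the state. Because the state is kept in kernel memory, save it before rebooting. If debugfs is not mounted yet, mount -t debugfs none /sys/kernel/debug mounts it.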

Alessio (alessio) wrote :

There is no 3.5.0 kernel for Precise on http://people.canonical.com/~sconklin/precise-revert-4c443ec/
Anyway, with the 3.2.0-41.65~spcreverted30ae292_amd64 kernel, everything seems OK after about two and a half hours.

Roman Shipovskij (roman-shipovskij) wrote :

After 2 days of using the kernel from http://people.canonical.com/~sconklin/precise-revert-4c443ec/, all OK.

root@WS:~# uname -a
Linux WS 3.2.0-41-generic #65~spcreverted30ae292 SMP Mon Apr 22 16:42:40 UTC 2013 i686 i686 i386 GNU/Linux

root@WS:~# lspci -v -s `lspci | awk '/VGA/{print $1}'`
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: ASUSTeK Computer Inc. Device 844d
 Flags: bus master, fast devsel, latency 0, IRQ 47
 Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at f000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915
 Kernel modules: i915
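Several comments use the one-liner lspci -v -s `lspci | awk '/VGA/{print $1}'`, which works in two stages. Plain lspci lists one device per line, starting with its bus slot (00:02.0 here). awk prints the first field of every line containing "VGA". The outer lspci -v -s then shows verbose details for that slot only. The sketch below splits the pipeline into hypothetical helper functions and loops over the slots, so it does not rely on there being exactly one VGA controller:

```shell
#!/bin/bash
# Sketch of the pipeline behind:  lspci -v -s `lspci | awk '/VGA/{print $1}'`
# Helper names are hypothetical.

# Read plain lspci output on stdin and print the slot of each VGA device,
# i.e. the first whitespace-separated field of every line matching "VGA".
vga_slots() {
    awk '/VGA/{print $1}'
}

# Show verbose details for each VGA device in turn (requires lspci).
show_vga_details() {
    local slot
    for slot in $(lspci | vga_slots); do
        lspci -v -s "$slot"
    done
}
```

Running it as root, as in the comment above, lets lspci read the full capability list. As a normal user it prints "Capabilities: <access denied>" instead, which is what several of the other pastes show.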

DAVID (gron-h) wrote :

All is OK here too.
# uname -a
Linux marcel 3.5.0-28-generic #47~spcrevertedb777ab9 SMP Mon Apr 22 16:13:41 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
# lspci -v -s `lspci | awk '/VGA/{print $1}'`
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Dell Device 0493
 Flags: bus master, fast devsel, latency 0, IRQ 44
 Memory at e1400000 (64-bit, non-prefetchable) [size=4M]
 Memory at d0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 4000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

Przemek Wesolek (pwes) wrote :

Two days of running Steve Conklin's kernel for Precise (comment #177), and not a single GPU hang since.

# lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Dell Device 0571
 Flags: bus master, fast devsel, latency 0, IRQ 48
 Memory at f2400000 (64-bit, non-prefetchable) [size=4M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 5000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915
 Kernel modules: i915

Michael Basse (michael-alpha-unix) wrote :

I can also confirm that the new kernels from -proposed fixed the GPU crash itself.

uname -a
Linux bestbuntu 3.5.0-28-generic #47~precise1-Ubuntu SMP Wed Apr 10 15:12:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

But I am still getting a crash-report window telling me that a problem happened, although I no longer face the GPU hangs themselves (just the Apport message about them). I can reproduce it every time my notebook resumes from suspend.

Michael Basse (michael-alpha-unix) wrote :

It seems the crash report is not related to the bug we are facing here (so it's not related to the Intel driver). This is the last crash report I got (after that I switched to the proposed kernel):

-rw------- 1 root whoopsie 3199975 Apr 23 14:53 xserver-xorg-video-intel.2013-04-23_14:53:19.785507.crash

I will now check what is causing the abort messages and create a new bug.

Jane Silber (silbs) wrote :

I've been using the kernel from comment #177 for about 24 hours on a Dell XPS 13. All looks good so far.

Jane Silber (silbs) wrote :

(Using the quantal one, not precise)

Nicolas Krzywinski (nsk7even) wrote :

http://people.canonical.com/~sconklin/precise-revert-4c443ec/ is working for me as well.

Just as a reminder: I could reproduce the freeze on my system immediately by hovering over Docky. This now works without a freeze, so this kernel solves at least my variant of the bug! :-)

harpreet bhatia (bluepicaso) wrote :

My story:
Previously I was running 12.10 with kernel 3.5.0-25-generic (see my comment #54).
Yesterday I started downloading 13.04, but I could never boot it from the DVD. I tried USB and got a partman error, even though all the ISO checksums matched.
So I finally chose to switch to 12.04, and now my kernel is 3.2.0-40-generic.
While I was changing fonts via GNOME Tweak Tool (I am on GNOME Classic), my system froze. Then it froze again while idle with xscreensaver running, and the screen never resumed, just a blank screen.

Please let me know: what should I do now?

Nicolas Krzywinski (nsk7even) wrote :

Revert to kernel 3.2.0-38 until 3.2.0-41 is released. You can do this by uninstalling 3.2.0-40 and 3.2.0-39.
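Here is a dry-run sketch of that advice. It only prints the apt-get command, so the package list can be checked before anything is removed. It assumes the images follow Ubuntu's linux-image-<version>-generic naming and that 3.2.0-38 is still installed; dpkg -l 'linux-image-*' shows what is actually present. The helper names are hypothetical:

```shell
#!/bin/bash
# Dry run: print (never execute) the command that removes the affected
# Precise kernel images, so the boot menu should fall back to the
# still-installed 3.2.0-38.
# Assumption: package names follow linux-image-<abi>-generic.

# Print one image package name per ABI version given as an argument.
kernel_image_packages() {
    local abi
    for abi in "$@"; do
        printf 'linux-image-%s-generic\n' "$abi"
    done
}

# Print the apt-get removal command for the given ABI versions.
removal_command() {
    printf 'sudo apt-get remove'
    local pkg
    for pkg in $(kernel_image_packages "$@"); do
        printf ' %s' "$pkg"
    done
    printf '\n'
}
```

For example, removal_command 3.2.0-39 3.2.0-40 prints sudo apt-get remove linux-image-3.2.0-39-generic linux-image-3.2.0-40-generic. Read the list apt proposes before confirming. It may also want to remove the linux-generic metapackage; if it does, later kernels such as 3.2.0-41 will not be pulled in automatically until that metapackage is reinstalled.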

Alessio (alessio) wrote :

I installed the new 3.5.0-28.48 kernel from precise-proposed, and after two days of testing everything seems OK; for me the bug seems fixed.

Natalia Morandeira (natalia-sm83) wrote :

Hi, I can reproduce the error (the screen freezes), and I am using kernel 3.5.0-27-generic. What should I do? How can I change to the proper kernel (Ubuntu 12.04 LTS, 64-bit)? I am not a Linux expert. When I previously tried to make important changes, I got a black screen before the boot screen and had to format and reinstall Ubuntu. Could you please give me detailed instructions on how to change the kernel (or suggest a reliable website where I can read about this)? Thanks!

Nick Demou (ndemou) wrote :

Seems fixed for me too after installing kernel 3.5.0-28-generic from precise-proposed (at least after the first 3 hours of testing). Many thanks to all who helped.

uname -a
Linux ndXPS13 3.5.0-28-generic #48~precise1-Ubuntu SMP Wed Apr 24 21:43:05 UTC 2013 i686 i686 i386 GNU/Linux

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"
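Installing from precise-proposed, as in this and earlier comments, means enabling the -proposed pocket for the release named in /etc/lsb-release. The minimal sketch below derives the APT source line from that file. It assumes the main archive mirror http://archive.ubuntu.com/ubuntu/ and only the main and restricted components. release_codename and proposed_source_line are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: build the APT source line for the -proposed pocket of the
# running release. Assumptions: main archive mirror, main + restricted.

# Print DISTRIB_CODENAME from an lsb-release style file.
release_codename() {
    local file="${1:-/etc/lsb-release}"
    # Source the file in a subshell so its variables do not leak out.
    ( . "$file" && printf '%s\n' "$DISTRIB_CODENAME" )
}

# Print the deb line for <codename>-proposed.
proposed_source_line() {
    local codename
    codename="$(release_codename "$1")" || return 1
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed main restricted\n' "$codename"
}
```

On the system above, this would print a precise-proposed line. To make those packages visible, put the line in a file under /etc/apt/sources.list.d/ and run sudo apt-get update. Packages in -proposed are still being verified, so it is safer to install only the specific kernel under test and to remove the line afterwards.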

Varun RS (varun7rs) wrote : Re: [Bug 1140716] Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Thanks a lot for your help, Nick. I followed the instructions in the first
mail, but then I lost track of what to do after rebooting. The preferences
file in /etc was already there, and the next command, for upgradable
packages, did not work. That is exactly where I lost track. I also
did not understand what the second mail was about. Do you want me to run
those commands in the terminal? I did try, but if you could just tell me which
commands I need to run, it would be very helpful.

Thank you again

On Sun, Apr 28, 2013 at 4:10 PM, Nick Demou <email address hidden> wrote:

> seems also fixed for me after installing kernel 3.5.0-28-generic from
> precise-proposed (at least after the first 3 hours of testing). Many
> thanks to all who helped.

Natalia Morandeira (natalia-sm83) wrote :

Thanks, @Nick Demou! I followed your instructions. Thanks a lot, it seems
fixed... :)

2013/4/28 Nick Demou <email address hidden>

> seems also fixed for me after installing kernel 3.5.0-28-generic from
> precise-proposed (at least after the first 3 hours of testing). Many
> thanks to all who helped.
>
> uname -a
> Linux ndXPS13 3.5.0-28-generic #48~precise1-Ubuntu SMP Wed Apr 24 21:43:05
> UTC 2013 i686 i686 i386 GNU/Linux
>
> cat /etc/lsb-release
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1140716
>
> Title:
> [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on
> Sandybridge
>
> Status in “linux” package in Ubuntu:
> Invalid
> Status in “linux-lts-quantal” package in Ubuntu:
> Invalid
> Status in “linux” source package in Precise:
> Fix Committed
> Status in “linux-lts-quantal” source package in Precise:
> Confirmed
> Status in “linux” source package in Quantal:
> Fix Committed
> Status in “linux-lts-quantal” source package in Quantal:
> Invalid
> Status in “linux” source package in Raring:
> Invalid
> Status in “linux-lts-quantal” source package in Raring:
> Invalid
>
> Bug description:
> I'm getting errors about GPU hangs every minute or so (usually only
> when using FF and scrolling a webpage or something). I also get an
> annoying ubuntu dialog saying there is a "system error".
>
> This didn't happen with 3.5.0-24-generic.
>
> Here is the dmesg:
> [15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
> [15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [15628.480570] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
> [15844.231372] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [15844.231773] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
> [20173.232593] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [20173.233211] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
> [26285.650393] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
> elapsed... GPU hung
> [26285.650980] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
> [26285.658405] ------------[ cut here ]------------
> [26285.658472] WARNING: at
> /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505
> gen6_enable_rps+0x706/0x710 [i915]()
> [26285.658474] Hardware name: SATELLITE Z830
> [26285.658476] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate
> libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2
> snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel
> kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel
> snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev
> bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops
> snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth
> mac...

Read more...

annnomius (annnomius) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

The Precise test kernel from comment #177 works for me (no more constant crashes on a Core i5-based ASUS P8H67-M PRO motherboard).
Also, Ubuntu 13.04 appears to be stable on my system, so it looks like the Ubuntu developers are doing a good job stabilizing the graphics system.
Thanks!

::ann

In , Longerdev (longerdev) wrote :

>Did you report this at the Gentoo Bugzilla?

>When you do, please attach /debug/dri/0/i915_error_state

There is no report in the Gentoo Bugzilla yet (their kernel carries no Intel driver patches). But with this patch, I haven't been able to reproduce the bug for 2 weeks on kernel 3.9-rc6. However, I haven't tested with Blender (when I tried using Blender, the GPU hang reappeared within 1-5 minutes).

Changed in linux (Ubuntu Raring):
status: Invalid → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Raring):
status: New → Confirmed
tulskiy (tulskiy) wrote :

Just had this issue twice after upgrading from Quantal to Raring :( I'm getting "(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed" in the Xorg log, and the system freezes so that I have to do a hard reset.

uname -a
Linux 3.8.0-19-generic #29-Ubuntu SMP Wed Apr 17 18:19:42 UTC 2013 i686 i686 i686 GNU/Linux

lspci -v -s `lspci | awk '/VGA/{print $1}'`
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Hewlett-Packard Company Device 2abf
 Flags: bus master, fast devsel, latency 0, IRQ 50
 Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
 Memory at c0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at f000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915

In , Chris Wilson (ickle) wrote :

*** Bug 64094 has been marked as a duplicate of this bug. ***

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 78692
i915_error_state (kernel 3.9 Fedora)

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 78693
i915_error_state (kernel 3.9 Fedora)

Jane Silber (silbs) wrote :

Crashes have started happening for me again. Filed bug 1175084 in case it's different, but that is likely a dupe of this one. I'm using the Quantal kernel from Steve's comment #177

tags: added: kernel-da-key
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

This bug was fixed in the package linux - 3.2.0-41.66

---------------
linux (3.2.0-41.66) precise-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1172464

  [ Steve Conklin ]

  * Revert "drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for
    scanline waits"
    - LP: #1140716

  [ Upstream Kernel Changes ]

  * fbcon: fix locking harder
    - LP: #1168961, #1169380

linux (3.2.0-41.65) precise-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1167436

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: Fix quieting of audit messages for network
    mediation
    - LP: #1163259

  [ Steve Conklin ]

  * SAUCE: Update configs for new efivars option
    - LP: #1164646

  [ Upstream Kernel Changes ]

  * Revert "powerpc/eeh: Fix crash when adding a device in a slot with DDW"
    - LP: #1164646
  * Input: cypress_ps2 - fix trackpadi found in Dell XPS12
    - LP: #1103594
  * btrfs: Init io_lock after cloning btrfs device struct
    - LP: #1164646
  * md: protect against crash upon fsync on ro array
    - LP: #1164646
  * NFS: Don't allow NFS silly-renamed files to be deleted, no signal
    - LP: #1164646
  * SUNRPC: Don't start the retransmission timer when out of socket space
    - LP: #1164646
  * storvsc: Initialize the sglist
    - LP: #1164646
  * dc395x: uninitialized variable in device_alloc()
    - LP: #1164646
  * ARM: VFP: fix emulation of second VFP instruction
    - LP: #1164646
  * ARM: fix scheduling while atomic warning in alignment handling code
    - LP: #1164646
  * md: fix two bugs when attempting to resize RAID0 array.
    - LP: #1164646
  * md: raid0: fix error return from create_stripe_zones.
    - LP: #1164646
  * proc connector: reject unprivileged listener bumps
    - LP: #1164646
  * ath9k: fix RSSI dummy marker value
    - LP: #1164646
  * ath9k_htc: fix signal strength handling issues
    - LP: #1164646
  * mwifiex: correct sleep delay counter
    - LP: #1164646
  * cifs: ensure that cifs_get_root() only traverses directories
    - LP: #1164646
  * xen/pci: We don't do multiple MSI's.
    - LP: #1164646
  * dm: fix truncated status strings
    - LP: #1164646
  * dm snapshot: add missing module aliases
    - LP: #1164646
  * drm/i915: Don't clobber crtc->fb when queue_flip fails
    - LP: #1164646
  * ARM: 7663/1: perf: fix ARMv7 EVTYPE_MASK to include NSH bit
    - LP: #1164646
  * hwmon: (pmbus/ltc2978) Fix peak attribute handling
    - LP: #1164646
  * hwmon: (pmbus/ltc2978) Use detected chip ID to select supported
    functionality
    - LP: #1164646
  * hwmon: (sht15) Check return value of regulator_enable()
    - LP: #1164646
  * hw_random: make buffer usable in scatterlist.
    - LP: #1164646
  * ALSA: vmaster: Fix slave change notification
    - LP: #1164646
  * drm/radeon: add primary dac adj quirk for R200 board
    - LP: #1164646
  * dmi_scan: fix missing check for _DMI_ signature in smbios_present()
    - LP: #1164646
  * iwlwifi: always copy first 16 bytes of commands
    - LP: #1164646
  * HID: add support for Sony RF receiver with USB product id 0x0374
    - LP: #1164646
  * HID: clean up quirk for Sony RF receivers
    - LP: #1164646
  * ...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lts-quantal - 3.5.0-28.48~precise1

---------------
linux-lts-quantal (3.5.0-28.48~precise1) precise-proposed; urgency=low

  [Brad Figg]

  * Release Tracking Bug
    - LP: #1172327

  [ Steve Conklin ]

  * Revert "drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for
    scanline waits"
    - LP: #1140716

  [ Upstream Kernel Changes ]

  * fbcon: fix locking harder
    - LP: #1167114

linux (3.5.0-28.47) quantal-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1166876

  [ Adam Lee ]

  * SAUCE: Bluetooth: Add support for 105b:e065
    - LP: #1161261

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: Fix quieting of audit messages for network
    mediation
    - LP: #1163259

  [ Upstream Kernel Changes ]

  * NFSv4: Fix the string length returned by the idmapper
    - LP: #1101292
  * Input: cypress_ps2 - fix trackpadi found in Dell XPS12
    - LP: #1103594
  * omap_vout: find_vma() needs ->mmap_sem held
    - LP: #1164714
  * nfsd: Fix memleak
    - LP: #1164714
  * iommu/amd: Initialize device table after dma_ops
    - LP: #1164714
  * svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    - LP: #1164714
  * target: Add missing mapped_lun bounds checking during make_mappedlun
    setup
    - LP: #1164714
  * xen-blkback: do not leak mode property
    - LP: #1164714
  * btrfs: Init io_lock after cloning btrfs device struct
    - LP: #1164714
  * NFS: Don't allow NFS silly-renamed files to be deleted, no signal
    - LP: #1164714
  * SUNRPC: Don't start the retransmission timer when out of socket space
    - LP: #1164714
  * storvsc: Initialize the sglist
    - LP: #1164714
  * dc395x: uninitialized variable in device_alloc()
    - LP: #1164714
  * ALSA: bt87x: Make load_all parameter working again
    - LP: #1164714
  * ARM: VFP: fix emulation of second VFP instruction
    - LP: #1164714
  * ARM: fix scheduling while atomic warning in alignment handling code
    - LP: #1164714
  * doc, xen: Mention 'earlyprintk=xen' in the documentation.
    - LP: #1164714
  * doc, kernel-parameters: Document 'console=hvc<n>'
    - LP: #1164714
  * sony-laptop: fully enable SNY controlled modems
    - LP: #1164714
  * x86: Make sure we can boot in the case the BDA contains pure garbage
    - LP: #1164714
  * cifs: ensure that cifs_get_root() only traverses directories
    - LP: #1164714
  * iscsi-target: Fix immediate queue starvation regression with DATAIN
    - LP: #1164714
  * ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly
    - LP: #1164714
  * ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink
    - LP: #1164714
  * block: fix ext_devt_idr handling
    - LP: #1164714
  * idr: fix a subtle bug in idr_get_next()
    - LP: #1164714
  * block: fix synchronization and limit check in blk_alloc_devt()
    - LP: #1164714
  * firewire: add minor number range check to fw_device_init()
    - LP: #1164714
  * idr: fix top layer handling
    - LP: #1164714
  * sysctl: fix null checking in bin_dn_node_address()
    - LP: #1164714
  * nbd: fsync and kill block device on shutdown
    - LP: #1164714
  * target/pscsi: Fix page increment
    - LP: #1164714
...

Changed in linux-lts-quantal (Ubuntu Precise):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-28.48

---------------
linux (3.5.0-28.48) quantal-proposed; urgency=low

  [Brad Figg]

  * Release Tracking Bug
    - LP: #1172023

  [ Steve Conklin ]

  * Revert "drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for
    scanline waits"
    - LP: #1140716

  [ Upstream Kernel Changes ]

  * fbcon: fix locking harder
    - LP: #1167114

linux (3.5.0-28.47) quantal-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1166876

  [ Adam Lee ]

  * SAUCE: Bluetooth: Add support for 105b:e065
    - LP: #1161261

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: Fix quieting of audit messages for network
    mediation
    - LP: #1163259

  [ Upstream Kernel Changes ]

  * NFSv4: Fix the string length returned by the idmapper
    - LP: #1101292
  * Input: cypress_ps2 - fix trackpadi found in Dell XPS12
    - LP: #1103594
  * omap_vout: find_vma() needs ->mmap_sem held
    - LP: #1164714
  * nfsd: Fix memleak
    - LP: #1164714
  * iommu/amd: Initialize device table after dma_ops
    - LP: #1164714
  * svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    - LP: #1164714
  * target: Add missing mapped_lun bounds checking during make_mappedlun
    setup
    - LP: #1164714
  * xen-blkback: do not leak mode property
    - LP: #1164714
  * btrfs: Init io_lock after cloning btrfs device struct
    - LP: #1164714
  * NFS: Don't allow NFS silly-renamed files to be deleted, no signal
    - LP: #1164714
  * SUNRPC: Don't start the retransmission timer when out of socket space
    - LP: #1164714
  * storvsc: Initialize the sglist
    - LP: #1164714
  * dc395x: uninitialized variable in device_alloc()
    - LP: #1164714
  * ALSA: bt87x: Make load_all parameter working again
    - LP: #1164714
  * ARM: VFP: fix emulation of second VFP instruction
    - LP: #1164714
  * ARM: fix scheduling while atomic warning in alignment handling code
    - LP: #1164714
  * doc, xen: Mention 'earlyprintk=xen' in the documentation.
    - LP: #1164714
  * doc, kernel-parameters: Document 'console=hvc<n>'
    - LP: #1164714
  * sony-laptop: fully enable SNY controlled modems
    - LP: #1164714
  * x86: Make sure we can boot in the case the BDA contains pure garbage
    - LP: #1164714
  * cifs: ensure that cifs_get_root() only traverses directories
    - LP: #1164714
  * iscsi-target: Fix immediate queue starvation regression with DATAIN
    - LP: #1164714
  * ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly
    - LP: #1164714
  * ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink
    - LP: #1164714
  * block: fix ext_devt_idr handling
    - LP: #1164714
  * idr: fix a subtle bug in idr_get_next()
    - LP: #1164714
  * block: fix synchronization and limit check in blk_alloc_devt()
    - LP: #1164714
  * firewire: add minor number range check to fw_device_init()
    - LP: #1164714
  * idr: fix top layer handling
    - LP: #1164714
  * sysctl: fix null checking in bin_dn_node_address()
    - LP: #1164714
  * nbd: fsync and kill block device on shutdown
    - LP: #1164714
  * target/pscsi: Fix page increment
    - LP: #1164714
  * xen/pat: Disable PAT using pat_enabled...

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Jeff Canipe (jeffcanipe) wrote :

I also have the same issue. I, too, am using an ASUS motherboard with onboard Intel graphics.
I decided to bypass the 3.2... kernel. Instead, I used Synaptic to install the package linux-generic-lts-quantal, which currently brings in the 3.5... version of the kernel. I just applied the latest kernel, 3.5.0.28, and the problem seems to have been fixed. Here is the message from my dpkg log file after the maintenance was applied using Update Manager: upgrade linux-image-generic-lts-quantal 3.5.0.27.34 3.5.0.28.35. So far no issues.
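The dpkg log line quoted above comes from /var/log/dpkg.log, where each entry has the form "date time action package old-version new-version". The sketch below uses a hypothetical helper name and lists only kernel image upgrades, which makes it easy to see when a fixed kernel actually arrived:

```shell
#!/bin/bash
# Sketch: list linux-image package upgrades recorded by dpkg.
# Input lines look like:
#   2013-05-02 09:00:01 upgrade <package> <old-version> <new-version>
# Rotated logs (dpkg.log.N.gz) are compressed and need zcat first.

kernel_upgrades() {
    awk '$3 == "upgrade" && $4 ~ /^linux-image/ { print $4, $5, "->", $6 }'
}
```

kernel_upgrades < /var/log/dpkg.log would print a line like linux-image-generic-lts-quantal 3.5.0.27.34 -> 3.5.0.28.35 for the upgrade described here. On multiarch systems, the package name may carry an architecture suffix such as :amd64.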

Olivier Febwin (febcrash) wrote :

What about a resolution for Raring?

Matthew Eaton (meaton) wrote :

For me, the graphical glitches from 3.5.0-27 are gone, but today I got a crash report and "GPU hung" errors in syslog (though not in dmesg, for some reason):

matt@matt-work:~$ uname -a
Linux matt-work 3.5.0-28-generic #48~precise1-Ubuntu SMP Wed Apr 24 21:42:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
matt@matt-work:~$
matt@matt-work:~$ grep "GPU hung" /var/log/syslog
May 3 08:41:25 matt-work kernel: [ 41.129686] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
May 3 08:41:32 matt-work kernel: [ 47.230798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
May 3 08:41:38 matt-work kernel: [ 53.323907] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
matt@matt-work:~$
matt@matt-work:~$ dmesg | grep "GPU hung"
matt@matt-work:~$
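To compare the two sources, the sketch below (hypothetical helper name) pulls the kernel uptime stamp out of every "GPU hung" line. It accepts both the syslog form shown above ("May 3 08:41:25 host kernel: [ 41.129686] ...") and the bare dmesg form ("[77506.186924] ..."):

```shell
#!/bin/bash
# Sketch: print the kernel timestamp (seconds since boot) of each
# "GPU hung" message read on stdin, from either syslog or dmesg output.
# The pattern only matches a bracketed number, so "[drm:...]" is skipped.

hang_uptimes() {
    grep 'GPU hung' | sed -n 's/.*\[ *\([0-9][0-9]*\.[0-9]*\)\].*/\1/p'
}
```

hang_uptimes < /var/log/syslog and dmesg | hang_uptimes can then be compared directly. One possible reason for a mismatch, not confirmed in this thread: syslog also keeps messages from earlier boots, while dmesg shows only the current boot's in-memory ring buffer, which can also wrap.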

Changed in linux (Ubuntu Raring):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Amos Blanton (lightnin9) wrote :

I'm getting: [ 7564.367937] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
On an X220, Raring Ringtail. I use a dock with my ThinkPad, although I have also had the error when not docked.

Filex (filip-brinkmann) wrote :

Problem still exists in 3.7.0-7.

user.21:35~$ uname -a
Linux PC 3.7.0-7-generic #15-Ubuntu SMP Sat Dec 15 16:34:25 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

dmesg:
[77506.186924] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[77506.186928] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Andreas Hasenack (ahasenack) wrote :

Also still happening here on raring:

3.8.0-21-generic #32-Ubuntu SMP Tue May 14 22:16:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Joseph (kyeongsoo-kim) wrote :

Seconding Andreas Hasenack above: exactly the same kernel (Ubuntu 13.04) on a ThinkPad X220.

In , Freedesktop-l (freedesktop-l) wrote :

Created attachment 79704
i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430

I can reproduce this bug every time I try to quickly drag a Chrome window with a YouTube movie to a secondary monitor connected to my Dell E6430 laptop. It is very annoying. Tested on the latest kernel, 3.10-rc2.

I can give you any additional information you want, test patches, etc. Just please try to fix this :)

In , Freedesktop-l (freedesktop-l) wrote :

(In reply to comment #84)
> Created attachment 79704 [details]
> i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430
>
> I can reproduce this bug every time I try to quickly drag a Chrome window
> with a YouTube movie to a secondary monitor connected to my laptop Dell
> E6430.

One more piece of information: you need to enable "Override software rendering list" in chrome://flags.

Ray (ray-0711) wrote :

Same here on 3.8.0-22-generic

peterbo (peterbo) wrote :

Also happening on Raring with both kernel 3.8 and 3.9, Sandy Bridge. This is really annoying, as my PC locks up at least once a day; sometimes it can be made responsive again by pressing Ctrl+Alt+F6 and then Ctrl+Alt+F7.

Is there any progress with this bug at all? I see multiple bug reports about this issue but still no solution or workaround.

Maxim Loparev (laplandersan) wrote :

Hit by the HUNG bug again on Raring. However, it's reported differently, so it could be a different bug from the original one on Precise and Quantal.

dmesg
[19492.484182] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[19492.484186] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Xorg.log
(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x36) [0x7fcb2d7cb476]
(EE) 1: /usr/bin/X (mieqEnqueue+0x26b) [0x7fcb2d7ac78b]
(EE) 2: /usr/bin/X (0x7fcb2d61b000+0x6d472) [0x7fcb2d688472]
(EE) 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fcb27aab000+0x5f44) [0x7fcb27ab0f44]
(EE) 4: /usr/bin/X (0x7fcb2d61b000+0x96927) [0x7fcb2d6b1927]
(EE) 5: /usr/bin/X (0x7fcb2d61b000+0xc0328) [0x7fcb2d6db328]
(EE) 6: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fcb2c71e000+0xfbd0) [0x7fcb2c72dbd0]
(EE) 7: /lib/x86_64-linux-gnu/libc.so.6 (ioctl+0x7) [0x7fcb2b43b747]
(EE) 8: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmIoctl+0x28) [0x7fcb2c516338]
(EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x39010) [0x7fcb29ef6010]
(EE) 10: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x3a1f7) [0x7fcb29ef71f7]
(EE) 11: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x60d51) [0x7fcb29f1dd51]
(EE) 12: /usr/bin/X (BlockHandler+0x44) [0x7fcb2d677b94]
(EE) 13: /usr/bin/X (WaitForSomething+0x114) [0x7fcb2d7c87f4]
(EE) 14: /usr/bin/X (0x7fcb2d61b000+0x58811) [0x7fcb2d673811]
(EE) 15: /usr/bin/X (0x7fcb2d61b000+0x4757a) [0x7fcb2d66257a]
(EE) 16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fcb2b36bea5]
(EE) 17: /usr/bin/X (0x7fcb2d61b000+0x478c1) [0x7fcb2d6628c1]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.

(repeated 3 times, once every 100 dropped events, up to 300)

(EE)
(EE) [mi] EQ overflow continuing. 300 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x36) [0x7fcb2d7cb476]
(EE) 1: /usr/bin/X (0x7fcb2d61b000+0x6d472) [0x7fcb2d688472]
(EE) 2: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fcb27aab000+0x5f44) [0x7fcb27ab0f44]
(EE) 3: /usr/bin/X (0x7fcb2d61b000+0x96927) [0x7fcb2d6b1927]
(EE) 4: /usr/bin/X (0x7fcb2d61b000+0xc0328) [0x7fcb2d6db328]
(EE) 5: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fcb2c71e000+0xfbd0) [0x7fcb2c72dbd0]
(EE) 6: /lib/x86_64-linux-gnu/libc.so.6 (ioctl+0x7) [0x7fcb2b43b747]
(EE) 7: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmIoctl+0x28) [0x7fcb2c516338]
(EE) 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x39010) [0x7fcb29ef6010]
(EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x3a1f7) [0x7fcb29ef71f7]
(EE) 10: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fcb29ebd000+0x60d51) [0x7fcb29f1dd51]
(EE) 11: /usr/bin/X (BlockHandler+0x44) [0x7fcb2d677b94]
(EE) 12: /usr/bin/X (WaitForSomething+0x114) [0x7fcb2d7c87f4]
(EE) 13: /usr/bin/X (0x7fcb2d61b000+0x58811) [0x7fcb2d673811]
(EE) 14: /usr/bin/X (0x7fcb2d61b000+0x4757a) [0x7fcb...

Revision history for this message
Ray (ray-0711) wrote :

Switched to 3.9.0-030900-generic. The bug seems to be resolved.

Revision history for this message
In , Cwawak (cwawak) wrote :

Created attachment 79979
i915_error_state - 3.9.2-201.rhbz879823.fc18.x86_64 (included patch write mbox regs twice on snb, v2)

Linux bobloblaw 3.9.2-201.rhbz879823.fc18.x86_64 #1 SMP Thu May 16 13:35:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[45482.757631] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[45482.757645] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[45482.766942] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[45482.770617] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

I added the patch (drm/i915: Resurrect ring kicking for semaphores, selectively) to Fedora 18's 3.9.2-200 x86_64 kernel.

tags: added: kernel-stable-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The 3.8.13 upstream stable kernel is available. Raring is currently only up to the 3.8.11 updates. Could folks affected by this bug test 3.8.13 to see whether the fix reported in comment #217 was also sent to upstream stable?

3.8.13 can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13-raring/

Revision history for this message
DAVID (gron-h) wrote :

Hello
I have been running the 3.8.13 kernel since yesterday without problems.

> uname -a
Linux marcel 3.8.13-030813-generic #201305111843 SMP Sat May 11 22:44:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

# lspci -v -s `lspci | awk '/VGA/{print $1}'`
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Dell Device 0493
 Flags: bus master, fast devsel, latency 0, IRQ 44
 Memory at e1400000 (64-bit, non-prefetchable) [size=4M]
 Memory at d0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 4000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915

Revision history for this message
Anoop Karollil (anoop-karollil) wrote :

Happening on Raring: 3.8.0-23-generic #34-Ubuntu SMP Wed May 29 20:24:54 UTC 2013 i686 i686 i686 GNU/Linux

I will try 3.8.13.

Revision history for this message
Marc D. (marc.d) wrote :

I installed this one today from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13.1-raring/ but still got a GPU hang:

Linux marc-work 3.8.13-03081301-generic #201305311535 SMP Fri May 31 19:36:22 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Jun 7 15:15:29 marc-work kernel: [18343.103877] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 7 15:15:29 marc-work kernel: [18343.103883] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Attached is the i915_error_state file.

But this time I could at least switch to the console and back to X, and it worked again. Before this I used the current image in Raring (3.8.0-23.34), and then the whole system hung; I wasn't even able to switch to the console.

This is a Lenovo T520, lspci output:

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:16.3 Serial controller: Intel Corporation 6 Series/C200 Series Chipset Family KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM67 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
0d:00.0 System peripheral: Ricoh Co Ltd MMC/SD Host Controller (rev 08)
0d:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 04)

Revision history for this message
peterbo (peterbo) wrote :

Happened today for me using Raring with kernel 3.10.0-031000rc4-generic. I will try to get the error state next time and post it here. My PC is an Asus U46SV.

lspci output:
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 540M] (rev ff)
03:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
04:00.0 USB controller: Fresco Logic FL1000G USB 3.0 Host Controller (rev 04)
05:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

Revision history for this message
peterbo (peterbo) wrote :

Link to kernel 3.10-rc5: could this merge of Intel fixes be what resolves this?

https://github.com/torvalds/linux/commit/943079e111bde93ed972d21618d1d73e75ba0d09

"Three regression fixes and one no-lvds quirk update. The regression Egbert
Eich tracked down goes back to 2.6.37 ... ugh. The other two are pretty
minor: One bogus modeset state checker WARN and a patch to prevent X
dying in a SIGBUS after a gpu hang with failed (or not implement as on
gen2/3) gpu reset."

I am running this now and encourage others to do so as well and report back.

Revision history for this message
Henning Kulander (hennikul) wrote :

I've been running the 3.8.13-030813-generic kernel (on Ubuntu 13.04) linked in Joseph's comment above for almost a week now. I have experienced about five crashes so far, so the problem is still there. I've not been able to extract any debug info during the crashes.

Revision history for this message
Stan Schymanski (schymans) wrote :

Same here, just had a crash again running 3.8.13 (unresponsive screen, only Alt+SysRq+REISUB worked), no relevant error messages in the log.

kernel: [ 0.000000] Linux version 3.8.13-030813-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201305111843 SMP Sat May 11 22:44:40 UTC 2013
kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.8.13-030813-generic root=UUID=ca45f879-2b5d-4861-8d62-c72215ecb8e8 ro quiet splash vt.handoff=7

DMI: Dell Inc. Latitude E6320/087HK7, BIOS A15 08/15/2012

Revision history for this message
peterbo (peterbo) wrote :

Linux pbo-laptop 3.10.0-031000rc5-generic #201306082135 SMP Sun Jun 9 01:36:22 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

A 5-second hang just occurred: everything was unresponsive for 5 seconds and then returned to normal. Better than locking up completely, but still annoying.

dmesg:
[12670.344064] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Attached is my i915_error_state. Running Ubuntu 13.04 with Bumblebee, with the NVIDIA GPU deactivated through Bumblebee.

Revision history for this message
CuteChaps (sh-senthilkumar) wrote :

I am running Ubuntu 13.04 x64 on the 3.9 kernel and facing the same issue. Can someone get this fixed? Even a workaround would do.

[125646.132414] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[125646.132422] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[125648.145662] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[125648.145901] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[125648.145907] [drm:i915_reset] *ERROR* Failed to reset chip.

$ uname -ar
Linux Senthil-IN 3.9.0-030900-generic #201304291257 SMP Mon Apr 29 16:58:15 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
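
In the log above the two hangcheck messages are only about 2 seconds apart, and the driver then gives up ("GPU hanging too fast, declaring wedged!"). To spot that pattern in a longer dmesg, here is a minimal Python sketch that pulls out the timestamps of hangcheck events and flags consecutive hangs that fall inside a time window. The 5-second default for the hypothetical `window` parameter is an assumption about the i915 reset logic of these 3.x kernels, not something confirmed in this thread; the function names are illustrative only.

```python
import re
from typing import Iterable, List, Tuple

# dmesg prefix such as "[125646.132414]" = seconds since boot (leading spaces allowed).
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
# Message text printed by i915_hangcheck_hung in the logs quoted in this thread.
HANG_TEXT = "Hangcheck timer elapsed... GPU hung"


def hang_times(lines: Iterable[str]) -> List[float]:
    """Return the dmesg timestamps (in seconds) of every hangcheck event."""
    times = []
    for line in lines:
        if HANG_TEXT in line:
            match = TIMESTAMP.search(line)
            if match:
                times.append(float(match.group(1)))
    return times


def close_pairs(times: List[float], window: float = 5.0) -> List[Tuple[float, float]]:
    """Pairs of consecutive hangs closer together than `window` seconds.

    ASSUMPTION: the 5 s default mirrors what I believe the i915 reset path
    used before declaring the GPU wedged; adjust it if your kernel differs.
    """
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < window]
```

Feeding it the output of `dmesg` (one line per list element) and checking whether `close_pairs` is non-empty gives a rough indication that a hang was followed by another one quickly enough to leave the GPU wedged rather than recovered.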

Revision history for this message
vouvoume (vouvoume) wrote :

I do not know why, but it seems commit 358466cb1596bb001f8050ea84fdbbd9bfdd69c1 was not carried over into newer kernel versions. Why?

I attached the patch which should fix the problem.

Please report your experiences with it.

Best regards,
   vouvoume

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

I'm testing this patch against the default 13.04 kernel (3.8.0-23-generic #34-Ubuntu SMP Wed May 29 20:22:58 UTC 2013 x86_64 x86_64 x86_64) now.

So far so good . . . will report back.

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

No, I'm sorry to report that commit 358466cb1596bb001f8050ea84fdbbd9bfdd69c1 does not really help. After using the patched i915 module "normally", I tortured it with three glxgears and YouTube in HD mode. This hangs it within a few minutes.

Doing the same test with 3.10.0-997-generic from kernel-ppa also hangs in a few minutes, but recovers. So this test kernel makes the problem only a _minor nuisance_ rather than a _MAJOR_ inconvenience requiring a window system restart at best and a reboot at worst.

Revision history for this message
Anoop Karollil (anoop-karollil) wrote :

I haven't seen the freeze happen again with 3.8.13-030813-generic #201305111843 SMP Sat May 11 22:52:24 UTC 2013 i686 i686 i686 GNU/Linux

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

Thanks for the hint. I tried the 3.8.13 kernel on x86_64 with a Lenovo X220.

I can still torture out hangchecks with 3.8.13-03081302-generic #201306071405 SMP Fri Jun 7 18:06:32 UTC 2013 x86_64
by running three glxgears and youtube at HD resolution, e.g.:

[ 787.335798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 804.309281] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 804.317252] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
[ 878.117786] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 890.102590] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

However, the drm seems to recover on its own for the most part, like the 3.10.x kernel. So that is a definite improvement.

Revision history for this message
Tom Haddon (mthaddon) wrote :

Still seeing this with 3.8.0-25-generic on a Lenovo X220 (00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)):

[ 422.047514] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 422.047519] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Revision history for this message
Louis-Dominique Dubeau (ldd) wrote :

I experienced this on Quantal at some point but simply dealt with it by downgrading my kernel until a fix came. Then I upgraded to Raring and experienced the bug again.

Reading the comments above, the only Raring kernel that looked like it did not have a negative report against it is the one suggested by Anoop in #231: 3.8.13-030813-generic #201305111843.

I've installed it and experienced a hung GPU with that kernel too.

I'm trying to run Raring now with the latest stable kernel I had on Quantal. I don't know whether this means I've lost some functionality that is available only with 3.8.x.

Changed in linux (Debian):
status: Unknown → New
Changed in linux:
importance: Unknown → Medium
status: Unknown → Invalid
Revision history for this message
Matt C (proteus400) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

My graphics-related daily hangs and instability have returned with the upgraded kernel 3.5.0-34-generic. :(

Revision history for this message
In , Cwawak (cwawak) wrote :

Is there any input or assistance I can give to help move this along?

Thanks!

Revision history for this message
matze (matzex) wrote :

Any news on this bug? I get it every few days and can't bring the system back to a working state. I have to reboot it with the hardware power button.

Revision history for this message
gianluca (amato) wrote :

Since a lot of time has passed, I would like to point out comment #36, where a workaround for this bug was found.

Revision history for this message
matze (matzex) wrote :

@gianluca #247 Thanks for this hint, but I am using Ubuntu 13.04 with Linux matze 3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux. On the askubuntu site, they say it is only for Intel driver < 2.20. I get this output:
matze@matze-tu:~$ apt-cache policy xserver-xorg-video-intel
xserver-xorg-video-intel:
  Installed: 2:2.21.6-0ubuntu4
  Candidate: 2:2.21.6-0ubuntu4
  Version table:
 *** 2:2.21.6-0ubuntu4 0
        500 http://ftp.uni-erlangen.de/mirrors/ubuntu/ raring/main amd64 Packages
        100 /var/lib/dpkg/status
matze@matze-tu:~$

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

Try a more recent kernel from the mainline PPA. I am currently using Raring with kernel 3.9.8 with good success. Kernels 3.8.x regularly produced hangchecks that required restarting the window system on a Sandy Bridge based laptop. However, with the latest 3.9 or 3.10, the occasional hangcheck recovers without requiring a restart.

Revision history for this message
gianluca (amato) wrote :

@matze, what do you mean with "it is only for intel driver < 2.20" ? I regularly use the SNA acceleration with Ubuntu 13.04 and intel driver 2.21 without any problem.

Revision history for this message
matze (matzex) wrote :

@gianluca Oh sorry, I didn't read it carefully.

@Martin D. Thanks for this hint, I have installed the mainline kernel 3.9.0-rc8 and will test it now. We will see whether the crashes also occur with this kernel.

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

Matze:

Try using 3.9.8 not 3.9.0-rc8. I had very poor success with 3.9.0-rcX; it was no better than 3.8.X.

Revision history for this message
matze (matzex) wrote :

Where did you get 3.9.8? On http://kernel.ubuntu.com/~kernel-ppa/mainline/ I can only find 3.9.8 for Saucy, not for Raring. The most recent version for Raring there is 3.9.0-rc8.

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

Install the Saucy version from mainline; I found no packaging or dependency issues under Raring, even for the daily builds.

Revision history for this message
ErMejo (andrea-lombardoni) wrote :

I am running 13.04 with the linux-image-3.8.0-26-generic kernel and experienced this problem.

The best way I found to reproduce the problem systematically is by playing "Revenge of the Titans": after at most 5 minutes the system hung for a few seconds and afterward it was slow/choppy with video acceleration problems (GPU hung in the logs).

I tried linux-image-3.9.8-030908-generic and linux-image-3.9.9-030909-generic (saucy) from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and they work fine: when playing, they sometimes hang for a short time, but afterwards the system recovers 100%.

Revision history for this message
Alexey Khoroshilov (khoroshilov) wrote :

I just had this happen on my Lenovo X220 with Ubuntu 13.04 (3.8.0-25-generic):

Jul 11 10:57:18 kernel: [10784.262606] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 11 10:57:18 kernel: [10784.262611] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

i915_error_state is attached.

Revision history for this message
Olivier Febwin (febcrash) wrote :

crash@Dell-Latitude-E6520:~$ zgrep hung /var/log/syslog*
/var/log/syslog:Jul 12 10:26:27 Dell-Latitude-E6520 kernel: [ 5995.735802] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
/var/log/syslog.3.gz:Jul 9 11:11:49 Dell-Latitude-E6520 kernel: [ 5495.987596] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
/var/log/syslog.3.gz:Jul 9 14:04:17 Dell-Latitude-E6520 kernel: [15832.417726] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
/var/log/syslog.4.gz:Jul 8 16:33:55 Dell-Latitude-E6520 kernel: [27233.088137] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
/var/log/syslog.7.gz:Jul 3 12:56:18 Dell-Latitude-E6520 kernel: [34267.232367] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
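
Note that `zgrep hung` also matches the unrelated hung_task line from syslog.3.gz above. A minimal Python sketch of the same search, assuming Debian/Ubuntu-style rotated logs where older files are gzip-compressed with a `.gz` suffix, matches only the i915 hangcheck message and counts events per file. The function names are hypothetical, chosen for illustration.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator

# Matches only the i915 hangcheck message seen in this thread, not hung_task warnings.
HANG_PATTERN = re.compile(
    r"\[drm:i915_hangcheck_hung\] \*ERROR\* Hangcheck timer elapsed\.\.\. GPU hung"
)


def open_log(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed log file, as zgrep would read it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        yield from fh


def count_hangs(lines: Iterable[str]) -> int:
    """Count i915 GPU hangcheck events in an iterable of log lines."""
    return sum(1 for line in lines if HANG_PATTERN.search(line))


def summarize(paths: Iterable[str]) -> Dict[str, int]:
    """Map each log file path to the number of GPU hang events it contains."""
    return {path: count_hangs(open_log(path)) for path in paths}
```

For example, `summarize(sorted(glob.glob("/var/log/syslog*")))` (with `import glob`) would give a per-file count; reading /var/log/syslog may require root or membership in the adm group on Ubuntu.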

Revision history for this message
Anoop Karollil (anoop-karollil) wrote :

Reproduced with 3.9.8, but the system recovered from the hang. Thanks, Martin D. Weinberg.

akarollil@akarollil-desktop:~$ uname -a
Linux akarollil-desktop 3.9.8-030908-generic #201307010556 SMP Mon Jul 1 10:04:25 UTC 2013 i686 i686 i686 GNU/Linux

akarollil@akarollil-desktop:~$ cat /var/log/kern.log | grep -a drm
Jul 16 11:36:29 akarollil-desktop kernel: [527805.031515] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 16 11:36:29 akarollil-desktop kernel: [527805.031519] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jul 16 11:48:55 akarollil-desktop kernel: [ 32.669425] [drm] Initialized drm 1.1.0 20060810
Jul 16 11:48:55 akarollil-desktop kernel: [ 33.088095] [drm] Memory usable by graphics device = 2048M
Jul 16 11:48:55 akarollil-desktop kernel: [ 33.088312] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Jul 16 11:48:55 akarollil-desktop kernel: [ 33.088313] [drm] Driver supports precise vblank timestamp query.
Jul 16 11:48:55 akarollil-desktop kernel: [ 33.634065] fbcon: inteldrmfb (fb0) is primary device
Jul 16 11:48:55 akarollil-desktop kernel: [ 34.104124] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
Jul 16 11:48:55 akarollil-desktop kernel: [ 34.105796] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Jul 16 11:48:55 akarollil-desktop kernel: [ 34.784038] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.028372] [drm] Initialized drm 1.1.0 20060810
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.453101] [drm] Memory usable by graphics device = 2048M
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.453384] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.453385] [drm] Driver supports precise vblank timestamp query.
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.469099] [drm] Wrong MCH_SSKPD value: 0x16040307
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.469101] [drm] This can cause pipe underruns and display issues.
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.469102] [drm] Please upgrade your BIOS to fix this.
Jul 16 12:18:53 akarollil-desktop kernel: [ 30.549048] fbcon: inteldrmfb (fb0) is primary device
Jul 16 12:18:53 akarollil-desktop kernel: [ 31.015247] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
Jul 16 12:18:53 akarollil-desktop kernel: [ 31.016753] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Jul 16 12:18:53 akarollil-desktop kernel: [ 31.798905] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 16 16:17:02 akarollil-desktop kernel: [ 32.084239] [drm] Initialized drm 1.1.0 20060810
Jul 16 16:17:02 akarollil-desktop kernel: [ 32.433337] [drm] Memory usable by graphics device = 2048M
Jul 16 16:17:02 akarollil-desktop kernel: [ 32.433553] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Jul 16 16:17:02 akarollil-desktop kernel: [ 32.433554] [drm] Driver supports precise vblank timestamp query.
Jul 16 16:17:02 akarollil-desktop kernel: [ 32.448662] [drm] Wrong MCH_SSKPD value: 0x16040307
Jul 16 16:17:02 akarollil-desktop kernel: [ 32....

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

Is it possible to assign this bug as affecting linux-lts-raring too? It is correctly marked as fixed for linux-lts-quantal, but in a month from now 12.04.3 will be out with a buggy kernel... Or am I wrong?

Revision history for this message
Martin D. Weinberg (martin-weinberg-5) wrote :

That is correct. The default raring kernel will be very disappointing for people that are experiencing the problems outlined in this thread.

It seems that only kernels newer than 3.9.7 are usable for affected Sandy Bridge users (I am using a Lenovo X220). Even then, the bug remains at some level, but it is more of a minor inconvenience than a showstopper. I've settled on using 3.9.8 from the mainline PPA.

The 3.10.x series presents new DRM problems with Sandy Bridge (weird tearing and rendering artifacts), so I would not recommend those either.

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82747
New read-after-write patch

New patch for testing, thanks!

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82748
New read-after-write patch

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Which kernel version is this patch for?

Revision history for this message
In , Longerdev (longerdev) wrote :

I tried this patch on linux-3.11_rc1, but when X starts I see:
791966 Jul 21 16:17:07 localhost kernel: [ 19.320879] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
791967 Jul 21 16:17:07 localhost kernel: [ 19.320948] IP: [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791968 Jul 21 16:17:07 localhost kernel: [ 19.320995] PGD b0d80067 PUD b0c18067 PMD 0
791969 Jul 21 16:17:07 localhost kernel: [ 19.321031] Oops: 0000 [#1] PREEMPT SMP
791970 Jul 21 16:17:07 localhost kernel: [ 19.321064] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec brcmsmac snd_hwdep snd_p cm cordic brcmutil bcma snd_page_alloc snd_timer snd soundcore
791971 Jul 21 16:17:07 localhost kernel: [ 19.321209] CPU: 0 PID: 2696 Comm: X Not tainted 3.11.0-rc1 #1
791972 Jul 21 16:17:07 localhost kernel: [ 19.321249] Hardware name: SAMSUNG ELECTRONICS CO., LTD. SF311/SF411/SF511/SF311/SF411/SF511, BIOS 06HW.M011.20110503.SCY 05 /03/2011
791973 Jul 21 16:17:07 localhost kernel: [ 19.321322] task: ffff8800b1c07590 ti: ffff8800b0c24000 task.ti: ffff8800b0c24000
791974 Jul 21 16:17:07 localhost kernel: [ 19.321370] RIP: 0010:[<ffffffff8136bfc0>] [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791975 Jul 21 16:17:07 localhost kernel: [ 19.321426] RSP: 0018:ffff8800b0c25bc8 EFLAGS: 00010286
791976 Jul 21 16:17:07 localhost kernel: [ 19.321461] RAX: 0000000000000000 RBX: ffff8800b1c3d4d8 RCX: 0000000000027330
791977 Jul 21 16:17:07 localhost kernel: [ 19.321506] RDX: 0000000000000080 RSI: ffffc900045c003c RDI: ffffc900045c0038
791978 Jul 21 16:17:07 localhost kernel: [ 19.321550] RBP: ffff8800b0c25c08 R08: ffff8800b0d97f00 R09: 00000000000145c0
791979 Jul 21 16:17:07 localhost kernel: [ 19.321594] R10: 0000000000001000 R11: ffff8800b1c3c000 R12: 0000000000000000
791980 Jul 21 16:17:07 localhost kernel: [ 19.321638] R13: 0000000000002044 R14: 0000000000000000 R15: ffff8800b1c3c000
791981 Jul 21 16:17:07 localhost kernel: [ 19.321682] FS: 00007ff167ae8880(0000) GS:ffff880100200000(0000) knlGS:0000000000000000
791982 Jul 21 16:17:07 localhost kernel: [ 19.321732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
791983 Jul 21 16:17:07 localhost kernel: [ 19.321767] CR2: 0000000000000010 CR3: 00000000b1cc9000 CR4: 00000000000407f0
791984 Jul 21 16:17:07 localhost kernel: [ 19.321810] Stack:
791985 Jul 21 16:17:07 localhost kernel: [ 19.321824] ffff8800b1c3d4d8 0000000000000000 ffff8800aff24000 0000000000000000
791986 Jul 21 16:17:07 localhost kernel: [ 19.321876] ffff8800b1c3c000 ffff8800b0d97f00 ffff8800b1f66a00 ffff8800b1c3d4d8
791987 Jul 21 16:17:07 localhost kernel: [ 19.321927] ffff8800b0c25c68 ffffffff81334b11 ffff880000000028 0000000000000000
791988 Jul 21 16:17:07 localhost kernel: [ 19.321979] Call Trace:
791989 Jul 21 16:17:07 localhost kernel: [ 19.322000] [<ffffffff81334b11>] __i915_add_request+0x6d/0x215
791990 Jul 21 16:17:07 localhost kernel: [ 19.322045] [<ffffffff8133b8d9>] i915_gem_do_execbuffer.isra.14+0xd07/0xdc5
791991 Jul 21 16:17:07 localhost kernel: [ 19.322089] [<ffffffff8133bd5e>] ? i915_gem_execbuffer2+0x5d/0x1e3
791992 Jul 21 1...

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82768
New read-after-write patch

Oops, my mistake, please try again.

Revision history for this message
In , Longerdev (longerdev) wrote :

Created attachment 82773
i915_error_state with new patch

(In reply to comment #92)
> Created attachment 82768 [details] [review]
> New read-after-write patch
>
> Oops, my mistake, please try again.

Now it loads, but after a five-minute test:
793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 ctx 1) at 0xbfe21dc

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #93)
> Created attachment 82773 [details]
> i915_error_state with new patch
>
> (In reply to comment #92)
> > Created attachment 82768 [details] [review] [review]
> > New read-after-write patch
> >
> > Oops, my mistake, please try again.
>
> Now loading, but after five minutes test:
> 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> ctx 1) at 0xbfe21dc

That is a blorp (mesa/i965) bug and not the semaphore deadlock.

Revision history for this message
Rob Maas (blackburn) wrote :

Nothing new, but just to confirm that this bug is also present on a Fujitsu E782.

Ubuntu 13.04
3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Jul 25 10:32:52 NBICT0031L kernel: [10254.278336] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 25 10:32:52 NBICT0031L kernel: [10254.278340] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

tail Xorg.0.log <-- not 100% sure this is directly related
(EE) 13: /usr/bin/X (0x7fbc9698f000+0x58811) [0x7fbc969e7811]
(EE) 14: /usr/bin/X (0x7fbc9698f000+0x4757a) [0x7fbc969d657a]
(EE) 15: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fbc946dfea5]
(EE) 16: /usr/bin/X (0x7fbc9698f000+0x478c1) [0x7fbc969d68c1]
(EE)
[ 10269.829] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 10269.829] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[ 10269.829] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 10269.829] [mi] EQ processing has resumed after 161 dropped events.
[ 10269.829] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.

Revision history for this message
In , Chris Wilson (ickle) wrote :

Will someone please try https://bugs.freedesktop.org/attachment.cgi?id=82768 with a working mesa! :)

Revision history for this message
vignesh (vignesh-sarma) wrote :

I am also facing this issue; the GPU fails randomly.

I am using a ThinkPad E420.

uname -a
Linux laptop 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.04
Release: 13.04
Codename: raring

Error in syslog:
Aug 13 14:29:35 laptop kernel: [28951.958547] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
zetxx (opensuser) wrote :

Same problem here.

Revision history for this message
weber (pseudo-uzer) wrote :

Same here on an Acer TravelMate B113 running 3.8.0-29-generic.

Revision history for this message
Dima Ryazanov (dima-gmail) wrote :

I can reproduce this simply by using Google Maps in Chrome (which now uses WebGL).

Revision history for this message
missingno (guywithasword) wrote :

Linux Mint 15: I tried updating to 3.9.8 as suggested above, but the problem persists. It only happens while playing certain games, but that just means it really gets in the way.

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

The patch seems to have helped -- my box survived a couple days with the patch applied.

Revision history for this message
Brian J. Cohen (brianjcohen) wrote :

I can confirm this problem on raring, 3.8.0-29-generic, on a Lenovo ThinkPad E530. Like Dima Ryazanov, I can reproduce the problem simply by running Google Maps (WebGL) in Chrome.

Revision history for this message
In , Chris Wilson (ickle) wrote :

The bad news is that I've just had the semaphore hang with all the read-after-write patches applied. :|

Revision history for this message
Chorca (chorca) wrote :

YouTube in Chrome also triggers this after about half an hour of viewing.

Revision history for this message
In , Januszmk6 (januszmk6) wrote :

(In reply to comment #94)
> (In reply to comment #93)
> > Created attachment 82773 [details]
> > i915_error_state with new patch
> >
> > (In reply to comment #92)
> > > Created attachment 82768 [details] [review] [review] [review]
> > > New read-after-write patch
> > >
> > > Oops, my mistake, please try again.
> >
> > Now loading, but after five minutes test:
> > 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> > 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> > 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> > [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> > 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> > error event; look for more information in
> > /sys/kernel/debug/dri/0/i915_error_state
> > 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> > ctx 1) at 0xbfe21dc
>
> That is a blorp (mesa/i965) bug and not the semaphore deadlock.
Could you please provide a link to this blorp bug report?
I had the semaphore deadlock problem; it seems that with kernel 3.11 it no longer occurs (without the patch), but now I get:

[22221.843000] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[22221.843483] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4dfb5000 ctx 1) at 0x4dfb5518
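Several reports in this thread quote an `i915_set_reset_status` line of the form "render ring hung inside bo (0xBO ctx N) at 0xADDR". As a rough aid for comparing such reports, here is a minimal POSIX sh sketch (the function name `bo_hang_offset` is hypothetical, not part of any tool) that extracts both addresses and prints their difference. It assumes, as the message wording suggests, that the second address lies inside that buffer object; it is not a substitute for the full `i915_error_state`.

```sh
# Hypothetical helper: print the offset of the hang address from the start
# of the buffer object named in an i915_set_reset_status dmesg line.
# Assumption: ADDR in "hung inside bo (0xBO ctx N) at 0xADDR" lies inside that bo.
bo_hang_offset() {
    # Extract "0xBO 0xADDR"; sed prints nothing when the line does not match.
    # The command substitution is intentionally unquoted so it splits into $1 $2.
    set -- $(printf '%s\n' "$1" |
        sed -n 's/.*hung inside bo (\(0x[0-9a-fA-F]*\) ctx [0-9]*) at \(0x[0-9a-fA-F]*\).*/\1 \2/p')
    [ "$#" -eq 2 ] || return 1
    # Shell arithmetic accepts 0x-prefixed hexadecimal constants.
    printf '0x%x\n' $(( $2 - $1 ))
}
```

For the line above, `bo_hang_offset` prints `0x518` (0x4dfb5518 minus 0x4dfb5000).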

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 68913 has been marked as a duplicate of this bug. ***

Revision history for this message
DAVID (gron-h) wrote :

Hello all,
I still get complete freezes of my laptop on a regular basis. Most of the time I have to hard reboot (it seems to happen more often when the power supply is not connected, that is, when I run on battery).
I have attached the Xorg log file in case it helps.

dmesg:
[ 3347.543601] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3347.543611] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

uname -a:
Linux marcel 3.8.0-30-generic #44-Ubuntu SMP Thu Aug 22 20:52:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

OD

Revision history for this message
theghost (theghost) wrote :

Instead of commenting "I can confirm xyz", please use the "Affects me too" button above.
Please check out the upstream bug report [1]. Intel is working on the issue.
See if you can help when help is needed (e.g. patch testing or debugging).

[1] https://bugs.freedesktop.org/show_bug.cgi?id=54226

Changed in dri:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , Dan Doel (dan-doel) wrote :

I have, I think, a reliable way to trigger this behavior, if that helps. It requires a non-trivial setup, though.

I have gnome-shell running on dual monitors. The first is 1920x1200, the second is 1920x1080 (not sure if the resolution difference matters). If I run a full-screen game on the 1920x1200 monitor, I get freezes, and notes in the dmesg about hangcheck timers and kickrings ("stuck wait on blitter ring").

I believe OpenGL acceleration of the desktop is important, because the freezes are not triggered in fluxbox, for instance. I'm not sure if the game itself needs to be using OpenGL, or if the full-screen window is the triggering factor, or something else entirely. It is important that the game keep the monitors distinct, and only go full screen on one. I just tried it on Battle for Wesnoth, and full screen there sets the monitors to mirror, which doesn't trigger the problem.

This is on an i7 4770, if that matters.

I realize this may be difficult to put together for a test setup, but I thought I'd mention it.

Revision history for this message
In , Januszmk6 (januszmk6) wrote :

(In reply to comment #100)
> I have, I think, a reliable way to trigger this behavior, if that helps. It
> requires a non-trivial setup, though.
>
> I have gnome-shell running on dual monitors. The first is 1920x1200, the
> second is 1920x1080 (not sure if the resolution difference matters). If I
> run a full-screen game on The 1920x1200 monitor, I get freezes, and notes in
> the dmesg about hangcheck timers and kickrings ("stuck wait on blitter
> ring").
>
> I believe OpenGL acceleration of the desktop is important, because the
> freezes are not triggered in fluxbox, for instance. I'm not sure if the game
> itself needs to be using OpenGL, or if the full-screen window is the
> triggering factor, or something else entirely. It is important that the game
> keep the monitors distinct, and only go full screen on one. I just tried it
> on Battle for Wesnoth, and full screen there sets the monitors to mirror,
> which doesn't trigger the problem.
>
> This is on an i7 4770, if that matters.
>
> I realize this is may be difficult to put together for a test setup, but I
> thought I'd mention it.

I also have dual monitors and gnome-shell, but both of mine are 1920x1080. I notice that this happens more often when I am watching videos full screen on one monitor (it still happens during non-full-screen work, though).

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #100)
> This is on an i7 4770, if that matters.

No, that's something completely new. Please open a new bug report and attach your dmesg, Xorg.0.log and /sys/class/drm/card0/error from after one of the hangs.

Revision history for this message
Eugene Lipchansky (nsky) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

I've found I get freezes more often when my Android tablet (ASUS TF300T) is connected via USB, especially when I copy a large file to it! Once I detach the device, the desktop becomes responsive again in most cases. Does anyone have any idea how the USB drivers could be connected to video?

Revision history for this message
Kamil (lampshade-t) wrote :

Am I first to report this bug in Ubuntu 13.10?
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1228222
Marked as duplicate of this bug.

Revision history for this message
Leonid Evdokimov (darkk) wrote :

Yep, I have seen the hang on 13.10 too. Here are the logs: http://yadi.sk/d/psMKfJZi9cfoa

no longer affects: linux-lts-raring (Ubuntu Quantal)
no longer affects: linux-lts-raring (Ubuntu Raring)
tags: added: patch
Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

Created attachment 87101
i915_error_state (kernel 3.11.3)

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

After playing hedgewars for about half an hour, the gpu started to hang.
dmesg output:
[ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3442.907471] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5e52000 ctx 1) at 0x5e52220
[ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
I'm not sure my problem is related to this bug.

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

(In reply to comment #104)
> After playing hedgewars for about half an hour, the gpu started to hang.
> dmesg output:
> [ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3442.907471] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x5e52000 ctx 1) at 0x5e52220
> [ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
> [ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for
> forcewake old ack to clear.
> I'm not sure my problem is related to this bug.

My laptop is Thinkpad T420 with i5-2520M. The BIOS version is 1.44.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #104)
> I'm not sure my problem is related to this bug.

Most likely it isn't - gpu hang is similar to an application crashing. Please file a new bug report and don't forget to attach the error state file. That's the first thing we need to triage the bug.

And of course list the versions of all the userspace driver parts (mesa, ddx, ...) since like a normal application crash most often it's not a kernel bug, but a bug in the render commands submitted by userspace to the gpu.

Revision history for this message
In , Longerdev (longerdev) wrote :

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Why can userspace drivers break rendering and trigger an error in the kernel part of the driver? Could the kernel "filter" the submitted commands and ignore them (or react in some other way, rather than executing them)?

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #107)
> (In reply to comment #106)
> > (In reply to comment #104)
> > > I'm not sure my problem is related to this bug.
> >
> > Most likely it isn't - gpu hang is similar to an application crashing.
> > Please file a new bug report and don't forget to attach the error state
> > file. That's the first thing we need to triage the bug.
> >
> > And of course list the versions of all the userspace driver parts (mesa,
> > ddx, ...) since like a normal application crash most often it's not a kernel
> > bug, but a bug in the render commands submitted by userspace to the gpu.
>
> Why userspace drivers can breaking render and calling error in kernel part
> of driver? May be can add "filter" sent commands and ignore (or other
> reaction, but not execute their) their?

The GPU is a full Turing complete computational engine (in fact, lots of them coupled in parallel and in series), see http://xkcd.com/1266/

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-raring (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed
Revision history for this message
katsu (katsukatsu-deactivatedaccount) wrote :

Hangcheck occurs:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise

$ grep \(EE\) Xorg.0.log
[ 52302.812] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 52302.812] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.

$ dmesg | tail
[39859.567501] ath9k 0000:02:00.0 wlan0: disabling VHT as WMM/QoS is not supported by the AP
[39859.568516] wlan0: associate with 00:0d:02:4e:e2:68 (try 1/3)
[39859.571139] wlan0: RX AssocResp from 00:0d:02:4e:e2:68 (capab=0x431 status=0 aid=1)
[39859.571390] wlan0: associated
[39859.571399] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[49786.700884] CPUM: APIC 01 at 00000000fee00000 (mapped at ffffc90000676000) - ver 0x01060015, lint0=0x10700 lint1=0x10400 pc=0x00400 thmr=0x000fa
[49786.700897] CPUM: APIC 00 at 00000000fee00000 (mapped at ffffc9000067e000) - ver 0x01060015, lint0=0x10700 lint1=0x00400 pc=0x10400 thmr=0x000fa
[52239.562498] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[52239.562504] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[52247.552940] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

$ uname -a
Linux 26897LJ 3.8.0-31-generic #46~precise1-Ubuntu SMP Wed Sep 11 18:21:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ sudo hwinfo --short
cpu:
                       Intel(R) Celeron(R) CPU B820 @ 1.70GHz, 800 MHz
                       Intel(R) Celeron(R) CPU B820 @ 1.70GHz, 800 MHz
keyboard:
  /dev/input/event3 AT Translated Set 2 keyboard
mouse:
  /dev/input/mice Microsoft ® Laser Mouse 6000
  /dev/input/mice SynPS/2 Synaptics TouchPad
graphics card:
                       Intel VGA compatible controller
sound:
                       Intel Audio device
storage:
                       Intel SATA controller
network:
  wlan0 Atheros WLAN controller
  eth0 Attansic Ethernet controller
network interface:
  wlan0 WLAN network interface
  eth0 Ethernet network interface
  lo Loopback network interface
disk:
  /dev/sda ST320LT020-9YG14
  /dev/sdb Multiple Card Reader
partition:
  /dev/sda1 Partition
  /dev/sda2 Partition
  /dev/sda3 Partition
  /dev/sda4 Partition
cdrom:
  /dev/sr0 MATSHITA DVD-RAM UJ8C1
usb controller:
                       Intel USB Controller
                       Intel USB Controller
                       Intel USB Controller
bios:
                       BIOS
bridge:
                       Intel Host bridge
                       Intel PCI bridge
                       Intel PCI bridge
                       Intel PCI bridge
                       Intel ISA bridge
hub:
                       Linux 3.8.0-31-generic xhci_hcd xHCI Host Controller
                       Linux 3.8.0-31-generic xhci_hcd xHCI Host Controller
                       Linux 3.8.0-31-generic ehci_hcd EHCI Host Controller
                       Hub
                       Linux 3.8.0-31-generic ...


Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Someone has reported it here.
https://bugs.freedesktop.org/show_bug.cgi?id=70151

gokul (gokulnathonline)
information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
In , Honza-h (honza-h) wrote :

Hello. Same problem here.

[ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8
[ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8
[ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220

System: Fedora 19, 64-bit
Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

WM: KDE with effects enabled

8G ram
300G SATA HDD
Notebook: Lenovo ThinkPad E320

The problem occurs:
- when scrolling in Firefox
- when playing video in VLC and switching to the KDE terminal or another app
- sometimes as a full system hang (CPU at 100%, freeze, hard reboot needed)
- sometimes even when I only work in Firefox or in a terminal (very frustrating)
- across many kernel versions, from 3.0 to the newest, I think

lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #110)
> Hello. Same problem here.
>
> [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 485.443467] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0xa637000 ctx 1) at 0xa6371c8

Unlikely that this is the same GPU hang. Please file a new bug report and attach the error state.

Revision history for this message
hotani (hotani) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Seeing this several times a day on Debian with kernel 3.10-3 (64-bit). If I use LXDE/Openbox, there is no freeze; it only happens with GNOME 3.4.

I tried the fix from comment #36 and it did not work. System just froze again; took less than an hour.

This is the SNA "fix", but it does not work:
http://askubuntu.com/questions/225356/how-can-i-enable-the-sna-acceleration-method-for-intel-cards-under-ubuntu-12-04

Revision history for this message
Leonid Evdokimov (darkk) wrote :

Once again on 3.8.0-31-generic on raring
dmesg, i915_error_state and Xorg.0.log are at http://yadi.sk/d/psMKfJZi9cfoa under 2013-10-10 subfolder

Revision history for this message
theghost (theghost) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

FYI, this also affects Saucy with 3.8.0-30-generic, Mesa 9.2.1 and Intel DRI 2.99.904.
I also observed that after switching to Mesa 9.2 the number of lockups greatly increased.
Additionally, with the newer drivers installed there are no complete system lockups anymore,
just lockups that need a VT switch to recover from, which is still very annoying in games.

tags: added: saucy
Revision history for this message
In , theghost (theghost) wrote :

Just a few remarks.
I still see this bug with kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
Moreover, after switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups greatly increased (especially in games).
Additionally, with the latest drivers complete system lockups are gone, but there is still a lockup lasting several seconds that needs a VT switch to recover from.
Maybe these observations help somehow.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #112)
> Just a few remarks.
> I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
> Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups
> highly increased (especially in games).

On snb the blorp engine in mesa has become a bit more hang-happy, see bug #70151
Not all gpu hangs are created equal ;-)

> Additionally with running the latest drivers complete system lockups are
> gone, but it's still a lockup for multiple seconds with following VT
> switching.

You mean a GPU hang happens when doing a VT switch?

Revision history for this message
In , theghost (theghost) wrote :

(In reply to comment #113)
> On snb the blorp engine in mesa has become a bit more hang-happy, see bug
> #70151
> Not all gpu hangs are created equal ;-)
>

Actually it was on Sandybridge.

> You mean a gpu hang happens while when doing a vt switch?

No, I meant that if you suffer a lockup, you just have to wait a few seconds, switch to another VT and back, and then you can resume using your system (although sometimes fonts are broken).

Revision history for this message
Peter Silva (peter-bsqt) wrote :

On Saucy, when I run Minecraft, it locks up the screen after a few minutes.
When I press Ctrl-Alt-F1 to get to a tty, I see:

Hang check elapsed *ERROR* stuck on render ring.
render ring stuck inside bo (0xaf4d000 ctx 1) at 0xaf4d1d8

This happens every couple of minutes.

The crash detection kicks in and tries to report, over and over again, but the report never succeeds; I do not think the reports are getting through, FWIW. I am patched up to today, launch day (2013/10/17), and it still happens.

Revision history for this message
In , Alexander (bay-hackerdom) wrote :

Created attachment 87857
i915_error_state

I also hit this bug while I was watching video in mplayer. It happens every 1-2 hours.

[40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[40787.765852] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1fb63000 ctx 1) at 0x1fb63220

Revision history for this message
In , Alexander (bay-hackerdom) wrote :

Created attachment 87858
X -version output

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #115)
> Created attachment 87857 [details]
> i915_error_state
>
> I also met this bug while I was watching video in mplayer. It every 1-2
> hours.
>
> [40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [40787.765852] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x1fb63000 ctx 1) at 0x1fb63220

This looks like bug #70151, but is definitely not this bug here.
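The maintainers' triage in this thread follows a pattern: a "render ring hung inside bo" line was called a mesa (blorp) hang rather than the semaphore deadlock, while "Kicking stuck semaphore" lines come from the semaphore handling discussed here. A rough first-pass sorter for dmesg lines could look like the POSIX sh sketch below. The function name and output labels are hypothetical, and this is only a heuristic: the developers still ask for a new bug report with the error state attached.

```sh
# Hypothetical heuristic, based only on maintainer comments in this thread:
#   "Kicking stuck semaphore"     -> likely the semaphore problem tracked here
#   "render ring hung inside bo"  -> a batch hang; file a new bug with the error state
#   anything else                 -> unknown, needs the full error state
classify_hang_line() {
    case "$1" in
        *"Kicking stuck semaphore"*)    echo "semaphore" ;;
        *"render ring hung inside bo"*) echo "batch-hang" ;;
        *)                              echo "unknown" ;;
    esac
}
```

For example, `dmesg | grep drm | while IFS= read -r l; do classify_hang_line "$l"; done | sort | uniq -c` gives a quick tally per category.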

Changed in linux-lts-quantal (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Quantal):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Raring):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
status: Fix Released → Invalid
Revision history for this message
whiskers75 (whiskers75) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Happens here while gaming or other graphics-intensive tasks. Lenovo G570.

Changed in linux (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux-lts-raring (Ubuntu):
status: Confirmed → Triaged
Changed in linux-lts-raring (Ubuntu):
status: Triaged → Invalid
Changed in linux-lts-raring (Ubuntu Precise):
status: Confirmed → Invalid
Changed in mesa (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Precise):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Quantal):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in mesa (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Alberto Salvia Novella (es20490446e) wrote :

Since this bug:

- is invalid for Linux upstream, it is also invalid downstream;
- is confirmed for DRI upstream, the package really affected is "mesa (Ubuntu)".

Revision history for this message
theghost (theghost) wrote :

Correct me if I am wrong, but if it's an Intel DRI bug (and that's what it is), wouldn't that mean that "xserver-xorg-video-intel" is the "real affected" package?

Revision history for this message
Chris Wilson (ickle) wrote :

No, the original issue (still unresolved) here is in the hardware, which makes it a kernel problem. However, there are lots of *different* bugs that have been also reported here that are due to regressions in mesa/i965.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "0001-drm-i915-Fix-gen6-SNB-missed-BLT-ring-interrupts.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

Created attachment 89314
i915_error_state (kernel 3.11.6, mesa 9.2.2, xf86-video-intel 2.99.906)

GPU hangs after playing hedgewars for a few minutes. Thinkpad T420 laptop, i5-2520M.
dmesg error message:
[16901.286432] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16901.286441] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[16901.286444] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[16908.287504] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16908.287508] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring

Revision history for this message
In , Kenxeth (kenxeth) wrote :

*** Bug 71890 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 72048 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 72829 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 73659 has been marked as a duplicate of this bug. ***

Revision history for this message
In , A-bugzilla (a-bugzilla) wrote :

Created attachment 92710
i915_error_state

I'm also getting regular Sandybridge GPU lockups with Mesa 10.0.1 and Linux kernel 3.13.

dmesg output:

[ 918.876872] [drm] stuck on render ring
[ 918.876876] [drm] stuck on blitter ring
[ 918.876878] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 918.876879] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 918.876879] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 918.876880] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 918.876880] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 932.923240] [drm] stuck on render ring
[ 932.923242] [drm] stuck on blitter ring

Unfortunately the crash dump doesn't help - it's an empty file!

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74180 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74265 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74452 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74473 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74867 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 75163 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Simtn (simtn) wrote :

Created attachment 95090
Another version of the same hang - directed here from bug 75502

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 75999 has been marked as a duplicate of this bug. ***

Changed in dri:
status: Confirmed → In Progress
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76408 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76677 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76801 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

For what it's worth, running 3.13.7 greatly mitigates this bug, to the point where the dead time is barely noticeable. It happened three times in short order here and I didn't notice any of them:

[ 4562.551141] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4582.530028] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4633.476199] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
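Reports here describe hang frequency loosely ("every minute or so", "every 1-2 hours"). To quantify it, here is a minimal sketch, assuming raw `dmesg` output with the default `[seconds.microseconds]` timestamps (syslog-prefixed lines such as "Jul 25 10:32:52 host kernel: [...]" are skipped), that prints the seconds between consecutive hang messages. The function name `hang_intervals` is hypothetical.

```sh
# Hypothetical helper: read dmesg lines on stdin and print the gap, in seconds,
# between consecutive lines that start with a "[ 1234.567890]" timestamp.
hang_intervals() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0   # strip the brackets
            if (seen) printf "%.1f\n", ts - prev
            prev = ts; seen = 1
        }'
}
```

Fed the three lines above, for instance via `dmesg | grep 'Kicking stuck semaphore' | hang_intervals`, it prints `20.0` and `50.9`.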

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77043 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77058 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

My stuck ring faults are completely gone with i915.i915_enable_rc6=0. The only side effect seems to be that the fan stays on a bit more (subjectively). HP Pavilion dv6 (Sandybridge).

Revision history for this message
In , Chris Wilson (ickle) wrote :

Oh that's interesting. We might be able to find a register to prevent rc6 whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just frob forcewake directly.)

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

(In reply to comment #139)
> Oh that's interesting. We might be able to find a register to prevent rc6
> whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just
> frob forcewake directly.)

Happy to test patches. I'm updating to 3.13.9 tonight. I could add something on top if you have ideas. If you need more info than my attachment to #76801 just let me know.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77147 has been marked as a duplicate of this bug. ***

Revision history for this message
Alan Briolat (alan-codescape) wrote : Re: [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

I was still having this bug on a fresh install of 14.04 (kernel 3.13.0-24-generic). Strangely, since setting i915.semaphores=1 (suggested at http://askubuntu.com/questions/50033/unity-gui-pauses-freezes-for-less-than-a-few-seconds/ all the way back in 2011) I no longer get hangs. The default value of i915.semaphores seems to be -1 (according to `cat /sys/module/i915/parameters/semaphores`) and the advice elsewhere of setting i915.semaphores=0 had no effect for me. Hardware: i3-2120 (Sandy Bridge HD 2000 graphics) + Asus P8H61-M PRO motherboard.
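The sysfs check described above can be sketched in Python as follows. It assumes each i915 parameter is exposed as a single file under `/sys/module/i915/parameters/` (as the `cat` command shows for `semaphores`) and treats `-1` as "driver default", which matches the observation above but is an assumption here; `read_i915_param` and `describe` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the i915 parameter files (one file per parameter).
DEFAULT_BASE = Path("/sys/module/i915/parameters")


def read_i915_param(name: str, base: Path = DEFAULT_BASE) -> Optional[str]:
    """Return the raw value of an i915 parameter, or None if no such file exists.

    A missing file usually means the running kernel does not know the
    parameter under that name, or i915 is not loaded at all.
    """
    try:
        return (base / name).read_text().strip()
    except FileNotFoundError:
        return None


def describe(name: str, base: Path = DEFAULT_BASE) -> str:
    """Human-readable summary of one parameter's current value."""
    value = read_i915_param(name, base)
    if value is None:
        return f"{name}: not exposed by this kernel"
    if value == "-1":
        # Assumption: -1 means "let the driver pick its per-chip default".
        return f"{name}: -1 (driver default)"
    return f"{name}: {value}"
```

For example, `describe("semaphores")` would report `semaphores: -1 (driver default)` on a machine in the state described above. Passing a different `base` directory makes the helpers easy to try against a copy of the parameter files.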

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77974 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 78317 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Hoek (artjom-simon) wrote :

Created attachment 98589
Kernel 3.14.2-1-ARCH, xf86-video-intel 2.99.911-2, mesa 10.1.2-1

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 78785 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 79500 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 79640 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

commit ca79d888eb63cdacf80653ae23ce8f7d9ac52c68
Author: Chris Wilson <email address hidden>
Date: Fri Jun 6 10:22:29 2014 +0100

    drm/i915: Reorder semaphore deadlock check

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80055 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80125 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80168 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80401 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80592 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80935 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81064 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

Can someone indicate what the current status of this is?

Revision history for this message
In , Yunloh (yunloh) wrote :

I haven't seen it with xorg-x11-drv-intel-2.99.912-4 (built for fc20) from kojipkgs.

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

I'm using 2.21.15 which as far as I know is the latest release.

Revision history for this message
In , Andre Robatino (robatino) wrote :

I am seeing

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

followed by a graphics freeze and the need to reboot (if I can) in Fedora 20 with the latest updates including the 3.15.4 kernel.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81402 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Matteo Croce (teknoraver) wrote :

The same happens with 3.15.0 on Ubuntu 14.04 64-bit:

Jul 11 12:43:41 localhost kernel: [42049.462542] [drm] stuck on render ring
Jul 11 12:43:41 localhost kernel: [42049.463330] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:41 localhost kernel: [42049.463334] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 11 12:43:41 localhost kernel: [42049.463335] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 11 12:43:41 localhost kernel: [42049.463336] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 11 12:43:41 localhost kernel: [42049.463337] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 11 12:43:41 localhost kernel: [42049.463338] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 11 12:43:43 localhost kernel: [42051.464623] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:47 localhost kernel: [42055.468816] [drm] stuck on render ring
Jul 11 12:43:47 localhost kernel: [42055.469614] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:49 localhost kernel: [42057.470899] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:53 localhost kernel: [42061.439056] [drm] stuck on render ring
Jul 11 12:43:53 localhost kernel: [42061.439867] [drm] GPU HANG: ecode 0:0xfeffffff, in chrome [2172], reason: Ring hung, action: reset

Revision history for this message
In , Cwawak (cwawak) wrote :

[872948.822279] [drm] stuck on render ring
[872948.822291] [drm] stuck on blitter ring
[872948.823041] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [30647], reason: Ring hung, action: reset
[872948.823045] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[872948.823046] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[872948.823047] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[872948.823048] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[872948.823049] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[872948.823168] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[872950.821912] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Linux bobloblaw 3.15.0-1.fc20.x86_64 #1 SMP Sat Jun 14 11:22:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Attaching gpu crash dump as card0-error.071714-cwawak

Revision history for this message
In , Cwawak (cwawak) wrote :

Created attachment 102991
card0-error.071714-cwawak - gpu dump

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81673 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81676 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81710 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81844 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81990 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82277 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82301 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82399 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82451 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mstahl (mstahl) wrote :

*** Bug 82620 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82631 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82666 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82691 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82901 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83098 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83156 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83326 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83473 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83661 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Manuel Widmer (m-widmer-d) wrote :

Is there any ongoing development to fix this bug? I still see it with
Linux <hostname> 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

And the latest Intel drivers as provided by the Intel Linux Graphics Installer from
https://01.org/linuxgraphics/

My system often freezes a few minutes after I start watching a movie with VLC. My screen is connected to the Linux system through a receiver (HDMI for audio + video). A freeze is more likely when the HDMI receiver has been powered off for some time before playing the movie than after a reboot with HDMI on the whole time.

I'm happy to help with crash dumps as far as I'm able to collect them.

Revision history for this message
In , Bartosz Brachaczek (b-brachaczek) wrote :

(In reply to comment #183)

I recommend configuring i915.semaphores=0. I did it and it doesn't freeze anymore.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83721 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83783 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Frank Stephan (f-st) wrote :

Hi Chris,

meanwhile my current kernel is 3.16.1-46.1.g90bc0f1
I'm surprised that (after a reinstall) the semaphore bug hasn't occurred yet, whereas it did occur before (after a fresh install).

I can think of four possible reasons:

1. The named kernel revision somehow contains a fix for it. Looking at the changes, I couldn't confirm that assumption.
2. cgroup_memory=disabled is related to it (that's why I removed it for now).
3. The BIOS settings (which could be different now) might have something to do with it.
4. I haven't installed KVM support yet.

I'll post again if I find a reproducible explanation.
Frank

Revision history for this message
In , Frank Stephan (f-st) wrote :

2. of course I meant cgroup_disable=memory

Revision history for this message
In , Frank Stephan (f-st) wrote :

Hi Chris,

OK, nothing of the above was the reason. In my case it's simply this:

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "TearFree" "true"
EndSection

I added it because tearing while scrolling through large webpages annoyed me.
As soon as I added it, the problems quickly started.

Self-made problem.

Frank

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #189)
> Hi Chris,
>
> OK, nothing of the above was the reason. In my case it's simply this:
>
> /etc/X11/xorg.conf.d/20-intel.conf
>
> Section "Device"
> Identifier "Intel Graphics"
> Driver "intel"
> Option "TearFree" "true"
> EndSection
>
>
> I added it when the tearing scrolling through large webpages annoyed me.
> As soon as I added it, the problems quickly started.
>
> Selfmade problem.

Not really, https://bugs.freedesktop.org/show_bug.cgi?id=70764 tracks that this hang is more likely with TearFree (fundamentally the hang is still the same hardware issue, but it is interesting that TearFree has a higher chance of hitting it).

If you want to experiment:

 http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=requests

should have an interesting fix, at least for trying to prevent the TearFree leading to the semaphore hang.

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

What information is most useful for these recurring issues? It just happened again:

 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140239] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140750] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:33:01 arrowsmithlap1 kernel: [1182244.142445] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
 Sep 16 08:33:01 arrowsmithlap1 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

The only thing under my /etc/X11/xorg.conf.d/ is 00-keyboard.conf (system generated).

Do you want a copy of /sys/class/drm/card0/error every time?

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #191)
> What information is most useful for these repeating issues, as it just
> happened again:
>
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> render ring
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> blitter ring

So long as it is the same event, there is no more information we need other than testing feedback for an eventual workaround.
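To compare reports like the ones in this thread, the fields of a `GPU HANG` line can be pulled apart mechanically. The Python sketch below is fitted only to the line format quoted here (`ecode ..., in <process> [<pid>], reason: ..., action: ...`); other kernel versions may word it differently. Tallying by process and reason is an illustrative heuristic assumed for this sketch, not how the i915 developers decide whether two hangs are "the same event": the crash dump is what they analyze, and a single session can log different ecodes (compare `0:0x00ffffff` and `0:0xfeffffff` in the Ubuntu 14.04 report above). `parse_hang` and `tally` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed format, fitted to lines such as:
#   [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,]+), in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


class Hang(NamedTuple):
    ecode: str
    comm: str
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[Hang]:
    """Return the fields of a 'GPU HANG' log line, or None for any other line."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return Hang(m["ecode"], m["comm"], int(m["pid"]), m["reason"], m["action"])


def tally(lines: Iterable[str]) -> Counter:
    """Count hangs per (process, reason), skipping lines that do not match."""
    counts: Counter = Counter()
    for line in lines:
        hang = parse_hang(line)
        if hang is not None:
            counts[(hang.comm, hang.reason)] += 1
    return counts
```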

Revision history for this message
In , Manuel Widmer (m-widmer-d) wrote :

(In reply to comment #184)
> (In reply to comment #183)
>
> I recommend configuring i915.semaphores=0. I did it and it doesn't freeze
> anymore.

Meanwhile I tested both i915.semaphores=0 and i915.semaphores=1; neither helped in my case. With i915.semaphores=0 my system actually became much more unstable and even crashed on its own after a few days without any graphics stress (just running desktop apps like Thunar, or VLC for music only, no movies). With i915.semaphores=1 the system is at least stable (for weeks) as long as I don't use desktop applications heavily.

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

*** Bug 85194 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85333 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85609 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Josh Glover (jmglov) wrote :

I am also experiencing this on a Gentoo system running on a ThinkPad T440s. I'm not doing anything related to XBMC, simply using xrandr for multihead. Interestingly, DRI works fine on my laptop screen (glxgears reports 60 fps, which is the refresh rate of my screen), but breaks when I move a window that uses DRI (e.g. Chrome, glxgears) to the external monitor connected to the Mini DisplayPort output.

I see this stuff in dmesg:

[ 3561.424762] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3561.424770] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3561.424772] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3561.424774] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3561.424776] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3561.424778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 3566.422957] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3571.425143] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3575.423680] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring

Seems like the same issue. I'm trying to downgrade X, mesa, et al., to try and get the system back in working order.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

*** Bug 79675 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85972 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86058 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Fritsch-b (fritsch-b) wrote :

For those running Ubuntu, here is a build of a kernel based on 3.17.1 with the patches Chris Wilson wants you to test:

- Those patches have other regressions (so be careful to only test your specific issue).

https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb
https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb

Those kernels are based on: https://bugs.freedesktop.org/show_bug.cgi?id=83677#c35

Beware, don't switch VTs.

Revision history for this message
In , Tomas Huryn (thuryn1) wrote :

I've tried the mentioned kernel on my Fedora 21 Beta and it still hangs, for example after NetBeans opens its main window full-screen.

Revision history for this message
In , Smruti-patil (smruti-patil) wrote :

*** Bug 86437 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86765 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86836 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86925 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 87710 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 87776 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88541 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

*** Bug 88626 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88723 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88789 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89078 has been marked as a duplicate of this bug. ***

Mathew Hodson (mhodson)
Changed in linux:
importance: Medium → Unknown
status: Invalid → Unknown
affects: linux → mesa
Mathew Hodson (mhodson)
tags: removed: saucy
Mathew Hodson (mhodson)
tags: added: metabug
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89299 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89570 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89671 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89774 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89771 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89981 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90106 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90146 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90271 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90473 has been marked as a duplicate of this bug. ***

Andy Whitcroft (apw)
Changed in linux-lts-quantal (Ubuntu Precise):
status: Invalid → Fix Committed
Changed in linux (Ubuntu Precise):
status: Invalid → Fix Committed
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90835 has been marked as a duplicate of this bug. ***

Revision history for this message
In , helios (martin-lichtvoll) wrote :

Chris, you referred me to this bug as I reported

Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring

I skimmed through it and it appears that there are some patches to test, but I am not sure which ones. Can you or someone else enlighten me?

Also I note that I still use

        Option "AccelMethod" "uxa"

and I have

martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
options i915 modeset=1 i915_enable_rc6=7

thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway.

I will remove the AccelMethod setting now and see whether it helps. If not, I'll downgrade to 4.1-rc4 for now, as the issues were at least much less frequent with it.

And for me, 4.1-rc6 really makes things much *worse*. I am typing this after a clean reboot and have already had the GPU hang again. It happens every few minutes. Are you really sure this is the same GPU hang? I didn't have it before the 4.1 kernel.

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to Martin Steigerwald from comment #225)
> Chris, you referred me to this bug as I reported
>
> Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> semaphore on render ring
>
> I skimmed through it and it appears that there are some patches to test? But
> I am not sure which ones these are. Can you or someone else enlighten me?

There's likely a modest improvement in 4.2.

> Also I note that I still use
>
> Option "AccelMethod" "uxa"
>
> and I have
>
> martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> options i915 modeset=1 i915_enable_rc6=7

Fortuitously that dangerous option doesn't do anything for your kernel.

> thus maximum energy saving. But according to powertop it never enters the
> highest sleep state anyway.
>
> I will remove the AccelMethod setting now and see whether it helps. If not,
> I downgrade to 4.1-rc4 for now, as issues have been at least much less
> frequent with it.

Purely circumstantial.

> And its really that for me 4.1-rc6 makes things much *worse*. I am typing
> this after a clean reboot and already got the GPU hang again. It happens
> about every few minutes. Are you really sure this is the same GPU hang? I
> didn´t have this before 4.1 kernel?

Yes.

Revision history for this message
In , helios (martin-lichtvoll) wrote :

(In reply to Chris Wilson from comment #226)
> (In reply to Martin Steigerwald from comment #225)
> > Chris, you referred me to this bug as I reported
> >
> > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> > semaphore on render ring
> >
> > I skimmed through it and it appears that there are some patches to test? But
> > I am not sure which ones these are. Can you or someone else enlighten me?
>
> There's likely a modest improvement in 4.2.

Nice.

> > Also I note that I still use
> >
> > Option "AccelMethod" "uxa"
> >
> > and I have
> >
> > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> > options i915 modeset=1 i915_enable_rc6=7
>
> Fortuitously that dangerous option doesn't do anything for your kernel.

Well, I found out why: it seems I compiled i915 into the kernel; at least I don't have an i915 module in lsmod. But i915.i915_enable_rc6=7 on the kernel command line does not seem to have any effect either. I removed the option.

> > thus maximum energy saving. But according to powertop it never enters the
> > highest sleep state anyway.
> >
> > I will remove the AccelMethod setting now and see whether it helps. If not,
> > I downgrade to 4.1-rc4 for now, as issues have been at least much less
> > frequent with it.
>
> Purely circumstantial.

Since switching to SNA I haven't seen a GPU hang so far. It's too early to say for sure, but it seems something in UXA may have triggered it more easily.
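With i915 built into the kernel, its options can only come from the kernel command line (the `i915.<param>=<value>` form used throughout this thread). A hedged Python sketch for spotting options the running driver does not recognize: it lists the `i915.` options on a command line and flags any without a matching file in the parameter directory. This assumes built-in drivers still publish their parameters under `/sys/module/i915/parameters/` and that an unrecognized name is simply absent there; `i915_options` and `check_i915_cmdline` are hypothetical helpers, and quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed directory listing the parameters the running i915 actually accepts.
PARAM_DIR = Path("/sys/module/i915/parameters")


def i915_options(cmdline: str) -> Dict[str, str]:
    """Collect 'i915.<name>=<value>' options from a kernel command line.

    Simple whitespace split: quoted values containing spaces are not handled.
    """
    opts: Dict[str, str] = {}
    for token in cmdline.split():
        if token.startswith("i915."):
            name, _, value = token[len("i915."):].partition("=")
            opts[name] = value
    return opts


def check_i915_cmdline(cmdline: str, param_dir: Path = PARAM_DIR) -> Tuple[List[str], List[str]]:
    """Split the i915 option names into (known, unknown) by looking for a sysfs file."""
    known: List[str] = []
    unknown: List[str] = []
    for name in sorted(i915_options(cmdline)):
        if (param_dir / name).exists():
            known.append(name)
        else:
            unknown.append(name)
    return known, unknown
```

For example, `check_i915_cmdline(Path("/proc/cmdline").read_text())` would return the known and unknown option names for the running kernel.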

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91212 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91662 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91810 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91832 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

(In reply to Chris Wilson from comment #192)
> (In reply to comment #191)
> > What information is most useful for these repeating issues, as it just
> > happened again:
> >
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> > render ring
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> > blitter ring
>
> So long as it is the same event, there is no more information we need other
> than testing feedback for an eventual workaround.

Is this the same bug?

$ journalctl -p 3 -b -1
Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
... [ repeated messages ] ...
Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:33 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
Ruj 25 02:13:34 crnigrom kernel: [drm:init_ring_common [i915]] *ERROR* render ring initialization failed ctl 00000000 (valid? 0) head 00000000 tail 00000000 start 00000000 [expected 00000000]
Ruj 25 02:13:34 crnigrom kernel: [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5
Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop

After which GNOME crashes with the "Oh No Something Is Wrong" screen.

$ uname -r
4.1.7-200.fc22.x86_64

Hardware: i3-2100 CPU/GPU

This bug has been going on for a long, long time, but at least the computer no longer hard-freezes. GNOME still crashes, though, so any running GTK application that is busy doing something stalls.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92118 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92739 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191) was resolved by uninstalling various components, then reinstalling and updating them. I have a hunch (completely unproven) that it was a transparent bit-fail issue from the SSD: by uninstalling and reinstalling, the files were likely written to a different location on the drive. It wasn't configuration, as I tried erasing it and even rolling back to defaults, and the problem persisted. As it happened almost daily before the uninstall and hasn't happened since the reinstall, this is all I can attribute it to.

HTH someone.

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119432
attachment-28908-0.html

I reported this bug from a system without an SSD. Recently, I have not
seen the kernel messages appear however--currently on linux 4.2.5.

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

(In reply to Jeffrey E. Bedard from comment #236)
> Created attachment 119432 [details]
> attachment-28908-0.html
>
> I reported this bug from a system without an SSD. Recently, I have not
> seen the kernel messages appear however--currently on linux 4.2.5.

Ah, let me clarify that earlier comment: I dd'd a failing spinning drive (lots of clicking) to an SSD. I upgraded packages as they came in, but nothing changed. Only uninstalling and reinstalling cleared the repeat button. :)

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119433
attachment-32271-0.html

I think this bug can be marked as closed with the latest linux/mesa/xorg
versions :)

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92927 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

Created attachment 120189
error state with 4.2 kernel

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93331 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93482 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93493 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89524 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93595 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93876 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93824 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 94057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , sander eikelenboom (b-linux) wrote :

Tuesday, March 1, 2016, 9:43:23 PM, you wrote:

> Chris Wilson changed bug 54226
> WhatRemovedAddedCC <email address hidden>
>

> Comment # 249 on bug 54226 from Chris Wilson
> *** Bug 94057 has been marked as a duplicate of this bug. ***
>

Sorry to say, but:
Is there a way to get off the CC-list of this slightly depressing kind of "catch-all" bug?
It unfortunately doesn't seem to have been going anywhere for the last 3 to 4 years, except
for an endless stream of duplicates being appended.

--
Sander

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

(In reply to Sander Eikelenboom from comment #250)
> Is there a way to get off the CC-list of this slightly depressing kind of
> "catch-all" bug ?

CC list is at the top right corner. Choose the address, tick "Remove selected CCs", and hit Save Changes.

I've done this for you now.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 95238 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samantham (samantham) wrote :

Chris, I seem to be experiencing this bug in Linux 4.7-rc3 on an X220 ThinkPad with an Intel HD 3000 chipset. I was getting random full-system freezes, with the machine unresponsive over the network.

The main messages before the crash were:
Jun 23 19:11:18 athena kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Jun 23 19:11:18 athena kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.7 [i915]] *ERROR* GT thread status wait timed out.

I haven't been able to reproduce the original crash easily, but I CAN reproduce a full system lockup every time by running the following intel-gpu-tools tests (I have not run anywhere near all of the tests, though) [**This may or may not be related to the original crash**]

gem_sync, subtest: bsd2-hang
drv_hangman, subtest: error-state-capture-bit

I do not know if these tests are helpful or related (maybe some are known to fail? Not sure).
I had DRM debugging turned on when I ran those tests (drm.debug=0x1e log_buf_len=1M).
I can post logs of the hangs associated with the two tests/subtests and run any other tests you want (with kernel DRM debugging on). I will wait for the original issue to reappear with DRM debugging on before posting that log, though. Given the number of similar bugs, you may already have the CALL TRACE and non-debug-level logs.
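For readers unfamiliar with the drm.debug mask above: it is a bitmask of DRM debug categories, and log_buf_len=1M enlarges the kernel log buffer so the extra output is less likely to be overwritten. A minimal Python sketch that decodes such a mask, assuming the DRM_UT_* bit values from the kernel DRM headers of that era (newer kernels define additional bits); the function name is hypothetical:

```python
# DRM debug categories for the drm.debug module parameter, assuming the
# DRM_UT_* values of ~4.x kernels (later kernels add more bits).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by `mask`, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1e))
```

Under these assumptions, 0x1e enables DRIVER, KMS, PRIME and ATOMIC messages but not CORE.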

I know how to patch and am able to compile kernels to test. The bug affects me maybe once every 1 or 2 days. I use Xorg with Glamor. I have been seeing these crashes since 4.6 (maybe 4.5 or earlier, not sure).

I know how to apply patches and am able to compile drm-next or any patches you have to see if this issue can be isolated. Thanks, sorry for the long response.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97451 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Yann-argotti (yann-argotti) wrote :

*** Bug 98294 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 98807 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 100245 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ricardo-vega-u (ricardo-vega-u) wrote :

Adding tag into "Whiteboard" field - ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

I don't seem to be getting the mentioned GNOME crashes on my Sandy Bridge anymore with mainline kernels (currently 4.11, and I think I was not getting any issues with 4.10 either). With mainline longterm 4.4.61 and the default CentOS 7 kernels, I am definitely getting very frequent GPU crashes that bring down GNOME.

So either it is fixed for good, or it has become much rarer. The issue I am/was experiencing happens when GNOME is running; it does not happen when only GDM is loaded. System load does not seem to affect whether the bug triggers: it seems to happen at any time, whether the machine is idle or loaded.

Revision history for this message
In , Elizabethx-de-la-torre-mena (elizabethx-de-la-torre-mena) wrote :

(In reply to samuel.rakitnican from comment #260)
Hopefully, it is fixed for good. I'm closing this bug. If the problem arises with the latest kernel versions (https://www.kernel.org/), please open a NEW bug with HW and SW information, steps to reproduce, and relevant logs. Thank you.

Changed in dri:
status: In Progress → Fix Released
Changed in linux (Fedora):
importance: Unknown → Undecided
status: Unknown → Won't Fix
Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to Elizabeth from comment #261)
> Hopefully, is fixed for good. I'm closing this bug, if problem arise with
> latest kernel versions https://www.kernel.org/ please open a NEW bug with HW
> and SW information, steps to reproduce and relevant logs.Thank you.

There was no fix for this HW issue.

Revision history for this message
In , Aaron-lu-a (aaron-lu-a) wrote :

Created attachment 135173
gpu error file on 4.13.5-200.fc26.x86_64

This problem reappeared on 4.13.5-200.fc26.x86_64 last Friday.

[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset
[774249.632110] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[774249.632111] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[774249.632111] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[774249.632111] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[774249.632112] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[774249.632172] drm/i915: Resetting chip after gpu hang
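The log above asks for the GPU crash dump from /sys/class/drm/card0/error to be attached to any new report. A minimal Python sketch for copying it to a timestamped file, assuming card0 is the Intel GPU and that the script runs with root privileges (the error file is usually readable only by root); save_crash_dump and the file-name pattern are hypothetical:

```python
from datetime import datetime
from pathlib import Path

# Path named in the dmesg output; assumes card0 is the i915 device.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_crash_dump(src: Path = ERROR_STATE, dest_dir: Path = Path(".")) -> Path:
    """Copy the GPU error state into a timestamped file suitable for attaching to a bug."""
    data = src.read_bytes()  # reads until EOF, even though sysfs reports a fixed size
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"i915-error-{datetime.now():%Y%m%d-%H%M%S}.txt"
    dest.write_bytes(data)
    return dest
```

Calling save_crash_dump() soon after a hang, before a later hang can replace the captured state, should give a file that can be attached to the report as-is.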

Changed in dri:
status: Fix Released → Confirmed
Revision history for this message
In , Chris Wilson (ickle) wrote :

commit 0da715ee60774401bea00dc71fca6fd1096c734a
Author: Chris Wilson <email address hidden>
Date: Mon Nov 20 20:55:02 2017 +0000

    drm/i915: Disable semaphores on Sandybridge

Changed in dri:
status: Confirmed → Won't Fix
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104243 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104772 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jani-saarinen-g (jani-saarinen-g) wrote :

I will close this now.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 106119 has been marked as a duplicate of this bug. ***

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Closing mesa as fixed; according to upstream, it was fixed years ago.

Changed in mesa (Ubuntu):
status: Triaged → Fix Released
Changed in mesa (Ubuntu Precise):
status: Triaged → Fix Released
Changed in linux-lts-quantal (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
Changed in linux (Debian):
status: New → Fix Released