Bug #1041790 “[snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140...” : Bugs : xserver-xorg-video-intel package : Ubuntu

Revision history for this message

Rocko (rockorequin) wrote on 2012-08-26:

#1

BootDmesg.txt (62.7 KiB, text/plain; charset="utf-8")
CurrentDmesg.txt (10.2 KiB, text/plain; charset="utf-8")
Dependencies.txt (3.9 KiB, text/plain; charset="utf-8")
Lspci.txt (34.0 KiB, text/plain; charset="utf-8")
Lsusb.txt (707 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (7.1 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (4.1 KiB, text/plain; charset="utf-8")
ProcMaps.txt (18.2 KiB, text/plain; charset="utf-8")
ProcModules.txt (3.9 KiB, text/plain; charset="utf-8")
ProcStatus.txt (766 bytes, text/plain; charset="utf-8")
UdevLog.txt (356.7 KiB, text/plain; charset="utf-8")
XorgLog.txt (40.8 KiB, text/plain; charset="utf-8")
XorgLogOld.txt (42.2 KiB, text/plain; charset="utf-8")
i915_error_state.txt (2.2 MiB, text/plain; charset="utf-8")

Launchpad Janitor (janitor) wrote on 2012-08-26:

#2

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	New → Confirmed

Apport retracing service (apport) on 2012-08-26

tags:

removed: need-duplicate-check

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-08-29:

#25

Created attachment 66289
dmesg output

From time to time the interface freezes, and these records appear in dmesg: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blitter ring idle

$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. Device 0612 (rev 01)

Bryce Harrington (bryce) wrote on 2012-09-10:

#3

Does switching from UXA to SNA help?

Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Undecided → Medium

Rocko (rockorequin) wrote on 2012-09-10:

#4

Ha! I thought SNA was turned on by default, but it isn't, is it? Is it possible to switch between SNA and UXA while X is running, or to tell which one is being used?

I've turned SNA on via AccelMethod in xorg.conf now, so I'll see if the freezes go away.

Since I restarted X with SNA, the titlebars of windows that don't have the focus have changed their background to light grey. The window buttons and the title text stay the same, which looks weird. Is that something that can be configured?

Rocko (rockorequin) wrote on 2012-09-13:

#5

Is SNA turned on by default now? I had a couple of hours freeze-free with it the other day, but removed my xorg.conf shortly afterwards because the white titlebars and glitchy 3D graphics were annoying, and also because with SNA enabled the backlight didn't come on after the screensaver turned it off. But now the titlebars are white again.

Bryce Harrington (bryce) wrote on 2012-09-14:

#6

SNA is not the default for quantal. No, there is not a way to toggle between UXA and SNA at run time. /var/log/Xorg.0.log is where to look to see which acceleration tech is active.

If I understand your testing feedback, you do believe SNA helps eliminate the freeze behaviors, and thus we can consider UXA the likely source of the bug.

Rocko (rockorequin) wrote on 2012-09-15:

#7

Yes, I think the bug doesn't happen with SNA whereas it occurs pretty regularly with UXA. I've been using SNA for a couple of days now since it became the default on my system. Does X now look for other xorg.conf files? I created one called /etc/X11/xorg.conf-intel-sna and symlinked to it to test out SNA; then I deleted the symlink, and a day or two later suddenly SNA became the default.

Rocko (rockorequin) wrote on 2012-09-15:

#8

Ah, I am using xorg-edgers. Perhaps they are trying out SNA as the default there.

Rocko (rockorequin) wrote on 2012-09-28:

#9

I've been using SNA for a couple of weeks now, and it doesn't seem to suffer from this particular bug.

The bug still occurs in the latest xf86-video-intel driver from git (as of 27/9/12), though. It generally occurs when focus changes, eg when a menu or popup window is opening.

Ursula Junque (ursinha) wrote on 2012-10-01:

#10

Hi Bryce, I've been getting this error every once in a while and when it happens, apport tries to report the bug like ten times. Let me know if I can provide more information about it.

Cheers,

Ursula Junque (ursinha) wrote on 2012-10-01:

#11

I've filed another bug with apport and all my files are attached there: bug 1059737, just in case they're not duplicates.

Paul Smedley (paul-smedley) wrote on 2012-10-07:

#12

Switching from UXA to SNA fixes this for me too, on an Asus Zenbook UX31E

Dimitri John Ledkov (xnox) wrote on 2012-10-10:

#13

I am hitting this bug. Can somebody please explain how to check if I am using UXA or SNA and how to switch between the two? If SNA helps, and I am using UXA I'd like to try SNA.

Rocko (rockorequin) wrote on 2012-10-10:

#14

@Dmitrijs: To find which method is being used, do:

grep AccelMethod /var/log/Xorg.0.log

I find also that the titlebars of non-focused windows are often light grey instead of black when using SNA.

And to change methods, put this in your xorg.conf to set the acceleration method and then restart X:

Section "Device"
Identifier "Card0"
Driver "intel"
Option "AccelMethod" "sna" # or uxa, as appropriate
EndSection
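
A minimal sketch for checking the active method from a script. Later reports in this thread show `grep AccelMethod` returning nothing when the option is not set in xorg.conf, so this looks for the backend's own log lines instead. The `SNA initialized` line is quoted later in the thread; the `UXA(` marker and the helper name `detect_accel` are assumptions, so check them against your own log.

```shell
#!/bin/sh
# detect_accel [LOGFILE]: guess the acceleration method from an Xorg log.
# Assumption: "SNA initialized" and "UXA(" are how the intel driver
# announces each backend; neither marker is guaranteed across versions.
detect_accel() {
    log="${1:-/var/log/Xorg.0.log}"
    if [ ! -r "$log" ]; then
        echo "unreadable"
        return 1
    fi
    if grep -qF 'SNA initialized' "$log"; then
        echo "sna"
    elif grep -qF 'UXA(' "$log"; then
        echo "uxa"
    else
        echo "unknown"
    fi
}
```

Run it as `detect_accel /var/log/Xorg.0.log`; it prints sna, uxa, or unknown.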

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-10-21:

#26

If you can easily reproduce this error, can you please build a kernel using http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=xv-overlay which has some revised memory barriers.

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-10-27:

#27

Can you help me build an RPM for Fedora?

Rocko (rockorequin) wrote on 2012-11-17:

#15

I still experience this bug, even with the latest intel driver from git, xf86-video-intel-2.6.99.902. I would use SNA but it has an even more annoying bug after the screen saver unlocks where unity just shows me a black screen and mouse cursor, and I have to physically restart unity to get it working again.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-11-22:

#28

On second thoughts, I think this should be fixed by the slight robustification in more recent hangcheck.

Please try the latest kernel for your distribution (should be 3.6.7 atm) and reopen if it still occurs.

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-11-24:

#29

I am using Fedora 18 with the 3.6.7-5.fc18.i686 kernel, and this message still appears in the dmesg output:
[22826.654365] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22826.654369] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
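
To get a rough idea of how often the hang occurs, the hangcheck messages can be counted in a saved dmesg dump. A small sketch, assuming the message text matches the lines quoted above; `count_gpu_hangs` is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# count_gpu_hangs FILE: count hangcheck errors in a saved dmesg dump.
count_gpu_hangs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c 'Hangcheck timer elapsed' "$1" || true
}
```

For example, save the log with `dmesg > dmesg.txt` and then run `count_gpu_hangs dmesg.txt`.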

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-11-24:

#30

That is not the same bug, so you need to attach a fresh set of debug info (please remember the i915_error_state)...

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-11-24:

#31

Please explain how to get the needed debug info. Thanks.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-11-24:

#32

http://intellinuxgraphics.org/how_to_report_bug.html

From which we need the i915_error_state, so

$ sudo mount -tdebugfs debug /sys/kernel/debug
$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
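
The two commands above can be wrapped in a helper that also checks whether anything was actually captured. This is only a sketch: `save_error_state` is a hypothetical name, and the "no error state collected" marker is an assumption about what the kernel writes when no hang has been recorded. The file is copied first and inspected afterwards, because debugfs files usually report a size of zero.

```shell
#!/bin/sh
# save_error_state [SRC] [DEST]: copy an i915 error state dump if one exists.
save_error_state() {
    src="${1:-/sys/kernel/debug/dri/0/i915_error_state}"
    dest="${2:-i915_error_state}"
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted, and are you root?)" >&2
        return 1
    fi
    # Copy first; the size of a debugfs file cannot be trusted beforehand.
    cat "$src" > "$dest" || return 1
    # Assumption: an empty dump or this marker means nothing was captured.
    if [ ! -s "$dest" ] || grep -qF 'no error state collected' "$dest"; then
        echo "no error state recorded" >&2
        return 2
    fi
    echo "saved $(wc -c < "$dest" | tr -d ' ') bytes to $dest"
}
```

It needs to run as root (for example from a root shell), since the debugfs file is normally readable only by root.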

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-11-24:

#33

Created attachment 70518
i915_error_state

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-11-24:

#34

It looks like that corresponds to the bug that

commit 1c8b46fc8c865189f562c9ab163d63863759712f
Author: Chris Wilson <email address hidden>
Date: Wed Nov 14 09:15:14 2012 +0000

drm/i915: Use LRI to update the semaphore registers

    The bspec was recently updated to remove the ability to update the
    semaphore using the MI_SEMAPHORE_BOX command, the ability to wait upon
    the semaphore value remained. Instead the advice is to update the
    register using the MI_LOAD_REGISTER_IMM command. In cursory testing,
    semaphores continue to function - the question is whether this fixes
    some of the deadlocks where the semaphore registers contained stale
    values?

hopefully addresses.

That patch is only available on drm-intel-next at the moment, which is available either at http://cgit.freedesktop.org/~danvet/drm-intel or available as drm-intel-experimental in the ubuntu kernel-ppa.

Karma Dorje (taaroa) on 2012-11-28

tags:

added: raring

Timo Aaltonen (tjaalton) wrote on 2012-11-29:

#16

I've uploaded -intel 2.20.14 to raring, so please test with both UXA and SNA to see if either or both work.

Rocko: I can't reproduce your bug with SNA (with this new version anyway), works fine on my T420s. 2.6.99.902 sounds old too :)

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Confirmed → Incomplete

Rocko (rockorequin) wrote on 2012-11-29:

#17

Yes, I've been running v2.20.14 from git (using SNA, not UXA) for a few days on Quantal and so far I haven't seen that other bug I mentioned - it hasn't fatally locked up after the screensaver kicks in. However, it has experienced *this* particular bug a few times, i.e. where the screen locks but I can fix it by switching to a tty terminal and back.

Re 2.6.99.902, I think I probably did a git tag command and looked at the last entry, which is definitely old. I would have been running a pre-v2.20.14 version at the time.

Karma Dorje (taaroa) wrote on 2012-12-01:

#18

@Timo Aaltonen
SNA — ok. looks like some sort of regression in the driver.

Bryce Harrington (bryce) wrote on 2012-12-03:

#19

Rocko, thanks for testing the git DDX. Next time you get one of these freezes can you please collect a fresh i915_error_state, dmesg, and Xorg.0.log?

Sounds like this bug should go upstream.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Incomplete → New
status:	New → Incomplete

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-08:

#35

The problem recurred with the patched kernel.

[118637.439016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[118637.439020] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[mikhail@localhost ~]$ uname -a
Linux localhost.localdomain 3.6.9-4.1.fc18.i686.PAE #1 SMP Wed Dec 5 15:16:33 UTC 2012 i686 i686 i386 GNU/Linux
[mikhail@localhost ~]$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
[sudo] password for mikhail:
[mikhail@localhost ~]$

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-08:

#36

Created attachment 71192
i915_error_state (new)

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-08:

#37

sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state-8
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

What does it mean?

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-08:

#38

Created attachment 71199
i915_error_state (new)

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-08:

#39

Created attachment 71200
dmesg output (new)

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-12-08:

#40

Lalalalala.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-12-09:

#41

*** Bug 58057 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-12-12:

#42

*** Bug 58212 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-12-13:

#43

We can confirm the synopsis by disabling semaphores (i915.semaphores=0), but can we also test whether this is an rc6 side-effect (i915.i915_enable_rc6=0)?

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2012-12-13:

#44

Also maybe time for 'git revert 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f'

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-15:

#45

Created attachment 71549
i915_error_state

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2012-12-17:

#48

Created attachment 71630
dmesg

bugbot (bugbot) wrote on 2012-12-27:

#20

We're closing this bug since there has not been a response from the original reporter. However, if the issue still exists, please feel free to reopen with the requested information. If you're not the original reporter, we'd prefer you file a new bug report.

Some tips:

* Report X.org bugs via the command: `ubuntu-bug xorg`

* Test against the latest development Ubuntu. http://cdimage.ubuntu.com/daily-live/
Bugs marked as affecting the development version tend to get priority attention.

* The `xdiagnose` utility has functionality for enabling debugging and
analyzing a few common X problems.

* Tag your bugs with the Ubuntu versions you have reproduced the issue in.

* See https://wiki.ubuntu.com/X/Reporting for tips on writing good bug reports.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Incomplete → Expired

Adam Conrad (adconrad) on 2013-01-08

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Expired → Confirmed

Timo Aaltonen (tjaalton) on 2013-01-09

Changed in xserver-xorg-video-intel (Ubuntu):
assignee:	nobody → Timo Aaltonen (tjaalton)
status:	Confirmed → Incomplete

Rocko (rockorequin) wrote on 2013-01-10:

#23

I've seen it happen with kernel 3.8-rc2 and SNA using the latest intel driver from git.

The hang isn't always the same:

* Sometimes it locks the computer up completely, requiring a hard reboot.

* Sometimes it locks X, but CTRL-ALT-F1 and back unlocks it.

* Sometimes it resolves itself without me even noticing that it has happened, other than that there may be some corruption in the tabs' title text in chrome and window movement has become somewhat jerky instead of the normal smooth movement you get after restarting X.

Next time it happens I'll see if I can recover any information.

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-01-10:

#52

(In reply to comment #24)
> Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> /sys/module/i915/parameters/semaphores) to prevent this hang.

What are the consequences?

> The only interesting patch I can suggest atm is
>
> commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> Author: Ben Widawsky <email address hidden>
> Date: Wed Sep 26 10:34:01 2012 -0700
>
> drm/i915: Workaround to bump rc6 voltage to 450
>
> BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> buggy BIOSen may not be doing this, so we correct it for them. Ideally
> customers should update the BIOS as only it would know the optimal
> values for the platform, so we leave that fact as a DRM_ERROR for the
> user to see.
>
> in 3.8-rc1 or look for a BIOS update.

I have an H61M/U3S3 motherboard with the latest BIOS, version 2.20 from 8/15/2012:
ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
How can I check whether the problem persists?

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-01-10:

#53

(In reply to comment #27)
> (In reply to comment #24)
> > Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> > /sys/module/i915/parameters/semaphores) to prevent this hang.
>
> What are the consequences?

Rendering throughput is dropped by 10% with SNA, or as much as 3x with UXA. OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, so increased power consumption.

Adam Conrad (adconrad) wrote on 2013-01-11:

#24

Timo: I've never had it completely hang the machine, but I've also not been patient enough to sit around and wait to see if X will eventually recover on its own, I always do a VT switch out and back (and get welcomed by an apport dialog)

Has happened several times today. Will be upgrading to 3.8.0-rc soon to see if that helps, but the comment above me doesn't give much hope.

Bug Watch Updater (bug-watch-updater) on 2013-01-11

Changed in xserver-xorg-video-intel:
importance:	Unknown → Medium
status:	Unknown → Confirmed

Timo Aaltonen (tjaalton) wrote on 2013-01-11: Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

#54

ok, so the hang I'm seeing is this same one, not that frequent though

Roman Yepishev (rye) wrote on 2013-01-15:

#55

I've started getting this failure after migrating to Raring and I get the recoverable lockups 3 times a day or even more with the same GPU lockup message in dmesg. Apport has proposed [sandybridge-m-gt2+] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001 as a bug title.

I am using 3.8.0-0-generic kernel.

There is no AccelMethod string in Xorg log (so it should be uxa), will check whether this happens with sna.

In freedesktop.org Bugzilla #54226, bwidawsk (bwidawsk) wrote on 2013-01-20:

#56

(In reply to comment #27)

> > The only interesting patch I can suggest atm is
> >
> > commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> > Author: Ben Widawsky <email address hidden>
> > Date: Wed Sep 26 10:34:01 2012 -0700
> >
> > drm/i915: Workaround to bump rc6 voltage to 450
> >
> > BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> > buggy BIOSen may not be doing this, so we correct it for them. Ideally
> > customers should update the BIOS as only it would know the optimal
> > values for the platform, so we leave that fact as a DRM_ERROR for the
> > user to see.
> >
> > in 3.8-rc1 or look for a BIOS update.
>
> I have an H61M/U3S3 motherboard with the latest BIOS, version 2.20 from 8/15/2012:
> ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
> How can I check whether the problem persists?

The easiest way is to apply the patch and look for DRM_DEBUG_DRIVER messages. This is unlikely to fix the problem, but also can't hurt.

We've only assumed new BIOS will fix the problem, but who knows. Especially if it's a 3rd party BIOS.

Timo Aaltonen (tjaalton) on 2013-01-22

Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Medium → High
status:	Incomplete → Triaged

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-01-22: Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

#57

I'm seeing this on an X220 running up-to-date raring. It happens a few times a day.

alan@deep-thought:~$ grep SNA /var/log/Xorg.0.log
[ 12.175] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.20.19-0ubuntu1 (Timo Aaltonen <email address hidden>)
[ 13.744] (II) intel(0): SNA initialized with SandyBridge backend
alan@deep-thought:~$ grep UXA /var/log/Xorg.0.log
alan@deep-thought:~$ grep AccelMethod /var/log/Xorg.0.log
alan@deep-thought:~$

Pretty sure I have the very latest BIOS from Lenovo as I recently installed Windows on the machine to do exactly that.

Timo Aaltonen (tjaalton) wrote on 2013-01-22:

#58

try 'echo 0 > /sys/module/i915/parameters/semaphores' to see if it stops the hangs.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Triaged → Confirmed

Timo Aaltonen (tjaalton) wrote on 2013-01-22:

#59

..and restart X (logging out should do it when lightdm is used)
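
A hedged sketch of the runtime workaround as a helper that shows the value before and after the change. The parameter path is the one given in comment #58; the function name `set_i915_semaphores` is made up for illustration, and the second argument exists only so the helper can be pointed at another file when trying it out.

```shell
#!/bin/sh
# set_i915_semaphores VALUE [PARAM_FILE]: change the i915 semaphores
# parameter at run time and report the old and new values.
set_i915_semaphores() {
    value="$1"
    param="${2:-/sys/module/i915/parameters/semaphores}"
    if [ ! -w "$param" ]; then
        echo "cannot write $param (run as root?)" >&2
        return 1
    fi
    echo "before: $(cat "$param")"
    echo "$value" > "$param"
    echo "after:  $(cat "$param")"
}
```

Note that `sudo echo 0 > /sys/module/i915/parameters/semaphores` does not work from a normal shell, because the redirection is performed by the unprivileged shell; use a root shell or `echo 0 | sudo tee /sys/module/i915/parameters/semaphores` instead. The setting does not survive a reboot.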

Timo Jyrinki (timo-jyrinki) wrote on 2013-01-22:

#60

My first hang of this type happened today, even though I've been using SNA since August or so. If this now starts to recur (I also started to have bug #1102390 only yesterday), I can try disabling the semaphores.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-01-24:

#62

*** Bug 59786 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-01-24:

#63

Created attachment 73560
write mbox regs twice on snb

Another piece of magic which might help. Please test this patch and the one from Chris ("Read back semaphore mboxes after update") separately and report back whether anything changes.

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-01-24:

#64

Created attachment 73577
write mbox regs twice on snb, v2

Now actually the right patch attached, the old one didn't compile ...

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-01-25:

#61

The workaround in comment #58 does eliminate the GPU lockups. I notice this whenever I boot in the morning, forget to apply the workaround, and get a lockup partway through the day.

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-01-30:

#65

Which patch do I need to apply to fix this issue?

I see that the patches from comments 26 and 32 have similar logic...

@@ -596,6 +606,16 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);

+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		read_mboxes(ring, mbox1_reg, 1024);
+		read_mboxes(ring, mbox2_reg, 1028);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

@@ -598,6 +598,19 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);

+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		mbox1_reg = ring->signal_mbox[0];
+		mbox2_reg = ring->signal_mbox[1];
+
+		update_mboxes(ring, mbox1_reg);
+		update_mboxes(ring, mbox2_reg);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-01-30:

#66

> --- Comment #33 from <email address hidden> ---
> Which patch I need applied for fix this issue?

We can't reproduce the bug, so those are just patches to test
different ideas. Please test each of them individually (i.e. remove
the first before testing the 2nd patch) and then report whether
anything changes (i.e. harder or easier for you to hit the issue).

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-02-02:

#67

Can't compile kernel with patch above:

drivers/gpu/drm/i915/intel_ringbuffer.c: In function 'gen6_add_request':
drivers/gpu/drm/i915/intel_ringbuffer.c:611:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
drivers/gpu/drm/i915/intel_ringbuffer.c:612:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

Bryce Harrington (bryce) on 2013-02-04

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Confirmed → Triaged

In freedesktop.org Bugzilla #54226, Norman Yarvin (yarvin-yarchive) wrote on 2013-02-20:

#75

I'm seeing this bug, or something like it, on an older chip (G965, desktop version):

Feb 19 22:05:56 muttonhead kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Feb 19 22:05:56 muttonhead kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Feb 19 22:05:56 muttonhead kernel: [drm:kick_ring] *ERROR* Kicking stuck wait on render ring
Feb 19 22:05:57 muttonhead kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

after which the mouse pointer sticks in one spot (with most other things working), and then when I shut down X, the console fails to appear, requiring a reboot. Not knowing that the given file path was under /sys/kernel, I failed to capture the error state, but will do so next time this happens (which is maybe every other day). This is with a 3.7 kernel (Gentoo); before 3.7, the driver was stable. I don't know what the 'generation' numbers in the driver mean, but I'm guessing that generation 6 is later, so many of the suggested fixes would not make any difference on this machine.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-02-20:

#76

(In reply to comment #42)
> I'm seeing this bug, or something like it, on an older chip (G965, desktop
> version):

Good news, it is not this bug. Please make sure you have the latest stable driver (a gentoo user not using 3.8 already! ;-) and latest xf86-video-intel, then file a fresh bug report, attaching your dmesg, Xorg.0.log and i915_error_state.

In freedesktop.org Bugzilla #54226, gneman (luis6674) wrote on 2013-02-20:

#77

I subscribed to this bug because I was seeing this hang too. It happened randomly several times, without a specific cause or way to reproduce it.

This was around December, and it happened maybe 4-5 times along a month. The GPU would hang with that error in dmesg, and everything continued to work, though very slowly.

However, I must say that since then it didn't happen again for almost 2 months maybe. I use Arch Linux, which means I always update to the latest stable packages of everything, so it seems that for me it got solved at some point (or at least much harder to reproduce).

This is an Ironlake / HD 2000 based Dell laptop. I did update the BIOS when I found this bug report, but it didn't solve the problem, the hang happened after updating it.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-02-22:

#78

*** Bug 61310 has been marked as a duplicate of this bug. ***

Laura Czajkowski (czajkowski) wrote on 2013-02-24: Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

#74

This bug has started to affect me on 13.04

Chris Wilson (ickle) on 2013-02-25

summary:

- [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001
+ [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround
+ i915.semaphores=0

Bryce Harrington (bryce) on 2013-03-02

Changed in linux (Ubuntu):
importance:	Undecided → High

Brad Figg (brad-figg) wrote on 2013-03-02: Status changed to Confirmed

#79

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-03:

#86

Created attachment 75818
i915_error_state (kernel 3.8.1 Fedora)

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-03:

#87

Today Fedora 18 updated kernel to 3.8.1 and message "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung" still here. Please look at my last log. Any updates?

Revision history for this message

Pete Graner (pgraner) wrote on 2013-03-04:

#80

GPU is locking up numerous times a day, just started on Raring. Apport is telling me my bug has already been reported and points me at https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1100360 which is a dupe of this bug.

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-03-05:

#81

@pgraner I have the same GPU lockup frequently, daily. I sometimes forget to set the workaround i915.semaphores to 0, but when I do the lockups go away. I have now set it in the /etc/default/grub as GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0" and update-grub...
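
The workaround described above amounts to appending one parameter to the kernel command line. A minimal Python sketch of that edit, assuming the stock Ubuntu layout of /etc/default/grub; the helper name add_kernel_param is hypothetical, and the function only transforms text, so writing the file back and running update-grub remain manual steps:

```python
import re

def add_kernel_param(grub_text, param="i915.semaphores=0"):
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper for illustration; it does not touch the file system.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")$', re.MULTILINE
    )

    def repl(match):
        params = match.group(2).split()
        if param not in params:  # leave the line alone if already present
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(repl, grub_text)


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
updated = add_kernel_param(example)
print(updated)
```

Applying the function a second time leaves the text unchanged, so it is safe to run on a file that already carries the workaround.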

Revision history for this message

Robin Munn (rmunn) wrote on 2013-03-05:

#82

I've been experiencing a recurring lockup that may or may not be this bug; what I can add to the discussion is that if it is this bug, it appears to be Linux-only. My company bought several of its developers identical laptops (Dell Latitude E6520 models). The other developers who are using these laptops are using Windows, and I'm the only one using Linux. I've been experiencing these hangups 3-4 times per week, on average, over the past year. But when I asked the other developers in my company with Latitude E6520s if they'd experienced random-seeming hangups, they all said no. Which means that either I got the only laptop with a defective chip, or else the Windows driver doesn't experience this problem.

I hope this helps in some way; sorry I can't be of more technical help, but GPUs are outside my area of expertise.

Revision history for this message

In freedesktop.org Bugzilla #54226, bwidawsk (bwidawsk) wrote on 2013-03-06:

#88

This looks weird to me:

0x00005a58: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a5c: 0x00012044: dword 1
0x00005a60: 0x0043b625: dword 2
0x00005a64: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a68: 0x00022040: dword 1
0x00005a6c: 0x0043b625: dword 2
0x00005a70: 0x10800001: MI_STORE_DATA_INDEX
0x00005a74: 0x00000080: index
0x00005a78: 0x0043b625: dword
0x00005a7c: 0x01000000: MI_USER_INTERRUPT
0x00005a80: 0x0b160001: MI_SEMAPHORE_MBOX compare semaphore, use compare reg 2
0x00005a84: 0x0043b625: value
0x00005a88: 0x00000000: address
0x00005a8c: 0x00000000: MI_NOOP

Chris?

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-06:

#89

Weird? Did you just forget that the hw does a strictly greater-than comparison?
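
The point about the comparison can be made concrete. A minimal sketch, assuming the wait is satisfied only when the mailbox value is strictly greater than the value programmed into MI_SEMAPHORE_MBOX, and ignoring 32-bit seqno wraparound; the function names are hypothetical, and this models the semantics only, not the driver code:

```python
def semaphore_passes(mbox_value, wait_value):
    # Assumed hardware semantics: strictly greater-than, not >=.
    return mbox_value > wait_value


def wait_value_for(seqno):
    # To proceed once `seqno` has been signalled, a waiter would wait on seqno - 1.
    return seqno - 1


signalled = 0x0043B625
print(semaphore_passes(signalled, signalled))                  # False
print(semaphore_passes(signalled, wait_value_for(signalled)))  # True
```

Under this assumption, a wait value equal to the signalled value does not pass until the mailbox advances again, which is why the value in a wait command can look off by one at first glance.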

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-06:

#90

(In reply to comment #47)
> Today Fedora 18 updated kernel to 3.8.1 and message
> "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
> still here. Please look at my last log. Any updates?

We're still waiting for you to apply the patches and report back.

Revision history for this message

Stan Schymanski (schymans) wrote on 2013-03-06:

#83

drm:i915_hangcheck_hung and following messages in kern.log before permanent lock-up (6.3 KiB, text/plain)

I experienced a permanent GPU lock-up today; the screen was unresponsive for half an hour before I decided to reboot by pressing Alt+SysRq-REISUB. Below is the relevant section of kern.log before shutting down.

Revision history for this message

Stan Schymanski (schymans) wrote on 2013-03-06:

#84

Just to add some background information to my previous post:
I have been having random hangups for more than a year now, but only since the beginning of this week, I have been getting the GPU hang-up error messages. Today, it coincided with one of the complete lock-ups, that could only be resolved by Alt+SysRq-REISUB. I will now try the approach proposed in Comment #81, hoping for the best. I hope that I am not seeing two different issues here. The GPU hang-up error message has been reoccurring without obvious hang-ups in the past few days, while my complete hang-ups have been happening randomly, sometimes with the possibility to reboot as outlined above, sometimes without (only the power button would do) and sometimes the computer turned itself off without any interaction from my side...
If anyone sees anything new in the kern.log section I pasted, please let me know.

Joseph Salisbury (jsalisbury) on 2013-03-06

Changed in linux (Ubuntu):
status:	Confirmed → Invalid

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-03-06:

#91

*** Bug 61925 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-08:

#92

Created attachment 76196
i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on snb, v2)

I applied the patch "write mbox regs twice on snb, v2" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-09:

#93

Created attachment 76208
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

I also applied the patch "Read back semaphore mboxes after update" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-09:

#94

(In reply to comment #52)
> Created attachment 76196 [details]
> i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> I am applied patch "write mbox regs twice on snb, v2" but still have problem
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

0x00052cc8: 0x18800100: MI_BATCH_BUFFER_START
0x00052ccc: 0x0d59b000: dword 1
0x00052cd0: 0x13000001: MI_FLUSH_DW post_sync_op='no write'
0x00052cd4: 0x000000c4: address
0x00052cd8: 0x00000000: dword
0x00052cdc: 0x00000000: MI_NOOP
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
0x00052cf8: 0x10800001: MI_STORE_DATA_INDEX
0x00052cfc: 0x00000080: index
0x00052d00: 0x0007a582: dword
0x00052d04: 0x01000000: MI_USER_INTERRUPT

That's only a single LRI per semaphore, the patch wasn't tested.
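
The check described here, counting how many MI_LOAD_REGISTER_IMM (LRI) commands target each mailbox register, can be sketched in Python. This assumes the decoded ring format quoted above (address, raw dword, description) and that the dword following each LRI header is the register offset; count_lri_writes is a hypothetical helper, not part of any i915 tooling:

```python
import re
from collections import Counter

LINE = re.compile(r"0x[0-9a-f]{8}: (0x[0-9a-f]{8}): (.*)")


def count_lri_writes(dump):
    """Count MI_LOAD_REGISTER_IMM writes per target register offset."""
    counts = Counter()
    expect_register = False
    for raw in dump.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        value, text = match.groups()
        if text.startswith("MI_LOAD_REGISTER_IMM"):
            expect_register = True  # the next dword holds the register offset
        elif expect_register:
            counts[int(value, 16)] += 1
            expect_register = False
    return counts


dump = """\
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
"""
for reg, n in sorted(count_lri_writes(dump).items()):
    print(hex(reg), n)
```

On the excerpt above each register is written once; a kernel that really carried a "write twice" patch would be expected to show a count of 2 per register.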

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-09:

#95

I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your patched version.

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-09:

#96

Created attachment 76215
kernel.spec

(In reply to comment #55)
> I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your
> patched version.

That's impossible. The distro kernel is 3.8.1-201.fc18.i686.PAE. 3.8.1-202.fc18.i686.PAE and 3.8.1-203.fc18.i686.PAE are kernels patched by me.

You can verify this by looking at my build spec file.

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-09:

#97

Created attachment 76239
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

I am sorry. It seems I forgot to add "ApplyPatch" to the spec. I have rebuilt the kernel with the "0001-drm-i915-Read-back-semaphore-mboxes-after-updating-t.patch" patch, but the problem still seems to be here.

Does it make sense to check the "0001-write-mbox-regs-twice-on-gen6.patch" patch?

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-09:

#98

Created attachment 76243
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-10:

#99

Created attachment 76261
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

The "write mbox regs twice on snb, v2" patch also does not solve the problem.

[ 1399.270341] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1399.270345] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1399.277331] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-10:

#100

Created attachment 76293
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-03-12:

#108

Created attachment 76448
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

Any updates?

Revision history for this message

Stan Schymanski (schymans) wrote on 2013-03-12:

#101

GPU lockup messages started reappearing, despite an upgrade to kernel 3.8.2 in Ubuntu 12.10.

Revision history for this message

Chris Wilson (ickle) wrote on 2013-03-13:

#102

#81 is only applicable to this bug, and is the correct workaround. Not all GPU hangs are this bug, most tend to be mesa...

Revision history for this message

Stan Schymanski (schymans) wrote on 2013-03-13:

#103

Sorry, I haven't been able to attach any of the crash reports, because I get the message that the crash relates to Bug #1059737 and I don't get an opportunity to upload the report. Bug #1059737 is supposed to be a duplicate of this bug, which is why I started posting here. How can I verify whether it is this bug or a mesa bug?

Revision history for this message

Chris Wilson (ickle) wrote on 2013-03-13:

#104

The most brutal way would be to remove /usr/lib/*/dri/i965_dri.so and so force it to use software.

Revision history for this message

Stan Schymanski (schymans) wrote on 2013-03-13:

#105

Before I do this, could you confirm whether my new report, Bug #1154591, is related to this bug or not? Should I still try removing i965_dri.so, or has this become irrelevant?

Revision history for this message

Chris Wilson (ickle) wrote on 2013-03-13:

#106

Easy answer: probably. Impossible to tell without the error-state.

Revision history for this message

Bilal Shahid (s9iper1) wrote on 2013-03-17:

#107

I have been getting this hang frequently, and it just happened again.
I didn't report the new occurrences, because the link opens the previous bug 1153202.

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-17:

#109

*** Bug 62443 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-03-19:

#111

As a workaround, this

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

drm/i915: Resurrect ring kicking for semaphores, selectively

should improve the recovery from the hangs.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2013-03-20:

#110

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in sandybridge-meta (Ubuntu):
status:	New → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #54226, cbrnr (cbrnr) wrote on 2013-03-20:

#112

OK, I've been experiencing this bug from time to time on my Arch Linux box. No apparent reason, last time it happened I was watching a Youtube video, and it also seems to happen more often when I'm running VirtualBox. However, this might just be a coincidence.

Revision history for this message

In freedesktop.org Bugzilla #54226, Longerdev (longerdev) wrote on 2013-03-31:

#114

I have this bug too.

Gentoo 64bit
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Samsung Electronics Co Ltd Device c0a0
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f5c00000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at e000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915

Kernel 3.8.0 gentoo-sources

I tried patch a24a11e6b4e96bca817f854e0ffcce75d3eddd13, but nothing changed.
Mar 31 15:14:37 localhost kernel: [64379.291736] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 31 15:14:37 localhost kernel: [64379.291742] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Revision history for this message

Spencer Mabrito (xoptics) wrote on 2013-04-01:

#113

Just some further confirmation of this issue.

On a Sandy Bridge i5 Dell Latitude e6420 laptop with Intel HD 4000 integrated graphics. On 3.5.0-26-generic I experienced GPU hangs once or twice an hour (fixed themselves after a few seconds) and full lockups 2 to 5 times per day, depending on what I was doing (have to hard reboot the machine to recover). Having firefox open seemed to be a risk factor, and if FF had loaded flashplayer, I expected a full lockup at any moment.

As described in the following bug, which is a similar issue... https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1041790 ... I added the following to my /etc/default/grub options:

i915.semaphores=0

This decreased the frequency and severity of lockups/crashes, but they were still there. Upon further investigation, I found this bug to be the actual issue (my syslog showing the same 3 lines as the others in this thread), and have reverted to 3.5.0-25-generic and have been running without any issues for a day or two now. I still have i915.semaphores=0 in my grub boot options but don't think I still need that...

Revision history for this message

In freedesktop.org Bugzilla #54226, Mika-kuoppala (mika-kuoppala) wrote on 2013-04-05:

#119

Created attachment 77475
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Revision history for this message

In freedesktop.org Bugzilla #54226, Mika-kuoppala (mika-kuoppala) wrote on 2013-04-05:

#120

(In reply to comment #61)
> Created attachment 76448 [details]
> i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> Any updates?

Mikhail,

Could you please try patch:
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Bryce Harrington (bryce) on 2013-04-05

description:	updated

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-04-05:

#121

Patch is also included in latest drm-intel-nightly, linux-next. So you can test it by grabbing a distro-build of one of those.

Bryce Harrington (bryce) on 2013-04-05

description:

updated

Bryce Harrington (bryce) on 2013-04-05

Changed in linux (Ubuntu):
status:	Invalid → New

Revision history for this message

Bryce Harrington (bryce) wrote on 2013-04-05:

#115

@kernel team - comment #111 has a test patch worth having a kernel built for that folks can test.

Revision history for this message

Brad Figg (brad-figg) wrote on 2013-04-05: Status changed to Confirmed

#116

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

Revision history for this message

Seth Forshee (sforshee) wrote on 2013-04-05:

#117

I've built a quantal kernel with the patch mentioned on the freedesktop.org bugzilla. Please test to see if it resolves the issue.

http://people.canonical.com/~sforshee/lp1041790/linux-3.5.0-27.46~lp1041790v201304052041/

Compiler issues are preventing me from building for raring atm, but I'll post a raring build as soon as I am able to do so.

Revision history for this message

In freedesktop.org Bugzilla #54226, Brian Ealdwine (eode) wrote on 2013-04-08:

#118

Occurred while playing vessel. Never ran into the problem on 12.10.
I'm available to provide info.

Bug Watch Updater (bug-watch-updater) on 2013-04-08

Changed in xserver-xorg-video-intel:
status:	Confirmed → Incomplete

Revision history for this message

Seth Forshee (sforshee) wrote on 2013-04-08:

#122

A test build for raring is now available.

http://people.canonical.com/~sforshee/lp1041790/linux-3.8.0-17.27~lp1041790v201304081343/

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-04-09:

#123

(In reply to comment #67)
> (In reply to comment #61)
> > Created attachment 76448 [details]
> > i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> > snb, v2)
> >
> > Any updates?
>
> Mikhail,
>
> Could you please try patch:
> [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Hm, it seems better, but the problem is still here:

[59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[59120.008802] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[59120.012173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
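
For anyone comparing logs before and after the patch, the two messages of interest can be picked out of dmesg output mechanically. A minimal Python sketch, assuming the message wording quoted in this thread (other kernel versions may word it differently); summarize is a hypothetical helper:

```python
import re

HUNG = re.compile(
    r"\[drm:i915_hangcheck_hung\] \*ERROR\* Hangcheck timer elapsed\.\.\. GPU hung"
)
KICK = re.compile(
    r"\[drm:kick_ring\] \*ERROR\* Kicking stuck semaphore on (\w+) ring"
)


def summarize(log):
    """Return (number of detected hangs, list of rings that were kicked)."""
    return len(HUNG.findall(log)), KICK.findall(log)


sample = """\
[59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[59120.008802] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[59120.012173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
"""
print(summarize(sample))  # (1, ['render'])
```

A hang followed by a kick on the same occasion indicates the lighter recovery path ran, rather than that the hang was avoided.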

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-04-09:

#124

Created attachment 77692
i915_error_state (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-04-09:

#125

Created attachment 77693
dmesg (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-04-09:

#126

\o/ It kicked the right ring.

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-04-09:

#127

(In reply to comment #72)
> \o/ It kicked the right ring.

So is this normal?

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-04-09:

#128

It's the expected 'improved' recovery behaviour for this bug.

Bug Watch Updater (bug-watch-updater) on 2013-04-09

Changed in xserver-xorg-video-intel:
status:	Incomplete → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-04-15:

#129

*** Bug 63542 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Bryce Harrington (bryce) wrote on 2013-04-22:

#130

Chris, what is the upstream status for the ring kicker patch? Is that likely to get incorporated upstream, or do you feel it needs further polish before it's ready? Would this patch incur some risk of regressions in other areas were it to be backported for inclusion in Ubuntu?

tags:

added: kernel-handoff-graphics

Revision history for this message

Chris Wilson (ickle) wrote on 2013-04-23:

#131

The ring kicker is upstream, but it is not a fix. It is just a lighter-weight reset mechanism that should prevent a cascade of errors and corruption, but the user still suffers the 3s stall before the hang is detected.

However, the patch seems to be solid and has survived its trial by fire.

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-04-23:

#132

(In reply to comment #76)
> Chris, what is the upstream status for the ring kicker patch? Is that
> likely to get incorporated upstream, or do you feel it needs further polish
> before it's ready? Would this patch incur some risk of regressions in other
> areas were it to be backported for inclusion in Ubuntu?

Merged for 3.10 as

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

drm/i915: Resurrect ring kicking for semaphores, selectively

Nothing else planned for now, but I think we can just keep this bug open in case we stumble across a new idea. And it seems to be good honey to attract all the me-too reports ;-)

Revision history for this message

In freedesktop.org Bugzilla #54226, Tomwij-1 (tomwij-1) wrote on 2013-04-23:

#133

(In reply to comment #65)
> Kernel 3.8.0 gentoo-sources

Did you report this at the Gentoo Bugzilla?

When you do, please attach /debug/dri/0/i915_error_state

Revision history for this message

In freedesktop.org Bugzilla #54226, Longerdev (longerdev) wrote on 2013-04-29:

#134

>Did you report this at the Gentoo Bugzilla?

>When you do, please attach /debug/dri/0/i915_error_state

I have not reported it at the Gentoo Bugzilla (their kernel does not carry patches for the Intel driver). With this patch, however, I have not been able to reproduce the bug for 2 weeks on kernel 3.9-rc6. I have not tested with Blender yet (previously, when I used Blender, the GPU hang recurred within 1-5 minutes).

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-05-01:

#135

*** Bug 64094 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-05-01:

#136

Created attachment 78692
i915_error_state (kernel 3.9 Fedora)

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-05-01:

#137

Created attachment 78693
i915_error_state (kernel 3.9 Fedora)

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-05-07:

#138

*** Bug 64094 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Freedesktop-l (freedesktop-l) wrote on 2013-05-23:

#139

Created attachment 79704
i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430

I can reproduce this bug every time I try to quickly drag a Chrome window with a YouTube movie to a secondary monitor connected to my laptop Dell E6430. It is very annoying. Tested on latest kernel 3.10-rc2.

I can give you any additional information you want, test patches, etc. Just please try to fix this :)

Revision history for this message

In freedesktop.org Bugzilla #54226, Freedesktop-l (freedesktop-l) wrote on 2013-05-23:

#140

(In reply to comment #84)
> Created attachment 79704 [details]
> i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430
>
> I can reproduce this bug every time I try to quickly drag a Chrome window
> with a YouTube movie to a secondary monitor connected to my laptop Dell
> E6430.

One more piece of information: you need to enable "Override software rendering list" in chrome://flags

Revision history for this message

Dac Chartrand (conner-bw) wrote on 2013-05-23:

#141

Came here from bug #1129679, which more accurately describes my problem. Apparently that bug is a dupe of this bug, which IMHO is not really the same, but whatever...

I'm running 13.03 on Lenovo X220

Has anyone looked at this?
https://bugs.freedesktop.org/show_bug.cgi?id=47535#c14

Did this patch get lost along the way?

Regards,

Revision history for this message

In freedesktop.org Bugzilla #54226, Cwawak (cwawak) wrote on 2013-05-29:

#142

Created attachment 79979
i915_error_state - 3.9.2-201.rhbz879823.fc18.x86_64 (included patch write mbox regs twice on snb, v2)

Linux bobloblaw 3.9.2-201.rhbz879823.fc18.x86_64 #1 SMP Thu May 16 13:35:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[45482.757631] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[45482.757645] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[45482.766942] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[45482.770617] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

I added patch (drm/i915: Resurrect ring kicking for semaphores, selectively) to Fedora 18's 3.9.2-200 x86_64 kernel.

Revision history for this message

In freedesktop.org Bugzilla #54226, Cwawak (cwawak) wrote on 2013-06-30:

#143

Is there any input or assistance I can give to help move this along?

Thanks!

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-07-20:

#144

Created attachment 82747
New read-after-write patch

New patch for testing, thanks!

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-07-20:

#145

Created attachment 82748
New read-after-write patch

Revision history for this message

In freedesktop.org Bugzilla #54226, Mikhail Gavrilov (mikegav) wrote on 2013-07-20:

#146

Which kernel version is this patch for?

Revision history for this message

In freedesktop.org Bugzilla #54226, Longerdev (longerdev) wrote on 2013-07-21:

#147

I tried this patch on linux-3.11_rc1, but when X starts I see:
791966 Jul 21 16:17:07 localhost kernel: [   19.320879] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
791967 Jul 21 16:17:07 localhost kernel: [   19.320948] IP: [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791968 Jul 21 16:17:07 localhost kernel: [   19.320995] PGD b0d80067 PUD b0c18067 PMD 0
791969 Jul 21 16:17:07 localhost kernel: [   19.321031] Oops: 0000 [#1] PREEMPT SMP
791970 Jul 21 16:17:07 localhost kernel: [   19.321064] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec brcmsmac snd_hwdep snd_pcm cordic brcmutil bcma snd_page_alloc snd_timer snd soundcore
791971 Jul 21 16:17:07 localhost kernel: [   19.321209] CPU: 0 PID: 2696 Comm: X Not tainted 3.11.0-rc1 #1
791972 Jul 21 16:17:07 localhost kernel: [   19.321249] Hardware name: SAMSUNG ELECTRONICS CO., LTD. SF311/SF411/SF511/SF311/SF411/SF511, BIOS 06HW.M011.20110503.SCY 05/03/2011
791973 Jul 21 16:17:07 localhost kernel: [   19.321322] task: ffff8800b1c07590 ti: ffff8800b0c24000 task.ti: ffff8800b0c24000
791974 Jul 21 16:17:07 localhost kernel: [   19.321370] RIP: 0010:[<ffffffff8136bfc0>]  [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791975 Jul 21 16:17:07 localhost kernel: [   19.321426] RSP: 0018:ffff8800b0c25bc8  EFLAGS: 00010286
791976 Jul 21 16:17:07 localhost kernel: [   19.321461] RAX: 0000000000000000 RBX: ffff8800b1c3d4d8 RCX: 0000000000027330
791977 Jul 21 16:17:07 localhost kernel: [   19.321506] RDX: 0000000000000080 RSI: ffffc900045c003c RDI: ffffc900045c0038
791978 Jul 21 16:17:07 localhost kernel: [   19.321550] RBP: ffff8800b0c25c08 R08: ffff8800b0d97f00 R09: 00000000000145c0
791979 Jul 21 16:17:07 localhost kernel: [   19.321594] R10: 0000000000001000 R11: ffff8800b1c3c000 R12: 0000000000000000
791980 Jul 21 16:17:07 localhost kernel: [   19.321638] R13: 0000000000002044 R14: 0000000000000000 R15: ffff8800b1c3c000
791981 Jul 21 16:17:07 localhost kernel: [   19.321682] FS:  00007ff167ae8880(0000) GS:ffff880100200000(0000) knlGS:0000000000000000
791982 Jul 21 16:17:07 localhost kernel: [   19.321732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
791983 Jul 21 16:17:07 localhost kernel: [   19.321767] CR2: 0000000000000010 CR3: 00000000b1cc9000 CR4: 00000000000407f0
791984 Jul 21 16:17:07 localhost kernel: [   19.321810] Stack:
791985 Jul 21 16:17:07 localhost kernel: [   19.321824]  ffff8800b1c3d4d8 0000000000000000 ffff8800aff24000 0000000000000000
791986 Jul 21 16:17:07 localhost kernel: [   19.321876]  ffff8800b1c3c000 ffff8800b0d97f00 ffff8800b1f66a00 ffff8800b1c3d4d8
791987 Jul 21 16:17:07 localhost kernel: [   19.321927]  ffff8800b0c25c68 ffffffff81334b11 ffff880000000028 0000000000000000
791988 Jul 21 16:17:07 localhost kernel: [   19.321979] Call Trace:
791989 Jul 21 16:17:07 localhost kernel: [   19.322000]  [<ffffffff81334b11>] __i915_add_request+0x6d/0x215
791990 Jul 21 16:17:07 localhost kernel: [   19.322045]  [<ffffffff8133b8d9>] i915_gem_do_execbuffer.isra.14+0xd07/0xdc5
791991 Jul 21 16:17:07 localhost kernel: [   19.322089]  [<ffffffff8133bd5e>] ? i915_gem_execbuffer2+0x5d/0x1e3
791992 Jul 21 16:17:07 localhost kernel: [   19.322128]  [<ffffffff8133be5a>] i915_gem_execbuffer2+0x159/0x1e3
791993 Jul 21 16:17:07 localhost kernel: [   19.322170]  [<ffffffff8130e167>] drm_ioctl+0x302/0x446
791994 Jul 21 16:17:07 localhost kernel: [   19.322204]  [<ffffffff8133bd01>] ? i915_gem_execbuffer+0x36a/0x36a
791995 Jul 21 16:17:07 localhost kernel: [   19.322245]  [<ffffffff8102a823>] ? __do_page_fault+0x34f/0x3f3
791996 Jul 21 16:17:07 localhost kernel: [   19.322285]  [<ffffffff810d3621>] vfs_ioctl+0x21/0x34
791997 Jul 21 16:17:07 localhost kernel: [   19.322317]  [<ffffffff810d3e7a>] do_vfs_ioctl+0x3b8/0x3fb
791998 Jul 21 16:17:07 localhost kernel: [   19.322353]  [<ffffffff810dbab9>] ? fget_light+0xa1/0xb8
791999 Jul 21 16:17:07 localhost kernel: [   19.322387]  [<ffffffff810d3efd>] SyS_ioctl+0x40/0x6b
792000 Jul 21 16:17:07 localhost kernel: [   19.322420]  [<ffffffff816450d2>] system_call_fastpath+0x16/0x1b
792001 Jul 21 16:17:07 localhost kernel: [   19.322457] Code: e8 d4 c0 f0 ff 8b 73 2c 44 89 ef 83 c6 04 89 73 2c 48 03 73 10 e8 bf c0 f0 ff 8b 73 2c 48 8b 45 c8 83 c6 04 89 73 2c 48 03 73 10 <8b> 78 10 83 ef 80 e8 a3 c0 f0 ff 83 43 2c 04 49 ff c4 49 83 fc
792002 Jul 21 16:17:07 localhost kernel: [   19.322688] RIP  [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
792003 Jul 21 16:17:07 localhost kernel: [   19.322728]  RSP <ffff8800b0c25bc8>
792004 Jul 21 16:17:07 localhost kernel: [   19.322750] CR2: 0000000000000010
792005 Jul 21 16:17:07 localhost kernel: [   19.330669] ---[ end trace b13215eb98a2df5f ]---
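
A note for readers following the trace above: the register dump looks consistent with a NULL pointer dereference in gen6_add_request, since CR2 (the faulting address) is 0x10 and RAX is 0. The sketch below is only one reading of the log, not something stated by the patch author. It decodes the three bytes marked `<8b> 78 10` in the Code: line and checks that the load address matches CR2. The helper `decode_mov_load_disp8` is hypothetical and handles just this single instruction encoding.

```python
# Register values and instruction bytes copied from the oops above.
RAX = 0x0000000000000000
CR2 = 0x0000000000000010
FAULTING_INSN = bytes.fromhex("8b7810")  # the "<8b> 78 10" bytes in the Code: line

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(insn: bytes):
    """Decode `mov r32, [r64 + disp8]` (opcode 0x8b, ModRM mod=01).

    Hypothetical helper: it handles only this one encoding, without REX
    prefixes or SIB bytes, which is all the faulting instruction needs.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("not a plain mov r32, [r64 + disp8]")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG64[rm], disp


dest, base, disp = decode_mov_load_disp8(FAULTING_INSN)
fault_address = RAX + disp  # the decoded base register is rax, which is 0 in the oops
print(f"mov {disp:#x}(%{base}),%{dest} -> faulting address {fault_address:#x}")
```

The computed address equals the CR2 value in the dump, which is what one would expect if the code read a field at offset 0x10 through a pointer that was still NULL.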

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-07-21:

#148

Created attachment 82768
New read-after-write patch

Oops, my mistake, please try again.

Revision history for this message

In freedesktop.org Bugzilla #54226, Longerdev (longerdev) wrote on 2013-07-21:

#149

Created attachment 82773
i915_error_state with new patch

(In reply to comment #92)
> Created attachment 82768
> New read-after-write patch
>
> Oops, my mistake, please try again.

It loads now, but after five minutes of testing:
793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 ctx 1) at 0xbfe21dc

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-07-21:

#150

(In reply to comment #93)
> Created attachment 82773
> i915_error_state with new patch
>
> (In reply to comment #92)
> > Created attachment 82768
> > New read-after-write patch
> >
> > Oops, my mistake, please try again.
>
> It loads now, but after five minutes of testing:
> 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> ctx 1) at 0xbfe21dc

That is a blorp (mesa/i965) bug and not the semaphore deadlock.

Revision history for this message

Tomasz Melcer (liori) wrote on 2013-07-26:

#151

I can reproduce this problem fairly regularly on a certain game inside wine (and nowhere else) on my up-to-date Debian Sid box. If there's any solution/patch to test and there's any benefit in testing it on a non-Ubuntu system, I volunteer to help.

Revision history for this message

Chris Wilson (ickle) wrote on 2013-07-26:

#152

Sure, try https://bugs.freedesktop.org/attachment.cgi?id=82768

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-08-11:

#153

Will someone please try https://bugs.freedesktop.org/attachment.cgi?id=82768 with a working mesa! :)

Bug Watch Updater (bug-watch-updater) on 2013-08-22

Changed in xserver-xorg-video-intel:
status:	Confirmed → Incomplete

Revision history for this message

In freedesktop.org Bugzilla #54226, Andy Lutomirski (luto-mit) wrote on 2013-08-24:

#154

The patch seems to have helped -- my box survived a couple days with the patch applied.

Bug Watch Updater (bug-watch-updater) on 2013-08-24

Changed in xserver-xorg-video-intel:
status:	Incomplete → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-08-25:

#155

The bad news is that I've just had the semaphore hang with the read-after-write patch applied. :|

Revision history for this message

In freedesktop.org Bugzilla #54226, Januszmk6 (januszmk6) wrote on 2013-09-03:

#156

(In reply to comment #94)
> (In reply to comment #93)
> > Created attachment 82773
> > i915_error_state with new patch
> >
> > (In reply to comment #92)
> > > Created attachment 82768
> > > New read-after-write patch
> > >
> > > Oops, my mistake, please try again.
> >
> > It loads now, but after five minutes of testing:
> > 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> > 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> > 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> > [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> > 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> > error event; look for more information in
> > /sys/kernel/debug/dri/0/i915_error_state
> > 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> > ctx 1) at 0xbfe21dc
>
> That is a blorp (mesa/i965) bug and not the semaphore deadlock.
Could you please provide a link to this blorp bug report?
I used to have the semaphore deadlock problem; with kernel 3.11 it no longer seems to occur (without the patch), but now I get:

[22221.843000] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[22221.843483] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4dfb5000 ctx 1) at 0x4dfb5518

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-09-04:

#157

*** Bug 68913 has been marked as a duplicate of this bug. ***

Alan Pope 🍺🐧🐱 🦄 (popey) on 2013-09-08

description:

updated

Revision history for this message

In freedesktop.org Bugzilla #54226, Dan Doel (dan-doel) wrote on 2013-09-08:

#158

I have, I think, a reliable way to trigger this behavior, if that helps. It requires a non-trivial setup, though.

I have gnome-shell running on dual monitors. The first is 1920x1200, the second is 1920x1080 (not sure if the resolution difference matters). If I run a full-screen game on the 1920x1200 monitor, I get freezes, and notes in the dmesg about hangcheck timers and kickrings ("stuck wait on blitter ring").

I believe OpenGL acceleration of the desktop is important, because the freezes are not triggered in fluxbox, for instance. I'm not sure if the game itself needs to be using OpenGL, or if the full-screen window is the triggering factor, or something else entirely. It is important that the game keep the monitors distinct, and only go full screen on one. I just tried it on Battle for Wesnoth, and full screen there sets the monitors to mirror, which doesn't trigger the problem.

This is on an i7 4770, if that matters.

I realize this may be difficult to put together for a test setup, but I thought I'd mention it.

Revision history for this message

In freedesktop.org Bugzilla #54226, Januszmk6 (januszmk6) wrote on 2013-09-08:

#159

(In reply to comment #100)
> I have, I think, a reliable way to trigger this behavior, if that helps. It
> requires a non-trivial setup, though.
>
> I have gnome-shell running on dual monitors. The first is 1920x1200, the
> second is 1920x1080 (not sure if the resolution difference matters). If I
> run a full-screen game on the 1920x1200 monitor, I get freezes, and notes in
> the dmesg about hangcheck timers and kickrings ("stuck wait on blitter
> ring").
>
> I believe OpenGL acceleration of the desktop is important, because the
> freezes are not triggered in fluxbox, for instance. I'm not sure if the game
> itself needs to be using OpenGL, or if the full-screen window is the
> triggering factor, or something else entirely. It is important that the game
> keep the monitors distinct, and only go full screen on one. I just tried it
> on Battle for Wesnoth, and full screen there sets the monitors to mirror,
> which doesn't trigger the problem.
>
> This is on an i7 4770, if that matters.
>
> I realize this may be difficult to put together for a test setup, but I
> thought I'd mention it.

I also have dual monitors and gnome-shell, but both of mine are 1920x1080. I've noticed that the hangs happen more often when I watch video full screen on one monitor (they still happen when nothing is full screen).

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-09-08:

#160

(In reply to comment #100)
> This is on an i7 4770, if that matters.

No, that's something completely new. Please open a new bug report and attach your dmesg, Xorg.0.log and /sys/class/drm/card0/error from after one of the hangs.

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-09-11:

#161

I filed the duplicate of this bug 1222261 as I get this frequently on both my i7 laptop and i7 desktop.

I think I found a way to reproduce it easily on demand. Open chromium, and visit google maps. Sign up for "new maps". Visit somewhere on the map and zoom all the way in then zoom in and out a bit. I have triggered this bug a few times just by doing that. I'm not using dual screens or anything funky. Just a basic Ubuntu 13.10 install with Unity.

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-09-11:

#162

Forgot to mention, I am using the semaphores kernel parameter...

alan@wopr:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.11.0-4-generic root=UUID=3d53ca44-6af4-422d-b1b0-adf30c679a2f ro quiet splash i915.semaphores=0 vt.handoff=7
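
For anyone wanting to check the same thing from a script, here is a minimal sketch that parses a kernel command line like the one above and reports whether `i915.semaphores=0` is set. The function names `parse_cmdline` and `semaphores_disabled` are made up for illustration, and the parser assumes simple whitespace-separated tokens (it does not handle quoted values containing spaces).

```python
# Command line copied from the comment above.
SAMPLE = (
    "BOOT_IMAGE=/boot/vmlinuz-3.11.0-4-generic "
    "root=UUID=3d53ca44-6af4-422d-b1b0-adf30c679a2f "
    "ro quiet splash i915.semaphores=0 vt.handoff=7"
)


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=..." keeps its full value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def semaphores_disabled(cmdline: str) -> bool:
    """True if the command line explicitly sets i915.semaphores=0."""
    return parse_cmdline(cmdline).get("i915.semaphores") == "0"


print(semaphores_disabled(SAMPLE))
```

On a running system the input would come from reading /proc/cmdline, as in the `cat` command above. The value the driver actually ended up with can differ from the command line, so the module parameter under /sys/module/i915/parameters/ is worth checking too.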

Revision history for this message

Lorant Nemeth (loci) wrote on 2013-09-16: Re: [Bug 1041790] Re: [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

#163

Hi,

I can confirm that I'm able to reproduce the bug this way as well.

Br,
Loci

On 09/11/2013 03:02 AM, Alan Pope ㋛ wrote:
> I filed the duplicate of this bug 1222261 as I get this frequently on
> both my i7 laptop and i7 desktop.
>
> I think I found a way to reproduce it easily on demand. Open chromium,
> and visit google maps. Sign up for "new maps". Visit somewhere on the
> map and zoom all the way in then zoom in and out a bit. I have triggered
> this bug a few times just by doing that. I'm not using dual screens or
> anything funky. Just a basic Ubuntu 13.10 install with Unity.
>

Revision history for this message

bharath (bharath1097) wrote on 2013-09-23:

#164

(In reply to #163)

I can reproduce this bug as well on a sandybridge i5 desktop

Revision history for this message

mike@papersolve.com (mike-papersolve) wrote on 2013-09-29:

#165

I can also reproduce the bug using the new Google Maps (easiest way to do so seems to be zooming in/out). However it occurs for me even after I disable semaphores:

root@hawty:~# cat /sys/module/i915/parameters/semaphores
0

(after that):
[614445.590357] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[614445.590468] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5268000 ctx 10) at 0x5268210

this is ubuntu 13.04 with 2.21.6-0ubuntu4 intel driver, but kernel 3.11.1.

Revision history for this message

Chris Wilson (ickle) wrote on 2013-09-29:

#166

That's because that would NOT be the same bug.

Revision history for this message

mike@papersolve.com (mike-papersolve) wrote on 2013-09-29:

#167

Sorry. :) Unfortunately/fortunately I can no longer reproduce it after doing a BIOS update on my ASUS motherboard (was about 18 months behind). Based on your description of the bug as possibly related to the BIOS I decided to do that update, and even though it's only been about 10 minutes of usage, I can't reproduce this in Google Maps at all, where it was sure to cause the bug. I'd certainly recommend that anyone having this issue investigate a BIOS upgrade.

Revision history for this message

In freedesktop.org Bugzilla #54226, Yjcoshc (yjcoshc) wrote on 2013-10-04:

#168

Created attachment 87101
i915_error_state (kernel 3.11.3)

Revision history for this message

In freedesktop.org Bugzilla #54226, Yjcoshc (yjcoshc) wrote on 2013-10-04:

#169

After playing hedgewars for about half an hour, the gpu started to hang.
dmesg output:
[ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3442.907471] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5e52000 ctx 1) at 0x5e52220
[ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
I'm not sure my problem is related to this bug.

Revision history for this message

In freedesktop.org Bugzilla #54226, Yjcoshc (yjcoshc) wrote on 2013-10-04:

#170

(In reply to comment #104)
> After playing hedgewars for about half an hour, the gpu started to hang.
> dmesg output:
> [ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3442.907471] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x5e52000 ctx 1) at 0x5e52220
> [ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
> [ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for
> forcewake old ack to clear.
> I'm not sure my problem is related to this bug.

My laptop is Thinkpad T420 with i5-2520M. The BIOS version is 1.44.

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-10-04:

#171

(In reply to comment #104)
> I'm not sure my problem is related to this bug.

Most likely it isn't - gpu hang is similar to an application crashing. Please file a new bug report and don't forget to attach the error state file. That's the first thing we need to triage the bug.

And of course list the versions of all the userspace driver parts (mesa, ddx, ...) since like a normal application crash most often it's not a kernel bug, but a bug in the render commands submitted by userspace to the gpu.
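
Since the error state is the first thing asked for, a small sketch of saving a copy right after a hang may be useful. The path is the one printed in the "capturing error event" kernel messages quoted in this thread; the function name `snapshot_error_state` and the output file naming are assumptions for illustration. Reading debugfs normally requires root, and the sketch simply returns None if the file cannot be read or is empty.

```python
import time
from pathlib import Path

# Path taken from the "capturing error event" kernel message quoted in this thread.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def snapshot_error_state(src=ERROR_STATE, dest_dir="."):
    """Copy the i915 error state to a timestamped file.

    Returns the path of the copy, or None if the source could not be
    read (missing, no permission) or contained nothing.
    """
    try:
        data = Path(src).read_bytes()
    except OSError:
        return None
    if not data.strip():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

Calling `snapshot_error_state()` as root after a hang would leave a file in the current directory that can be attached to a new bug report, alongside the mesa and ddx versions requested above.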

Revision history for this message

In freedesktop.org Bugzilla #54226, Longerdev (longerdev) wrote on 2013-10-04:

#172

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Why can userspace drivers break rendering and trigger an error in the kernel part of the driver? Maybe a "filter" could be added for the submitted commands, so that bad ones are ignored (or handled in some other way, but not executed)?

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-10-04:

#173

(In reply to comment #107)
> (In reply to comment #106)
> > (In reply to comment #104)
> > > I'm not sure my problem is related to this bug.
> >
> > Most likely it isn't - gpu hang is similar to an application crashing.
> > Please file a new bug report and don't forget to attach the error state
> > file. That's the first thing we need to triage the bug.
> >
> > And of course list the versions of all the userspace driver parts (mesa,
> > ddx, ...) since like a normal application crash most often it's not a kernel
> > bug, but a bug in the render commands submitted by userspace to the gpu.
>
> Why can userspace drivers break rendering and trigger an error in the kernel
> part of the driver? Maybe a "filter" could be added for the submitted commands,
> so that bad ones are ignored (or handled in some other way, but not executed)?

The GPU is a full Turing complete computational engine (in fact, lots of them coupled in parallel and in series), see http://xkcd.com/1266/

Revision history for this message

Shuhao (shuhao) wrote on 2013-10-04:

#174

Has anyone here noticed a graphics slowdown? After a couple of minutes of gameplay in CSS, my framerate drops to about 20fps (other activity on the computer slows down as well).

This did not happen in 13.04 but is occurring with 13.10.

Revision history for this message

In freedesktop.org Bugzilla #54226, Yjcoshc (yjcoshc) wrote on 2013-10-05:

#175

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Someone has reported it here.
https://bugs.freedesktop.org/show_bug.cgi?id=70151

Revision history for this message

In freedesktop.org Bugzilla #54226, Honza-h (honza-h) wrote on 2013-10-08:

#176

Hello. Same problem here.

[ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8
[ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8
[ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220

sys: fedora 19 64b
Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

WM: KDE with effects enabled

8G ram
300G SATA HDD
ntb Lenovo ThinkPad E320

problem occurs in:
- scrolling in firefox
- playing video in vlc and switch to KDE terminal or another app
- sometimes system hangs, cpu 100%, freeze and hard reboot needed
- sometimes happens if I work with ff or in terminal only (very frustrating)
- happening across many kernel versions 3.0 to newest I think

lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-10-08:

#177

(In reply to comment #110)
> Hello. Same problem here.
>
> [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 485.443467] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0xa637000 ctx 1) at 0xa6371c8

Unlikely that this is the same gpu hang. Please file a new bug report and attach the error state.

Revision history for this message

In freedesktop.org Bugzilla #54226, theghost (theghost) wrote on 2013-10-15:

#178

Just a few remarks.
I still see this bug with kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
Moreover, after switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups increased considerably (especially in games).
Additionally, with the latest drivers the complete system lockups are gone, but there is still a lockup of several seconds followed by VT switching.
Maybe these observations help somehow.

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-10-16:

#179

(In reply to comment #112)
> Just a few remarks.
> I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
> Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups
> highly increased (especially in games).

On snb the blorp engine in mesa has become a bit more hang-happy, see bug #70151
Not all gpu hangs are created equal ;-)

> Additionally with running the latest drivers complete system lockups are
> gone, but it's still a lockup for multiple seconds with following VT
> switching.

You mean a gpu hang happens while doing a vt switch?

Revision history for this message

In freedesktop.org Bugzilla #54226, theghost (theghost) wrote on 2013-10-16:

#180

(In reply to comment #113)
> On snb the blorp engine in mesa has become a bit more hang-happy, see bug
> #70151
> Not all gpu hangs are created equal ;-)
>

Actually it was on Sandybridge.

> You mean a gpu hang happens while doing a vt switch?

No, I meant that if you suffer a lockup you just have to wait a few seconds and switch to another VT and back; then you can carry on using your system (although sometimes fonts are broken).

Revision history for this message

In freedesktop.org Bugzilla #54226, Alexander (bay-hackerdom) wrote on 2013-10-19:

#181

Created attachment 87857
i915_error_state

I also hit this bug while watching video in mplayer. It happens every 1-2 hours.

[40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[40787.765852] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1fb63000 ctx 1) at 0x1fb63220

Revision history for this message

In freedesktop.org Bugzilla #54226, Alexander (bay-hackerdom) wrote on 2013-10-19:

#182

Created attachment 87858
X -version output

Revision history for this message

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2013-10-19:

#183

(In reply to comment #115)
> Created attachment 87857
> i915_error_state
>
> I also hit this bug while watching video in mplayer. It happens every 1-2
> hours.
>
> [40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [40787.765852] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x1fb63000 ctx 1) at 0x1fb63220

This looks like bug #70151, but is definitely not this bug here.
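
Because so many reports in this thread paste the same kind of "ring hung" line, a rough sketch for pulling the fields out of such lines may help when comparing reports. The regular expression is based only on the dmesg lines quoted in this thread, and the name `parse_hung_line` is made up for illustration. The `offset` field assumes that the `at` address of an "inside bo" message lies within that buffer, so that subtracting the buffer address gives a position inside the batch; that interpretation is an assumption, not something confirmed here. As the developers point out, these lines alone do not identify the bug, and the error state is still needed.

```python
import re

# Pattern derived from the "ring hung" dmesg lines quoted in this thread.
HUNG_RE = re.compile(
    r"(?P<ring>\w+) ring hung (?P<where>inside|flushing) bo "
    r"\((?P<bo>0x[0-9a-f]+) ctx (?P<ctx>\d+)\) at (?P<addr>0x[0-9a-f]+)"
)


def parse_hung_line(line: str):
    """Extract ring, context, buffer and address from a 'ring hung' line.

    Returns None for lines that do not match. `offset` is addr - bo for
    "inside bo" messages (an assumed interpretation) and None otherwise.
    """
    m = HUNG_RE.search(line)
    if m is None:
        return None
    bo = int(m["bo"], 16)
    addr = int(m["addr"], 16)
    return {
        "ring": m["ring"],
        "where": m["where"],
        "ctx": int(m["ctx"]),
        "bo": bo,
        "addr": addr,
        "offset": addr - bo if m["where"] == "inside" else None,
    }


# Line copied from comment #181 above.
line = (
    "[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung "
    "inside bo (0x1fb63000 ctx 1) at 0x1fb63220"
)
info = parse_hung_line(line)
print(info["ring"], hex(info["offset"]))
```

The parser expects unwrapped dmesg lines; the quoted copies in replies are wrapped across several lines and would need to be rejoined first.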

Revision history for this message

Thomas Mayer (thomas303) wrote on 2013-10-28:

#184

It seems that ubuntu 12.04.3 is also affected.

I get the error using ubuntu 12.04.3 (after upgrading from 12.04.2 in the last few days):
Oct 28 18:43:29 localhost kernel: [31236.041655] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Oct 28 18:43:29 localhost kernel: [31236.041664] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Oct 28 18:43:32 localhost kernel: [31239.040790] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Oct 28 18:43:32 localhost kernel: [31239.041127] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Oct 28 18:43:32 localhost kernel: [31239.041132] [drm:i915_reset] *ERROR* Failed to reset chip.
Oct 28 18:43:38 localhost gnome-session[3983]: WARNING: App 'gnome-wm.desktop' respawning too quickly
Oct 28 18:43:38 localhost gnome-session[3983]: CRITICAL: We failed, but the fail whale is dead. Sorry....

Kernel version:
3.8.0-32-generic #47~precise1-Ubuntu SMP Wed Oct 2 16:19:35 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

xserver-xorg-video-intel-lts-raring version 2:2.21.6-0ubuntu4.1~precise1

For me the error occurs when I move the mouse cursor in the PhpStorm IDE, which is based on oracle java (I use version 1.8). The error occurs every few hours when working with PhpStorm 7.0

Revision history for this message

theghost (theghost) wrote on 2013-10-29:

#185

@thomas303: Since 12.04.3 uses the video stack and kernel of Raring it's no wonder that it's also affected.
If you didn't have the errors before 12.04.3 you can still revert to the video stack / kernel of 12.04.2 (Quantal) or 12.04 (Precise).

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2013-10-31:

#186

I have been running kernel 3.12.0-031200rc6-generic for a while now and in 8 days uptime I haven't had any lockups that I recall. Previously on older kernels on 13.10 I would get more than one lockup a day, sometimes many a day.

Revision history for this message

theghost (theghost) wrote on 2013-10-31:

#187

@popey:

I tested kernel 3.12.0-031200rc7-generic with Mesa 9.2.2 and xf86-video-intel-2.99.905 running Dota 2, which is a useful test case for producing hangs, and I can assure you that there are still plenty of lockups. Only the output differs now:

[ 2937.818867] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1c703000 ctx 1) at 0x1c7037e0
[ 2943.810157] [drm] stuck on render ring
[ 2943.810208] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x7a57000 ctx 1) at 0x5c
[ 3152.914976] [drm] stuck on render ring
[ 3152.915045] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x7549000 ctx 1) at 0x7549288
[ 3568.158967] [drm] stuck on render ring
[ 3568.174992] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x12d63000 ctx 1) at 0x12d637e0
[ 3568.175030] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x1b78f000 ctx 1) at 0x12d637e0
[ 3839.310462] [drm] stuck on render ring
[ 3839.310463] [drm] stuck on blitter ring
[ 4292.575683] [drm] stuck on render ring
[ 4292.575684] [drm] stuck on blitter ring

So it's still in the kernel. ;)

Revision history for this message

In freedesktop.org Bugzilla #54226, Yjcoshc (yjcoshc) wrote on 2013-11-16:

#188

Created attachment 89314
i915_error_state (kernel 3.11.6, mesa 9.2.2, xf86-video-intel 2.99.906)

GPU hangs after playing hedgewars for a few minutes. Thinkpad T420 laptop, i5-2520M.
dmesg error message:
[16901.286432] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16901.286441] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[16901.286444] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[16908.287504] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16908.287508] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring

theghost (theghost) wrote on 2013-11-16:

#189

If you have these problems running Dota 2, you should try Mesa Git or wait for Mesa 10. It contains several patches to remove lockups.
For me on Dota 2 the lockups are completely gone, probably they're also gone in other applications.

In freedesktop.org Bugzilla #54226, Kenxeth (kenxeth) wrote on 2013-11-21:

#190

*** Bug 71890 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-11-26:

#191

*** Bug 72048 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2013-12-18:

#192

*** Bug 72829 has been marked as a duplicate of this bug. ***

penalvch (penalvch) wrote on 2014-01-07:

#193

Rocko, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux REPLACE-WITH-BUG-NUMBER

Please note, given that the information from the prior release is already available, doing this on a release prior to the development one would not be helpful.

If reproducible, could you also please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc7

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

description:	updated
tags:	added: bios-outdated-a12
Changed in linux (Ubuntu):
importance:	High → Low
status:	Confirmed → Incomplete
tags:	added: needs-upstream-testing regression-potential

Hamish MacEwan (hamish-macewan) wrote on 2014-01-08: Re: [Bug 1041790] Re: [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

#194

On 8 January 2014 07:20, Christopher M. Penalver
<email address hidden> wrote:

> Rocko, this bug was reported a while ago and there hasn't been any
> activity in it recently. We were wondering if this is still an issue? If
> so, could you please test for this with the latest development release
> of Ubuntu? ISO images are available from http://cdimage.ubuntu.com
> /daily-live/current/ .

Hi Christopher, I'm not Rocko, but haven't had any trouble with this
bug on Debian of late.

Hamish.
--
http://About.me/Hamish.MacEwan

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-01-15:

#195

*** Bug 73659 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, A-bugzilla (a-bugzilla) wrote on 2014-01-24:

#196

Created attachment 92710
i915_error_state

I'm also getting regular Sandybridge GPU lockups with Mesa 10.0.1 and Linux kernel 3.13.

dmesg output:

[ 918.876872] [drm] stuck on render ring
[ 918.876876] [drm] stuck on blitter ring
[ 918.876878] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 918.876879] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 918.876879] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 918.876880] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 918.876880] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 932.923240] [drm] stuck on render ring
[ 932.923242] [drm] stuck on blitter ring

Unfortunately the crash dump doesn't help - it's an empty file!
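
For anyone collecting dumps for this report, here is a minimal Python sketch that copies the error state to a timestamped file and flags the empty-file case described above. The path is the one printed in the dmesg output; the function name and the output file name pattern are only illustrative assumptions, and reading the file will likely need root.

```python
import time
from pathlib import Path

# Path printed by the kernel in the dmesg output above.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_crash_dump(src=ERROR_STATE, dest_dir="."):
    """Copy the i915 error state to a timestamped file.

    Returns the path of the copy, or None if the source has no content.
    The function name and file name pattern are illustrative only.
    """
    # Read the content instead of trusting stat(): sizes reported under
    # sysfs may not reflect the real content length.
    data = Path(src).read_bytes()
    if not data.strip():
        return None
    name = time.strftime("i915_error_state-%Y%m%d-%H%M%S.txt")
    dest = Path(dest_dir) / name
    dest.write_bytes(data)
    return dest
```

A return value of None means there was nothing to attach, which is the situation reported in this comment.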

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-01-29:

#197

*** Bug 74180 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-01-31:

#198

*** Bug 74265 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-02-03:

#199

*** Bug 74452 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-02-03:

#200

*** Bug 74473 has been marked as a duplicate of this bug. ***

Adam Conrad (adconrad) wrote on 2014-02-07:

#201

This is still happening (although very infrequently) on current trusty. I just hit it this morning.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-02-12:

#202

*** Bug 74867 has been marked as a duplicate of this bug. ***

Steven Goris (sg-steven13) wrote on 2014-02-15:

#203

I experience this bug on Linux Mint 16 Cinnamon. It drives me crazy. My computer hangs approx every 2 hours. I tried the fix in grub. I hope it works as a temporary fix, because I can't work like this on my computer.
Linux 3.11.0-15-generic

no longer affects:

linuxmint

unksi (unksi) wrote on 2014-02-15:

#204

I found it a big help to switch to tty1 with ctrl+alt+F1 and then back with ctrl+alt+F7/F8. This makes it return to normal a lot faster, and unsticks it most of the times it seems to be totally stuck.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-02-18:

#205

*** Bug 75163 has been marked as a duplicate of this bug. ***

Spect (al106208) wrote on 2014-02-28:

#206

I experience this bug on Ubuntu 12.04.4.
System: Ubuntu 12.04.4 LTS x86_64
Kernel: 3.11.0-17-generic DE: Unity Session: ubuntu
Use: xserver-xorg-video-intel-lts-saucy vers.2:2.99.904-0ubuntu2.1~precise1
----------------------------------
Processor: Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz Memory (Gb): 7.53
Video: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) Subsystem: Gigabyte Technology Co., Ltd Device d000 Kernel driver in use: i915
----------------------------------
kern.log:
Feb 28 15:24:44 specttop kernel: [42677.565850] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1c85f000 ctx 1) at 0x1c85f220
Feb 28 15:24:44 specttop kernel: [42677.565904] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x4d8f000 ctx 0) at 0x1c85f220

In freedesktop.org Bugzilla #54226, Simtn (simtn) wrote on 2014-03-04:

#207

Created attachment 95090
Another version of the same hang - directed here from bug 75502

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-03-10:

#208

*** Bug 75999 has been marked as a duplicate of this bug. ***

Bug Watch Updater (bug-watch-updater) on 2014-03-14

Changed in xserver-xorg-video-intel:
status:	Confirmed → In Progress

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-03-20:

#209

*** Bug 76408 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-03-27:

#210

*** Bug 76677 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-03-30:

#211

*** Bug 76801 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Phil Turmel (pturmel-lp) wrote on 2014-03-30:

#212

For what it's worth, running 3.13.7 greatly mitigates this bug, to the point where the dead time is barely noticeable. It happened three times in short order here and I didn't notice any of them:

[ 4562.551141] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4582.530028] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4633.476199] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-04-04:

#213

*** Bug 77043 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-04-04:

#214

*** Bug 77058 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Phil Turmel (pturmel-lp) wrote on 2014-04-05:

#215

My stuck ring faults are completely gone with i915.i915_enable_rc6=0. The only side effect seems to be that the fan stays on a bit more (subjectively). HP Pavilion dv6 (Sandybridge).
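
To confirm that a workaround like this was actually passed at boot, the kernel command line can be checked. The following is only a rough Python sketch: the helper name and the sample command line are made up for illustration, and on a real system the string would come from /proc/cmdline.

```python
def i915_options(cmdline):
    """Return the i915.* options found on a kernel command line as a dict.

    Hypothetical helper for illustration; it only splits the string and
    does not check whether the running driver accepts the option.
    """
    options = {}
    for token in cmdline.split():
        if not token.startswith("i915."):
            continue
        name, _, value = token[len("i915."):].partition("=")
        options[name] = value
    return options


# Made-up sample; on a live system read the text of /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz ro quiet i915.i915_enable_rc6=0"
print(i915_options(sample))
```

If the option is missing from the result, the bootloader configuration was probably not regenerated after editing it.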

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-04-05:

#216

Oh that's interesting. We might be able to find a register to prevent rc6 whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just frob forcewake directly.)

In freedesktop.org Bugzilla #54226, Phil Turmel (pturmel-lp) wrote on 2014-04-06:

#217

(In reply to comment #139)
> Oh that's interesting. We might be able to find a register to prevent rc6
> whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just
> frob forcewake directly.)

Happy to test patches. I'm updating to 3.13.9 tonight. I could add something on top if you have ideas. If you need more info than my attachment to #76801 just let me know.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-04-07:

#218

*** Bug 77147 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-04-30:

#219

*** Bug 77974 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-05-06:

#220

*** Bug 78317 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Hoek (artjom-simon) wrote on 2014-05-06:

#221

Created attachment 98589
Kernel 3.14.2-1-ARCH, xf86-video-intel 2.99.911-2, mesa 10.1.2-1

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-05-16:

#222

*** Bug 78785 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-01:

#223

*** Bug 79500 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-04:

#224

*** Bug 79640 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Jani-nikula (jani-nikula) wrote on 2014-06-10:

#225

commit ca79d888eb63cdacf80653ae23ce8f7d9ac52c68
Author: Chris Wilson <email address hidden>
Date: Fri Jun 6 10:22:29 2014 +0100

drm/i915: Reorder semaphore deadlock check

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-15:

#226

*** Bug 80055 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-17:

#227

*** Bug 80125 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-17:

#228

*** Bug 80168 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-23:

#229

*** Bug 80401 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-06-27:

#230

*** Bug 80592 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-05:

#231

*** Bug 80935 has been marked as a duplicate of this bug. ***

DooMMasteR (winrootkit) wrote on 2014-07-08:

#232

error log (393.0 KiB, application/octet-stream)

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-08:

#233

*** Bug 81064 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Kurt Roeckx (kurt-roeckx) wrote on 2014-07-08:

#234

Can someone indicate what the current status of this is?

In freedesktop.org Bugzilla #54226, Yunloh (yunloh) wrote on 2014-07-08:

#235

I haven't seen it with xorg-x11-drv-intel-2.99.912-4 (built for fc20) from kojipkgs.

In freedesktop.org Bugzilla #54226, Kurt Roeckx (kurt-roeckx) wrote on 2014-07-08:

#236

I'm using 2.21.15 which as far as I know is the latest release.

In freedesktop.org Bugzilla #54226, Andre Robatino (robatino) wrote on 2014-07-11:

#237

I am seeing

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

followed by a graphics freeze and the need to reboot (if I can) in Fedora 20 with the latest updates including the 3.15.4 kernel.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-16:

#238

*** Bug 81402 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Matteo Croce (teknoraver) wrote on 2014-07-16:

#239

same happens with 3.15.0 on Ubuntu 14.04 64 bit

Jul 11 12:43:41 localhost kernel: [42049.462542] [drm] stuck on render ring
Jul 11 12:43:41 localhost kernel: [42049.463330] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:41 localhost kernel: [42049.463334] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 11 12:43:41 localhost kernel: [42049.463335] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 11 12:43:41 localhost kernel: [42049.463336] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 11 12:43:41 localhost kernel: [42049.463337] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 11 12:43:41 localhost kernel: [42049.463338] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 11 12:43:43 localhost kernel: [42051.464623] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:47 localhost kernel: [42055.468816] [drm] stuck on render ring
Jul 11 12:43:47 localhost kernel: [42055.469614] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:49 localhost kernel: [42057.470899] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:53 localhost kernel: [42061.439056] [drm] stuck on render ring
Jul 11 12:43:53 localhost kernel: [42061.439867] [drm] GPU HANG: ecode 0:0xfeffffff, in chrome [2172], reason: Ring hung, action: reset
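
When comparing reports like this one, it can help to pull the fields out of the "GPU HANG" lines. Below is a small Python sketch that does so; the regular expression is written only against the line format visible in this log, so other kernel versions may print something different, and the function name is hypothetical.

```python
import re

# Pattern based solely on the "GPU HANG" lines quoted above; an assumption,
# not a format guaranteed by the kernel.
HANG_RE = re.compile(
    r"\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hangs(log_text):
    """Return one dict per 'GPU HANG' line found in the given log text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            entry = match.groupdict()
            entry["pid"] = int(entry["pid"])
            hangs.append(entry)
    return hangs
```

Feeding it the output of dmesg or the contents of kern.log gives a quick way to see whether the same process and ecode keep recurring.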

In freedesktop.org Bugzilla #54226, Cwawak (cwawak) wrote on 2014-07-17:

#240

[872948.822279] [drm] stuck on render ring
[872948.822291] [drm] stuck on blitter ring
[872948.823041] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [30647], reason: Ring hung, action: reset
[872948.823045] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[872948.823046] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[872948.823047] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[872948.823048] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[872948.823049] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[872948.823168] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[872950.821912] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Linux bobloblaw 3.15.0-1.fc20.x86_64 #1 SMP Sat Jun 14 11:22:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Attaching gpu crash dump as card0-error.071714-cwawak

In freedesktop.org Bugzilla #54226, Cwawak (cwawak) wrote on 2014-07-17:

#241

Created attachment 102991
card0-error.071714-cwawak - gpu dump

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-23:

#242

*** Bug 81673 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-23:

#243

*** Bug 81676 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-24:

#244

*** Bug 81710 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-07-28:

#245

*** Bug 81844 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-01:

#246

*** Bug 81990 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-07:

#247

*** Bug 82277 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-07:

#248

*** Bug 82301 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-10:

#249

*** Bug 82399 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-11:

#250

*** Bug 82451 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Mstahl (mstahl) wrote on 2014-08-14:

#251

*** Bug 82620 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-15:

#252

*** Bug 82631 has been marked as a duplicate of this bug. ***

Timo Aaltonen (tjaalton) on 2014-08-15

Changed in xserver-xorg-video-intel (Ubuntu):
assignee:	Timo Aaltonen (tjaalton) → nobody

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-15:

#253

*** Bug 82666 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-16:

#254

*** Bug 82691 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-21:

#255

*** Bug 82901 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-26:

#256

*** Bug 83098 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-08-27:

#257

*** Bug 83156 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-01:

#258

*** Bug 83326 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-04:

#259

*** Bug 83473 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-09:

#260

*** Bug 83661 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Manuel Widmer (m-widmer-d) wrote on 2014-09-09:

#261

Is there any ongoing development to fix this bug? I still see it with
Linux <hostname> 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

And the latest intel drivers as provided by intel linux graphics installer from
https://01.org/linuxgraphics/

Many times my system freezes a few minutes after I start watching a movie with vlc. My screen is connected to the Linux system through a receiver (hdmi for audio + video). A freeze is more likely when the hdmi receiver was powered off for some time before playing the movie than when I reboot and the hdmi receiver stays on throughout.

I'm happy to help with crashdumps as far as I'm able to collect them.

In freedesktop.org Bugzilla #54226, Bartosz Brachaczek (b-brachaczek) wrote on 2014-09-09:

#262

(In reply to comment #183)

I recommend configuring i915.semaphores=0. I did it and it doesn't freeze anymore.

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-10:

#263

*** Bug 83721 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-12:

#264

*** Bug 83783 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Frank Stephan (f-st) wrote on 2014-09-13:

#265

Hi Chris,

Meanwhile my current kernel is 3.16.1-46.1.g90bc0f1.
I'm surprised that, after a reinstall, the semaphore bug hasn't occurred yet, whereas it did before (after a fresh install).

This leads me to 4 possible reasons:

1. The named kernel revision somehow contains a fix for it. Looking at the changes, I couldn't find confirmation of that assumption.
2. cgroup_memory=disabled has a relation to it. (That's why I removed it for now.)
3. The BIOS settings (which could be different now) might have something to do with it.
4. I haven't installed KVM support yet.

I'll post again if I find a reproducible explanation.
Frank

In freedesktop.org Bugzilla #54226, Frank Stephan (f-st) wrote on 2014-09-13:

#266

2. of course I meant cgroup_disable=memory

In freedesktop.org Bugzilla #54226, Frank Stephan (f-st) wrote on 2014-09-13:

#267

Hi Chris,

OK, nothing of the above was the reason. In my case it's simply this:

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "TearFree" "true"
EndSection

I added it when the tearing scrolling through large webpages annoyed me.
As soon as I added it, the problems quickly started.

Selfmade problem.

Frank

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-13:

#268

(In reply to comment #189)
> Hi Chris,
>
> OK, nothing of the above was the reason. In my case it's simply this:
>
> /etc/X11/xorg.conf.d/20-intel.conf
>
> Section "Device"
> Identifier "Intel Graphics"
> Driver "intel"
> Option "TearFree" "true"
> EndSection
>
>
> I added it when the tearing scrolling through large webpages annoyed me.
> As soon as I added it, the problems quickly started.
>
> Selfmade problem.

Not really, https://bugs.freedesktop.org/show_bug.cgi?id=70764 tracks that this hang is more likely with TearFree (fundamentally the hang is still the same hardware issue, but it is interesting that TearFree has a higher chance of hitting it).

If you want to experiment:

http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=requests

should have an interesting fix, at least for trying to prevent the TearFree leading to the semaphore hang.

In freedesktop.org Bugzilla #54226, Arrowsmith (arrowsmith) wrote on 2014-09-15:

#269

What information is most useful for these repeating issues, as it just happened again:

Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on render ring
Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on blitter ring
Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140239] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140750] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on render ring
Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on blitter ring
Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
Sep 16 08:32:59 arrowsmithlap1 kernel: [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
Sep 16 08:33:01 arrowsmithlap1 kernel: [1182244.142445] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Sep 16 08:33:01 arrowsmithlap1 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

The only thing under my /etc/X11/xorg.conf.d/ is 00-keyboard.conf (system generated).

Do you want a copy of /sys/class/drm/card0/error every time?

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-09-16:

#270

(In reply to comment #191)
> What information is most useful for these repeating issues, as it just
> happened again:
>
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> render ring
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> blitter ring

So long as it is the same event, there is no more information we need other than testing feedback for an eventual workaround.

In freedesktop.org Bugzilla #54226, Manuel Widmer (m-widmer-d) wrote on 2014-09-22:

#271

(In reply to comment #184)
> (In reply to comment #183)
>
> I recommend configuring i915.semaphores=0. I did it and it doesn't freeze
> anymore.

Meanwhile I tested both i915.semaphores=0 and i915.semaphores=1, and neither helped in my case. With i915.semaphores=0 my system became much more unstable and even crashed on its own after some days without stress on graphics (I just ran some desktop apps like thunar, or vlc for music only, no movies). With i915.semaphores=1 the system is at least stable (for some weeks) as long as I don't heavily use desktop applications.

In freedesktop.org Bugzilla #54226, Mika-kuoppala (mika-kuoppala) wrote on 2014-10-20:

#272

*** Bug 85194 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-10-22:

#273

*** Bug 85333 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-10-29:

#274

*** Bug 85609 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Josh Glover (jmglov) wrote on 2014-11-03:

#275

I am also experiencing this, on a Gentoo system running on a ThinkPad T440s. I'm not doing anything related to XBMC, simply using xrandr for multihead. The interesting thing is that DRI works fine on my laptop screen (glxgears reports 60fps, which is the refresh rate of my screen), but breaks when I move a window trying to use DRI (e.g. Chrome, glxgears) to the external monitor connected to the mini Display Port output.

I see this stuff in dmesg:

[ 3561.424762] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3561.424770] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3561.424772] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3561.424774] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3561.424776] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3561.424778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 3566.422957] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3571.425143] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3575.423680] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring

Seems like the same issue. I'm trying to downgrade X, mesa, et al., to try and get the system back in working order.

In freedesktop.org Bugzilla #54226, Daniel-ffwll (daniel-ffwll) wrote on 2014-11-04:

#276

*** Bug 79675 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-11-06:

#277

*** Bug 85972 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-11-09:

#278

*** Bug 86058 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Fritsch-b (fritsch-b) wrote on 2014-11-11:

#279

For those running Ubuntu, here is a build of a kernel based on 3.17.1 with the patches Chris Wilson wants you to test:

- Those patches have other regressions (so be careful to only test your specific issue).

https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb
https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb

Those kernels are based on: https://bugs.freedesktop.org/show_bug.cgi?id=83677#c35

Beware, don't switch VTs.

In freedesktop.org Bugzilla #54226, Tomas Huryn (thuryn1) wrote on 2014-11-17:

#280

I've tried the mentioned kernel on my Fedora 21 Beta and it still hangs after, for example, NetBeans opens its main window across the whole screen.

In freedesktop.org Bugzilla #54226, Smruti-patil (smruti-patil) wrote on 2014-11-19:

#281

*** Bug 86437 has been marked as a duplicate of this bug. ***

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-11-27:

#282

*** Bug 86765 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-11-29:

#283

*** Bug 86836 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-12-02:

#284

*** Bug 86925 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-12-25:

#285

*** Bug 87710 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2014-12-28:

#286

*** Bug 87776 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-01-17:

#287

*** Bug 88541 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Samuel Rakitničan (semirocket) wrote on 2015-01-20:

#288

*** Bug 88626 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-01-22:

#289

*** Bug 88723 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-01-26:

#290

*** Bug 88789 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-02-11:

#291

*** Bug 89078 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-02-24:

#292

*** Bug 89299 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-03-13:

#293

*** Bug 89570 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-03-19:

#294

*** Bug 89671 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-03-26:

#295

*** Bug 89774 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-03-26:

#296

*** Bug 89771 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-04-11:

#297

*** Bug 89981 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-04-20:

#298

*** Bug 90106 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-04-22:

#299

*** Bug 90146 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-05-03:

#300

*** Bug 90271 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-05-15:

#301

*** Bug 90473 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-06-04:

#302

*** Bug 90835 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, helios (martin-lichtvoll) wrote on 2015-06-04:

#303

Chris, you referred me to this bug as I reported

Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring

I skimmed through it and it appears that there are some patches to test? But I am not sure which ones these are. Can you or someone else enlighten me?

Also I note that I still use

Option "AccelMethod" "uxa"

and I have

martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
options i915 modeset=1 i915_enable_rc6=7

thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway.

I will remove the AccelMethod setting now and see whether it helps. If not, I will downgrade to 4.1-rc4 for now, as issues have been at least much less frequent with it.

And it really is the case that for me 4.1-rc6 makes things much *worse*. I am typing this after a clean reboot and already got the GPU hang again. It happens about every few minutes. Are you really sure this is the same GPU hang? I didn't have this before the 4.1 kernel.

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-06-04:

#304

(In reply to Martin Steigerwald from comment #225)
> Chris, you referred me to this bug as I reported
>
> Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> semaphore on render ring
>
> I skimmed through it and it appears that there are some patches to test? But
> I am not sure which ones these are. Can you or someone else enlighten me?

There's likely a modest improvement in 4.2.

> Also I note that I still use
>
> Option "AccelMethod" "uxa"
>
> and I have
>
> martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> options i915 modeset=1 i915_enable_rc6=7

Fortuitously that dangerous option doesn't do anything for your kernel.

>
> thus maximum energy saving. But according to powertop it never enters the
> highest sleep state anyway.
>
> I will remove the AccelMethod setting now and see whether it helps. If not,
> I downgrade to 4.1-rc4 for now, as issues have been at least much less
> frequent with it.

Purely circumstantial.

> And its really that for me 4.1-rc6 makes things much *worse*. I am typing
> this after a clean reboot and already got the GPU hang again. It happens
> about every few minutes. Are you really sure this is the same GPU hang? I
> didn´t have this before 4.1 kernel?

Yes.

Revision history for this message

In freedesktop.org Bugzilla #54226, helios (martin-lichtvoll) wrote on 2015-06-04:

#305

(In reply to Chris Wilson from comment #226)
> (In reply to Martin Steigerwald from comment #225)
> > Chris, you referred me to this bug as I reported
> >
> > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> > semaphore on render ring
> >
> > I skimmed through it and it appears that there are some patches to test? But
> > I am not sure which ones these are. Can you or someone else enlighten me?
>
> There's likely a modest improvement in 4.2.

Nice.

> > Also I note that I still use
> >
> > Option "AccelMethod" "uxa"
> >
> > and I have
> >
> > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> > options i915 modeset=1 i915_enable_rc6=7
>
> Fortuitously that dangerous option doesn't do anything for your kernel.

Well, I found out why: it seems I compiled i915 into the kernel; at least I don't have an i915 module in lsmod. But i915.i915_enable_rc6=7 on the kernel command line does not seem to have any effect either. I removed the option.

> >
> > thus maximum energy saving. But according to powertop it never enters the
> > highest sleep state anyway.
> >
> > I will remove the AccelMethod setting now and see whether it helps. If not,
> > I downgrade to 4.1-rc4 for now, as issues have been at least much less
> > frequent with it.
>
> Purely circumstantial.

Since switching to SNA I haven't seen a GPU hang so far. It is too early to say for sure, but it seems something in UXA may have triggered it more easily.
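For anyone unsure whether an i915 option is actually being picked up, one way to check is to list the parameters the running kernel exposes under /sys/module/i915/parameters. The following is only a minimal sketch that assumes the usual sysfs layout; the helper name read_module_params is made up for illustration, and parameters that the kernel hides from sysfs will not show up.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a kernel module, or {} if none are exposed.

    Assumes the usual sysfs layout <sysfs_root>/<module>/parameters/<name>.
    """
    params_dir = Path(sysfs_root) / module / "parameters"
    if not params_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(params_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters cannot be read without extra privileges.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    # A name missing from this list is not exposed by this kernel's i915
    # (it may be unknown to the driver, or simply hidden from sysfs).
    for name, value in read_module_params("i915").items():
        print(f"{name} = {value}")
```

If an option given in /etc/modprobe.d or on the kernel command line does not appear in this listing, that is a hint (not proof) that the running kernel ignores it.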

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-07-03:

#306

*** Bug 91212 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-08-17:

#307

*** Bug 91662 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-08-30:

#308

*** Bug 91810 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-09-02:

#309

*** Bug 91832 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Samuel Rakitničan (semirocket) wrote on 2015-09-25:

#310

(In reply to Chris Wilson from comment #192)
> (In reply to comment #191)
> > What information is most useful for these repeating issues, as it just
> > happened again:
> >
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> > render ring
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> > blitter ring
>
> So long as it is the same event, there is no more information we need other
> than testing feedback for an eventual workaround.

Is this the same bug?

$ journalctl -p 3 -b -1
Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
... [ repeated messages ] ...
Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:33 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
Ruj 25 02:13:34 crnigrom kernel: [drm:init_ring_common [i915]] *ERROR* render ring initialization failed ctl 00000000 (valid? 0) head 00000000 tail 00000000 start 00000000 [expected 00000000]
Ruj 25 02:13:34 crnigrom kernel: [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5
Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop

After which gnome crashes with "Oh No Something Is Wrong" screen

$ uname -r
4.1.7-200.fc22.x86_64

Hardware i3-2100 CPU/GPU

This bug has been going on for a very long time, but at least the computer no longer hard-freezes. GNOME still crashes, though, so any GTK applications that are in the middle of doing something stall.
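When comparing logs like the one above across reports, it can help to count the DRM *ERROR* lines per reporting kernel function. The sketch below is one way to do that; the regular expression is inferred from the log lines quoted in this thread, not from any kernel documentation, and count_drm_errors is an illustrative name.

```python
import re
from collections import Counter

# Matches e.g. "[drm:stop_ring [i915]] *ERROR* render ring : timed out ..."
# and the older form "[drm:ring_stuck] *ERROR* Kicking stuck wait ...".
DRM_ERROR = re.compile(
    r"\[drm:(?P<func>[^\s\]]+)(?: \[(?P<module>\w+)\])?\] \*ERROR\* (?P<msg>.*)"
)


def count_drm_errors(lines):
    """Count DRM *ERROR* messages per reporting kernel function."""
    counts = Counter()
    for line in lines:
        match = DRM_ERROR.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: journalctl -p 3 -b -1 | python3 this_script.py
    for func, count in count_drm_errors(sys.stdin).most_common():
        print(f"{count:6d}  {func}")
```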

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-09-28:

#311

*** Bug 92118 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-10-31:

#312

*** Bug 92739 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Arrowsmith (arrowsmith) wrote on 2015-11-02:

#313

FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191), was resolved by uninstalling various components, re-installing and updating them. I have a hunch (completely unproven) that it was a transparent bit-fail issue from the SSD. By un-installing and re-installing, the files were likely installed to a different location on the drive. It wasn't configuration, as I tried erasing, and even rolling back to defaults, with the problem still persisting. As it was almost daily, prior to uninstall, and hasn't happened since the install, this is all I can attribute it to.

HTH someone.

Revision history for this message

In freedesktop.org Bugzilla #54226, Jefbed (jefbed) wrote on 2015-11-06:

#314

Created attachment 119432
attachment-28908-0.html

I reported this bug from a system without an SSD. Recently, I have not
seen the kernel messages appear however--currently on linux 4.2.5.

On Sun, Nov 1, 2015 at 10:04 PM, <email address hidden> wrote:

> *Comment # 235 <https://bugs.freedesktop.org/show_bug.cgi?id=54226#c235>
> on bug 54226 <https://bugs.freedesktop.org/show_bug.cgi?id=54226> from
> <email address hidden> <email address hidden> *
>
> FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191), was
> resolved by uninstalling various components, re-installing and updating them. I
> have a hunch (completely unproven) that it was a transparent bit-fail issue
> from the SSD. By un-installing and re-installing, the files were likely
> installed to a different location on the drive. It wasn't configuration, as I
> tried erasing, and even rolling back to defaults, with the problem still
> persisting. As it was almost daily, prior to uninstall, and hasn't happened
> since the install, this is all I can attribute it to.
>
> HTH someone.

Revision history for this message

In freedesktop.org Bugzilla #54226, Arrowsmith (arrowsmith) wrote on 2015-11-06:

#315

(In reply to Jeffrey E. Bedard from comment #236)
> Created attachment 119432 [details]
> attachment-28908-0.html
>
> I reported this bug from a system without an SSD. Recently, I have not
> seen the kernel messages appear however--currently on linux 4.2.5.

Ah, let me clarify that earlier comment: I dd'd a failing spinning drive to an SSD. There was lots of clicking. Upgraded packages as they came in, but no change. Only the uninstall and re-install cleared the repeat button. :)

Revision history for this message

In freedesktop.org Bugzilla #54226, Jefbed (jefbed) wrote on 2015-11-06:

#316

Created attachment 119433
attachment-32271-0.html

I think this bug can be marked as closed with the latest linux/mesa/xorg
versions :)

On Fri, Nov 6, 2015 at 1:47 AM, <email address hidden> wrote:

> *Comment # 237 <https://bugs.freedesktop.org/show_bug.cgi?id=54226#c237>
> on bug 54226 <https://bugs.freedesktop.org/show_bug.cgi?id=54226> from
> <email address hidden> <email address hidden> *
>
> (In reply to Jeffrey E. Bedard from comment #236)
> > Created attachment 119432 [details]
> > attachment-28908-0.html
> >
> > I reported this bug from a system without an SSD. Recently, I have not
> > seen the kernel messages appear however--currently on linux 4.2.5.
>
> Ah, let me clarify that earlier comment: I dd'd a failing spinning drive to an
> SSD. There was lots of clicking. Upgraded packages as they came in, but no
> change. Only the uninstall and re-install cleared the repeat button. :)

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-11-12:

#317

*** Bug 92927 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-11-21:

#318

*** Bug 93057 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Kurt Roeckx (kurt-roeckx) wrote on 2015-11-28:

#319

Created attachment 120189
error state with 4.2 kernel

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-12-10:

#320

*** Bug 93331 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-12-23:

#321

*** Bug 93482 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-12-24:

#322

*** Bug 93493 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2015-12-30:

#323

*** Bug 89524 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-01-05:

#324

*** Bug 93595 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-01-26:

#325

*** Bug 93876 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-02-19:

#326

*** Bug 93824 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-03-01:

#327

*** Bug 94057 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, sander eikelenboom (b-linux) wrote on 2016-03-01:

#328

Tuesday, March 1, 2016, 9:43:23 PM, you wrote:

> Chris Wilson changed bug 54226
> What: CC, Removed: (none), Added: <email address hidden>
>

> Comment # 249 on bug 54226 from Chris Wilson
> *** Bug 94057 has been marked as a duplicate of this bug. ***
>

> You are receiving this mail because:
> You are on the CC list for the bug.
>

Sorry to say, but:
Is there a way to get off the CC list of this slightly depressing kind of "catch-all" bug?
It unfortunately doesn't seem to have been going anywhere for the last 3 to 4 years, except
for an endless stream of duplicates being appended.

--
Sander

Revision history for this message

In freedesktop.org Bugzilla #54226, Jani-nikula (jani-nikula) wrote on 2016-03-02:

#329

(In reply to Sander Eikelenboom from comment #250)
> Is there a way to get off the CC-list of this slightly depressing kind of
> "catch-all" bug ?

CC list is at the top right corner. Choose the address, tick "Remove selected CCs", and hit Save Changes.

I've done this for you now.

Revision history for this message

In freedesktop.org Bugzilla #54226, fjgaude (tanzen) wrote on 2016-03-02: Re: [Bug 1041790]

#330

Please take me off too.

frank

On 03/02/2016 03:33 AM, Jani-nikula wrote:
> (In reply to Sander Eikelenboom from comment #250)
>> Is there a way to get off the CC-list of this slightly depressing kind of
>> "catch-all" bug ?
> CC list is at the top right corner. Choose the address, tick "Remove
> selected CCs", and hit Save Changes.
>
> I've done this for you now.
>

penalvch (penalvch) on 2016-03-06

no longer affects:

sandybridge-meta (Ubuntu)

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-05-02:

#331

*** Bug 95238 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Samantham (samantham) wrote on 2016-06-24:

#332

Chris, I seem to be experiencing this bug in Linux 4.7-rc3 on an X220 ThinkPad with the Intel HD 3000 chipset. I was getting random full system freezes, with the machine unresponsive over the network.

The main messages before the crash were:
Jun 23 19:11:18 athena kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Jun 23 19:11:18 athena kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.7 [i915]] *ERROR* GT thread status wait timed out.

I haven't been able to reproduce the original crash easily, but I CAN reproduce a full system lockup every time by running the following intel-gpu-tools tests (I have not come close to running all the tests, though). [**This may or may not be related to the original crash**]

gem_sync, subtest: bsd2-hang
drv_hangman, subtest: error-state-capture-bit

I do not know if these tests are helpful or related (maybe some are known to fail? I am not sure).
I had DRM debugging turned on when I ran those tests (drm.debug=0x1e log_buf_len=1M).
I can post logs of the hangs associated with the two tests/subtests and run any other tests if you desire (with kernel DRM debugging on). I will wait for the original issue to reappear with DRM debugging on before posting that log, though. Given the number of similar bugs, you may already have the call trace and non-debug-level logs.

I know how to patch and am able to compile kernels to test. The bug affects me maybe once every 1 or 2 days. I use Xorg with Glamor. I have been seeing these crashes since 4.6 (maybe 4.5 or earlier, I am not sure).

I can apply patches and compile drm-next or any patches you have to see if this issue can be isolated. Thanks, and sorry for the long response.
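For readers wondering what drm.debug=0x1e selects: the value is a bitmask of DRM debug categories. The sketch below decodes it under the assumption that the low bits mean core, driver, KMS, PRIME, atomic and vblank in that order, which is my understanding of the drm module parameter; the table and the name decode_drm_debug are illustrative, so check the parameter description for your kernel version before relying on it.

```python
# Assumed meaning of the low drm.debug bits; verify against your kernel.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
}


def decode_drm_debug(mask):
    """Split a drm.debug mask into known category names and leftover bits."""
    known = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
    leftover = mask & ~sum(DRM_DEBUG_BITS)  # bits this table does not cover
    return known, leftover


if __name__ == "__main__":
    names, leftover = decode_drm_debug(0x1E)
    print(names, hex(leftover))
```

Under that assumption, 0x1e enables every category in the table except core and vblank messages.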

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-08-11:

#333

*** Bug 97304 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-08-23:

#334

*** Bug 97451 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Yann-argotti (yann-argotti) wrote on 2016-10-17:

#335

*** Bug 98294 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2016-11-21:

#336

*** Bug 98807 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2017-03-17:

#337

*** Bug 100245 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Ricardo-vega-u (ricardo-vega-u) wrote on 2017-05-09:

#338

Adding tag to the "Whiteboard" field: ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included

Revision history for this message

In freedesktop.org Bugzilla #54226, Samuel Rakitničan (semirocket) wrote on 2017-07-14:

#339

I no longer seem to be getting the GNOME crashes I mentioned on my Sandy Bridge with mainline kernels, currently 4.11, and I think I was not getting any issues with 4.10 either. With mainline longterm 4.4.61 and the default CentOS 7 kernels I am definitely getting very frequent GPU crashes that bring down GNOME.

So it is either fixed for good, or it has become much rarer. The issue I am/was experiencing happens when GNOME is running; it does not happen when only GDM is loaded. System load seems to have no effect on triggering the bug: it seems to happen at any time, whether the machine is idle or loaded.

Revision history for this message

In freedesktop.org Bugzilla #54226, Elizabethx-de-la-torre-mena (elizabethx-de-la-torre-mena) wrote on 2017-07-31:

#340

(In reply to samuel.rakitnican from comment #260)
> I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> anymore with mainline kernels, that is currently 4.11 and I think even with
> 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> centos 7 kernels I am definitely getting very frequent GPU crashes that
> brings down Gnome.
>
> So it is either fixed for good, or it become much rarer. The issue I am/was
> experiencing happens when Gnome is running, it does not happen when only GDM
> is loaded. System load seems to not have effect on the bug triggering, seems
> to happen any time, on idle, or when machine is loaded.
Hopefully it is fixed for good. I'm closing this bug. If the problem arises with the latest kernel versions (https://www.kernel.org/), please open a NEW bug with HW and SW information, steps to reproduce, and relevant logs. Thank you.

Bug Watch Updater (bug-watch-updater) on 2017-07-31

Changed in xserver-xorg-video-intel:
status:	In Progress → Fix Released

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2017-10-28:

#341

(In reply to Elizabeth from comment #261)
> (In reply to samuel.rakitnican from comment #260)
> > I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> > anymore with mainline kernels, that is currently 4.11 and I think even with
> > 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> > centos 7 kernels I am definitely getting very frequent GPU crashes that
> > brings down Gnome.
> >
> > So it is either fixed for good, or it become much rarer. The issue I am/was
> > experiencing happens when Gnome is running, it does not happen when only GDM
> > is loaded. System load seems to not have effect on the bug triggering, seems
> > to happen any time, on idle, or when machine is loaded.
> Hopefully, is fixed for good. I'm closing this bug, if problem arise with
> latest kernel versions https://www.kernel.org/ please open a NEW bug with HW
> and SW information, steps to reproduce and relevant logs.Thank you.

There was no fix for this HW issue.

Bug Watch Updater (bug-watch-updater) on 2017-10-28

Changed in xserver-xorg-video-intel:
status:	Fix Released → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #54226, Aaron-lu-a (aaron-lu-a) wrote on 2017-10-31:

#342

Created attachment 135173
gpu error file on 4.13.5-200.fc26.x86_64

This problem reappeared on 4.13.5-200.fc26.x86_64 last Friday.

[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset
[774249.632110] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[774249.632111] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[774249.632111] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[774249.632111] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[774249.632112] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[774249.632172] drm/i915: Resetting chip after gpu hang
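To compare hang signatures between reports, the "GPU HANG" line can be split into its parts. This is a minimal sketch based only on the line formats quoted in this thread; naming the first two ecode numbers "gen" and "ring" is my reading of the i915 message and should be treated as an assumption, and parse_gpu_hang is an illustrative name.

```python
import re

# Pattern inferred from lines quoted in this thread, for example:
# "GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset"
GPU_HANG = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>-?\d+):(?P<code>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>\S+) \[(?P<pid>\d+)\])?"
    r"(?:, reason: (?P<reason>[^,]+))?"
    r"(?:, action: (?P<action>\w+))?",
    re.IGNORECASE,
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG log line as a dict, or None if absent.

    Optional fields that older kernels do not print come back as None.
    """
    match = GPU_HANG.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = ("[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, "
            "in Xorg [696], reason: Hang on rcs0, action: reset")
    print(parse_gpu_hang(line))
```

The crash dump itself still has to be taken from /sys/class/drm/card0/error, as the kernel message says; the parsed line only helps with grouping similar reports.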

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2017-11-20:

#343

commit 0da715ee60774401bea00dc71fca6fd1096c734a
Author: Chris Wilson <email address hidden>
Date: Mon Nov 20 20:55:02 2017 +0000

drm/i915: Disable semaphores on Sandybridge

Bug Watch Updater (bug-watch-updater) on 2017-11-20

Changed in xserver-xorg-video-intel:
status:	Confirmed → Won't Fix

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2017-12-13:

#344

*** Bug 104243 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2017-12-17:

#345

*** Bug 104304 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2018-01-24:

#346

*** Bug 104772 has been marked as a duplicate of this bug. ***

Revision history for this message

In freedesktop.org Bugzilla #54226, Jani-saarinen-g (jani-saarinen-g) wrote on 2018-03-28:

#347

I will close this now.

Revision history for this message

In freedesktop.org Bugzilla #54226, Chris Wilson (ickle) wrote on 2018-04-18:

#348

*** Bug 106119 has been marked as a duplicate of this bug. ***

Ubuntu
xserver-xorg-video-intel package

[snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

Affects	Status	Importance	Assigned to
xf86-video-intel	Won't Fix	Medium	freedesktop-bugs #54226
linux (Ubuntu)	Incomplete	Low	Unassigned
xserver-xorg-video-intel (Ubuntu)	Triaged	High	Unassigned
