[snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

Bug #1041790 reported by Rocko
This bug affects 232 people
Affects                            Status      Importance  Assigned to  Milestone
xf86-video-intel                   Won't Fix   Medium
linux (Ubuntu)                     Incomplete  Low         Unassigned
xserver-xorg-video-intel (Ubuntu)  Triaged     High        Unassigned

Bug Description

X locks up periodically for two to ten seconds at a time, and this crash log is generated. It happens significantly more than several times a day, but not quite continuously. If you do have this bug, the workaround below should stop the lockups. Regardless, please file a new bug report so that your hardware can be tracked.

WORKAROUND: Edit your /etc/default/grub from:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0"

run the following and reboot:
sudo update-grub
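
For anyone who would rather script the edit above, here is a minimal sketch that appends the parameter to GRUB_CMDLINE_LINUX_DEFAULT only if it is not already there. The function name add_kernel_param and the .bak backup file are illustrative choices, not part of any Ubuntu tool, and the sed expression assumes the value is wrapped in double quotes as in the stock file.

```shell
# add_kernel_param PARAM [FILE]  (hypothetical helper, not a distro tool)
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE,
# which defaults to /etc/default/grub. Does nothing if PARAM is already set.
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}

    # Already present on the default command line: leave the file untouched.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi

    # Keep a copy of the original, then rewrite the file from that copy.
    cp "$file" "$file.bak" || return 1

    # Insert " PARAM" just before the closing double quote of the value.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" \
        "$file.bak" > "$file"
}
```

Called from a root shell as 'add_kernel_param i915.semaphores=0', it should leave the line looking like the "to:" example above. Running sudo update-grub and rebooting are still needed afterwards.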

The side effects of this workaround are that rendering throughput drops by 10% with SNA, or by as much as 3x with UXA, and OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, which increases power consumption.

ProblemType: Crash
DistroRelease: Ubuntu 12.10
Package: xserver-xorg-video-intel 2:2.20.3-0ubuntu1
Uname: Linux 3.6.0-rc3-git-20120826.1015 x86_64
ApportVersion: 2.5.1-0ubuntu2
Architecture: amd64
Chipset: sandybridge-m-gt2
Date: Sun Aug 26 16:06:32 2012
DistroCodename: quantal
DistroVariant: ubuntu
DuplicateSignature: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001 Ubuntu 12.10
EcryptfsInUse: Yes
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GpuHangFrequency: Continuously
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120724.2)
InterpreterPath: /usr/bin/python3.2mu
MachineType: Dell Inc. Dell System XPS L502X
ProcCmdline: /usr/bin/python3 /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.6.0-rc3-git-20120826.1015 root=UUID=135c8090-427c-460a-909d-eff262cd44b6 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.7+1ubuntu3
 libdrm2 2.4.38-0ubuntu2
 xserver-xorg-video-intel 2:2.20.3-0ubuntu1
SourcePackage: xserver-xorg-video-intel
Title: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001
UdevDb: Error: [Errno 2] No such file or directory: 'udevadm'
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 05/29/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A11
dmi.board.name: 0NJT03
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnDellInc.:bvrA11:bd05/29/2012:svnDellInc.:pnDellSystemXPSL502X:pvr:rvnDellInc.:rn0NJT03:rvrA00:cvnDellInc.:ct8:cvr0.1:
dmi.product.name: Dell System XPS L502X
dmi.sys.vendor: Dell Inc.

Rocko (rockorequin) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
tags: removed: need-duplicate-check
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 66289
dmesg output

From time to time the interface freezes, and records like these appear in dmesg: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blitter ring idle

$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. Device 0612 (rev 01)

Bryce Harrington (bryce) wrote :

Does switching from UXA to SNA help?

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Medium
Rocko (rockorequin) wrote :

Ha! I thought SNA was turned on by default, but it isn't, is it? Is it possible to switch between SNA and UXA while X is running, or to tell which one is being used?

I've turned SNA on via AccelMethod in xorg.conf now, so I'll see if the freezes go away.

Since I restarted X with SNA, the titlebars of windows that don't have the focus have changed their background to light grey. The window buttons and the title text stay the same, which looks weird. Is that something that can be configured?

Rocko (rockorequin) wrote :

Is SNA turned on by default now? I had a couple of hours freeze-free with it the other day, but removed my xorg.conf shortly afterwards because the white titlebars and glitchy 3D graphics were annoying, and also because with SNA enabled the backlight didn't come on after the screensaver turned it off. But now the titlebars are white again.

Bryce Harrington (bryce) wrote :

SNA is not the default for quantal. No, there is not a way to toggle between UXA and SNA at run time. /var/log/Xorg.0.log is where to look to see which acceleration tech is active.

If I understand your testing feedback, you do believe SNA helps eliminate the freeze behaviors, and thus we can consider UXA the likely source of the bug.

Rocko (rockorequin) wrote :

Yes, I think the bug doesn't happen with SNA whereas it occurs pretty regularly with UXA. I've been using SNA for a couple of days now since it became the default on my system. Does X now look for other xorg.conf files? I created one called /etc/X11/xorg.conf-intel-sna and symlinked to it to test out SNA; then I deleted the symlink, and a day or two later suddenly SNA became the default.

Rocko (rockorequin) wrote :

Ah, I am using xorg-edgers. Perhaps they are trying out SNA as the default there.

Rocko (rockorequin) wrote :

I've been using SNA for a couple of weeks now, and it doesn't seem to suffer from this particular bug.

The bug still occurs in the latest xf86-video-intel driver from git (as of 27/9/12), though. It generally occurs when focus changes, eg when a menu or popup window is opening.

Ursula Junque (ursinha) wrote :

Hi Bryce, I've been getting this error every once in a while and when it happens, apport tries to report the bug like ten times. Let me know if I can provide more information about it.

Cheers,

Ursula Junque (ursinha) wrote :

I've filed another bug with apport and all my files are attached there: bug 1059737, just in case they're not duplicates.

Paul Smedley (paul-smedley) wrote :

Switching from UXA to SNA fixes this for me too, on an Asus Zenbook UX31E

Dimitri John Ledkov (xnox) wrote :

I am hitting this bug. Can somebody please explain how to check if I am using UXA or SNA and how to switch between the two? If SNA helps, and I am using UXA I'd like to try SNA.

Rocko (rockorequin) wrote :

@Dmitrijs: To find which method is being used, do:

grep AccelMethod /var/log/Xorg.0.log

I find also that the titlebars of non-focused windows are often light grey instead of black when using SNA.

And to change methods, put this in your xorg.conf to set the acceleration method and then restart X:

Section "Device"
 Identifier "Card0"
 Driver "intel"
 Option "AccelMethod" "sna" # or uxa, as appropriate
EndSection

In , Chris Wilson (ickle) wrote :

If you can easily reproduce this error, can you please build a kernel using http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=xv-overlay which has some revised memory barriers.

In , Mikhail Gavrilov (mikegav) wrote :

Can you help me build an RPM for Fedora?

Rocko (rockorequin) wrote :

I still experience this bug, even with the latest intel driver from git, xf86-video-intel-2.6.99.902. I would use SNA but it has an even more annoying bug after the screen saver unlocks where unity just shows me a black screen and mouse cursor, and I have to physically restart unity to get it working again.

In , Chris Wilson (ickle) wrote :

On second thoughts, I think this should be fixed by the slight robustification in more recent hangcheck.

Please try the latest kernel for your distribution (should be 3.6.7 atm) and reopen if it still occurs.

In , Mikhail Gavrilov (mikegav) wrote :

I am using Fedora 18 with the 3.6.7-5.fc18.i686 kernel, and this message still appears in the dmesg output:
[22826.654365] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22826.654369] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
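
Two different hangcheck messages show up in this report ("... blitter ring idle" earlier and "... GPU hung" here), and they may not point at the same problem. As a rough aid when reading a long dmesg, the sketch below counts each kind. The function name summarize_hangchecks is made up for illustration, and the patterns are taken only from the messages quoted in this report.

```shell
# summarize_hangchecks  (hypothetical helper)
# Reads kernel log text on stdin and prints how many "GPU hung" and
# "<name> ring idle" hangcheck messages it contains.
summarize_hangchecks() {
    log=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    hung=$(printf '%s\n' "$log" | grep -c 'Hangcheck timer elapsed\.\.\. GPU hung' || true)
    idle=$(printf '%s\n' "$log" | grep -c 'Hangcheck timer elapsed\.\.\. .* ring idle' || true)
    echo "gpu_hung=$hung ring_idle=$idle"
}
```

A typical call would be 'dmesg | summarize_hangchecks'.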

In , Chris Wilson (ickle) wrote :

That is not the same bug, so you need to attach a fresh set of debug info (please remember the i915_error_state)...

In , Mikhail Gavrilov (mikegav) wrote :

Please explain how to get the needed debug info. Thanks.

In , Chris Wilson (ickle) wrote :

http://intellinuxgraphics.org/how_to_report_bug.html

From which we need the i915_error_state, so

$ sudo mount -tdebugfs debug /sys/kernel/debug
$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
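
Several error states end up being attached over the course of this report, so a timestamped copy helps keep them apart. A minimal sketch, assuming debugfs is already mounted at /sys/kernel/debug as in the commands above; the function name save_error_state and the file naming scheme are illustrative only.

```shell
# save_error_state [SRC] [DEST_DIR]  (hypothetical helper)
# Copies the i915 error state to DEST_DIR/i915_error_state-<timestamp>
# and prints the name of the file it wrote. Needs read access to SRC,
# which normally means running as root.
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest_dir=${2:-.}

    if [ ! -r "$src" ]; then
        echo "cannot read $src (debugfs not mounted, or not root?)" >&2
        return 1
    fi

    out="$dest_dir/i915_error_state-$(date +%Y%m%d-%H%M%S)"
    cat "$src" > "$out" || return 1
    echo "$out"
}
```

Run from a root shell with no arguments, it writes the copy into the current directory and prints the file name to attach.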

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 70518
i915_error_state

In , Chris Wilson (ickle) wrote :

That looks like it corresponds to the bug that

commit 1c8b46fc8c865189f562c9ab163d63863759712f
Author: Chris Wilson <email address hidden>
Date: Wed Nov 14 09:15:14 2012 +0000

    drm/i915: Use LRI to update the semaphore registers

    The bspec was recently updated to remove the ability to update the
    semaphore using the MI_SEMAPHORE_BOX command, the ability to wait upon
    the semaphore value remained. Instead the advice is to update the
    register using the MI_LOAD_REGISTER_IMM command. In cursory testing,
    semaphores continue to function - the question is whether this fixes
    some of the deadlocks where the semaphore registers contained stale
    values?

hopefully addresses.

That patch is only available on drm-intel-next at the moment, which can be found either at http://cgit.freedesktop.org/~danvet/drm-intel or as drm-intel-experimental in the ubuntu kernel-ppa.

Karma Dorje (taaroa)
tags: added: raring
Timo Aaltonen (tjaalton) wrote :

I've uploaded -intel 2.20.14 to raring, so please test with both UXA and SNA to see if either or both work.

Rocko: I can't reproduce your bug with SNA (with this new version anyway), works fine on my T420s. 2.6.99.902 sounds old too :)

Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Incomplete
Rocko (rockorequin) wrote :

Yes, I've been running v2.20.14 from git (using SNA, not UXA) for a few days on Quantal and so far I haven't seen that other bug I mentioned: it hasn't fatally locked up after the screensaver kicks in. However, it has experienced *this* particular bug a few times, i.e. where the screen locks but I can fix it by switching to a tty terminal and back.

Re 2.6.99.902, I think I probably did a git tag command and looked at the last entry, which is definitely old. I would have been running a pre-v2.20.14 version at the time.

Karma Dorje (taaroa) wrote :

@Timo Aaltonen
SNA — ok. looks like some sort of regression in the driver.

Bryce Harrington (bryce) wrote :

Rocko, thanks for testing the git DDX. Next time you get one of these freezes can you please collect a fresh i915_error_state, dmesg, and Xorg.0.log?

Sounds like this bug should go upstream.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → New
status: New → Incomplete
In , Mikhail Gavrilov (mikegav) wrote :

The problem occurred again with the patched kernel.

[118637.439016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[118637.439020] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[mikhail@localhost ~]$ uname -a
Linux localhost.localdomain 3.6.9-4.1.fc18.i686.PAE #1 SMP Wed Dec 5 15:16:33 UTC 2012 i686 i686 i386 GNU/Linux
[mikhail@localhost ~]$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
[sudo] password for mikhail:
[mikhail@localhost ~]$

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71192
i915_error_state (new)

In , Mikhail Gavrilov (mikegav) wrote :

sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state-8
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

What does it mean?

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71199
i915_error_state (new)

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71200
dmesg output (new)

In , Chris Wilson (ickle) wrote :

Lalalalala.

In , Chris Wilson (ickle) wrote :

*** Bug 58057 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

*** Bug 58212 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

We can confirm the synopsis by disabling semaphores (i915.semaphores=0), but can we also test whether this is an rc6 side-effect (i915.i915_enable_rc6=0)?
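
When testing one parameter at a time like this, it is easy to lose track of what the running kernel was actually booted with. Below is a small sketch that pulls a single parameter's value out of /proc/cmdline. The name cmdline_value is invented here and is not an existing utility.

```shell
# cmdline_value NAME [FILE]  (hypothetical helper)
# Prints the value of the last NAME=value word on the kernel command line,
# or nothing if NAME is not set. FILE defaults to /proc/cmdline.
cmdline_value() {
    name=$1
    file=${2:-/proc/cmdline}
    # One word per line, then strip the "NAME=" prefix from matching words.
    tr ' ' '\n' < "$file" | sed -n "s|^$name=||p" | tail -n 1
}
```

With the semaphore workaround active, 'cmdline_value i915.semaphores' should print 0; an empty result means the parameter was not passed at boot.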

In , Chris Wilson (ickle) wrote :

Also maybe time for 'git revert 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f'

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71549
i915_error_state

In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 71630
dmesg

bugbot (bugbot) wrote :

We're closing this bug since there has not been a response from the original reporter. However, if the issue still exists, please feel free to reopen it with the requested information. If you're not the original reporter, we'd prefer you file a new bug report.

Some tips:

  * Report X.org bugs via the command: `ubuntu-bug xorg`

  * Test against the latest development Ubuntu. http://cdimage.ubuntu.com/daily-live/
    Bugs marked as affecting the development version tend to get priority attention.

  * The `xdiagnose` utility has functionality for enabling debugging and
    analyzing a few common X problems.

  * Tag your bugs with the Ubuntu versions you have reproduced the issue in.

  * See https://wiki.ubuntu.com/X/Reporting for tips on writing good bug reports.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Expired
Adam Conrad (adconrad)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Expired → Confirmed
Timo Aaltonen (tjaalton)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: nobody → Timo Aaltonen (tjaalton)
status: Confirmed → Incomplete
Rocko (rockorequin) wrote :

I've seen it happen with kernel 3.8-rc2 and SNA using the latest intel driver from git.

The hang isn't always the same:

* Sometimes it locks the computer up completely, requiring a hard reboot.

* Sometimes it locks X, but CTRL-ALT-F1 and back unlocks it.

* Sometimes it resolves itself without me even noticing that it has happened, other than that there may be some corruption in the tabs' title text in chrome and window movement has become somewhat jerky instead of the normal smooth movement you get after restarting X.

Next time it happens I'll see if I can recover any information.

In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #24)
> Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> /sys/module/i915/parameters/semaphores) to prevent this hang.

What are the consequences?

> The only interesting patch I can suggest atm is
>
> commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> Author: Ben Widawsky <email address hidden>
> Date: Wed Sep 26 10:34:01 2012 -0700
>
> drm/i915: Workaround to bump rc6 voltage to 450
>
> BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> buggy BIOSen may not be doing this, so we correct it for them. Ideally
> customers should update the BIOS as only it would know the optimal
> values for the platform, so we leave that fact as a DRM_ERROR for the
> user to see.
>
> in 3.8-rc1 or look for a BIOS update.

I have an H61M/U3S3 motherboard and the latest BIOS, version 2.20 from 8/15/2012:
ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
How can I check whether the problem persists?

In , Chris Wilson (ickle) wrote :

(In reply to comment #27)
> (In reply to comment #24)
> > Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> > /sys/module/i915/parameters/semaphores) to prevent this hang.
>
> What are the consequences?

Rendering throughput is dropped by 10% with SNA, or as much as 3x with UXA. OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, so increased power consumption.

Adam Conrad (adconrad) wrote :

Timo: I've never had it completely hang the machine, but I've also not been patient enough to sit around and wait to see if X will eventually recover on its own, I always do a VT switch out and back (and get welcomed by an apport dialog)

Has happened several times today. Will be upgrading to 3.8.0-rc soon to see if that helps, but the comment above me doesn't give much hope.

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Confirmed
Timo Aaltonen (tjaalton) wrote : Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

ok, so the hang I'm seeing is this same one, not that frequent though

Roman Yepishev (rye) wrote :

I've started getting this failure after migrating to Raring and I get the recoverable lockups 3 times a day or even more with the same GPU lockup message in dmesg. Apport has proposed [sandybridge-m-gt2+] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001 as a bug title.

I am using 3.8.0-0-generic kernel.

There is no AccelMethod string in Xorg log (so it should be uxa), will check whether this happens with sna.

In , bwidawsk (bwidawsk) wrote :

(In reply to comment #27)

> > The only interesting patch I can suggest atm is
> >
> > commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> > Author: Ben Widawsky <email address hidden>
> > Date: Wed Sep 26 10:34:01 2012 -0700
> >
> > drm/i915: Workaround to bump rc6 voltage to 450
> >
> > BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> > buggy BIOSen may not be doing this, so we correct it for them. Ideally
> > customers should update the BIOS as only it would know the optimal
> > values for the platform, so we leave that fact as a DRM_ERROR for the
> > user to see.
> >
> > in 3.8-rc1 or look for a BIOS update.
>
> I have H61M/U3S3 motherboard and you latest BIOS ver 2.20 from 8/15/2012
> ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
> How to check problem persists or not?

The easiest way is to apply the patch and look for DRM_DEBUG_DRIVER messages. This is unlikely to fix the problem, but also can't hurt.

We've only assumed new BIOS will fix the problem, but who knows. Especially if it's a 3rd party BIOS.

Timo Aaltonen (tjaalton)
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Medium → High
status: Incomplete → Triaged
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

I'm seeing this on an X220 running up to date raring. Happens a few times a day.

alan@deep-thought:~$ grep SNA /var/log/Xorg.0.log
[ 12.175] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.20.19-0ubuntu1 (Timo Aaltonen <email address hidden>)
[ 13.744] (II) intel(0): SNA initialized with SandyBridge backend
alan@deep-thought:~$ grep UXA /var/log/Xorg.0.log
alan@deep-thought:~$ grep AccelMethod /var/log/Xorg.0.log
alan@deep-thought:~$
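
The three greps above can be folded into one check. This is only a sketch: the "SNA initialized" pattern comes from the log lines shown here, while matching on "UXA" is an assumption about what a UXA session logs, and accel_method is an illustrative name.

```shell
# accel_method [LOGFILE]  (hypothetical helper)
# Guesses the acceleration method of the intel driver from an Xorg log.
accel_method() {
    log=${1:-/var/log/Xorg.0.log}
    if grep -q 'intel(0): SNA initialized' "$log"; then
        echo sna
    elif grep -q 'intel(0):.*UXA' "$log"; then
        # Assumption: a UXA session mentions "UXA" on an intel(0) line.
        echo uxa
    else
        echo unknown
    fi
}
```

On the log excerpt above it would print sna, even though no AccelMethod line is present.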

Pretty sure I have the very latest BIOS from Lenovo as I recently installed Windows on the machine to do exactly that.

Timo Aaltonen (tjaalton) wrote :

try 'echo 0 > /sys/module/i915/parameters/semaphores' to see if it stops the hangs.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Confirmed
Timo Aaltonen (tjaalton) wrote :

..and restart X (logging out should do it when lightdm is used)
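
One thing to watch with the echo above: typed as 'sudo echo 0 > /sys/module/i915/parameters/semaphores' it fails with a permission error, because the redirection is performed by the unprivileged shell rather than by sudo. Either use a root shell or pipe into 'sudo tee'. A minimal sketch of a helper for a root shell follows. The name set_semaphores is illustrative, and a change made this way does not survive a reboot.

```shell
# set_semaphores VALUE [PARAM_FILE]  (hypothetical helper, run as root)
# Writes VALUE to the i915 "semaphores" module parameter and prints the
# value read back, so a silent failure is easy to spot.
set_semaphores() {
    value=$1
    param=${2:-/sys/module/i915/parameters/semaphores}

    if [ ! -w "$param" ]; then
        echo "cannot write $param (not root, or i915 not loaded?)" >&2
        return 1
    fi

    echo "$value" > "$param" || return 1
    cat "$param"
}
```

From a normal shell, 'echo 0 | sudo tee /sys/module/i915/parameters/semaphores' has the same effect. X still has to be restarted afterwards, as noted above.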

Timo Jyrinki (timo-jyrinki) wrote :

My first of this type happened today, even though I've already been using SNA before since August or so. If this now starts to be recurring (since I also started to have bug #1102390 only yesterday), I can try disabling the semaphores.

In , Chris Wilson (ickle) wrote :

*** Bug 59786 has been marked as a duplicate of this bug. ***

In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 73560
write mbox regs twice on snb

Another piece of magic which might help. Please test this patch and the one from Chris ("Read back semaphore mboxes after update") separately and report back whether anything changes.

In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 73577
write mbox regs twice on snb, v2

Now actually the right patch attached, the old one didn't compile ...

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

The workaround in comment #58 does eliminate the GPU lockups. Which I notice when I boot in the morning and forget to do that workaround and get a lockup part way through the day.

In , Mikhail Gavrilov (mikegav) wrote :

Which patch do I need to apply to fix this issue?

I see that patches from comment 26 and 32 have similar logic...

@@ -596,6 +606,16 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);
 
+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		read_mboxes(ring, mbox1_reg, 1024);
+		read_mboxes(ring, mbox2_reg, 1028);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

@@ -598,6 +598,19 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, MI_USER_INTERRUPT);
 	intel_ring_advance(ring);
 
+	if (IS_GEN6(ring->dev)) {
+		ret = intel_ring_begin(ring, 6);
+		if (ret)
+			return ret;
+
+		mbox1_reg = ring->signal_mbox[0];
+		mbox2_reg = ring->signal_mbox[1];
+
+		update_mboxes(ring, mbox1_reg);
+		update_mboxes(ring, mbox2_reg);
+		intel_ring_advance(ring);
+	}
+
 	return 0;
 }

In , Daniel-ffwll (daniel-ffwll) wrote :

> --- Comment #33 from <email address hidden> ---
> Which patch I need applied for fix this issue?

We can't reproduce the bug, so those are just patches to test
different ideas. Please test each of them individually (i.e. remove
the first before testing the 2nd patch) and then report whether
anything changes (i.e. whether the issue becomes harder or easier to hit).

In , Mikhail Gavrilov (mikegav) wrote :

I can't compile the kernel with the patch above:

drivers/gpu/drm/i915/intel_ringbuffer.c: In function 'gen6_add_request':
drivers/gpu/drm/i915/intel_ringbuffer.c:611:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
drivers/gpu/drm/i915/intel_ringbuffer.c:612:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Triaged
In , Norman Yarvin (yarvin-yarchive) wrote :

I'm seeing this bug, or something like it, on an older chip (G965, desktop version):

Feb 19 22:05:56 muttonhead kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Feb 19 22:05:56 muttonhead kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Feb 19 22:05:56 muttonhead kernel: [drm:kick_ring] *ERROR* Kicking stuck wait on render ring
Feb 19 22:05:57 muttonhead kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

after which the mouse pointer sticks in one spot (with most other things working), and then when I shut down X, the console fails to appear, requiring a reboot. Not knowing that the given file path was under /sys/kernel, I failed to capture the error state, but will do so next time this happens (which is maybe every other day). This is with a 3.7 kernel (Gentoo); before 3.7, the driver was stable. I don't know what the 'generation' numbers in the driver mean, but I'm guessing that generation 6 is later, so many of the suggested fixes would not make any difference on this machine.

In , Chris Wilson (ickle) wrote :

(In reply to comment #42)
> I'm seeing this bug, or something like it, on an older chip (G965, desktop
> version):

Good news, it is not this bug. Please make sure you have the latest stable driver (a gentoo user not using 3.8 already! ;-) and latest xf86-video-intel, then file a fresh bug report, attaching your dmesg, Xorg.0.log and i915_error_state.

In , gneman (luis6674) wrote :

I subscribed to this bug because I was seeing this hang too. It happened randomly several times, without a specific cause or way to reproduce it.

This was around December, and it happened maybe 4-5 times over the course of a month. The GPU would hang with that error in dmesg, and everything continued to work, though very slowly.

However, I must say that since then it didn't happen again for almost 2 months maybe. I use Arch Linux, which means I always update to the latest stable packages of everything, so it seems that for me it got solved at some point (or at least much harder to reproduce).

This is an Ironlake / HD 2000 based Dell laptop. I did update the BIOS when I found this bug report, but it didn't solve the problem, the hang happened after updating it.

In , Chris Wilson (ickle) wrote :

*** Bug 61310 has been marked as a duplicate of this bug. ***

Laura Czajkowski (czajkowski) wrote : Re: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001

This bug has started to affect me on 13.04

Chris Wilson (ickle)
summary: - [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001
+ [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround
+ i915.semaphores=0
Bryce Harrington (bryce)
Changed in linux (Ubuntu):
importance: Undecided → High
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 75818
i915_error_state (kernel 3.8.1 Fedora)

In , Mikhail Gavrilov (mikegav) wrote :

Today Fedora 18 updated the kernel to 3.8.1, and the message "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung" is still there. Please look at my latest log. Any updates?

Pete Graner (pgraner) wrote :

GPU is locking up numerous times a day, just started on Raring. Apport is telling me my bug has already been reported and points me at https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1100360 which is a dupe of this bug.

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

@pgraner I have the same GPU lockup frequently, daily. I sometimes forget to set the workaround i915.semaphores to 0, but when I do the lockups go away. I have now set it in the /etc/default/grub as GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0" and update-grub...

Robin Munn (rmunn) wrote :

I've been experiencing a recurring lockup that may or may not be this bug; what I can add to the discussion is that if it is this bug, it appears to be Linux-only. My company bought several of its developers identical laptops (Dell Latitude E6520 models). The other developers who are using these laptops are using Windows, and I'm the only one using Linux. I've been experiencing these hangups 3-4 times per week, on average, over the past year. But when I asked the other developers in my company with Latitude E6520s if they'd experienced random-seeming hangups, they all said no. Which means that either I got the only laptop with a defective chip, or else the Windows driver doesn't experience this problem.

I hope this helps in some way; sorry I can't be of more technical help, but GPUs are outside my area of expertise.

In , bwidawsk (bwidawsk) wrote :

This looks weird to me:

0x00005a58: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a5c: 0x00012044: dword 1
0x00005a60: 0x0043b625: dword 2
0x00005a64: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a68: 0x00022040: dword 1
0x00005a6c: 0x0043b625: dword 2
0x00005a70: 0x10800001: MI_STORE_DATA_INDEX
0x00005a74: 0x00000080: index
0x00005a78: 0x0043b625: dword
0x00005a7c: 0x01000000: MI_USER_INTERRUPT
0x00005a80: 0x0b160001: MI_SEMAPHORE_MBOX compare semaphore, use compare reg 2
0x00005a84: 0x0043b625: value
0x00005a88: 0x00000000: address
0x00005a8c: 0x00000000: MI_NOOP

Chris?
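
The header dwords in the dump above can be cross-checked by hand. The sketch below assumes the usual i915 encoding, in which an MI command header carries its opcode in bits 28:23. That layout is an assumption based on the driver's MI_INSTR macro, not something stated in this report, and mi_opcode is an illustrative name.

```shell
# mi_opcode HEADER  (hypothetical helper)
# Prints the opcode field of a command header dword, assuming the opcode
# occupies bits 28:23 of the header.
mi_opcode() {
    printf '0x%02x\n' $(( ($1 >> 23) & 0x3f ))
}
```

Both IPEHR values from the bug title, 0x0b160001 and 0x0b140001, come out as opcode 0x16, the same opcode as the MI_SEMAPHORE_MBOX line in the dump. The two values differ only in bit 17 (0x00020000).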

In , Chris Wilson (ickle) wrote :

Weird? Did you just forget that the hw does a strictly greater-than comparison?
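To illustrate what a strictly greater-than comparison implies, here is a toy model in Python. It is not driver code: both function names are hypothetical, and the "program seqno - 1" convention is an assumption about how a waiter would compensate for the strict comparison, not something stated in this thread. The sketch also ignores 32-bit wrap-around.

```python
def semaphore_wait_passes(mbox_value: int, compare_value: int) -> bool:
    """Toy model: the wait completes only once mbox_value > compare_value."""
    return mbox_value > compare_value


def compare_value_for(seqno: int) -> int:
    """Assumed convention: program seqno - 1 so the wait passes at `seqno`."""
    return seqno - 1


seqno = 0x0043B625  # value taken from the decoded dump above
# Comparing against the seqno itself would not pass when the mailbox equals it.
print(semaphore_wait_passes(seqno, seqno))
# Comparing against seqno - 1 passes as soon as the mailbox reaches seqno.
print(semaphore_wait_passes(seqno, compare_value_for(seqno)))
```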

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #47)
> Today Fedora 18 updated kernel to 3.8.1 and message
> "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
> still here. Please look at my last log. Any updates?

We're still waiting for you to apply the patches and report back.

Revision history for this message
Stan Schymanski (schymans) wrote :

I experienced a permanent GPU lock-up today: the screen was unresponsive for half an hour before I decided to reboot by pressing Alt+SysRq-REISUB. Below is the relevant section of kern.log before shutting down.

Revision history for this message
Stan Schymanski (schymans) wrote :

Just to add some background information to my previous post:
I have been having random hangups for more than a year now, but only since the beginning of this week, I have been getting the GPU hang-up error messages. Today, it coincided with one of the complete lock-ups, that could only be resolved by Alt+SysRq-REISUB. I will now try the approach proposed in Comment #81, hoping for the best. I hope that I am not seeing two different issues here. The GPU hang-up error message has been reoccurring without obvious hang-ups in the past few days, while my complete hang-ups have been happening randomly, sometimes with the possibility to reboot as outlined above, sometimes without (only the power button would do) and sometimes the computer turned itself off without any interaction from my side...
If anyone sees anything new in the kern.log section I pasted, please let me know.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

*** Bug 61925 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76196
i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on snb, v2)

I applied the patch "write mbox regs twice on snb, v2" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76208
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

I also applied the patch "Read back semaphore mboxes after update" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #52)
> Created attachment 76196 [details]
> i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> I am applied patch "write mbox regs twice on snb, v2" but still have problem
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

0x00052cc8: 0x18800100: MI_BATCH_BUFFER_START
0x00052ccc: 0x0d59b000: dword 1
0x00052cd0: 0x13000001: MI_FLUSH_DW post_sync_op='no write'
0x00052cd4: 0x000000c4: address
0x00052cd8: 0x00000000: dword
0x00052cdc: 0x00000000: MI_NOOP
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
0x00052cf8: 0x10800001: MI_STORE_DATA_INDEX
0x00052cfc: 0x00000080: index
0x00052d00: 0x0007a582: dword
0x00052d04: 0x01000000: MI_USER_INTERRUPT

That's only a single LRI per semaphore, the patch wasn't tested.
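The check described here (counting MI_LOAD_REGISTER_IMM writes per mailbox register in the decoded ring) can be sketched in a few lines of Python. This is only an illustration that assumes the "address: value: description" layout shown in the dump above; count_lri_writes is a hypothetical helper, and the idea that a kernel with the "write mbox regs twice" patch would show two writes per register is an inference from this comment rather than a documented property.

```python
import re
from collections import Counter

# One decoded line looks like "0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM".
LINE_RE = re.compile(r"^(0x[0-9a-f]{8}): (0x[0-9a-f]{8}): (.*)$")

# Excerpt copied from the decoded ring in this comment.
SAMPLE = """\
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
"""


def count_lri_writes(decoded: str) -> Counter:
    """Count MI_LOAD_REGISTER_IMM writes per target register offset."""
    counts = Counter()
    expect_register = False
    for raw in decoded.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            expect_register = False
            continue
        _addr, value, text = m.groups()
        if expect_register:
            # The dword right after the command holds the register offset.
            counts[int(value, 16)] += 1
            expect_register = False
        elif text.startswith("MI_LOAD_REGISTER_IMM"):
            expect_register = True
    return counts


for reg, n in sorted(count_lri_writes(SAMPLE).items()):
    print(f"register 0x{reg:05x}: written {n} time(s)")
```

On the excerpt above, each of the two registers is written once, which matches the "single LRI per semaphore" observation.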

Revision history for this message
In , Chris Wilson (ickle) wrote :

I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your patched version.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76215
kernel.spec

(In reply to comment #55)
> I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your
> patched version.

That's impossible. The distro kernel is 3.8.1-201.fc18.i686.PAE; 3.8.1-202.fc18.i686.PAE and 3.8.1-203.fc18.i686.PAE are kernels patched by me.

You can verify this by looking at my build spec file.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76239
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

I am sorry. It seems I forgot to add "ApplyPatch" to the spec. I have rebuilt the kernel with the "0001-drm-i915-Read-back-semaphore-mboxes-after-updating-t.patch" patch, but the problem still seems to be here.

Does it make sense to check the "0001-write-mbox-regs-twice-on-gen6.patch" patch?

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76243
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76261
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

The "write mbox regs twice on snb, v2" patch also does not solve the problem.

[ 1399.270341] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1399.270345] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1399.277331] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76293
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 76448
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)

Any updates?

Revision history for this message
Stan Schymanski (schymans) wrote :

GPU lockup messages started reappearing, despite an upgrade to kernel 3.8.2 in Ubuntu 12.10.

Revision history for this message
Chris Wilson (ickle) wrote :

#81 is only applicable to this bug, and is the correct workaround. Not all GPU hangs are this bug, most tend to be mesa...

Revision history for this message
Stan Schymanski (schymans) wrote :

Sorry, I haven't been able to attach any of the crash reports, because I get the message that the crash relates to Bug #1059737 and I don't get an opportunity to upload the report. Bug #1059737 is supposed to be a duplicate of this bug, which is why I started posting here. How can I verify whether it is this bug or a mesa bug?

Revision history for this message
Chris Wilson (ickle) wrote :

The most brutal way would be to remove /usr/lib/*/dri/i965_dri.so and so force it to use software.

Revision history for this message
Stan Schymanski (schymans) wrote :

Before I do this, could you confirm whether my new report of Bug #1154591 is related to this here or not? Should I still try removing i965_dri.so, or has this become irrelevant?

Revision history for this message
Chris Wilson (ickle) wrote :

Easy answer: probably. Impossible to tell without the error-state.

Revision history for this message
Bilal Shahid (s9iper1) wrote :

Here is another bug. I have been hitting this hang frequently, and recently it hung again. I didn't report the new ones, but it opens the link to the previous bug 1153202.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 62443 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

As a workaround, this

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

    drm/i915: Resurrect ring kicking for semaphores, selectively

should improve the recovery from the hangs.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in sandybridge-meta (Ubuntu):
status: New → Confirmed
Revision history for this message
In , cbrnr (cbrnr) wrote :

OK, I've been experiencing this bug from time to time on my Arch Linux box. No apparent reason, last time it happened I was watching a Youtube video, and it also seems to happen more often when I'm running VirtualBox. However, this might just be a coincidence.

Revision history for this message
In , Longerdev (longerdev) wrote :

I have this bug too.

Gentoo 64bit
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Samsung Electronics Co Ltd Device c0a0
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f5c00000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at e000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915

Kernel 3.8.0 gentoo-sources

I tried patch a24a11e6b4e96bca817f854e0ffcce75d3eddd13, but nothing changed.
Mar 31 15:14:37 localhost kernel: [64379.291736] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 31 15:14:37 localhost kernel: [64379.291742] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Revision history for this message
Spencer Mabrito (xoptics) wrote :

Just some further confirmation of this issue.

On a Sandy Bridge i5 Dell Latitude e6420 laptop with Intel HD 4000 integrated graphics. On 3.5.0-26-generic I experienced GPU hangs once or twice an hour (fixed themselves after a few seconds) and full lockups 2 to 5 times per day, depending on what I was doing (have to hard reboot the machine to recover). Having firefox open seemed to be a risk factor, and if FF had loaded flashplayer, I expected a full lockup at any moment.

As described in the following bug, which is a similar issue... https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1041790 ... I added the following to my /etc/default/grub options:

i915.semaphores=0

This decreased the frequency and severity of lockups/crashes, but they were still there. Upon further investigation, I found this bug to be the actual issue (my syslog showing the same 3 lines as the others in this thread), and have reverted to 3.5.0-25-generic and have been running without any issues for a day or two now. I still have i915.semaphores=0 in my grub boot options but don't think I still need that...

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

Created attachment 77475
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

(In reply to comment #61)
> Created attachment 76448 [details]
> i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> Any updates?

Mikhail,

Could you please try patch:
[PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Bryce Harrington (bryce)
description: updated
Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

Patch is also included in latest drm-intel-nightly, linux-next. So you can test it by grabbing a distro-build of one of those.

Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce)
Changed in linux (Ubuntu):
status: Invalid → New
Revision history for this message
Bryce Harrington (bryce) wrote :

@kernel team - comment #111 has a test patch worth having a kernel built for that folks can test.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Seth Forshee (sforshee) wrote :

I've built a quantal kernel with the patch mentioned on the freedesktop.org bugzilla. Please test to see if it resolves the issue.

http://people.canonical.com/~sforshee/lp1041790/linux-3.5.0-27.46~lp1041790v201304052041/

Compiler issues are preventing me from building for raring atm, but I'll post a raring build as soon as I am able to do so.

Revision history for this message
In , Brian Ealdwine (eode) wrote :

Occurred while playing vessel. Never ran into the problem on 12.10.
I'm available to provide info.

Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #67)
> (In reply to comment #61)
> > Created attachment 76448 [details]
> > i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on
> > snb, v2)
> >
> > Any updates?
>
> Mikhail,
>
> Could you please try patch:
> [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively

Hm, it seems better, but the problem is still here:

[59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[59120.008802] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[59120.012173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 77692
i915_error_state (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 77693
dmesg (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)

Revision history for this message
In , Chris Wilson (ickle) wrote :

\o/ It kicked the right ring.

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

(In reply to comment #72)
> \o/ It kicked the right ring.

So is this normal?

Revision history for this message
In , Chris Wilson (ickle) wrote :

It's the expected 'improved' recovery behaviour for this bug.

Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 63542 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bryce Harrington (bryce) wrote :

Chris, what is the upstream status for the ring kicker patch? Is that likely to get incorporated upstream, or do you feel it needs further polish before it's ready? Would this patch incur some risk of regressions in other areas were it to be backported for inclusion in Ubuntu?

tags: added: kernel-handoff-graphics
Revision history for this message
Chris Wilson (ickle) wrote :

The ring kicker is upstream, but it is not a fix. It is just a lighter-weight reset mechanism that should prevent a cascade of errors and corruption, but the user still suffers the 3s stall before the hang is detected.

However, the patch seems to be solid and has survived its trial by fire.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #76)
> Chris, what is the upstream status for the ring kicker patch? Is that
> likely to get incorporated upstream, or do you feel it needs further polish
> before it's ready? Would this patch incur some risk of regressions in other
> areas were it be backported for inclusion in Ubuntu?

Merged for 3.10 as

commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13
Author: Chris Wilson <email address hidden>
Date: Thu Mar 14 17:52:05 2013 +0200

    drm/i915: Resurrect ring kicking for semaphores, selectively

Nothing else planned for now, but I think we can just keep this bug here open in case we stumble across a new idea. And it seems to be good honey to attract all the me-too reports ;-)

Revision history for this message
In , Tomwij-1 (tomwij-1) wrote :

(In reply to comment #65)
> Kernel 3.8.0 gentoo-sources

Did you report this at the Gentoo Bugzilla?

When you do, please attach /debug/dri/0/i915_error_state

Revision history for this message
In , Longerdev (longerdev) wrote :

>Did you report this at the Gentoo Bugzilla?

>When you do, please attach /debug/dri/0/i915_error_state

I haven't reported it in the Gentoo Bugzilla yet (their kernel carries no patches for the Intel drivers). But with this patch, I haven't been able to reproduce the bug for 2 weeks on kernel 3.9-rc6. I haven't tested with Blender, though (when I tried Blender before, the GPU hang repeated within 1-5 minutes).

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 64094 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 78692
i915_error_state (kernel 3.9 Fedora)

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Created attachment 78693
i915_error_state (kernel 3.9 Fedora)

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 64094 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Freedesktop-l (freedesktop-l) wrote :

Created attachment 79704
i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430

I can reproduce this bug every time I try to quickly drag a Chrome window with a YouTube movie to a secondary monitor connected to my laptop Dell E6430. It is very annoying. Tested on latest kernel 3.10-rc2.

I can give you any additional information you want, test patches, etc. Just please try to fix this :)

Revision history for this message
In , Freedesktop-l (freedesktop-l) wrote :

(In reply to comment #84)
> Created attachment 79704 [details]
> i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430
>
> I can reproduce this bug every time I try to quickly drag a Chrome window
> with a YouTube movie to a secondary monitor connected to my laptop Dell
> E6430.

One more piece of information: you need to enable "Override software rendering list" in chrome://flags

Revision history for this message
Dac Chartrand (conner-bw) wrote :

Came here from bug #1129679, which more accurately describes my problem. Apparently that bug is a dupe of this bug, which IMHO is not really the same, but whatever...

I'm running 13.03 on Lenovo X220

Has anyone looked at this?
https://bugs.freedesktop.org/show_bug.cgi?id=47535#c14

Did this patch get lost along the way?

Regards,

Revision history for this message
In , Cwawak (cwawak) wrote :

Created attachment 79979
i915_error_state - 3.9.2-201.rhbz879823.fc18.x86_64 (included patch write mbox regs twice on snb, v2)

Linux bobloblaw 3.9.2-201.rhbz879823.fc18.x86_64 #1 SMP Thu May 16 13:35:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[45482.757631] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[45482.757645] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[45482.766942] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[45482.770617] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

I added patch (drm/i915: Resurrect ring kicking for semaphores, selectively) to Fedora 18's 3.9.2-200 x86_64 kernel.

Revision history for this message
In , Cwawak (cwawak) wrote :

Is there any input or assistance I can give to help move this along?

Thanks!

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82747
New read-after-write patch

New patch for testing, thanks!

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82748
New read-after-write patch

Revision history for this message
In , Mikhail Gavrilov (mikegav) wrote :

Which kernel version is this patch for?

Revision history for this message
In , Longerdev (longerdev) wrote :

I tried this patch on linux-3.11_rc1, but when X starts I see:
791966 Jul 21 16:17:07 localhost kernel: [ 19.320879] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
791967 Jul 21 16:17:07 localhost kernel: [ 19.320948] IP: [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791968 Jul 21 16:17:07 localhost kernel: [ 19.320995] PGD b0d80067 PUD b0c18067 PMD 0
791969 Jul 21 16:17:07 localhost kernel: [ 19.321031] Oops: 0000 [#1] PREEMPT SMP
791970 Jul 21 16:17:07 localhost kernel: [ 19.321064] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec brcmsmac snd_hwdep snd_p cm cordic brcmutil bcma snd_page_alloc snd_timer snd soundcore
791971 Jul 21 16:17:07 localhost kernel: [ 19.321209] CPU: 0 PID: 2696 Comm: X Not tainted 3.11.0-rc1 #1
791972 Jul 21 16:17:07 localhost kernel: [ 19.321249] Hardware name: SAMSUNG ELECTRONICS CO., LTD. SF311/SF411/SF511/SF311/SF411/SF511, BIOS 06HW.M011.20110503.SCY 05 /03/2011
791973 Jul 21 16:17:07 localhost kernel: [ 19.321322] task: ffff8800b1c07590 ti: ffff8800b0c24000 task.ti: ffff8800b0c24000
791974 Jul 21 16:17:07 localhost kernel: [ 19.321370] RIP: 0010:[<ffffffff8136bfc0>] [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791975 Jul 21 16:17:07 localhost kernel: [ 19.321426] RSP: 0018:ffff8800b0c25bc8 EFLAGS: 00010286
791976 Jul 21 16:17:07 localhost kernel: [ 19.321461] RAX: 0000000000000000 RBX: ffff8800b1c3d4d8 RCX: 0000000000027330
791977 Jul 21 16:17:07 localhost kernel: [ 19.321506] RDX: 0000000000000080 RSI: ffffc900045c003c RDI: ffffc900045c0038
791978 Jul 21 16:17:07 localhost kernel: [ 19.321550] RBP: ffff8800b0c25c08 R08: ffff8800b0d97f00 R09: 00000000000145c0
791979 Jul 21 16:17:07 localhost kernel: [ 19.321594] R10: 0000000000001000 R11: ffff8800b1c3c000 R12: 0000000000000000
791980 Jul 21 16:17:07 localhost kernel: [ 19.321638] R13: 0000000000002044 R14: 0000000000000000 R15: ffff8800b1c3c000
791981 Jul 21 16:17:07 localhost kernel: [ 19.321682] FS: 00007ff167ae8880(0000) GS:ffff880100200000(0000) knlGS:0000000000000000
791982 Jul 21 16:17:07 localhost kernel: [ 19.321732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
791983 Jul 21 16:17:07 localhost kernel: [ 19.321767] CR2: 0000000000000010 CR3: 00000000b1cc9000 CR4: 00000000000407f0
791984 Jul 21 16:17:07 localhost kernel: [ 19.321810] Stack:
791985 Jul 21 16:17:07 localhost kernel: [ 19.321824] ffff8800b1c3d4d8 0000000000000000 ffff8800aff24000 0000000000000000
791986 Jul 21 16:17:07 localhost kernel: [ 19.321876] ffff8800b1c3c000 ffff8800b0d97f00 ffff8800b1f66a00 ffff8800b1c3d4d8
791987 Jul 21 16:17:07 localhost kernel: [ 19.321927] ffff8800b0c25c68 ffffffff81334b11 ffff880000000028 0000000000000000
791988 Jul 21 16:17:07 localhost kernel: [ 19.321979] Call Trace:
791989 Jul 21 16:17:07 localhost kernel: [ 19.322000] [<ffffffff81334b11>] __i915_add_request+0x6d/0x215
791990 Jul 21 16:17:07 localhost kernel: [ 19.322045] [<ffffffff8133b8d9>] i915_gem_do_execbuffer.isra.14+0xd07/0xdc5
791991 Jul 21 16:17:07 localhost kernel: [ 19.322089] [<ffffffff8133bd5e>] ? i915_gem_execbuffer2+0x5d/0x1e3
791992 Jul 21 1...

Revision history for this message
In , Chris Wilson (ickle) wrote :

Created attachment 82768
New read-after-write patch

Oops, my mistake, please try again.

Revision history for this message
In , Longerdev (longerdev) wrote :

Created attachment 82773
i915_error_state with new patch

(In reply to comment #92)
> Created attachment 82768 [details] [review]
> New read-after-write patch
>
> Oops, my mistake, please try again.

Now it loads, but after five minutes of testing:
793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 ctx 1) at 0xbfe21dc

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #93)
> Created attachment 82773 [details]
> i915_error_state with new patch
>
> (In reply to comment #92)
> > Created attachment 82768 [details] [review] [review]
> > New read-after-write patch
> >
> > Oops, my mistake, please try again.
>
> Now loading, but after five minutes test:
> 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> ctx 1) at 0xbfe21dc

That is a blorp (mesa/i965) bug and not the semaphore deadlock.
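The distinction drawn in this thread between the semaphore hang and other hangs can be turned into a rough first-pass filter over dmesg lines. The sketch below is only a heuristic based on the messages quoted in these comments: classify_hang and its labels are hypothetical, the mapping of "hung inside bo" to a userspace (Mesa) batch problem follows the diagnosis above for this particular log, and only the i915_error_state can really settle which bug a given hang is.

```python
def classify_hang(dmesg_lines):
    """Heuristically label a hang from i915 dmesg lines.

    Returns one of:
      "semaphore" - the ring kicker reported a stuck semaphore
      "in-batch"  - the ring hung inside a batch buffer (possibly Mesa)
      "unknown"   - a hang was reported without either hint
      "none"      - no hang message found
    """
    text = "\n".join(dmesg_lines)
    if "Kicking stuck semaphore" in text:
        return "semaphore"
    if "hung inside bo" in text:
        return "in-batch"
    if "GPU hung" in text or "stuck on" in text:
        return "unknown"
    return "none"


log = [
    "[drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring",
    "[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 ctx 1) at 0xbfe21dc",
]
print(classify_hang(log))
```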

Revision history for this message
Tomasz Melcer (liori) wrote :

I can reproduce this problem fairly regularly on a certain game inside wine (and nowhere else) on my up-to-date Debian Sid box. If there's any solution/patch to test and there's any benefit in testing it on a non-Ubuntu system, I volunteer to help.

Revision history for this message
In , Chris Wilson (ickle) wrote :

Will someone please try https://bugs.freedesktop.org/attachment.cgi?id=82768 with a working mesa! :)

Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

The patch seems to have helped -- my box survived a couple days with the patch applied.

Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Revision history for this message
In , Chris Wilson (ickle) wrote :

The bad news is that I've just had the semaphore hang with the read-after-write patch applied. :|

Revision history for this message
In , Januszmk6 (januszmk6) wrote :

(In reply to comment #94)
> (In reply to comment #93)
> > Created attachment 82773 [details]
> > i915_error_state with new patch
> >
> > (In reply to comment #92)
> > > Created attachment 82768 [details] [review] [review] [review]
> > > New read-after-write patch
> > >
> > > Oops, my mistake, please try again.
> >
> > Now loading, but after five minutes test:
> > 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel
> > 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting
> > 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085]
> > [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> > 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing
> > error event; look for more information in
> > /sys/kernel/debug/dri/0/i915_error_state
> > 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124]
> > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000
> > ctx 1) at 0xbfe21dc
>
> That is a blorp (mesa/i965) bug and not the semaphore deadlock.

Could you please provide a link to this blorp bug report?
I had the semaphore deadlock problem; it seems that with kernel 3.11 the problem does not occur (without the patch), but now I have:

[22221.843000] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[22221.843483] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4dfb5000 ctx 1) at 0x4dfb5518

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 68913 has been marked as a duplicate of this bug. ***

description: updated
Revision history for this message
In , Dan Doel (dan-doel) wrote :

I have, I think, a reliable way to trigger this behavior, if that helps. It requires a non-trivial setup, though.

I have gnome-shell running on dual monitors. The first is 1920x1200, the second is 1920x1080 (not sure if the resolution difference matters). If I run a full-screen game on the 1920x1200 monitor, I get freezes, and notes in the dmesg about hangcheck timers and kickrings ("stuck wait on blitter ring").

I believe OpenGL acceleration of the desktop is important, because the freezes are not triggered in fluxbox, for instance. I'm not sure if the game itself needs to be using OpenGL, or if the full-screen window is the triggering factor, or something else entirely. It is important that the game keep the monitors distinct, and only go full screen on one. I just tried it on Battle for Wesnoth, and full screen there sets the monitors to mirror, which doesn't trigger the problem.

This is on an i7 4770, if that matters.

I realize this may be difficult to put together for a test setup, but I thought I'd mention it.

Revision history for this message
In , Januszmk6 (januszmk6) wrote :

(In reply to comment #100)
> I have, I think, a reliable way to trigger this behavior, if that helps. It
> requires a non-trivial setup, though.
>
> I have gnome-shell running on dual monitors. The first is 1920x1200, the
> second is 1920x1080 (not sure if the resolution difference matters). If I
> run a full-screen game on The 1920x1200 monitor, I get freezes, and notes in
> the dmesg about hangcheck timers and kickrings ("stuck wait on blitter
> ring").
>
> I believe OpenGL acceleration of the desktop is important, because the
> freezes are not triggered in fluxbox, for instance. I'm not sure if the game
> itself needs to be using OpenGL, or if the full-screen window is the
> triggering factor, or something else entirely. It is important that the game
> keep the monitors distinct, and only go full screen on one. I just tried it
> on Battle for Wesnoth, and full screen there sets the monitors to mirror,
> which doesn't trigger the problem.
>
> This is on an i7 4770, if that matters.
>
> I realize this is may be difficult to put together for a test setup, but I
> thought I'd mention it.

I also have dual monitors and gnome-shell, but both of mine are 1920x1080. I notice that this happens more often when I am watching videos full screen on one monitor (it still happens during non-full-screen work).

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #100)
> This is on an i7 4770, if that matters.

No, that's something completely new. Please open a new bug report and attach your dmesg, Xorg.0.log and /sys/class/drm/card0/error from after one of the hangs.

Revision history for this message
Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

I filed the duplicate of this bug 1222261 as I get this frequently on both my i7 laptop and i7 desktop.

I think I found a way to reproduce it easily on demand. Open chromium, and visit google maps. Sign up for "new maps". Visit somewhere on the map and zoom all the way in then zoom in and out a bit. I have triggered this bug a few times just by doing that. I'm not using dual screens or anything funky. Just a basic Ubuntu 13.10 install with Unity.

Revision history for this message
Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

Forgot to mention, I am using the semaphores kernel parameter...

alan@wopr:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.11.0-4-generic root=UUID=3d53ca44-6af4-422d-b1b0-adf30c679a2f ro quiet splash i915.semaphores=0 vt.handoff=7
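A quick way to confirm the same thing from a script is to parse the command line for the parameter. This is a minimal sketch: kernel_param is a hypothetical helper, it treats the last occurrence of a parameter as the effective one (an assumption), and it only inspects the string it is given, so on a real system you would pass it the contents of /proc/cmdline.

```python
def kernel_param(cmdline: str, name: str):
    """Return the value of `name` from a kernel command line, or None.

    Flags without a value (e.g. "quiet") yield an empty string.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""  # last occurrence wins
    return value


# Command line copied from the output above.
CMDLINE = (
    "BOOT_IMAGE=/boot/vmlinuz-3.11.0-4-generic "
    "root=UUID=3d53ca44-6af4-422d-b1b0-adf30c679a2f ro quiet splash "
    "i915.semaphores=0 vt.handoff=7"
)

print(kernel_param(CMDLINE, "i915.semaphores"))
```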

Revision history for this message
Lorant Nemeth (loci) wrote : Re: [Bug 1041790] Re: [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

Hi,

I can confirm, that this way I'm able to reproduce the bug as well.

Br,
     Loci

On 09/11/2013 03:02 AM, Alan Pope ㋛ wrote:
> I filed the duplicate of this bug 1222261 as I get this frequently on
> both my i7 laptop and i7 desktop.
>
> I think I found a way to reproduce it easily on demand. Open chromium,
> and visit google maps. Sign up for "new maps". Visit somewhere on the
> map and zoom all the way in then zoom in and out a bit. I have triggered
> this bug a few times just by doing that. I'm not using dual screens or
> anything funky. Just a basic Ubuntu 13.10 install with Unity.
>

Revision history for this message
bharath (bharath1097) wrote :

(In reply to #163)

I can reproduce this bug as well on a sandybridge i5 desktop

Revision history for this message
mike@papersolve.com (mike-papersolve) wrote :

I can also reproduce the bug using the new Google Maps (easiest way to do so seems to be zooming in/out). However it occurs for me even after I disable semaphores:

root@hawty:~# cat /sys/module/i915/parameters/semaphores
0

(after that):
[614445.590357] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[614445.590468] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5268000 ctx 10) at 0x5268210

this is ubuntu 13.04 with 2.21.6-0ubuntu4 intel driver, but kernel 3.11.1.

Revision history for this message
Chris Wilson (ickle) wrote :

That's because that would NOT be the same bug.

Revision history for this message
mike@papersolve.com (mike-papersolve) wrote :

Sorry. :) Unfortunately/fortunately I can no longer reproduce it after doing a BIOS update on my ASUS motherboard (was about 18 months behind). Based on your description of the bug as possibly related to the BIOS I decided to do that update, and even though it's only been about 10 minutes of usage, I can't reproduce this in Google Maps at all, where it was sure to cause the bug. I'd certainly recommend that anyone having this issue investigate a BIOS upgrade.

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

Created attachment 87101
i915_error_state (kernel 3.11.3)

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

After playing hedgewars for about half an hour, the gpu started to hang.
dmesg output:
[ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3442.907471] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5e52000 ctx 1) at 0x5e52220
[ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
I'm not sure my problem is related to this bug.

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

(In reply to comment #104)
> After playing hedgewars for about half an hour, the gpu started to hang.
> dmesg output:
> [ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3442.907471] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x5e52000 ctx 1) at 0x5e52220
> [ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
> [ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for
> forcewake old ack to clear.
> I'm not sure my problem is related to this bug.

My laptop is Thinkpad T420 with i5-2520M. The BIOS version is 1.44.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #104)
> I'm not sure my problem is related to this bug.

Most likely it isn't - gpu hang is similar to an application crashing. Please file a new bug report and don't forget to attach the error state file. That's the first thing we need to triage the bug.

And of course list the versions of all the userspace driver parts (mesa, ddx, ...) since like a normal application crash most often it's not a kernel bug, but a bug in the render commands submitted by userspace to the gpu.
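
As a rough aid for collecting the error state before filing, the following Python sketch tries the two locations that the dmesg messages quoted in this report point at and returns the first one that is readable and non-empty. The helper name `read_error_state` is hypothetical, which path exists depends on the kernel version, and reading either one usually needs root.

```python
from pathlib import Path

# Both paths are taken from dmesg output quoted in this report; which one
# exists depends on the kernel version.
ERROR_STATE_CANDIDATES = (
    "/sys/class/drm/card0/error",
    "/sys/kernel/debug/dri/0/i915_error_state",
)


def read_error_state(candidates=ERROR_STATE_CANDIDATES):
    """Return (path, text) for the first readable, non-empty file, else None."""
    for candidate in candidates:
        try:
            text = Path(candidate).read_text(errors="replace")
        except OSError:
            continue  # missing file, or not readable without root
        if text.strip():
            return candidate, text
    return None
```

A `None` result means there is nothing useful to attach yet; the returned text can otherwise be saved to a file and attached to the new report together with dmesg and the package versions.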

Revision history for this message
In , Longerdev (longerdev) wrote :

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Why can userspace drivers break rendering and trigger an error in the kernel part of the driver? Maybe a "filter" could be added for the submitted commands, so that bad ones are ignored (or handled some other way, but not executed)?

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #107)
> (In reply to comment #106)
> > (In reply to comment #104)
> > > I'm not sure my problem is related to this bug.
> >
> > Most likely it isn't - gpu hang is similar to an application crashing.
> > Please file a new bug report and don't forget to attach the error state
> > file. That's the first thing we need to triage the bug.
> >
> > And of course list the versions of all the userspace driver parts (mesa,
> > ddx, ...) since like a normal application crash most often it's not a kernel
> > bug, but a bug in the render commands submitted by userspace to the gpu.
>
> Why userspace drivers can breaking render and calling error in kernel part
> of driver? May be can add "filter" sent commands and ignore (or other
> reaction, but not execute their) their?

The GPU is a full Turing complete computational engine (in fact, lots of them coupled in parallel and in series), see http://xkcd.com/1266/

Revision history for this message
Shuhao (shuhao) wrote :

Has anyone here noticed graphics slowing down? After a couple of minutes of gameplay in CSS, my framerate drops to about 20fps (other activities on the computer also slow down).

This did not happen in 13.04 but is occurring with 13.10.

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

(In reply to comment #106)
> (In reply to comment #104)
> > I'm not sure my problem is related to this bug.
>
> Most likely it isn't - gpu hang is similar to an application crashing.
> Please file a new bug report and don't forget to attach the error state
> file. That's the first thing we need to triage the bug.
>
> And of course list the versions of all the userspace driver parts (mesa,
> ddx, ...) since like a normal application crash most often it's not a kernel
> bug, but a bug in the render commands submitted by userspace to the gpu.

Someone has reported it here.
https://bugs.freedesktop.org/show_bug.cgi?id=70151

Revision history for this message
In , Honza-h (honza-h) wrote :

Hello. Same problem here.

[ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8
[ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8
[ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220

sys: fedora 19 64b
Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

WM: KDE with effects enabled

8G ram
300G SATA HDD
ntb Lenovo ThinkPad E320

problem occurs in:
- scrolling in firefox
- playing video in vlc and switch to KDE terminal or another app
- sometimes system hangs, cpu 100%, freeze and hard reboot needed
- sometimes happens if I work with ff or in terminal only (very frustrating)
- happening across many kernel versions 3.0 to newest I think

lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #110)
> Hello. Same problem here.
>
> [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 485.443467] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0xa637000 ctx 1) at 0xa6371c8

Unlikely that this is the same gpu hang. Please file a new bug report and attach the error state.

Revision history for this message
In , theghost (theghost) wrote :

Just a few remarks.
I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups highly increased (especially in games).
Additionally with running the latest drivers complete system lockups are gone, but it's still a lockup for multiple seconds with following VT switching.
Maybe these observations help somehow.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #112)
> Just a few remarks.
> I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
> Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups
> highly increased (especially in games).

On snb the blorp engine in mesa has become a bit more hang-happy, see bug #70151
Not all gpu hangs are created equal ;-)

> Additionally with running the latest drivers complete system lockups are
> gone, but it's still a lockup for multiple seconds with following VT
> switching.

You mean a gpu hang happens while doing a vt switch?

Revision history for this message
In , theghost (theghost) wrote :

(In reply to comment #113)
> On snb the blorp engine in mesa has become a bit more hang-happy, see bug
> #70151
> Not all gpu hangs are created equal ;-)
>

Actually it was on Sandybridge.

> You mean a gpu hang happens while doing a vt switch?

No I meant, if you suffer a lockup you just have to wait a few seconds and switch to another VT and back, then you can resume with your system (although sometimes fonts are broken).

Revision history for this message
In , Alexander (bay-hackerdom) wrote :

Created attachment 87857
i915_error_state

I also hit this bug while I was watching video in mplayer. It happens every 1-2 hours.

[40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[40787.765852] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1fb63000 ctx 1) at 0x1fb63220

Revision history for this message
In , Alexander (bay-hackerdom) wrote :

Created attachment 87858
X -version output

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #115)
> Created attachment 87857 [details]
> i915_error_state
>
> I also met this bug while I was watching video in mplayer. It every 1-2
> hours.
>
> [40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [40787.765852] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x1fb63000 ctx 1) at 0x1fb63220

This looks like bug #70151, but is definitely not this bug here.

Revision history for this message
Thomas Mayer (thomas303) wrote :

It seems that ubuntu 12.04.3 is also affected.

I get the error using ubuntu 12.04.3 (after upgrading from 12.04.2 in the last days):
Oct 28 18:43:29 localhost kernel: [31236.041655] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Oct 28 18:43:29 localhost kernel: [31236.041664] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Oct 28 18:43:32 localhost kernel: [31239.040790] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Oct 28 18:43:32 localhost kernel: [31239.041127] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Oct 28 18:43:32 localhost kernel: [31239.041132] [drm:i915_reset] *ERROR* Failed to reset chip.
Oct 28 18:43:38 localhost gnome-session[3983]: WARNING: App 'gnome-wm.desktop' respawning too quickly
Oct 28 18:43:38 localhost gnome-session[3983]: CRITICAL: We failed, but the fail whale is dead. Sorry....

Kernel version:
3.8.0-32-generic #47~precise1-Ubuntu SMP Wed Oct 2 16:19:35 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

xserver-xorg-video-intel-lts-raring version 2:2.21.6-0ubuntu4.1~precise1

For me the error occurs when I move the mouse cursor in the PhpStorm IDE, which is based on oracle java (I use version 1.8). The error occurs every few hours when working with PhpStorm 7.0

Revision history for this message
theghost (theghost) wrote :

@thomas303: Since 12.04.3 uses the video stack and kernel of Raring it's no wonder that it's also affected.
If you didn't have the errors before 12.04.3 you can still revert to the video stack / kernel of 12.04.2 (Quantal) or 12.04 (Precise).

Revision history for this message
Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

I have been running kernel 3.12.0-031200rc6-generic for a while now and in 8 days uptime I haven't had any lockups that I recall. Previously on older kernels on 13.10 I would get more than one lockup a day, sometimes many a day.

Revision history for this message
theghost (theghost) wrote :

@popey:

I tested kernel 3.12.0-031200rc7-generic with Mesa 9.2.2 and xf86-video-intel-2.99.905 running Dota 2, which is a useful test case for producing hangs, and I can assure you that there are still plenty of lockups. Only the output differs now:

[ 2937.818867] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1c703000 ctx 1) at 0x1c7037e0
[ 2943.810157] [drm] stuck on render ring
[ 2943.810208] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x7a57000 ctx 1) at 0x5c
[ 3152.914976] [drm] stuck on render ring
[ 3152.915045] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x7549000 ctx 1) at 0x7549288
[ 3568.158967] [drm] stuck on render ring
[ 3568.174992] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x12d63000 ctx 1) at 0x12d637e0
[ 3568.175030] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x1b78f000 ctx 1) at 0x12d637e0
[ 3839.310462] [drm] stuck on render ring
[ 3839.310463] [drm] stuck on blitter ring
[ 4292.575683] [drm] stuck on render ring
[ 4292.575684] [drm] stuck on blitter ring

So it's still in the kernel. ;)

Revision history for this message
In , Yjcoshc (yjcoshc) wrote :

Created attachment 89314
i915_error_state (kernel 3.11.6, mesa 9.2.2, xf86-video-intel 2.99.906)

GPU hangs after playing hedgewars for a few minutes. Thinkpad T420 laptop, i5-2520M.
dmesg error message:
[16901.286432] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16901.286441] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[16901.286444] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[16908.287504] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16908.287508] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring

Revision history for this message
theghost (theghost) wrote :

If you have these problems running Dota 2, you should try Mesa Git or wait for Mesa 10. It contains several patches to remove lockups.
For me on Dota 2 the lockups are completely gone, probably they're also gone in other applications.

Revision history for this message
In , Kenxeth (kenxeth) wrote :

*** Bug 71890 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 72048 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 72829 has been marked as a duplicate of this bug. ***

Revision history for this message
penalvch (penalvch) wrote :

Rocko, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux REPLACE-WITH-BUG-NUMBER

Please note, given that the information from the prior release is already available, doing this on a release prior to the development one would not be helpful.

If reproducible, could you also please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc7

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

description: updated
tags: added: bios-outdated-a12
Changed in linux (Ubuntu):
importance: High → Low
status: Confirmed → Incomplete
tags: added: needs-upstream-testing regression-potential
Revision history for this message
Hamish MacEwan (hamish-macewan) wrote : Re: [Bug 1041790] Re: [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

On 8 January 2014 07:20, Christopher M. Penalver
<email address hidden> wrote:

> Rocko, this bug was reported a while ago and there hasn't been any
> activity in it recently. We were wondering if this is still an issue? If
> so, could you please test for this with the latest development release
> of Ubuntu? ISO images are available from
> http://cdimage.ubuntu.com/daily-live/current/ .

Hi Christopher, I'm not Rocko, but haven't had any trouble with this
bug on Debian of late.

Hamish.
--
http://About.me/Hamish.MacEwan

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 73659 has been marked as a duplicate of this bug. ***

Revision history for this message
In , A-bugzilla (a-bugzilla) wrote :

Created attachment 92710
i915_error_state

I'm also getting regular Sandybridge GPU lockups with Mesa 10.0.1 and Linux kernel 3.13.

dmesg output:

[ 918.876872] [drm] stuck on render ring
[ 918.876876] [drm] stuck on blitter ring
[ 918.876878] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 918.876879] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 918.876879] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 918.876880] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 918.876880] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 932.923240] [drm] stuck on render ring
[ 932.923242] [drm] stuck on blitter ring

Unfortunately the crash dump doesn't help - it's an empty file!

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74180 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74265 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74452 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74473 has been marked as a duplicate of this bug. ***

Revision history for this message
Adam Conrad (adconrad) wrote :

This is still happening (although very infrequently) on current trusty. I just hit it this morning.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 74867 has been marked as a duplicate of this bug. ***

Revision history for this message
Steven Goris (sg-steven13) wrote :

I experience this bug on Linux Mint 16 Cinnamon. It drives me crazy. My computer hangs approx every 2 hours. I tried the fix in grub. I hope it works as a temporary fix, because I can't work like this on my computer.
Linux 3.11.0-15-generic

no longer affects: linuxmint
Revision history for this message
unksi (unksi) wrote :

I found it a big help to switch to tty1 with ctrl+alt+F1 and then back with ctrl+alt+F7/F8. This makes it return to normal a lot faster, and unsticks it most of the times it seems to be totally stuck.
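
The same round trip can be sketched in Python, assuming the `chvt` utility from the kbd package is installed and the script runs with enough privileges (typically root), for example over ssh while the desktop is frozen. The function name `vt_roundtrip` and the default VT numbers are assumptions; X may sit on VT 7 or VT 8 depending on the setup.

```python
import subprocess
import time


def vt_roundtrip(away=1, back=7, pause=1.0, run=subprocess.run):
    """Switch to VT `away`, wait `pause` seconds, then switch to VT `back`.

    `run` is injectable so the command sequence can be checked without
    actually switching consoles.
    """
    run(["chvt", str(away)], check=True)
    time.sleep(pause)
    run(["chvt", str(back)], check=True)
```

This only automates the keyboard workaround described above. It does not address the hang itself, and it cannot help if the machine is too wedged to run the script.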

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 75163 has been marked as a duplicate of this bug. ***

Revision history for this message
Spect (al106208) wrote :

I experience this bug on Ubuntu 12.04.4.
System: Ubuntu 12.04.4 LTS x86_64
Kernel: 3.11.0-17-generic DE: Unity Session: ubuntu
Use: xserver-xorg-video-intel-lts-saucy vers.2:2.99.904-0ubuntu2.1~precise1
----------------------------------
Processor: Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz Memory (Gb): 7.53
Video: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) Subsystem: Gigabyte Technology Co., Ltd Device d000 Kernel driver in use: i915
----------------------------------
kern.log:
Feb 28 15:24:44 specttop kernel: [42677.565850] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1c85f000 ctx 1) at 0x1c85f220
Feb 28 15:24:44 specttop kernel: [42677.565904] [drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x4d8f000 ctx 0) at 0x1c85f220

Revision history for this message
In , Simtn (simtn) wrote :

Created attachment 95090
Another version of the same hang - directed here from bug 75502

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 75999 has been marked as a duplicate of this bug. ***

Changed in xserver-xorg-video-intel:
status: Confirmed → In Progress
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76408 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76677 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 76801 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

For what it's worth, running 3.13.7 greatly mitigates this bug, to where the dead time is barely noticeable. It happened three times in short order here and I didn't notice any of them:

[ 4562.551141] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4582.530028] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4633.476199] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77043 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77058 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

My stuck ring faults are completely gone with i915.i915_enable_rc6=0. The only side effect seems to be that the fan stays on a bit more (subjectively). HP Pavilion dv6 (Sandybridge).
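
To try this parameter persistently, it can be added to the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub in the same way as the i915.semaphores=0 workaround in the bug description. The Python sketch below edits such a line as a string; `add_kernel_param` is a made-up helper, it does not touch any file itself, and the parameter name is the one reported in this comment for kernels of that era.

```python
import re


def add_kernel_param(line, param):
    """Append `param` to a GRUB_CMDLINE_LINUX_DEFAULT="..." line if missing.

    Lines that do not match are returned unchanged. Trailing whitespace on a
    matching line is dropped.
    """
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")\s*', line)
    if match is None:
        return line  # not the line we are looking for
    prefix, params, suffix = match.groups()
    tokens = params.split()
    if param not in tokens:
        tokens.append(param)
    return prefix + " ".join(tokens) + suffix
```

After writing the changed line back, `sudo update-grub` and a reboot are still needed, as in the original workaround.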

Revision history for this message
In , Chris Wilson (ickle) wrote :

Oh that's interesting. We might be able to find a register to prevent rc6 whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just frob forcewake directly.)

Revision history for this message
In , Phil Turmel (pturmel-lp) wrote :

(In reply to comment #139)
> Oh that's interesting. We might be able to find a register to prevent rc6
> whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just
> frob forcewake directly.)

Happy to test patches. I'm updating to 3.13.9 tonight. I could add something on top if you have ideas. If you need more info than my attachment to #76801 just let me know.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77147 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 77974 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 78317 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Hoek (artjom-simon) wrote :

Created attachment 98589
Kernel 3.14.2-1-ARCH, xf86-video-intel 2.99.911-2, mesa 10.1.2-1

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 78785 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 79500 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 79640 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

commit ca79d888eb63cdacf80653ae23ce8f7d9ac52c68
Author: Chris Wilson <email address hidden>
Date: Fri Jun 6 10:22:29 2014 +0100

    drm/i915: Reorder semaphore deadlock check

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80055 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80125 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80168 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80401 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80592 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 80935 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81064 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

Can someone indicate what the current status of this is?

Revision history for this message
In , Yunloh (yunloh) wrote :

I haven't seen it with xorg-x11-drv-intel-2.99.912-4 (built for fc20) from kojipkgs.

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

I'm using 2.21.15 which as far as I know is the latest release.

Revision history for this message
In , Andre Robatino (robatino) wrote :

I am seeing

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

followed by a graphics freeze and the need to reboot (if I can) in Fedora 20 with the latest updates including the 3.15.4 kernel.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81402 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Matteo Croce (teknoraver) wrote :

The same happens with 3.15.0 on Ubuntu 14.04 64-bit.

Jul 11 12:43:41 localhost kernel: [42049.462542] [drm] stuck on render ring
Jul 11 12:43:41 localhost kernel: [42049.463330] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:41 localhost kernel: [42049.463334] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 11 12:43:41 localhost kernel: [42049.463335] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 11 12:43:41 localhost kernel: [42049.463336] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 11 12:43:41 localhost kernel: [42049.463337] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 11 12:43:41 localhost kernel: [42049.463338] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 11 12:43:43 localhost kernel: [42051.464623] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:47 localhost kernel: [42055.468816] [drm] stuck on render ring
Jul 11 12:43:47 localhost kernel: [42055.469614] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
Jul 11 12:43:49 localhost kernel: [42057.470899] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
Jul 11 12:43:53 localhost kernel: [42061.439056] [drm] stuck on render ring
Jul 11 12:43:53 localhost kernel: [42061.439867] [drm] GPU HANG: ecode 0:0xfeffffff, in chrome [2172], reason: Ring hung, action: reset
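
When a log contains many of these resets, it can help to count them per process before filing. The Python sketch below matches the "GPU HANG" line format exactly as it appears in the kernel 3.15 output above; other kernel versions print different messages and will not match. The names `HANG_RE` and `count_hangs` are made up for this illustration.

```python
import re
from collections import Counter

# Pattern for lines such as:
# [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def count_hangs(log_text):
    """Count 'GPU HANG' lines per process name in dmesg or syslog text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            counts[match.group("process")] += 1
    return counts
```

The process named in the log is only the one whose commands were on the ring when the hang was detected, so a high count points at a reproducer rather than at the faulty component.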

Revision history for this message
In , Cwawak (cwawak) wrote :

[872948.822279] [drm] stuck on render ring
[872948.822291] [drm] stuck on blitter ring
[872948.823041] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [30647], reason: Ring hung, action: reset
[872948.823045] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[872948.823046] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[872948.823047] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[872948.823048] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[872948.823049] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[872948.823168] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[872950.821912] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Linux bobloblaw 3.15.0-1.fc20.x86_64 #1 SMP Sat Jun 14 11:22:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Attaching gpu crash dump as card0-error.071714-cwawak
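
For anyone collecting dumps the same way, here is a minimal sketch of saving the error state under a dated name. Only the source path /sys/class/drm/card0/error comes from the kernel message above; the function name `save_error_state`, the `tag` argument and the destination directory are illustrative assumptions. Reading that file usually requires root.

```python
import time
from pathlib import Path

# Path reported by the kernel message; everything else here is an assumption.
ERROR_PATH = Path("/sys/class/drm/card0/error")


def save_error_state(dest_dir, tag, src=ERROR_PATH, now=None):
    """Copy the GPU error state to dest_dir/card0-error.<MMDDYY>-<tag>.

    Returns the path that was written. The file is read in full because
    sysfs entries do not report a meaningful size.
    """
    stamp = time.strftime("%m%d%y", time.localtime(now))
    dest = Path(dest_dir) / f"card0-error.{stamp}-{tag}"
    dest.write_bytes(Path(src).read_bytes())
    return dest
```

Calling `save_error_state("/tmp", "cwawak")` as root right after a hang would produce a file named in the same style as the attachment mentioned above.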

Revision history for this message
In , Cwawak (cwawak) wrote :

Created attachment 102991
card0-error.071714-cwawak - gpu dump

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81673 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81676 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81710 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81844 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 81990 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82277 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82301 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82399 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82451 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mstahl (mstahl) wrote :

*** Bug 82620 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82631 has been marked as a duplicate of this bug. ***

Timo Aaltonen (tjaalton)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: Timo Aaltonen (tjaalton) → nobody
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82666 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82691 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 82901 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83098 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83156 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83326 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83473 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83661 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Manuel Widmer (m-widmer-d) wrote :

Is there any ongoing development to fix this bug? I still see it with
Linux <hostname> 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

And the latest intel drivers as provided by intel linux graphics installer from
https://01.org/linuxgraphics/

My system often freezes a few minutes after I start watching a movie with vlc. My screen is connected to the Linux system through a receiver (HDMI for audio + video). A freeze is more likely when the HDMI receiver has been powered off for some time before playing the movie than when I reboot with HDMI on the whole time.

I'm happy to help with crashdumps as far as I'm able to collect them.

Revision history for this message
In , Bartosz Brachaczek (b-brachaczek) wrote :

(In reply to comment #183)

I recommend configuring i915.semaphores=0. I did it and it doesn't freeze anymore.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83721 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 83783 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Frank Stephan (f-st) wrote :

Hi Chris,

meanwhile my current kernel is 3.16.1-46.1.g90bc0f1
I'm surprised (after a reinstall) that the semaphore bug hasn't occurred yet, whereas it did before (after a fresh install).

This leads me to 4 possible reasons:

1. the named kernel revision somehow contains a fix for it. Looking at the changes, I couldn't find confirmation of that assumption.
2. cgroup_memory=disabled has a relation to it. (That's why I removed it for now).
3. the BIOS settings (which could be different now) might have something to do with it.
4. I haven't installed KVM support yet.

I'll post again if I find a reproducible explanation.
Frank

Revision history for this message
In , Frank Stephan (f-st) wrote :

2. of course I meant cgroup_disable=memory

Revision history for this message
In , Frank Stephan (f-st) wrote :

Hi Chris,

OK, nothing of the above was the reason. In my case it's simply this:

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "TearFree" "true"
EndSection

I added it when the tearing while scrolling through large webpages annoyed me.
As soon as I added it, the problems quickly started.

Self-made problem.

Frank
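
To check whether a configuration like the one above is active on another machine, a rough sketch such as the following can list the Option lines in an xorg.conf.d snippet. The helper name `device_options` is hypothetical, and this is a simple line-based scan under the assumption that options are written one per line; it is not a full xorg.conf parser and it ignores section boundaries.

```python
import re

# Matches lines such as:  Option "TearFree" "true"
# Lines starting with '#' do not match because "Option" must come first.
OPTION_RE = re.compile(r'^\s*Option\s+"([^"]+)"(?:\s+"([^"]*)")?', re.IGNORECASE)


def device_options(conf_text):
    """Return {option_name: value} for every Option line in conf_text.

    An option written without a value is stored with the value None.
    """
    options = {}
    for line in conf_text.splitlines():
        m = OPTION_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
    return options
```

Feeding it the contents of /etc/X11/xorg.conf.d/20-intel.conf shown above would yield a dictionary with a single "TearFree" entry.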

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #189)
> Hi Chris,
>
> OK, nothing of the above was the reason. In my case it's simply this:
>
> /etc/X11/xorg.conf.d/20-intel.conf
>
> Section "Device"
> Identifier "Intel Graphics"
> Driver "intel"
> Option "TearFree" "true"
> EndSection
>
>
> I added it when the tearing scrolling through large webpages annoyed me.
> As soon as I added it, the problems quickly started.
>
> Selfmade problem.

Not really, https://bugs.freedesktop.org/show_bug.cgi?id=70764 tracks that this hang is more likely with TearFree (fundamentally the hang is still the same hardware issue, but it is interesting that TearFree has a higher chance of hitting it).

If you want to experiment:

 http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=requests

should have an interesting fix, at least for trying to prevent TearFree from leading to the semaphore hang.

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

What information is most useful for these repeating issues, as it just happened again:

 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140239] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140750] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:33:01 arrowsmithlap1 kernel: [1182244.142445] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
 Sep 16 08:33:01 arrowsmithlap1 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

The only thing under my /etc/X11/xorg.conf.d/ is 00-keyboard.conf (system generated).

Do you want a copy of /sys/class/drm/card0/error every time?

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to comment #191)
> What information is most useful for these repeating issues, as it just
> happened again:
>
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> render ring
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> blitter ring

So long as it is the same event, there is no more information we need other than testing feedback for an eventual workaround.
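
As a rough aid for comparing log excerpts, the sketch below pulls the fields out of the "GPU HANG" summary line. The regular expression is written against the lines quoted in this report and is an assumption about their format; other kernel versions may word the line differently. A matching ecode is only a hint, since hangs with different ecodes have been attached to this same bug.

```python
import re
from collections import Counter

# Written against the "GPU HANG" lines quoted in this report (assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if absent."""
    m = HANG_RE.search(line)
    return m.groupdict() if m else None


def count_ecodes(lines):
    """Count how often each ecode appears across an iterable of log lines."""
    hangs = (parse_gpu_hang(line) for line in lines)
    return Counter(h["ecode"] for h in hangs if h)
```

Running `count_ecodes` over the output of dmesg or a saved kernel log gives a quick tally of which ecodes recur.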

Revision history for this message
In , Manuel Widmer (m-widmer-d) wrote :

(In reply to comment #184)
> (In reply to comment #183)
>
> I recommend configuring i915.semaphores=0. I did it and it doesn't freeze
> anymore.

In the meantime I tested both i915.semaphores=0 and i915.semaphores=1; neither helped in my case. With i915.semaphores=0 my system became much more unstable and even crashed on its own after a few days without any graphics stress (I only ran desktop apps such as thunar, or vlc for music only, no movies). With i915.semaphores=1 the system is at least stable (for some weeks) as long as I don't use desktop applications heavily.

Revision history for this message
In , Mika-kuoppala (mika-kuoppala) wrote :

*** Bug 85194 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85333 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85609 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Josh Glover (jmglov) wrote :

I am also experiencing this, on a Gentoo system running on a ThinkPad T440s. I'm not doing anything related to XBMC, simply using xrandr for multihead. The interesting thing is that DRI works fine on my laptop screen (glxgears reports 60fps, which is the refresh rate of my screen), but breaks when I move a window trying to use DRI (e.g. Chrome, glxgears) to the external monitor connected to the mini Display Port output.

I see this stuff in dmesg:

[ 3561.424762] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3561.424770] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3561.424772] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3561.424774] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3561.424776] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3561.424778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 3566.422957] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3571.425143] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3575.423680] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring

Seems like the same issue. I'm trying to downgrade X, mesa, et al., to try and get the system back in working order.

Revision history for this message
In , Daniel-ffwll (daniel-ffwll) wrote :

*** Bug 79675 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 85972 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86058 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Fritsch-b (fritsch-b) wrote :

For those running Ubuntu, here is a build of a kernel based on 3.17.1 with the patches Chris Wilson wants you to test:

- Those patches have other regressions (so be careful to only test your specific issue).

https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb
https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb

Those kernels are based on: https://bugs.freedesktop.org/show_bug.cgi?id=83677#c35

Beware, don't switch VTs.

Revision history for this message
In , Tomas Huryn (thuryn1) wrote :

I've tried the mentioned kernel on my Fedora 21 Beta and it still hangs, for example after NetBeans opens its main window full screen.

Revision history for this message
In , Smruti-patil (smruti-patil) wrote :

*** Bug 86437 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86765 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86836 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 86925 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 87710 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 87776 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88541 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

*** Bug 88626 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88723 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 88789 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89078 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89299 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89570 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89671 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89774 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89771 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89981 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90106 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90146 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90271 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90473 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 90835 has been marked as a duplicate of this bug. ***

Revision history for this message
In , helios (martin-lichtvoll) wrote :

Chris, you referred me to this bug as I reported

Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring

I skimmed through it and it appears that there are some patches to test? But I am not sure which ones these are. Can you or someone else enlighten me?

Also I note that I still use

        Option "AccelMethod" "uxa"

and I have

martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
options i915 modeset=1 i915_enable_rc6=7

thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway.

I will remove the AccelMethod setting now and see whether it helps. If not, I will downgrade to 4.1-rc4 for now, as the issues have at least been much less frequent with it.

For me, 4.1-rc6 really does make things much *worse*. I am typing this after a clean reboot and have already hit the GPU hang again. It happens about every few minutes. Are you really sure this is the same GPU hang? I didn't have this before the 4.1 kernel.

Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to Martin Steigerwald from comment #225)
> Chris, you referred me to this bug as I reported
>
> Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> semaphore on render ring
>
> I skimmed through it and it appears that there are some patches to test? But
> I am not sure which ones these are. Can you or someone else enlighten me?

There's likely a modest improvement in 4.2.

> Also I note that I still use
>
> Option "AccelMethod" "uxa"
>
> and I have
>
> martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> options i915 modeset=1 i915_enable_rc6=7

Fortuitously that dangerous option doesn't do anything for your kernel.

>
> thus maximum energy saving. But according to powertop it never enters the
> highest sleep state anyway.
>
> I will remove the AccelMethod setting now and see whether it helps. If not,
> I downgrade to 4.1-rc4 for now, as issues have been at least much less
> frequent with it.

Purely circumstantial.

> And its really that for me 4.1-rc6 makes things much *worse*. I am typing
> this after a clean reboot and already got the GPU hang again. It happens
> about every few minutes. Are you really sure this is the same GPU hang? I
> didn´t have this before 4.1 kernel?

Yes.

Revision history for this message
In , helios (martin-lichtvoll) wrote :

(In reply to Chris Wilson from comment #226)
> (In reply to Martin Steigerwald from comment #225)
> > Chris, you referred me to this bug as I reported
> >
> > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> > semaphore on render ring
> >
> > I skimmed through it and it appears that there are some patches to test? But
> > I am not sure which ones these are. Can you or someone else enlighten me?
>
> There's likely a modest improvement in 4.2.

Nice.

> > Also I note that I still use
> >
> > Option "AccelMethod" "uxa"
> >
> > and I have
> >
> > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> > options i915 modeset=1 i915_enable_rc6=7
>
> Fortuitously that dangerous option doesn't do anything for your kernel.

Well, I found out why: it seems I compiled i915 into the kernel; at least I don't have an i915 module in lsmod. But i915.i915_enable_rc6=7 on the kernel command line does not seem to have any effect either. I removed the option.

> >
> > thus maximum energy saving. But according to powertop it never enters the
> > highest sleep state anyway.
> >
> > I will remove the AccelMethod setting now and see whether it helps. If not,
> > I downgrade to 4.1-rc4 for now, as issues have been at least much less
> > frequent with it.
>
> Purely circumstantial.

Since switching to SNA I haven't seen a GPU hang so far. It is too early to say for sure, but it seems something in UXA may have triggered it more easily.
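
On the question of whether an i915 option does anything: the parameters the running driver exposes are normally listed under /sys/module/i915/parameters/, whether the driver is built in or loaded as a module. The sketch below reads that directory; the function name `read_module_params` and its arguments are illustrative assumptions. A name missing from the result is a hint that the running kernel does not accept it, not proof, since a driver can declare parameters that are not exported to sysfs.

```python
from pathlib import Path


def read_module_params(module="i915", sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded or built-in kernel module.

    Returns an empty dict if the parameters directory does not exist.
    Parameters the current user cannot read are skipped.
    """
    params_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not params_dir.is_dir():
        return params
    for entry in sorted(params_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue
    return params
```

For example, `"semaphores" in read_module_params()` shows whether the semaphores parameter discussed in this report is exposed on the running kernel, and the returned value shows its current setting.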

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91212 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91662 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91810 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91832 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

(In reply to Chris Wilson from comment #192)
> (In reply to comment #191)
> > What information is most useful for these repeating issues, as it just
> > happened again:
> >
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> > render ring
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> > blitter ring
>
> So long as it is the same event, there is no more information we need other
> than testing feedback for an eventual workaround.

Is this the same bug?

$ journalctl -p 3 -b -1
Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
... [ repeated messages ] ...
Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:33 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
Ruj 25 02:13:34 crnigrom kernel: [drm:init_ring_common [i915]] *ERROR* render ring initialization failed ctl 00000000 (valid? 0) head 00000000 tail 00000000 start 00000000 [expected 00000000]
Ruj 25 02:13:34 crnigrom kernel: [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5
Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop

After which gnome crashes with "Oh No Something Is Wrong" screen

$ uname -r
4.1.7-200.fc22.x86_64

Hardware i3-2100 CPU/GPU

This bug has been going on for a long, long time. At least the computer no longer hard-freezes, although GNOME crashes, so any running GTK application that is doing something stalls.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92118 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92739 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191), was resolved by uninstalling various components, re-installing and updating them. I have a hunch (completely unproven) that it was a transparent bit-fail issue from the SSD. By un-installing and re-installing, the files were likely installed to a different location on the drive. It wasn't configuration, as I tried erasing, and even rolling back to defaults, with the problem still persisting. As it was almost daily, prior to uninstall, and hasn't happened since the install, this is all I can attribute it to.

HTH someone.

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119432
attachment-28908-0.html

I reported this bug from a system without an SSD. Recently, I have not
seen the kernel messages appear however--currently on linux 4.2.5.

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

(In reply to Jeffrey E. Bedard from comment #236)
> Created attachment 119432 [details]
> attachment-28908-0.html
>
> I reported this bug from a system without an SSD. Recently, I have not
> seen the kernel messages appear however--currently on linux 4.2.5.

Ah, let me clarify that earlier comment: I dd'd a failing spinning drive to an SSD. There was lots of clicking. Upgraded packages as they came in, but no change. Only the uninstall and re-install cleared the repeat button. :)

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119433
attachment-32271-0.html

I think this bug can be marked as closed with the latest linux/mesa/xorg
versions :)

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92927 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

Created attachment 120189
error state with 4.2 kernel

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93331 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93482 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93493 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89524 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93595 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93876 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93824 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 94057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , sander eikelenboom (b-linux) wrote :

Sorry to say, but:
Is there a way to get off the CC-list of this slightly depressing kind of "catch-all" bug ?
Unfortunately it doesn't seem to have been going anywhere for the last 3 to 4 years, except
for an endless stream of duplicates being appended.

--
Sander

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

(In reply to Sander Eikelenboom from comment #250)
> Is there a way to get off the CC-list of this slightly depressing kind of
> "catch-all" bug ?

CC list is at the top right corner. Choose the address, tick "Remove selected CCs", and hit Save Changes.

I've done this for you now.

Revision history for this message
In , fjgaude (tanzen) wrote : Re: [Bug 1041790]

Please take me off too.

frank

On 03/02/2016 03:33 AM, Jani-nikula wrote:
> (In reply to Sander Eikelenboom from comment #250)
>> Is there a way to get off the CC-list of this slightly depressing kind of
>> "catch-all" bug ?
> CC list is at the top right corner. Choose the address, tick "Remove
> selected CCs", and hit Save Changes.
>
> I've done this for you now.
>

penalvch (penalvch)
no longer affects: sandybridge-meta (Ubuntu)
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 95238 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samantham (samantham) wrote :

Chris, I seem to be experiencing this bug in Linux 4.7rc3 on an x220 ThinkPad with Intel HD 3000 chipset. I was getting random full system freezes, with the machine unresponsive over the network.

The main messages before the crash were:
Jun 23 19:11:18 athena kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Jun 23 19:11:18 athena kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.7 [i915]] *ERROR* GT thread status wait timed out.

I haven't been able to reproduce the original crash easily, but I CAN reproduce a full system lockup every time by running the following intel-gpu-tools tests (I have not come close to running all the tests, though) [**This may or may not be related to the original crash**]

gem_sync, subtest: bsd2-hang
drv_hangman, subtest: error-state-capture-bit

I do not know if these tests are helpful or related (maybe some are known to fail? not sure).
I have drm debugging turned on for when I ran those tests. (drm.debug=0x1e log_buf_len=1M)
I can post logs of the hangs associated with the two tests/subtests and run any other tests if you desire (with kernel drm debug on), I will wait for the issue to reappear with the drm debug on before posting that log though. By the number of similar bugs you may already have the CALL TRACE and non-debug level logs.

I know how to patch and am able to compile kernels to test. The bug affects me maybe once every 1 or 2 days. I use Xorg with Glamor. I have been seeing these crashes since 4.6 (maybe 4.5 or earlier, not sure).

I know how to apply patches and am able to compile drm-next or any patches you have to see if this issue can be isolated. Thanks, sorry for the long response.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97451 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Yann-argotti (yann-argotti) wrote :

*** Bug 98294 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 98807 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 100245 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ricardo-vega-u (ricardo-vega-u) wrote :

Adding tag into "Whiteboard" field - ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

I no longer seem to be getting the mentioned Gnome crashes on my sandybridge with mainline kernels, currently 4.11, and I think I was not getting any issues with 4.10 either. With mainline longterm 4.4.61 and the default centos 7 kernels I am definitely getting very frequent GPU crashes that bring down Gnome.

So it is either fixed for good, or it has become much rarer. The issue I am/was experiencing happens when Gnome is running; it does not happen when only GDM is loaded. System load seems to have no effect on triggering the bug: it seems to happen at any time, whether the machine is idle or loaded.

Revision history for this message
In , Elizabethx-de-la-torre-mena (elizabethx-de-la-torre-mena) wrote :

(In reply to samuel.rakitnican from comment #260)
> I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> anymore with mainline kernels, that is currently 4.11 and I think even with
> 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> centos 7 kernels I am definitely getting very frequent GPU crashes that
> brings down Gnome.
>
> So it is either fixed for good, or it become much rarer. The issue I am/was
> experiencing happens when Gnome is running, it does not happen when only GDM
> is loaded. System load seems to not have effect on the bug triggering, seems
> to happen any time, on idle, or when machine is loaded.
Hopefully it is fixed for good. I'm closing this bug. If the problem arises with the latest kernel versions (https://www.kernel.org/), please open a NEW bug with HW and SW information, steps to reproduce and relevant logs. Thank you.

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to Elizabeth from comment #261)
> (In reply to samuel.rakitnican from comment #260)
> > I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> > anymore with mainline kernels, that is currently 4.11 and I think even with
> > 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> > centos 7 kernels I am definitely getting very frequent GPU crashes that
> > brings down Gnome.
> >
> > So it is either fixed for good, or it become much rarer. The issue I am/was
> > experiencing happens when Gnome is running, it does not happen when only GDM
> > is loaded. System load seems to not have effect on the bug triggering, seems
> > to happen any time, on idle, or when machine is loaded.
> Hopefully, is fixed for good. I'm closing this bug, if problem arise with
> latest kernel versions https://www.kernel.org/ please open a NEW bug with HW
> and SW information, steps to reproduce and relevant logs.Thank you.

There was no fix for this HW issue.

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
Revision history for this message
In , Aaron-lu-a (aaron-lu-a) wrote :

Created attachment 135173
gpu error file on 4.13.5-200.fc26.x86_64

This problem reappeared on 4.13.5-200.fc26.x86_64 last Friday.

[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset
[774249.632110] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[774249.632111] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[774249.632111] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[774249.632111] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[774249.632112] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[774249.632172] drm/i915: Resetting chip after gpu hang
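
For anyone comparing hang reports across kernels, the "GPU HANG" line above can be picked apart mechanically. The following is only a minimal sketch: the regular expression is modeled on the single line quoted in this comment, the function name parse_gpu_hang is hypothetical, and other kernel versions may format the message differently. The ecode value is kept as an opaque string rather than interpreted.

```python
import re

# Pattern modeled on the "[drm] GPU HANG" line quoted in this comment.
# Assumption: other kernel versions may word or order these fields differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG log line, or None if no match.

    parse_gpu_hang is a hypothetical helper, not part of any i915 tooling.
    """
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field we convert
    return info


if __name__ == "__main__":
    sample = (
        "[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], "
        "reason: Hang on rcs0, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Feeding it the output of dmesg line by line and collecting the non-None results would give a list of hangs with the process and reason for each; the crash dump at /sys/class/drm/card0/error still has to be attached separately, as the log itself says.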

Revision history for this message
In , Chris Wilson (ickle) wrote :

commit 0da715ee60774401bea00dc71fca6fd1096c734a
Author: Chris Wilson <email address hidden>
Date: Mon Nov 20 20:55:02 2017 +0000

    drm/i915: Disable semaphores on Sandybridge

Changed in xserver-xorg-video-intel:
status: Confirmed → Won't Fix
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104243 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104772 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jani-saarinen-g (jani-saarinen-g) wrote :

I will close this now.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 106119 has been marked as a duplicate of this bug. ***
