[i915_bpo] Fix RC6 on SKL GT3 & GT4

Bug #1564759 reported by Timo Aaltonen
This bug affects 3 people
Affects          Status         Importance   Assigned to     Milestone
DRI              Fix Released   Critical
linux (Ubuntu)   Fix Released   High         Timo Aaltonen
Xenial           Fix Released   High         Timo Aaltonen

Bug Description

Runtime power management causes GPU hangs on SKL GT3 & GT4 ("Iris" & "Iris Pro"), so fix it by adding two patches to extend certain workarounds to cover all chip revisions.

In , Mikael-w (mikael-w) wrote :

Created attachment 121772
GPU crash dump

I have experienced GPU hangs with all kernels after 4.3.

I'm running a MS Surface Pro 4.

Feb 15 17:27:22 hat kernel: [ 478.912402] [drm] stuck on render ring
Feb 15 17:27:22 hat kernel: [ 478.913345] [drm] GPU HANG: ecode 9:0:0x85df9fff, in gnome-shell [1956], reason: Ring hung, action: reset
Feb 15 17:27:22 hat kernel: [ 478.913357] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 15 17:27:22 hat kernel: [ 478.913361] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 15 17:27:22 hat kernel: [ 478.913364] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 15 17:27:22 hat kernel: [ 478.913367] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 15 17:27:22 hat kernel: [ 478.913371] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 15 17:27:22 hat kernel: [ 478.915833] drm/i915: Resetting chip after gpu hang
Feb 15 17:27:24 hat kernel: [ 480.901312] [drm] RC6 on

In , Christophe Prigent (christophe-prigent-0) wrote :

Hi Mikael,
Which GPU is it: m3 Intel HD graphics 515 / i5 Intel HD graphics 520 / i7 Intel Iris graphics?
Which steps are causing the GPU hang?

In , Mikael-w (mikael-w) wrote :

It's Intel Iris (HD 540).

It's hard to say what exactly is causing it. Once it was caused by a "tail -f /var/log/syslog" scrolling text in Gnome Terminal. Another time it was caused by a web page being displayed in Firefox. A third time it was caused by switching workspace in Gnome Shell.

In , Mikael-w (mikael-w) wrote :

I should add that these hangs happen every other minute when things change on the screen.

In , Ivan Giuliani (giuliani.v) wrote :

I seem to be affected by this as well (same GPU, on an XPS 13" (2016)). I tried kernels from 4.3 to 4.5 on Ubuntu, including the drm-intel-next kernel (4.5.0-997-generic).

A workaround is to add i915.enable_rc6=0 to the kernel boot parameters.
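
To check whether the workaround is actually in effect, here is a minimal Python sketch that looks for the parameter on a kernel command line. The example command line is hypothetical; on a running system the real one can be read from /proc/cmdline. The helper names `kernel_param` and `rc6_disabled` are made up for illustration, and the parsing is simplified: it does not handle quoted values containing spaces.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val
    return value


def rc6_disabled(cmdline):
    """True if i915.enable_rc6=0 is present on the given command line."""
    return kernel_param(cmdline, "i915.enable_rc6") == "0"


# Hypothetical command line for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet i915.enable_rc6=0"
print(rc6_disabled(example))

# On a live system (Linux only), something like this should work:
#   with open("/proc/cmdline") as f:
#       print(rc6_disabled(f.read()))
```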

In , Mikael-w (mikael-w) wrote :

I have now tried this with the latest drm-intel kernel and the newest skl-dcm firmware (1.26). My libdrm is 2.4.67.

The problem still persists.

A deterministic way to provoke the hang is to run glmark2 (github.com/glmark2).

I can confirm that if I give i915.enable_rc6=0 as a kernel option, the problem disappears.

In , Chris Wilson (ickle) wrote :

*** Bug 94029 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

To attempt to distinguish another source of bugs, does intel_pstate=disable make any difference?

In , Mikael-w (mikael-w) wrote :

(In reply to Chris Wilson from comment #7)
> To attempt to distinguish another source of bugs, does intel_pstate=disable
> make any difference?

Replacing i915.enable_rc6=0 with intel_pstate=disable reintroduces the GPU crashes.

In , Chris Wilson (ickle) wrote :

*** Bug 94462 has been marked as a duplicate of this bug. ***

In , Chris Wilson (ickle) wrote :

Next on the possible list of interactions, can we please test rc6 vs iommu? Leave rc6 as default (remove it from the command line) and add intel_iommu=igfx_off

In , Oddrunesl (oddrunesl) wrote :

Adding intel_iommu=igfx_off and removing rc6=0 from the kernel boot parameters of 4.5-rc4 reintroduces the hangs.

In , Mikael-w (mikael-w) wrote :

Created attachment 122204
gpu-rc4-crash.log.gz

I tested this both with Linus rc4 and drm-intel-nightly from today (rc7).
In both cases I still experience a GPU hang with the single (apart from
noresume) kernel cmd line option intel_iommu=igfx_off.

For rc4, I saw a new error code, though:

Mar 10 14:27:46 hat kernel: [ 56.843611] [drm] GPU HANG: ecode 9:0:0x87f99ff9, in gnome-shell [1742], reason: Ring hung, action: reset

I attach the corresponding crash dump file.

This means that the only way, so far, to avoid hangs is i915.enable_rc6=0.
I have confirmed that this is also true for rc7 (drm-intel-nightly).

In , Oddrunesl (oddrunesl) wrote :

fyi tried 4.5.0-994-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-11-wily/

...and still see hangs without i915.enable_rc6=0

cheers,

In , Oddrunesl (oddrunesl) wrote :

still present in daily build 14th of march found in http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-03-14-wily/

cheers

In , Lister-lists (lister-lists) wrote :

I have the XPS 13 with the Iris 540. I managed to get Arch working about a week ago. At the time, the core repo included 4.4.1 and had some problems. About the next day I think 4.4.3 hit and I managed to get a working system with that following the Arch wiki (mkinitcpio "... intel_agp i915 ...").
However, after upgrading to 4.4.5 I encounter problems. I don't know if they're hangs per se. Mostly I get black screens on boot. But either way, visually the only nicely working system I've got on Iris 540 is:
4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

In , Lister-lists (lister-lists) wrote :

(In reply to lister.lists from comment #15)
> I have the XPS 13 with the Iris 540. I managed to get Arch working about a
> week ago. At the time, the core repo included 4.4.1 and had some problems.
> About the next day I think 4.4.3 hit and I managed to get a working system
> with that following the Arch wiki (mkinitcpio "... intel_agp i915 ...").
> However, after upgrading to 4.4.5 I encounter problems. I don't know if
> they're hangs per se. Mostly I get black screens on boot. But either way,
> visually the only nicely working system I've got on Iris 540 is:
> 4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

I realise my timeline is out, but the point remains...

In , Oddrunesl (oddrunesl) wrote :
In , kang (dump-tzib) wrote :

*** Bug 94575 has been marked as a duplicate of this bug. ***

In , 0obert (0obert) wrote :

I can confirm this on a Dell XPS 13. i915.enable_rc6=0 works. I've tried i915.enable_rc6=1 to see if it was a deep sleep problem, but it shows up with i915.enable_rc6=1 as well. I tried setting semaphores to 0 and 1, and neither of those helped either. Another bug report mentioned that commenting out a couple of lines in the kernel helped the reporter, but it didn't help me on 4.5.0.

I'm running Debian Stretch with KDE and can reproduce very quickly by logging in, opening Chrome, visiting YouTube, playing a video and setting it to full screen.

Good luck

In , Oddrunesl (oddrunesl) wrote :
In , C-jess (c-jess) wrote :

I am experiencing this as well. It happens on the Dell XPS 13 (2016) with a 6th Generation Intel Core i7-6560U (4M Cache, up to 3.2 GHz), Intel® Iris™ Graphics 540.

Kernel 4.4.6, also experienced on 4.4.2.

Let me know what you need to help. It's super easy to trigger by just playing a video or even just using Chrome for more than 5 min.

I can confirm i915.enable_rc6=0 fixes it. I still get hiccups if watching a video etc., but no full crashes. Kinda hate doing that to my battery life though.

In , Timo Aaltonen (tjaalton) wrote :

*** Bug 94768 has been marked as a duplicate of this bug. ***

Timo Aaltonen (tjaalton)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Timo Aaltonen (tjaalton)
importance: Undecided → High
status: New → Triaged
Changed in dri:
importance: Unknown → Critical
status: Unknown → Confirmed
In , Di-michael (di-michael) wrote :

I can confirm this issue on an Intel NUC6i5SYH, Iris Graphics 540 with kernel 4.4.6-300.fc23.x86_64 (exact same symptoms, logs ...). I bet a whole lot of people must be affected ...
Is this driver supported by Intel themselves or the community?

In , Mika-kuoppala (mika-kuoppala) wrote :

Created attachment 122661
drm/i915/skl: Use WaForceContextSaveRestoreNonCoherent for all revs

In , Timo Aaltonen (tjaalton) wrote :

#24 didn't fix it for me

In , Oddrunesl (oddrunesl) wrote :

Negative on #24.

In , Timo Aaltonen (tjaalton) wrote :

I heard this might be due to an old BIOS, which my system certainly has, so verify you have the latest from the vendor (mine is from Intel, and no updates are available for test hardware, so...)

In , Mika-kuoppala (mika-kuoppala) wrote :

Created attachment 122664
drm/i915/skl: Use WaRsDisableCoarsePowerGating for all revs

In , Mikesart (mikesart) wrote :

(In reply to Timo Aaltonen from comment #27)
> I heard this might be due to an old BIOS, which my system certainly has, so
> verify you have the latest from the vendor

I've got a Skylake Dell XPS 13 9350 with the very latest bios from a couple days ago (1.3.3), and the bug still happens on this machine if I remove rc6=0 from my boot line.

In , kang (dump-tzib) wrote :

I'm running the patch from comment 28 on top of the mainline kernel (4.6-rc1).
No freeze/crash so far, even when I stress test it.

Thanks Mika!

In , Timo Aaltonen (tjaalton) wrote :

#28 plus #5 from bug 93491 seem to have fixed glmark2 here. It could be that #28 alone would be enough, but it doesn't hurt to test with both.

In , miiiiitico (miticotoby) wrote :

Tested like Timo, using #28 plus #5 from 93491. It seems to fix the issue for me too; it has been stable for a few hours now without disabling rc6.

Changed in dri:
status: Confirmed → Incomplete
Timo Aaltonen (tjaalton)
summary: - [i915_bpo] Disable RC6 on SKL GT3 & GT4
+ [i915_bpo] Fix RC6 on SKL GT3 & GT4
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
In , kang (dump-tzib) wrote :

Note that I notice sluggishness (my 3-year-old Intel 2D graphics and CPU-rendered graphics on this computer are faster) and display freezes with the fix and DRM enabled, though this might need a separate bug (not sure if it's related or just another bug).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-18.34

---------------
linux (4.4.0-18.34) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1566868

  * [i915_bpo] Fix RC6 on SKL GT3 & GT4 (LP: #1564759)
    - SAUCE: i915_bpo: drm/i915/skl: Fix rc6 based gpu/system hang
    - SAUCE: i915_bpo: drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs

  * CONFIG_ARCH_ROCKCHIP not enabled in armhf generic kernel (LP: #1566283)
    - [Config] CONFIG_ARCH_ROCKCHIP=y

  * [Feature] Memory Bandwidth Monitoring (LP: #1397880)
    - perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
    - perf/x86/cqm: Fix CQM memory leak and notifier leak
    - x86/cpufeature: Carve out X86_FEATURE_*
    - Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    - x86/topology: Create logical package id
    - perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
    - perf/x86/mbm: Add memory bandwidth monitoring event management
    - perf/x86/mbm: Implement RMID recycling
    - perf/x86/mbm: Add support for MBM counter overflow handling

  * User namespace mount updates (LP: #1566505)
    - SAUCE: quota: Require that qids passed to dqget() be valid and map into s_user_ns
    - SAUCE: fs: Allow superblock owner to change ownership of inodes with unmappable ids
    - SAUCE: fuse: Don't initialize user_id or group_id in mount options
    - SAUCE: cgroup: Use a new super block when mounting in a cgroup namespace
    - SAUCE: fs: fix a posible leak of allocated superblock

  * [arm64] kernel BUG at /build/linux-StrpB2/linux-4.4.0/fs/ext4/inode.c:2394!
    (LP: #1566518)
    - arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
    - arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission

  * [Feature]USB core and xHCI tasks for USB 3.1 SuperSpeedPlus (SSP) support
    for Alpine Ridge on SKL (LP: #1519623)
    - usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices
    - usb: set USB 3.1 roothub device speed to USB_SPEED_SUPER_PLUS
    - usb: show speed "10000" in sysfs for USB 3.1 SuperSpeedPlus devices
    - usb: add device descriptor for usb 3.1 root hub
    - usb: Support USB 3.1 extended port status request
    - xhci: Make sure xhci handles USB_SPEED_SUPER_PLUS devices.
    - xhci: set roothub speed to USB_SPEED_SUPER_PLUS for USB3.1 capable controllers
    - xhci: USB 3.1 add default Speed Attributes to SuperSpeedPlus device capability
    - xhci: set slot context speed field to SuperSpeedPlus for USB 3.1 SSP devices
    - usb: Add USB3.1 SuperSpeedPlus Isoc Endpoint Companion descriptor
    - usb: Parse the new USB 3.1 SuperSpeedPlus Isoc endpoint companion descriptor
    - usb: Add USB 3.1 Precision time measurement capability descriptor support
    - xhci: refactor and cleanup endpoint initialization.
    - xhci: Add SuperSpeedPlus high bandwidth isoc support to xhci endpoints
    - xhci: cleanup isoc tranfers queuing code
    - xhci: Support extended burst isoc TRB structure used by xhci 1.1 for USB 3.1
    - SAUCE: (noup) usb: fix regression in SuperSpeed endpoint descriptor parsing

  * wrong/missing permissions for device f...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
In , jason bishop (jason-bishop) wrote :

(In reply to miticotoby from comment #32)
> Tested like Timo, using #28 plus #5 from 93491. It seems to fix the issue for
> me too; it has been stable for a few hours now without disabling rc6.

Seconding miticotoby: I applied #28 plus #5 from 93491, and it has been working fine for a few days now (without disabling rc6).

In , gerard (gerar-f87) wrote :

Compiled kernel 4.6-rc2 drm-intel-nightly with Mika's patch (comment #28) and everything is working fine, no GPU hang so far (4 days of testing).
Why is this patch not merged? Maybe because it needs more testing?

Thanks Mika.

In , Oddrunesl (oddrunesl) wrote :

Created attachment 122856
attachment-11432-0.html

As far as I understand, the patch disables power gating which is a very bad
thing in terms of power usage so this is not a fix, just a temp workaround.

In , Myemailu (myemailu) wrote :

I can confirm that the patch in comment 28 (Use WaRsDisableCoarsePowerGating)
solved the issue on my Intel NUC6i5 with only a moderate increase in power consumption.

With an idle desktop using kernel 4.6.0-rc3, the system consumes:

7 Watts without patch, RC6 enabled, frequent crashes
17 Watts with i915.enable_rc6=0, no crashes
9 Watts with patch, no crashes
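
To put those three figures side by side, here is a small Python sketch of the arithmetic. The numbers are the ones measured above; the dictionary keys and the `overhead` helper are names made up for illustration. It treats the unpatched, RC6-enabled idle draw as the baseline, even though that configuration crashes.

```python
# Idle power draw in watts, as reported in this comment.
measurements = {
    "rc6_enabled_unpatched": 7,   # frequent crashes
    "rc6_disabled": 17,           # i915.enable_rc6=0, no crashes
    "rc6_enabled_patched": 9,     # patch from comment 28, no crashes
}
baseline = measurements["rc6_enabled_unpatched"]


def overhead(name):
    """Return (extra watts, percent increase) relative to the baseline."""
    extra = measurements[name] - baseline
    return extra, 100.0 * extra / baseline


for name in ("rc6_disabled", "rc6_enabled_patched"):
    extra, pct = overhead(name)
    print(f"{name}: +{extra} W ({pct:.0f}% over baseline)")
```

On these figures the patch costs about 2 W at idle, against about 10 W for disabling RC6 entirely.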

In , C-jess (c-jess) wrote :

I just used the patch on 4.6-rc4 from upstream source and it works for me!

In , C-jess (c-jess) wrote :

Never mind, I compiled a binary and my whole computer froze.

In , Kimmo Nikkanen (knikkane) wrote :

This is fixed by:

commit d528a6a0f3fd346bd7cc2de611a4149b6ebaab41
Author: Mika Kuoppala <email address hidden>
Date: Tue Apr 5 15:56:16 2016 +0300

drm/i915/skl: Fix rc6 based gpu/system hang

Changed in dri:
status: Incomplete → Fix Released
melenzb (woodhouser) wrote :

Whereas this seems fixed in current 4.4 kernels as supplied by 16.04, it has returned in 4.8 kernels supplied with Yakkety Yak 16.10.

It is hardly possible to keep a system running for more than half an hour on Yakkety without a sudden reboot. Video playback in particular triggers the problem.
