10de:0422 bringing up dash causes screen corruption on nouveau

Bug #1158689 reported by Amit Kucheria
This bug affects 3 people
Affects         Status        Importance  Assigned to  Milestone
Linux           Fix Released  Medium
linux (Ubuntu)  Fix Released  Medium      Unassigned
Trusty          Fix Released  Medium      Unassigned

Bug Description

Hitting the "Windows" key to bring up the dash corrupts the screen, and everything goes downhill from there. The attached screenshot shows an example after a few window launches, dash open/close cycles, etc.

WORKAROUND: Logging in with gnome-shell doesn't show these artifacts.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-13-generic 3.8.0-13.23
ProcVersionSignature: Ubuntu 3.8.0-13.23-generic 3.8.3
Uname: Linux 3.8.0-13-generic x86_64
ApportVersion: 2.9.2-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: amit 2192 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
CurrentDmesg:
 [ 28.341465] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 32.135937] init: plymouth-stop pre-start process (1666) terminated with status 1
Date: Fri Mar 22 15:06:56 2013
HibernationDevice: RESUME=UUID=fb5ae586-aed6-4e8b-8884-21fef7bf242d
InstallationDate: Installed on 2013-02-01 (48 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64+mac (20130130)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Intel To be filled by O.E.M.
MarkForUpload: True
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-13-generic root=UUID=0084a9a5-74cd-47ba-afe6-a47d02d0b262 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-13-generic N/A
 linux-backports-modules-3.8.0-13-generic N/A
 linux-firmware 1.104
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/29/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 4.6.3
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: To be filled by O.E.M.
dmi.board.vendor: Intel
dmi.board.version: To be filled by O.E.M.
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.3:bd10/29/2009:svnIntel:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnIntel:rnTobefilledbyO.E.M.:rvrTobefilledbyO.E.M.:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: Intel


Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

Created attachment 71610
stack trace

I have an NVIDIA GeForce 8400M G graphics card. I had been using the nouveau driver for a long time without any problems. Since upgrading the kernel to version 3.7.0, I have had a lot of issues: after logging in and using the system for some time, the graphics become corrupted and are displayed with mixed-up colors.

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

Created attachment 71612
Screenshot

Screenshot showing the problem.

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

Today's messages from dmesg:

[ 4115.879007] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 07ff00 warp 0, opcode 00000000 00000000
[ 4115.879007] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 4115.879007] nouveau E[ PGRAPH][0000:01:00.0] ch 5 [0x00077db000] subc 3 class 0x8297 mthd 0x1694 data 0x00010031

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

Created attachment 71674
my graphics are a mess.

my graphics are a mess.

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

more dmesg messages:

[ 1123.476832] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 07ff00 warp 0, opcode ffffffff ffffffff
[ 1123.476839] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 1123.476844] nouveau E[ PGRAPH][0000:01:00.0] ch 6 [0x000765e000] subc 3 class 0x8297 mthd 0x1694 data 0x00010031
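
For anyone collecting these traps across machines, the fields can be pulled out of the log with a small script. This is only a sketch: the pattern is modelled on the TRAP_MP_EXEC lines quoted in this report, and the function and field names (parse_trap, opcode_word0, opcode_word1) are illustrative, not part of nouveau or any tool.

```python
import re

# Pattern modelled only on the TRAP_MP_EXEC lines quoted in this report;
# other nouveau messages use different layouts and are deliberately not matched.
TRAP_RE = re.compile(
    r"TRAP_MP_EXEC - TP (?P<tp>\d+) MP (?P<mp>\d+): (?P<reason>\w+) "
    r"at (?P<addr>[0-9a-f]+) warp (?P<warp>\d+), "
    r"opcode (?P<opcode_word0>[0-9a-f]{8}) (?P<opcode_word1>[0-9a-f]{8})"
)


def parse_trap(line):
    """Return the trap fields as a dict, or None if the line is not a TRAP_MP_EXEC message."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # TP, MP and warp are small decimal indices; the address and opcode stay as hex strings.
    for key in ("tp", "mp", "warp"):
        fields[key] = int(fields[key])
    return fields


sample = (
    "[ 1123.476832] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: "
    "INVALID_OPCODE at 07ff00 warp 0, opcode ffffffff ffffffff"
)
print(parse_trap(sample))
```

Feeding it the saved output of dmesg line by line and skipping the None results gives one record per trap, which makes it easier to compare addresses and opcodes between reports.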

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

Created attachment 71698
Another screenshot

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

# lspci -nnvv

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86 [GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: Micro-Star International Co., Ltd. Device [1462:3fe9]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 32 bytes
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
 Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
 Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
 Region 5: I/O ports at cc00 [size=128]
 Expansion ROM at fe0e0000 [disabled] [size=128K]
 Capabilities: [60] Power Management version 2
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
  Address: 0000000000000000 Data: 0000
 Capabilities: [78] Express (v1) Endpoint, MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
   ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
   MaxPayload 128 bytes, MaxReadReq 512 bytes
  DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
   ClockPM- Surprise- LLActRep- BwNot-
  LnkCtl: ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
 Kernel driver in use: nouveau
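
Since several reporters in this thread have slightly different G86 variants, it can help to extract the bracketed vendor:device pair from the lspci -nn output before comparing notes. A minimal sketch, assuming the output has the layout shown above; vga_pci_ids is a hypothetical helper name.

```python
import re

# Matches the bracketed vendor:device pair that `lspci -nn` prints, e.g. [10de:0428].
# The class code "[0300]" and the model name in brackets contain no colon-separated
# pair of four hex digits, so they are not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def vga_pci_ids(lspci_output):
    """Return (vendor, device) pairs found on 'VGA compatible controller' lines."""
    ids = []
    for line in lspci_output.splitlines():
        if "VGA compatible controller" not in line:
            continue  # skips Subsystem lines, which carry their own ID pair
        match = PCI_ID_RE.search(line)
        if match:
            ids.append((match.group(1), match.group(2)))
    return ids


sample_output = (
    "01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86 "
    "[GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])\n"
    " Subsystem: Micro-Star International Co., Ltd. Device [1462:3fe9]\n"
)
print(vga_pci_ids(sample_output))  # [('10de', '0428')]
```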

Revision history for this message
In , Henrique-ribeiro-dias (henrique-ribeiro-dias) wrote :

The problem persists with the 3.7.1 kernel.

Revision history for this message
In , Nemasu (nemasu) wrote :

I am having the same problems post kernel version 3.7.0 with a GeForce 8800 GTS. Even glxgears will lock up.

I get a ton of these messages:
[ 83.399004] nouveau [ PFIFO][0000:01:00.0] CACHE_ERROR - Ch 2/3 Mthd 0x108c Data 0x2036652f

with the occasional:
[ 83.418650] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[ 83.418659] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 83.418663] nouveau E[ PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
[ 83.418672] nouveau E[ PFB][0000:01:00.0] trapped read at 0x0000000000 on channel 0x00027948 PFIFO/PFIFO_READ/SEMAPHORE reason: DMAOBJ_LIMIT
[ 83.431368] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[ 83.431376] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 83.431379] nouveau E[ PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000

Revision history for this message
In , Blackberryqueen (blackberryqueen) wrote :

Same here with an nVidia GeForce 8400M G video card in an Acer Aspire 7520 G laptop running Ubuntu 12.10 64-bit (AMD64). My first impression was a heat problem due to dust, so I cleaned the laptop fan and refitted the heatsink and heatpipes with new thermal (silver) paste, but the video error recurs. With only two web pages open there is no problem. On starting a YouTube video the screen becomes a mess, like Henrique Dias reported.

Is there a relation to the reported failure of nVidia GeForce 8 series?? http://news.cnet.com/8301-13924_3-10037632-64.html

Carolien.

Revision history for this message
Amit Kucheria (amitk) wrote :
Revision history for this message
Amit Kucheria (amitk) wrote :

WifiSyslog.txt attached above seems to show the nouveau errors from my last crash.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Amit Kucheria (amitk) wrote : Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

https://bugs.freedesktop.org/show_bug.cgi?id=62035 seems to be related and might point to potential fixes

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.9 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc3-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Revision history for this message
Amit Kucheria (amitk) wrote : Re: [Bug 1158689] Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

On 13 Mar 22, Joseph Salisbury wrote:
> Would it be possible for you to test the latest upstream kernel? Refer
> to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
> v3.9 kernel[0] (Not a kernel in the daily directory) and install both
> the linux-image and linux-image-extra .deb packages.
>
> If this bug is fixed in the mainline kernel, please add the following
> tag 'kernel-fixed-upstream'.
>
> If the mainline kernel does not fix this bug, please add the tag:
> 'kernel-bug-exists-upstream'.
>
> If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
> Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
>
>
> Thanks in advance.
>
> [0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc3-raring/

I tried -rc4. I still see a lot of artifacts when I hit the "win" key to
display the dash. And it definitely crashed at least once with the display
becoming black.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

This may be fixed in v3.9-rc4 with the following commit:

 cf9a625 - Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Can you test the rc4 kernel to see if it resolves this bug:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc4-raring/

Revision history for this message
In , Pauloedgarcastro (pauloedgarcastro) wrote :

Hi.

I have exactly the same issue.
I seem to be able to trigger it faster by opening firefox on a page with many images.

Current Kernel: 3.8.3-103.fc17.x86_64
Other kernels affected:

kernel-3.7.9-104.fc17.x86_64
kernel-3.7.9-101.fc17.x86_64

01:00.0 VGA compatible controller: nVidia Corporation G86 [GeForce 8300 GS] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: nVidia Corporation Device 0494
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at df00 [size=128]
        [virtual] Expansion ROM at fb000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [78] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM unknown, Latency L0 <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau

Revision history for this message
Amit Kucheria (amitk) wrote : Re: [Bug 1158689] Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

On 13 Mar 25, Joseph Salisbury wrote:
> This may be fixed in v3.9-rc4 with the following commit:
>
> cf9a625 - Merge branch 'drm-nouveau-fixes-3.9' of
> git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
>
> Can you test the rc4 kernel to see if it resolves this bug:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc4-raring/

I've already tested with -rc4 above. It still shows problems.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Revision history for this message
Amit Kucheria (amitk) wrote : Re: [Bug 1158689] Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

On 13 Mar 29, Joseph Salisbury wrote:
> This issue appears to be an upstream bug, since you tested the latest
> upstream kernel. Would it be possible for you to open an upstream bug
> report[0]? That will allow the upstream Developers to examine the
> issue, and may provide a quicker resolution to the bug.
>
> Please follow the instructions on the wiki page[0]. The first step is
> to email the appropriate mailing list. If no response is received, then
> a bug may be opened on bugzilla.kernel.org.
>
> [0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

I've tested 3.9-rc6 build now. It is slightly more stable but still corrupts the
display on launching the dash.

Nouveau upstream expects me to run the latest code in their git (including
DRM and other pieces of freedesktop.org). I'm afraid I don't have the time to
mess with that. Running the latest kernel isn't enough to file a bug
according to their wiki.

If there is a PPA containing the latest nouveau stack, I might try that.

Revision history for this message
In , Pauloedgarcastro (pauloedgarcastro) wrote :

After further investigation, this issue only seems to affect applications using the GTK libs.
In my case at least ...

After the bug is triggered, any app that uses the GTK libs is affected.
It does not seem to affect the rendering of other apps (those not using GTK).

Also, the same issue doesn't happen with the NVIDIA drivers, but those are impossible to use in my case because the system becomes unusably slow.

Revision history for this message
Amit Kucheria (amitk) wrote :

Retried again with 3.9 and there is no change. Nouveau or Nvidia are busted
for this card in this release. I've attached a screenshot.

Revision history for this message
Dariusz Duma (dhor) wrote : Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

I can confirm this weird bug. The Nvidia 8400M G does not work in the latest 13.04 (Ubuntu/Xubuntu), no matter which driver I install. With Nouveau and the stock 3.8.x kernel, the display turns to rubbish after a few minutes of work, as shown above. Nouveau with kernel 3.9.x gives the same result.

Stock kernel 3.8.x or 3.9.x with almost any NVIDIA driver (304.x, 310.x, 313.x) also does not work. The system hangs when specific OpenGL (???) functionality is used. For example, running glxinfo in a terminal locks the system, launching Firefox (fullscreen) locks the system, and so on.

The only error that Nouveau reports:

kernel: nouveau E[ PFB][0000:01:00.0] trapped read at 0x0020361400 on channel 0x0000fb3a PGRAPH/TEXTURE/00 reason: PAGE_NOT_PRESENT

Revision history for this message
Amit Kucheria (amitk) wrote : Re: [Bug 1158689] Re: nouveau and nvidia binary drivers are broken for GeForce 8400 GS

I've been investigating this further by running the xorg-edgers PPA. No luck.
It is still broken.

I then downloaded the Fedora19 Alpha Live image and tried it. It works just
fine! No artifacts visible.

I checked that the nouveau driver version in xorg-edgers PPA and Fedora19
Alpha are the same (1.0.7). My suspicion is that this problem is related to
unity depending on some bits of OpenGL that gnome-shell doesn't. To verify
that I'll need to install a gnome-shell desktop on my machine.

Revision history for this message
Amit Kucheria (amitk) wrote :

On 13 May 09, Amit Kucheria wrote:
> I checked that the nouveau driver version in xorg-edgers PPA and Fedora19
> Alpha are the same (1.0.7). My suspicion is that this problem is related to
> unity depending on some bits of OpenGL that gnome-shell doesn't. To verify
> that I'll need to install a gnome-shell desktop on my machine.

Confirmed. I don't see this corruption with gnome-shell. So I'm starting to
think this is related to unity as opposed to the kernel + X.

Will update the description to reflect this.

summary: - nouveau and nvidia binary drivers are broken for GeForce 8400 GS
+ bringing up dash causes screen corruption on nouveau/nvidia
Amit Kucheria (amitk)
description: updated
summary: - bringing up dash causes screen corruption on nouveau/nvidia
+ bringing up dash causes screen corruption on nouveau
Revision history for this message
Amit Kucheria (amitk) wrote : Re: bringing up dash causes screen corruption on nouveau

Updated original report to remove reference to xorg-edgers PPA. Even with stock raring, gnome-shell works just fine (been using it for over a day now) while unity is unusable.

description: updated
Changed in unity (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unity (Ubuntu):
status: New → Confirmed
Revision history for this message
In , Davebjorkl (davebjorkl) wrote :

Hello!

New to Ubuntu. I have an old Acer 5520G with exactly the same problem described in the comments above. I also thought it was a heat problem and found a lot of dust in the graphics card's fan. My computer locks up completely, and after the first glitch I am unable to even log in or open a terminal at the login screen.

Dave

Revision history for this message
In , Torsten-stocklossa-g (torsten-stocklossa-g) wrote :

Hi, same here after updating to Ubuntu 12.04.3

Kernel 3.8.0-33-generic

lspci -nnvv says:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: Fujitsu Limited. Device [10cf:1422]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
 Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
 Region 3: Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
 Region 5: I/O ports at 2000 [size=128]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: nouveau
 Kernel modules: nouveau, nvidiafb

The graphics become distorted, and once that happens the system is frozen (with some luck I can still reach a terminal).

Before it happens, the font color in window frames changes to "white on white", i.e. the same as the background color.
I run a Lifebook E8410.

BTW: using the Nvidia proprietary drivers is not an option. They made the system completely unusable and forced me to reinstall several times.

Revision history for this message
In , Torsten-stocklossa-g (torsten-stocklossa-g) wrote :

Hi again, here are some additional error messages:

Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.215782] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304180] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304188] nouveau E[ PGRAPH][0000:01:00.0] TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304193] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304477] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304483] nouveau E[ PGRAPH][0000:01:00.0] TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304487] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000

and

Nov 29 11:26:04 torsten-LIFEBOOK-E8410 kernel: [ 323.106306] nouveau E[ DRM] GPU lockup - switching to software fbcon
Nov 29 11:27:07 torsten-LIFEBOOK-E8410 kernel: [ 386.736037] nouveau E[ 3431] failed to idle channel 0xcccc0001
Nov 29 11:27:09 torsten-LIFEBOOK-E8410 kernel: [ 388.735098] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:27:12 torsten-LIFEBOOK-E8410 kernel: [ 391.732025] nouveau E[ 3431] failed to idle channel 0xcccc0000
Nov 29 11:27:14 torsten-LIFEBOOK-E8410 kernel: [ 393.731221] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:28:09 torsten-LIFEBOOK-E8410 kernel: [ 448.580025] nouveau E[ 4056] failed to idle channel 0xcccc0001
Nov 29 11:28:11 torsten-LIFEBOOK-E8410 kernel: [ 450.579162] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:28:14 torsten-LIFEBOOK-E8410 kernel: [ 453.576022] nouveau E[ 4056] failed to idle channel 0xcccc0000
Nov 29 11:28:16 torsten-LIFEBOOK-E8410 kernel: [ 455.575198] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:29:17 torsten-LIFEBOOK-E8410 kernel: [ 516.552036] nouveau E[ 4211] failed to idle channel 0xcccc0001
Nov 29 11:29:19 torsten-LIFEBOOK-E8410 kernel: [ 518.553893] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:29:22 torsten-LIFEBOOK-E8410 kernel: [ 521.556024] nouveau E[ 4211] failed to idle channel 0xcccc0000
Nov 29 11:29:24 torsten-LIFEBOOK-E8410 kernel: [ 523.555077] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout

Both logs are from a Gnome session. When running Gnome (no effects), it is slightly more stable.

As mentioned, I also tried the NVIDIA drivers, with the effect that the system was completely unusable.

Since the issue seems to be quite old, there should be an appropriate solution by now!

cheers
TS

penalvch (penalvch)
description: updated
summary: - bringing up dash causes screen corruption on nouveau
+ 10de:0422 bringing up dash causes screen corruption on nouveau
no longer affects: unity (Ubuntu)
Revision history for this message
penalvch (penalvch) wrote :

Amit Kucheria, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please just make a comment to this.

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc2

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
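
The tagging rules above can be summarised in a few lines. This is only an illustration of the convention described in this comment; upstream_test_tags is a hypothetical helper and is not part of Launchpad or any Ubuntu tooling.

```python
def upstream_test_tags(fixed, version):
    """Return (tags_to_add, tags_to_remove) after testing a mainline kernel.

    fixed   -- True if the mainline kernel no longer shows the bug
    version -- the tested kernel version, e.g. "v3.13-rc2"
    """
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    # Both the generic tag and the version-specific tag are requested.
    tags_to_add = [prefix, f"{prefix}-{version}"]
    # In either case the testing request tag is dropped once testing is done.
    tags_to_remove = ["needs-upstream-testing"]
    return tags_to_add, tags_to_remove


print(upstream_test_tags(True, "v3.13-rc2"))
print(upstream_test_tags(False, "v3.13-rc2"))
```

Whichever branch applies, the bug's Status is then set back to Confirmed by hand, as the comment asks.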

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
In , Torsten-stocklossa-g (torsten-stocklossa-g) wrote :

Hi,
I wonder if this is still alive? Any news on this?

cheers
T

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Messing with priority just annoys the developers.

In the meantime, try new kernels. I only see up to 3.8 tested. Do a bisect. There was a major driver rewrite in 3.7, but it might have been something else that causes the issue. Make sure you're running an updated DDX.

As you might imagine, none of the devs are seeing this, so you'll have to do the debugging if you want it fixed.

Revision history for this message
In , Awl1 (awl1) wrote :

Created attachment 90715
Distorted graphics with RHEL6/OL6 showing uname -a kernel 3.12.4

Revision history for this message
In , Awl1 (awl1) wrote :

Created attachment 90717
Distorted graphics: Icons (on kernel 3.12.4)

Revision history for this message
In , Awl1 (awl1) wrote :

Hello,

I would like to join discussions in this bug, as I have found myself affected after the recent update from Red Hat Enterprise Linux/Oracle Linux 6.4 (stock RHEL kernel 2.6.32-358.23.2) to RHEL/OL 6.5 (RHEL kernel 2.6.32-431).

My graphics card is NVidia Quadro NVS 130M:
BOOT0 : 0x086a00a2
Chipset: G86 (NV86)
Family : NV50

It seems that RHEL 6.5 kernel 2.6.32-431 has updated its kernel modules for nouveau DRM to a codebase level that matches official Linux kernels 3.7, and therefore introduced this severe graphics distortion issue into mainline RHEL 6.

In order to verify that it is indeed the nouveau DRM kernel module that is responsible for the distortion, I have upgraded my OL6 packages to the following versions:

* mesa-9.2.0.5 (including support for nouveau, which is commented out by default in RHEL6)
* libdrm-2.4.50
* xorg-x11-drv-nouveau-1.0.9

but this does NOT affect the issue at all.

But reverting back to RHEL stock kernel 2.6.32-358.23.2 makes the issue vanish, also when using the above updated library versions.

I then tried Oracle's UEK kernels, and while the current UEK2 kernel (2.6.39-400.211.2) does NOT have the issue, the current UEK3 kernel (3.8.13-16.2.2) also shows it.

I then tried to find out about the exact "versions" (git commit levels?) of the nouveau libdrm modules, and found out the following:

(1) Oracle UEK2 kernel 2.6.39-400.211.2 - NO ISSUE:
[drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0

(2) RHEL stock kernel 2.6.32-358.23.2 - NO ISSUE:
[drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0

(3) RHEL stock kernel 2.6.32-431 - DOES SHOW THE ISSUE:
[drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0

(4) more recent kernels, such as Oracle UEK3 (3.8.13-16.2.2) and the most recent Oracle "playground" kernel from public-yum.oracle.com (3.12.4-3.12.y.20131210) all DO SHOW THE ISSUE:
[drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0

So to me it now seems as if the issue has been introduced with the massive changes to nouveau/DRM that went into 3.7:

http://www.phoronix.com/scan.php?page=news_item&px=MTE1NDg

and affects ALL subsequent versions since then... :-(
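
The four observations above already narrow the regression to a window between two nouveau DRM versions. The sketch below just makes that reasoning explicit; the data is copied from the list above, the helper names are illustrative, and it assumes the issue never disappears again once a version shows it.

```python
# (kernel, nouveau DRM version from the "[drm] Initialized nouveau ..." line, issue seen?)
observations = [
    ("Oracle UEK2 2.6.39-400.211.2", "0.0.16", False),
    ("RHEL 2.6.32-358.23.2", "1.0.0", False),
    ("RHEL 2.6.32-431", "1.1.0", True),
    ("Oracle UEK3 3.8.13-16.2.2", "1.1.1", True),
    ("Oracle playground 3.12.4-3.12.y.20131210", "1.1.1", True),
]


def version_key(version):
    """Turn '1.0.0' into (1, 0, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def regression_window(obs):
    """Return (last good driver version, first bad driver version)."""
    ordered = sorted(obs, key=lambda o: version_key(o[1]))
    good = [o[1] for o in ordered if not o[2]]
    bad = [o[1] for o in ordered if o[2]]
    last_good = good[-1] if good else None
    first_bad = bad[0] if bad else None
    return last_good, first_bad


print(regression_window(observations))  # ('1.0.0', '1.1.0')
```

A bisect would then only need to cover the kernel changes between the trees that report driver versions 1.0.0 and 1.1.0.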

I would be very interested and willing to help in debugging/tracking this down, but I don't have any git background, so you would have to guide me through how to do the "bisect"...

Hope this helps & looking forward to your feedback! :-)

Best regards,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

I had left out my "lspci -nnvv" information:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [Quadro NVS 130M] [10de:042a] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: Toshiba America Info Systems Device [1179:0002]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 32 bytes
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
 Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
 Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
 Region 5: I/O ports at cf00 [size=128]
 [virtual] Expansion ROM at fc000000 [disabled] [size=128K]
 Capabilities: [60] Power Management version 2
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
  Address: 0000000000000000 Data: 0000
 Capabilities: [78] Express (v1) Endpoint, MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
   ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
   MaxPayload 128 bytes, MaxReadReq 512 bytes
  DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
   ClockPM- Surprise- LLActRep- BwNot-
  LnkCtl: ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
 Capabilities: [100 v1] Virtual Channel
  Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
  Arb: Fixed- WRR32- WRR64- WRR128-
  Ctrl: ArbSelect=Fixed
  Status: InProgress-
  VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
   Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
   Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
   Status: NegoPending- InProgress-
 Capabilities: [128 v1] Power Budgeting <?>
 Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Kernel driver in use: nouveau
 Kernel modules: nouveau, nvidiafb

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(a) Can we see a full boot log (e.g. output of dmesg) with a recent kernel? Ideally it would include the time that the visual issues happen.

(b) This looks like it could be a fencing issue, i.e. we try to draw to a texture, but then instead of waiting, we don't wait. There were some fixes that went into 3.13-rc1, so perhaps trying the latest and greatest (e.g. 3.13-rc3, or the latest Linus HEAD) would be good to test out.

(c) There are many bisection guides on the internet. You will also need to figure out how to make the compiled kernel play nice with your distribution. The basics are simple though:

1. git bisect start v3.7 v3.6 -- drivers/gpu/drm/nouveau
2. build/install/boot/test
3. if it's good, "git bisect good", if it's bad, "git bisect bad"
4. goto 2

At some point running the step 3 command will tell you "first bad commit is xyz". That's when you're done. I suspect it might be the giant mega "rewrite nouveau" commit, in which case we're screwed and this will have been a huge time-waster (apologies in advance if it turns out this way). But it might be one of the many other commits that went into 3.7, which would be nice and indicate an area to focus on.
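
The loop above could look roughly like this as a pair of shell helpers. This is only a sketch: the function names start_nouveau_bisect and mark_result are made up for illustration, it assumes a local clone of the mainline kernel tree, and the distribution-specific build/install/boot/test step in between is left out.

```shell
# Sketch of the bisect loop described above. Run inside a clone of the
# mainline kernel tree. Both function names are illustrative only.

start_nouveau_bisect() {
    # v3.7 is the known-bad tag, v3.6 the known-good one; the path limits
    # the search to commits that touch the nouveau driver.
    git bisect start v3.7 v3.6 -- drivers/gpu/drm/nouveau
}

# After building, installing, booting and testing the checked-out revision,
# record the verdict; git then checks out the next candidate.
mark_result() {
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: mark_result good|bad" >&2; return 2 ;;
    esac
}
```

If a revision does not build at all, "git bisect skip" moves on to a neighbouring commit, and "git bisect reset" returns to the original checkout once the first bad commit has been reported.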

Revision history for this message
In , Awl1 (awl1) wrote :

Hello Ilia,

regarding (a) and (b): I am just waiting for a rpmbuild of an OL6 version of 3.13-rc3 to finish and will report back on my findings and include a dmesg output from that version.

Regarding (c):

Wouldn't it make more sense to first rule out the "mega commit", rather than starting with the 3.6 and 3.7 release tags?

Can you give me the git commands (or point me to a doc that tells me how to produce them) for getting "ordinary kernel tarballs" out of the DRM nouveau git just like the ones published on

https://www.kernel.org/pub/linux/kernel/v3.0/testing/

for two points in time in between 3.6 and 3.7:

(1) for the version up to the immediate commit BEFORE the "mega commit"
(2) for the version exactly matching the "mega commit"?

Using these two kernel tarballs, I could then either confirm or rule out the "mega commit" as the root cause for the issue, and in the (unlikely) case the mega commit can indeed be ruled out, I could then concentrate on further narrowing down the commits

* either between 3.6 and the mega commit if build (1) is already broken
* or between the mega commit and 3.7 if build (2) still works, but 3.7 fails?

Sorry, but rather than pulling the whole git on my poor old laptop and starting a huge number of bisection attempts "into the blue", I think that this makes more sense and does not require me to become a git expert in order to try and help track this down... ;-)
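
For reference, two source tarballs of the kind asked for here could be produced with "git archive", although this still needs a local clone of the kernel tree. The sketch below makes several assumptions: export_pair is a made-up helper name, the output file names are arbitrary, and the hash of the "mega commit" is not given anywhere in this thread, so it has to be looked up first and passed in as the argument.

```shell
# Export source tarballs for one commit and for its immediate parent.
# $1 is the commit to test. The thread never names the hash of the
# "mega commit", so find it first, for example with:
#   git log --oneline v3.6..v3.7 -- drivers/gpu/drm/nouveau
export_pair() {
    if [ -z "$1" ]; then
        echo "usage: export_pair <commit>" >&2
        return 2
    fi
    # "<commit>^" names the first parent, i.e. the tree just before the commit.
    git archive --format=tar.gz --prefix=linux-before/ -o linux-before.tar.gz "$1^" &&
    git archive --format=tar.gz --prefix=linux-at/ -o linux-at.tar.gz "$1"
}
```

Building and booting both trees then answers the question directly: if linux-before works and linux-at shows the corruption, the commit passed in is the culprit.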

What do you think?

I will report back shortly with my 3.13-rc3 results...

BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #52)
> Hello again, Ilia,
>
> > Can you grab envytools (https://github.com/envytools/envytools) and run
>
> bad news (or maybe expected from what we have been seeing earlier):
>
> [aloew@aloew-lap envytools-master]$ ./nva/nvapeek 10200c
> WARN: Can't probe 0000:01:00.0
> PCI init failure!
>
> [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> WARN: Can't probe 0000:01:00.0
> PCI init failure!

You need to run these as root.

> > I'd also still be interested in knowing whether a previously-known-good
> > version of the blob still works.
>
> I am 99.9% certain it does, as my Windows install with NVidia 285.09 driver
> also still runs fine, while any more recent Windows driver from NVidia hangs
> with the same symptoms as their Linux "blob" - I had just checked this last
> week with their latest Windows version 331.82, once again without any luck.

Ah OK, that's probably good enough of a test.

> Maybe indeed you could ask your new friends/contacts at NVidia about this?

I just bugged them about video decoding stuff a few weeks ago, don't want to use up all of my brownie points :)

Revision history for this message
In , Awl1 (awl1) wrote :

> > [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> > WARN: Can't probe 0000:01:00.0
> > PCI init failure!

> You need to run these as root.

Ouch - sorry - could have indeed had this idea myself... :-(

Here are the results as root:

[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapoke 10200c 10
0010200c: ERR S
[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
[aloew@aloew-lap envytools-master]$

And no new messages in "dmesg" output at all. Still not enlightening... :-(

BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

> > I am 99.9% certain it does, as my Windows install with NVidia 285.09 driver
> > also still runs fine, while any more recent Windows driver from NVidia hangs
> > with the same symptoms as their Linux "blob" - I had just checked this last
> > week with their latest Windows version 331.82, once again without any luck.

> Ah OK, that's probably good enough of a test.

So I don't need to do this any more? That would be great, because I am pretty certain that it won't give any new results other than the Linux 285.09 driver still works fine.

My card definitely has no new hardware defect. In case it might indeed be defective in some sense, then it has been from the very beginning...

BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #55)
> > > [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> > > WARN: Can't probe 0000:01:00.0
> > > PCI init failure!
>
> > You need to run these as root.
>
> Ouch - sorry - could have indeed had this idea myself... :-(
>
> Here are the results as root:
>
> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS

nvapeek 10200c without the 10. (Not sure what that does.... maybe reads out 0x10 regs)

> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapoke 10200c 10
> 0010200c: ERR S

Oh well. Some sort of error.

> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
> [aloew@aloew-lap envytools-master]$
>
> And no new messages in "dmesg" output at all. Still not enlightening... :-(

Well, no one's heard of a "missing" PCRYPT before, but it's certainly conceivable that certain blocks were omitted. I'd feel better with that diagnosis if more people chimed in saying that they had the same issue.

Revision history for this message
In , Awl1 (awl1) wrote :

> > [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> > 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
>
> nvapeek 10200c without the 10. (Not sure what that does.... maybe reads out
> 0x10 regs)

yes - seems to read 10 registers:

[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c
0010200c: SS

> Well, no one's heard of a "missing" PCRYPT before, but it's certainly
> conceivable that certain blocks were omitted. I'd feel better with that
> diagnosis if more people chimed in saying that they had the same issue.

From the screenshots of the corrupted graphics, I definitely think that this is the exact same issue.

But I fully agree that it is a pity that none of the folks who had raised this previously and/or commented are reacting now that it has probably been tracked down to its root cause.

And something else seems interesting:

All other people who saw the corruption issue (except nemasu with his/her 8800 GTS, who might have seen a different issue indeed from the dmesg output) also were using early G86 chips, particularly 8400M-based, and mostly the "mobile" variants...

Maybe NVidia omitted part of the 8400 functionality in the mobile variants? This would again make up a nice (and easy) question to them...!? ;-)

Thanks again & BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

There was a bug in nvapeek/poke (it was using the wrong address space by default), can you update your pull and try again? [That explains why you saw 'S' in the output.]

Revision history for this message
In , Awl1 (awl1) wrote :

> There was a bug in nvapeek/poke (it was using the wrong address space by
> default), can you update your pull and try again? [That explains why you saw
> 'S' in the output.]

Of course - here you are:

The version I used is from the "Download ZIP" button in GitHub:
https://github.com/envytools/envytools/archive/master.zip

[aloew@aloew-lap nva]$ sudo ./nvapeek 10200c
...
[aloew@aloew-lap nva]$ sudo ./nvapoke 10200c 10
[aloew@aloew-lap nva]$ sudo ./nvapeek 10200c
...

Maybe now another bug, as we don't seem to get any hex address and/or value output?

Please advise if we need to pass any additional parameters to get hex output...

BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

Hmm... Looking at the code for nvapeek, I fear that nva_rd(...) still did not return any meaningful data, as it looks like we get s == 0...!?

                int s = 0;
                for (i = j = 0; i < 16 && i < b; i+=rs.regsz, j++) {
                        e[j] = nva_rd(&rs, a+i, &z[j]);
                        if (e[j] || z[j])
                                s = 1;
                }
                if (s) {
                        ls = 1;
                        printf ("%08x:", a);
                        for (i = j = 0; i < 16 && i < b; i+=rs.regsz, j++) {
                                nva_rsprint(&rs, e[j], z[j]);
                        }
                        printf ("\n");
                } else {
                        if (ls) printf ("...\n"), ls = 0;
                }

BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

But generally, nvapeek seems to work fine now:

[aloew@aloew-lap nva]$ sudo ./nvapeek 0
00000000: 086a00a2

Looking forward to your comments...

BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Yeah, it prints "..." instead of 0. This makes a lot of sense when you're peeking a large range full of 0's. Anyways, were there any additional messages in dmesg, e.g. MMIO read/write failures as a result?

Revision history for this message
In , Awl1 (awl1) wrote :

Yes, we indeed see the same well-known:

nouveau E[ PBUS][0000:01:00.0] MMIO read of 0x00010000 FAULT at 0x10200c

at any nvapeek read attempt, and

nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

at any nvapoke attempt... :-(
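
To see at a glance which registers fault, the addresses can be pulled out of the kernel log. The filter below is a sketch that assumes the message format shown in this thread ("MMIO read|write of 0x... FAULT at 0x..."); mmio_fault_addrs is a made-up name.

```shell
# Print the register address of every nouveau MMIO fault found on stdin.
# Lines that do not contain such a fault message are dropped.
mmio_fault_addrs() {
    sed -n 's/.*MMIO [a-z]* of 0x[0-9a-f]* FAULT at 0x\([0-9a-f]*\).*/\1/p'
}
```

Typical use would be "dmesg | mmio_fault_addrs | sort | uniq -c", which counts the faults per register address.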

BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

Oops - I just remembered that I am booting my kernel with "nouveau.config=PCRYPT=0" in the meantime...

Does this make any difference, i.e. do I need to retry the nvapeek/nvapoke sequence without this kernel option?

Sorry & thanks,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

A Happy New Year to everybody! :-)

Just wondering whether you intend to simply close this issue down with the workaround "solution" for me to set kernel option

nouveau.config=PCRYPT=0

or whether you are still interested in finding out *why* my Quadro NVS 130M and other 8400M-based cards do not seem to support this functionality (or what might need to be done differently in the driver to ensure they do).

Additional interesting information:

I have been informed that folks at NVIDIA have recently succeeded in tracking down a Solaris hang issue in their proprietary Unix drivers ("blob") that affected exactly Quadro NVS 130M cards (AFAIK, NVIDIA IR # 1172500).

I can indeed reproduce these hangs on Solaris 11.1, so this issue probably matches the unpredictable hangs that I have been also seeing with the Linux blob versions > 285.05.09 that made their drivers unusable for me.

AFAIK, their fix is scheduled to be included in the third update to their R331 series in February.

So how would you like to proceed regarding this issue?

Thanks & BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Could you provide the output of

nvapeek 154c
nvapeek 1540

Those registers specify which engines are there. I think we're ignoring them in nouveau...

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Created attachment 91714
patch to honor disabled engines

Give this a shot (without forcing PCRYPT=0). You should hopefully see a message saying that it and a few other engines are disabled. This needs some more testing on a wider variety of cards before I'll send it upstream, but it may be what you need.

Revision history for this message
In , Awl1 (awl1) wrote :

> Could you provide the output of
>
> nvapeek 154c
> nvapeek 1540
>
> Those registers specify which engines are there. I think we're ignoring them
> in nouveau...

OK - that might indeed explain the issues seen...
Here you are:

[aloew@aloew-lap nva]$ sudo ./nvapeek 154c
0000154c: 0000009c

[aloew@aloew-lap nva]$ sudo ./nvapeek 1540
00001540: b1010001

Looking at the patch you provided, if the rusty binary-arithmetic chip in my brain still works, this means for my case:

vdec = nv_rd32(device, 0x1540) & 0x40000000;

0xb(...) = 1011(...)
0x4(...) = 0100(...)

=> for me, vdec indeed is 0x00000000, i.e. false

and as my chipset is 0x86, furthermore:

MPEG -> disabled
VP -> disabled

and for the dynamic features,

0x9c = 10011100 binary

0x20 = 00100000 binary
0x40 = 01000000 binary

as 0x9c & 0x20 == 0x00, BSP -> disabled
as 0x9c & 0x40 == 0x00, PCRYPT -> disabled

which would probably confirm that for me your patch is correct.
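
The same bit tests can be replayed in the shell. This is a sketch of the arithmetic above only, not of the patch: the masks (0x40000000 in register 1540, 0x20 and 0x40 in register 154c) and their meaning are taken from the reading given in this comment for a 0x86 chipset, and decode_engines is a made-up name.

```shell
# Replay the bit tests from the comment above for a 0x86 (G86) chipset.
# $1 = value of register 0x1540, $2 = value of register 0x154c,
# both written with a 0x prefix. The mask-to-engine mapping follows the
# comment, not the driver source.
decode_engines() {
    if [ $(( $1 & 0x40000000 )) -eq 0 ]; then
        # "vdec" bit clear: MPEG and VP are treated as disabled
        echo "PMPEG disabled"
        echo "PVP disabled"
    fi
    if [ $(( $2 & 0x20 )) -eq 0 ]; then
        echo "PBSP disabled"
    fi
    if [ $(( $2 & 0x40 )) -eq 0 ]; then
        echo "PCRYPT disabled"
    fi
    return 0
}
```

Called as "decode_engines 0xb1010001 0x0000009c" with the values read above, it reports all four engines as disabled.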

That said, I will apply this patch to my current stock RHEL6 kernel and report back later today on whether this works fine for me (which it indeed should, based on the above considerations!).

Thanks a million - great work! :-)

Best regards,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Created attachment 91765
patch to honor hw disables after vbios

Unfortunately the first patch runs before VBIOS, so if the manufacturer explicitly disables an engine for some reason (by writing a 0 to those bits) we should probably honor that. This patch does that (actually 2 patches munged into 1). I've tested it on my NV98 and it correctly doesn't disable anything, but would be nice to test it on a card that _does_ disable stuff.

[note, this patch replaces the first patch, not in addition to it]

Revision history for this message
In , Awl1 (awl1) wrote :

Hello Ilia,

hmm - you just caught me with the update five minutes after I had started the rpmbuild with the previous version... ;-)

Unfortunately, while I could make the first patch apply to a current RHEL kernel source with only one change (core/engine/device.c -> core/subdev/device.c), the new patch will need much more rework to make it compile against a RHEL kernel.

I am therefore looking into getting a 3.12 kernel from the Oracle Linux "playground":

http://public-yum.oracle.com/repo/OracleLinux/OL6/playground/latest/x86_64/

Would 3.12.6 be an appropriate version to apply your updated patch to successfully?

Thanks & BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #71)
> Would 3.12.6 be an appropriate version to apply your updated patch to
> successfully?

I'm working against, effectively, 3.13-rc8. I'd think it would apply to 3.12, and just about any other semi-recent kernel, but I guess RHEL does something special? Not sure. That subdev -> engine move happened in dded35dee3 which went into 3.10, so I guess you're using something old.

Revision history for this message
In , Awl1 (awl1) wrote :

Yes, definitely, a RHEL6 stock kernel is *very* old (2.6.32.*) - but due to a kernel drm/nouveau module update from 3.x source that they recently did for RHEL 6.5, it also suddenly became new enough to make me see this issue... ;-)

Have just successfully applied the updated patch to 3.12.6, so my rpmbuild is running! :-)

You can expect my results in about two hours or so (will have dinner in between).

Thanks & BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

Just received

drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c:164: error: 'NVDEV_ENGINE_VIC' undeclared (first use in this function)

but "fixed" it for me by commenting out the lines for a 0xaf card (I have a 0x86 type anyway, so this code does not apply to me):

+ case 0xaf:
+ /* if (!(r154c & 0x40)) */
+ /* device->disable_mask |= 1ULL << NVDEV_ENGINE_VIC; */
+ /* fallthrough */

BR,
Andreas

Revision history for this message
In , Awl1 (awl1) wrote :

Created attachment 91786
Complete dmesg output booting 3.12.6 with "hwunits.patch" applied (nouveau.debug=debug)

Revision history for this message
In , Awl1 (awl1) wrote :

Created attachment 91787
nouveau-related dmesg output booting 3.12.6 with "hwunits.patch" applied (nouveau.debug=debug)

Revision history for this message
In , Awl1 (awl1) wrote :

Sorry that it took me longer to get back here - I needed an additional rpmbuild run due to running out of disk space for my first attempt...

But I can give an all clear signal - at least for my machine, AFAIK, everything seems to be fine:

Kernel command line: ro root=UUID=034d34cd-a464-4ee3-8db9-d6061a318a16 rd_NO_LUKS LANG=en_US.UTF-8 KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM nouveau.debug=debug rhgb quiet

nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x086a00a2
nouveau [ DEVICE][0000:01:00.0] Chipset: G86 (NV86)
nouveau [ DEVICE][0000:01:00.0] Family : NV50
nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image...
nouveau [ VBIOS][0000:01:00.0] ... appears to be valid
nouveau [ VBIOS][0000:01:00.0] using image from PRAMIN
nouveau [ VBIOS][0000:01:00.0] BIT signature found
nouveau [ VBIOS][0000:01:00.0] version 60.86.49.00.27
(...)
nouveau [ PMPEG][0000:01:00.0] hardware is marked as disabled
nouveau [ PVP][0000:01:00.0] hardware is marked as disabled
nouveau [ PCRYPT][0000:01:00.0] hardware is marked as disabled
nouveau [ PBSP][0000:01:00.0] hardware is marked as disabled

and also, everything is fine afterwards (as PCRYPT seems to indeed have been properly disabled). :-)
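
A quick way to repeat this check on another machine is to filter the log for the new message. The sketch below assumes the wording "hardware is marked as disabled" exactly as it appears in the output above; disabled_engines is a made-up name.

```shell
# List the engines that the patched driver reports as disabled.
# Reads kernel log lines on stdin and prints one engine name per line.
disabled_engines() {
    sed -n 's/.*nouveau *\[ *\([A-Z]*\)]\[[^]]*] hardware is marked as disabled.*/\1/p'
}
```

For example, "dmesg | disabled_engines" on the boot above would be expected to print PMPEG, PVP, PCRYPT and PBSP.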

What do you say? Do you agree that everything turned out as expected from my nvapeek results?

Thanks & BR,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

Great news! I'll update the bug when this makes it upstream (or if we have further questions about your hardware). FWIW I've been going around asking people to report registers 1540/154c to me, and so far everyone except you and one other person having trouble with nouveau has had them listed as everything enabled.

Thanks for tracking down the commit that caused the issue, that was instrumental!

Revision history for this message
In , Awl1 (awl1) wrote :

You're welcome! :-)

I did do this in my very own interest, because the OL6/RHEL6 install on my main work laptop all of a sudden had this distortion issue when RHEL updated the drm/nouveau module to an affected codebase in RHEL 6.5, so I definitely needed a solution for this (other than get a new laptop)...

One final request from my side, as I don't have commercial RHEL6 support (I am using the free OL6 clone):

Hoping that you have pretty good contact/access to Ben Skeggs (who I think officially owns the nouveau modules at Red Hat), can you please approach him and ask him to make sure that Red Hat also applies a (backported) version of this patch to their mainline stock RHEL 6.5 kernels?

That would be great, as this is definitely needed to ensure that all those people with the affected older/low-end NVIDIA notebook chips - such as myself (and all the other now unfortunately silent people who initially created this issue) - will no longer be affected by this issue in the current RHEL 6 kernels (and won't need the explicit workaround using the kernel parameter PCRYPT=0).

Thanks a million for your kind help & best regards from Germany,
Andreas

Revision history for this message
In , Tomshere (tomshere) wrote :

Hello,

I have the same NVIDIA GeForce NVS 130M with the disabled functions.
I checked with nvapeek:
0000154c: 0000009c
00001540: b1010001

uname -a delivers
Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I do not have any issues with distorted graphics during normal usage but my problem is that resume from suspend mode makes X hang.

I also have these errors in dmesg

[ 18.985158] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00fd94
[ 18.986213] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x103d94
[ 19.026027] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

but also

[ 18.984164] nouveau E[ PTHERM][0000:01:00.0] unhandled intr 0x00000161

When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate the errors and X still hangs when resuming.
I was not sure if I have to set the parameter in quotes.
As you can see I'm not a linux specialist ;)

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #80)
> Hello,
>
> I have the same NVIDIA GeForce NVS 130M with the disabled functions.
> I checked with nvapeek:
> 0000154c: 0000009c
> 00001540: b1010001
>
> uname -a delivers
> Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013
> x86_64 x86_64 x86_64 GNU/Linux
>
> I do not have any issues with distorted graphics during normal usage but my
> problem is that resume from suspend mode makes X hang.
>
> I also have these errors in dmesg
>
> [ 18.985158] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x00fd94
> [ 18.986213] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x103d94
> [ 19.026027] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000010
> FAULT at 0x10200c

These errors should go away with the patch.

>
> but also
>
> [ 18.984164] nouveau E[ PTHERM][0000:01:00.0] unhandled intr 0x00000161

I believe this is unrelated.

>
> When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> the errors and X still hangs when resuming.

It should eliminate the 10200c error. The others are from PVP and PBSP, you could do like nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0 -- that should have the same effect as my patch for your hardware. (I think.)

> I was not sure if I have to set the parameter in quotes.

Not necessary, but I *think* it'll work with quotes as well. Not sure.

> As you can see I'm not a linux specialist ;)

OK, then you have some different issue. I would recommend filing a fresh issue with all the relevant info.

Revision history for this message
In , Awl1 (awl1) wrote :

Hi Thomas,

> I have the same NVIDIA GeForce NVS 130M with the disabled functions.
> I checked with nvapeek:
> 0000154c: 0000009c
> 00001540: b1010001

great - finally somebody who confirms this issue.

> uname -a delivers
> Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013
> x86_64 x86_64 x86_64 GNU/Linux

> I do not have any issues with distorted graphics during normal usage but my
> problem is that resume from suspend mode makes X hang.

> I also have these errors in dmesg
>
> [ 18.985158] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x00fd94
> [ 18.986213] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x103d94
> [ 19.026027] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000010
> FAULT at 0x10200c

Hmm - your kernel and your nvapeek results clearly suggest you should be affected...

Have you enabled compiz (i.e. OpenGL-based 3D acceleration features)? I assume that so far, you haven't (it does not seem to be active in Ubuntu by default), which most likely is the only reason why you are not seeing the distortion issue (so far).

See e.g.

http://www.howtoforge.com/install-compiz-on-the-unity-desktop-on-ubuntu-12.04-precise-pangolin

(depending on your particular Ubuntu version) on how to enable compiz. I am almost certain that once you have done so, you will also see the distorted graphics, but you now already know the fix... ;-)

> [ 18.984164] nouveau E[ PTHERM][0000:01:00.0] unhandled intr 0x00000161

This last "PTHERM" error seems to be a different, unrelated issue.

> When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> the errors and X still hangs when resuming.

Hmm - interesting, as I clearly don't have any issues with suspend/resume. Which laptop do you have? Did you already update your BIOS to the latest available version?

> I was not sure if I have to set the parameter in quotes.

No, you don't (and AFAIK, you even must not). Ilia has already proposed the correct workaround for the distortion issue (until your distro of choice has integrated the new fix) - add this:

nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0

to your grub kernel parameters. Having done so, all "MMIO write" errors in dmesg must be gone (they are for me!), otherwise something else is still wrong for you in addition.
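
On an Ubuntu system such as Thomas's, this would usually mean editing one line in /etc/default/grub. The fragment below is a sketch under that assumption; "quiet splash" merely stands in for whatever options are already on the line, and other distributions keep their kernel command line elsewhere. Note that the quotes belong to the GRUB variable as a whole, not to the nouveau option itself.

```shell
# /etc/default/grub (Debian/Ubuntu layout, assumed). "quiet splash" stands
# in for whatever options are already present on this line.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0"

# Then regenerate the boot menu and, after a reboot, confirm that the
# option reached the kernel:
#   sudo update-grub
#   grep -o 'nouveau\.config=[^ ]*' /proc/cmdline
```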

Hope this helps & best regards,
Andreas

BTW @ Ilia:
Did you already have a chance to contact Ben Skeggs about applying the fix to mainline RHEL 6.5 (and above) kernels?

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #82)
> BTW @ Ilia:
> Did you already have a chance to contact Ben Skeggs about applying the fix
> to mainline RHEL 6.5 (and above) kernels?

That seems a little premature given that it's not even in the mainline kernel. However I would recommend that once it is, you file a redhat issue to make sure it gets backported to the whatever. I have no knowledge of, and do not care about RHEL or any non-mainline kernel. If you do, work with whatever processes they have. I bug Ben about enough stuff already :)

Revision history for this message
In , Awl1 (awl1) wrote :

Hi Ilia,

> That seems a little premature given that it's not even in the mainline
> kernel. However I would recommend that once it is, you file a redhat issue
> to make sure it gets backported to the whatever. I have no knowledge of, and
> do not care about RHEL or any non-mainline kernel. If you do, work with
> whatever processes they have. I bug Ben about enough stuff already :)

ouch - that's a pity... :-(

As stated earlier, as I am using the free (only as in beer...) Oracle Linux version rather than a commercial paid RHEL license, I cannot file any issues with them, so I was hoping about you being able to raise this with him within the nouveau team. It clearly deserves a fix, but I won't be able to drive anything myself here due to the lack of a paid license... :-(

Oh, and one more thing by the way:

Interestingly, I can also confirm that for me, the proprietary NVidia "blob" Unix driver version 331.38 (which has just been released this week):

https://devtalk.nvidia.com/default/topic/672875

indeed has also fixed the long-standing hang issue with their drivers for my Quadro NVS 130M on both Linux and Solaris. Even better news is that the fix will also be integrated into the next R331 release for Windows (it is not yet in Windows versions 331.93 or 332.21)!

But while it took NVidia a little less than two years between introducing their regression bug (all releases since 285.x are affected) and fixing it, you/the nouveau team have tracked down and fixed this issue in just a couple of days... :-)

So thanks again for your great work on nouveau! :-)

BR,
Andreas

Revision history for this message
In , Tomshere (tomshere) wrote :

(In reply to comment #82)
> > I also have these errors in dmesg
> >
> > [ 18.985158] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> > FAULT at 0x00fd94
> > [ 18.986213] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000
> > FAULT at 0x103d94
> > [ 19.026027] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000010
> > FAULT at 0x10200c
>
> Hmm - your kernel and your nvapeek results clearly suggest you should be
> affected...
>
> Have you enabled compiz(...)

No I don't think so.

> > [ 18.984164] nouveau E[ PTHERM][0000:01:00.0] unhandled intr 0x00000161
>
> This last "PTHERM" error seems to be a different, unrelated issue.
>
> > When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> > the errors and X still hangs when resuming.
>
> Hmm - interesting, as I clearly don't have any issues with suspend/resume.
> Which laptop do you have? Did you already update your BIOS to the latest
> available version?
>
> > I was not sure if I have to set the parameter in quotes.
>
> No, you don't (and AFAIK, you even must not). Ilia has already proposed the
> correct workaround for the distortion issue (until your distro of choice has
> integrated the new fix) - add this:
>
> nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0
>
> to your grub kernel parameters. Having done so, all "MMIO write" errors in
> dmesg must be gone (they are for me!), otherwise something else is still
> wrong for you in addition.

Ok, now after adding the whole bunch to the kernel opts the three PBUS errors are gone. I will open a new issue for the resume failure.

How can I push the integration of such a fix into another distro?

Many thanks!
-Thomas

Revision history for this message
In , Awl1 (awl1) wrote :

Hello Ilia,

has there been any progress so far in getting this into the mainstream Linux kernel (or mainstream git) for the next official kernel release?

I'd like to make an attempt to get this patch (or rather, a backport of it) into official RHEL 6.x kernels, but I'd like to point to an official kernel patch in order to do so.

Many thanks & best regards,
Andreas

Revision history for this message
In , Ilia Mirkin (imirkin) wrote :

(In reply to comment #86)
> Hello Ilia,
>
> has there been any progress so far in getting this into the mainstream Linux
> kernel (or mainstream git) for the next official kernel release?

This should be upstream as of commit 4019aaa2b314a5be9886ae1db64ff8c6d3c060ed, available in 3.14-rc1.
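
For anyone wanting to check a given kernel tree or prepare a backport request, the commit can be queried with plain git. This is a sketch that assumes a local clone of the mainline tree; has_fix and export_fix are made-up helper names, and only the commit hash comes from this comment.

```shell
# Commit hash as quoted in the comment above.
FIX=4019aaa2b314a5be9886ae1db64ff8c6d3c060ed

# Succeeds if the given tag or branch already contains the fix.
has_fix() {
    if [ -z "$1" ]; then
        echo "usage: has_fix <tag-or-branch>" >&2
        return 2
    fi
    git merge-base --is-ancestor "$FIX" "$1"
}

# Write the fix out as a mailbox-format patch file, e.g. as a starting
# point for a backport request.
export_fix() {
    git format-patch -1 "$FIX"
}
```

If the statement above holds, "has_fix v3.14-rc1" should succeed while "has_fix v3.13" should not. A backport may need more than this one commit, depending on how far the target tree has diverged.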

Revision history for this message
In , Awl1 (awl1) wrote :

(In reply to comment #87)

> > has there been any progress so far in getting this into the mainstream Linux
> > kernel (or mainstream git) for the next official kernel release?

> This should be upstream as of commit
> 4019aaa2b314a5be9886ae1db64ff8c6d3c060ed, available in 3.14-rc1.

Many thanks, Ilia! :-)

BR,
Andreas

Changed in linux:
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
Andy Whitcroft (apw) wrote :

I have pulled in the patch identified in the upstream bug, and two foundational pieces, for testing. These are applied to a test kernel at the URL below:

    http://people.canonical.com/~apw/lp1158689-trusty/

Could those affected test these kernels and report back here? I will say that this is a fairly large change and there is no guarantee we can get this into a released kernel.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Trusty):
status: Incomplete → Fix Committed
Andy Whitcroft (apw)
Changed in linux (Ubuntu Trusty):
status: Fix Committed → In Progress
Revision history for this message
Amit Kucheria (amitk) wrote :

Confirmed. This kernel makes nouveau work with the Nvidia 8400 GS and the stock Ubuntu desktop (with the Unity dash) for the first time in a very long time.

Filtered kernel logs for the buggy and bug-free versions of the 3.13 kernel are attached.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-24.46

---------------
linux (3.13.0-24.46) trusty; urgency=low

  [ Andy Whitcroft ]

  * [Config] d-i -- add nvme devices to block-modules udeb
    - LP: #1303710

  [ Paolo Pisati ]

  * [Config] build vexpress a9 dtb
    - LP: #1303657
  * [Config] disable HVC_DCC
    - LP: #1303657

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1305158
  * rebase to v3.13.9
  * CONFIG_RTLBTCOEXIST=m
    - LP: #1296591

  [ Upstream Kernel Changes ]

  * HID: Bluetooth: hidp: make sure input buffers are big enough
    - LP: #1252874
  * ACPI / video: Add systems that should favour native backlight interface
    - LP: #1303419
  * rds: prevent dereference of a NULL device in rds_iw_laddr_check
    - LP: #1302222
    - CVE-2014-2678
  * x86/efi: Fix 32-bit fallout
    - LP: #1301590
  * drm/nouveau/devinit: tidy up the subdev class definition
    - LP: #1158689
  * drm/nouveau/device: provide a way for devinit to mark engines as
    disabled
    - LP: #1158689
  * drm/nv50-/devinit: prevent use of engines marked as disabled by
    hw/vbios
    - LP: #1158689
  * rtlwifi: btcoexist: Add new mini driver
    - LP: #1296591
  * rtlwifi: Prepare existing drivers for new driver
    - LP: #1296591
  * rtlwifi: add MSI interrupts mode support
    - LP: #1296591
  * rtlwifi: rtl8188ee: enable MSI interrupts mode
    - LP: #1296591

  [ Upstream Kernel Changes ]

  * rebase to v3.13.9
 -- Tim Gardner <email address hidden> Fri, 04 Apr 2014 09:26:27 -0400

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released