10de:0a3c [Lenovo ThinkPad W510] X freeze/crash with nouveau driver

Bug #553789 reported by Joe Barnett
This bug affects 29 people
Affects                               Status        Importance  Assigned to  Milestone
Nouveau Xorg driver                   Fix Released  High
Baltix                                Invalid       Undecided   Unassigned
linux (Ubuntu)                        Invalid       High        Unassigned
xserver-xorg-video-nouveau (Ubuntu)   Invalid       Wishlist    Unassigned

Bug Description

Binary package hint: xorg

Playing around with the Cheese application, and the video froze.

I could move the mouse, but not click on anything.

SSHing in from a remote host showed the X process taking up 100% CPU, and a backtrace in Xorg.log (I believe it is attached as GdmLog1).

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a3248]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a2ac4]
2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47cea4]
3: /usr/bin/X (xf86PostMotionEvent+0xa9) [0x47d049]
4: /usr/lib/xorg/modules/input/synaptics_drv.so (0x7fbd1ae3c000+0x39d4) [0x7fbd1ae3f9d4]
5: /usr/lib/xorg/modules/input/synaptics_drv.so (0x7fbd1ae3c000+0x5f48) [0x7fbd1ae41f48]
6: /usr/bin/X (0x400000+0x6fca7) [0x46fca7]
7: /usr/bin/X (0x400000+0x11d1f3) [0x51d1f3]
8: /lib/libpthread.so.0 (0x7fbd20497000+0xf8f0) [0x7fbd204a68f0]
9: /lib/libc.so.6 (ioctl+0x7) [0x7fbd1f24f197]
10: /lib/libdrm.so.2 (drmIoctl+0x23) [0x7fbd1d8005b3]
11: /lib/libdrm.so.2 (drmCommandWrite+0x1b) [0x7fbd1d80083b]
12: /lib/libdrm_nouveau.so.1 (0x7fbd1d1c2000+0x2fbd) [0x7fbd1d1c4fbd]
13: /lib/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7fbd1d1c51bc]
14: /lib/libdrm_nouveau.so.1 (0x7fbd1d1c2000+0x2166) [0x7fbd1d1c4166]
15: /lib/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7fbd1d1c44fc]
16: /usr/lib/xorg/modules/libexa.so (0x7fbd1c31d000+0x9f53) [0x7fbd1c326f53]
17: /usr/lib/xorg/modules/libexa.so (0x7fbd1c31d000+0xeb1d) [0x7fbd1c32bb1d]
18: /usr/bin/X (0x400000+0xd8150) [0x4d8150]
19: /usr/bin/X (0x400000+0xd186e) [0x4d186e]
20: /usr/bin/X (0x400000+0x30c3c) [0x430c3c]
21: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
22: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fbd1f18fc4d]
23: /usr/bin/X (0x400000+0x25d59) [0x425d59]

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: xorg 1:7.5+3ubuntu1
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Thu Apr 1 22:00:54 2010
DkmsStatus: nvidia-current, 195.36.15, 2.6.32-19-generic, x86_64: installed
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100113)
MachineType: LENOVO 43192PU
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-generic root=UUID=a1bfc043-8a23-40f2-a7a1-311c657b3df0 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: xorg
dmi.bios.date: 01/20/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6NET46WW (1.09 )
dmi.board.name: 43192PU
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6NET46WW(1.09):bd01/20/2010:svnLENOVO:pn43192PU:pvrThinkPadW510:rvnLENOVO:rn43192PU:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 43192PU
dmi.product.version: ThinkPad W510
dmi.sys.vendor: LENOVO
glxinfo: Error: [Errno 2] No such file or directory
system:
 distro: Ubuntu
 codename: lucid
 architecture: x86_64
 kernel: 2.6.32-19-generic

Revision history for this message
In , Vvv-oktetlabs (vvv-oktetlabs) wrote :

Created an attachment (id=33898)
dmesg output

Revision history for this message
In , Vvv-oktetlabs (vvv-oktetlabs) wrote :

Created an attachment (id=33899)
lspci output

Revision history for this message
In , Xavier (chantry-xavier) wrote :

This seems related to two existing fedora bugs :
https://bugzilla.redhat.com/show_bug.cgi?id=567412
https://bugzilla.redhat.com/show_bug.cgi?id=566987

When using Fedora's nouveau code, it's usually advised to file a Fedora bug report.
People following this bug tracker might expect you to run the latest git of all nouveau components:
http://nouveau.freedesktop.org/wiki/InstallNouveau
(two main reasons for that: 1) git is usually better and has more fixes; 2) we do not know exactly which nouveau code is being shipped)

As said in https://bugzilla.redhat.com/show_bug.cgi?id=567412#c6 , using the latest git of xf86-video-nouveau would at least fix the DATA ERROR you see in the kernel log and rule that out.

However, according to mwk, the data error is unlikely to be related to the hangs. He also believes he might have the same problem with recent nouveau code. To confirm that, he would like to know the value of the 400700 register the next time the machine hangs:
$ wget http://0x04.net/~mwk/pgtest/{peek.c,libio.{c,h}}
$ gcc peek.c libio.c -lpciaccess -o peek
# ./peek 0x400700

Revision history for this message
In , Vvv-oktetlabs (vvv-oktetlabs) wrote :

Sorry for misplaced report.

It happens again - ./peek 0x400700 output is
00400700: 00100001

So, is there chance this problem resolved in git driver?

Revision history for this message
In , Xavier (chantry-xavier) wrote :

(In reply to comment #4)
> Sorry for misplaced report.
>
> It happens again - ./peek 0x400700 output is
> 00400700: 00100001
>
> So, is there chance this problem resolved in git driver?
>

17:53 < mwk> shining^: heh, ok, so the bug report looks like it could be related...
17:53 < mwk> his failing status is 00100001, mine is 01b00001....
17:54 < shining^> mwk: well thats different :)
17:54 < mwk> not so different
17:54 < mwk> mine includes all of his bits
17:54 < mwk> too bad I don't know what most of these bits mean...
17:55 < mwk> 00800000 is CUDA MP execution, 00000001 is the whole PGRAPH... and that's about it
17:57 < shining^> mwk: ok. and it also hangs regularly for you ?
17:57 < shining^> "usually once per 1-2 days"
17:57 < mwk> shining^: much more often when I'm playing some 3d on it
17:58 < shining^> ok
17:58 < mwk> possibly I could get more lockups out of it, and of the 00100001 kind, if I left X running for long
17:58 < shining^> I suppose the answer to that question is no then : "So, is there chance this problem resolved in git driver?"
17:59 < mwk> with current git? /me doubts that.
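
mwk's remark that his status value includes all of the reporter's bits can be checked with a bitwise AND. The following is a minimal Python sketch and not part of the original discussion; the helper names are made up for illustration, and only the two bit meanings mwk mentions in the log are labelled.

```python
# Status values of register 0x400700 quoted in the IRC log above.
REPORTER_STATUS = 0x00100001
MWK_STATUS = 0x01B00001

# The only bit meanings given in the log; everything else is unknown.
KNOWN_BITS = {
    0x00800000: "CUDA MP execution",
    0x00000001: "whole PGRAPH",
}


def set_bits(value):
    """Return the individual bits set in a 32-bit value, lowest first."""
    return [1 << i for i in range(32) if value & (1 << i)]


def is_subset(a, b):
    """True if every bit set in a is also set in b."""
    return a & b == a


for name, status in (("reporter", REPORTER_STATUS), ("mwk", MWK_STATUS)):
    labels = [f"{bit:08x} ({KNOWN_BITS.get(bit, 'unknown')})" for bit in set_bits(status)]
    print(name, ", ".join(labels))
print("reporter's bits all present in mwk's value:",
      is_subset(REPORTER_STATUS, MWK_STATUS))
```

The subset test only says the two hangs share status bits; it does not say what the unlabelled bits mean.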

Revision history for this message
In , Vvv-oktetlabs (vvv-oktetlabs) wrote :

Thanks. I ran ./peek 0x400700 the next time X hung; its output was:
00400700: 001e0001

In case it helps:
- when X hangs, the mouse cursor still moves
- since the update to 2.6.32.9, X appears to hang more often than before, and usually the system cannot be recovered by killing X (X restarts, but the screen stays as it was at the moment of the hang)

Revision history for this message
Joe Barnett (thejoe) wrote : X freeze/crash with noveau driver

Revision history for this message
Joe Barnett (thejoe) wrote :
Bryce Harrington (bryce)
description: updated
affects: xorg (Ubuntu) → xserver-xorg-video-nouveau (Ubuntu)
Bryce Harrington (bryce)
tags: added: crash
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
Revision history for this message
Bryce Harrington (bryce) wrote :

I see you still have some leftover bits from a prior -nvidia binary driver installation. Likely it's causing your issue.

NonfreeKernelModules: nvidia

See http://wiki.ubuntu.com/X/Troubleshooting/ for a guide on how to clean up after the binary drivers. If you need more help, ask on http://answers.launchpad.net/ubuntu/

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Joe Barnett (thejoe) wrote :

non free nvidia driver was actually installed after the freeze happened, ...

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Invalid → New
Bryce Harrington (bryce)
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
Revision history for this message
Anders Jacob Hansen (ajh) wrote :

My new laptop does exactly the same thing. I haven't gathered any info yet, but the behavior is the same. My problem is that I can't use the non-free nvidia module, since it doesn't support my nVidia GT330M chip.

I checked with nVidia and they don't provide any driver (stable or beta), and installing the driver that jockey wants me to install just makes X screw up upon reboot.

By screw up I mean that it says the video configuration is not valid, or something like that, and then I can choose between a few options for what to do.

I'm running Kubuntu 10.04 beta 2, but the same problem happens on Ubuntu 10.04 beta 2, of course.
I do realize that I can't really complain since it's beta :-) but my feeling is the driver won't change much until 10.04 LTS stable comes out.

Revision history for this message
Chris Halse Rogers (raof) wrote :

I believe this is likely to be this upstream bug: https://bugs.freedesktop.org/show_bug.cgi?id=26980 - could you please attach dmesg after reproducing this bug? That should give us enough information to tell if they're the same bug.

Changed in xserver-xorg-video-nouveau (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
Revision history for this message
Anders Jacob Hansen (ajh) wrote :

So I cat dmesg to a file when it has started up after a crash?

Revision history for this message
Chris Halse Rogers (raof) wrote :

Yes; “dmesg > dmesg-aftercrash.log” will work. From the crash description you'll probably need to SSH into the machine to capture the dmesg - doing this after a reboot won't catch the necessary information.
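
The capture described here (running dmesg over SSH while X is still hung) could be scripted roughly as follows. This is a sketch only: the function names are invented for illustration, the host passed in is whatever address the frozen machine has, and it assumes SSH login to that machine already works without a prompt.

```python
import subprocess


def dmesg_over_ssh_command(host):
    """Build the argv for running dmesg on a remote host via ssh."""
    return ["ssh", host, "dmesg"]


def capture_dmesg(host, out_path="dmesg-aftercrash.log"):
    """Run dmesg on the hung machine and save its output locally."""
    result = subprocess.run(
        dmesg_over_ssh_command(host),
        capture_output=True, text=True, check=True,
    )
    with open(out_path, "w") as f:
        f.write(result.stdout)
    return out_path
```

Calling capture_dmesg with the machine's address from a second computer, before rebooting, would produce the same file as the manual command.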

Revision history for this message
Anders Jacob Hansen (ajh) wrote :

Okay will do, I will return after the next crash ;)
Thanks!

Revision history for this message
Johan Euphrosine (proppy) wrote : Re: X freeze/crash with nouveau driver

I'm also affected by that bug, I will try to submit dmesg-aftercrash.log when it happens again.

summary: - X freeze/crash with noveau driver
+ X freeze/crash with nouveau driver
Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

This bug is a huge mystery and I'm starting to think it's a hw bug. One possibility is that it's triggered by a particular insn sequence that DDX uses. Could you please try http://0x04.net/~mwk/nva5-hack.patch and see if the lockups still happen?

If anyone hits this bug, with the above patch or without, please compile pgtest from http://0x04.net/cgit/index.cgi/pgtest and give the output of ./peek 0x400000 0x10000 after the lockup.

Revision history for this message
In , Axel Beckert (xtaran) wrote :

I have observed this bug also on Ubuntu 10.04 (Details below):

[…]
(II) NOUVEAU(0): Modeline "1152x864"x0.0 108.00 1152 1216 1344 1600 864 865 868 900 +hsync +vsync (67.5 kHz)
(II) NOUVEAU(0): Modeline "1400x1050"x0.0 156.00 1400 1504 1648 1896 1050 1053 1057 1099 -hsync +vsync (82.3 kHz)
(II) NOUVEAU(0): Modeline "1440x900"x0.0 106.50 1440 1520 1672 1904 900 903 909 934 -hsync +vsync (55.9 kHz)
(II) NOUVEAU(0): Modeline "1440x900"x0.0 136.75 1440 1536 1688 1936 900 903 909 942 -hsync +vsync (70.6 kHz)
(II) NOUVEAU(0): Modeline "1680x1050"x0.0 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync (65.3 kHz)
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a3248]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a2ac4]
2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47cea4]
3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f5182f10000+0x53cf) [0x7f5182f153cf]
4: /usr/bin/X (0x400000+0x6fca7) [0x46fca7]
5: /usr/bin/X (0x400000+0x11d1f3) [0x51d1f3]
6: /lib/libpthread.so.0 (0x7f518823e000+0xf8f0) [0x7f518824d8f0]
7: /lib/libc.so.6 (ioctl+0x7) [0x7f5186ff6157]
8: /lib/libdrm.so.2 (drmIoctl+0x28) [0x7f51855a75b8]
9: /lib/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f51855a783b]
10: /lib/libdrm_nouveau.so.1 (0x7f5184f69000+0x2f7d) [0x7f5184f6bf7d]
11: /lib/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7f5184f6c1bc]
12: /lib/libdrm_nouveau.so.1 (0x7f5184f69000+0x2166) [0x7f5184f6b166]
13: /lib/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7f5184f6b4fc]
14: /usr/lib/xorg/modules/libexa.so (0x7f51840c4000+0x8876) [0x7f51840cc876]
15: /usr/lib/xorg/modules/libexa.so (0x7f51840c4000+0x9415) [0x7f51840cd415]
16: /usr/bin/X (0x400000+0xd8a7b) [0x4d8a7b]
17: /usr/bin/X (0x400000+0x2ebb4) [0x42ebb4]
18: /usr/bin/X (0x400000+0x30c3c) [0x430c3c]
19: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
20: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f5186f36c4d]
21: /usr/bin/X (0x400000+0x25d59) [0x425d59]

strace shows tons of this:

[…]
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(8, 0x40086485, 0x7fff98e3a6d0) = ? ERESTARTSYS (To be restarted)
[…]
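
The request number in the repeated ioctl call can be unpacked using the standard Linux ioctl encoding as laid out on x86 (8-bit command number, 8-bit type, 14-bit size, 2-bit direction). A rough Python sketch follows; the function name is made up here, and mapping the resulting driver command index to a specific nouveau ioctl is left open because that depends on the libdrm/kernel ABI version in use.

```python
DRM_IOCTL_TYPE = "d"      # DRM ioctls use 'd' as their type byte
DRM_COMMAND_BASE = 0x40   # driver-specific DRM commands start here

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Split a Linux ioctl request number into its fields (x86 layout)."""
    return {
        "direction": DIRECTIONS[(request >> 30) & 0x3],
        "size": (request >> 16) & 0x3FFF,
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
    }


fields = decode_ioctl(0x40086485)  # value from the strace output above
print(fields)
if fields["type"] == DRM_IOCTL_TYPE and fields["nr"] >= DRM_COMMAND_BASE:
    print("driver-specific DRM command index: 0x%02x"
          % (fields["nr"] - DRM_COMMAND_BASE))
```

For this value the fields describe a write-direction DRM ioctl with an 8-byte argument, which fits the drmCommandWrite frame in the backtrace.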

neper:~# uname -a
Linux neper 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010 x86_64 GNU/Linux
neper:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 10.04 LTS
Release: 10.04
Codename: lucid
neper:~# dpkg -l | fgrep nouveau
ii libdrm-nouveau1 2.4.18-1ubuntu3 […]
ii xserver-xorg-video-nouveau 1:0.0.15...


Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

*** Bug 28320 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mark-hagger (mark-hagger) wrote :

(In reply to comment #7)
> If anyone hits this bug, with the above patch or without, please compile pgtest
> from http://0x04.net/cgit/index.cgi/pgtest and give the output of ./peek
> 0x400000 0x10000 after the lockup.

I can get this to hang almost at will, on a fresh Fedora 13 install, I've attached output from the peek run, hope it helps.

Revision history for this message
In , Mark-hagger (mark-hagger) wrote :

Created an attachment (id=36471)
Xorg and peek output after hang

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

(In reply to comment #7)
> This bug is a huge mystery and I'm starting to think it's a hw bug.

I'm getting the same symptoms on a fresh install of Ubuntu 10.04 on a brand new Lenovo W510. X locks up about 2 or 3 times per day; the screen stays frozen even if X is restarted.

  2.6.32-22-generic-pae #36-Ubuntu SMP Thu Jun 3 23:14:23 UTC 2010 i686

Next time it happens, I'll ssh into the machine and gather more information.

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

Created an attachment (id=36485)
dmesg taken during X lockup

It happened again....
here is the dmesg output that I captured by ssh-ing in while X was in 100% CPU state.

I also thought I had captured an strace... but I made a mistake. I'll capture that next time. Is there anything else you need, this is happening every few hours for me?

Revision history for this message
In , Mark-hagger (mark-hagger) wrote :

(In reply to comment #13)
> that next time. Is there anything else you need, this is happening every few
> hours for me?
See comment#7, Marcin says he wants output from "peek" after a hang has occurred.

Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

Status update.

First, we don't need any additional dumps - Ben found a reliable way to reproduce this bug semi-instantly, so we can make any dumps we need.

But, we still have no idea what's causing this bug. And right now all developers hunting for this bug are either too busy with other things, or on vacation.

So, please no more dumps. You have four options:

 - use other driver
 - disable acceleration [nouveau.noaccel=1 on kernel command line]
 - decide that rebooting every few hours isn't that bad after all
 - debug and fix that bug on your own
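
To check whether the nouveau.noaccel=1 option from the list above actually made it onto the kernel command line, one can inspect /proc/cmdline after rebooting. A minimal Python sketch with hypothetical helper names; how the option gets added (for example through the bootloader configuration) depends on the distribution and is not shown.

```python
def has_kernel_param(cmdline, name, value=None):
    """Return True if `name` (optionally `name=value`) is on the command line."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and (value is None or (sep == "=" and val == value)):
            return True
    return False


def noaccel_enabled(path="/proc/cmdline"):
    """Check the running kernel's command line for nouveau.noaccel=1."""
    with open(path) as f:
        return has_kernel_param(f.read(), "nouveau.noaccel", "1")
```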

Yes, we know the situation sucks. Sorry for that.

Revision history for this message
In , bedahr (grasch-simon-listens) wrote :

Would you mind posting the method to reproduce the bug?

Thanks!

Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

06:16:48 <darktama> I can reproduce the issue *very* quickly with http://www.nvnews.net/vbulletin/showthread.php?t=150598
06:16:57 <darktama> the first post has an image, and a smaller thumbnail image
06:17:42 <darktama> if I scroll up and down for a bit (<1min) with the the thumbnail going off/on the screen, it'll hang

Revision history for this message
In , bedahr (grasch-simon-listens) wrote :

Hm I don't have any problems with this site (have been scrolling for almost 2 minutes now without any issues).

I am using an 8600M GT, though, but I thought I'd test it because my X has hung twice in the last three days with the same symptoms as well...

Revision history for this message
In , Picogeyer (picogeyer) wrote :

(In reply to comment #17)
> 06:16:48 <darktama> I can reproduce the issue *very* quickly with
> http://www.nvnews.net/vbulletin/showthread.php?t=150598
> 06:16:57 <darktama> the first post has an image, and a smaller thumbnail image
> 06:17:42 <darktama> if I scroll up and down for a bit (<1min) with the the
> thumbnail going off/on the screen, it'll hang

I can't reproduce the bug with that site either using a GeForce 210, though my X hangs about twice a day.

Revision history for this message
Greg Wilkins (gregw-wiltel) wrote : Re: X freeze/crash with nouveau driver

I'm also having the same problem and will gather the info next time it happens

Revision history for this message
Greg Wilkins (gregw-wiltel) wrote :

here is my dmesg taken during a lockup. I'll do an strace next time

Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

Yeah, that's a funny thing about this testcase. Apparently it works for NVA3 and NVA5 chipsets, but not NVA8 which gt210 is based on. Maybe it's related to different TP configuration or something...

As for the 8600M problem - this is almost certainly some other bug. Not all random hangs are instances of this particular bug. Does that hang give something suspicious in dmesg?

Revision history for this message
In , bedahr (grasch-simon-listens) wrote :

I'll try to debug it the next time it happens...

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

Created an attachment (id=36508)
strace taken during 100% X lockup

here is an strace of the X process during the 100% cpu lockup. Looks like a loop to me.

I'm going to switch to the proprietary driver for a while and see if I get lockups with that. But I really want to run the open driver (it's so much more flexible), so I'm happy to switch back if you want me to try anything.

Revision history for this message
Greg Wilkins (gregw-wiltel) wrote : Re: X freeze/crash with nouveau driver

here is an strace of the X process during the 100% cpu lockup.

I'm going to switch to the proprietary driver for a while and see if I get lockups with that. But I really want to run the open driver (it's so much more flexible), so I'm happy to switch back if you want me to try anything.

Revision history for this message
Greg Wilkins (gregw-wiltel) wrote :

I've run the proprietary driver now for 2 days without any lock ups, so it does look like a software problem.

Revision history for this message
papukaija (papukaija) wrote :

Greg has provided the requested information at comment 11.

@Greg: For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
craigster0 (charmer-launchpad-net) wrote :

i ran into this problem as well on a Lenovo Thinkpad T510 with nVidia NVS 3100M (rev a2) graphics.

I've run into this problem four or five times in the week that i've owned the laptop. in this particular case the freeze occurred after the screen saver locked the screen, and i entered my password. entering my password unlocked the screen, but that was the last thing i could type or click on.

here's the info from the tail of Xorg.0.log.old (grabbed after a reboot):

...
(II) NOUVEAU(0): Modeline "1920x1080"x0.0 115.83 1920 1980 2028 2050 1080 1090 1100 1130 -hsync -vsync (56.5 kHz)
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a3248]
1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a2ac4]
2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47cea4]
3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f72a1d24000+0x53cf) [0x7f72a1d293cf]
4: /usr/bin/X (0x400000+0x6fca7) [0x46fca7]
5: /usr/bin/X (0x400000+0x11d1c3) [0x51d1c3]
6: /lib/libpthread.so.0 (0x7f72a7143000+0xf8f0) [0x7f72a71528f0]
7: /lib/libc.so.6 (ioctl+0x7) [0x7f72a5efa187]
8: /lib/libdrm.so.2 (drmIoctl+0x28) [0x7f72a44ab5b8]
9: /lib/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f72a44ab83b]
10: /lib/libdrm_nouveau.so.1 (0x7f72a3e6d000+0x2f7d) [0x7f72a3e6ff7d]
11: /lib/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7f72a3e701bc]
12: /lib/libdrm_nouveau.so.1 (0x7f72a3e6d000+0x2166) [0x7f72a3e6f166]
13: /lib/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7f72a3e6f4fc]
14: /usr/lib/xorg/modules/libexa.so (0x7f72a2fc8000+0x9555) [0x7f72a2fd1555]
15: /usr/lib/xorg/modules/libexa.so (0x7f72a2fc8000+0xa0ea) [0x7f72a2fd20ea]
16: /usr/bin/X (0x400000+0xd8a4b) [0x4d8a4b]
17: /usr/lib/xorg/modules/libexa.so (0x7f72a2fc8000+0xb3bd) [0x7f72a2fd33bd]
18: /usr/bin/X (0x400000+0xd844e) [0x4d844e]
19: /usr/bin/X (0x400000+0xd27ce) [0x4d27ce]
20: /usr/bin/X (0x400000+0x30c3c) [0x430c3c]
21: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
22: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f72a5e3ac4d]
23: /usr/bin/X (0x400000+0x25d59) [0x425d59]

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

As per comment #15, I can confirm that setting [nouveau.noaccel=1 on kernel command line] does prevent lockups (at least for 3 days). It does result in occasional strange display artefacts (e.g. black boxes left after tool tips close), but is entirely usable and causes fewer issues than using the proprietary driver.

Revision history for this message
craigster0 (charmer-launchpad-net) wrote : Re: X freeze/crash with nouveau driver

Once this bug occurs, it is readily reproducible (or at least I think this is a reproduction case).

On my system, once X server wedges I can login to the machine via ssh and "kill -9" the X server process. A new process starts, and immediately throws the same stack traceback.

So there's the potential of intervening with /usr/bin/X, or /usr/lib/gdm/gdm-simple-slave (its parent), to collect additional strace or debug information.

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

squeak?

Revision history for this message
In , Marcin Kościelnicki (koriakin) wrote :

Well... the status is still the same. We [me and darktama / Ben Skeggs] both have no fucking idea what's
causing this problem.

It turns out, stuff is complicated. NVA3+ cards introduced some sort of
microcontroller on-board that runs all the time and controls stuff like power
management. In all NVA3+ traces, we see a lot of places where blob is talking
to that controller and does stuff to PGRAPH. Atm we can only assume that
lockups are caused by us not doing this.

We have no idea what exactly that microcontroller does and how to talk to it.
Worse, this microcontroller needs microcode that is uploaded by the driver.
And judging by what I've seen already, ctxprogs were a walk in the park
compared to this new uc. The progs are huge, and there are shitloads of
opcodes to RE.

So - we're blocked again on REing some magical code. And we're talking real
code this time, as opposed to ctxprogs which were mostly just a list of regs
to copy...

Until we understand what's going on, status for NVA3+ cards stays the same -
we have absolutely no idea how to fix that bug and ETA is on the order of at
least months.

And for the record: I'm going to kill anyone who calls these new progs
"voodoo", "magic", or anything like that. We already decided on the name of
"fuc progs". And the PM microcontroller is not the only place where they are
used. Right now we know of the following:

 - NV98+ cryptographic engine setup
 - NV98+ cryptographic engine ctxprogs
 - NV98+ video decoding ctxprogs
 - NVA3+ unknown engine 104xxx [DMA copier?] ctxprogs
 - NVA3+ PM microcontroller
 - NVC0 PGRAPH ctxprogs

With such a huge list of users, REing fuc progs is a high priority for me.
Keep your fingers crossed. And find lots of RE-capable people to help, this is
going to be a long ride...

PS. Note that chipset ordering is a bit funny and it does NOT follow NVxx
numerical values. The saner ordering I've come up with is at
http://github.com/pathscale/envytools/blob/master/nvchipsets.xml - it best
matches order of adding new functionality. Also, for video decoding units,
NVA0 and NV98 should be swapped, ie. NV98+ video decoding does not include
NVA0. So NVA3+ cards are NVA3, NVA5, NVA8, NVAF, NVC0. But not NVAA/NVAC.
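
Because the chipset ordering does not follow the numeric NVxx values, a check for "NVA3+" has to use an explicit list rather than a numeric comparison. A small Python illustration using only the chipsets named above; the names are invented for this sketch, and chipset IDs are written as the hex part of the NVxx name.

```python
# Chipsets listed above as "NVA3+" (hex part of the NVxx name).
NVA3_PLUS = frozenset({0xA3, 0xA5, 0xA8, 0xAF, 0xC0})


def is_nva3_plus(chipset):
    """True if the chipset is in the NVA3+ family as listed above."""
    return chipset in NVA3_PLUS


def naive_is_nva3_plus(chipset):
    """Numeric comparison; wrong, because NVAA/NVAC are not NVA3+."""
    return chipset >= 0xA3


for chipset in (0xA5, 0xAA, 0xAC, 0xC0):
    print("NV%02X" % chipset, is_nva3_plus(chipset), naive_is_nva3_plus(chipset))
```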

PPS. In other words, if you want your card to work with nouveau, don't buy NVA3+ cards yet. Or, following nvidia codenames, don't buy stuff with GT21x chipset in it. Nor MCP89, but that one is too damn rare anyway.

PPPS. If you think you're able to RE microcodes, contact me. Srsly. We need more people. Badly.

Revision history for this message
In , Greg Wilkins (gregw-wiltel) wrote :

Marcin,

thanks for the update.
Sorry I can't help with the RE side of things (got my own open source projects keeping me 200% busy). But I am happy to help debug/test on my hardware.

Note that I still find using the nouveau driver without acceleration much better than using the nvidia driver with acceleration.

Revision history for this message
In , Dmytro-poplavskiy (dmytro-poplavskiy) wrote :

I also have a similar problem on a GT240 on openSUSE 11.3.
I hope the backtrace helps.

#0 0x00007fddfac33e87 in ioctl () from /lib64/libc.so.6
#1 0x00007fddf93ebc38 in drmIoctl (fd=10, request=1074291842, arg=0x7fffca3b0330) at xf86drm.c:184
#2 0x00007fddf93edf3b in drmCommandWrite (fd=<value optimized out>, drmCommandIndex=<value optimized out>, data=<value optimized out>, size=<value optimized out>)
    at xf86drm.c:2398
#3 0x00007fddf8dad07d in nouveau_bo_wait (bo=0x829b00, cpu_write=<value optimized out>, no_wait=<value optimized out>, no_block=<value optimized out>)
    at nouveau_bo.c:385
#4 0x00007fddf8dad68e in nouveau_bo_map_range (bo=0x829b00, delta=0, size=<value optimized out>, flags=8) at nouveau_bo.c:428
#5 0x00007fddf8dac21a in nouveau_pushbuf_space (chan=0x8366c0, min=<value optimized out>) at nouveau_pushbuf.c:53
#6 0x00007fddf8dac760 in nouveau_pushbuf_flush (chan=0x8366c0, min=0) at nouveau_pushbuf.c:273
#7 0x00007fddf7f0e7a5 in ?? () from /usr/lib64/xorg/modules/libexa.so
#8 0x00007fddf7f10272 in ?? () from /usr/lib64/xorg/modules/libexa.so

Revision history for this message
Chris Halse Rogers (raof) wrote : Re: X freeze/crash with nouveau driver

The underlying problem here is that the GPU is locked up. Nouveau doesn't have the same sort of GPU hang detection that Intel (or Radeon, for that matter) has, so any time something tries to touch the card it never gets a reply. After a while, X notices, and you get an EQ overflow.

Nouveau has no way of resetting the card, so it's going to remain locked up until you restart.

Upstream is looking at reverse-engineering a processor on the newer nouveau GPUs which they believe to be behind this instability.

Until they've got that done, it might be worth quirking off acceleration on these cards.

affects: xserver-xorg-video-nouveau (Ubuntu) → linux (Ubuntu)
tags: added: kernel-graphics
Revision history for this message
Chris Halse Rogers (raof) wrote :

And by “these cards”, I mean anything with a NVA3, NVA5, NVA8, NVAF or NVC0 chipset.

Revision history for this message
In , Mark Carey (careym) wrote :

Adding a me too.

NVA8, GT218

[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2

Fedora 13. Seems to happen reproducibly after playing freeciv for 45 - 60 minutes.

[ 3407.261] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 3407.262]
Backtrace:
[ 3407.272] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80ad11c]
[ 3407.272] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x809b8f7]
[ 3407.272] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd2) [0x80c8232]
[ 3407.272] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0xe7c000+0x30a2) [0xe7f0a2]
[ 3407.272] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0xe7c000+0x3349) [0xe7f349]
[ 3407.272] 5: /usr/bin/Xorg (0x8047000+0x76c30) [0x80bdc30]
[ 3407.272] 6: /usr/bin/Xorg (0x8047000+0x12db64) [0x8174b64]
[ 3407.272] 7: (vdso) (__kernel_sigreturn+0x0) [0xdba400]
[ 3407.272] 8: (vdso) (__kernel_vsyscall+0x2) [0xdba416]
[ 3407.272] 9: /lib/libc.so.6 (ioctl+0x19) [0x74c0b9]
[ 3407.272] 10: /usr/lib/libdrm.so.2 (drmIoctl+0x2e) [0x7fd6a7e]
[ 3407.272] 11: /usr/lib/libdrm.so.2 (drmCommandWrite+0x3c) [0x7fd6e0c]
[ 3407.273] 12: /usr/lib/libdrm_nouveau.so.1 (0xfb5000+0x2a9a) [0xfb7a9a]
[ 3407.273] 13: /usr/lib/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xf1) [0xfb7c91]
[ 3407.273] 14: /usr/lib/libdrm_nouveau.so.1 (nouveau_bo_map+0x34) [0xfb7d64]
[ 3407.273] 15: /usr/lib/libdrm_nouveau.so.1 (0xfb5000+0x1c2e) [0xfb6c2e]
[ 3407.273] 16: /usr/lib/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x1ca) [0xfb700a]
[ 3407.273] 17: /usr/lib/xorg/modules/drivers/nouveau_drv.so (0x131000+0x1b76f) [0x14c76f]
[ 3407.273] 18: /usr/lib/xorg/modules/libexa.so (0x163000+0x8ac3) [0x16bac3]
[ 3407.273] 19: /usr/lib/xorg/modules/libexa.so (0x163000+0x9673) [0x16c673]
[ 3407.273] 20: /usr/bin/Xorg (0x8047000+0xe6f26) [0x812df26]
[ 3407.273] 21: /usr/bin/Xorg (0x8047000+0x17967d) [0x81c067d]
[ 3407.273] 22: /usr/bin/Xorg (miCompositeRects+0x7f) [0x81c079f]
[ 3407.273] 23: /usr/bin/Xorg (CompositeRects+0x74) [0x811da34]
[ 3407.273] 24: /usr/bin/Xorg (0x8047000+0xe036d) [0x812736d]
[ 3407.273] 25: /usr/bin/Xorg (0x8047000+0xdc494) [0x8123494]
[ 3407.273] 26: /usr/bin/Xorg (0x8047000+0x50a37) [0x8097a37]
[ 3407.273] 27: /usr/bin/Xorg (0x8047000+0x1b595) [0x8062595]
[ 3407.273] 28: /lib/libc.so.6 (__libc_start_main+0xe6) [0x68dcc6]
[ 3407.273] 29: /usr/bin/Xorg (0x8047000+0x1b181) [0x8062181]

In , Marcin Kościelnicki (koriakin) wrote :

Can we please stop with the "me too"s and useless backtraces already? We already know the bug affects anyone with NVA3+ cards and the backtraces only show the fallout of the fallout of the fallout of the original problem...

Also: if you get a DMA_PUSHER, that's another bug. This bug happens with total silence from the kernel - the card just hangs without telling us why.

Now, a status update. So far, I REd the instruction set and disassembled a good chunk of that microcode. Sure enough, it pokes PGRAPH and watches for stuff on it. But the code doesn't yet fully make sense to me, and I'm temporarily away from my NVA5 card, so... could someone dump some registers for me?

I need a dump of the registers both before and after the hang. The interesting stuff is in 10axxx range, and ideally I'd want output of "./peek 10a000 1000". However, if the card doesn't like it and hangs or something upon that command, dump registers individually instead by doing "./peek X" for X being 10a008, 10a6fc, 10a4fc, 10a4f4, 10a714, 10a700, 10a704, 10a4dc, 10a690, 10a688.

Repeating that 2 or 3 times may be useful, esp. across different chipsets. But not TOO much, please.
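
The individual-register fallback described above can be scripted. A minimal Python sketch, assuming the peek binary built from pgtest sits in the current directory and accepts a single hex offset as in "./peek X"; the function names are made up for illustration, and the commands have to run as root:

```python
import subprocess

# Register offsets (hex) taken from the comment above.
REGISTERS = [
    "10a008", "10a6fc", "10a4fc", "10a4f4", "10a714",
    "10a700", "10a704", "10a4dc", "10a690", "10a688",
]


def range_command(peek="./peek"):
    """Preferred form: dump the whole 10axxx range in one go."""
    return [peek, "10a000", "1000"]


def peek_commands(peek="./peek", registers=REGISTERS):
    """Fallback form: one "./peek X" invocation per register."""
    return [[peek, reg] for reg in registers]


def dump_registers(peek="./peek"):
    """Run the fallback commands and return {register: peek output}."""
    results = {}
    for argv in peek_commands(peek):
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[argv[1]] = proc.stdout.strip()
    return results
```

Calling dump_registers() once before and once after the hang, and saving both results, would give the pair of dumps requested here.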

In , Mark Carey (careym) wrote :

Created an attachment (id=37625)
peek results from before lockup

In , Mark Carey (careym) wrote :

Created an attachment (id=37626)
peek results after lockup

In , Mark Carey (careym) wrote :

Created an attachment (id=37627)
Xorg.0.log from lockup

In , Richard-coe (richard-coe) wrote :

I found this bug while researching my X lockup issue.
This solution does not address the hardware issue documented here, but
I want to help anyone in a similar situation who does *NOT* have
the hardware lockup and for whom noaccel=1 does not work.

In my case all you have to do is move a window and then immediately
click in another window. There may be other ways to reproduce this.

The only workaround is to restart the window manager or Xorg.

Investigating the issue I found this patch:
http://cgit.freedesktop.org/xorg/xserver/patch/?id=1884db430a5680e37e94726dff46686e2218d525
Subject: Revert "dix: use the event mask of the grab for TryClientEvents."

I am using xorg-server-1.8.0 and found the issue was introduced in
xorg-server-1.6.3. The fix appears in xorg-server-1.8.2 and later.

In , Bugs-sthias (bugs-sthias) wrote :

./peek'ing my QUADRO NVS 140M results in all zeroes ("...") before and after the lockup, regardless of whether I peek the range "10a000 1000" or the single addresses. Am I doing anything wrong?

Things that cause lockups with nouveau appear to increasingly slow down the blob driver, which is therefore not an option for me either. After some time using KDE4/Skype/Firefox/OpenOffice the system becomes terribly slow with the blob.

If I can be of any help with testing, please let me know.

In , Marcin Kościelnicki (koriakin) wrote :

(In reply to comment #34)
> ./peek'ing my QUADRO NVS 140M results in all zeroes ("...") before and after
> the lockup, regardless of whether I peek the range "10a000 1000" or the single
> addresses. Am I doing anything wrong?

Yes. You didn't listen when I said this bug is NVA3+ only. Your card is NV86.

Do you get any interesting messages in kernel around the lockup? This is almost certainly some other bug.

In , Bugs-sthias (bugs-sthias) wrote :

(In reply to comment #35)
> Yes. You didn't listen when I said this bug is NVA3+ only. Your card is NV86.
>
> Do you get any interesting messages in kernel around the lockup? This is almost
> certainly some other bug.

OK, sorry for that. Yes, my kernel gives me

[29470.176297] [drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
[29470.176975] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x1560 Data 0x00000000:0xcccccccc
[29470.176979] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE
[29470.176990] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x1564 Data 0x00000000:0xcccccccc
[29470.176993] [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_BITFIELD

So, I think I'll try my luck with bug 26733 or open a new one, if they don't like my problem there either :)
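
As noted earlier in the thread, the hang tracked here happens with no message from the kernel, so a lockup accompanied by PFIFO_DMA_PUSHER or PGRAPH_DATA_ERROR lines points to a different bug. A rough Python sketch of that check, run over saved dmesg text; the function names and the regular expression are illustrative assumptions based only on the log lines quoted above:

```python
import re

# Matches the nouveau error lines quoted above, e.g.
# "[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2"
NOUVEAU_ERROR = re.compile(
    r"\[drm\] nouveau \S+: (PFIFO_DMA_PUSHER|PGRAPH_DATA_ERROR)"
)


def nouveau_errors(log_text):
    """Return the error names found in the log, in order of appearance."""
    found = []
    for line in log_text.splitlines():
        match = NOUVEAU_ERROR.search(line)
        if match:
            found.append(match.group(1))
    return found


def looks_like_silent_hang(log_text):
    """True when the log holds none of the errors above (the silent case)."""
    return not nouveau_errors(log_text)
```

An empty result only says that these two error types are absent from the log; it does not by itself confirm the NVA3+ hang.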

Changed in nouveau:
importance: Unknown → Medium
status: Unknown → Confirmed
In , Kavol (kavol) wrote :

Created attachment 39004
./peek 10a000 1000 after and before the crash

07:00.0 VGA compatible controller: nVidia Corporation Device 0a65 (rev a2)

here are my dumps

note that "before" is actually after reboot - don't know how to capture right before the freeze

In , Marcin Slusarz (marcin-slusarz) wrote :

*** Bug 30817 has been marked as a duplicate of this bug. ***

In , Jeffm-9 (jeffm-9) wrote :

Created attachment 39589
Before and after peek 10a000 on ION2

Here are 3 more peeks, before and after, using a different chipset as you said that might be interesting to you.

01:00.0 VGA compatible controller: nVidia Corporation GT218 [ION] (rev a2) (prog-if 00 [VGA controller])
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
 Memory at d0000000 (64-bit, prefetchable) [size=256M]
 Memory at ce000000 (64-bit, prefetchable) [size=32M]
 I/O ports at dc00 [size=128]
 Expansion ROM at fe980000 [disabled] [size=512K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [b4] Vendor Specific Information: Len=14 <?>
 Capabilities: [100] Virtual Channel
 Capabilities: [128] Power Budgeting <?>
 Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Kernel driver in use: nouveau

I do see messages in the kernel, but as you mentioned it's a separate bug. They appear inconsistently during this failure.
[drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
and sometimes also:
[drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/2 Class 0x502d Mthd 0x0240 Data 0x00000000:0x00086184
[drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE

This is on openSUSE Factory, kernel 2.6.36-rc4
xorg-x11-server-7.5_1.9.0.902-69.3.x86_64
xorg-x11-driver-video-nouveau-0.0.16_20101010_8c8f15c-14.1.x86_64
libdrm-2.4.22-27.2.x86_64

In , Johannes Obermayr (jobermayr) wrote :

Jeff, please try whether it is valid with packages from home:jobermayr.

In , Jeffm-9 (jeffm-9) wrote :

(In reply to comment #40)
> Jeff, please try whether it is valid with packages from home:jobermayr.

I'm afraid so. Same symptoms. Is another dump needed or are the ones I've provided already sufficient?

In , Alex Mayorga (alex-mayorga) wrote :

(In reply to comment #3)
> However, according to mwk, the data error is unlikely to be related to the
> hangs. Also he believes he might have the same problem using recent nouveau
> code. To confirm that, he would like to know the value of the 400700 register
> next time the machine hangs :
> $ wget http://0x04.net/~mwk/pgtest/{peek.c,libio.{c,h}}
> $ gcc peek.c libio.c -lpciaccess -o peek
> # ./peek 0x400700

Landed here from bug 33357, which might be a duplicate of this one.
For n00bs like me: the commands have changed a bit. I managed to figure out the first one to be:
$ wget http://0x04.net/cgit/index.cgi/pgtest/{peek.c,libio.{c,h}}

The second one gives me the following:

$ gcc peek.c libio.c -lpciaccess -o peek
peek.c:1:1: error: expected identifier or ‘(’ before ‘<’ token
peek.c:3:13: warning: character constant too long for its type
peek.c:3:53: warning: multi-character character constant
peek.c:3:63: warning: multi-character character constant
peek.c:6:12: warning: character constant too long for its type
peek.c:6:32: warning: character constant too long for its type
peek.c:7:12: warning: character constant too long for its type
peek.c:7:29: warning: character constant too long for its type
peek.c:8:11: warning: character constant too long for its type
peek.c:8:29: warning: character constant too long for its type
peek.c:8:45: warning: character constant too long for its type
peek.c:9:11: warning: character constant too long for its type
peek.c:9:29: warning: character constant too long for its type
peek.c:9:46: warning: character constant too long for its type
peek.c:9:106: warning: character constant too long for its type
peek.c:12:9: warning: multi-character character constant
peek.c:12:26: warning: character constant too long for its type
peek.c:14:11: warning: multi-character character constant
peek.c:14:38: warning: character constant too long for its type
peek.c:14:66: warning: character constant too long for its type
peek.c:14:87: warning: character constant too long for its type
peek.c:15:11: warning: multi-character character constant
peek.c:15:26: warning: character constant too long for its type
peek.c:15:66: warning: character constant too long for its type
peek.c:15:80: warning: character constant too long for its type
peek.c:15:131: warning: multi-character character constant
peek.c:15:151: warning: multi-character character constant
peek.c:15:164: error: empty character constant
peek.c:16:27: warning: character constant too long for its type
peek.c:17:23: warning: character constant too long for its type
peek.c:17:37: error: empty character constant
peek.c:17:46: warning: character constant too long for its type
peek.c:18:15: warning: multi-character character constant
peek.c:18:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘this’
peek.c:18:56: warning: character constant too long for its type
peek.c:19:16: warning: character constant too long for its type
peek.c:19:1: error: stray ‘\305’ in program
peek.c:19:1: error: stray ‘\233’ in program
peek.c:20:14: warning: multi-character character constant
peek.c:21:10: warning: character constant too long for its type
peek.c:21:24: warning: character cons...


Changed in nouveau:
importance: Medium → Unknown
Changed in nouveau:
importance: Unknown → Medium
In , Tiziano Müller (tm-dev-zero) wrote :

I'd say the current instructions to build peek (and the other utilities) are:

git clone git://0x04.net/pgtest
cd pgtest
make

In , Rtguille (rtguille) wrote :

I have a GT220 and it sometimes freezes randomly:

[ 3675.146]
Backtrace:
[ 3675.153] 0: /usr/bin/X (xorg_backtrace+0x28) [0x460d18]
[ 3675.153] 1: /usr/bin/X (0x400000+0x63509) [0x463509]
[ 3675.153] 2: /lib64/libc.so.6 (0x34eca00000+0x32a20) [0x34eca32a20]
[ 3675.153] 3: /usr/lib64/xorg/modules/extensions/libdri2.so (DRI2CloseScreen+0x24) [0x7fb95ba39a14]
[ 3675.153] 4: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7fb95b809000+0xd6ab) [0x7fb95b8166ab]
[ 3675.154] 5: /usr/bin/X (0x400000+0xa3b49) [0x4a3b49]
[ 3675.154] 6: /usr/bin/X (0x400000+0x15daec) [0x55daec]
[ 3675.154] 7: /usr/bin/X (0x400000+0x2193c) [0x42193c]
[ 3675.154] 8: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x34eca1ec5d]
[ 3675.154] 9: /usr/bin/X (0x400000+0x21449) [0x421449]
[ 3675.154] Segmentation fault at address 0x10
[ 3675.154]
Fatal server error:
[ 3675.154] Caught signal 11 (Segmentation fault). Server aborting
[ 3675.154]
[ 3675.154]
Please consult the Fedora Project support
         at http://bodhi.fedoraproject.org/
 for help.
[ 3675.154] Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 3675.154]
[ 3675.157] (II) NOUVEAU(0): NVLeaveVT is called.

That error is a bit old.
OK, I read about the random lockups some time ago.

The interesting thing is that at some point I started to use (and still do) the nVIDIA (P) drivers, and to my surprise, they also have random GPU lockups (granted, it is another piece of software).
But then I put in my new ATI HD5670 and found that it also freezes randomly...

I run F13.
I never had issues with Slackware 13.1 and nvidia (P).
The OtherOS never froze, with any card.

For F13 I must use pcie_aspm=off because of issues with the SATA controller.
But pcie_aspm=off also seems to put the PCIe bus into Gen1 mode (it halves the link speed and also changes the de-emphasis). I do not know whether it is normal for pcie_aspm=off to do that, or which other things do.

For example, the nvidia (P) driver reports that the card is at Gen1 speed (the motherboard, an M4N72-E, is Gen2 and the card is also Gen2).

Is the nouveau driver expected to work with any PCIe configuration (different link speeds, ASPM on or off)?

I know that it may not be related, but I do not know whether subtly different PCIe configurations (or PCIe driver bugs) may lead the VGA card to behave that way.

Thanks in advance.

In , Frédéric Crozat (fcrozat) wrote :

Created attachment 43519
before and after peek on nv40 from Lenovo T410

Here is the peek output before and after the freeze, on kernel 2.6.38rc5, on a Lenovo T410 laptop with an integrated nVidia GPU:

01:00.0 VGA compatible controller: nVidia Corporation GT218 [NVS 3100M] (rev a2) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device 2142
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at cc000000 (32-bit, non-prefetchable) [size=16M]
 Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
 Region 3: Memory at ce000000 (64-bit, prefetchable) [size=32M]
 Region 5: I/O ports at 2000 [size=128]
 [virtual] Expansion ROM at cd000000 [disabled] [size=512K]
 Capabilities: <access denied>
 Kernel driver in use: nouveau

In , Frepdesktop (frepdesktop) wrote :

My Xorg was also blocking: only the mouse pointer keeps moving, but nothing else happens.

I solved the problem (for now) by uncommenting the NoAccel option and setting its value to true. I'm using the xorg.conf generated with Xorg -configure with just that change, and everything seems to be working. I'm even using the compositor from xfce4 (for real transparency in xfce4-terminal), and it's working OK, though maybe not as fast as with acceleration.

For me it was very simple to reproduce the error. As soon as gdm was active (the default theme on debian - the one with the stars and the star rocket), a click on any menu would block everything but the mouse movement.

Here is the log for my X when it was blocking.

Anything else I can do to help solve this?

X.Org X Server 1.9.4
Release Date: 2011-02-04
[ 147.356] X Protocol Version 11, Revision 0
[ 147.356] Build Operating System: Linux 2.6.32.28-dsa-ia32 i686 Debian
[ 147.356] Current Operating System: Linux voyager 2.6.37-1-686 #1 SMP Tue Feb 15 18:21:50 UTC 2011 i686
[ 147.356] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.37-1-686 root=UUID=b69dfdff-c215-473f-9f91-1680b2773ef1 ro single
[ 147.356] Build Date: 17 February 2011 01:25:01AM
[ 147.356] xorg-server 2:1.9.4-2 (Cyril Brulebois <email address hidden>)
[ 147.356] Current version of pixman: 0.21.4
[ 147.356] Before reporting problems, check http://wiki.x.org
 to make sure that you have the latest version.
[ 147.356] Markers: (--) probed, (**) from config file, (==) default setting,
 (++) from command line, (!!) notice, (II) informational,
 (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 147.356] (==) Log file: "/var/log/Xorg.0.log", Time: Fri Feb 18 03:17:25 2011
[ 147.356] (==) Using config file: "/etc/X11/xorg.conf"
[ 147.356] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 147.373] (==) ServerLayout "X.org Configured"
[ 147.373] (**) |-->Screen "Screen0" (0)
[ 147.373] (**) | |-->Monitor "Monitor0"
[ 147.374] (**) | |-->Device "Card0"
[ 147.374] (**) |-->Input Device "Mouse0"
[ 147.374] (**) |-->Input Device "Keyboard0"
[ 147.374] (==) Automatically adding devices
[ 147.374] (==) Automatically enabling devices
[ 147.452] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[ 147.452] Entry deleted from font path.
[ 147.541] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[ 147.541] Entry deleted from font path.
[ 147.541] (**) FontPath set to:
 /usr/share/fonts/X11/misc,
 /usr/share/fonts/X11/100dpi/:unscaled,
 /usr/share/fonts/X11/75dpi/:unscaled,
 /usr/share/fonts/X11/Type1,
 /usr/share/fonts/X11/100dpi,
 /usr/share/fonts/X11/75dpi,
 /var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType,
 built-ins,
 /usr/share/fonts/X11/misc,
 /usr/share/fonts/X11/100dpi/:unscaled,
 /usr/share/fonts/X11/75dpi/:unscaled,
 /usr/share/fonts/X11/Type1,
 /usr/share/fonts/X11/100dpi,
 /usr/share/fonts/X11/75dpi,
 /var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType,
 built-ins
[ 147.541] (**) ModulePath set to "/usr/lib/xorg/modules"
[ 147.541] (WW) AllowEmptyInput is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' wi...

In , Timo Aaltonen (tjaalton) wrote :

*** Bug 33357 has been marked as a duplicate of this bug. ***

In , Jordan Bradley (jordan-w-bradley) wrote :

Is there anything the non-programmer can do t

Alex Mayorga (alex-mayorga) wrote : Re: X freeze/crash with nouveau driver
In , Alex Mayorga (alex-mayorga) wrote :

FWIW you can see a peek of my frozen nVidia Corporation GT216 [GeForce GT 230M] (rev a2) at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/553789/+attachment/1863113/+files/peek-696104.txt

Timo Aaltonen (tjaalton)
Changed in xserver-xorg-video-nouveau (Ubuntu):
importance: Undecided → High
status: New → Confirmed
In , Thomas Schwinge (tschwinge) wrote :

Created attachment 43864
peek_10a000_1000-tschwinge.tar.bz2

Marcin, in case you need further data, see the
peek_10a000_1000-tschwinge.tar.bz2 which I'm just attaching. The files'
names should be self-explanatory. I ran each of the invocations three times.

I will now switch to using nouveau.noaccel=1, but please tell if you need
further data or need something tested.

This system is a DELL PRECISION M4500, the graphics card's lspci output:

01:00.0 VGA compatible controller: nVidia Corporation GT216 [Quadro FX 880M] (rev a2)

I hit this while setting up the system, roughly one hour after finishing
a fresh Ubuntu 10.10 maverick installation, while browsing some web pages
in Firefox about how to get suspend / hibernate / resume working reliably
-- oh the joy of installing GNU/Linux systems on new hardware. :-)

Changed in nouveau:
importance: Medium → High
Bryce Harrington (bryce) wrote : Re: [MASTER] [GT21x] X freeze/crash with nouveau driver

[I'm setting this to wishlist rather than high priority because the issue is lack of support for this hardware in the upstream driver, rather than merely a bug. Also, because it sounds like it's unlikely to get solved any time soon. For now, owners of these cards will have to use -nvidia as the workaround.]

summary: - X freeze/crash with nouveau driver
+ [MASTER] [GT21x] X freeze/crash with nouveau driver
Changed in xserver-xorg-video-nouveau (Ubuntu):
importance: High → Wishlist
status: Confirmed → Triaged
johanneswilm (j-indymedia) wrote :

For me this actually happens both with nvidia and nouveau. It used to work with both drivers and now it does with neither one.

In , Pav-s (pav-s) wrote :

I have been using the git sources for two weeks, and updated to 2.6.39 yesterday:

since somewhere after 25c68aef4e6abcc3c10f593fc565c342ebe2ded8 the lockups disappeared - and I hope they won't return.

great work, thank you

In , Pav-s (pav-s) wrote :

I am sorry, the last comment was probably unclear:

Hardware: Thinkpad T410, NVS 3100M (nva8?)
Kernel: vanilla 2.6.39+ from kernel.org + nouveau tree merged

In , Emil-l-velikov (emil-l-velikov) wrote :

For anyone still experiencing this bug, here is a possible workaround (thanks to ColdFeetBob for pointing it out)

Modify* /xserver/mi/mieq.c and rebuild xserver
* Increase the "#define QUEUE_SIZE 512" to 1024, 2048 or 4096 [1]

Note that this is *not* a proper fix, as you can see in the discussion [2], and it isn't recommended for inexperienced users.

[1] http://www.nvnews.net/vbulletin/showpost.php?p=2398370&postcount=2
[2] https://bugs.freedesktop.org/show_bug.cgi?id=15473

In , Frédéric Crozat (fcrozat) wrote :

I can confirm I haven't had any freeze using the 2.6.39 kernel on a Lenovo T410 for 3 days (I had several freezes per hour before), using GNOME Shell.

Michael Sparmann (theseven) wrote :

This bug seems to have gone away in 2.6.39, as indicated by the upstream bug report.
I updated to 2.6.39 amd64 yesterday and haven't had a single freeze since then, while it happened every couple of minutes before.
(Lenovo T410, Nvidia NVS 3100m)

Stephen M. Webb (bregma) wrote :

Confirmed fixed in the 3.0.0 kernel, same hardware, same previous behaviour.

In , Ildar (ildar-users) wrote :

(In reply to comment #54)
> I can confirm I didn't had any freeze using 2.6.39 kernel on Lenovo T410 for 3
> days (I had several freezes per hour before), using GNOME Shell.

Frederic, you forgot to mention that it's now terribly unstable. I have regular crashes like below, especially on: VT switching, Suspend-to-RAM, etc.

Details:
(II) NOUVEAU(0): NVLeaveVT is called.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x45de88]
1: /usr/bin/Xorg (0x400000+0x62449) [0x462449]
2: /lib64/libpthread.so.0 (0x7fc7e6fb2000+0xefc0) [0x7fc7e6fc0fc0]
3: /lib64/libc.so.6 (0x7fc7e5f00000+0x7427c) [0x7fc7e5f7427c]
4: /lib64/libc.so.6 (__libc_malloc+0x70) [0x7fc7e5f76600]
5: /usr/lib64/X11/modules/libexa.so (0x7fc7e3297000+0x8d44) [0x7fc7e329fd44]
6: /usr/lib64/X11/modules/libexa.so (0x7fc7e3297000+0x53c2) [0x7fc7e329c3c2]
7: /usr/bin/Xorg (0x400000+0xa4a20) [0x4a4a20]
8: /usr/bin/Xorg (ChangeWindowAttributes+0x2e3) [0x455e23]
9: /usr/bin/Xorg (0x400000+0x27c38) [0x427c38]
10: /usr/bin/Xorg (0x400000+0x2db61) [0x42db61]
11: /usr/bin/Xorg (0x400000+0x215ce) [0x4215ce]
12: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fc7e5f1ec5d]
13: /usr/bin/Xorg (0x400000+0x21179) [0x421179]
Segmentation fault at address (nil)

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting

In , Skeggsb (skeggsb) wrote :

(In reply to comment #55)
> (In reply to comment #54)
> > I can confirm I didn't had any freeze using 2.6.39 kernel on Lenovo T410 for 3
> > days (I had several freezes per hour before), using GNOME Shell.
>
> Frederic, you forgot to mention that it's now terribly unstable. I have regular
> crashes like below, especially on: VT switching, Suspend-to-RAM, etc.
>
> Details:
> (II) NOUVEAU(0): NVLeaveVT is called.
Not even remotely related. And from the backtrace, probably not nouveau's fault either.

>
> Backtrace:
> 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x45de88]
> 1: /usr/bin/Xorg (0x400000+0x62449) [0x462449]
> 2: /lib64/libpthread.so.0 (0x7fc7e6fb2000+0xefc0) [0x7fc7e6fc0fc0]
> 3: /lib64/libc.so.6 (0x7fc7e5f00000+0x7427c) [0x7fc7e5f7427c]
> 4: /lib64/libc.so.6 (__libc_malloc+0x70) [0x7fc7e5f76600]
> 5: /usr/lib64/X11/modules/libexa.so (0x7fc7e3297000+0x8d44) [0x7fc7e329fd44]
> 6: /usr/lib64/X11/modules/libexa.so (0x7fc7e3297000+0x53c2) [0x7fc7e329c3c2]
> 7: /usr/bin/Xorg (0x400000+0xa4a20) [0x4a4a20]
> 8: /usr/bin/Xorg (ChangeWindowAttributes+0x2e3) [0x455e23]
> 9: /usr/bin/Xorg (0x400000+0x27c38) [0x427c38]
> 10: /usr/bin/Xorg (0x400000+0x2db61) [0x42db61]
> 11: /usr/bin/Xorg (0x400000+0x215ce) [0x4215ce]
> 12: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fc7e5f1ec5d]
> 13: /usr/bin/Xorg (0x400000+0x21179) [0x421179]
> Segmentation fault at address (nil)
>
> Fatal server error:
> Caught signal 11 (Segmentation fault). Server aborting

In , Frédéric Crozat (fcrozat) wrote :

I didn't forget anything: nouveau is rock stable with the 2.6.39.x kernel on my T410 (no issue with VT switching; I didn't test suspend-to-RAM). Your issues are a different bug, I think (X crash, no GPU lockup).

In , List0570 (list0570) wrote :

Uuggh, this seems to be a serious problem, even when using nvidia.ko.
Why? Because xorg seems to probe drivers on startup, and somehow manages to load the nouveau module even when I want to use the nvidia module. The nouveau module then fails to unload properly (use count is always at least 1, and stays 1 even after changing to runlevel 3 and killing xorg).

I could reliably hang the system just by logging out from kdm. Syslog is full of nouveau problems - why, given that I am using nvidia?
When renaming all instances of nouveau.ko under /lib/modules the hangs disappeared.

This has always been there and seems to have zilch effect:

> ls -l /etc/modprobe.d/nvidia.conf
-rw-r--r-- 1 root root 18 2011-04-29 22:55 /etc/modprobe.d/nvidia.conf
> cat /etc/modprobe.d/nvidia.conf
blacklist nouveau

Symptoms are random freezes requiring a hardware reset, but always involving some graphics operation. I can detect no other hardware malfunction, and lockup behaviour is not characteristic of general hardware faults (e.g. ram, cpu, disk).

My reset button is really polished now ;-((

System details:
openSUSE 11.4 with all updates installed,
currently that is kernel kernel-desktop-2.6.37.6-0.5.1.x86_64
xorg-x11-server-7.6_1.9.3-15.24.2.x86_64
The problem seems worse with
kernel-desktop-2.6.39.2-36.1.x86_64

01:00.0 VGA compatible controller: nVidia Corporation GT216 [GeForce GT 220] (rev a2)

Mobo is Gigabyte GA-880GA-UD3H with
NB/SB: AMD 880G / SB850
USB 2.0 + 3.0, SATA 3Gb + 6Gb (prob not relevant)
The chipset also contains an integrated ATI Radeon HD 4250 (which works fine on the OSS radeon driver, with the limits of the driver).
Phenom II X6 1090T CPU

In , Pq-z (pq-z) wrote :

(In reply to comment #58)
> Uuggh, this seems to be a serious problem, even when using nvidia.ko.
> Why? Because xorg seems to probe drivers on startup, and somehow managing to
> load the nouveau module even when I want to use the nvidia module. The nouveau
> module then fails to unload properly (use count is always at least 1, and stays
> 1 even after changing to runlevel 3 and killing xorg).

Your problem has nothing to do with this bug report.

You are (or the X server is) trying to use nouveau and nvidia drivers at the same time, which is known to cause havoc and should not be attempted. All the fallout you described fits perfectly.

The simple fix for you is to define Driver "nvidia" in your xorg.conf.
That prevents probing of any drivers and uses only what you want.

If you feel there is bad behavior in the X server, when it probes different drivers, you should file a bug against the X server. But first, make sure your distribution has not patched the X server driver probing code, e.g. by adding drivers not present in the original driver list. If they have, you should complain to your distribution.
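
To check whether both kernel modules are loaded at the same time, the list in /proc/modules can be inspected. A small Python sketch, assuming the usual /proc/modules layout where the module name is the first whitespace-separated field of each line; the helper names are hypothetical:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def conflicting_gpu_drivers(proc_modules_text):
    """True when nouveau and nvidia are both loaded."""
    return {"nouveau", "nvidia"} <= loaded_modules(proc_modules_text)


def check_running_system(path="/proc/modules"):
    """Read the live module list (Linux only) and report a conflict."""
    with open(path) as handle:
        return conflicting_gpu_drivers(handle.read())
```

If check_running_system() returns True, the two drivers are loaded side by side, which is the situation described above.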

In , Younes Manton (younes-m) wrote :

On , <email address hidden> wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=26980

> --- Comment #58 from Volker Kuhlmann <email address hidden> 2011-07-13 04:03:19 PDT ---

Delete nouveau.ko.

Lucazade (lucazade) wrote :

Still not fixed with the latest 3.x kernel in Oneiric.
After a couple of minutes I get a hard freeze with a 250GTS.

In , Arun Raghavan (arunraghavan) wrote :

Using 3.0.4, this bug is very much still there (or at least the backtrace looks very similar). Happy to provide more information or help debug if pointed in the right direction.

(gdb) bt
#0 0x00007f5af04f3007 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f5aeea8f878 in drmIoctl (fd=9,
    request=1074291842, arg=0x7fff50cac510)
    at /usr/src/debug/x11-libs/libdrm-2.4.26/libdrm-2.4.26/xf86drm.c:167
#2 0x00007f5aeea91c2b in drmCommandWrite (
    fd=<optimized out>, drmCommandIndex=<optimized out>,
    data=<optimized out>, size=<optimized out>)
    at /usr/src/debug/x11-libs/libdrm-2.4.26/libdrm-2.4.26/xf86drm.c:2422
#3 0x00007f5aee44311d in nouveau_bo_wait (bo=0x1b75420,
    cpu_write=<optimized out>, no_wait=<optimized out>,
    no_block=<optimized out>)
    at /usr/src/debug/x11-libs/libdrm-2.4.26/libdrm-2.4.26/nouveau/nouveau_bo.c:390
#4 0x00007f5aee443703 in nouveau_bo_map_range (
    bo=0x1b75420, delta=0, size=<optimized out>, flags=4)
    at /usr/src/debug/x11-libs/libdrm-2.4.26/libdrm-2.4.26/nouveau/nouveau_bo.c:433
#5 0x00007f5aee64d2b5 in NVAccelDownloadM2MF (
    dst_pitch=3740, dst=0x2c96644 "", h=20, w=1,
    y=<optimized out>, x=237, pspix=0x29275f0)
    at /usr/src/debug/x11-drivers/xf86-video-nouveau-0.0.16_pre20110801/xf86-video-nouveau-0.0.16_pre20110801/src/nouveau_exa.c:132
#6 nouveau_exa_download_from_screen (pspix=0x29275f0,
    x=237, y=348, w=1, h=20, dst=0x2c96644 "",
    dst_pitch=3740)
    at /usr/src/debug/x11-drivers/xf86-video-nouveau-0.0.16_pre20110801/xf86-video-nouveau-0.0.16_pre20110801/src/nouveau_exa.c:386
#7 0x00007f5aed9f81ee in exaCopyDirty (
    migrate=<optimized out>, pValidDst=0x2927680,
    pValidSrc=0x2927690,
    transfer=0x7f5aee64cd80 <nouveau_exa_download_from_screen>, fallback_index=1, sync=0x7f5aed9f69b0 <exaWaitSync>)
    at /usr/src/debug/x11-base/xorg-server-1.10.4/xorg-server-1.10.4/exa/exa_migration_classic.c:220
#8 0x00007f5aed9fac41 in exaPrepareAccessReg_mixed (
    pPixmap=0x29275f0, index=0, pReg=0x0)
    at /usr/src/debug/x11-base/xorg-server-1.10.4/xorg-server-1.10.4/exa/exa_migration_mixed.c:254
#9 0x00007f5aeda061d0 in ExaPrepareCompositeReg (
    height=20, width=1, yDst=31, xDst=234, yMask=0,
    xMask=0, ySrc=0, xSrc=0, pDst=0x28a94e0, pMask=0x0,
    pSrc=0x28a9e20, op=57 '9', pScreen=<optimized out>)
    at /usr/src/debug/x11-base/xorg-server-1.10.4/xorg-server-1.10.4/exa/exa_unaccel.c:600
#10 ExaCheckComposite (op=57 '9', pSrc=0x28a9e20,
    pMask=0x0, pDst=0x28a94e0, xSrc=0, ySrc=0, xMask=0,
    yMask=0, xDst=234, yDst=31, width=1, height=20)
    at /usr/src/debug/x11-base/xorg-server-1.10.4/xorg-server-1.10.4/exa/exa_unaccel.c:625
#11 0x00007f5aeda02189 in exaComposite (op=57 '9',
    pSrc=0x28a9e20, pMask=0x0, pDst=0x28a94e0,
    xSrc=<optimized out>, ySrc=<optimized out>, xMask=0,
    yMask=0, xDst=<optimized out>, yDst=<optimized out>,
    width=1, height=20)
    at /usr/src/debug/x11-base/xorg-server-1.10.4/xorg-server-1.10.4/exa/exa_render.c:1066
#12 0x00000000004db50a in damageComposite (op=57 '9',
    pSrc=0x28a9e20, pMask=0x0, pDst=0x28a94e0, xSrc=0,
    ySrc=0, xMask=0, yMask=0, xDst=234, yDst=31, width=1,
    height=20)
    at /u...


In , Skeggsb (skeggsb) wrote :

(In reply to comment #61)
> Using 3.0.4, this bug is very much still there (or at least the backtrace looks
> very similar). Happy to help provide more information or debug if pointed in
> the right direction.
Any GPU hang will produce a similar backtrace from X's point of view, so the backtrace on its own is usually not enough to see what's happening.

Did you have any output in your kernel log from when this occurred?

In , Arun Raghavan (arunraghavan) wrote :

(In reply to comment #62)
[...]
> Did you have any output in your kernel log from when this occurred?

There was no output after the initial module load messages.

In , Skeggsb (skeggsb) wrote :

(In reply to comment #63)
> (In reply to comment #62)
> [...]
> > Did you have any output in your kernel log from when this occurred?
>
> There was no output after the initial module load messages.

Are you able to install envytools (http://nouveau.git.sourceforge.net/git/gitweb.cgi?p=nouveau/envytools;a=summary) and run "nvapeek 0x400700 4" while the GPU is hung?

Also, how new are all your userspace components (xf86-video-nouveau, libdrm, mesa etc)?

In , Lithium-flower (lithium-flower) wrote :

I experienced this bug not long ago. I'm not sure it is exactly the same, since it isn't really random: it happens every time I run SuperTux.
I recovered the following kernel log lines by rebooting from another partition after the freeze.

kernel 3.0.4, Arch Linux, x86_64, fully updated
nouveau-dri 7.11-2

lspci:

01:00.0 VGA compatible controller: nVidia Corporation GT215 [GeForce GT 240] (rev a2) (prog-if 00 [VGA controller])

kernel.log:

Sep 11 08:44:25 faye kernel: [ 1297.836589] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:25 faye kernel: [ 1298.301418] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1299.629024] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1300.956673] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04400000 - Ch 2
Sep 11 08:44:27 faye kernel: [ 1300.956719] psmouse.c: Wheel Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
Sep 11 08:44:29 faye kernel: [ 1302.284342] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:29 faye kernel: [ 1299.632344] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:31 faye kernel: [ 1302.960046] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:32 faye kernel: [ 1303.611794] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:34 faye kernel: [ 1303.615050] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1302.960046] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1305.614198] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44:53 faye kernel: [ 1307.613341] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:53 faye kernel: [ 1310.944310] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1307.613341] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1314.940910] [drm] nouveau 0000:01:00.0: PRAMIN flush timeout
Sep 11 08:44:53 faye kernel: [ 1310.941070] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:53 faye kernel: [ 1316.266913] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 5
Sep 11 08:44:53 faye kernel: [ 1312.943555] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:53 faye kernel: [ 1312.943555] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:55 faye kernel: [ 1318.264271] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x04c00000 - Ch 2
Sep 11 08:44:55 faye kernel: [ 1322.259717] psmouse.c: resync failed, issuing reconnect request
Sep 11 08:44:55 faye kernel: [ 1322.256898] [drm] nouveau 0000:01:00.0: PGRAPH TLB flush idle timeout fail: 0x00c00f01 0x00000209 0x00001600 0x00000000
Sep 11 08:44:55 faye kernel: [ 1322.256898] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0
Sep 11 08:44:55 faye kernel: [ 1323.596565] [drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00400000 - Ch 2
Sep 11 08:44...

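The kernel timestamps in the log above are not in order, which makes the sequence of events hard to follow. A minimal sketch, assuming lines in the same syslog format as the excerpt, that pulls out the nouveau `[drm]` messages, sorts them by kernel timestamp, and counts each kind. The function names `parse_nouveau_lines` and `summarize` are hypothetical helpers for illustration, not part of any nouveau tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# "Sep 11 08:44:25 faye kernel: [ 1297.836589] [drm] nouveau 0000:01:00.0: vm flush timeout: engine 0"
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] \[drm\] nouveau (?P<dev>\S+): (?P<msg>.*)$"
)


def parse_nouveau_lines(lines):
    """Return (timestamp, device, message) tuples for nouveau [drm] lines,
    sorted by the kernel timestamp in square brackets."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            events.append(
                (float(m.group("ts")), m.group("dev"), m.group("msg").strip())
            )
    events.sort(key=lambda e: e[0])
    return events


def summarize(events):
    """Count messages by kind, where the kind is the text before the
    first ':' or ' - ' in the message."""
    kinds = Counter()
    for _, _, msg in events:
        kind = re.split(r":| - ", msg, maxsplit=1)[0]
        kinds[kind] += 1
    return kinds
```

Something like `summarize(parse_nouveau_lines(open("/var/log/kernel.log")))` could then give a quick overview to paste into a report; the log path is an assumption and varies by distribution.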

In , Arun Raghavan (arunraghavan) wrote :

(In reply to comment #64)
> (In reply to comment #63)
> > (In reply to comment #62)
> > [...]
> > > Did you have any output in your kernel log from when this occurred?
> >
> > There was no output after the initial module load messages.
>
> Are you able to install envytools
> (http://nouveau.git.sourceforge.net/git/gitweb.cgi?p=nouveau/envytools;a=summary)
> and run "nvapeek 0x400700 4" while the GPU is hung?

All I get is a '...'.

> Also, how new are all your userspace components (xf86-video-nouveau, libdrm,
> mesa etc)?

mesa - 7.11
libdrm - 2.4.26
xf86-video-nouveau - 0.0.16_pre20110801 (that's the Gentoo package, which is presumably a snapshot from that date)

For what it's worth, kernel's 3.0.4, and the GPU is a 9400M (PCI id 10de:0863).

In , Marcin Slusarz (marcin-slusarz) wrote :

This bug was fixed in 2.6.39 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=2b4cebe4e165b0ef30a138e4cf602538dea15583 ),
so I'm closing this bug report.

ColdFeetBob, Arun Raghavan: you are experiencing different bugs.
If you still can reproduce them please read http://nouveau.freedesktop.org/wiki/Bugs and open *new* bug reports. Thanks.
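Before opening a new report, it can help to confirm that the running kernel is at least 2.6.39, the release named above as containing the fix. A rough sketch, assuming a mainline-style release string; `parse_kernel_version` and `has_fix` are hypothetical names, and distribution kernels may backport or omit patches, so the result is only a hint.

```python
import platform
import re

FIX_VERSION = (2, 6, 39)  # release named in the comment above


def parse_kernel_version(release):
    """Extract the leading numeric components from a release string
    such as '3.0.4-gentoo' or '2.6.38-8-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing third component (e.g. '3.13-rc4') is treated as 0.
    return tuple(int(p) if p is not None else 0 for p in m.groups())


def has_fix(release):
    """Return True if the release is 2.6.39 or newer."""
    return parse_kernel_version(release) >= FIX_VERSION


if __name__ == "__main__":
    print(platform.release(), has_fix(platform.release()))
```

A kernel such as 3.0.4 passes this check, so a hang seen there is, per the comment above, a different bug and belongs in a new report.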

Changed in nouveau:
status: Confirmed → Fix Released
In , aaron (nelaaro) wrote : Re: [MASTER] [GT21x] X freeze/crash with nouveau driver

I got the same errors as you.
When I checked what was in the files, I found that each one was an HTML page containing an error message.

Need to double-check that wget command.

head *
==> libio.c <==
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
<head>
<title>cgit error</title>
<meta name='generator' content='cgit v0.9.0.2'/>
<meta name='robots' content='index, nofollow'/>
<link rel='stylesheet' type='text/css' href='/cgit/cgit.css'/>
<link rel='alternate' title='Atom feed' href='http://0x04.net/cgit/cgit.cgi/pgtest/atom/?h=(null)' type='application/atom+xml'/>
</head>

==> libio.h <==
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
<head>
<title>cgit error</title>
<meta name='generator' content='cgit v0.9.0.2'/>
<meta name='robots' content='index, nofollow'/>
<link rel='stylesheet' type='text/css' href='/cgit/cgit.css'/>
<link rel='alternate' title='Atom feed' href='http://0x04.net/cgit/cgit.cgi/pgtest/atom/?h=(null)' type='application/atom+xml'/>
</head>

==> peek.c <==
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
<head>
<title>cgit error</title>
<meta name='generator' content='cgit v0.9.0.2'/>
<meta name='robots' content='index, nofollow'/>
<link rel='stylesheet' type='text/css' href='/cgit/cgit.css'/>
<link rel='alternate' title='Atom feed' href='http://0x04.net/cgit/cgit.cgi/pgtest/atom/?h=(null)' type='application/atom+xml'/>
</head>
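A quick way to catch this kind of failed download before trying to compile is to check whether the supposed source files start with an HTML doctype or `<html` tag. A minimal sketch; `looks_like_html` and `find_bad_downloads` are hypothetical helper names, and the 512-byte probe size is an arbitrary assumption.

```python
from pathlib import Path

# Lowercased prefixes that indicate an HTML page rather than C source.
HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(path, probe_bytes=512):
    """Return True if the first bytes of the file look like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(HTML_MARKERS)


def find_bad_downloads(directory, patterns=("*.c", "*.h")):
    """List the names of source files in `directory` that are really HTML."""
    bad = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            if looks_like_html(path):
                bad.append(path.name)
    return bad
```

Run against the directory holding the files shown above, this would be expected to flag `libio.c`, `libio.h`, and `peek.c`, since each begins with `<!DOCTYPE html`.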

no!chance (ralf-fehlau) wrote : Re: [MASTER] [GT21x] X freeze/crash with nouveau driver

Same effect here with 6100 onboard graphics. Mouse is moving, but I cannot click anything. Keyboard input is without effect, i.e. I cannot canche to console with Ctrl+Alt+F1.

no!chance (ralf-fehlau) wrote :

Sorry: s/canche/change/ ;-)

penalvch (penalvch)
summary: - [MASTER] [GT21x] X freeze/crash with nouveau driver
+ 10de:0a3c [MASTER] [GT21x] X freeze/crash with nouveau driver
penalvch (penalvch) wrote : Re: 10de:0a3c [MASTER] [GT21x] X freeze/crash with nouveau driver

Joe Barnett, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc4

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
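The tag naming described above follows a simple pattern, sketched here for anyone scripting their triage notes. `upstream_tags` is a hypothetical helper, not part of Launchpad or apport; it only builds the tag strings and does not touch the bug report.

```python
def upstream_tags(version, fixed):
    """Return (tags_to_add, tags_to_remove) for a tested mainline kernel.

    `version` is the tested kernel version as written in the tag,
    e.g. 'v3.13-rc4'; `fixed` says whether the bug is gone in that kernel.
    """
    base = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    tags_to_add = [base, f"{base}-{version}"]
    tags_to_remove = ["needs-upstream-testing"]
    return tags_to_add, tags_to_remove
```

For example, `upstream_tags("v3.13-rc4", True)` yields the two `kernel-fixed-upstream` tags from the example above, plus `needs-upstream-testing` as the tag to remove.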

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: bios-outdated-1.52
summary: - 10de:0a3c [MASTER] [GT21x] X freeze/crash with nouveau driver
+ 10de:0a3c [Lenovo ThinkPad W510] X freeze/crash with nouveau driver
dino99 (9d9) wrote :

This version is now outdated and no longer supported.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in baltix:
status: New → Invalid
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Triaged → Invalid