[nvidia-glx] [amd64] Frequent lockups with white screen with black lines when using nvidia binary driver with CPU speed scaling/Cool 'n Quiet

Bug #109643 reported by Connell
Affects: linux-restricted-modules-2.6.20 (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Frequently, while using Ubuntu 7.04 in GNOME with the NVIDIA restricted driver enabled, I get complete system lockups that require a hard reboot. When this happens, the screen becomes mostly white with a random, scrambled pattern of mostly black lines. It happens even more often if Beryl or Compiz is enabled, but it has happened many times when neither is running. No specific action appears to trigger it: it happened once while browsing in Firefox, once while setting up new accounts in Gaim, once while installing packages with Automatix, etc. I have yet to see this problem without the NVIDIA restricted driver enabled; however, I have not spent much time in the system without it, as in my opinion the system is of very limited use without a proper graphics driver.

This happens several times per week. The hardware seems fairly stable under other operating systems (though my testing of other OSes on this equipment was limited). The system runs very cool, so there are no signs of overheating. There was some mention in the forums of possible problems with processor stepping ("Cool 'n' Quiet" mode). I haven't changed many BIOS settings yet, so I can't confirm or deny that this is related, but I suspect it has more to do with how the kernel interacts with the video driver.

Hardware Profile:
Biostar GeForce 6100-M9 motherboard (onboard GeForce 6100 video)
AMD Athlon 64 3400+ Venice
1GB PC3200 Memory

(I have tried both the AMD64 and X86 versions of Ubuntu, and experience the same problem in both environments)

Several complaints that seem to be related have been discussed in the forums here:
http://ubuntuforums.org/showthread.php?t=405993 and
http://ubuntuforums.org/showthread.php?t=402580

[Additional info]
Bug #94739 reports similar symptoms, particularly noticeable with Blender. Switching drivers and distributions changed how often it happened but did not eliminate it. Disabling /etc/init.d/powernowd resolved the issue for that user.

Chris Burgan (cburgan) wrote :

Thanks for your bug report. This bug was filed without a package, so I am moving it to the restricted modules for Feisty.

Sitsofe Wheeler (sitsofe) wrote :

Connell:
Could you attach the output of:
dpkg -l nvidia-\* | grep ii

Is there any mention of NVRM: Xid in /var/log/messages? Additionally, if the machine is networked and running sshd, can you still ssh into it when it "locks up"?
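A minimal sketch of the log check asked for above, assuming a syslog-style log at /var/log/messages (the path used in the question). count_nvrm_xid is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: print how many lines in a syslog-style file
# mention "NVRM: Xid". Defaults to /var/log/messages.
count_nvrm_xid() {
    log_file=${1:-/var/log/messages}
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" stops a zero count (grep exit status 1) counting as failure.
    grep -c 'NVRM: Xid' "$log_file" || true
}
```

A non-zero count means the driver logged at least one Xid error; the matching lines themselves can be viewed with grep 'NVRM: Xid' /var/log/messages.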

Setting to Needs Info pending a reply from Connell.

Changed in linux-restricted-modules-2.6.20:
status: Unconfirmed → Needs Info
Connell (ubuntu-rjconnell) wrote :

$ dpkg -l nvidia-\* | grep ii
ii nvidia-glx 1.0.9631+2.6.20.5-15.20 NVIDIA binary XFree86 4.x/X.Org driver
ii nvidia-kernel-common 20051028+1ubuntu7 NVIDIA binary kernel module common files

$ grep NVRM /var/log/messages
Apr 22 15:45:54 amber-desktop kernel: [ 47.192948] NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-9631 Thu Nov 9 17:38:10 PST 2006
Apr 23 18:12:58 amber-desktop kernel: [ 43.986975] NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-9631 Thu Nov 9 17:38:10 PST 2006
Apr 23 18:19:24 amber-desktop kernel: [ 44.383865] NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-9631 Thu Nov 9 17:38:10 PST 2006
Apr 24 17:34:44 amber-desktop kernel: [ 47.035077] NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-9631 Thu Nov 9 17:38:10 PST 2006

I will have to wait until the machine next locks up to answer the ssh question. I doubt it will work, since the system does not respond to anything else (e.g. Ctrl+Alt+Del, the power button), but I'll be sure to post back as soon as I find out.

Asif Youssuff (yoasif) wrote :

I have filed a possibly related bug here: https://bugs.launchpad.net/ubuntu/+bug/109810

Connell (ubuntu-rjconnell) wrote : Re: [nvidia-glx] system frequently locks up with white screen with black lines while nvidia restricted driver in use must hard boot system

I disabled Cool 'n' Quiet mode, and the system has not locked up since; I will continue to test. So at this point I am not certain whether this is related to the NVIDIA drivers. I didn't have the problem until I enabled the video driver, but perhaps that was a coincidence, or perhaps running the nvidia driver put just enough load on the processor to trigger CPU stepping. Maybe it was the combination of all three.

Either way, it looks like this issue may be primarily kernel-related. I would really like to pin down the root cause. If I can provide any further information, please ask.

Sitsofe Wheeler (sitsofe) wrote :

Hmm. It might be that the nvidia binary driver hasn't yet been changed to cope with CPU frequency scaling...

Connell (ubuntu-rjconnell) wrote :

Should this be changed out of "Needs Info" status, or is there other information I can provide?
Thanks,
RJ

Sitsofe Wheeler (sitsofe) wrote :

RJ:
Good catch! I guess I had better start asking more questions! :) Can you re-enable "Cool 'n' Quiet" in the BIOS and add the following new line just after #! /bin/sh in /etc/init.d/powernowd:
exit 0

This effectively stops the program that changes your CPU speed from starting at boot. Can you then see whether things still lock up after a while?
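The edit described above could be scripted roughly as follows. This is a sketch, assuming the init script's first line is a shebang such as #! /bin/sh; disable_init_script is a hypothetical helper, and it keeps a .orig copy so the change can be reverted:

```shell
# Hypothetical helper: insert "exit 0" as line 2 of an init script so the
# script stops immediately when run at boot. Saves a .orig backup first.
disable_init_script() {
    script=$1
    # Only proceed if line 1 is a shebang such as "#! /bin/sh".
    case $(head -n 1 "$script") in
        '#!'*) ;;
        *) echo "no shebang on line 1 of $script" >&2; return 1 ;;
    esac
    # Do nothing if the script has already been disabled this way.
    if [ "$(sed -n 2p "$script")" = "exit 0" ]; then
        return 0
    fi
    cp -p "$script" "$script.orig"
    # Rewrite the script in place (permissions are kept): shebang, exit, rest.
    {
        head -n 1 "$script.orig"
        echo 'exit 0'
        tail -n +2 "$script.orig"
    } > "$script"
}
```

Run it as root, e.g. disable_init_script /etc/init.d/powernowd; moving /etc/init.d/powernowd.orig back over the script undoes the change.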

Sitsofe Wheeler (sitsofe) wrote :

This might be related to Bug #13530...

Sitsofe Wheeler (sitsofe) wrote :

Might also be related to Bug #59857

Asif Youssuff (yoasif) wrote :

Tried disabling powernowd as Sitsofe suggested (re-enabled Cool 'n' Quiet in the BIOS and edited the file).

Everything seems to be working fine... it's been up for about 21 hours with no issues.

Connell (ubuntu-rjconnell) wrote :

I also have had no problems since I disabled powernowd and re-enabled Cool 'n' Quiet mode in the BIOS... looks like you may be onto something, Sitsofe.

I will report back if anything changes. Hopefully this can be fixed soon so others don't experience similar problems.

Thanks again for your help,
RJ

Sitsofe Wheeler (sitsofe) wrote : Re: [nvidia-glx] System frequently locks up with white screen with black lines when using nvidia binary driver with CPU speed scaling/Cool 'n Quiet

RJ, yoasif:
Thanks for plugging away on this issue. You may have discovered a workaround for a few other NVRM: Xid sufferers. It might not be the cause in every case, but it's something to bear in mind. Setting to Confirmed based on RJ's and yoasif's feedback.

Changed in linux-restricted-modules-2.6.20:
status: Needs Info → Confirmed
Asif Youssuff (yoasif) wrote :

By the way, whatever issue is affecting me (the one that disabling powernowd seems to fix), it is a multi-distro issue (I had it in SuSE 10.2 as well), so I think it also affects upstream.

Just something to keep in mind.

Bryce Harrington (bryce)
description: updated
Connell (ubuntu-rjconnell) wrote :

Just wanted to report that it's been several days with no lockups, so disabling powernowd seems to be an effective workaround.

The question is: based on what we are seeing, I'm no longer sure this is related to NVIDIA GLX (that said, I can't confirm that it's not related; it would be nice to know whether any non-NVIDIA users are seeing this). Any thoughts on this? Should the ticket title be updated?

Sitsofe Wheeler (sitsofe) wrote :

Connell:
You can check this yourself. If you don't see any crashes with the nv driver while powernowd is running, then the odds are that the problem lies with the nvidia binary driver. I have seen a range of laptops and desktops with i915 graphics running GL programs that do not suffer this problem (and are definitely doing frequency scaling), but that's not to say there aren't other graphics drivers that have not been tested with variable CPU frequency scaling and cause lockups as a result...

piechan (kumiko-peter) wrote :

I had the same problem. Ubuntu 7.04 crashed very frequently, several times per hour(!). If you see the error message 'MP-BIOS bug: 8254 timer not connected to IO-APIC' at boot, you probably need to add the noapic option in /boot/grub/menu.lst as described in http://ubuntuforums.org/showthread.php?t=430674&highlight=8254 . In my case, at least, it resolved the problem. By the way, my laptop is equipped with an ATI graphics card (ATI Mobility Radeon 9600 M10).
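A sketch of adding the noapic option, assuming a GRUB legacy /boot/grub/menu.lst of the kind Ubuntu 7.04 uses; add_noapic is a hypothetical helper. It also edits the "# kopt=" line, on the assumption that Ubuntu's update-grub regenerates the kernel lines from it, and saves the original as menu.lst.bak:

```shell
# Hypothetical helper: append "noapic" to GRUB legacy kernel lines and to
# the "# kopt=" template line, skipping lines that already have it.
add_noapic() {
    menu=${1:-/boot/grub/menu.lst}
    cp -p "$menu" "$menu.bak"
    awk '
        # Boot entries start with "kernel"; "# kopt=" holds default options.
        ($1 == "kernel" || /^# kopt=/) && !/(^|[ \t])noapic([ \t]|$)/ {
            $0 = $0 " noapic"
        }
        { print }
    ' "$menu.bak" > "$menu"
}
```

Running it twice leaves a single noapic on each line; restoring menu.lst.bak reverts the change.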

Connell (ubuntu-rjconnell) wrote :

Thanks for the suggestion, piechan, but I have not noticed that message at startup. It could be helpful for others, though.

Brijam (brian-opensourcery) wrote :

Confirming the same bug on my HP zd7000 laptop with nvidia-restricted-glx and powernowd. In my case, X.org was pegging the CPU after ~30 minutes without keyboard input, eventually locking the machine. The same thing would happen if I locked the workstation. It definitely seems to be related to ACPI, as the workaround specified above /did/ eliminate the lockup.

Toadmund (toad) wrote :

Hi there, I am the originator of one of the posts above and a poster in the other. I also wrote this one, with pictures of my problem:
http://ubuntuforums.org/showthread.php?t=420888
This is day 3 with no crash yet and I am not expecting it to hold, but, so far so good.

What did I do?
I disabled AMD Cool 'n' Quiet in my BIOS, and no WSOD (white screen of death) yet! A couple of days before, I got the WSOD 3-4 times within 1-2 hours, which was super frustrating! I usually got between 1 and 3 crashes a day.
Keeping my fingers crossed.

Motherboard: MSI RS480M2 with ATI Radeon Xpress 200G onboard video

Trae Blain (trae) wrote :

I was also experiencing this problem. I've been up for 2 weeks now with no white screens. I am using a similar board to Toadmund's and have ATI Radeon Xpress 200M onboard video. I'm running Feisty AMD64.

Shutting off the Cool 'n Quiet in the BIOS has kept it from happening.

Erwin Olario (gowin) wrote :

Is there a way to disable the Cool 'n' Quiet feature on a laptop with a Turion processor?

I'm using the MSI S270.

Toadmund (toad) wrote :

Just the same: in the BIOS. A Turion is an AMD product, 64-bit I believe.
Mine is an Athlon 64 3200.

Erwin Olario (gowin) wrote :

You're correct, Turion is the mobile 64-bit proc of AMD.

I've tried looking for a setting to disable cool n quiet in the BIOS of the MSI s270. No such setting available.

Sitsofe Wheeler (sitsofe) wrote :
Erwin Olario (gowin) wrote :

@Sitsofe
Thanks. I'll try this.

Toadmund (toad) wrote :

Erwin:
Mine is a Phoenix BIOS
AMD Cool'n'Quiet is under 'Power Management Setup', just in case you missed it.

Good luck man!

Toadmund (toad) wrote :

PS, Just think 'P.M.S.' :)

Erwin Olario (gowin) wrote :

AMD's Cool N' Quiet power management system is for desktops. The mobile equivalent is called PowerNow!

In any case, there's still no such setting for my laptop's BIOS. So for people in the same rut, just follow Sitsofe's tip: https://bugs.launchpad.net/ubuntu/+source/linux-restricted-modules-2.6.20/+bug/109643/comments/9

Erwin Olario (gowin) wrote :

By the way, I haven't had any lockups since disabling CnQ/PowerNow!

Sitsofe Wheeler (sitsofe) wrote :

(This might be a duplicate of Bug #85370 )

Maciej Jesionowski (dark-prophet) wrote :

Hey, I'm adding a comment here because I seem to have a very similar problem. I'm running Ubuntu 7.10 desktop; I switched from Windows a few weeks ago. My system is:

AMD Athlon X2 4400+
ASUS M2N-VM DVI (GeForce 7050PV integrated video on nForce 630a)
ADATA 2GB 800MHz DDR2

I installed the NVIDIA driver through the Restricted Driver Manager. While using GNOME with full hardware acceleration (the mode with additional visual effects), my system hangs after a few minutes and the screen fills with skewed multicolored stripes (just like those described in Bug #109810: https://bugs.launchpad.net/ubuntu/+bug/109810). If I run GNOME without visual effects, the system works fine (or so it seems).

This problem seems video-related somehow (or a graphics-heavy application triggers it). When I launch a graphics-heavy application such as World of Warcraft (running under Wine), it often hangs at random, usually within its first minute of running. Such a lockup needs a hard reset. Sometimes everything runs fine without hanging, so the issue is somewhat random.

I've read that this can be caused by the Cool'n'Quiet feature of AMD processors, but at the moment I'm not sure whether it is turned on in my BIOS; I'll check once I'm back home. TBH this problem is driving me mad, and I really hope disabling CnQ helps, at least for now.

Chris (chrisspelberg) wrote : Workaround works!

I've had those lock-ups in Ubuntu 7.10, just a plain freeze of everything, including network, ssh, caps-lock, mouse, etc.
I use the following hardware:
- AMD Athlon XP 1800+ (1533MHz)
- NVidia GeForce2 MX-400

Installed the binary driver through 'Restricted Driver Manager'

What works for me is editing the file /etc/init.d/powernowd.early and setting:
DO_MODULES=no

After a reboot everything works well. 3D-effects, wobbly windows, everything.
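A sketch of the same edit, assuming powernowd.early contains an assignment that starts with DO_MODULES= at the beginning of a line (not verified here). set_do_modules_no is a hypothetical helper that saves a .orig copy first:

```shell
# Hypothetical helper: force DO_MODULES=no in powernowd.early (or any
# file with a "DO_MODULES=" assignment at the start of a line).
set_do_modules_no() {
    file=${1:-/etc/init.d/powernowd.early}
    # Refuse to touch files that lack the expected assignment.
    if ! grep -q '^DO_MODULES=' "$file"; then
        echo "no DO_MODULES= line in $file" >&2
        return 1
    fi
    cp -p "$file" "$file.orig"
    # Replace whatever value was assigned with "no"; other lines unchanged.
    sed 's/^DO_MODULES=.*/DO_MODULES=no/' "$file.orig" > "$file"
}
```

It needs root for the real file, and as in the comment, a reboot is needed before the change takes effect.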

Hope this helps.

Peter Heslin (p-j-heslin) wrote :

I also experienced this bug and disabling powernow fixes it. I am not sure that the bug is classified correctly, however. I found that it happened with an nvidia card both with the proprietary and free drivers and also with an ATI card with both proprietary and free drivers.

The symptom is a hard crash, sometimes with the keyboard LEDs flashing to indicate a kernel panic, sometimes not even that. It had been happening to me for a long time, but it used to be relatively infrequent. Lately, perhaps since upgrading to Gutsy, it had become much more frequent, several times a day.

The system is an AMD Athlon 64 X2 3800+ with Asus A8V Deluxe AGP (Socket 939) Motherboard. It originally had a Sapphire ATI Radeon 9200 video card, but I swapped it for an XFX Geforce 6200 card in an attempt to stop the crashing. I recently put the ATI card back in to see if that helped, but the system would inevitably crash soon after booting up.

The system has been stable now with the NVIDIA card and proprietary driver for about two weeks after adding an exit command to the top of /etc/init.d/powernowd (turning Cool n' Quiet off in BIOS did not help, as the modules were loaded anyway).

I have another system with the same CPU and an ATI Radeon 9200 AGP video card, but a different motherboard (Gigabyte), and it has never had a crash despite powernow being active.

Anyway, I hope this info helps with debugging.

DWHagar (david-hagar) wrote :

On my system (Dell Inspiron 8200), the problem was solved by forcing the use of the NVIDIA driver's built-in AGP support. This is done with the Option "NvAGP" "1" setting in xorg.conf, together with blacklisting the related non-NVIDIA AGP driver in modprobe.d/blacklist (mine is intel_agp). I've gone many days without any lockups now.
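The blacklisting half of this workaround might be scripted as below. The default path /etc/modprobe.d/blacklist and the helper name blacklist_module are assumptions; the Option "NvAGP" "1" line itself would go in the Device section for the nvidia driver in xorg.conf:

```shell
# Hypothetical helper: add "blacklist <module>" to a modprobe blacklist
# file unless an identical line is already present.
blacklist_module() {
    module=$1
    file=${2:-/etc/modprobe.d/blacklist}
    # -x matches the whole line, -F treats the text literally.
    if ! grep -qxF "blacklist $module" "$file" 2>/dev/null; then
        echo "blacklist $module" >> "$file"
    fi
}
```

For the case in the comment this would be run as root as blacklist_module intel_agp, followed by a reboot so that intel_agp is not loaded.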

Note that I'm not using AMD64, or even a 64-bit system.

pftg (pftg) wrote :

My solution:
   Comment out all aliases for nvidia, nvidiafb and nvidia_legacy, leaving only those for nvidia_new, in /lib/modules/2.6.22-14-generic/modules.alias
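A sketch of the modules.alias edit, assuming each entry has the form "alias <pattern> <module>" with the module name as the last field; the helper name is hypothetical. Note that depmod regenerates modules.alias, so a change like this is likely to be overwritten the next time depmod runs (for example after a kernel or module package update):

```shell
# Hypothetical helper: comment out modules.alias entries whose target
# module is nvidia, nvidiafb or nvidia_legacy; nvidia_new is left alone.
comment_legacy_nvidia_aliases() {
    alias_file=$1
    cp -p "$alias_file" "$alias_file.orig"
    awk '
        # The module name is the last field of an "alias" line.
        $1 == "alias" && ($NF == "nvidia" || $NF == "nvidiafb" || $NF == "nvidia_legacy") {
            print "#" $0
            next
        }
        { print }
    ' "$alias_file.orig" > "$alias_file"
}
```

Usage, as root: comment_legacy_nvidia_aliases /lib/modules/$(uname -r)/modules.alias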

Eduardo (vancouverislandgeek) wrote :

This did not resolve my issue. A simple test I've discovered is to go to http://www.musicovery.com. Without fail, as this website loads my system reboots, even with the proposed solutions listed above.

Linux delly 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

ii nvidia-glx-new 100.14.19+2.6.22.4-14.10 NVIDIA binary XFree86 4.x/X.Org 'new' driver

model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5000+

on a Dell Dimension C521

I initially tried 64-bit Ubuntu so my OS could see all 4 GB of RAM, but after seeing this error I switched to 32-bit, and the error still occurs. The crash is so severe that there is no time to generate a core dump for debugging.

Sitsofe Wheeler (sitsofe) wrote :

(If you aren't on a 64-bit system, or if you are and disabling powernow doesn't help, you probably aren't seeing this particular bug.)

phenest (steve-clark) wrote :

Linux precision 2.6.27-3-generic #1 SMP Wed Sep 10 16:18:52 UTC 2008 x86_64 GNU/Linux
Ubuntu Intrepid
nVidia GeForce 7950GTX
Dell Precision M90
nvidia-glx-177.76

I'm having trouble with the NVIDIA drivers crashing (see the attachment for a screenshot). This happens with any NVIDIA driver; the open-source nv driver works fine. I have found that changing the CPU speed setting to 'performance' in the Configuration Editor appears to 'cure' the problem.
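The Configuration Editor setting presumably selects the cpufreq "performance" governor. A rough sysfs equivalent for the running session is sketched below, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout; set_governor_performance is a hypothetical helper, it needs root, and the setting does not persist across reboots:

```shell
# Hypothetical helper: write "performance" to every CPU's cpufreq governor.
# The sysfs root is a parameter so the function can be pointed elsewhere.
set_governor_performance() {
    cpu_root=${1:-/sys/devices/system/cpu}
    found=0
    for gov in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If nothing matched, the glob stays literal and the file is absent.
        [ -e "$gov" ] || continue
        echo performance > "$gov"
        found=1
    done
    # Fail (status 1) when no cpufreq governor file was found at all.
    [ "$found" -eq 1 ]
}
```

If a daemon such as powernowd is also managing the CPU speed, it may need to be stopped first; that interaction is an assumption and is not handled here.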

Markus Golser (golserma) wrote :

I have the same problem phenest has

phenest (steve-clark) wrote :

Have you been able to pin it down to anything? I have noticed the GPU runs 10 °C hotter in Intrepid than in Hardy, but it doesn't seem to be an overheating problem. I really am stumped as to what's causing it.

Bryce Harrington (bryce) wrote : linux-restricted-modules-2.6.20 is obsolete

This package has become obsolete so we're closing out the bug report as WONTFIX.
Thanks for reporting it though!

Changed in linux-restricted-modules-2.6.20:
status: Confirmed → Won't Fix