[nvidia-glx] Frequent lockups when NV 3d is enabled (solved)

Bug #94739 reported by Ric95
Affects: linux-restricted-modules-2.6.20 (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Feisty worked well until Nvidia 3D was enabled (nice new driver manager, btw ;) ).
Now the system locks up hard within 3 to 30 seconds (unless 3D is disabled).
Browsing the forums, it seems several people have this problem; a few have fixed it by using older (?) drivers or by enabling 3D via Envy or Automatix.
AMD64
Ubuntu 7.04 (64-bit), Herd 5, updated
Nvidia 7300

(btw, why won't the 'package choice' in this bug report acknowledge Feisty 7.04?)

Ric95 (pamric-shaw) wrote :

I have now downgraded to Edgy 6.10 (64-bit) but still occasionally get the same lockup. None of the log files record the event, and neither does the terminal.
How can I help find this bug? (I can reproduce it by moving windows around for a while.)

Ric95 (pamric-shaw) wrote :

Narrowing the bugsearch:
I have switched to Kubuntu 6.10, and lockups are now very rare. The only lockup I have had with it was when dragging a Blender render window.
So it seems likely to be a combination of the Nvidia 3D driver and the newer (GNOME) window functions (post-Dapper).
I'd be happy to continue working on this bug if anyone wants to ask me to try anything in particular...

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
See if there is any mention of "NVRM: Xid" in /var/log/messages...
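A minimal sketch of that check, assuming the kernel log is /var/log/messages as suggested above. The function name is hypothetical, and the log path is an optional argument so another (for example rotated) log file can be searched instead:

```shell
# Hypothetical helper: list any "NVRM: Xid" errors (reported by the NVIDIA
# kernel module) in a log file, prefixed with their line numbers.
# Defaults to /var/log/messages; pass another path to search a different log.
find_nvrm_xid() {
    log="${1:-/var/log/messages}"
    grep -n 'NVRM: Xid' "$log"
}
```

Run it as root if the log is not world-readable. If it prints nothing (grep then exits with a non-zero status), no Xid errors were recorded in that file.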

Phillip Heath (mstlyevil) wrote : Re: [nvidia-glx] Frequent lockups when NV 3d enabled

I can confirm this bug, as I had the same result using the restricted driver manager on a fresh Feisty installation. I uninstalled the driver using the RDM, then proceeded to install the drivers manually. Manual installation did not fix the issue, and neither did reconfiguring Xorg using dpkg-reconfigure xserver-xorg. The kernel module continued to refuse to load, forcing me to do a fresh installation.

After doing a fresh installation I installed the nvidia drivers both manually and with Automatix and this bug was not reproduced either time.

Phillip Heath (mstlyevil) wrote :

I forgot to mention I am also on Feisty AMD64 7.04, and I used the alternate installation CD.

Phillip Heath (mstlyevil) wrote :

Also I am using a Nvidia 7600 GT.

John Jason Jordan (johnxj) wrote :

I was using the nvidia-glx on my laptop with a GeForce4 420 Go 64M and Ubuntu Edgy amd64. All was well until I upgraded to Feisty. At the end of the upgrade the reboot dumped me to the command line and I had to edit xorg.conf and change "nvidia" to "nv" in order to get X to load. I tried reinstalling it, but it won't boot. I also tried the nvidia-glx-new driver, but it also won't boot. I really need the nVidia driver back because it works so much better than the nv driver. Is there any news on when something might be fixed?

Sitsofe Wheeler (sitsofe) wrote :

John:
There is not enough information in your comment to identify what the problem might be (plus it doesn't sound like your particular issue is a lock up). Please can you file a new bug report (attaching your xorg.conf file and Xorg.0.log) and post a link back here?

Ric95 (pamric-shaw) wrote :

Solved.
I've noted the times when the system locks up, and in my case there were no events logged in the log files. But I switched to Xubuntu 7.04 and it runs clean and stable :)
[See if there is mention of NVRM: Xid in /var/log/messages ...]
Sorry, but I've wiped the partition when I installed Xubuntu. :(

Sitsofe Wheeler (sitsofe) wrote :

Ric95's last comment makes this sound like Bug #13530. If so, Ric95 is less likely to see the problem, but assuming that the nvidia binary module is still being used, the issue will flare up with programs like Firefox...

Ric95 (pamric-shaw) wrote :

Ya, it looks like several people have had problems with that.
Firefox hasn't locked up, but my beloved Blender has (rarely).
What can I do to fix this and hopefully help the community?

Can I recompile the NV driver? (I read an old page describing an Nvidia-supplied compiler to tweak it for the system... is that still available?)
I'm interested in compiling the kernel for my hardware; will that help?
Would it help to compile Blender from source? A static compile is an option too.

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
There's very little to recompile: it's a binary-only driver. The only thing you could recompile is the glue layer, which isn't really going to make much difference. Given that this bug locks up your entire system, the problem can't possibly lie with Blender, so recompiling it would likely not help (the only thing you could possibly do is try to work out exactly which operation Blender performed that led to a lockup, and basically see if you could rewrite Blender not to do it).

It might be interesting to find out whether disabling RenderAccel (as described here: http://us.download.nvidia.com/XFree86/Linux-x86/1.0-9755/README/appendix-d.html ) helps your stability. Further, because your card is relatively modern, you have the option to try the nvidia-glx-new drivers. It would be good to know whether your problems still persist with those...

Ric95 (pamric-shaw) wrote :

Too early to say if "RenderAccel" "0" helps, but trying "nvidia-glx-new" broke it badly with a version mismatch. Reverting back to "nvidia-glx" didn't work either, and I ended up reinstalling from scratch! (Now it occurs to me that I should have purged nvidia, then reinstalled.)

Ric95 (pamric-shaw) wrote :

Nope. "RenderAccel" "False" (not "0", btw) doesn't help.
But at least Blender is the only thing that crashes it (a pity, since Blender is the most important piece of software for me).
Blender is available as a static build that doesn't use the system OpenGL libraries. I'll try that.
If it works I could submit the static version to the repositories, but there are probably too few people who would benefit from that.

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
That's too bad. This is a long shot but I'm out of ideas:
Add the following new line just after #! /bin/sh in /etc/init.d/powernowd
exit 0
and then reboot. Any change?
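The edit described above can be sketched as a small shell function. This is only an illustration: the function name is hypothetical, it relies on GNU sed's one-line form of the `a` (append) command together with `-i` (in-place editing), and it keeps a backup copy so the change is easy to undo:

```shell
# Hypothetical helper: make an init script exit immediately by inserting
# "exit 0" on the line right after its #! line. A backup is kept as FILE.orig.
disable_initscript() {
    script="$1"
    # Refuse to touch files whose first line is not a #! line.
    head -n 1 "$script" | grep -q '^#!' || return 1
    cp "$script" "$script.orig"
    # GNU sed: append the text "exit 0" after line 1, editing the file in place.
    sed -i '1a exit 0' "$script"
}
```

As root, `disable_initscript /etc/init.d/powernowd` followed by a reboot matches the suggestion above; moving `/etc/init.d/powernowd.orig` back into place reverts it.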

Ric95 (pamric-shaw) wrote :

I'm worried I may be too optimistic, but that seems to have cured it!!!!
On a hunch, I first tried turning off the service... and it locked up 30 seconds later. But then I edited that line in and have been moving windows around Blender just fine :) Thanks! ... What does "exit 0" do anyhow?

Bryce Harrington (bryce) wrote :

'exit 0' simply causes a script to exit. So effectively, that change causes powernowd to exit without doing anything.

I gather this indicates the bug has something to do with power management and cpu frequency scaling.
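To see whether CPU frequency scaling is actually active (for instance, whether the clock speed changes while windows are being dragged), the kernel's cpufreq sysfs files can be read directly. A minimal sketch, assuming a kernel with cpufreq support that exposes the standard `scaling_governor` and `scaling_cur_freq` files under /sys/devices/system/cpu; the function name is hypothetical, and the sysfs root is an argument only so it can be pointed elsewhere:

```shell
# Hypothetical helper: print the cpufreq governor and current frequency
# (in kHz) for every CPU that has a cpufreq directory in sysfs.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue   # skip when the glob matched nothing
        cpu=$(basename "$(dirname "$dir")")
        printf '%s governor=%s cur_khz=%s\n' "$cpu" \
            "$(cat "$dir/scaling_governor")" "$(cat "$dir/scaling_cur_freq")"
    done
}
```

Running it a few times, with and without powernowd active, shows whether the reported frequency is changing.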

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Exactly what Bryce said.

Turning off services can be a bit fraught and the powernowd service has two different scripts which can cause it to be run. Unfortunately powernowd doesn't seem to support a setting in /etc/default/powernowd to allow users to cleanly disable it in a similar manner to other initscripts hence the need to edit the script directly.

Bryce:
You are bang on the money. This looks like Bug #109643 (or rather that bug looks like this one since this was filed first).

Ric95 (pamric-shaw) wrote : Re: [nvidia-glx] Frequent lockups when NV 3d is enabled and cpu scaling is on

I'm just glad we could resolve it :) That's weird that Nvidia would affect that, but now that I think about it, that sort of makes sense.

Bryce Harrington (bryce) wrote :

Ric95, excellent to hear the problem is resolved for you. :-) Of course, the underlying issue of powernowd being bugged still exists, but that's covered by the other bug report.

Sitsofe, cool, that bug already has one dupe for it, so I'll mark this a dupe of it as well, and update its description.

Ric95 (pamric-shaw) wrote :

Yes, this is definitely a big improvement.
But I still get a lockup very rarely: more like once every 2 hours rather than once every 10 minutes.

Is it safe to uninstall powernowd?

Are Nvidia and the coders of powernowd working to resolve this? (Maybe nvidia-glx-new already has the fix...)

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Drat. It looks like powernowd wasn't the sole cause of your problems. It's probably unwise to uninstall powernowd, though. There's nothing the coders of powernowd can do about it because their program is just a daemon that asks the kernel to change the speed of the CPU. There's nothing the kernel developers can do because they don't have the source to the NVIDIA binary drivers. If there is a problem with CPU scaling and the NVIDIA binary drivers, the only people who can explain where the problem lies and how to fix it are NVIDIA, as they are the only ones with the source code to their driver. That's the way it goes with binary-only drivers...

As for whether things are any better with nvidia-glx-new... I don't know. In theory your card is supported, so you might be able to try those drivers out (PS: I suspect your inability to revert back to the nvidia-glx driver last time was because of Bug #106217)...

Ric95 (pamric-shaw) wrote :

I may try nvidia-glx-new when I have time to tinker with it. It would be nice if there was an easier way to revert back to nvidia-glx. Hopefully with Gutsy both drivers will be in the driver manager. It sounds like they may build a sort of X.org crash recovery system. Cool.
When I do try, what command would completely remove nvidia-glx-new from the terminal? [ sudo purge nvidia ]?
Then I could [ apt-get install nvidia ]

Sitsofe Wheeler (sitsofe) wrote :

Unduplicating this bug based on Ric95's recent comment.

Ric95:
Generally speaking only one driver (the latest driver that supports your card) will be recommended for a given card. Anything else may coincidentally work but won't be "supported" by NVIDIA (but you will have to check with NVIDIA on that).

I suspect an apt-get purge won't be any better than a remove. Use apt to remove the package then remember to go and remove the dotfile mentioned in Bug #106217 afterwards...
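The remove-then-reinstall sequence can be sketched as below, using only the package names mentioned in this thread. The function and the `APT` override are hypothetical conveniences: setting `APT="apt-get -s"` uses apt-get's simulate mode to preview the changes first. The leftover dotfile from Bug #106217 is not named in this thread, so the sketch does not touch it:

```shell
# Hypothetical helper: remove nvidia-glx-new, then install nvidia-glx.
# APT defaults to apt-get; it is deliberately left unquoted below so that
# APT="apt-get -s" (simulate) splits into a command plus its option.
revert_to_nvidia_glx() {
    apt="${APT:-apt-get}"
    $apt remove nvidia-glx-new || return 1
    $apt install nvidia-glx
}
```

Run it as root for the real change, and remove the dotfile described in Bug #106217 afterwards.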

Ric95 (pamric-shaw) wrote :

Hi again. I've been exploring many other distros and they seem to have the same problems with Nvidia combined with the more recent Linux kernels. A Red Hat bug chase may implicate the 'i2c handler' (what that is is beyond my knowledge).

Sitsofe Wheeler (sitsofe) wrote : Re: [nvidia-glx] Frequent lockups when NV 3d is enabled

Ric95:
I suspect your best bet is to talk to NVIDIA directly about this (e.g. via http://www.nvnews.net/vbulletin/forumdisplay.php?f=14 ). Before they look at any problem they will ask you to install the latest drivers from their website. Instructions for doing so on Ubuntu can be found at https://help.ubuntu.com/community/NvidiaManual (and http://www.nvnews.net/vbulletin/showthread.php?t=72490 ), although doing so has certain caveats and will change where you can go for support with any future problems.

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Can you include the output of
lspci -nn | grep -i nv
in this bug report?
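With `-nn`, lspci adds a bracketed `vendor:device` PCI ID to each line, and `10de` is NVIDIA's vendor ID. That pair is what can be looked up in the supported-GPU list in NVIDIA's driver README to see which driver release covers a card. A minimal sketch for pulling it out (the function name is hypothetical):

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# NVIDIA vendor:device ID(s) it contains, e.g. "10de:01df".
nvidia_pci_id() {
    sed -n 's/.*\[\(10de:[0-9a-f]\{4\}\)].*/\1/p'
}
```

Usage: `lspci -nn | nvidia_pci_id`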

Changed in linux-restricted-modules-2.6.20:
status: Unconfirmed → Needs Info
Ric95 (pamric-shaw) wrote :

ric@ric-desktop:~$ lspci -nn | grep -i nv
01:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce 7300 GS [10de:01df] (rev a1)

[I suspect your best bet is to talk to NVIDIA directly about this]. Ya, I would need to use the most up-to-date driver. They didn't exactly make that easy; what driver is in the Gutsy betas?

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Currently _not_ the most up-to-date one; see Bug ##120943.

Changed in linux-restricted-modules-2.6.20:
status: Needs Info → Unconfirmed
Sitsofe Wheeler (sitsofe) wrote :

(That should have been Bug #120943)

Ric95:
Just before you switch to newer drivers can you also check whether the module parameter specified in Bug #115267 makes any difference?

BTW: if you want to try building a package of the newer drivers you may want to give Envy (http://www.albertomilone.com/nvidia_scripts1.html ) a try. At least that way it will be easier to uninstall the drivers at a later date.

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Could you also post the output of
lspci -vnn | grep -i nv -A 5
?

Ric95 (pamric-shaw) wrote :

ric@ric-desktop:~$ lspci -vnn | grep -i nv -A 5
01:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce 7300 GS [10de:01df] (rev a1) (prog-if 00 [VGA])
        Subsystem: Unknown device [19f1:1fe2]
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fb000000 (64-bit, non-prefetchable) [size=16M]

I'll try installing the newer drivers asap.
I also want to try adding 'acpi=off' and 'noapic' to the kernel parameters line in /boot/grub/menu.lst
..but one thing at a time :)
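Once `acpi=off` and `noapic` have been appended to the kernel line in /boot/grub/menu.lst and the machine rebooted, /proc/cmdline shows the command line the running kernel actually booted with. A minimal sketch for confirming the parameters took effect; the function name and the `CMDLINE` override (there only to point it at another file) are hypothetical:

```shell
# Hypothetical helper: succeed only if every given parameter appears as a
# whole word in the running kernel's command line (/proc/cmdline).
has_kernel_params() {
    cmdline=$(cat "${CMDLINE:-/proc/cmdline}")
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                          # present
            *) echo "missing: $param"; return 1 ;;
        esac
    done
}
```

Usage: `has_kernel_params acpi=off noapic && echo "both parameters active"`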

Sitsofe Wheeler (sitsofe) wrote :

Ric95:
Hopefully you will get this before you switch drivers...

Can you check that using the module parameter specified in https://bugs.launchpad.net/bugs/115267 doesn't make a difference while you are still on the older drivers?

Ric95 (pamric-shaw) wrote :

It's hard to say. But I'm now thinking that I'm chasing two bugs that cause the same lockup :(
After making the change from Bug #115267 I didn't notice the random lockup, but I could still induce a lockup by dragging a window. I'll try to find time to chase that separately.
Would it be possible for the developers to make a hacked kernel ( with debugging features compiled in ) available in repositories ?

Sitsofe Wheeler (sitsofe) wrote :

(I'm not an Ubuntu dev.) I don't think you are going to have much luck debugging this unless you know how to set up a serial console, and I would be surprised if someone built a debug kernel just for you. However, there's nothing stopping you from rebuilding your own kernel with various extra options. My feeling is that a kernel with more debug features enabled would not help you, as it sounds like the problem lies in the binary part of the NVIDIA module and you will not have any debug symbols for that (only NVIDIA do). It sounds more and more like your best chance with this is to talk to NVIDIA (http://www.nvnews.net/vbulletin/showthread.php?t=46678 )...

Ric95 (pamric-shaw) wrote :

I'm back just to post an update in the hope that I can help make Ubuntu the best ever (a postcard from the edge ;) ).
I've used many Linux distros lately, and almost all had a similar lockup on my hardware (note to self: no more Compaq).
Now I'm using Sabayon/Gentoo, 2.6.21, 64-bit. It had the lockup problem until someone mentioned using nvidia driver version 1.0.9755-r1. This has completely fixed that random lockup. :)
http://www.sabayonlinux.org/forum/viewtopic.php?f=5&t=8944
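To confirm which driver release is actually loaded after switching, the NVIDIA kernel module reports its version in /proc/driver/nvidia/version. NVIDIA's own numbering writes this release as 1.0-9755 (as in the README URL earlier in this thread), while "-r1" is a Gentoo package revision and so should not be expected in that file. A minimal sketch that only assumes the version string appears somewhere in the file; the function name and the `VERSION_FILE` override are hypothetical:

```shell
# Hypothetical helper: succeed if the loaded NVIDIA kernel module reports
# the given version string (NVIDIA's numbering, e.g. "1.0-9755").
nvidia_driver_is() {
    grep -qF -- "$1" "${VERSION_FILE:-/proc/driver/nvidia/version}"
}
```

Usage: `nvidia_driver_is 1.0-9755 && echo "1.0-9755 is loaded"`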

I still like Ubuntu. I'm crossing fingers and toes hoping you guys use 2.6.23 with nv 1.0.9755-r1 for 8.04 ;)

Ric95 (pamric-shaw) wrote : Re: [nvidia-glx] Frequent lockups when NV 3d is enabled(solved)

My apologies to everyone who tried to help me.
The problem was in my BIOS. When I installed my PCI Express video card, the BIOS set itself to PCI, so the OS would end up looking in the wrong slot for video :(

Bryce Harrington (bryce) wrote : linux-restricted-modules-2.6.20 is obsolete

This package has become obsolete so we're closing out the bug report as WONTFIX.
Thanks for reporting it though!

Changed in linux-restricted-modules-2.6.20:
status: New → Won't Fix