nvidia-glx-new: X does not start on reboot. Possible Module Mis-match? (manual install)

Bug #108578 reported by kry10
Affects: linux-restricted-modules-2.6.20 (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: nvidia-glx-new

After installing nvidia-glx-new on Ubuntu Feisty 7.04, X will not start after a reboot. However, I can "fix" this problem by doing the following:

sudo rmmod nvidia
sudo modprobe nvidia_new

Bug resurfaces, however, upon reboot.

kry10 (launchpad-marklinford) wrote :

Here's a copy of my /var/log/Xorg.0.log file:

kry10 (launchpad-marklinford) wrote :

Also, here's the relevant info from /var/log/kern.log:

Apr 21 08:09:55 mark-desktop kernel: [ 57.712000] NVRM: API mismatch: the client has the version 1.0-9755, but
Apr 21 08:09:55 mark-desktop kernel: [ 57.712000] NVRM: this kernel module has the version 1.0-9631. Please
Apr 21 08:09:55 mark-desktop kernel: [ 57.712000] NVRM: make sure that this kernel module and all NVIDIA driver
Apr 21 08:09:55 mark-desktop kernel: [ 57.712000] NVRM: components have the same version.
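The two versions in a message like this can be pulled apart mechanically, which makes the mismatch easier to spot in a long log. A minimal sketch, assuming the NVRM wording shown above; `nvrm_versions` is a hypothetical helper name, not part of any NVIDIA or Ubuntu tool:

```shell
# Hypothetical helper: print the client and kernel-module versions found in
# NVRM "API mismatch" lines read from stdin. Assumes the wording shown above.
nvrm_versions() {
    sed -n \
        -e 's/.*the client has the version \([0-9.-]*\),.*/client=\1/p' \
        -e 's/.*this kernel module has the version \([0-9.-]*\)\..*/module=\1/p'
}

# Usage (log path as in this report): nvrm_versions < /var/log/kern.log
```

If the two printed values differ, the X driver and the loaded kernel module come from different driver releases.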

Sitsofe Wheeler (sitsofe) wrote :

Thank you for your bug report.

kry10:
Could you add the output of
ls -al /lib/linux-restricted-modules
to this bug report?

Was your install a clean install or an upgrade?

Sitsofe Wheeler (sitsofe) wrote :

kry10:
Could you also attach the file dmesg.txt produced by running
dmesg > dmesg.txt
to this bug report?

Setting to needsinfo pending reply from kry10.

Changed in linux-restricted-modules-2.6.20:
status: Unconfirmed → Needs Info

kry10 (launchpad-marklinford) wrote :

ls -al /lib/linux-restricted-modules:

total 24
drwxr-xr-x 4 root root 4096 2007-04-20 07:09 .
drwxr-xr-x 19 root root 8192 2007-04-11 13:50 ..
drwxr-xr-x 19 root root 4096 2007-04-20 07:08 2.6.20-15-386
drwxr-xr-x 17 root root 4096 2007-04-20 07:09 2.6.20-15-generic
-rw-r--r-- 1 root root 58 2007-04-16 19:31 .nvidia_new_installed

This was an upgrade from 6.10. Under Edgy, I manually installed the graphics driver from Nvidia's site, but removed it using their --uninstall switch before installing nvidia-glx-new.

Attached below is my dmesg output. Please feel free to contact me if you need any more info. Thanks!

Sitsofe Wheeler (sitsofe) wrote :

kry10:
Your dmesg indicates that something is forcibly loading the wrong nvidia module very early in your boot, which stops the wanted nvidia module from loading later on. Can you check whether nvidia is mentioned in /etc/rc.local or /etc/modules?
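One way to run that check is a case-insensitive grep over both files. This is only a sketch; `nvidia_mentions` is a hypothetical helper name:

```shell
# Hypothetical helper: list every line mentioning "nvidia" in the given files,
# prefixed with file name and line number. Unreadable files are skipped.
nvidia_mentions() {
    for f in "$@"; do
        # The extra /dev/null argument makes grep print the file name
        # even when only one real file is searched.
        [ -r "$f" ] && grep -n -i 'nvidia' "$f" /dev/null
    done
    return 0
}

# Usage: nvidia_mentions /etc/rc.local /etc/modules
```

No output means neither file mentions the module.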

kry10 (launchpad-marklinford) wrote :

Here's /etc/rc.local:

kry10 (launchpad-marklinford) wrote :

Here's /etc/modules:

kry10 (launchpad-marklinford) wrote :

I'm going to comment out some of the modules listed in /etc/modules and see if I can narrow it down. BRB ...

kry10 (launchpad-marklinford) wrote :

Well, I just tried commenting everything out of /etc/modules, and I still get the same behavior.

I wonder what else could be trying to forcibly load the old nvidia module?

Sitsofe Wheeler (sitsofe) wrote :

kry10:
There was no mention of nvidia in /etc/modules, so changing it wasn't going to have an effect...

What happens when you
sudo modprobe nvidia
and
sudo sh -x /sbin/lrm-video nvidia
? Could you also attach
/etc/default/linux-restricted-modules-common

kry10 (launchpad-marklinford) wrote :

ok:

sudo modprobe nvidia: nothing happens

sudo sh -x /sbin/lrm-video nvidia:

+ PATH=/sbin:/bin
+ MODULE=nvidia
+ shift
+ [ nvidia = nvidia ]
+ [ -e /lib/linux-restricted-modules/.nvidia_legacy_installed ]
+ [ -e /lib/linux-restricted-modules/.nvidia_new_installed ]
+ MODULE=nvidia_new
+ XORG=nvidia
+ cat /etc/X11/xorg.conf
+ sed -n -e /^[ \t]*section[ \\t]*"device"/I,/^[ \t]*endsection/I{/^[ \t]*driver[ \t]*/I{s/^[ \t]*driver[ \t]*"*//I;s/"*[ \t]*$//;p}}
+ grep -q -w nvidia
+ modprobe --ignore-install -Qb nvidia_new
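Read as logic, the trace tests two marker files in order and settles on nvidia_new because only the second one exists. The sketch below mirrors that choice. It is not the real /sbin/lrm-video: the function name is hypothetical, and the module name used for the legacy branch is an assumption, since the trace only shows the nvidia_new branch being taken.

```shell
# Sketch of the module choice visible in the trace above.
# The first argument is the marker directory (default as in the trace).
# ASSUMPTION: the legacy marker maps to a module named "nvidia_legacy".
pick_nvidia_module() {
    dir=${1:-/lib/linux-restricted-modules}
    if [ -e "$dir/.nvidia_legacy_installed" ]; then
        echo nvidia_legacy
    elif [ -e "$dir/.nvidia_new_installed" ]; then
        echo nvidia_new
    else
        echo nvidia
    fi
}

# Usage: pick_nvidia_module
```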

Sitsofe Wheeler (sitsofe) wrote :

lrm-video is doing the correct thing. I'm wondering if the mismatch has gone away and the problem you are now seeing is something different.

kry10:
I am starting to run out of ideas here. Could you add
/var/log/Xorg.0.log
/etc/modprobe.d/lrm-video
along with nvidiaetcgrep.txt as generated by
grep nvidia /etc/ -R > nvidiaetcgrep.txt

Could you also do
sudo dmesg -c
then record the output of
sudo modprobe nvidia
dmesg
?

kry10 (launchpad-marklinford) wrote :

Strange, huh?

After "sudo dmesg -c", "sudo modprobe nvidia" and "dmesg" return nothing.

kry10 (launchpad-marklinford) wrote :

Here's Xorg.0.log

kry10 (launchpad-marklinford) wrote :

Here's nvidiaetcgrep.txt

Sitsofe Wheeler (sitsofe) wrote :

kry10:
I see your results but I don't understand them. You're right this is weird...

What's the result of doing
sudo dmesg -c
sudo modprobe nvidia_new
dmesg
? Is anything printed?

kry10 (launchpad-marklinford) wrote :

Sorry, but:

sudo dmesg -c
sudo modprobe nvidia_new
dmesg

prints nothing :(

Sitsofe Wheeler (sitsofe) wrote :

My bad - if it's already loaded then it wouldn't print anything.
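To rule that case out before testing, it helps to check whether the module is already loaded. A minimal sketch, assuming the usual /proc/modules layout with the module name in the first field; `module_loaded` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style listing (module name is the first field).
# The optional second argument overrides the file to read.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage: module_loaded nvidia_new && echo "already loaded"
```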

kry10:
/sbin/lrm-video has been hacked up hasn't it? Please upload it to this bug report.

kry10 (launchpad-marklinford) wrote :

Here's /sbin/lrm-video. To the best of my knowledge, it's never been altered (at least by me :) )

Sitsofe Wheeler (sitsofe) wrote :

kry10:
Did you ever use any 3rd party tools to enable the nvidia driver? Turns out I got the wrong file. I meant to ask for:
/etc/modprobe.d/lrm-video

kry10 (launchpad-marklinford) wrote :

Here's /etc/modprobe.d/lrm-video .

This box was an upgrade from edgy to feisty. Under edgy, I used Nvidia's video driver 97.55 downloaded from their website. After I upgraded to feisty, I uninstalled the Nvidia driver using their tool (sudo sh NVIDIA-Linux-x86-1.0-9755-pkg1.run --uninstall), then installed nvidia-glx-new using Synaptic.

Sitsofe Wheeler (sitsofe) wrote :

/etc/modprobe.d/lrm-video has definitely been nobbled and that sort of thing can't possibly happen by itself...

kry10:
At any rate, can you uncomment the commented out install lines in that file and then reboot and indicate what the result is?
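The uncommenting step can be previewed without touching the file. The attached file's contents are not shown in this thread, so the pattern below simply assumes the disabled lines look like "#install ..."; `uncomment_install_lines` is a hypothetical helper name:

```shell
# Hypothetical helper: print FILE with the leading "#" removed from
# commented-out "install" lines. Output goes to stdout; FILE is not modified.
uncomment_install_lines() {
    sed 's/^[[:space:]]*#[[:space:]]*\(install[[:space:]]\)/\1/' "$1"
}

# Usage: review the change before replacing the real file, e.g.
#   uncomment_install_lines /etc/modprobe.d/lrm-video | diff /etc/modprobe.d/lrm-video -
```

Other comment lines are left alone, so only lines that begin with a commented-out "install" keyword change.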

ephem (ephem23) wrote :

I see the exact same behaviour since upgrading to feisty, although with a slightly older card that needs the 'nvidia' module, as opposed to 'nvidia_new'.
I don't remember exactly, but I might have used the nvidia-installer somewhere in the past.

Symptoms are: first it works fine, but after a reboot, X can't start.

When I try to load the nvidia kernel module, "modprobe nvidia" says:
FATAL: Could not open '/lib/modules/2.6.20-15-386/volatile/nvidia.ko': No such file or directory

I can "repair" my ubuntu by doing a
sudo apt-get install --reinstall linux-restricted-modules-2.6.20-15-386
after that,
sudo /etc/init.d/gdm restart
successfully starts X

That package doesn't contain the missing file, but reinstalling seems to create it in a way.
Of course, it's gone when I boot the system again.

I did all the checks mentioned above in this thread and see the same results as kry10.

kry10 (launchpad-marklinford) wrote :

SUCCESS!

After uncommenting the commented out lines in /etc/modprobe.d/lrm-video, X starts without any problems. I've done a few reboots, just to confirm, and they all work correctly.

I'm not sure how lrm-video got "nobbled." As I mentioned above, the only method I used was Nvidia's --uninstall switch - maybe they're messing with that file? If so, of course, it's not our fault ...

Please let me know if you'd like any additional info. Thanks!

ephem (ephem23) wrote :

Now it also worked for me, but I did something different.

I had a faint recollection of using envy to install the nvidia binary drivers, but had later used the driver coming as a "restricted" ubuntu package.
In an attempt to reverse the process, I removed linux-restricted-modules-2.6.20-15-386 and nvidia-glx, installed envy, used envy to install the driver, then removed it again, and installed linux-restricted-modules-2.6.20-15-386 and nvidia-glx again. All in one go, without rebooting or restarting X.

And it worked. I rebooted a few times, and it's fine now.

Sitsofe Wheeler (sitsofe) wrote :

ephem:
I think your issue is unrelated to this particular bug.

kry10:
I would dearly love to know how those lines became commented. If this style of change is commonly being done (e.g. by a 3rd party tool) then I think I'm just going to have to stop looking at these NVIDIA binary driver bug reports because the chances of finding the cause of the problem are vanishingly small. It took us 3 days to find this one and we got lucky. Thanks for your patience and let's leave this bug open while I think about it a bit more.

Sitsofe Wheeler (sitsofe) wrote :

Setting back to unconfirmed because kry10 promptly answered all queries.

Changed in linux-restricted-modules-2.6.20:
status: Needs Info → Unconfirmed

kry10 (launchpad-marklinford) wrote :

Sitsofe:

It was not a bother at all. I'm just glad you discovered the cause, even if we got lucky in this case.

FWIW, my computer at work, while a different setup, had a similar upgrade path (3rd party nvidia Edgy to Feisty), and lrm-video was not touched at all. Maybe someone is sneaking into my house and commenting code when I'm not home? :) In all likelihood this was just an isolated case.

I'm happy to help if you'd like any additional info. Otherwise, I'd consider this issue resolved. Thanks!

Yongsu Park (pcpenpal) wrote :

I wrote up my thoughts at https://bugs.launchpad.net/ubuntu/+source/linux-restricted-modules-2.6.20/+bug/107646 .
I think this bug is critical. Maybe every Feisty user who has a recent nVidia card experiences it.

Timo Aaltonen (tjaalton)
Changed in linux-restricted-modules-2.6.20:
status: New → Fix Released