latest kernel(2.6.20-16.28) update gives boot problems

Bug #117314 reported by Pieter Lexis
Affects: linux-source-2.6.20 (Ubuntu)
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-2.6.20-16-generic

The boot hangs at "Loading hardware drivers". I have not yet tried the recovery boot, but more information can be found in this thread at Ubuntu Forums: http://ubuntuforums.org/showthread.php?t=456662

The easiest way to reproduce is to update and reboot

Revision history for this message
JAB van Ree (javanree) wrote :

Same here, 2.6.20-15 gave all my IDE devices sdx names as well on an IBM Thinkpad R50.
Now with 2.6.20-16 my drives are named hdx again and thus several partitions don't get mounted correctly, HIGHLY annoying.
Please revert to 2.6.20-15 behaviour.

Revision history for this message
Chris Chalvantzis (chalvantzis) wrote :

Same problem here. It says that it lost interrupt on my hard disks (probably the SATA ones) and it doesn't boot. If I press Ctrl-Alt-Delete it reaches the phase where the X server loads, but that breaks down (as expected) because I use the official nvidia drivers and the kernel module needs to be rebuilt. But when I log in on the console I find no hard disks at /media/<my mount points>. Also, my system doesn't shut down or restart; I have to do it from the case. I had no such problems with the 2.6.20-15 kernel.

This is my south bridge:

00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)

Revision history for this message
markh (mark-hickmore) wrote :

Similar problems, 2.6.20-16 does not seem to find my root file system even though it is specified with a root=UUID in the menu.lst

Revision history for this message
Brade (bradezone) wrote :

Similar problems. Ubuntu no longer auto-mounts my two Windows partitions. I use the Automatix read/write NTFS utility...

Revision history for this message
yey365 (yey365) wrote :

2.6.20-16 will not boot on my centrino laptop, nor will 16 generic. The recovery modes will boot. System fails on screen after grub, the hard disk activity light flickers then nothing.

Revision history for this message
Fabio Marzocca (thesaltydog) wrote :

I have a problem with 2.6.20-16 too, even if not so serious as the posts above. My mouse wheel is not working anymore after rebooting.

This is xorg.conf input section:

Section "InputDevice"
 Identifier "Configured Mouse"
 Driver "mouse"
 Option "CorePointer"
 Option "Device" "/dev/input/mice"
 Option "Protocol" "ImPS/2"
 Option "ZAxisMapping" "4 5"
 Option "Emulate3Buttons" "true"
EndSection

It was working with previous kernel.

Revision history for this message
Dama (heppu02) wrote :

Hello.
I'm getting "hde lost interrupt" during kernel boot and then it freezes.
I'm able to boot normally with the old kernel (2.6.20-15).

Specs:
Ubuntu 7.04 32-bit.
Pentium 4 2.8GHz
Abit IC7
2x IDE & 1x SATA HDD

Revision history for this message
Erik (launchpad-emerkle) wrote :

After I upgraded, the grub menu.lst file was incorrect. The entry for 2.6.20-15 looked like this BEFORE:

title Ubuntu, kernel 2.6.20-15-generic
root (hd0,1)
kernel /boot/vmlinuz-2.6.20-15-generic root=UUID=6430bdad-99af-4e51-8552-4f50510924c1 ro quiet splash
initrd /boot/initrd.img-2.6.20-15-generic
quiet
savedefault

AFTER the upgrade, my 2.6.20-16 AND 2.6.20-15 entries were altered so that neither was bootable. The offending line seems to be the "root" line, as it was changed to "(hd1,1)" instead of the previous "(hd0,1)". I manually adjusted this and was able to boot BOTH 2.6.20-15 and 2.6.20-16.

After upgrade: (before I manually fixed it)
title Ubuntu, kernel 2.6.20-16-generic
root (hd1,1)
kernel /boot/vmlinuz-2.6.20-16-generic root=UUID=6430bdad-99af-4e51-8552-4f50510924c1 ro quiet splash
initrd /boot/initrd.img-2.6.20-16-generic
quiet
savedefault

Revision history for this message
Erik (launchpad-emerkle) wrote :

Forgot to mention, the system I upgraded in my previous post had no SATA drives, all drives are PATA in that box. I have not yet upgraded my machine that has a mix of SATA and PATA drives.

Revision history for this message
Lou Quillio (public) wrote :

Solution for me was to address volumes in fstab by their UUID.

When I first saw 7.04's new UUID convention I didn't like it, didn't at all appreciate the automatic edits to my fstab, and didn't like how a portion of my hard-won (though limited) understanding of volume mounting had been broken without announcement. So I immediately changed my fstab volume names back to traditional `/dev/sda<n>`.

Something about the -16 patch caused bad superblock reads on _some_ of my volumes. Not the root/boot one, but the one I use for /home and another. When I changed my fstab entries to UUID the bad reads went away. fsck _did_ turn up and fix some garbage on my root volume on reboot.

The effect of all this, intended or not, seems to be that you must use the UUID notation. That may not be literally true, but as a practical matter I took it as a sign to stop resisting. This sort of thing will keep coming up, I think.

Learn volume UUIDs with `vol_id -u /dev/sda<n>`, probably as root. In fstab, replace volume names (the filesystem column) with `UUID=<UUID string from vol_id -u>`.
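A rough sketch of the fstab edit described above. The function name `fstab_use_uuid` is made up for illustration, and it only rewrites one line at a time; the UUID itself still has to be looked up first (with `vol_id -u` on Feisty, as above).

```shell
# Hypothetical helper: replace the device column (first field) of one
# fstab line with UUID=<uuid>. Fields are re-joined with single spaces.
# Usage: fstab_use_uuid "<uuid>" "<fstab line>"
fstab_use_uuid() {
    uuid=$1
    line=$2
    # Assigning to $1 makes awk rebuild the line with the new first field.
    printf '%s\n' "$line" | awk -v u="$uuid" '{ $1 = "UUID=" u; print }'
}
```

The result is printed rather than written to /etc/fstab, so it can be checked by eye before being pasted in.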

All's well now. I documented the `vol_id` syntax in comments to my fstab, so I won't have to look it up next time. HTH

LQ

Revision history for this message
Vajra Vrtti (vajravrtti) wrote :

In my case, kernel hangs during boot with a X cursor frozen in a black screen.
Uninstalling nvidia-glx and falling back to the 'nv' driver did not solve the problem.
2.6.20-15 works fine.

Revision history for this message
hotani (hotani) wrote :

This update incorrectly changed Grub on my system (as others have experienced), and left me unbootable.

After reading the above, I changed the incorrect root entry in grub from (4,1) to (0,1) which fixed it.

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

My fstab has used UUIDs ever since Edgy, yet my system still stops dead with a flashing text cursor in the top left. No error messages.

Going back to 2.6.20-15, all is OK.

Excerpt from my fstab:

# /dev/hda2 -- converted during upgrade to edgy
UUID=e9533fb2-8c39-4ea6-b14a-da0a3475a503 / ext3 defaults,errors=remount-ro 0 1
# /dev/hda1 -- converted during upgrade to edgy
UUID=89a2980a-997d-4f56-a34b-e2f58bed6439 none swap sw 0 0

Revision history for this message
yey365 (yey365) wrote :

OK it gets stranger. When attempting to boot 2.6.20-16 the process stops just after system starting message and sits there. Entered ctrl-alt-f1 and saw message relating to what appears to be fstab commands. Last line says press enter to continue; pressed enter and successfully booted to the kernel. All seems to be working; inc. beryl, vmware-player, video, sound, et al.

Revision history for this message
yey365 (yey365) wrote :

OK, further reviewing has narrowed this down to the rebadging of storage devices from sda to hda. Other users have arrived at the same conclusion and I now concur; in 2.6.20-15 my laptop hard disk is sda1, in 2.6.20-16 it is hda1.
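To see which entries would be affected by the sda/hda renaming, one could list the fstab lines that still use raw device names. A minimal sketch; the function name `list_raw_disk_names` is made up for illustration, and it assumes the usual whitespace-separated fstab layout.

```shell
# Hypothetical helper: print "<device> <mount point>" for every fstab entry
# whose first field is a raw /dev/hdXN or /dev/sdXN name.
# Commented-out lines are skipped because their first field is "#".
# Usage: list_raw_disk_names /etc/fstab
list_raw_disk_names() {
    awk '$1 ~ /^\/dev\/[hs]d[a-z][0-9]*$/ { print $1, $2 }' "$1"
}
```

Entries that use UUID= or a symlink such as /dev/cdrom are not listed, since they do not depend on whether the kernel calls the disk sda or hda.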

Revision history for this message
yey365 (yey365) wrote :

Here is the content of the startup screens:

Starting up...
Loading, please wait...
kinit: name_to_dev_t( /dev/disk/by-uuid/440adce5-08d1-48d5-8f6b-946ed16c9d82) = hda2 (3,2)
kinit: trying to resume from /dev/disk/by-uuid/440adce5-08d1-48d5-8f6b-946ed16c9d82
kinit: No resume image, doing normal boot...
resume: libcrypt version: 1.2.3
resume: Could not start the resume device file
Please type the file name to try again or press ENTER to boot the system

Hope this helps.

Jim

Revision history for this message
claudio@ubuntu (claudio.ubuntu) wrote :

same here: sd* -> hd*, so system is unbootable.

Revision history for this message
Erik (launchpad-emerkle) wrote :

Update...

I posted earlier that after upgrading, my grub menu.lst file was not correct. Apparently, the "groot" option in the file was not set correctly (maybe on install?). After setting it to "(hd0,1)", update-grub seems to correctly set the "root" lines for my grub entries.
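A quick way to compare the "groot" setting with the generated "root" lines might look like the sketch below. The function name `show_grub_roots` is hypothetical, and the pattern assumes the setting is stored as a commented `# groot=(hdX,Y)` line, which is how Ubuntu's menu.lst normally keeps it.

```shell
# Hypothetical helper: print, with line numbers, the "# groot=" setting and
# every "root (hdX,Y)" line of a GRUB menu.lst so mismatches stand out.
# Usage: show_grub_roots /boot/grub/menu.lst
show_grub_roots() {
    grep -nE '^(# groot=|root[[:space:]]+\(hd)' "$1"
}
```

If the "root" lines disagree with "groot", fixing "groot" and re-running update-grub, as described above, should bring them back in line.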

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

OK, pinned my problem down to my resume settings.
I can't use the UUID for swap as it changes from time to time.
It worked fine on -15 using /dev/sda1 in menu.lst and fstab.

With -16 it doesn't matter whether I use /dev/hda1 or the UUID: the boot message complains that it can't resume the image (even though it should be a clean boot, not a resume from suspend).

Revision history for this message
Vajra Vrtti (vajravrtti) wrote :

As I said before, in my case, kernel hangs during boot with a X cursor frozen in a black screen.
2.6.20-15 works fine.
This post is to attach my syslog.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

OK just a quick message because it looks like this bug is going to be popular. If you are using 3rd party binary drivers not provided by Ubuntu (or installed using a 3rd party tool) then bugs.launchpad.net is not the right place to seek support. Please use one of the support methods described on http://www.ubuntu.com/support/communitysupport instead.

Changed in linux-source-2.6.20:
status: Unconfirmed → Confirmed
Revision history for this message
hardyn (arlenn) wrote :

I respect that these kinds of things happen from time to time, but when an update like this looks like it might pose a problem for a great number of users, it would be really nice if Ubuntu or an Ubuntu dev could make some sort of statement acknowledging that they recognize that there is a problem and a solution is on the way, or that there will not be a fix for some reason... I think it would go a long way in defusing the anxiety behind such trouble.

thanks.

Revision history for this message
José Tomás Atria (jtatria) wrote :

There are various problems reported here.

One of them is related to the relabeling of storage devices from sd** to hd**, which produces an IRQ problem, and then an error message about DMA interrupt and DMA interrupt recovery. This is all related to disks not being found or not being properly identified by the kernel.

Some people have said that they got it to boot by changing the reference in menu.lst back to hd** or by using UUIDs. I can confirm that this does NOT work on my system, as I use only UUID references for my boot images. 2.6.20-16 still refuses to load and hangs right there.

I can also confirm that this problem affects the normal boot mode as well as the recovery mode.

2.6.20-15 loads fine, no problems at all, using the exact same menu.lst and fstab files.

Besides that particular problem, there are people reporting issues with third party drivers, X server not loading and wireless cards failing to work. Please bear in mind that these issues are all related to modified kernel modules, and DO NOT apply to the kernel itself. It is quite normal to have to reconfigure your kernel modules if you have made changes, like installing proprietary graphics or wifi drivers. This is NOT the place to report these.

Please stay on topic in regards to the bug reported: kernel image fails to load and reports disk issues and DMA interrupts and recovery.

PS: I would submit a copy of the boot log, but I don't know how to recover it or reproduce it outside tty1. Sorry.

Revision history for this message
Vajra Vrtti (vajravrtti) wrote :

@Gorgonzola
"Besides that particular problem, there are people reporting issues with third party drivers, X server not loading and wireless cards failing to work. Please bear in mind that these issues are all related to modified kernel modules, and DO NOT apply to the kernel itself."
My system hangs apparently while loading X server. Since I have installed everything from the Ubuntu repositories that should be correctly handled by the 'linux-restricted-modules-2.6.20-16-generic' package. It was NOT.

Revision history for this message
José Tomás Atria (jtatria) wrote :

@Vajra
Yes, it should have been handled correctly, but there's no indication in your problem reports that it is caused by a bug in the kernel itself.

I'm not saying that these are not important issues. I was just pointing out that the original problem reported is related to a rather specific and critical aspect of the kernel, namely disk recognition and handling, which prevents it from loading and hangs the machine.

If your system hangs when loading the X server, the kernel has already booted and is up and running. So it's a different kind of bug... apparently not in the kernel itself. :)

greetings!

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Hi All,

If you are getting the hangs early in boot symptom, I think this is related to a problem with resume.

On my system the boot process is attempting to resume a non existent image and stops when it fails???.

You can check this for yourselves.

When the system has stopped, press Ctrl-Alt-F1 and read the message, then press Return to continue booting (the system should boot normally).

It doesn't matter whether you specify resume= via /dev/sd, /dev/hd or UUID=; the kernel still hangs at this point (probably because the boot image references an invalid swap partition, probably sd).

I'll investigate further (guess its help yourselves again after the devs muck up) the silence is deafening.....

Revision history for this message
ccfiel (ccfiel) wrote :

I also have a problem with -16. After reboot it just displays a black screen with a cursor. This is my first time reporting a problem. What should I send?

chris ian fiel

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Gorgonzola:
Your problem seems very clearly defined. Is there any chance you could file a new bug report and post a link to the new bug back here so it doesn't get lost? Please also include the output of lspci and lsmod in the new bug report. Thanks!

Revision history for this message
rodia (gmathar) wrote :

PROBLEM:
On my system, X is unable to load the NVIDIA binary driver after the kernel update. Not the famous "kernel module and X driver version mismatch", the X log just says it encountered an error loading the NVIDIA driver module. The package versions are:

nvidia-kernel-common 20051028+1ubuntu7
nvidia-glx-new 1.09755+2.6.20.5-16.28
nvidia-glx-new-dev 1.09755+2.6.20.5-16.28

WORKAROUND:
I choose the pre-update kernel version in the boot selector screen and it's all back to normal.

OPINION:
This is exactly the kind of **** that keeps Linux from being ready for prime time. Some people should check their priorities maybe.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Vajra:
Can you file a new bug report and include the output of
lspci
/etc/X11/xorg.conf
and
/var/log/Xorg.0.log
in it? Can you also indicate whether you can hear the login sound and whether Caps Lock works even though the screen is black? If you are running binary drivers, could you switch to the non-binary drivers and again report whether the screen is black? Can you then include a link to your newly created bug report in this bug? Thanks!

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

OK, here's the hack to fix it once you've established it is the resume issue (see above).

All 3 locations must exist and use either UUID= or a /dev/hd name (for IDE). I use the physical device (hda1) because the swap UUID changes on me, so the example is for /dev/hda1.

menu.lst
fstab
and /etc/initramfs-tools/conf.d/resume

Ensure the /boot/grub/menu.lst resume= entry is resume=/dev/hda1
/etc/fstab is /dev/hda1 none swap sw 0 0
/etc/initramfs-tools/conf.d/resume is RESUME=/dev/hda1

Then issue update-initramfs -u
and reboot.

BIG FAT WARNING --- if you muck this up your system will not boot again... (for ex-Windows users: this is more dangerous than messing with the registry)
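Before touching anything, it may help to print what the three locations currently say so they can be compared side by side. This is only a sketch: the function name `show_resume_settings` is made up, it changes nothing, and it assumes a `grep` that supports `-o` (GNU grep does).

```shell
# Hypothetical helper: print the resume/swap device named in each of the
# three places mentioned above, one labelled line per setting found.
# Usage: show_resume_settings /boot/grub/menu.lst /etc/fstab \
#            /etc/initramfs-tools/conf.d/resume
show_resume_settings() {
    # resume= arguments on the kernel lines of menu.lst (may be absent)
    grep -oE 'resume=[^[:space:]]+' "$1" | sort -u | sed 's|^|menu.lst: |'
    # swap entries in fstab: the third field is "swap"
    awk '$1 !~ /^#/ && $3 == "swap" { print "fstab: " $1 }' "$2"
    # RESUME= line in the initramfs-tools configuration
    grep -E '^RESUME=' "$3" | sed 's|^|conf.d/resume: |'
}
```

If the three lines name different devices (for example /dev/sda1 in one place and /dev/hda1 in another), that is the mismatch the steps above are meant to remove.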

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Andrew:
Interesting theory with suspend... Can you try booting with the noresume kernel parameter in grub and reporting your results back?

Revision history for this message
MarkNZ (czpinfhkmwqkhnbv) wrote :

I'm running Feisty on a Dell Latitude D505 and everything has been working great until I updated the kernel through update manager this morning.

I moved /home to its own partition some time ago using these instructions (http://ubuntu.wordpress.com/2006/01/29/move-home-to-its-own-partition/), and it had been working fine until the kernel upgrade today. Ubuntu doesn't seem to recognise that /home is mounted from another partition, even though the entry is still in /etc/fstab.

My /etc/fstab is below:

# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
/dev/sda6 /home ext3 nodev,nosuid 0 2
# /dev/hda3
UUID=f74ca020-c668-4657-a71c-860a2b29a5c7 / ext3 defaults,errors=remount-ro 0 1
# /dev/hda5
UUID=7e54cd1f-0dd3-42ad-88df-34f36859f389 none swap sw 0 0
/dev/cdrom /media/cdrom0 udf,iso9660 user,noauto 0 0
/dev/sda1 /media/windows ntfs nls=utf8,umask=0222 0 0
/tmp/app/1/image /tmp/app/1 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/2/image /tmp/app/2 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/3/image /tmp/app/3 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/4/image /tmp/app/4 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/5/image /tmp/app/5 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/6/image /tmp/app/6 cramfs,iso9660 user,noauto,ro,loop,exec 0 0
/tmp/app/7/image /tmp/app/7 cramfs,iso9660 user,noauto,ro,loop,exec 0 0

The new /home partition is definitely /dev/sda6 and the filesystem is also definitely ext3, so that looks fine to me. Any idea where I am going wrong?
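One way to see which fstab entries did not actually get mounted is to compare the file against /proc/mounts. A rough sketch with a hypothetical function name; it assumes whitespace-separated fields and a non-empty mounts table.

```shell
# Hypothetical helper: print fstab mount points that do not appear in the
# mounts table. Comments, blank lines, "none" (swap) entries and entries
# marked noauto are skipped.
# Usage: unmounted_fstab_entries /etc/fstab /proc/mounts
unmounted_fstab_entries() {
    # The mounts table is read first (NR == FNR), then fstab is checked.
    awk 'NR == FNR { mounted[$2] = 1; next }
         $1 ~ /^#/ || NF < 4 { next }
         $2 == "none" || $4 ~ /(^|,)noauto(,|$)/ { next }
         !($2 in mounted) { print $2 }' "$2" "$1"
}
```

On a system hit by the renaming, an entry such as the /dev/sda6 line above would be expected to show up here, because that device node no longer exists under the -16 kernel.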

Also, I noticed there are a few other files in /etc such as fstab.edgy, fstab~ and a few others which I'm assuming are past versions, is it possible that Ubuntu is reading one of these files?

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Gorgonzola:
It looks like the issue you described is also reported in Bug #116996 ...

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Sitsofe: my system is now booting perfectly with 2.6.20-16.28, so I don't think adding noresume will affect it.

It was definitely the '/dev/sda1' in /etc/initramfs-tools/conf.d/resume causing the 'hang with flashing cursor', which isn't really a hang; it's the system waiting for acknowledgment behind the usplash.

Hence pressing Ctrl-Alt-F1 and Enter gets the system booting.

All I've got to remember now is how I was hibernating (I think I was using a software solution) so I can modify it to use /dev/hd, not /dev/sd. Are there any plans to alter this again within a production release? That seems ludicrous to me; what's the development version for?

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

If you wish, I can deliberately break my 2.6.20-16.28 setup back to how it was and
then try noresume, though I'm not sure that'll fix it, as it depends on whether the noresume option is checked before the 'does image exist' code.

I believe the initramfs is functioning correctly:

1 check if image exists
2 error: partition not valid (/dev/sda1 in initramfs from conf file)
3 display error and await user input.

I think the problem is that the use of UUIDs for swap partitions needs rethinking, as the UUID will alter with fairly usual manipulation of the swap. This then breaks booting.

In my case it's because I'd changed my initramfs to use /dev/sda1, but I'll bet my last dollar that in most cases it's because users have run mkswap, which has changed their UUID.

Revision history for this message
Ben Collins (ben-collins) wrote :

Phillip, please check into this.

Changed in linux-source-2.6.20:
assignee: nobody → phillip-lougher
importance: Undecided → High
status: Confirmed → Unconfirmed
Revision history for this message
Chris Chalvantzis (chalvantzis) wrote :
Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

OK, pinned the last bit of the puzzle down: my suspend.
I'm using uswsusp, so in /etc/uswsusp.conf I had /dev/sda1 (as that was the correct swap drive).

So when the new kernel was installed and the devices no longer matched, the initrd image no longer had uswsusp in it.

Correcting uswsusp (to /dev/hda1) therefore got suspending working again, but of course it wouldn't resume...

dpkg-reconfigure uswsusp fixes the initrd, and now, only 24 hours after the latest upgrade, my system is back to working.

My main concern is that the devs will now 'fix' this latest drive naming change and thus break my system yet again... at least I can fix it quickly now...

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Phillip,

I take it back: even though you didn't mention it in the changelog, the TIFM fixes are in this kernel. If not, then strangely my SD works.

Revision history for this message
Mark (phenix1) wrote :

A friend of mine had the same problem as Erik: in the new menu.lst, root was set to (1,0) instead of (0,0). Maybe grub does not like to be installed on a slave hard drive. In addition, the section in menu.lst that booted Windows disappeared completely.

Revision history for this message
Carlos Blanquer Bogacz (cblanquer) wrote :

The upgraded kernel seems to recognise my SATA HDD as hde and gives an error message "hde lost interrupt" during kernel boot.
The boot does not complete in a reasonable amount of time.
It should be sdx.
2.6.20-15 works just fine.

Revision history for this message
Vajra Vrtti (vajravrtti) wrote :

@Sitsofe Wheeler
I hope I did it right. That was my first bug report.
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117621

Revision history for this message
José Tomás Atria (jtatria) wrote :

@Sitsofe Wheeler
You are right, my problem is discussed in detail in bug#116996, this discussion should concentrate on the X server and AGP related versions of the problem.

I'm off to #116996.
Good luck!

Revision history for this message
Phillip Lougher (phillip-lougher) wrote : Re: [Bug 117314] Re: latest kernel(2.6.20-16.28) update gives boot problems

Andrew Waldram

Yes the TIFM fixes are in this kernel. The changelog should have
mentioned an update to the tifm 0.8d driver.

Phillip

Revision history for this message
LarsBjerregaard (lars-rubyglow) wrote :

The unofficial survey going on here: http://ubuntuforums.org/showthread.php?t=456662&page=20 starting at post #195 might yield some clues, and it would seem there are a lot of folks with Intel ICH4 and ICH5 there.

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Ok I can't prove this as I went round the houses fixing up my system.

But I reckon if you have the message about the resume file (after pressing Ctrl-Alt-F1),

then simply press Enter to continue the boot.

Once in X, start a terminal window

and issue

sudo update-initramfs -u
and
sudo dpkg-reconfigure uswsusp (if you are using this method for suspend) and select your swap device (now NOT /dev/sdxx).

This should update the initrd with the correct UUID for the swap and then reconfigure uswsusp in the initrd to support it.

If this works please report back here so others can be sure.

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Also of note with this new kernel.

I have a new ghost CD-ROM01 in Gnome; it's at /dev/hdc and doesn't exist.

The real device is at /dev/hdb.

Revision history for this message
hardyn (arlenn) wrote :

Andrew...

Same with me, although mine was /dev/scd0.
I have commented it out in /etc/fstab... I don't know if this is the correct procedure or not.

Other than that, things seem to be okay.

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

I think that's probably the way to get rid of it,
or correct it to the right setting, but that doesn't appear to be needed.

Could be left over from a previous upgrade.

Revision history for this message
LarsBjerregaard (lars-rubyglow) wrote :
Revision history for this message
BatteryKing (jmcsnyder) wrote :

I recently ran the update from 6.10 to 7.04 and on first boot I had options between kernel 2.6.20-16-generic and 2.6.20-15-386.
Kernel 2.6.20-15-386 works fine.
Kernel 2.6.20-16-generic does not boot at all. By default on the splash screen I get a thin sliver on the progress bar and after leaving the system alone for several minutes no progress is made. Disabling the splash screen all I get is a PCI error allocating a memory resource (which I also get in the working -15 kernel on the first line at boot) and a message saying "Loading, please wait..." and nothing more.

Revision history for this message
BatteryKing (jmcsnyder) wrote :

I figured out a way to get things working. I completely de-installed the -16 kernel and modules and such and re-installed from the command prompt `apt-get install kernel-generic`

Revision history for this message
hardyn (arlenn) wrote :

BatteryKing:

i would think that would give you .16 back again; could you check that?

Revision history for this message
André Fettouhi (a-fettouhi) wrote :

The backports modules are also missing for the latest kernel 2.6.20-16.28. The linux-backports-modules-generic package is available for update, but linux-backports-modules-2.6.20-16-generic, which it depends on, isn't available at all...

Regards

André

Revision history for this message
davidr (davaweb) wrote :

Same here.

Stops loading at the hardware drivers.

There is nothing I can add to the above posts other than I have removed 2.6.20-16

Revision history for this message
LarsBjerregaard (lars-rubyglow) wrote :

I think klmonz in the appended bug-description at the top of bug https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/116996, hits the nail on the head.

Reading through the extensive thread in http://ubuntuforums.org/showthread.php?t=456662, it seems obvious, that Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=419458 describes the core problem with disks, which is the same as *this* bug.

This is FIXED in Debian! I'm sorry, but I have to say that I find it disheartening, that bug https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/116996 is still unconfirmed+undecided, bug https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117447 is unconfirmed+undecided, and bug https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117314 is unconfirmed, though recognized as status high.

Please devs! You are probably overburdened, that's understood. BUT.... this bug is wrecking the systems of a hell of a lot of users, and is far worse than the X-oops update you released some time ago. This one is GRAVE, as the Debian bug correctly states. Please please... fix this. Thank you.

Revision history for this message
Phillip Lougher (phillip-lougher) wrote :

On 5/31/07, LarsBjerregaard <email address hidden> wrote:

> This is FIXED in Debian! I'm sorry, but I have to say that I find it
> disheartening, that bug
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/116996
> is still unconfirmed+undecided, bug
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117447
> is unconfirmed+undecided, and bug
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117314
> is unconfirmed, though recognized as status high.
>
> Please devs! You are probably overburdened, that's understood. BUT....
> this bug is wrecking the systems of a hell of a lot of users, and is far
> worse than the X-oops update you released some time ago. This one is
> GRAVE, as the Debian bug correctly states. Please please... fix this.
> Thank you.

I'm monitoring the situation, and trying to decide what is the best
way to resolve this.

The reversion from libata for PATA to the older IDE drivers was done
to fix a large number of bugs people were experiencing with the feisty
final kernel. Happily it has fixed a number of problems, and many
systems are now correctly working for the first time.

Unhappily, the bad behaviour that other people are experiencing with
the reversion wasn't intended. A wholesale re-reversion of the libata
isn't an option because this will cause devices to change again for
everyone, and everyone will be quite understandably upset.

The ideal solution to this is to selectively re-revert to libata for
the people experiencing these problems. Unfortunately, this takes
time to work out exactly what hardware isn't working and to only
revert that. Please continue sending bug reports.

Thanks

Changed in linux-source-2.6.20:
status: Unconfirmed → In Progress
Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Phillip Lougher:

I can see you're between a rock and a hard place.

ICH6 appears to function correctly under either libata or PATA, hence the lack of bug reports (besides mine) for this chipset.

I don't think it matters which way you go with this for this chipset, as I appear to be the only ICH6 person who noticed the drive renaming and I'm not the screaming kind.

Two others noted the ghost cdrom from the fstab entries, but I don't think that'd cause much fuss if you reverted back to libata.

What would be cool is actually knowing, because if I'm staying on PATA I'll get hdparm out and ramp up my HD performance.

Also, there appear to be a number of issues with using UUID for swap. I came across it a while ago: the UUID altering on swap over time.
I also see a number of reports in the forums of people stating their swap changes UUID between boots.

Could it be that some hibernate methods clear the swap (a la mkswap), thus generating new UUIDs and leading to the resume issue on kernel change?

I wouldn't have seen this had I not changed my swap over to the physical drive name to keep my system functional across hibernations (uswsusp) shortly after the introduction of UUIDs. Hence me noticing the driver change (a blank flashing cursor tends to focus attention).

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Lars:
Your comment links and forum checking are invaluable but can you ease up on the bug comment cross-posting? Rather than posting in three different bugs simultaneously confine the comment to this bug initially because if there is something people disagree with in your comment they wind up having to reply on all three bugs too...

(Please don't stop commenting in this bug though! I find your small number of targeted forum links especially useful!)

Andrew:
It is probably worth spinning a new bug off about the swap/UUID problem. I've now been told that the swap UUID can change
a) If you use uswsusp
b) If you install another Linux distribution (swap is often reformatted).

It might be possible to do something about a) by making it recreate the swap with the same UUID as it had before. If you do spin off a new bug can you include a link to it here?

Revision history for this message
Andrew Waldram (andrew-waldram) wrote :

Sitsofe:

Done - UUID can change on swap breaking swap mount and Hibernation- bug 118199

I've logged it against the kernel though its really Ubuntu policy in my opinion.
please excuse my plagiarization of your comments.

Revision history for this message
LarsBjerregaard (lars-rubyglow) wrote :

Phillip: Thanks very much for taking this on. I do find the "wholesale" changeover in an otherwise "routine" kernel update mid-release slightly discomforting though...
Sitsofe: Yes, I can ease up on the cross-posting. I know it's not normally good form. However, in this case it's had the desired effect, and I see it's now getting attention, so - no regrets ;-)

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Lars:
Somehow I knew you were going to say that : )

Revision history for this message
loko (arph) wrote :

I also have boot problems after upgrading the kernel.

At start, I get an error message that apt is not installed and that I should do "apt-get install aptitude", which is kind of funny ;-)

But if I start with the old kernel again, I don't get this error. This is strange.

Revision history for this message
Alex Muntada (alex.muntada) wrote :

I'd like to share my experience with this issue on my laptop with a PATA disk: as some of you have commented above, I was using /dev/sdXX syntax in menu.lst and fstab since the first Feisty alphas. After upgrading to 2.6.20-16.28 the laptop just couldn't mount the root partition at boot time. The previous kernel version still worked, fortunately.

Moreover, my root partition, which is reiserfs, didn't have a UUID at all, so I had to boot a live CD and set a new UUID for it with 'reiserfstune -u `uuidgen` /dev/sda2'. Then I replaced all the /dev/sda2 references in fstab and menu.lst with the new 'UUID=...' thing. Finally, I rebooted and it just works fine now, even suspend to disk.
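
For anyone who wants to script the same substitution, here is a minimal sketch. It assumes the UUID has already been set (as with reiserfstune above) and it writes to a copy, so the original file stays untouched as a backup. replace_dev_with_uuid is a made-up helper name, not an existing tool, and it deliberately skips comment lines, so menu.lst still deserves a manual look afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ubuntu): write a copy of a config file in
# which every reference to one device node is replaced by UUID=<uuid>.
# Usage: replace_dev_with_uuid /dev/sda2 <uuid> /etc/fstab /tmp/fstab.new
replace_dev_with_uuid() {
    dev=$1
    uuid=$2
    src=$3
    dst=$4
    # Skip comment lines. Match the device only when it is followed by
    # whitespace or ends the line, so /dev/sda2 does not also hit /dev/sda21.
    sed -e "/^[[:space:]]*#/!s|${dev}\([[:space:]]\)|UUID=${uuid}\1|g" \
        -e "/^[[:space:]]*#/!s|${dev}\$|UUID=${uuid}|" \
        "$src" > "$dst"
}
```

Compare the copy against the original with diff before putting it in place.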

BTW, I also replaced /dev/hdc from fstab with /dev/cdrom (a symlink to hdc) just in case.

HTH

Revision history for this message
Martin Pitt (pitti) wrote :

Philip,

> Unhappily, the bad behaviour that other people are experiencing with
> the reversion wasn't intended. A wholesale re-reversion of the libata
> isn't an option because this will cause devices to change again for
> everyone, and everyone will be quite understandably upset.

Sorry, but that's precisely what we have to do, and ASAP. Please let me reiterate that the purpose of -security is *not* to fix hardware where the release did not install before. That's bad luck. Breaking hardware where it did install before is absolutely not acceptable.

Can you please prepare a feisty-security upload that reverts this change to the state of the release? Thank you!

Changed in linux-source-2.6.20:
importance: High → Critical
Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Martin:
Since it looks like we are headed for another change, I think the word has to be put out to those who have gone and relabelled their /etc/fstab with the /dev/h??? syntax that we are now going back the other way. The folks over in bugs like Bug #94119 need to be warned that their CD drives are going to go back to being broken. Perhaps people need to be told what grub/module options can be added to disable ata_piix. People need to be warned not to upgrade to -16. People need to prepare for another round of people having binary driver breakage (both due to manual compiles, but there also seems to be some percentage of people using lrm to whom this happens; perhaps there's a depmod race...). Perhaps some public statement can be put out explaining what has happened and who it has happened to. Does the update go out as -security given that it's not fixing a security problem any more? What does -security mean?

Also what happens to the current renaming bugs? Are they just closed? Do we just open a new bug if/when another "great renaming" happens? Is it worth trying to fix the PATA bugs being masked on SATA machines? Do we just close those bugs too?

Revision history for this message
ifthengoto (febwa1976) wrote :

Don't know if this is relevant.

I was one of those that could not use the 2.6.20-16 version when I did the recommended update. Because of other reasons I had to reinstall Feisty over the weekend and voila - the 2.6.20-16 now works alongside the 15 version perfectly!

(I did change my configuration in the new install, for other reasons.)

In the old install I had the MBR on my XP primary drive and the Feisty install on a USB drive.

I now have the MBR on the USB drive with Feisty (when I need XP I use BIOS menu).

The other effect of this is that for the first time ever I have not had to make any changes to my menu.lst file at all as it was configured perfectly during the install.

Revision history for this message
Martin Pitt (pitti) wrote : Re: [Bug 117314] Re: latest kernel(2.6.20-16.28) update gives boot problems

Hi, Sitsofe Wheeler [2007-06-01 16:44 -0000]:
> Martin: Since it looks like we are headed for another change I think
> the word has to be put out to those who have gone and relabelled
> their /etc/fstab with the /dev/h??? syntax that we are now going
> back the other way. The folks over in bugs like Bug #94119 need to
> be warned that their CD drives are going to go back to being broken.
> Perhaps people need to be told what grub/module options can be
> added to disable ata_piix. People need to be warned not to upgrade to
> -16. People need to prepare for another round of people having binary
> driver breakage

Right, all of this needs to happen in the USN. The USNs already
explain the ABI change consequences.

> Does the update go out as -security given that it's not fixing a
> security problem any more? What does -security mean?

Yes, it needs to go to -security, since it was broken in -security.
The problem was that -security was abused to introduce changes which
were not security related, so we cannot just let them stay there.

> Also what happens to the current renaming bugs? Are they just closed? Do
> we just open a new bug if/when another "great renaming" happens? Is it
> worth trying to fix the PATA bugs being masked on SATA machines? Do we
> just close those bugs too?

I leave that to the kernel folks.

Revision history for this message
Matthijs De Smedt (matthijs--) wrote :

I also had problems with 2.6.20-16.28, the same "Loading hardware drivers" message.

When I saw some people were experimenting with irqpoll, I immediately thought about PNP. I disabled "Plug & Play OS" in my BIOS, and poof, 2.6.20-16 boots fine. If you still have problems, please try this and report your experience.

Revision history for this message
Matthijs De Smedt (matthijs--) wrote :

A little addendum. Booting normally sometimes works without irqpoll, and as far as I can see it always works with irqpoll.
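
When comparing boots with and without the option, it helps to confirm what the running kernel was actually started with. A small sketch, assuming a POSIX shell and the usual /proc/cmdline file; has_boot_option is an illustrative name only:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given word appears on the kernel
# command line. The second argument (a file to read instead of
# /proc/cmdline) exists only to make the function easy to try out.
has_boot_option() {
    opt=$1
    cmdline=${2:-/proc/cmdline}
    # Split the command line into one word per line, then look for an
    # exact, literal match of the whole word.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$opt"
}
```

For example, `has_boot_option irqpoll && echo "booted with irqpoll"` can be pasted into a report alongside the kernel version.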

Revision history for this message
AllanEising (allan-eising) wrote :

I also experience the exact same behavior regarding my SATA drive changing from /dev/sdc to /dev/hde and not being able to boot afterwards. Is there any preliminary solution to this yet?

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

AllanEising:
Try and remove all instances of /dev/s??? from your /etc/fstab, /boot/grub/menu.lst and /etc/initramfs-tools/conf.d/resume files and replace them with UUIDs. The blkid command will list the UUIDs for all partitions. Do this in 2.6.20-15 so you can see the "old" /dev/s??? syntax and match it to its UUID. Please be careful and *take a back up of your /etc/fstab* because failure to make this change correctly can stop your system booting into any kernel. I repeat, this change is not for the faint of heart (although if you do slip up you can probably use a LiveCD to copy the backup /etc/fstab back to the right place).
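
A rough sketch of the audit step, assuming a POSIX shell; find_device_refs is a made-up helper name. It only reports lines that still use /dev/sdXN or /dev/hdXN names (comment lines included) and changes nothing:

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for every line in the given
# files that still mentions a /dev/sd* or /dev/hd* device node.
find_device_refs() {
    # /dev/null is added so grep always prefixes matches with the file name,
    # even when only one real file is given.
    grep -nE '/dev/[sh]d[a-z][0-9]*' "$@" /dev/null
}

# Typical use on the installed system (left commented out; run by hand):
#   sudo cp /etc/fstab /etc/fstab.bak
#   sudo blkid
#   find_device_refs /etc/fstab /boot/grub/menu.lst /etc/initramfs-tools/conf.d/resume
```

Each reported line can then be matched against the blkid output and switched to its UUID.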

I'm not sure what the best practice for DVD/CDROM /dev/s??? references is... I'm speculating over in Bug #117413 that using the /dev/cdrom symlinks might be OK, but haven't seen any word that this is the "correct" thing to do.

Revision history for this message
Wilbur Harvey (wilbur-harvey-spirentcom) wrote :

0) All 5 of my systems have nvidia chipsets (some AM2, some socket 939)
1) on all of my systems, the PS-2 mouse doesn't work.
2) nvidia-glx drivers are not working
3) all my hard drives are sata (4 raid installs) and all work fine
4) the cdrom drives, which used to be /dev/hda are all not available. There is no more /dev/hd*, the cdrom doesn't seem to be available under any other mount point.
5) all of my sata devices are mounted using UUID's

Revision history for this message
Pedro Fausto R. Leite Jr. (pedrofausto) wrote :

I haven't found any problem with this kernel:
$ uname -r
2.6.20-16-generic

I have an AMD64-like motherboard (Sempron 64 bits) and it is working fine. Everything's mounting just like usual.
This is my fstab:

# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
# /dev/hda1
UUID=524791d0-2027-48c6-91d8-d631241b2a53 / reiserfs notail 0 1
# /dev/sda1
UUID=DC7C1AF37C1AC862 /media/sda1 ntfs defaults,nls=utf8,umask=007,gid=46 0 1
# /dev/hda2
/dev/hda2 none swap sw 0 0
/dev/cdrom /media/cdrom0 udf,iso9660 user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0

Using an on-board graphic chipset: S3 Unichrome Pro VGA Adapter (VIA)
Mice working fine (PS/2)

No booting problem.

Revision history for this message
wilco (wilco-laposte) wrote :

Sorry for my English, but I'm French.

It seems that the messages
hdg: dma_timer_expiry: dma_status == 0x24
hdg: DMA interrupt recovery
hdg: lost interrupt
due to the change from /dev/sd to /dev/hd

may be corrected on an Asus p4p800 Deluxe motherboard by selecting
 "Compatibility Mode" in the IDE configuration instead of "Enhanced Mode".
I tried this on my PC and now I can boot the 2.6.20-16 kernel without any problem.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

A new kernel (2.6.20-16.29) reverting the piix changes was released: http://www.ubuntu.com/usn/usn-470-1 on 08 June 2007. Additionally a new wiki page describing how references to partitions should be UUIDs/labels has recently appeared: https://wiki.ubuntu.com/UsingUUID .

While not every query mentioned in https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/117314/comments/67 , https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2007-June/001059.html and https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2007-June/001063.html has been answered (e.g. UsingUUID doesn't address /boot/grub/device.map or the swap formatting problem ...), I'm only going to explicitly reask one set of questions. Martin Pitt basically said the answer to this depends on the kernel folks:

What happens to the current renaming bugs? Are they just closed? Do we just open a new bug if/when another "great renaming" happens? Is it worth trying to fix the PATA bugs being masked on SATA machines? Do we just close those bugs too?

Revision history for this message
talete (edorai) wrote : Re: [Bug 117314] Re: latest kernel(2.6.20-16.28) update gives boot problems

thanks to all people for the help

now with the latest

2.6.20-16.29

everything is ok

edoardo

Revision history for this message
Chris Mayo (chris-mayo) wrote :

2.6.20-16.29 is better, but hangs on booting if I enable the IT8212 (GigaRAID) controller on my Gigabyte GA-8KNXP.

Edgy's 2.6.17.1-10.34 works just fine.

Revision history for this message
disabled.user (disabled.user-deactivatedaccount) wrote :

The mentioned wiki page regarding usage of UUID (https://wiki.ubuntu.com/UsingUUID) doesn't exist:

"This page does not exist yet. You can create a new empty page, or use one of the page templates. Before creating the page, please check if a similar page already exists."

Revision history for this message
dschneller (dannyschneller) wrote :

I installed the new update because I suffered from the hard disk renaming (Intel PERL865 mainboard). Now I can boot again, however X11 does not start, because it claims it cannot load the nvidia kernel module (FATAL: Could not run the install command for nvidia). After some fiddling I could load it from the command line and launch X11 manually. However I wonder why it was not started on boot, because I just ran the same /sbin/lrm-video line I found in /etc/modules.d. Maybe some sort of a race condition on boot?

I do not use any RAID stuff - the / drive (containing all system components) is connected via PATA and is called /dev/sda. There are two more disks, both SATA (sdb, sdc), however they just contain Windows and some linux data - not needed for boot.

I also noticed that I could not use VMware Server in the manually started X11. Upon starting a virtual machine it just failed immediately with no error message. Trying sudo /etc/init.d/vmware-server restart failed to load the kernel modules, too.
Is it correct that there is no updated linux-restricted-drivers package for the new kernel? Seems to be some sort of problem with kernel modules.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

hk47:
I have a feeling it might have been created in the wrong place and has now been moved to https://help.ubuntu.com/community/UsingUUID . It would be wise for someone to update http://www.ubuntu.com/usn/usn-470-1 ....

dschneller:
Spin off a new bug report and post a link to it here. Can you also make sure that the files /etc/X11/xorg.conf , /var/log/Xorg.0.log and /etc/default/linux-restricted-modules-common are included, along with the output of dpkg -l linux-restricted-* | grep ii , dpkg -l nvidia-\* | grep ii , sudo sh -x /sbin/lrm-video nvidia , uname -a and dmesg, plus details about how you enabled the NVIDIA binary drivers in the first place and whether you have followed any online guides?
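
If it saves typing, the files and command outputs listed above could be gathered with something like the sketch below. The helper names (collect, gather_all) and the output directory are made up for illustration; the commands themselves are the ones listed above. Nothing runs until gather_all is called.

```shell
#!/bin/sh
# Hypothetical output location; override by setting OUTDIR before running.
outdir=${OUTDIR:-./lrm-bug-files}

# Run a command and save stdout and stderr to <outdir>/<name>.txt.
# A failing command does not stop the run; its exit status is noted instead.
collect() {
    name=$1
    shift
    mkdir -p "$outdir"
    "$@" > "$outdir/$name.txt" 2>&1 || echo "exit status: $?" >> "$outdir/$name.txt"
}

# Gather everything asked for in the comment above.
gather_all() {
    collect uname uname -a
    collect dmesg dmesg
    collect lrm-packages sh -c "dpkg -l 'linux-restricted-*' | grep ii"
    collect nvidia-packages sh -c "dpkg -l 'nvidia-*' | grep ii"
    collect lrm-video sudo sh -x /sbin/lrm-video nvidia
    for f in /etc/X11/xorg.conf /var/log/Xorg.0.log \
             /etc/default/linux-restricted-modules-common; do
        # Copy only the files that exist and are readable.
        [ -r "$f" ] && cp "$f" "$outdir"/
    done
    return 0
}
```

Calling gather_all leaves one text file per command in the output directory, ready to attach to the new bug.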

Revision history for this message
Kees Cook (kees) wrote : Re: [Bug 117314] Re: latest kernel(2.6.20-16.28) update gives boot problems

It seems the page was moved without a redirect left in its place. I
have fixed this now, and the page should work again.

Revision history for this message
NiklasW (falken) wrote :

Hello,

After upgrading to the latest kernel I also had problems booting from my SATA drive. (Current setup: 2 SATA drives and 1 IDE.) I used UUID references in both fstab and menu.lst before the upgrade. After some investigation I managed to get my machine to boot; the solution for me was to update the UUID of the drive, meaning that this seems to have changed after the upgrade. Due to lack of time I have not had time to correct my fstab yet, but I managed to get the system up and running just before I had to run to work this morning :)

Maybe I am confirming a fact that is already known, but if needed I can provide additional input about my hardware.

Revision history for this message
Lex Berger (lexberger) wrote :

Same problem as dschneller: The kernel update fixed the boot problem, but now the nvidia module fails to load.
And, hell, NO, I won't file ANOTHER bug.

I know I'm being unprofessional now, and I never thought I would be. But this is the 7th system-breaking bug in 3 months for me, and my hardware and my system setup are not exactly exotic. When one bug is fixed, I run into another. Really, this SUCKS. I've been using GNU/Linux for 11 years now, but I've NEVER encountered a distribution with that many critical bugs.

Please do some more testing before releasing kernel updates. Feisty is final, isn't it?

Revision history for this message
Kees Cook (kees) wrote :

I'm sorry that people have been continuing to have problems with the -16.29 kernel update. We have not yet been able to reproduce these issues. e.g. a machine local to me using the nvidia binary driver loads without problems. Can you check that your linux-restricted-modules was upgraded along with everything else? To make sure you have the correct linux-meta packages installed, try:

$ dpkg -l linux-$(uname -r | cut -d- -f3-) | grep ^i
$ dpkg -l linux-restricted-modules-$(uname -r | cut -d- -f3-) | grep ^i

If either of these is blank, the top-level linux-meta packages are missing from your system, which could cause things like linux-restricted-modules to get out of sync with the kernel ABI. If this is the case, try:

$ sudo apt-get install linux-$(uname -r | cut -d- -f3-)
$ sudo apt-get install linux-restricted-modules-$(uname -r | cut -d- -f3-)
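
To see what those commands expand to: uname -r prints something like 2.6.20-16-generic, and cut -d- -f3- keeps everything from the third dash-separated field onward, which is the kernel flavour. A small sketch of that expansion (kernel_flavour and meta_packages are illustrative names only, not existing tools):

```shell
#!/bin/sh
# Print the flavour part of a kernel release string.
# With no argument, the running kernel's release (uname -r) is used.
kernel_flavour() {
    release=${1:-$(uname -r)}
    printf '%s\n' "$release" | cut -d- -f3-
}

# Print the two meta package names checked by the commands above.
meta_packages() {
    flavour=$(kernel_flavour "$1")
    printf 'linux-%s\nlinux-restricted-modules-%s\n' "$flavour" "$flavour"
}

# kernel_flavour 2.6.20-16-generic   # prints: generic
```

So on a -generic kernel the packages being checked are linux-generic and linux-restricted-modules-generic.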

Revision history for this message
dschneller (dannyschneller) wrote :

I checked for the status as you asked.
Turns out for some reason linux-generic was not installed. To fix it, I did an apt-get update, installed it via your command line above and then did an apt-get upgrade just to be sure I had everything as current as possible.

While X11 starts ok now - the VMware server at first still could not be used. The VMware-Server-Console started, however I could not resume an existing VM.

There is a strange thing I noticed:
$ dpkg -s vmware-server-kernel-modules
Package: vmware-server-kernel-modules
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 52
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: i386
Source: linux-meta
Version: 2.6.20.15.14
Depends: vmware-server-kernel-modules-2.6.20-15
Description: vmware-server kernel module dependency package
 This empty package allows people to keep their VMware Server kernel
 modules up-to-date when upgrading their Linux kernel.

This shows a dependency on version 2.6.20-15 of vmware-server-kernel-modules, even though this is not the most recent one. Shouldn't this be -16 after my update/upgrade cycle?

Via lsmod I found out that neither vmmon nor vmnet had been loaded. Trying to modprobe them did not work, because they simply were not installed for the -16 kernel.
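
For reference, the "is the module even installed for this kernel" check can be sketched as below. This assumes the usual layout where modules live under /lib/modules/<release>/; module_installed is a made-up helper name, and the third argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <module>.ko exists somewhere under the
# module tree of the given kernel release (default: the running kernel).
module_installed() {
    mod=$1
    release=${2:-$(uname -r)}
    root=${3:-/lib/modules}
    # find prints nothing (and its error is discarded) when the tree or the
    # module file is missing, which makes the -n test fail.
    [ -n "$(find "$root/$release" -name "$mod.ko*" 2>/dev/null | head -n 1)" ]
}

# Example, run by hand:
#   for m in vmmon vmnet; do
#       module_installed "$m" || echo "$m is not installed for $(uname -r)"
#   done
```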

So I manually installed vmware-server-kernel-modules-2.6.20-16 (which is listed along with the -15 version):

$ dpkg -p vmware-server-kernel-modules-2.6.20-16
Package: vmware-server-kernel-modules-2.6.20-16
Priority: optional
Section: restricted/misc
Installed-Size: 7096
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: i386
Source: linux-restricted-modules-2.6.20
Version: 2.6.20.5-16.28
Depends: module-init-tools, linux-image-2.6
Conflicts: vmware-player-kernel-modules-2.6.20-16
Size: 2774320
Description: vmware-server modules for Linux (kernel 2.6.20)
 This package contains the set of loadable kernel modules for
 VMware Server.
 .
 This package contains the compiled kernel modules for 2.6.20. All
 supported kernel types for this architecture are included in this single
 package.

Now I can resume the virtual machine again. Seems like a problem with the empty pseudo package to me?

Revision history for this message
LarsBjerregaard (lars-rubyglow) wrote :

I have one of the chipsets on the "endangered list", 27df 82801G (ICH7 Family) IDE Controller (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/116996/comments/79 thanks sitsofe), and I have now upgraded from 2.6.20-15.14 to 2.6.20-16.29.

I am happy to say that everything works just fine. Thanks.

Kees Cook (kees)
Changed in linux-source-2.6.20:
status: In Progress → Fix Released