grub guessed BIOS disk order incorrectly

Bug #8497 reported by leon breedt
This bug affects 11 people
Affects                        Status        Importance  Assigned to   Milestone
grub                           Fix Released  Unknown
gfxboot-theme-ubuntu (Ubuntu)  Fix Released  Undecided   Colin Watson
grub (Ubuntu)                  Fix Released  High        Colin Watson

Bug Description

I have an ASUS P4C800-E I875P motherboard (ICH5-R), which has two SATA
controllers and one PATA controller. My boot drive is on a SATA controller, my data disk
on a PATA controller.

At installation time, I elected to install Ubuntu onto the SATA device, and when
prompted, entered "/dev/sda" as the root device for GRUB. However, it generated
a root device of (hd1,x) instead of (hd0,x).

Revision history for this message
Matt Zimmerman (mdz) wrote :

Probably needs to go upstream, please forward if so

Colin Watson (cjwatson) wrote :

What does the file /boot/grub/device.map contain?

leon breedt (bitserf+bugzilla) wrote :

that would be the culprit:

(hd0) /dev/hda
(hd1) /dev/sda

leon breedt (bitserf+bugzilla) wrote :

clarification: the BIOS is configured to boot off SATA, and GRUB regards the
SATA device as hd0 when booting.

Matt Zimmerman (mdz) wrote :

Is it possible that the BIOS drive order changed between the CD-ROM boot and the
hard disk boot? Something similar was described in bug #8040.

leon breedt (bitserf+bugzilla) wrote :

this sounds highly likely.

my default boot sequence doesn't include CD-ROM, so to install Ubuntu,
i would have brought up the boot device menu and selected CD-ROM for installation,
and then let it use the default sequence on reboot after installation.

Matt Zimmerman (mdz) wrote :

Unfortunately, this bug may be unsolvable, like the other one. I don't think we
can predict that, although a given disk is bios drive X during the install, it
becomes bios drive Y afterward, and X != Y. Colin, can you think of any way
around this?

Michael Vogt (mvo) wrote :

I can confirm this bug. It is e.g. triggered if there are two drives, one
s-ata and one p-ata and the boot order of the bios is set to s-ata the p-ata disk.

The problem is that grub will assume that the p-ata "hda", if it exists, comes before the
"sda". This is hardcoded into grub's lib/device.c:init_device_map() code. This
may be an unsolvable problem unless we are able to ask the bios about the
assigned drive numbers at boot-time in some way.

Michael Vogt (mvo) wrote :

(In reply to comment #8)
> s-ata and one p-ata and the boot order of the bios is set to s-ata the p-ata disk.

That should read "...and the boot order of the bios is set to s-ata first and
then p-ata".

The result of this assumption is that the device.map is created with
(hd0) /dev/hda
(hd1) /dev/sda

but it really is the other way around. I guess this problem is not too common
because it requires people to actively change the bios boot ordering.
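
To illustrate the assumption described above, here is a minimal shell sketch. It is not GRUB's actual code (that is C, in lib/device.c); it only mimics the "IDE names first, then SCSI/SATA names" numbering, ignoring the BIOS boot order entirely. The function name guess_device_map is hypothetical.

```shell
# Hypothetical helper: mimic the "IDE first, then SCSI/SATA" guess.
# Usage: guess_device_map hdb sda sdb
guess_device_map() {
    n=0
    # First pass numbers IDE names (hd*), second pass SCSI/SATA names (sd*).
    # Within each class the order of the arguments is preserved.
    for prefix in hd sd; do
        for dev in "$@"; do
            case "$dev" in
                "$prefix"*)
                    echo "(hd$n) /dev/$dev"
                    n=$((n + 1))
                    ;;
            esac
        done
    done
}
```

Calling `guess_device_map sda hda` lists /dev/hda as (hd0) and /dev/sda as (hd1), which is the mismatch reported here when the BIOS actually boots from the SATA disk.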

Michael Vogt (mvo) wrote :

Pondering a bit about the bug I thought about using the linux real mode
interrupt (lrmi) interface (that vbetool already uses) to figure out the bios drive
order. The interesting bit here is int 13h/ah=48h
(http://www.ctyme.com/intr/rb-0715.htm) to get drive information. I wrote some
testcode (based on vbetool) at
http://people.ubuntu.com/~mvo/hacks/hddhack-0.0.tar.gz. Calling the interrupts
works pretty well but I seem to be unable to get the drive information into a
real-mode buffer :/

Another idea would be to use the partition label or uuid to figure out the
boot partition but that would involve a lot more work than the rather simple
approach above.

Michael Vogt (mvo) wrote :

Another update:

Fortunately there is already support in the linux kernel for passing on the
needed information about the bios disk ordering. The option is called CONFIG_EDD
and disabled by default in ubuntu because of
http://bugzilla.ubuntu.com/show_bug.cgi?id=8899.

Some information about the support can be found here: http://lwn.net/Articles/12334/

This looks like the right solution for the problem of creating a correct grub
device map.
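
As a sketch of how this information could be read once EDD support is available: the kernel exposes one directory per BIOS disk under /sys/firmware/edd (int13_dev80 is the first BIOS disk, int13_dev81 the second, and so on), and a directory may contain an mbr_signature file. The helper below prints a GRUB-style number next to that signature. The function name and the optional directory argument are assumptions made for illustration, and matching a signature back to a /dev node is deliberately left out.

```shell
# Hypothetical helper: list BIOS disks as seen by the kernel's EDD support.
# Usage: edd_bios_order [sysfs-dir]   (defaults to /sys/firmware/edd)
edd_bios_order() {
    edd_dir=${1:-/sys/firmware/edd}
    for d in "$edd_dir"/int13_dev*; do
        [ -d "$d" ] || continue          # no EDD data, or module not loaded
        hex=${d##*int13_dev}             # e.g. "80" for the first BIOS disk
        n=$((0x$hex - 0x80))             # BIOS disk 0x80 corresponds to hd0
        sig=unknown
        if [ -r "$d/mbr_signature" ]; then
            sig=$(cat "$d/mbr_signature")
        fi
        echo "(hd$n) mbr_signature=$sig"
    done
}
```

If the directory does not exist, the function simply prints nothing, which is also what to expect on a kernel built without CONFIG_EDD.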

Matt Zimmerman (mdz) wrote :

Leon, is this bug still present in Dapper?

We seem to have CONFIG_EDD=m now; do we need to take further action in order to take advantage of it?

Changed in grub:
status: Unconfirmed → Needs Info
Eduardo Cereto (dudus) wrote :

Yes, this bug is in Dapper Flight 6.

I found that disabling IDE, then fixing grub, then plugging IDE back in is a nasty workaround for this.

Carthik Sharma (carthik) wrote :

Changing to Confirmed since a second person has reproduced this on Dapper.

Changed in grub:
status: Needs Info → Confirmed
Matthew Fisher (fishermd-deactivatedaccount) wrote :

This is still a problem in flight 7 (AMD64)

I have a slightly different configuration:

400G SATA (boot drive)
DVDROM (pri-master)
DVDRW (pri-slave)
Promise Card with 2 drives (pri-master, sec-master)
Promise Card with 2 drives (pri-master, sec-master)

my boot order is removable, optical, sata

The 4 PATA drives attached to the promise cards are in a RAID-5 configuration, but are used as storage--no booting involved.

Changing the grub files after the fact took care of the problem.

Menachem Shapiro (menachem) wrote :

I had the same issue as the original poster of this bug with one difference. Instead of my /dev/sda drive being a SATA drive, it is a USB drive.

I installed Dapper to the USB drive using the Intel x86 alternate install CD. The computer is a Compaq Presario R3000 [even though it is an AMD 64, I used the x86 install CD]. The drive in the computer is running Windows XP. During the installation process, I installed grub to /dev/sda1. When I booted from the USB drive, grub started the boot process but errored out because it was looking for (hd1,0) instead of (hd0,0).

I would have thought that grub always considered the drive it was booting from to be (hd0,0).

J.L. (reaxion) wrote :

I have the same issue with Ubuntu Dapper 6.06 LTS. My equipment spec is as below:

ASUS A8N-SLI Deluxe
2GB DDR RAM
4 x Hitachi DeskStar 400GB SATA drives running on nForce4 SATA chipset as individual drives
AMD Athlon64 3600+ (2.2GHz, running stock speed)
Windows XP is running on the real first drive.
Ubuntu detected the 3rd drive on the board as the first, and installed. Caused Grub not to work without manually selecting the 3rd drive at boot. Of course, XP cannot be booted because Grub has the drive order wrong.

I'd like to avoid lilo but it looks like I cannot.

Mikel Ward (mikelward) wrote :

I just got bitten by a similar issue.

I have:
SATA 1 - /dev/sda
SATA 2 - /dev/sdb
PATA 1 - /dev/hdb

In my ASUS/nVidia BIOS, the boot configuration is:
CD-ROM
SATA
SCSI
HDD (meaning PATA/IDE)

Installing Ubuntu 6.06 from the Live CD resulted in an unbootable system. I just see "GRUB GRUB GRUB" heaps of times.

My /boot/grub/device.map had:
(hd0) /dev/hdb
(hd1) /dev/sda
(hd2) /dev/sdb

I've manually changed this to:
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/hdb

to be consistent with my BIOS boot order, then reinstalled grub. Hopefully it works!
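
The manual correction above amounts to rewriting device.map so that the entries follow the BIOS order. A minimal sketch of that step, under the assumption that you already know the BIOS order; the function name write_device_map is hypothetical and the helper does not touch GRUB itself.

```shell
# Hypothetical helper: write a device.map whose order follows the arguments.
# Usage: write_device_map FILE DISK...   (disks listed in BIOS boot order)
write_device_map() {
    map_file=$1
    shift
    n=0
    : > "$map_file"                      # truncate or create the file
    for disk in "$@"; do
        echo "(hd$n) $disk" >> "$map_file"
        n=$((n + 1))
    done
}
```

For the setup in this comment that would be `write_device_map /boot/grub/device.map /dev/sda /dev/sdb /dev/hdb`, run as root after backing up the original file, followed by reinstalling grub as described.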

Mikel Ward (mikelward) wrote :

Works fine now.

Still an important bug in the installer however.

I should also note that I never changed the boot order in BIOS. It's simply that GRUB detects them in the wrong order.

Anders (andersja+launchpad-net) wrote :

This is still a problem on the Edgy Beta LiveCD/Installer. I entered bug 66667 (now listed as duplicate of this one)

Paul Dufresne (paulduf) wrote :

Bug #30967 has the same title.

Mårten Woxberg (maxmc) wrote :

I can confirm this is present in 6.10 release.

What do I change for Ubuntu to correctly auto-generate my new menu.lst when I update my kernel?

Jason McMullan (jason-mcmullan) wrote :

Why not build with CONFIG_EDD=m, and set the default Linux command line to 'edd=skipmbr'?

That way we don't have the takes-forever disk checks of bug 8899, but we will still be able to get
the geometry information correctly for GRUB et al.

And if someone *really* needs the MBR checks, they can append edd=on?

Jason McMullan (jason-mcmullan) wrote :

Urgh, forgot the bugzilla -> Launchpad renumbering. I meant bug 15213.

richard (richardjones) wrote :

I've hit this problem too in the latest Feisty release (5).

I have three drives which are labelled by Linux and grub in the following manner:

Drive type  Linux     grub
SATA-1      /dev/sda  ??
SATA-2      /dev/sdb  /dev/sda (this is my boot disk)
SATA-2      /dev/sdc  ??

The BIOS lists them with the first SATA-2 drive first. I don't believe I have any control over ordering the SATA drives in the BIOS.

I don't actually know what devices grub assigns to the other drives.

Colin Watson (cjwatson)
Changed in grub:
assignee: kamion → nobody
glenstewart (glen-stewart) wrote :

I'm curious if this bug is the same as what I see on my Abit AN8-32X system. I have 6 SATA drives (4 on an Nvidia chip, and 2 on Sil 3132 chip), 2 IDE drives (each a master on its own channel), and 2 USB "drives", which are a SD and CF card reader.

In Dapper, Edgy, and Feisty, I see the set of 2 Sil 3132 chip drives "floating" in the boot order. On one boot, they appear as sdd and sde. On the next boot, they appear as sdf and sdg. It appears that the USB "drives" are not consistent about when the kernel detects them, and so they either appear before or after the two Sil 3132 SATA drives.

When Edgy converted fstab to a UUID entry, I thought the misrepresentation of the drives would go away, but the problem manifested immediately after the Feisty upgrade. Granted, I only had to reboot a few times between Edgy and Feisty. (-:

Thoughts?

Djamu (djamu) wrote :

Can confirm this too; full report here (I believe I've tested it on Dapper/Edgy/Feisty):

http://ubuntuforums.org/showthread.php?p=2523834#post2523834

Something similar yet quite different here (might be the same problem):
upon attaching my RAID5 array to Feisty, all my SATA drives' (MD) superblocks changed, resulting in an array that cannot be assembled.
A "fix" is ready (works under VMware), yet I still have to confirm this in a real-world situation, meaning bringing my RAID back online, as I'm in the process of dd dumping partitions.
To be continued....

http://ubuntuforums.org/showthread.php?t=410136

Phillip Susi (psusi) wrote :

There is no way to detect how the bios sees the disks, and the bios may not see some disks at all. I think the fix for this is to have the installer notify the user that it is making its best guess as to how the bios sees the disks, but it may be wrong and if so, they will need to correct it. Then prompt them to edit device.map.

glenstewart (glen-stewart) wrote :

I've found a solution to my problem...using entries in /etc/fstab and /boot/grub/menu.lst that refer to the disk "by-id", such as :

# /dev/sda1
/dev/disk/by-id/scsi-1ATA_Maxtor_7V300F0_V601KP2G-part1 / reiserfs nouser,defaults,noatime,auto,rw,dev,exec,suid 0 1

If people afflicted by this bug are able to use by-id or any of the other links under /dev/disk/, the problem may be controlled.
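
To find the by-id name for a partition before editing /etc/fstab, one option is to scan the symlinks under /dev/disk/by-id. The following is a sketch only; the function name by_id_names is hypothetical, and it compares just the final component of each link target (for example sda1).

```shell
# Hypothetical helper: print the by-id links that point at a kernel name.
# Usage: by_id_names sda1 [by-id-dir]   (defaults to /dev/disk/by-id)
by_id_names() {
    want=$1
    id_dir=${2:-/dev/disk/by-id}
    for link in "$id_dir"/*; do
        [ -L "$link" ] || continue       # skip anything that is not a symlink
        target=$(readlink "$link")       # e.g. ../../sda1
        if [ "${target##*/}" = "$want" ]; then
            echo "$link"
        fi
    done
}
```

A disk can have more than one by-id name (the listings later in this report show both ata-* and scsi-* links for the same drive), so the function may print several lines.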

ed (eddyhkim) wrote :

I would like to add my experience here:
I upgraded a running Ubuntu 6.x to 7.04. This caused an unbootable system, so I reinstalled 7.04. I got an error 15 from grub. The problem was that the disk order was different depending on whether I booted with the install CD or without it.
I have 3 disks and a CD-ROM drive:
1 - IDE /dev/hda my boot disk
2 - IDE /dev/hdc raid disk
3 - SATA /dev/sda raid disk
4 - IDE CDROM /dev/hdd

The order of the disks is correct when installing or booting through the CD. The first disk hd0 = hda. The order of the IDE drives switches after the install, and I get an error 15 from grub. At the grub shell, hd1 becomes hda and hd0 is hdc. The strange thing (for me) was that if I boot off the CD, then choose the last option of booting from the hard drive, the boot succeeds, implying the CD may have something to do with this whole mess.

I've never had this problem before on this or other machines.

I was able to work around it by editing menu.lst and changing all the references of (hd0,0) to (hd1,0), but it seems I have to remember to do this every time an upgrade updates menu.lst.

jabarlee (jabarlee-deactivatedaccount) wrote :

Probably my problem falls under the same category (Ubuntu 7.04, fresh installation):
I have a pci IDE controller (Sil 0680 ATA) with two hard disks (master/slave) on its 1st channel.
I also have four hard disks on the onboard IDE controller (both channels)

So, my "normal" setup is:
Onboard IDE controller, primary channel, master: /dev/sda -> boot disk, RAID1 (/dev/md0)
Onboard IDE controller, primary channel, slave: /dev/sdb
Onboard IDE controller, secondary channel, master: /dev/sdc -> RAID1 (/dev/md0)
Onboard IDE controller, secondary channel, slave: /dev/sdd

PCI IDE controller, primary channel, master: /dev/sde
PCI IDE controller, primary channel, slave: /dev/sdf

The problem is that, randomly, the system boots with changed disk order, as follows:

Onboard IDE controller, primary channel, master: /dev/sde*
Onboard IDE controller, primary channel, slave: /dev/sdb
Onboard IDE controller, secondary channel, master: /dev/sdc
Onboard IDE controller, secondary channel, slave: /dev/sdd

PCI IDE controller, primary channel, master: /dev/sda*
PCI IDE controller, primary channel, slave: /dev/sdf

It only affects the drives marked with *, nothing else. BIOS always detects the disks in the "normal" order, onboard controller first, PCI controller last.

I'd be happy to post more info if needed

Sitsofe Wheeler (sitsofe) wrote :

All partitions really should be referenced by UUID/label in /etc/fstab (see https://help.ubuntu.com/community/UsingUUID for details) however this doesn't help for deciding where to put the MBR when using grub-install (which is what I assume this bug is about)...

perriman (chuchiperriman) wrote :

The same problem for me. I have an IDE hd and an ATA hd. My ATA is /dev/sda and my IDE is /dev/hdb. Ubuntu created this device.map:

(hd0) /dev/hdb
(hd1) /dev/sda

But really my bios hd order is:

(hd0) /dev/sda
(hd1) /dev/hdb

Ralph Corderoy (ralph-inputplus) wrote :

I'm another person bitten by this. A system with PATA and SATA drives attempting to install on /dev/sda. The install goes fine and says it's finished but the BIOS fails to find any boot loader installed and wants a system disk inserted.

Given the number of individual "me toos" on this bug, and that there are 13 duplicates, can its importance be bumped up a bit from medium? Mixed PATA/SATA systems are becoming more common and failing to install is a real turn-off for newcomers.

Val Blant (vace117) wrote :

In my case the drive that is seen by grub as (hd1) when the system is running appears as (hd0) to grub during bootup.

As a workaround I opened /boot/grub/menu.lst and changed the 'groot' line to read:

# groot=(hd0,0)

Note that the line is supposed to stay commented out. The 'update-grub' tool actually reads these commented values when it is generating a new /boot/grub/menu.lst.

This way I don't have to remember to change (hd1,0) to (hd0,0) after every automatic menu.lst update.
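
The edit described above can also be scripted. This is a sketch under the assumption that GNU sed is available (as on Ubuntu); the function name set_groot is hypothetical. It rewrites only the commented "# groot=" line and leaves the existing boot entries alone.

```shell
# Hypothetical helper: set the commented "# groot=" default in a menu.lst.
# Usage: set_groot FILE '(hd0,0)'
set_groot() {
    menu=$1
    new=$2
    # Only the commented line is changed; update-grub reads it as a setting.
    sed -i "s/^# groot=.*/# groot=$new/" "$menu"
}
```

After something like `set_groot /boot/grub/menu.lst '(hd0,0)'` (as root, with a backup of the file), running update-grub should regenerate the entries from the new value.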

Ralph Corderoy (ralph-inputplus) wrote :

Thanks Val. I understand it can be corrected and worked around, but it takes a lot of perseverance to get to the point where you find a bug report like this one and discover the workaround. Every attempt at installation finishes with the "all went well" message yet reboot fails. I'm happy to delve into things but newcomers would be lost. Our LUG is seeing an increase in "don't want to upgrade to Vista" users trying Ubuntu for the first time. I just think having a plan for how to get this fixed for Gutsy would save lots of newcomers much hassle as PATA/SATA mixes are becoming more common. It sounds like EDD is the way to go.

Sergey Zelenev (zelenev) wrote :

The bug is affecting IBM R52 laptop as well. I have installed i386 Ubuntu 7.04 and grub on external USB drive using alternate CD. I must mention that I chose not to follow installer suggestion to install grub to my internal HDD and installed it on my USB HDD instead. I can boot into grub, but get error 15 unless I manually edit menu.lst and change (hd1,0) to (hd0,0). The boot order in BIOS is not affecting which drive is hd1 and which is hd0.

leon breedt (bitserf+bugzilla) wrote :

Hi,

Just to add some more information if it was not available already. My lone remaining PATA drive died a few weeks back, and I did a clean install of Feisty (7.04) today as follows:

Cold boot, press F8 to select boot menu popup (since default boot order is hard drive first).
Boot off the installation CD
Install Ubuntu
Reboot

I experienced no issues, so I suspect Ralph's comment that the PATA/SATA mix is the cause is correct.

However, my hardware has changed since I originally reported this bug:

* ASUS P5W DH Deluxe Motherboard (ICH7R)
* Core 2 Duo, etc

Dwayne Nelson (edn2) wrote :

I am wondering if my bug #107249 is also being caused by this. In my case, kernel updates have sometimes failed and have resulted in the system presently being unbootable.

My machine, an Asus A8V, reports 4 sata drives, two on each controller. Three of these drives are being used by linux: a bootable RAID-1 (3 disks), and a RAID-5 (3 disks) for data. The fourth drive contains a bootable XP partition.

If lilo depends on a consistent ordering of the four drives, is it possible that the random reordering (as reported to linux by the bios) could have caused lilo to attempt to update a kernel on the XP drive instead of one from the linux array?

Mikel Ward (mikelward) wrote :

I got bitten by this again in the Gutsy RC on my new PC.

I posted my experiences here http://ubuntuforums.org/showthread.php?p=3518911#post3518911.

In brief:

My BIOS sees my drives as:
HDD1 - 80 GB - Windows
HDD2 - 80 GB - Linux
HDD3 - 250 GB - Media

Linux sees them as:
/dev/sda - 80 GB - Windows
/dev/sdb - 250 GB - Media
/dev/sdc - 80 GB - Linux

So it generated device.map assuming that order was correct, i.e.:
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/sdc

but it should have been:
(hd0) /dev/sda
(hd1) /dev/sdc
(hd2) /dev/sdb

I see this as a pretty big issue, because it rendered my system unbootable (both Ubuntu and my existing Windows installation).

Ralph Corderoy (ralph-inputplus) wrote :

I too fear that this is going to bite more people as their hardware moves over time away from PATA. Rendering a system unbootable is pretty severe and something that many users can't recover from un-aided. And how do they get that aid if the machine is their main Internet access? Can this bug's importance be re-considered? I can't see what options there are due to Launchpad's interface, but medium seems a bit low. And is grub definitely the right package? Lastly, can we have some feedback from those in the know about what course of action is intended to get this problem resolved? Thanks.

unggnu (unggnu) wrote :

I think that this is serious. With Gutsy, on some (desktop) installs with a RAID system, hd1 is still used instead of hd0, which makes the system unbootable for newbies. It even happened across a kernel upgrade during Gutsy Alpha.

David Freitas (jddcef) wrote :

This is a SHOWSTOPPER.
The priority HAS to be at the highest.
This thing has racked up 23 duplicates already, it is a major problem.
And the conditions for it aren't special, it's basically any new computer.

Ralph Corderoy (ralph-inputplus) wrote :

https://wiki.ubuntu.com/Bugs/Importance says an Importance of High is for a bug that "Has a severe impact on a small portion of Ubuntu users". This bug stops systems booting for a small, but growing, number of users. It also says "Makes a default Ubuntu installation generally unusable for some users... for example the system fails to boot". I think that's also true.

So will someone please up the Importance from the default Medium?

TJ (tj) wrote :

Will all those experiencing this bug with Gutsy give us detailed information as to the physical configuration of the motherboard disk interfaces and disk drives so we can understand the precise circumstances that cause this?

Right now we have a combination of comments, some of which indicate an issue with the BIOS boot-order, others that the BIOS drive-detection & reported order is different to that of Linux and/or GRUB.

So, as an example:

Asus A8V
Mobo > SATA-1   > Drive-1
     > SATA-2   > Drive-2
     > PATA-1-1 > Drive-3 [master]
     > PATA-1-2 > Drive-4 [slave]
     > PATA-2-1 > DVD-1 [master]

During Installation:

BIOS boot order
 DVD-1
 Drive-1 Master Boot Record (MBR), /boot partition /dev/sda5
 Drive-3 root partition /dev/hd2
 USB

Linux (GRUB)
Drive-1 /dev/sda (GRUB hd1)
Drive-2 /dev/sdb (GRUB hd2)
Drive-3 /dev/hda (GRUB hd3)
Drive-4 /dev/hdb (GRUB hd4)
DVD-1 /dev/scd0

Output of:
$ ls -l /dev/disk/by-id

After Installation:

BIOS boot order
 Drive-1
 Drive-3
 DVD
 USB

Linux (GRUB)
Drive-1 /dev/sda (GRUB hd0)
Drive-2 /dev/sdb (GRUB hd1)
Drive-3 /dev/hda (GRUB hd2)
Drive-4 /dev/hdb (GRUB hd3)
DVD-1 /dev/scd0

Output of:
$ ls -l /dev/disk/by-id

Mikel Ward (mikelward) wrote :

Ubuntu Linux 7.10 RC amd64

ASUS M2A-VM mainboard

Configured in PATA emulation mode so that Windows Vista works. (I'm not sure what they called it on the BIOS settings, but it's the opposite of AHCI mode. In Windows they show up under a generic ATA controller.)

Mainboard:
SATA4 (empty)
SATA2 Seagate ST380013AS 80 GB HD
SATA3 Seagate ST3250820AS 250 GB HD
SATA1 Seagate ST380817AS 80 GB HD

PATA1 LiteOn DVD-RW

(The physical ordering is weird! The front two connectors are red, the back two are black. I assume the ordering is due to RAID. There is only one PATA channel. I have no PATA hard drives, only the DVD drive.)

BIOS:
HDD1 Seagate ST380817AS 80 GB HD
HDD2 Seagate ST380013AS 80 GB HD
HDD3 Seagate ST3250820AS 250 GB HD
HDD4 None

Boot order is:
CD
HDD1
HDD2
etc.

Pretty sure that was the case during both installation and normal operation. I changed it once to (CD, HDD2, HDD1) after Ubuntu wouldn't boot. Doing this seemed to change Linux's device names, and it couldn't find the root device. I changed it back. I don't think this has any permanent effect.

Windows:
SCSI 0,0,0 Seagate ST380817AS 80 GB HD
SCSI 0,1,0 Seagate ST380817AS 80 GB HD
SCSI 1,0,0 Seagate ST3250820AS 250 GB HD

device.map after install:
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/sdc

corrected device.map:
(hd0) /dev/sda
(hd1) /dev/sdc
(hd2) /dev/sdb

This looks weird, but it's right since Linux detects HDD2 (what Grub calls hd1) as /dev/sdc.

$ ls -l /dev/disk/by-id | awk '{ print $8, $9, $10 }'

ata-ST3250820AS_6QE1M99W -> ../../sdb
ata-ST3250820AS_6QE1M99W-part1 -> ../../sdb1
ata-ST380013AS_3JV2TQ1K -> ../../sdc
ata-ST380013AS_3JV2TQ1K-part1 -> ../../sdc1
ata-ST380013AS_3JV2TQ1K-part2 -> ../../sdc2
ata-ST380013AS_3JV2TQ1K-part5 -> ../../sdc5
ata-ST380817AS_4MR0KRW8 -> ../../sda
ata-ST380817AS_4MR0KRW8-part1 -> ../../sda1
scsi-1ATA_ST3250820AS_6QE1M99W -> ../../sdb
scsi-1ATA_ST3250820AS_6QE1M99W-part1 -> ../../sdb1
scsi-1ATA_ST380013AS_3JV2TQ1K -> ../../sdc
scsi-1ATA_ST380013AS_3JV2TQ1K-part1 -> ../../sdc1
scsi-1ATA_ST380013AS_3JV2TQ1K-part2 -> ../../sdc2
scsi-1ATA_ST380013AS_3JV2TQ1K-part5 -> ../../sdc5
scsi-1ATA_ST380817AS_4MR0KRW8 -> ../../sda
scsi-1ATA_ST380817AS_4MR0KRW8-part1 -> ../../sda1

Let me know if there's any other info I can provide.

Mikel Ward (mikelward) wrote :

Corrections:

Windows:
SCSI 0,0,0 Seagate ST380817AS 80 GB HD
SCSI 0,1,0 Seagate ST380013AS 80 GB HD <---
SCSI 1,0,0 Seagate ST3250820AS 250 GB HD

And to make it clear, I'm not actually using any kind of RAID, just that the mobo is capable of it.

The relevant part of lspci -vv shows this:

00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ef
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin A routed to IRQ 22
        Region 0: I/O ports at ff00 [size=8]
        Region 1: I/O ports at fe00 [size=4]
        Region 2: I/O ports at fd00 [size=8]
        Region 3: I/O ports at fc00 [size=4]
        Region 4: I/O ports at fb00 [size=16]
        Region 5: Memory at fe02f000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] #12 [0010]

[...]

00:14.1 IDE interface: ATI Technologies Inc SB600 IDE (prog-if 8a [Master SecP PriP])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ef
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at 01f0 [size=8]
        Region 1: I/O ports at 03f4 [size=1]
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374 [size=1]
        Region 4: I/O ports at f900 [size=16]
        Capabilities: [70] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
                Address: 00000000 Data: 0000

Mikel Ward (mikelward) wrote :

I think /proc/scsi/scsi is the simplest way to see that Linux has the order different:

/proc/scsi$ cat scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: ST380817AS Rev: 3.42
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: ST3250820AS Rev: 3.AA
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: ST380013AS Rev: 3.05
  Type: Direct-Access ANSI SCSI revision: 05

Based on the BIOS ordering, scsi1 should probably be ST380013AS rather than ST3250820AS.
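
To compare this listing with the BIOS order at a glance, the host-to-model pairs can be pulled out with awk. A small sketch; the function name scsi_hosts is hypothetical, and only the first word of each model is printed, so models containing spaces would be truncated.

```shell
# Hypothetical helper: print "host model" pairs from /proc/scsi/scsi text.
# Usage: scsi_hosts [file]   (defaults to /proc/scsi/scsi)
scsi_hosts() {
    awk '
        # Remember the most recent host name, e.g. scsi0.
        $1 == "Host:" { host = $2 }
        # On any line, print the word following "Model:" next to that host.
        {
            for (i = 1; i < NF; i++)
                if ($i == "Model:") print host, $(i + 1)
        }
    ' "${1:-/proc/scsi/scsi}"
}
```

Run against the listing above, this prints scsi0 ST380817AS, scsi1 ST3250820AS and scsi2 ST380013AS, one pair per line.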

Mikel Ward (mikelward) wrote :

Even more info than you wanted, but dmesg suggests it's using AHCI mode, which is interesting.

[ 26.020215] ahci 0000:00:12.0: version 2.2
[ 26.020433] ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
[ 26.072845] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 27.025250] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 27.025256] ahci 0000:00:12.0: flags: ncq ilck pm led clo pmp pio slum part
[ 27.026627] scsi0 : ahci
[ 27.027127] scsi1 : ahci
[ 27.027421] scsi2 : ahci
[ 27.027701] scsi3 : ahci
[ 27.028019] ata1: SATA max UDMA/133 cmd 0xffffc20000aa4100 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 22
[ 27.028025] ata2: SATA max UDMA/133 cmd 0xffffc20000aa4180 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 22
[ 27.028030] ata3: SATA max UDMA/133 cmd 0xffffc20000aa4200 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 22
[ 27.028035] ata4: SATA max UDMA/133 cmd 0xffffc20000aa4280 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 22
[ 27.513327] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 27.514711] ata1.00: ATA-6: ST380817AS, 3.42, max UDMA/133
[ 27.514714] ata1.00: 156301488 sectors, multi 1: LBA48 NCQ (depth 31/32)
[ 27.516302] ata1.00: configured for UDMA/133
[ 28.000696] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 28.034734] ata2.00: ATA-7: ST3250820AS, 3.AAE, max UDMA/133
[ 28.034738] ata2.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
[ 28.092961] ata2.00: configured for UDMA/133
[ 28.651854] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 28.654960] ata3.00: ATA-6: ST380013AS, 3.05, max UDMA/133
[ 28.654964] ata3.00: 156301488 sectors, multi 1: LBA48
[ 28.658170] ata3.00: configured for UDMA/133
[ 28.987424] ata4: SATA link down (SStatus 0 SControl 300)
[ 28.986853] scsi 0:0:0:0: Direct-Access ATA ST380817AS 3.42 PQ: 0 ANSI: 5
[ 28.987550] scsi 1:0:0:0: Direct-Access ATA ST3250820AS 3.AA PQ: 0 ANSI: 5
[ 28.988136] scsi 2:0:0:0: Direct-Access ATA ST380013AS 3.05 PQ: 0 ANSI: 5
[ 30.130443] ide0: BM-DMA at 0xf900-0xf907, BIOS settings: hda:pio, hdb:pio
[ 30.130457] Probing IDE interface ide0...
[ 30.206343] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 30.206388] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 30.206437] sd 2:0:0:0: Attached scsi generic sg2 type 0
[ 31.544471] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

TJ (tj) wrote :

Mikel, thanks for those reports, they help clarify your circumstances tremendously. This is just off the top of my head without any real investigation, but the difference in the SCSI addresses between Windows and Linux makes me wonder about how the SCSI addressing is being determined.

For Windows, the three-digit addresses are, I seem to recall, Channel, ID, LUN but the Linux addresses reported in the SCSI reports above show each device as a separate Host, with each Channel, ID, and LUN being 0,0,0.

Linux will arbitrarily order the Hosts based on discovery order so on the face of it, we *seem* to be developing a somewhat plausible explanation for the issue.

Here's a summary of what I'm thinking. SCSI addresses are of the form
Adaptor (host), Channel (bus), ID, LUN.

Windows (? is unknown)
?,0,0,0 Seagate ST380817AS 80 GB HD
?,0,1,0 Seagate ST380013AS 80 GB HD
?,1,0,0 Seagate ST3250820AS 250 GB HD

Linux
0,0,0,0 Seagate ST380817AS 80 GB HD
1,0,0,0 Seagate ST3250820AS 250 GB HD
2,0,0,0 Seagate ST380013AS 80 GB HD

Mikel Ward (mikelward) wrote :

Yeah, the Windows ones are supposed to be (channel, target, lun) or (controller, target, lun).

I would have thought logically they would all have the same controller, but maybe SATA is handled differently.

(0,0,0), (0,1,0), and (1,0,0) would map well to IDE primary master, IDE primary slave, IDE secondary master. Perhaps that's what it's doing as part of the legacy/PATA emulation? The addresses in Windows are almost certainly pretend ones.

Maybe we get different results depending on whether we enumerate the devices via the BIOS or the SATA controller?

Mikel Ward (mikelward) wrote :

These two comments suggest EDD is the right way to construct device.map:

http://kerneltrap.org/node/6408
        "These days, you just cannot enumerate controllers in any meaningful manner.
        I don't think you ever really could, but at least with static hardware,
        any random enumeration was as good as any other."

http://lwn.net/Articles/75923/
        Say Y or M here if you want to enable BIOS Enhanced Disk Drive
        Services real mode BIOS calls to determine which disk
        BIOS tries boot from. This information is then exported via driverfs.

        This option is experimental, but believed to be safe,
        and most disk controller BIOS vendors do not yet implement this feature.

Since the main purpose of device.map is to tell the stage1 GRUB code which disk to look on for the stage2, and the disk address seems to be resolved by the BIOS, we really need to be asking the BIOS for this information.

Revision history for this message
TJ (tj) wrote :

A follow-up to my observations of the SCSI address allocation. It looks as if this is determined in

drivers/ata/libata-core.c::ata_host_register()

where, in part, you find "/* print per-port info to dmesg */"

and in turn calls

drivers/ata/libata-scsi.c::ata_scsi_scan_host()

It appears that each ATA port is assigned as a separate host adaptor which explains the addressing we are seeing.

Regarding your comments about EDD, yes, it is a potential solution. See https://wiki.ubuntu.com/GrubDiskMapSanity

Revision history for this message
TJ (tj) wrote :

If you're feeling particularly adventurous you could build the kernel EDD module.

1. Install the kernel-source from GIT (https://wiki.ubuntu.com/KernelGitGuide) or use

$ sudo apt-get install linux-source
$ cd /usr/src
$ sudo tar -xjf linux-source-2.6.22.tar.bz2
$ cd linux-source-2.6.22

2. Edit debian/config/*/config matching the PC's architecture e.g:

$ sudo sed -i 's/# CONFIG_EDD is not set/CONFIG_EDD=m/' debian/config/*/config

3. Build the kernel package (See https://wiki.ubuntu.com/KernelMaintenance)

$ fakeroot debian/rules binary-arch

4. Install the edd module (not the entire custom kernel)

$ sudo cp drivers/firmware/edd.ko /lib/modules/$(uname -r)/kernel/drivers/firmware/
$ sudo depmod -a
$ modinfo edd

You should see the module's basic details. If you do you can go ahead and load it:

$ sudo modprobe edd

5. Now check sysfs for what EDD reports:

$ ls /sys/firmware/edd/int13_dev8?/
$ cat /sys/firmware/edd/int13_dev8?/mbr_signature

This assumes that the BIOS supports EDD, and that edd has access to libx86
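
Once a kernel with EDD support is running, the sysfs check in step 5 can also be scripted. What follows is only a minimal Python sketch, not part of any Ubuntu tool: the function name read_edd_signatures and its root parameter are made up for illustration, and it assumes each int13_devXX directory holds an mbr_signature file containing a hex value such as 0x94759475.

```python
import os
import re


def read_edd_signatures(root="/sys/firmware/edd"):
    """Return {bios_drive_number: mbr_signature} from the edd sysfs tree.

    Hypothetical helper: 'root' is a parameter only so the function can be
    pointed at a test directory instead of the real sysfs.
    """
    result = {}
    if not os.path.isdir(root):
        return result  # edd not loaded, or the BIOS supplied no EDD data
    for name in sorted(os.listdir(root)):
        match = re.fullmatch(r"int13_dev([0-9a-fA-F]{2})", name)
        if not match:
            continue
        sig_path = os.path.join(root, name, "mbr_signature")
        try:
            with open(sig_path) as handle:
                signature = int(handle.read().strip(), 16)
        except (OSError, ValueError):
            continue  # signature missing or unreadable for this drive
        result[int(match.group(1), 16)] = signature
    return result
```

BIOS drive 0x80 is the one GRUB calls (hd0), 0x81 is (hd1), and so on, so the keys of the returned dictionary give the BIOS order directly.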

Revision history for this message
Mikel Ward (mikelward) wrote :

... or I could boot from a Fedora or SUSE live CD. ;-)

Will do that tomorrow. Cheers.

Revision history for this message
TJ (tj) wrote :

Sometimes it helps to engage brain before putting mouth into gear!

In my edd module-build instructions above I forgot two things:

1. libx86 is for user-space applications; no involvement in the kernel (was thinking about vbetool at the time I typed that!)
2. The built module can't be modprobe-d since the Symbol.map will be different

So you will need to install the custom-kernel-image packages and boot to that kernel to use EDD.

Revision history for this message
Robert Stoffers (robertstoffers) wrote :

My workaround for this, which I have been using for some time, is to edit the GRUB boot line, boot into Ubuntu, and then change menu.lst back to how it should be. The culprit for me is always the following line for each kernel listed:

root (hd0,1)

For whatever reason, after a kernel update all entries are changed to "root (hd1,1)"; changing them back to the above fixes the problem each time.

Perhaps a better approach to updating the menu.lst file is needed before more technical measures are attempted. If the script looked at what was already set and mimicked it for the new kernel instead of guessing every time, this issue would go away.

Revision history for this message
TJ (tj) wrote :

Robert, the solution to your issue is to edit /boot/grub/menu.lst and set groot correctly. This is what is used when the kernel is updated.

## default grub root device
## e.g. groot=(hd0,0)
# groot=(hd0,4)

Revision history for this message
Robert Stoffers (robertstoffers) wrote :

TJ, given that it was commented out, will this stay persistent between kernel updates?

Revision history for this message
TJ (tj) wrote :

Robert, the entries in menu.lst inside the

### BEGIN AUTOMAGIC KERNELS LIST

section that begin with a single comment # are actually used by the updater as settings.

Genuine comments use two ## symbols, just to confuse you.
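
To make the single-# versus double-## distinction concrete, here is a rough Python sketch of how such a setting could be picked out of menu.lst. It is not the code update-grub actually uses; the function name read_groot is invented for illustration, and treating any line starting with "### END" as the end of the section is an assumption.

```python
import re


def read_groot(menu_lst_text):
    """Return the groot setting from menu.lst text, or None if absent."""
    in_section = False
    for line in menu_lst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("### BEGIN AUTOMAGIC KERNELS LIST"):
            in_section = True
            continue
        if stripped.startswith("### END"):  # assumed end-of-section marker
            in_section = False
            continue
        if not in_section or stripped.startswith("##"):
            continue  # outside the section, or a genuine comment
        match = re.fullmatch(r"#\s*groot=(\S+)", stripped)
        if match:
            return match.group(1)  # a single '#' line is a setting
    return None
```

Fed the three lines quoted earlier (inside the AUTOMAGIC section), this returns "(hd0,4)" and ignores the "## e.g. groot=(hd0,0)" example.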

Revision history for this message
richard (richardjones) wrote :

This is STILL an issue in GUTSY and can create an UN-BOOTABLE SYSTEM that's very difficult to debug.

Ubuntu (or grub) simply doesn't have the same idea of disk ordering as the BIOS.

In particular this is what just happened to me with a new Gutsy install.

After installation and reboot the computer just said "Grub stage 1.5" and "Error 15". That error apparently means it couldn't parse a number, which is pretty much a useless error.

After spending about 2 hours on the problem, I realised that grub had installed itself on hd0, according to Ubuntu, but actually the second disk according to the BIOS. The computer was happy to boot from the second disk, but when grub then ran I can only assume it thought it was on the first disk and subsequently read information from that disk which didn't have the grub-stuff installed. And it broke.

I made it all work by physically shifting the disk connections around to make the boot disk the first in the BIOS sequence (but detected as *hd2* by Ubuntu) and reinstalling (just to be sure).

Revision history for this message
Ralph Corderoy (ralph-inputplus) wrote :

Hi Colin, I'm taking the imposition of assigning this to you because many of the duplicates were and it seems to need someone to move it forward. I thought you'd know who that should be. Also, https://wiki.ubuntu.com/Bugs/Importance says an Importance of High is for a bug that "Has a severe impact on a small portion of Ubuntu users". This bug stops systems booting for a small, but growing, number of users. It also says "Makes a default Ubuntu installation generally unusable for some users... for example the system fails to boot". I think that's also true so can you please up the Importance from the default Medium to High. Even if it makes no difference to when the bug is fixed, it would act as a comforter to those of us bitten by it. :-)

http://lwn.net/Articles/12334/ has a good summary of the problem: "x86 systems suffer from a disconnect between what BIOS believes is the boot disk, and what Linux thinks BIOS thinks is the boot disk. This manifests itself in multi-disk systems - it's quite possible to install a distribution, only to fail on reboot - the disk installed to is not the disk BIOS is booting from. Dell restricts our possible standard factory installed Linux offerings to "disks on no more than one controller" to avoid this problem, but mechanisms now exist to solve it and allow such configurations."

This bug is, as predicted, biting not just those doing a fresh install (possibly for the first time; how off-putting!) but also those upgrading from 7.04.

Changed in grub:
assignee: nobody → kamion
Revision history for this message
Billnvd (billnvd) wrote :

My experience with this issue is a little different.

The system is a file server running 6.06 LTS
There are five HD's

Boot and OS disk is attached to the MB Pri Master.
This disk has always been HDA
Partitions are:
hda1 = boot
hda5 = root
hda6 = home

There are four MDRaid disks on a SIL controller.
These disks have always been HDE, F, G, H

The system was removed from service back when the kernel was at - /vmlinuz-2.6.12-10-386.

Today we fired the system up again and did an apt-get update, upgrade, and dist-upgrade.

The default kernel is now - /vmlinuz-2.6.15-29-386

The system now fails to complete a boot cycle, dropping to the shell.
Indicates /dev/hda5 does not exist.

Booting with the older kernel works fine.
Booting with the current kernel indicates that the onboard controller is now located at HDE and the SIL controller is now HDA, B, C, D.

This is a MAJOR problem and a showstopper for Ubuntu. How can anyone trust "Ubuntu Server" when a kernel upgrade changes something like HD detection order and effectively kills a system?

It's not the BIOS, as this system has worked fine since the early release of 6.06. It has been updated several times. Something changed in GRUB or the kernel.

I have another system running Feisty that randomly changes the assigned drive order with a 3ware SATA RAID card. This does not affect booting, but it makes auto-mounting of the array impossible. After every boot cycle we have to check which order has been assigned and manually mount the array.

Revision history for this message
HJMills (hjmills) wrote :

Billnvd:
 If the computer is getting to a shell then GRUB should be ok and it is the OS that is having problems. From the sounds of it you could try using UUIDs instead of relying on device nodes to mount drives. This has been the default in Ubuntu for a while now (though I'm not sure if it was there in 6.06).

Revision history for this message
Tormod Volden (tormodvolden) wrote :

Billnvd, your old kernel 2.6.12-10-386 was a Breezy kernel, so I guess the upgrade to 6.06 was not properly done. File a new bug if you still think there's a bug.

Revision history for this message
Phillip Susi (psusi) wrote :

A number of comments posted here are unrelated to this bug report. If you get to a grub menu, then this bug report is not related to your problem. Having the kernel not be able to find the root filesystem because it changed name and you weren't using the UUID is a separate issue. Devices changing names across kernel upgrades or even just reboots is considered upstream to be not an issue and won't be fixed. You should be using UUIDs.
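
For anyone unsure whether their own system is exposed to the renaming issue described here, one quick check is to look for fstab entries that still name devices directly. The Python sketch below is only an illustration: the function name is invented, and it only recognises /dev/sdX and /dev/hdX style names.

```python
import re

# Matches names such as /dev/sda, /dev/sdb1 or /dev/hda6.
DEVICE_NODE = re.compile(r"^/dev/(sd|hd)[a-z]+\d*$")


def fstab_entries_by_device_name(fstab_text):
    """Return (device, mount_point) pairs that rely on /dev/sdX or /dev/hdX names."""
    fragile = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and DEVICE_NODE.match(fields[0]):
            fragile.append((fields[0], fields[1]))
    return fragile
```

Entries that start with UUID= are left out of the result, so an empty list means no mount depends on the kernel's device naming order.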

Revision history for this message
Dwayne Nelson (edn2) wrote : All caused by installation process failing to configure for UUIDs?

I agree that it is possible (likely even) that the comments here reflect several problems and not just one. As an end-user, I report on symptoms as best I can (i.e. "wont boot after kernel update"), the underlying problems are probably better identified by those more familiar with the structure of the operating system (you?) - symptoms can then be grouped by their underlying causes. In this case, the similarity appears to be that changing the ordering of drive references can make the machine fail to boot.

I gather that the UUID would not change between consecutive boots, but I don't really understand when you say "you should be using UUIDs." The installation process I followed did not ask me whether I wanted to use UUIDs. I could certainly read up on how to make the proposed change, but perhaps an update/fix is still warranted if the installation process should have been configuring to use UUIDs.

Revision history for this message
Paul Dufresne (paulduf) wrote :

I would like to remind developers here that Michael Vogt gave a hint on how to fix this in
https://bugs.launchpad.net/ubuntu/+source/grub/+bug/8497/comments/11: by using the kernel option CONFIG_EDD, which was deactivated because of a bug that is now fixed (http://bugzilla.ubuntu.com/show_bug.cgi?id=8899), it is possible to get the disk order as seen by the BIOS.
Like he said, more info on CONFIG_EDD can be found at:
http://lwn.net/Articles/12334/

This seems a better approach than UUID (according to Michael).
Unfortunately, Michael Vogt is no longer a subscriber to this bug (it has probably become too noisy).

So hope that there is still a developer here to listen.

Please refrain from just giving examples of broken systems; what is needed now is a discussion of how to fix this, not more confirmation that it is broken.

Revision history for this message
Paul Dufresne (paulduf) wrote :

Hmm, I think I read too fast.
It seems that bug #15213 (which was 8899 in the Bugzilla days) was in fact fixed by deactivating CONFIG_EDD.
So the problem would appear to be that the CONFIG_EDD patch is still buggy, causing a long delay at boot.

Revision history for this message
stdPikachu (sdottait) wrote :

Sounds like this might be related to my problem at http://ubuntuforums.org/showthread.php?t=668639. Not only did GRUB install itself to hd2 (needs to be changed to hd0 to boot) but the kernel typically enumerates the SATA drives in a completely different order every time. If fstab wasn't using UUIDs I'd be utterly hosed...

When installed, GRUB picks up the drive precedence correctly from the BIOS, but the kernel doesn't. I'm mulling over writing a bunch of udev rules in order to get a consistently sane device mapping going but would prefer trying out a possible fix first.

As I think I said in the forum, it looks to me like the LiveCD GRUB is basing its detection on what the running kernel has read the drives as, whereas the installed GRUB bases its information on what it gets directly from the BIOS, but the kernel/udev that boots then also wrongly detects the drive order (and in an inconsistent fashion too). Could easily turn into a very nasty problem for people like me with many hard drives or, heaven forbid, people with removable hard drives (again like me ;)).

Still working my way through this thread so apologies if I've repeated something.

Revision history for this message
Thomas Ene (thomas-ene) wrote :

I'm having the same problem:

Configuration: 2 disks (IDE and SATA). The BIOS sets the IDE disk as the first disk (the default option) and I've changed this to the SATA one. I guess that's why grub sets the hd0 disk as the disk it will boot from. This is quite frustrating and takes some trial and error before the problem is pinned down.

/dev/sdb is the boot disk (hd1 is the correct one)

root@ubuntu:/boot/grub# cat device.map
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/sdc

root@ubuntu:~# fdisk -l
Disk /dev/sda: 200.0 GB, 200049647616 bytes
Disk /dev/sdb: 250.0 GB, 250059350016 bytes
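
Since several comments come down to comparing device.map against what the BIOS really does, a small parser can help when checking many machines. This is a hedged Python sketch rather than anything shipped with grub: parse_device_map is a made-up name, and treating '#' as the start of a comment is an assumption about the file format.

```python
import re


def parse_device_map(text):
    """Parse device.map text into {grub_drive: linux_device}."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = re.fullmatch(r"\((\w+)\)\s+(\S+)", line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping
```

With the three-line device.map above, the result maps hd0 to /dev/sda, hd1 to /dev/sdb and hd2 to /dev/sdc, which makes it easy to see that the boot disk /dev/sdb is recorded as hd1.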

Changed in grub:
status: Unknown → New
Changed in grub:
status: New → Fix Released
Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Over in SUSE SLES/SLED land EDD is compiled in but not used by default - http://www.novell.com/documentation/sled10/readme/release_notes_sp1.html#b554l8b . My thinking is that if newer devices are getting EDD right perhaps it's going to be time for another BIOS year cutoff and those devices past the year default to using EDD. Such an idea is far too late for Hardy and would require a list of which devices get it right and which get it wrong along with their BIOSes.

Revision history for this message
Colin Watson (cjwatson) wrote :

I now have a grub patch that I think shuffles the devices around appropriately provided that you boot with edd=on, provided that EDD works on your hardware, and provided that the MBR signatures exposed by EDD are distinct - pretty much the same conditions as in SuSE. We can't make this the default, I don't think, but we can make it fairly easy (i.e. accessible from the CD boot menu, documented in release notes, and such) for people to use it if grub's normal autodetection gets it wrong.
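
The general idea of ordering disks by their EDD-reported MBR signatures can be sketched as follows. This is not the patch itself, which is not reproduced in this report; it is a simplified Python illustration under stated assumptions: build_device_map and both of its arguments are hypothetical, and BIOS drive numbers are assumed to start at 0x80.

```python
def build_device_map(edd_signatures, disk_signatures):
    """Order Linux disks by BIOS drive number using MBR signatures.

    edd_signatures: {bios_drive_number: signature}, e.g. {0x80: 0x94759475}
    disk_signatures: {linux_device: signature}, e.g. {"/dev/sdb": 0x94759475}
    Returns a list of "(hdN) /dev/..." lines, or None when the signatures
    cannot identify the disks unambiguously.
    """
    edd_values = list(edd_signatures.values())
    disk_values = list(disk_signatures.values())
    # Every signature must be distinct on both sides, otherwise give up.
    if len(set(edd_values)) != len(edd_values):
        return None
    if len(set(disk_values)) != len(disk_values):
        return None
    by_signature = {sig: dev for dev, sig in disk_signatures.items()}
    lines = []
    for drive in sorted(edd_signatures):
        device = by_signature.get(edd_signatures[drive])
        if device is None:
            return None  # a BIOS drive with no matching Linux disk
        lines.append("(hd%d) %s" % (drive - 0x80, device))
    return lines
```

Returning None instead of a partial map mirrors the conditions listed above: if EDD data is missing or the signatures are not distinct, the caller would fall back to the normal autodetection.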

Changed in grub:
status: Confirmed → In Progress
Colin Watson (cjwatson)
Changed in grub:
importance: Medium → High
status: In Progress → Fix Committed
Revision history for this message
Colin Watson (cjwatson) wrote :

gfxboot-theme-ubuntu (0.5.14) hardy; urgency=low

  * Update translations from Launchpad.
  * Add edd=on option to the "Other Options" menu.

 -- Colin Watson <email address hidden> Thu, 03 Apr 2008 10:23:24 +0100

Changed in gfxboot-theme-ubuntu:
assignee: nobody → kamion
status: New → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub - 0.97-29ubuntu20

---------------
grub (0.97-29ubuntu20) hardy; urgency=low

  * debian/patches/edd-device-map.diff: Use EDD information if available to
    help generate a more correct device.map; boot with edd=on to activate
    this (LP: #8497).
  * Fix a ".bar" -> ".br" typo in update-grub(8).

 -- Colin Watson <email address hidden> Thu, 03 Apr 2008 11:14:04 +0100

Changed in grub:
status: Fix Committed → Fix Released
Revision history for this message
Colin Watson (cjwatson) wrote :

Sitsofe: unfortunately I don't think it's even as simple as a date-style cutoff. My understanding is that it still depends on your manufacturer to a large extent. Dell systems generally seem to get it right as they created the specification in the first place, and some other systems will support it; beyond that I don't think we have enough information to say.

Revision history for this message
Sitsofe Wheeler (sitsofe) wrote :

Colin:
That's a pity. If you can only use it on a case by case basis (because some BIOSes are still getting it wrong) any automatic activation will require the building of a whitelist which sounds extremely painful...

Revision history for this message
Ralph Corderoy (ralph-inputplus) wrote :

Colin, one has to boot with edd=on for this to be activated, as you say, and it presumably defaults to off for good reason: "This option is experimental and is known to fail to boot on some obscure configurations." says drivers/firmware/Kconfig. However, I'd guess that grub only makes use of the MBR's four-byte Windows NT signature at offset 440 in the MBR, which is published in /sys/.../int13_dev80/mbr_signature.

Since reading the MBR is done in arch/x86/boot/edd.c:read_mbr() by using INT 0x13 AX=0x201 (read legacy sector) it will work without doing any EDD-specific stuff which may stop booting on "some obscure configurations". If query_edd() in the same file didn't avoid reading the MBRs when (!do_edd && do_mbr) then the data could still be published by /sys for grub to use.

This would allow EDD to not be done by default, avoiding boot problems on some machines, but BIOS device 0x80's MBR's EDD ID would be available to grub on all machines without the user first having to suffer problems, then find out that the resolution is to add edd=on to the kernel's parameters.

What do you think? The current fix of insisting EDD be done before four bytes of the MBR can be published seems limiting and allows more users to be pestered by this issue in the future instead of them never knowing it existed.
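
For reference, the four-byte signature at offset 440 can be read from a running system without any EDD support. The Python sketch below is purely illustrative: read_mbr_signature is an invented name, and reading a real disk such as /dev/sda would need root privileges.

```python
import struct

MBR_SIGNATURE_OFFSET = 440  # offset quoted in the comment above
MBR_SIGNATURE_LENGTH = 4


def read_mbr_signature(path):
    """Return the four-byte signature of the disk or image at 'path' as an int."""
    with open(path, "rb") as disk:
        disk.seek(MBR_SIGNATURE_OFFSET)
        raw = disk.read(MBR_SIGNATURE_LENGTH)
    if len(raw) != MBR_SIGNATURE_LENGTH:
        raise ValueError("%s is too short to contain an MBR signature" % path)
    # The signature is stored little-endian on disk.
    return struct.unpack("<I", raw)[0]
```

The value returned should correspond to the "Disk identifier" that fdisk -l prints for the same disk.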

Revision history for this message
Ralph Corderoy (ralph-inputplus) wrote :

Having spoken to Matt Domsch, the author of the kernel's EDD support, it seems it's the reading of the MBR that typically causes booting problems when the drive being read doesn't exist. So there's no point doing just the MBRs and ignoring the advanced BIOS calls in the hope that it could be enabled by default. I've suggested to Matt that maybe 0x0040:0x0075 (the BIOS's opinion on the number of hard drives) could be read and used as a limit if it looks sane. Anything we can do to increase the number of machines EDD can run on without causing problems would help.

BTW, when grub is examining the MBR signatures and realises some aren't unique, it doesn't issue any diagnostic to that effect. It's possible the user would be happy to make them unique using, e.g., fdisk's expert `i' command, but it should be drawn to their attention. Often they're 0x00, or sometimes a drive is copied from another, resulting in two MBR signatures being the same for evermore.
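
A diagnostic of the kind suggested here could look roughly like the following Python sketch. It is not something grub does today; signature_warnings is a made-up name, and its input is assumed to be a dictionary of device name to signature collected by some other means.

```python
from collections import defaultdict


def signature_warnings(disk_signatures):
    """Return human-readable warnings for zero or shared MBR signatures."""
    groups = defaultdict(list)
    for device, signature in disk_signatures.items():
        groups[signature].append(device)
    warnings = []
    for signature in sorted(groups):
        devices = sorted(groups[signature])
        if signature == 0:
            warnings.append("no MBR signature set on: " + ", ".join(devices))
        elif len(devices) > 1:
            warnings.append(
                "MBR signature 0x%08x shared by: %s" % (signature, ", ".join(devices))
            )
    return warnings
```

An empty list means every disk has a distinct, non-zero signature; anything else tells the user which disks would need a new identifier before signature matching can work.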

Revision history for this message
Michael Rooney (mrooney) wrote :

Hello, is this really fixed? I came here from https://wiki.ubuntu.com/WubiGuide and had this error today installing Ubuntu 8.04 with Wubi. All went well until the second restart (the one after installation from within Ubuntu), and I got Error 15 on all entries including Windows, which was quite scary, and a pretty nasty bug. I had to follow the instructions: """You have to change all the lines "root (hdX,Y)/ubuntu/disks" to "root ()/ubuntu/disks". Also edit the line that starts with "#groot=(hdX,Y)/ubuntu/disks" to "#groot=()/ubuntu/disks".""" After that it worked fine, luckily I was able to discover this, but for someone not already familiar with Ubuntu this could be pretty traumatic.

That Wubi wiki page says this is the bug for that issue, so either that is wrong or this issue isn't fixed (or there is some third option I don't know about :) What's up?

Revision history for this message
Agostino Russo (ago) wrote :

Mike, the fix you mentioned is specific to Wubi, since Wubi uses a slightly different bootloader/bootloader configuration (grub4dos). That fix will soon be available, see: https://bugs.launchpad.net/wubi/+bug/217348

Revision history for this message
Jesse (jesse-2nda) wrote :

Running Gigabyte GA-81EXP with 2 channels of onboard IDE and channels of Promise IDE has this same problem. Up until last night I was running 7.10 and the controllers booted up randomly. If they came up backwards I could just reboot a couple of times until everything mounted correctly. Upgraded to 8.04 last night and now it's permanently backwards. The first drive on the system comes up as /dev/hdc, but only because I have two drives on the Promise controller. The moment I add another drive the boot disk will become /dev/hdd, and so on. This is a major screwup.

Revision history for this message
bytor3262 (watkins-dean) wrote :

Thanks to some of your help I think I can fix the problem, but I have used a couple of versions of Fedora and they simply give you an option to change the drive order during installation.

Revision history for this message
Steven Clark (davolfman) wrote :

Here's a workaround at least for my case without any severe modification of the system.
1: Make sure the bootloader is installed on the boot HD, whether it works right or not. Install it manually if you have to, using the find, root, and setup commands in grub from the live CD (I'm not putting those instructions here myself, but they're easy to find).
2: Boot to the bootloader, which will fail because the menu was configured wrongly. Use the GRUB command line and the built-in editor to find out what the real hd#s are, using trial and error for Windows (I recommend changing root to rootnoverify and deleting the maps if Windows is on hd0) and "find /boot/grub/stage1" for Ubuntu. Write these down.
3: Use GRUB's built-in editor to temporarily fix the boot menu option for Ubuntu and boot it.
4: In Ubuntu, edit /boot/grub/menu.lst and manually fix the boot menu options to use the hd#s as they are at boot.
5: Manually install the GRUB bootloader as in 1. Ignore the fact that the hd#s are different from what they are at boot; menu.lst is already fixed, and this is just to get the bootloader updated.
6: Reboot and test.

Here's my own config: Asus A7N8X-E with an add-on PATA card (I'm too lazy to find out which). The windows install is on 5 partitions across 2 drives:

Onboard PATA controller:
hd0 in GRUB hd2 in Ubuntu
1 Windows Install
2 Programs

Second PATA controller
hd1 in GRUB hd0 in Ubuntu
1 windows pagefile
2 documents and settings mapped directory
3 file storage

hd2 in GRUB hd1 in Ubuntu
Onboard 3rd-party SATA fakeRAID controller (in non-RAID)
1 linux
2 swap

Revision history for this message
Gordon Hughes (stobo-attglobal) wrote :

I am not sure whether this bug is supposed to have been fixed or not, but it is still present in the main release of 8.04 and I can document pretty much exactly what happens.

I have an Asus P4C800-E motherboard with 2 identical 160 GB Samsung disks (D1 & D2) attached to the main SATA controller, and 2 identical 300 GB Samsung disks (D3 & D4) attached to the Promise SATA controller. The pair attached to the Promise SATA controller are configured as a RAID 1 (mirrored) array. The BIOS is set up to boot from D1 with a boot order D1, D2 & D3/D4. This worked fine with Ubuntu 7.10, except that the RAID array was ignored. I had Ubuntu installed on D2 and Windows XP on D1 with the grub boot loader in the boot record of D1.

When installing Ubuntu 8.04 either grub or Ubuntu identifies the disks in a different order - D3=(hd0)=/dev/sda, D4=(hd1)=/dev/sdb, D1=(hd2)=/dev/sdc, and D2=(hd3)=/dev/sdd. I have installed Ubuntu in the primary partition of D2. Whether through the automatic installation or manually, grub insists on identifying this partition as (hd3,0) and the menu.lst file reflects this. The automatic installation puts the grub boot loader in the mbr of (hd0), i.e. D3. Manually, I can put the grub boot loader in the mbr of (hd2), i.e. D1. Now, I have the grub boot loader on both D1 and D3, but it insists on looking for Stage 2 on partition (hd3,0). As a result it falls over reporting Error 21 - because the BIOS is telling it that (hd3,0) doesn't exist as D3 & D4 are mirrored as (presumably) (hd2). This is the outcome when I leave the boot order as D1, D2, D3/D4 or if I change it to D3/D4, D1, D2.

I have no doubt that if I sacrifice the RAID array, then things can be made to work properly, but only at the cost of losing my disk mirroring. The previous advice about editing the menu.lst file doesn't help, because the basic problem lies in the setup that is written to the mbr because of the misidentification of the disk array & order by grub.

I should emphasise that this appears to be a generic grub problem as I have encountered exactly the same problem in trying to install both Mandriva 2008.1 and OpenSUSE 11.0. All of them create an erroneous boot loader that falls over in one way or another - sometimes the error number is 21, sometimes it is 22 - as well as a version of menu.lst that would have to be edited.

As additional information, I am not using WUBI to install Ubuntu and I can replicate the behaviour. Finally, as a matter of considerable frustration, the problem was introduced when I tried to upgrade from my 7.10 setup to 8.04, with the result that a properly functioning setup has been completely destroyed. This is the worst aspect and, in my view, is simply inexcusable since upgrades should never render a system unusable. (Of course, I do have a backup of my user files.)

Revision history for this message
John Gelm (jgelm) wrote :

I am experiencing randomized device ordering. I am using Ubuntu 8.04 64-bit and all updates have been applied.

Yesterday, 2008-09-19, my devices were:
root@voyager:/boot/grub# fdisk -l|grep Disk
Disk /dev/sda: 120.0 GB, 120034123776 bytes <---------IDE0 100% Windows
Disk identifier: 0xc8e067c5
Disk /dev/sdb: 500.1 GB, 500107862016 bytes <---------SATA0 U8.04
Disk identifier: 0x94759475
Disk /dev/sdc: 500.1 GB, 500107862016 bytes <---------SATA1 U7.04
Disk identifier: 0x00027610

...and today, 2008-09-20, after a cold boot my devices are:
root@voyager:/boot/grub# fdisk -l|grep Disk
Disk /dev/sda: 500.1 GB, 500107862016 bytes <---------SATA0 U8.04 BIOS , was sdb
Disk identifier: 0x94759475
Disk /dev/sdb: 500.1 GB, 500107862016 bytes <---------SATA1 U7.04 , was sdc
Disk identifier: 0x00027610
Disk /dev/sdc: 120.0 GB, 120034123776 bytes <---------IDE0 100% Windows, was sda
Disk identifier: 0xc8e067c5
root@voyager:/boot/grub#

My BIOS boot drive is SATA0=U8.04.

Yesterday I set up:
root@voyager:/boot/grub# cat device.map
#@# per fdisk -l and hardware: ide0=sda Windows, sata0=sdb U8.04, sata1=sdc U7.04
#@# the boot/root drive in menu.list is hd0=sata0=sdb U8.04
(hd2) /dev/sda <- today sdc
(hd0) /dev/sdb <- today sda
(hd1) /dev/sdc <- today sdb
root@voyager:/boot/grub#

... and today it is all wrong!

As a test, I ran update-grub and all hd0 were changed to hd2; my Windows disk!

Could you please advise me on what I am doing wrong?

Respectfully;
John Gelm

Revision history for this message
Neil Jeffery (neilneil2000) wrote :

I have a similar problem, in both 8.04 and 8.10b

My menu.lst always sets the root as (hd2,x) but it should be (hd0,x)

Please let me know if anyone needs any more info from me, I will be glad to help.

Revision history for this message
Evan (ev) wrote :

This is fixed for new Ubuntu 8.10 installs using the latest daily-live CDs:

http://cdimage.ubuntu.com/daily-live

or for Kubuntu:

http://cdimage.ubuntu.com/kubuntu/daily-live

Revision history for this message
raboof (arnouten) wrote :

Once you have an incorrectly-generated /boot/grub/menu.lst, 'update-grub' will still take defaults from that.

Then, to get rid of this problem, simply remove (or just move) /boot/grub/menu.lst and have 'update-grub' generate an entirely new one.

Revision history for this message
James Howison (james-howison) wrote :

I believe that this situation still exists with 8.04 (which, as the LTS release, I'm trying to install). I don't have a CD drive, so I'm installing from a USB key. The BIOS on the Sun x2200 sees the USB key as a type of hard disk. During install it is registered as /dev/sda, with the (single) SATA disk as /dev/hdb.

Of course after I remove the USB key the order of the disks changes in the BIOS (since the usb key is not present). This results in the Grub Error 22. Rescue mode grub reinstall does nothing to fix this.

Will the fix discussed in #90 be backported to 8.04?

Revision history for this message
Jan Ivar Beddari (beddari) wrote :

I don't think this will be fixed ..

The new LTS will most likely use the "search-for-disk-with-uuid-x and set that as ROOT if found" mechanism from the new grub releases. Easiest way for now is to first live edit the config at first regular boot after install (set root(x,y) to whatever you need it to be), then delete /boot/grub/menu.lst and run update-grub again ..

.. and don't change the BIOS boot order from that point onward ..

But this solution doesn't scale very well :-)

Revision history for this message
ahahum (ahahum) wrote :

This does not appear to be fixed in the 9.10 ISO I downloaded and installed this week. I have tried to install it from a CD and from a USB stick, but both fail to boot after the initial restart at the completion of setup.

The hardware I'm installing on doesn't have an onboard CD so it's impossible for the boot order to remain the same as during setup. I've tried tweaking it every way possible and removing all bootable options except the HDD and I'm still in the same boat.

Any ideas? I'd like to get this going.

Thank you.

Adam
