Feisty beta1 raid is broken

Bug #96511 reported by marcw
Affects: mdadm (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Brian Murray
Nominated for Feisty by John Williams

Bug Description

Having just built a nice dev server I thought I would spend some time with Feisty beta1. Installation was a breeze (it correctly detected the onboard NIC that Dapper and Edgy didn't) until it got to partitioning.

I have 4 individual SATA drives (no fakeraid). I wanted a RAID5 and RAID1 combination looking like this:
md0 = RAID1 - sda1, sdb1, sdc1, sdd1 - /boot
md1 = RAID5 - sda2, sdb2, sdc2, sdd2 - /
md2 = RAID5 - sda3, sdb3, sdc3, sdd3 - swap
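
(For reference, creating that layout by hand with mdadm would look roughly like the sketch below - device names as above, untested, and not how the installer actually does it:)

mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
mkfs.ext3 /dev/md0    # /boot
mkfs.ext3 /dev/md1    # /
mkswap /dev/md2       # swap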

Getting the installer to do this was a real chore (see bug 95704). But after working around the installation issues it was finally installed. Now it will only occasionally boot correctly - maybe 1 out of 10 times. Most of the time it can't find something it needs, loads Busybox, and lands at an initramfs prompt.

Just to make sure I wasn't doing anything wrong I wiped the drives and MBRs and installed both Dapper and Edgy Server edition. Using the same partition roadmap as Feisty, both installed perfectly with none of these issues.

Since this is a dev box for now, I am free to experiment. Please advise what kind of information is needed.

Tags: iso-testing
Revision history for this message
Brian Murray (brian-murray) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better. This sounds like another bug regarding md and sata devices. Please try booting Feisty with the kernel option 'break=mount' and assembling the RAID devices manually and let us know the results. Thanks in advance.

Revision history for this message
marcw (marcw) wrote :

Thanks Brian. I'd be happy to, but can you write, or point me to, a little step-by-step of what you're asking?

Revision history for this message
Brian Murray (brian-murray) wrote :

In your '/boot/grub/menu.lst' file you will want to modify the default kernel boot options:

## additional options to use with the default boot option, but not with the
## alternatives
## e.g. defoptions=vga=791 resume=/dev/hda5
# defoptions=break=mount

and add 'break=mount' as I have done. After that you will need to execute update-grub as root, so 'sudo update-grub'. During the boot process you will be dropped to a busybox shell where you can reassemble your RAID array. I believe the command would look like 'mdadm --assemble --auto=yes /dev/mdX', where X identifies the array. However, you should confirm that by looking at the man page for mdadm.
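
(Put together, the whole procedure should go roughly like this - a sketch only, the exact prompts may differ:)

# 1. Add break=mount to the '# defoptions=' line in /boot/grub/menu.lst as shown
#    above, then regenerate the boot menu:
sudo update-grub
# 2. Reboot. The boot should now stop at a busybox (initramfs) shell just before
#    the root filesystem is mounted.
# 3. From that shell, try assembling an array by hand and check its state:
mdadm --assemble --auto=yes /dev/md0
cat /proc/mdstat
# 4. Exit the shell (Ctrl-D) to let the boot continue.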

Revision history for this message
marcw (marcw) wrote :

Ok, thanks for those instructions. Unfortunately, I don't believe anything good came of it.

First off, trying to get a normal prompt is more difficult than I originally said; it's probably closer to 1 out of 30 reboots that will get me to a normal login prompt.

Secondly, I assume the busybox shell you refer to in your instructions is the same one I normally(?) get on the other 29/30 reboots. It's followed by the initramfs prompt.

Lastly, regarding the mdadm assembly instructions you provided: I tried the assembly before doing anything else just to see what it said. It gives me this:
mdadm: device /dev/md0 already active - cannot assemble it

I get the same thing with md1 and md2 as well. Then (after eventually getting a prompt I could work with) I followed your instructions with grub. This generated the *exact* same mdadm results as before.
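
(For reference, an array that mdadm reports as already active can be inspected, and stopped first if it really does need re-assembling - a rough sketch, using md0 as the example:)

cat /proc/mdstat                # which arrays are assembled, and in what state
mdadm --detail /dev/md0         # member devices, whether the array is degraded
mdadm --stop /dev/md0           # only if it genuinely needs re-assembling
mdadm --assemble --auto=yes /dev/md0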

Sort of makes me wonder if I've got a different problem than others are reporting.

Revision history for this message
BullCreek (jeff-openenergy) wrote :

I spent part of this afternoon battling what I think is the same thing, also using the Feisty beta. I'm quite familiar with software RAID on Linux, and have a few comments:

1. It seems that unlike in older distros, /etc/mdadm/mdadm.conf in Feisty is fairly important. Without adding the ARRAY entries to it, my arrays wouldn't reliably show up each time I booted. This never seemed necessary with older CentOS or Red Hat distros, nor with Edgy. I'm not complaining, just mentioning it in hopes of helping others.
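
(The ARRAY lines don't have to be typed by hand - something like the following should do it, though treat it as a sketch and double-check the result:)

# append ARRAY definitions for the currently assembled arrays
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# rebuild the initrd so the early-boot copy of mdadm.conf is updated too
update-initramfs -u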

2. I'm a bit ashamed of my setup - trying to get more mileage out of junk, basically. The MB is an Asrock 775 DualVSTA - which has the VIA PT880 chipset - plus two Promise 100TX2 cards, for a total of six PATA drives. The two drives attached to the two PATA ports on the mobo show up as /dev/hda and /dev/hdc respectively, and the four drives attached to the two TX2s show up as /dev/sd[a-d]. My RAID setup is 6 identical 200GB drives:

/dev/hd[ac]1 -> /dev/md0 (RAID1) -> /boot
/dev/sd[a-d]1 -> striped swap
/dev/hd[ac]2 + /dev/sd[a-d]2 -> /dev/md1 (RAID5) -> /

I don't use the alternate install disk - instead I opt to just use the desktop install on /dev/hda, then manually partition and set up RAID on the command line, similar to http://lists.centos.org/pipermail/centos/2005-March/003813.html. I've used this technique many times on a wide array of hardware and distros, and it has always worked for me.

The problem I see is that when I boot off the RAID1, it wigs out and I get dumped into the BusyBox initramfs prompt. I know that the initramfs has to have the modules required to mount the root filesystem in it - and I've made sure of that. My problem now is that I seem to be able to get either the Promise controllers to show up or the onboard VIA, but not both at the same time, which of course prevents both RAID arrays from working/booting properly.

The weird thing is that if I switch hda and hdc back around again to boot normally without RAID, everything returns to working and the drives all show up properly. I don't understand this, particularly because the initrds are identical in both cases.

It seems like Feisty has a lot of changes in how IDE devices are handled. Can anyone give me any hints on additional things to try, and also tell me how the kernel determines what order to load the modules in? It seems like in the case of root=/dev/md0 it must be loading them in a different order than with root=/dev/hda1 - and this may be causing a conflict.
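
(One thing that might be worth trying, though it's only a guess on my part: initramfs-tools lets you force specific modules to be loaded early, in a fixed order, by listing them in /etc/initramfs-tools/modules. A sketch of the idea:)

# on a working (non-RAID) boot, note which disk controller modules are loaded:
lsmod | grep -i -e pata -e sata -e ide
# list those module names, one per line and in the desired order, in
# /etc/initramfs-tools/modules, then rebuild the initrd and reboot:
update-initramfs -u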

Revision history for this message
Brian Murray (brian-murray) wrote :

Bullcreek - have you tried booting with the kernel option 'break=mount' and manually assembling your RAID array?

Revision history for this message
BullCreek (jeff-openenergy) wrote :

I did, but I still get the same thing - I think because break=mount doesn't come into play until later in the initrd.

FWIW, I reinstalled the first disk with yesterday's daily build, and I still get the same thing (namely, when I boot off RAID, /dev/hda* and /dev/hdc* don't show up at all in /proc/partitions, which of course keeps the RAID arrays from properly assembling). The partitions on the two Promise controllers show up fine.

Finally it dawned on me: why not look at /proc/kmsg and compare the RAID versus non-RAID boots.
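
(Reading /proc/kmsg directly blocks, which is part of the trouble described below; a less painful comparison would be roughly the following sketch - note that /tmp is lost at reboot, so the files need to be copied somewhere persistent in between:)

dmesg > /tmp/dmesg-raid.txt      # on the boot that comes up off RAID
dmesg > /tmp/dmesg-plain.txt     # on a normal non-RAID boot
diff /tmp/dmesg-plain.txt /tmp/dmesg-raid.txt | less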

A few problems showed up:

1. It appears that in the RAID setup, the kernel hangs trying to initialize the onboard vt8237a controller that hda and hdc are attached to (whose driver is built into the Feisty kernel). I know this because if I do 'more /proc/kmsg' that is as far as it goes (and in fact, when it gets to the last line, more seems to hang, locking the whole initramfs session and requiring a hard reboot).

2. A related bug/feature that shows up in either boot case is that mdadm tries to assemble the arrays twice. It seems that it tries first when all the fake SCSI devices are found, and again later when it finds some IDE devices that also happen to be part of the array. Maybe mdadm originally wasn't intended to work with both IDE and SCSI drives in the same array (but now it seems it must, because of the way modern kernels blur the distinction between IDE/SATA/SCSI drives)? This seems to be nonfatal, but it results in unnecessary starting/stopping of the array(s) during boot to add the additional member drives.

I'm no kernel guy, but it seems there is a conflict somewhere between the modules involved and the built-in vt8237a driver that booting with root=/dev/md0 versus root=/dev/hda1 somehow triggers. I'm thinking about posting a link to this thread on the mdadm mailing list unless someone here has some bright ideas.

P.S. It also appears intermittent, similar to what the first poster was saying. 99% of the time when booting off RAID, I get /dev/sd[a-d] and no /dev/hda or /dev/hdc, but one time I happened to notice that it was reversed - no sd[a-d], but I did get /dev/hda and /dev/hdc.

Revision history for this message
BullCreek (jeff-openenergy) wrote : Re: [Bug 96511] Re: Feisty beta1 raid is broken

I had a XenExpress 3.2 CD handy to boot on this same machine, and one thing I noticed is that it assigns the same hardware/drives significantly differently.

XenExpress 2.6.16.38:
hd[a-d] -> Promise Controller 1
hd[e-h] -> Promise Controller 2
hd[i-j] -> Onboard VIA controller

Feisty 2.6.20.13:
hd[a-d] -> Onboard VIA controller
sd[ab] -> Promise Controller 1
sd[cd] -> Promise Controller 2

The board's BIOS has an option to set the hard disk order, and it is set as follows:

onboard master: 1st
onboard slave: 2nd
promise1 master: 3rd
promise1 slave: 4th
promise2 master: 5th
promise2 slave: 6th

Don't know if this helps, but I thought I would provide it as additional info. Is there a chance that the code in the newer Feisty kernel that reads the BIOS drive order is provided as a module and somehow isn't included in my initrd image? If so, what is the name of that module, please?

Revision history for this message
BullCreek (jeff-openenergy) wrote :

I think I may have found it. Attempting to boot from RAID, /proc/ide/ide0/hda/driver is "ide-default version 0.9.newide" whereas booting successfully via non-RAID, /proc/ide/ide0/hda/driver is "ide-disk version 1.18".

Googling "ide-default ide-disk" reveals some bugs/race conditions that were submitted as a patch back in 2006, but I don't know if the patch was accepted or perhaps something got broken again.

Hopefully this will be enough to fix it. Let me know if there is anything else I can provide.

Revision history for this message
Brian Murray (brian-murray) wrote :

There have been some recent updates to udev and mdadm that may resolve this issue. Could you please try updating your system and rebooting? Thanks again.
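
(Something along these lines should pull in the new packages and make sure the initrd gets regenerated - a sketch, assuming the default kernel is the one being booted:)

sudo apt-get update
sudo apt-get dist-upgrade
# regenerate the initrd for the running kernel so it picks up the new mdadm/udev bits
sudo update-initramfs -u
sudo reboot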

Revision history for this message
BullCreek (jeff-openenergy) wrote :

Tried it, no joy. If I get time tomorrow I'm going to experiment with manual changes to the initrd to see if I can work around the module problem that way. I looked, and while I thought I remembered an mdadm mailing list, it seems my memory was faulty or it no longer exists. If you have any other ideas for where I might post this for answers, I'm all ears. I've invested enough time in it now that I'd like to get it working.
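
(In case it saves someone else a search, unpacking and repacking an initrd by hand goes roughly like this - a sketch, with paths and kernel version assumed:)

mkdir /tmp/initrd-work && cd /tmp/initrd-work
zcat /boot/initrd.img-$(uname -r) | cpio -id
# ...edit conf/modules, scripts/, etc. as needed...
find . | cpio -o -H newc | gzip -9 > /boot/initrd.img-$(uname -r).test
# then point a spare grub menu entry at the .test image for a trial boot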

Revision history for this message
marcw (marcw) wrote :

Brian,
I started from scratch tonight, building the arrays again as I spec'd in the first post, using the beta1 server install disk. After a successful install and a few reboots I was able to get a normal prompt, from which I performed an apt-get dist-upgrade. I noticed this time that I was prompted in the middle of the upgrade to choose which arrays were to become active. It defaulted to "all" and that's the way I set it.

However, after a reboot, it once again fails.

Just to make sure I'm clear about the problem, when it fails - which is most of the time - here's an approximation of the screen info:

Loading please wait
mdadm: no devices listed in conf file were found
3-4 modprobe usage statements
4 mount statements: no such file or dir
target filesystem doesn't have /sbin/init
Busybox info
can't access tty: job control turned off
(initramfs) prompt

Even on those rare occasions when it actually does boot successfully, I still see the "mdadm: no devices listed..." line.

When it fails it always looks pretty much like the screen approximation (above) dumping me at the initramfs prompt. From there, /proc/mdstat always says that my 3 arrays are active.

As an aside, if I'm understanding many of the other RAID bug posters, they *have* to insert a break=mount statement in order to arrive at the initramfs prompt. That has never been the case with this machine. I've *always* been dumped to an initramfs prompt after installing, both before and after the dist-upgrade.

Revision history for this message
marcw (marcw) wrote :

I tried the 04/06 (or was it 04/05?) daily build of the AMD64 Ubuntu Server just to see if there's been any improvement. No change. Still broke.

Then just on a lark, and because I didn't have anything better to do, I loaded up the Debian Etch Rc2 AMD64 netinstall daily build (04/06) just to see what would happen. Using the exact same partition and array design as in my original post, it installed fine and booted great the first time and all subsequent reboots. I have no idea if this information is helpful or not. I did notice that their kernel build is 2.6.18-4.

Revision history for this message
marcw (marcw) wrote :

Well, it was time to try something else. So I grabbed the daily build (04/09) of the i386 Ubuntu Server instead of my usual AMD64. Unfortunately I had exactly the same luck with that one as I've had with the others.

Revision history for this message
marcw (marcw) wrote :

Tried the 04/10 alternate daily AMD64 tonight hoping that something had been fixed. Nope, still broke. This isn't looking good for using Feisty for my server.

Revision history for this message
marcw (marcw) wrote :

Trying once again, I grabbed the latest AMD64 Server daily (04/11.1), but to no avail - it still fails. This time I'm including screen photos in case they can help diagnose whatever the heck is happening.

The first 8 prints include the extent of the terminal buffer up to the initramfs prompt.
Then an immediate mdstat output
Then 4 pix of my computer setup.

Just to recap, what consistently fails in Feisty (see original bug post) is successful in
Dapper
Edgy
Debian Etch

New Asrock ALIVENF6G-DVI motherboard with latest bios
New Seagate 320GB sata2 (4 total)
New 1GB DDR2 mem (2 total) - dual channel

Revision history for this message
marcw (marcw) wrote :

Not positive if the package should be mdadm or not.

Revision history for this message
BullCreek (jeff-openenergy) wrote :

I agree. I don't really think the problem is with mdadm, but instead with the order in which the disk and controller drivers are loaded in Feisty. Don't know what module controls that, though. Hopefully this bug will be addressed before release!

Revision history for this message
John Williams (jswillms) wrote :

I am seeing this problem as well...

To continue the boot, I mount -t ext3 /dev/md0 /root and hit Ctrl-D, and then the system boots. I installed the alternate beta and installed all of the updates. The mdadm update ran an update-initramfs command to update the initrd.img files, but it still fails in the same way. I was hoping that the dailies would be better, but based on the above comments that appears not to be the case.
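
(In other words, the manual workaround from the (initramfs) prompt boils down to roughly this sketch - assuming /dev/md0 is the ext3 root array, as in my setup:)

mdadm --assemble --scan          # only if the arrays are not already active
mount -t ext3 /dev/md0 /root     # mount the real root where the initramfs expects it
exit                             # Ctrl-D; the boot then continues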

I see that it complains about a missing /etc/fstab. Could that be the problem? How can it mount root if it does not know where it is?

John

Revision history for this message
Mark Lord (launchpad-rtr) wrote :

I had very similar problems (bug 106238). These were "cured" when I force booted with "init=/bin/sh" and then ran "/sbin/init" from the bash prompt. After Gnome came up, I then did apt-get update/upgrade, and the installation of the newest initramfs-tools update fixed my RAID boot issues.

But I don't know if the installer is still buggy. I wonder if the initramfs-tools update also fixes the installer?

The initial RAID setup was done during installation (from the alternate CD) using these instructions as a starting point: http://www.debian-administration.org/articles/512

Good luck, all.

Revision history for this message
John Williams (jswillms) wrote :

All of my problems are resolved with the latest -15 kernel...

Revision history for this message
marcw (marcw) wrote :

Great news! Using the 04/14 daily build, all of my previous difficulties have vanished. All the arrays that I built (I've tried 2 different schemes) now boot flawlessly.

Not sure what the culprit was but it was something that changed between the 04/12 build and today.

Changed in mdadm:
status: Needs Info → Fix Released
