[EDGY] Regression: can't boot from lvm root on raid anymore

Bug #52740 reported by Fabio Massimo Di Nitto
Affects: mdadm (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Fabio Massimo Di Nitto

Bug Description

I have a SPARC system with two SCSI disks (sda/sdb) and a bunch of partitions, set up with /boot on md0 and LVM on top of md1. LVM has a few logical volumes, including root.

From the boot log I can see that the local-top scripts are executed "too fast" (before sda/sdb appear), so the md script can't start the RAID arrays properly.

Of course I get dropped to the busybox shell, where I can happily do:

# /scripts/local-top/md
mdadm: /dev/md0 has been started with 2 drives.
mdadm: /dev/md1 has been started with 2 drives.
# /scripts/local-top/lvm
  3 logical volume(s) in volume group "sparc-vg" now active
# exit
 [rest of the boot]

As usual, please let me know what information is required; I am ready to test patches.

Fabio

Marty (marty-supine) wrote:

I have the same issue, but my solution isn't as simple.

Using hints from the 'init' script I can bring my system up with:

# modprobe dm_mod
# /scripts/local-top/lvm
  1 logical volume(s) in volume group "foobar" now active
# mount -t ext3 /dev/mapper/foobar-vg /root
  [ 134.664391] kjournald starting. Commit interval 5 seconds
  [ 134.710296] EXT3 FS on dm-0, internal journal
  [ 134.710358] EXT3-fs: mounted filesystem with ordered data mode
# mount -n -o move /sys /root/sys
# mount -n -o move /proc /root/proc
# mount -n -o move /dev /root/dev
# chroot /root /bin/bash
  bash: no job control in this shell
bash# telinit 2
bash# exit
# exit

Fabio Massimo Di Nitto (fabbione) wrote:

Yes, I finally found the reason yesterday. It's a race between the moment the devices that form the RAID (and contain the LVM root) appear in /dev and the moment mdrun and lvm are executed in the initramfs. lvm is already partially fixed (for root on LVM only); I need to integrate a more general fix for this case.

Fabio
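
For context, the partial fix already in lvm's local-top script takes roughly this shape: keep retrying activation until the root device node shows up. This is a minimal sketch, not the shipped script; the $ROOT variable (set by initramfs-tools from the kernel command line) and the 180-second budget are assumptions:

  # Sketch: retry LVM activation until the root device node appears,
  # or give up after ~3 minutes. It only checks the root LV, which is
  # why this fix is partial (it doesn't help other LVs or md devices).
  modprobe dm_mod
  slumber=180
  while [ ! -e "$ROOT" ] && [ $slumber -gt 0 ]; do
      lvm vgchange -ay >/dev/null 2>&1
      sleep 1
      slumber=$(( slumber - 1 ))
  done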

Changed in initramfs-tools:
assignee: nobody → fabbione
importance: Untriaged → High
status: Unconfirmed → Confirmed
Fabio Massimo Di Nitto (fabbione) wrote:

mdadm (2.4.1-6ubuntu5) edgy; urgency=low

  * Modify initramfs scripts to wait for devices to appear if they are not there
    yet when the script is executed on boot:
    - copy the generated mdadm.conf into the initramfs (we need the UUIDs)
    - modify the local-top script to wait for all UUIDs to appear before
      executing mdrun, or give up after a maximum of 3 minutes.
  (Closes Ubuntu: #52740)

  As a side effect of the above fix:

  * Avoid filesystem corruption if root is on LVM on RAID and the RAID is not
    started. Otherwise LVM would find the devices that are part of the RAID,
    use them directly, and bring the data out of sync.

  Limitations:

  * It might require initramfs updates if RAID UUIDs are changed. This is
    a rare corner case of relocating RAID arrays, and whoever does that
    usually knows what they are doing.

  * We do not check whether all devices of a given RAID are available: the
    machine might be booting in degraded mode for recovery, and we should
    not block on that. Since there is no way to tell the difference, this
    check is not performed.

 -- Fabio M. Di Nitto <email address hidden> Tue, 26 Sep 2006 09:56:01 +0200
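
A minimal sketch of the waiting logic the changelog describes, assuming the generated mdadm.conf was copied into the initramfs; the conf path, the UUID extraction, and the one-second polling are illustrative, not the exact shipped script:

  # Sketch: wait until every array UUID listed in mdadm.conf is visible
  # on some block device, or give up after 3 minutes, then assemble.
  # Note the limitation above: we can't tell a missing member from a
  # degraded-but-bootable array, so per-device checks are skipped.
  timeout=180
  uuids=$(sed -n 's/.*UUID=\([^ ]*\).*/\1/p' /etc/mdadm/mdadm.conf)
  while [ $timeout -gt 0 ]; do
      missing=0
      for uuid in $uuids; do
          # mdadm -Es scans all devices and prints the arrays it can see
          mdadm -Es 2>/dev/null | grep -q "$uuid" || missing=1
      done
      [ $missing -eq 0 ] && break
      sleep 1
      timeout=$(( timeout - 1 ))
  done
  /sbin/mdrun /dev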

Changed in mdadm:
status: Confirmed → Fix Released