mdadm + dm-raid: overrides previous devices due to good homehost

Bug #135391 reported by lwb
Affects: mdadm (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mdadm

System
------
OS: Ubuntu 7.04 Feisty Fawn
Kernel: 2.6.20-16-server (latest update as of bug report)
Software:
   - ii dmsetup 1.02.08-1ubuntu10 The Linux Kernel Device Mapper userspace lib
   - ii mdadm 2.5.6-7ubuntu5 tool to administer Linux MD arrays (software
   - ii cryptsetup 1.0.4+svn26-1ubuntu2 configures encrypted block devices
   - ii lvm-common 1.5.20ubuntu12 The Logical Volume Manager for Linux (common
   - ii lvm2 2.02.06-2ubuntu9 The Linux Logical Volume Manager

Hardware Configuration:
   - Motherboard: ASUS M2N4-SLI ACPI BIOS Revision 0301 (output from dmidecode)
   - Storage Controllers:
        - 00:06.0 IDE interface: nVidia Corporation CK804 IDE (output from lspci -vv) [ONBOARD CONTROLLER = libata/ide_disk]
             - 2 x IDE interfaces (1 drive online)
         - 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (output from lspci -vv) [ONBOARD CONTROLLER = sata_nv]
             - 4 x SATA II interfaces (4 drives online)
        - 04:00.0 SCSI storage controller: Triones Technologies, Inc. Unknown device 2300 (output from lspci -vv) [HPT ROCKETRAID 2310 PCI RAID CONTROLLER = rr2310_00]
             - 4 x SATA II interfaces (1 drive online)

Problem
-------
1. Default OS installed to the IDE drive (root filesystem and swap, using the default Ubuntu configuration including LVM2); encountered no problems.
2. apt-get update && apt-get upgrade (including kernel). [reboot]
3. Downloaded, compiled and installed the driver module for the HPT RR2310 from: http://www.highpoint-tech.com/BIOS_Driver/rr231x_00/Linux/rr231x_0x-linux-src-v2.1-081507-0256.tar.gz. Drives recognised by the kernel.
4. Created a software RAID5 set as per the general guidelines at http://linuxgazette.net/140/pfeiffer.html to build the following storage stack (a command-level sketch follows this list):

    Software RAID-5 (mdadm, 4 disks) => device-mapper crypto pseudo-device (dm-crypt) => device-mapper Logical Volume Manager (LVM2) => ext3 filesystem (ext2 + journaling), mounted at /crypto. [reboot]

5. Tested the working configuration from boot-up (md1 device active/available/online); all components functional. Plugged in and brought online one hot-swap drive on the sata_nv controller, partitioned with a 'Linux raid autodetect' partition type.
6. Used mdadm to grow the array to 5 devices and allowed time for the rebuild, monitoring /proc/mdstat.
7. Made the additional capacity available to the system by supplying the grow/resize options to cryptsetup, LVM2 and resize2fs.
8. Mounted the md1 device, supplied the cryptsetup LUKS passphrase; everything functional 100% for 12 days.
9. Rebooted the server. md1 device not active. Rebooted again. Same result.
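
For reference, a minimal command-level sketch of how such a stack is typically built and then grown (this is an assumption-laden illustration, not the original session: the LUKS mapping name cryptomd1, volume group vgcrypt and logical volume crypto are invented names):

   # Build the stack: RAID5 -> dm-crypt (LUKS) -> LVM2 -> ext3
   mdadm --create /dev/md1 --level=5 --raid-devices=4 \
         /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
   cryptsetup luksFormat /dev/md1
   cryptsetup luksOpen /dev/md1 cryptomd1
   pvcreate /dev/mapper/cryptomd1
   vgcreate vgcrypt /dev/mapper/cryptomd1
   lvcreate -l 100%FREE -n crypto vgcrypt      # or an explicit -L size
   mkfs.ext3 /dev/vgcrypt/crypto
   mount /dev/vgcrypt/crypto /crypto

   # Grow (steps 6-7): add the fifth disk, then resize each layer upwards
   mdadm --add /dev/md1 /dev/sde1
   mdadm --grow /dev/md1 --raid-devices=5      # wait for the reshape in /proc/mdstat
   cryptsetup resize cryptomd1
   pvresize /dev/mapper/cryptomd1
   lvextend -l +100%FREE /dev/vgcrypt/crypto
   resize2fs /dev/vgcrypt/crypto               # online resize of the ext3 filesystem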

Troubleshooting
---------------
- dm-raid is used for both sata_nv and rr2310_00 controller support.
- I am inclined to believe this is caused by some sort of ordering issue (as soon as I brought 'disk5' online it worked, but after rebooting it did not come back up automatically).
- It is worth reiterating that the RAID set spans two (2) dm-raid controllers.
- If I bring the server up with all disks plugged in (just as it was before the problem started occurring), including 'disk5' (/dev/sde1, which sits on the 2nd controller, sata_nv), the software RAID array (/dev/md1) fails to become active after boot-up and /dev/sde1 is the only device visible in /proc/mdstat.
- If I bring the server up without 'disk5' online, the software RAID array (/dev/md1) fails to become active after boot-up and /dev/sda1 to /dev/sdd1 are visible in /proc/mdstat (but not /dev/sde1).
- If I hot-plug 'disk5' after bringing the server up without it, /proc/mdstat drops all of the other drives previously visible in the array and shows only the single sde1 ('disk5') device.
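
One quick way to compare the superblocks across all five members is sketched below (the grep pattern is only an illustration); the full mdadm -E output further down shows the mismatch on sde1:

   # Print each member's device name, array UUID and event counter so a
   # member with a diverging UUID stands out at a glance.
   mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 \
       | grep -E '^/dev/|UUID|Events'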

Logs
----
The logs show the following when booting up without 'disk5':

   Aug 29 07:18:14 FeistyFawn kernel: [ 38.907396] md: md1 stopped.
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.967943] md: bind<sdb1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.968030] md: bind<sdc1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.968107] md: bind<sdd1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.968183] md: bind<sda1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989506] md: md1 stopped.
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989515] md: unbind<sda1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989523] md: export_rdev(sda1)
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989541] md: unbind<sdd1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989544] md: export_rdev(sdd1)
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989552] md: unbind<sdc1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989556] md: export_rdev(sdc1)
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989563] md: unbind<sdb1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.989567] md: export_rdev(sdb1)
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.997016] md: bind<sdb1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.997103] md: bind<sdc1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.997179] md: bind<sdd1>
   Aug 29 07:18:14 FeistyFawn kernel: [ 38.997255] md: bind<sda1>

   root@FeistyFawn:/var/log# cat /proc/mdstat
   Personalities : [raid6] [raid5] [raid4]
   md1 : inactive sda1[0](S) sdd1[3](S) sdc1[2](S) sdb1[1](S)
         1562352896 blocks

   unused devices: <none>

Then, after hot-plugging 'disk5' post-boot, the kernel generates a constant stream of "md: array md1 already has disks!" messages:

   Aug 29 07:19:06 FeistyFawn kernel: [ 436.437594] md: array md1 already has disks!
   Aug 29 07:19:06 FeistyFawn kernel: [ 436.440356] md: array md1 already has disks!
   Aug 29 07:19:06 FeistyFawn kernel: [ 436.443079] md: array md1 already has disks!
   Aug 29 07:19:06 FeistyFawn kernel: [ 436.445802] md: array md1 already has disks!
   Aug 29 07:19:06 FeistyFawn kernel: [ 436.455818] md: array md1 already has disks!

Until I shut down mdadm:

   root@FeistyFawn:/var/log# /etc/init.d/mdadm-raid stop
    * Stopping MD array md1...
      ...done.

   root@FeistyFawn:/var/log# tail -f kern.log
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298524] md: md1 stopped.
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298533] md: unbind<sda1>
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298541] md: export_rdev(sda1)
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298563] md: unbind<sdd1>
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298566] md: export_rdev(sdd1)
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298579] md: unbind<sdc1>
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298584] md: export_rdev(sdc1)
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298595] md: unbind<sdb1>
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.298599] md: export_rdev(sdb1)
   Aug 29 07:20:02 FeistyFawn kernel: [ 492.314312] md: bind<sde1>

   root@FeistyFawn:/var/log# cat /proc/mdstat
   Personalities : [raid6] [raid5] [raid4]
   md1 : inactive sde1[4](S)
         390588224 blocks

   unused devices: <none>

It appears mdadm-raid is still running! Shut it down again:

   root@FeistyFawn:~# /etc/init.d/mdadm-raid stop
   root@FeistyFawn:~# cat /proc/mdstat
   Personalities : [raid6] [raid5] [raid4]
   unused devices: <none>

   Aug 29 07:27:31 FeistyFawn kernel: [ 940.500060] md: md1 stopped.
   Aug 29 07:27:31 FeistyFawn kernel: [ 940.500069] md: unbind<sde1>
   Aug 29 07:27:31 FeistyFawn kernel: [ 940.500078] md: export_rdev(sde1)

Now try to re-assemble the array (again):

   root@FeistyFawn:~# mdadm -A /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
   mdadm: /dev/sde1 overrides previous devices due to good homehost
   mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
   root@FeistyFawn:~# cat /proc/mdstat
   Personalities : [raid6] [raid5] [raid4]
   md1 : inactive sde1[4](S)
         390588224 blocks

   unused devices: <none>

   Aug 29 07:30:39 FeistyFawn kernel: [ 1128.919022] md: md1 stopped.
   Aug 29 07:30:39 FeistyFawn kernel: [ 1128.934855] md: bind<sde1>
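
At this point one possible way to get the array running again, given only as a hedged sketch (it was not attempted in this report), is to leave the conflicting member out and start the array degraded from the four members that still share the original UUID:

   # Stop the half-assembled array, then assemble from the four consistent
   # members; --run starts the RAID5 array degraded (4 of 5 devices present).
   mdadm --stop /dev/md1
   mdadm --assemble --run /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1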

System Health
-------------
The disks themselves (including superblocks) appear healthy and fine:

 # mdadm -E /dev/md1 /dev/sd?1

 mdadm: No md superblock detected on /dev/md1.
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e38231e0:311e1ca0:306c4798:deffdddf
  Creation Time : Tue Aug 7 06:55:52 2007
     Raid Level : raid5
    Device Size : 390588224 (372.49 GiB 399.96 GB)
     Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Aug 29 22:09:57 2007
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 72143494 - correct
         Events : 0.259758

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1

   0 0 8 1 0 active sync /dev/sda1
   1 1 8 17 1 active sync /dev/sdb1
   2 2 8 33 2 active sync /dev/sdc1
   3 3 8 49 3 active sync /dev/sdd1
   4 4 8 81 4 active sync /dev/.static/dev/sdf1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e38231e0:311e1ca0:306c4798:deffdddf
  Creation Time : Tue Aug 7 06:55:52 2007
     Raid Level : raid5
    Device Size : 390588224 (372.49 GiB 399.96 GB)
     Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Aug 29 22:09:57 2007
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 721434a6 - correct
         Events : 0.259758

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1

   0 0 8 1 0 active sync /dev/sda1
   1 1 8 17 1 active sync /dev/sdb1
   2 2 8 33 2 active sync /dev/sdc1
   3 3 8 49 3 active sync /dev/sdd1
   4 4 8 81 4 active sync /dev/.static/dev/sdf1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e38231e0:311e1ca0:306c4798:deffdddf
  Creation Time : Tue Aug 7 06:55:52 2007
     Raid Level : raid5
    Device Size : 390588224 (372.49 GiB 399.96 GB)
     Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Aug 29 22:09:57 2007
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 721434b8 - correct
         Events : 0.259758

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 2 8 33 2 active sync /dev/sdc1

   0 0 8 1 0 active sync /dev/sda1
   1 1 8 17 1 active sync /dev/sdb1
   2 2 8 33 2 active sync /dev/sdc1
   3 3 8 49 3 active sync /dev/sdd1
   4 4 8 81 4 active sync /dev/.static/dev/sdf1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e38231e0:311e1ca0:306c4798:deffdddf
  Creation Time : Tue Aug 7 06:55:52 2007
     Raid Level : raid5
    Device Size : 390588224 (372.49 GiB 399.96 GB)
     Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Aug 29 22:09:57 2007
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 721434ca - correct
         Events : 0.259758

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 3 8 49 3 active sync /dev/sdd1

   0 0 8 1 0 active sync /dev/sda1
   1 1 8 17 1 active sync /dev/sdb1
   2 2 8 33 2 active sync /dev/sdc1
   3 3 8 49 3 active sync /dev/sdd1
   4 4 8 81 4 active sync /dev/.static/dev/sdf1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e38231e0:311e1ca0:769beeee:fa1c5dec (local to host FeistyFawn)
  Creation Time : Tue Aug 7 06:55:52 2007
     Raid Level : raid5
    Device Size : 390588224 (372.49 GiB 399.96 GB)
     Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

    Update Time : Tue Aug 29 22:09:57 2007
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3605c4f - correct
         Events : 0.259758

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 4 8 81 4 active sync /dev/.static/dev/sdf1

   0 0 8 1 0 active sync /dev/sda1
   1 1 8 17 1 active sync /dev/sdb1
   2 2 8 33 2 active sync /dev/sdc1
   3 3 8 49 3 active sync /dev/sdd1
   4 4 8 81 4 active sync /dev/.static/dev/sdf1

Regards, hope this helps!

Revision history for this message
lwb (lwb) wrote :

Also worth mentioning (not stated explicitly above): this is a default configuration, including mdadm.conf:

root@FeistyFawn:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Tue, 07 Aug 2007 06:51:48 +1000
# by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $
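
Note that the generated file contains no ARRAY definitions, so assembly relies entirely on superblock scanning plus the HOMEHOST check. As a hedged sketch (not a confirmed fix for this bug), the array could be pinned explicitly by UUID, using the UUID that mdadm -E reports for sda1 above; mdadm --detail --scan prints an equivalent line for a running array:

   # Append an explicit ARRAY line so members are matched by UUID instead of
   # relying only on the homehost heuristic.
   echo "ARRAY /dev/md1 UUID=e38231e0:311e1ca0:306c4798:deffdddf" >> /etc/mdadm/mdadm.conf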

Revision history for this message
lwb (lwb) wrote :

After further investigation today, and discussion with a colleague, it is possible that this was caused by creating the array manually with the mdadm command; on reboot the array was then detected and started normally.
However, when adding the new drive and growing the array, mdadm reads in its configuration, which includes the HOMEHOST <system> parameter, and that homehost is assigned to the new device, which after a reboot then refuses to play with the rest of the array... A possibility, but not certain.

The indicators are that the UUID of 'disk5' is different (even though it had been part of the array, 5 devices and 0 spares, for 12 days), and that 'disk5' is the only member whose superblock shows "(local to host FeistyFawn)".
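
If that diagnosis is correct, one possible recovery path, given here only as a hedged sketch (it was not attempted in this report, and it forces a full resync of the re-added disk), would be to assemble the array degraded from the four consistent members as sketched earlier, then wipe and re-add 'disk5':

   # With the degraded array running on sda1-sdd1, wipe the conflicting
   # superblock on 'disk5' and add it back; mdadm rewrites it with the
   # array's UUID and starts a resync.
   mdadm --zero-superblock /dev/sde1
   mdadm --add /dev/md1 /dev/sde1
   cat /proc/mdstat          # watch the rebuild progress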

Revision history for this message
xteejx (xteejx) wrote :

Sorry no one dealt with your issue. Unfortunately, Feisty is no longer supported; can you reproduce this in a supported version of Ubuntu, preferably Jaunty? Thank you.

Changed in mdadm (Ubuntu):
status: New → Incomplete
Revision history for this message
xteejx (xteejx) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in mdadm (Ubuntu):
status: Incomplete → Invalid