ISST-LTE: Ubuntu14.04.4 lpar fails to boot after installation: "The disk drive for /boot is not ready yet or not present"

Bug #1540401 reported by bugproxy
This bug affects 2 people
Affects                   Status        Importance  Assigned to              Milestone
multipath-tools (Ubuntu)  Fix Released  Critical    Mathieu Trudel-Lapierre
Trusty                    Fix Released  Critical    Mathieu Trudel-Lapierre

Bug Description

== Comment: #0 - Manjunatha H R <email address hidden> - 2016-01-21 05:15:30 ==
After installing Ubuntu 14.04.4 on a PowerVM LPAR with the "Use entire disk and setup LVM" option in the "Partition method" installation menu, the LPAR fails to boot.

Boot error:
-----------------
The disk drive for /boot is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery

Boot log:
------------
Elapsed time since release of system processors: 42348 mins 4 secs
error: no suitable video mode found.
OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.2.0-25-generic (buildd@bos01-ppc64el-023) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #30~14.04.1-Ubuntu SMP Mon Jan 18 16:25:16 UTC 2016 (Ubuntu 4.2.0-25.30~14.04.1-generic 4.2.6)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinux-4.2.0-25-generic root=/dev/mapper/biglp1--vg-root ro splash quiet vt.handoff=7
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 000000000ba90000
  alloc_top : 0000000010000000
  alloc_top_hi : 0000000010000000
  rmo_top : 0000000010000000
  ram_top : 0000000010000000
instantiating rtas at 0x000000000ec10000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000baa0000 -> 0x000000000baa16b5
Device tree struct 0x000000000bab0000 -> 0x000000000bae0000
Quiescing Open Firmware ...
Booting Linux via __start() ...
 -> smp_release_cpus()
spinning_secondaries = 31
 <- smp_release_cpus()
 <- setup_system()
[ 9.209318] device-mapper: table: 252:2: multipath: error getting device
[ 9.235875] device-mapper: table: 252:2: multipath: error getting device
[ 9.350003] device-mapper: table: 252:6: multipath: error getting device
[ 9.440078] device-mapper: table: 252:6: multipath: error getting device
[ 9.499595] device-mapper: table: 252:6: multipath: error getting device
[ 9.570007] device-mapper: table: 252:6: multipath: error getting device
[ 9.689502] device-mapper: table: 252:6: multipath: error getting device
[ 9.769905] device-mapper: table: 252:6: multipath: error getting device
[ 9.829579] device-mapper: table: 252:6: multipath: error getting device
[ 9.869910] device-mapper: table: 252:6: multipath: error getting device
[ 9.929757] device-mapper: table: 252:6: multipath: error getting device
[ 9.971988] device-mapper: table: 252:6: multipath: error getting device
 * Stopping Send an event to indicate plymouth is up [ OK ]
 * Starting Mount filesystems on boot [ OK ]
 * Starting Populate /dev filesystem [ OK ]
 * Starting Populate and link to /run filesystem [ OK ]
 * Stopping Populate /dev filesystem [ OK ]
 * Stopping Populate and link to /run filesystem [ OK ]
 * Stopping Track if upstart is running in a container [ OK ]
 * Starting Signal sysvinit that the rootfs is mounted [ OK ]
 * Starting Initialize or finalize resolvconf [ OK ]
 * Starting Signal sysvinit that virtual filesystems are mounted [ OK ]
 * Starting Signal sysvinit that virtual filesystems are mounted [ OK ]
 * Starting Bridge udev events into upstart [ OK ]
 * Starting Signal sysvinit that remote filesystems are mounted [ OK ]
 * Starting device node and kernel event manager [ OK ]
 * Starting Clean /tmp directory [ OK ]
 * Stopping Clean /tmp directory [ OK ]
 * Starting load modules from /etc/modules [ OK ]
 * Starting cold plug devices [ OK ]
 * Starting log initial device creation [ OK ]
 * Stopping load modules from /etc/modules [ OK ]
 * Starting Uncomplicated firewall [ OK ]
The disk drive for /boot is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery -----> BOOT Stops here...

Manually aborting the boot for manual recovery (by pressing Shift+M) gives a root prompt on the LPAR.
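
The repeated device-mapper errors in the boot log above can be tallied per map to see which device-mapper tables keep failing. This is only a minimal sketch: the function name `count_multipath_errors` is made up for illustration, and it assumes the log text has been saved to a string.

```python
import re
from collections import Counter

# Matches kernel lines such as:
# [ 9.209318] device-mapper: table: 252:2: multipath: error getting device
DM_ERR = re.compile(
    r"device-mapper: table: (\d+:\d+): multipath: error getting device"
)

def count_multipath_errors(log_text):
    """Count 'error getting device' messages per dm major:minor (hypothetical helper)."""
    return Counter(DM_ERR.findall(log_text))
```

Applied to the log above, most failures would be attributed to 252:6, with a couple on 252:2.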

uname -a:
---------------
root@biglp1:~# uname -a
Linux biglp1 4.2.0-25-generic #30~14.04.1-Ubuntu SMP Mon Jan 18 16:25:16 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Contents of /etc/fstab:
----------------------
root@biglp1:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/mapper/biglp1--vg-root / ext4 errors=remount-ro 0 1
/dev/mapper/mpath0-part2 /boot ext2 defaults 0 2
#/dev/mapper/biglp1--vg-swap_1 none swap sw 0 0
/dev/mapper/cryptswap1 none swap sw 0 0
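
The fstab above names /boot by its multipath partition node, so the mount can only succeed if that node exists. A rough sketch of that check follows; the helper names are hypothetical, and the `exists` parameter is injectable only so the logic can be tried without the real devices.

```python
import os

def parse_fstab(text):
    """Return (device, mountpoint) pairs, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries

def missing_devices(entries, exists=os.path.exists):
    """List fstab entries whose /dev/... node is absent (hypothetical helper)."""
    return [
        (dev, mnt)
        for dev, mnt in entries
        if dev.startswith("/dev/") and not exists(dev)
    ]
```

On the failing LPAR, `/dev/mapper/mpath0-part2` would be expected to show up as missing, which matches the boot message about /boot.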

df output:
-------------
root@biglp1:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 3046528 512 3046016 1% /dev
tmpfs 619328 5696 613632 1% /run
/dev/mapper/biglp1--vg-root 24308868 1087836 21963152 5% /
none 64 0 64 0% /sys/fs/cgroup
none 5120 0 5120 0% /run/lock
none 3096576 0 3096576 0% /run/shm
none 102400 0 102400 0% /run/user

Steps to recreate:
------------------------
1. Install Ubuntu 14.04.4 on a PowerVM LPAR that has multipath disks.
2. During installation, choose "Use entire disk and setup LVM" in the disk partitioning menu.
3. After installation, the LPAR fails to mount the /boot partition.

Contact info:
----------------
Manju (<email address hidden>) A.P (<email address hidden>)

== Comment: #2 - Manjunatha H R <email address hidden> - 2016-01-21 05:24:03 ==

== Comment: #3 - Manjunatha H R <email address hidden> - 2016-01-21 05:24:45 ==

== Comment: #4 - Manjunatha H R <email address hidden> - 2016-01-21 05:25:40 ==

== Comment: #5 - Manjunatha H R <email address hidden> - 2016-01-21 05:28:15 ==

== Comment: #6 - Manjunatha H R <email address hidden> - 2016-01-21 05:29:24 ==

== Comment: #8 - Manjunatha H R <email address hidden> - 2016-01-21 05:37:27 ==
root@biglp1:~# dpkg -l |grep multipath
ii multipath-tools 0.4.9-3ubuntu7.7 ppc64el maintain multipath block device access
ii multipath-tools-boot 0.4.9-3ubuntu7.7 all Support booting from multipath devices

root@biglp1:~# dpkg -l|grep lvm
ii lvm2 2.02.98-6ubuntu2 ppc64el Linux Logical Volume Manager

== Comment: #19 - Mauricio Faria De Oliveira <email address hidden> - 2016-01-25 14:55:23 ==
The problem matches the suspicion: LVM detection happens before multipathd grabs the individual paths, so the creation of the multipath map /dev/mapper/mpath0 fails, and /boot then fails to mount because /etc/fstab specifies it as /dev/mapper/mpath0-part2:

 root@biglp1:~# pvdisplay | grep Name
   Found duplicate PV xkHFzaklbXIhfOQfI74LdjE2yPErlQtc: using /dev/sdu3 not /dev/sda3
   Found duplicate PV xkHFzaklbXIhfOQfI74LdjE2yPErlQtc: using /dev/sdf3 not /dev/sdu3
   Found duplicate PV xkHFzaklbXIhfOQfI74LdjE2yPErlQtc: using /dev/sdz3 not /dev/sdf3
   Found duplicate PV xkHFzaklbXIhfOQfI74LdjE2yPErlQtc: using /dev/sdk3 not /dev/sdz3
   Found duplicate PV xkHFzaklbXIhfOQfI74LdjE2yPErlQtc: using /dev/sdp3 not /dev/sdk3
   PV Name /dev/sdp3
   VG Name biglp1-vg

 root@biglp1:~# multipath -v3 /dev/sdp
 ...
 Jan 25 13:38:06 | 36005076308ffc54b000000000000003f: alias_prefix = mpath (internal default)
 Jan 25 13:38:06 | Found matching wwid [36005076308ffc54b000000000000003f] in bindings file. Setting alias to mpath0
 Jan 25 13:38:06 | sdp: ownership set to mpath0
 ...
 Jan 25 13:38:06 | sda: ownership set to mpath0
 ...
 Jan 25 13:38:06 | sdf: ownership set to mpath0
 ...
 Jan 25 13:38:06 | sdk: ownership set to mpath0
 ...
 Jan 25 13:38:06 | sdu: ownership set to mpath0
 ...
 Jan 25 13:38:06 | sdz: ownership set to mpath0
 ...
 Jan 25 13:38:06 | mpath0: pgfailover = -1 (internal default)
 Jan 25 13:38:06 | mpath0: pgpolicy = multibus (controller setting)
 Jan 25 13:38:06 | mpath0: selector = round-robin 0 (controller setting)
 Jan 25 13:38:06 | mpath0: features = 1 queue_if_no_path (controller setting)
 Jan 25 13:38:06 | mpath0: hwhandler = 0 (controller setting)
 Jan 25 13:38:06 | mpath0: rr_weight = 1 (controller setting)
 Jan 25 13:38:06 | mpath0: minio = 1000 (controller setting)
 Jan 25 13:38:06 | mpath0: no_path_retry = NONE (internal default)
 Jan 25 13:38:06 | pg_timeout = NONE (internal default)
 Jan 25 13:38:06 | mpath0: set ACT_CREATE (map does not exist)
 [ 213.998298] device-mapper: table: 252:6: multipath: error getting device
 [ 214.030777] device-mapper: table: 252:6: multipath: error getting device
 Jan 25 13:38:06 | mpath0: domap (0) failure for create/reload map

Looking into a patch for this.
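
The symptom in the pvdisplay output above (a PV sitting on a plain /dev/sd* partition, with duplicates reported across the other paths) can be detected mechanically. The following is a sketch under assumptions: both helper names are invented, and the regular expressions only cover the message wording shown in this report.

```python
import re

DUP_RE = re.compile(r"Found duplicate PV (\S+): using (\S+) not (\S+)")
PV_NAME_RE = re.compile(r"^\s*PV Name\s+(\S+)", re.M)

def duplicate_pv_paths(pvdisplay_text):
    """Map each PV UUID to the set of device nodes LVM saw it on."""
    by_uuid = {}
    for uuid, used, ignored in DUP_RE.findall(pvdisplay_text):
        by_uuid.setdefault(uuid, set()).update((used, ignored))
    return by_uuid

def pv_on_single_path(pvdisplay_text):
    """Return PV names that are raw /dev/sd* nodes rather than /dev/mapper/* maps."""
    names = PV_NAME_RE.findall(pvdisplay_text)
    return [n for n in names if re.match(r"^/dev/sd[a-z]+\d*$", n)]
```

A non-empty result from `pv_on_single_path` would suggest that LVM claimed one path directly, which is the condition that blocks creation of the multipath map here.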

Revision history for this message
bugproxy (bugproxy) wrote : Installation log: partman

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-135869 severity-critical targetmilestone-inin14044
Revision history for this message
bugproxy (bugproxy) wrote : Installation log: syslog

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : multipath, fdisk, lsblk, sysctl -a output from lpar

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : lvdisplay, pvdisplay and vgdisplay output from lpar

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1540401/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
bugproxy (bugproxy) wrote : Patch for booting with LVM on multipath devices

------- Comment on attachment From <email address hidden> 2016-02-03 18:58 EDT-------

Submitting the patch early for Canonical to review,
given it would be important to fix this for 14.04.4 GA media, if possible.

The patch has been verified to work by the developer on 14.04,
and is waiting on feedback from the bug reporter.

More details on its workings will be attached shortly.

Revision history for this message
bugproxy (bugproxy) wrote : Details about the problem and the solution

------- Comment on attachment From <email address hidden> 2016-02-03 18:59 EDT-------

Adding attachment w/ description and steps of problem and solution.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
bugproxy (bugproxy) wrote : Boot log with excerpts of systemd-udevd --debug

------- Comment on attachment From <email address hidden> 2016-02-03 19:01 EDT-------

Adding an attachment (xz) of the boot log with systemd-udevd --debug running,
in order to verify the udev processing of the rules.

You may notice that multipath -c is called several times, only for SUBSYSTEM==BLOCK devices, and that partx -d is called only for devices that are part of a multipath device (i.e., multipath -c has exit code 0).
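
The decision described above can be sketched in a few lines. This is not the udev rule from the patch, only an illustration of its logic: the function name is hypothetical, the `run` parameter exists so the flow can be exercised without real devices, and the partx arguments are a parameter because the thread later adjusts them.

```python
import subprocess

def drop_path_partitions(devices, partx_cmd=("partx", "-d"), run=subprocess.call):
    """For each block device that 'multipath -c' accepts as a multipath path
    (exit code 0), delete its partition nodes with partx.

    Hypothetical helper; returns the devices whose partitions were dropped.
    """
    handled = []
    for dev in devices:
        if run(["multipath", "-c", dev]) == 0:
            run(list(partx_cmd) + [dev])
            handled.append(dev)
    return handled
```

With the partition nodes of the individual paths gone, LVM has nothing to claim until the multipath map and its partitions appear.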

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch for booting with LVM on multipath devices" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-03 19:24 EDT-------
Hi @mathieu-tl,

As we just talked on IRC, the multipathd -B patches did not help here.
I'm pasting the comments about that (sorry, I forgot to mark those as external).

Comment 13 Mauricio Faria De Oliveira 2016-01-25 10:39:23 BRST
Hi Manju,

(In reply to comment #12)
> Installed the suggested packages and rebooted the lpar, issue still occurs:

Thanks for verifying.

Can I check the LPAR?

<...>

Comment 17 Mauricio Faria De Oliveira 2016-01-25 15:55:04 BRST
Hm, guess I got it.

This differs slightly from the other bug.
It seems that the async discovery of LVM and multipath devices is not well serialized, causing some inconsistencies with the access to the partitions (e.g., /boot).
Checking a bit more.

<...>

Comment 19 Mauricio Faria De Oliveira 2016-01-25 17:55:23 BRST
The problem matches the suspicion: LVM detection happens before multipathd grabs the individual paths, so the creation of the multipath map /dev/mapper/mpath0 fails, and /boot then fails to mount because /etc/fstab specifies it as /dev/mapper/mpath0-part2:

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 06:30 EDT-------
The test packages did not work by themselves as the LPAR uses *ibmvfc* and not ibmvscsi (sorry, I confused that), which has *async* SCSI scan.

So, this bug needs this patch *and* the multipathd/initramfs patch (which doesn't fix this by itself, as verified earlier).

I'll build v1 test packages.

...
[ 1.558539] ibmvfc: IBM Virtual Fibre Channel Driver version: 1.0.11 (April 12, 2013)
[ 1.568548] scsi host0: IBM POWER Virtual FC Adapter
[ 1.578741] ibmvfc 30000004: Partner initialization complete
[ 1.588527] ibmvfc 30000004: Host partition: maplev1, device: vfchost2 U78CB.001.WZS0249-P1-C6-T1 U8247.22L.211B3DA-V2-C12 max sectors 2048
...
Begin: Discovering multipaths ... [ 5.179191] device-mapper: multipath round-robin: version 1.0.0 loaded
[ 5.179424] device-mapper: table: 252:2: multipath: error getting device
[ 5.179431] device-mapper: ioctl: error adding target to table
[ 5.205784] device-mapper: table: 252:2: multipath: error getting device
[ 5.205803] device-mapper: ioctl: error adding target to table
[ 5.338527] device-mapper: table: 252:6: multipath: error getting device
[ 5.338558] device-mapper: ioctl: error adding target to table
[ 5.399409] device-mapper: table: 252:6: multipath: error getting device
[ 5.399440] device-mapper: ioctl: error adding target to table
[ 5.509069] device-mapper: table: 252:6: multipath: error getting device
[ 5.509100] device-mapper: ioctl: error adding target to table
[ 5.569473] device-mapper: table: 252:6: multipath: error getting device
[ 5.569518] device-mapper: ioctl: error adding target to table
[ 5.669295] device-mapper: table: 252:6: multipath: error getting device
[ 5.669329] device-mapper: ioctl: error adding target to table
[ 5.759009] device-mapper: table: 252:6: multipath: error getting device
[ 5.759047] device-mapper: ioctl: error adding target to table
[ 5.858710] device-mapper: table: 252:6: multipath: error getting device
[ 5.858741] device-mapper: ioctl: error adding target to table
[ 5.899505] device-mapper: table: 252:6: multipath: error getting device
[ 5.899536] device-mapper: ioctl: error adding target to table
[ 6.008594] device-mapper: table: 252:6: multipath: error getting device
[ 6.008631] device-mapper: ioctl: error adding target to table
[ 6.069401] device-mapper: table: 252:6: multipath: error getting device
[ 6.069427] device-mapper: ioctl: error adding target to table
done.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 07:25 EDT-------
Testing the v1 test packages from:
http://ausgsa.ibm.com/~mauricfo/public/bugs/bz135869/v1/

They include these patches (debdiffs in there):
- kpartx with spaces
- multipathd on initramfs
- lvm on multipath devices

It didn't work; getting errors from partx:
'/usr/bin/partx -d /dev/sdaa'(err) 'partx: specified range <1:0> does not make sense'

I'll try a few more changes.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 08:07 EDT-------
All fixed.

(In reply to comment #37)
> Testing the v1 test packages from:
> http://ausgsa.ibm.com/~mauricfo/public/bugs/bz135869/v1/

> It didn't work; getting errors from partx:
> '/usr/bin/partx -d /dev/sdaa'(err) 'partx: specified range <1:0> does not
> make sense'
>
> I'll try a few more changes.

Fixed by specifying a partition range with "partx -d --nr 1-1024", e.g.:
# sed 's/partx -d/& --nr 1-1024/' -i /lib/udev/rules.d/12-dm-mpath-lvm.rules

Also had to make sure the device with LVM (/dev/sdp) was added to /etc/multipath/wwids:
because LVM had locked that path, it was never multipathed and therefore never added.
(Wondering why it wasn't copied from the installation; perhaps there is a problem with LVM/multipath in the installer too.)

With those 2 fixes, the system boots correctly.
I'll build v2 packages.
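
For reference, the effect of the sed substitution above can be sketched as a small text transformation. The function name is made up, and the guard against lines that already carry `--nr` is an addition of this sketch, not something the sed command does.

```python
def add_partition_range(rules_text, nr="1-1024"):
    """Append '--nr <range>' to the first 'partx -d' on each line that lacks it."""
    out = []
    for line in rules_text.splitlines(keepends=True):
        if "partx -d" in line and "--nr" not in line:
            line = line.replace("partx -d", "partx -d --nr " + nr, 1)
        out.append(line)
    return "".join(out)

# Hypothetical rule line, for illustration only; the real contents of
# 12-dm-mpath-lvm.rules are not shown in this report.
example = 'RUN+="/usr/bin/partx -d $env{DEVNAME}"\n'
print(add_partition_range(example), end="")
```

The guard makes the change safe to apply twice, whereas re-running the plain sed command would append the option again.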

root@biglp1:~# mount | grep /boot
/dev/mapper/mpath0-part2 on /boot type ext2 (rw)

root@biglp1:~# mount | grep '/ '
/dev/mapper/biglp1--vg-root on / type ext4 (rw,errors=remount-ro)

root@biglp1:~# lvm pvdisplay | grep Name
PV Name /dev/mapper/mpath0-part3
VG Name biglp1-vg

Mathew Hodson (mhodson)
affects: ubuntu → multipath-tools (Ubuntu)
Changed in multipath-tools (Ubuntu):
importance: Undecided → Critical
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 09:52 EDT-------
(In reply to comment #38)
> With those 2 fixes, the system boots correctly.
> I'll build v2 packages.

Available in
http://ausgsa.ibm.com/~mauricfo/public/bugs/bz135869/v2

With those, the boot finishes successfully.
I'll submit the new patches.

root@biglp1:~# mount | grep /boot
/dev/mapper/mpath0-part2 on /boot type ext2 (rw)

root@biglp1:~# mount | grep ' / '
/dev/mapper/biglp1--vg-root on / type ext4 (rw,errors=remount-ro)

root@biglp1:~# lvm pvdisplay | grep Name
PV Name /dev/mapper/mpath0-part3
VG Name biglp1-vg

root@biglp1:~# dpkg -l | grep mpathlvm
ii kpartx 0.4.9-3ubuntu7.7mpathlvm3 ppc64el create device mappings for partitions
ii kpartx-boot 0.4.9-3ubuntu7.7mpathlvm3 all Provides kpartx during boot
ii multipath-tools 0.4.9-3ubuntu7.7mpathlvm3 ppc64el maintain multipath block device access
ii multipath-tools-boot 0.4.9-3ubuntu7.7mpathlvm3 all Support booting from multipath devices
ii multipath-tools-dbg 0.4.9-3ubuntu7.7mpathlvm3 ppc64el maintain multipath block device access - debugging symbols
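
The manual checks above (is /boot mounted, and from a device-mapper node?) could be scripted along these lines. This is a sketch that assumes the classic `mount` output format shown in this report; the helper names are hypothetical.

```python
import re

# Matches lines such as:
# /dev/mapper/mpath0-part2 on /boot type ext2 (rw)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")

def mounted_device(mount_output, mountpoint):
    """Return the device mounted at mountpoint, or None if nothing is mounted there."""
    for line in mount_output.splitlines():
        m = MOUNT_RE.match(line.strip())
        if m and m.group(2) == mountpoint:
            return m.group(1)
    return None

def is_on_mapper(device):
    """True when the device is a /dev/mapper/* node (multipath or LVM)."""
    return device is not None and device.startswith("/dev/mapper/")
```

On the failing system `mounted_device(output, "/boot")` would return `None`; with the v2 packages it should return the mpath0-part2 node.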

Revision history for this message
bugproxy (bugproxy) wrote : [PATCH V2] booting with LVM on multipath devices

------- Comment on attachment From <email address hidden> 2016-02-04 10:20 EDT-------

Hi Canonical,

Please consider this patch V2 for review.

It includes 2 important changes:
- The '--nr 1-1024' to the partx command.
- Built on top of the patches for multipathd/initramfs (LP #1526984) and kpartx w/ spaces (LP #1432062) -- both required.

Thanks,

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi @mathew-hodson,

In case it helps,
I believe this bug is likely to be assigned to @mathieu-tl.

We've been working on this sort of problem, and talked about a tentative 14.04.4 milestone.
Thanks!

PS. Deleted previous attachment/patch version.

Steve Langasek (vorlon)
Changed in multipath-tools (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in multipath-tools (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-04 13:36 EDT-------
FYI.
I've been testing reboots in the LPAR (biglp1, 14.04) with PATCH V2 for 20ish times now.
All boots succeeded.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

This isn't obvious to reproduce. So far, I haven't had the system fail to boot or fail to mount all partitions. I have been testing *without* multipath-tools 0.4.9-3ubuntu7.8; so not using multipathd in the initramfs.

I suppose it may be that I'm using a partitioning that happens to work?

Could you please add the output of: sudo dmsetup ls --tree -o blkdevname

No failure after 20 reboots:

mpath0-part1 <dm-2> (252:2)
 └─mpath0 <dm-0> (252:0)
    ├─ <sdb> (8:16)
    └─ <sda> (8:0)
mpath1 <dm-1> (252:1)
 ├─ <sdd> (8:48)
 └─ <sdc> (8:32)
trusty-boot <dm-6> (252:6)
 └─mpath0-part2 <dm-3> (252:3)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdb> (8:16)
       └─ <sda> (8:0)
trusty-swap <dm-5> (252:5)
 └─mpath0-part2 <dm-3> (252:3)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdb> (8:16)
       └─ <sda> (8:0)
trusty-root <dm-4> (252:4)
 └─mpath0-part2 <dm-3> (252:3)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdb> (8:16)
       └─ <sda> (8:0)

My test system is indeed a ppc64el qemu VM using spapr-vscsi.

Will setup a new system using snapshot to test with multipath-tools 0.4.9-3ubuntu7.8.
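
To compare setups quickly, the `dmsetup ls --tree -o blkdevname` output above can be reduced to the set of path devices under each top-level map. The sketch below assumes the line layout shown in this comment (tree-drawing characters for nesting, unnamed leaf entries for the sd* paths); the function name is invented.

```python
import re

LINE_RE = re.compile(
    r"^(?P<prefix>[\s└├─│]*)(?P<name>[^\s<]*)\s*"
    r"<(?P<blk>[^>]+)>\s*\((?P<major>\d+):(?P<minor>\d+)\)\s*$"
)

def paths_per_map(tree_text):
    """Return {top-level map name: [leaf block devices]} (hypothetical helper)."""
    result = {}
    current = None
    for line in tree_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        if m.group("prefix") == "":
            # An unindented line starts a new top-level device.
            current = m.group("name")
            result[current] = []
        elif m.group("name") == "" and current is not None:
            # Unnamed entries are the underlying path devices.
            result[current].append(m.group("blk"))
    return result
```

A map with only one leaf, or an LVM volume whose leaves are sd* partitions rather than an mpath device, would stand out in the resulting dictionary.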

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 16:52 EDT-------
Hi @mathieu-tl

(In reply to comment #44)
> This isn't obvious to reproduce. So far, I haven't had the system fail to
> boot or fail to mount all partitions. I have been testing *without*
> multipath-tools 0.4.9-3ubuntu7.8; so not using multipathd in the initramfs.

> My test system is indeed a ppc64el qemu VM using spapr-vscsi.

I could reproduce the problem in that scenario by using break=multipath in the kernel cmdline
(which forces LVM udev rules to run before multipath discovery).

> I suppose it may be that I'm using a partitioning that happens to work?

I'd expect it to fail in any case with LVM on top of multipath devices,
and LVM scan/volume activation happening before multipath discovery
(root cause: LVM locking an individual path before multipath takes it).

> Could you please add the output of: sudo dmsetup ls --tree -o blkdevname

Sure.

qemu-kvm guest w/ ibm-vscsi:

ubuntu@mauricfo4:~$ sudo dmsetup ls --tree -o blkdevname
[sudo] password for ubuntu:
mpath0-part2 <dm-2> (252:2)
 └─mpath0 <dm-0> (252:0)
    ├─ <sdb> (8:16)
    └─ <sda> (8:0)
mpath0-part1 <dm-1> (252:1)
 └─mpath0 <dm-0> (252:0)
    ├─ <sdb> (8:16)
    └─ <sda> (8:0)
mauricfo4--vg-swap_1 <dm-5> (252:5)
 └─mpath0-part3 <dm-3> (252:3)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdb> (8:16)
       └─ <sda> (8:0)
mauricfo4--vg-root <dm-4> (252:4)
 └─mpath0-part3 <dm-3> (252:3)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdb> (8:16)
       └─ <sda> (8:0)

powervm lpar w/ ibmvfc:

root@biglp1:~# dmsetup ls --tree -o blkdevname
mpath0-part2 <dm-6> (252:6)
 └─mpath0 <dm-0> (252:0)
    ├─ <sdz> (65:144)
    ├─ <sdu> (65:64)
    ├─ <sdp> (8:240)
    ├─ <sdk> (8:160)
    ├─ <sdf> (8:80)
    └─ <sda> (8:0)
mpath2 <dm-2> (252:2)
 ├─ <sdab> (65:176)
 ├─ <sdw> (65:96)
 ├─ <sdr> (65:16)
 ├─ <sdm> (8:192)
 ├─ <sdh> (8:112)
 └─ <sdc> (8:32)
mpath0-part1 <dm-5> (252:5)
 └─mpath0 <dm-0> (252:0)
    ├─ <sdz> (65:144)
    ├─ <sdu> (65:64)
    ├─ <sdp> (8:240)
    ├─ <sdk> (8:160)
    ├─ <sdf> (8:80)
    └─ <sda> (8:0)
mpath1 <dm-1> (252:1)
 ├─ <sdaa> (65:160)
 ├─ <sdv> (65:80)
 ├─ <sdq> (65:0)
 ├─ <sdl> (8:176)
 ├─ <sdg> (8:96)
 └─ <sdb> (8:16)
biglp1--vg-root <dm-8> (252:8)
 └─mpath0-part3 <dm-7> (252:7)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdz> (65:144)
       ├─ <sdu> (65:64)
       ├─ <sdp> (8:240)
       ├─ <sdk> (8:160)
       ├─ <sdf> (8:80)
       └─ <sda> (8:0)
mpath4 <dm-4> (252:4)
 ├─ <sdad> (65:208)
 ├─ <sdy> (65:128)
 ├─ <sdt> (65:48)
 ├─ <sdo> (8:224)
 ├─ <sdj> (8:144)
 └─ <sde> (8:64)
biglp1--vg-swap_1 <dm-9> (252:9)
 └─mpath0-part3 <dm-7> (252:7)
    └─mpath0 <dm-0> (252:0)
       ├─ <sdz> (65:144)
       ├─ <sdu> (65:64)
       ├─ <sdp> (8:240)
       ├─ <sdk> (8:160)
       ├─ <sdf> (8:80)
       └─ <sda> (8:0)
mpath3 <dm-3> (252:3)
 ├─ <sdac> (65:192)
 ├─ <sdx> (65:112)
 ├─ <sds> (65:32)
 ├─ <sdn> (8:208)
 ├─ <sdi> (8:128)
 └─ <sdd> (8:48)

------- Comment From <email address hidden> 2016-02-04 16:53 EDT-------
@mathieu-tl

(In reply to comment #45)
> > My test system is indeed a ppc64el qemu VM using spapr-vscsi.
>
> I could reproduce the problem in that scenario by using break=multipath in
> the kernel cmdline
> (which forces LVM udev rules to run before multipath discovery).

Correction: break=pre-multipath

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi @mathieu-tl,

Here is the patch for Xenial.

Differences from the patch for Trusty:
1) install udev rule with priority 56, so it's run after 55-scsi-sg3_id; this way the paths actually have scsi id udev attributes defined by the time 'multipath -c' runs, and now it works fine (fixes the issue you mentioned on IRC).
2) remove the old multipath discovery udev rule (just like you applied for trusty, on multipath-tools 0.4.9-3ubuntu7.5)
3) debian/initramfs/init-top already exists, so just insert the snippet to load the module (rather than create the file).

Test-case:

1) Boot a qemu-kvm guest [1] w/ xenial (LVM on top of multipath) w/ the break=pre-multipath boot option
2) exit all 3 initramfs prompts
3) this should make the LVM scan run before multipath discovery; the latter fails to create the devmap, so mpathX-part2 (for /boot) is not present, and the boot is interrupted.

 ...
 (initramfs) exit
 ...
 (initramfs) exit
 ...
 (initramfs) exit
 ...
 [ 82.361944] device-mapper: table: 252:3: multipath: error getting device
 [ 82.362292] device-mapper: table: 252:2: multipath: error getting device
 [ 82.399493] device-mapper: table: 252:3: multipath: error getting device
 [ OK ] Found device /dev/mapper/mauricfo4--vg-swap_1.
   Activating swap /dev/mapper/mauricfo4--vg-swap_1...
 [ OK ] Activated swap /dev/mapper/mauricfo4--vg-swap_1.
 [ OK ] Reached target Swap.
 [ TIME ] Timed out waiting for device dev-mapper-mpatha\x2dpart2.device.
 [DEPEND] Dependency failed for /boot.
 [DEPEND] Dependency failed for Local File Systems.
 [DEPEND] Dependency failed for Clean up any mess left by 0dns-up.
 [DEPEND] Dependency failed for File System Check on /dev/mapper/mpatha-part2.
 ...
 Welcome to emergency mode! After logging in, type "journalctl -xb" to view
 system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to
 try again to boot into default mode.
 Press Enter for maintenance
 (or press Control-D to continue):

 root@mauricfo4:~# lvm pvdisplay | grep Name
   Found duplicate PV hWoIFGkvc0iVrbZnzhhqxud6QTeotfmQ: using /dev/sda3 not /dev/sdb3
   Using duplicate PV /dev/sda3 without holders, ignoring /dev/sdb3
   Found duplicate PV hWoIFGkvc0iVrbZnzhhqxud6QTeotfmQ: using /dev/sda3 not /dev/sdb3
   Using duplicate PV /dev/sda3 without holders, ignoring /dev/sdb3
   PV Name /dev/sda3
   VG Name mauricfo4-vg

With the patch applied, the partition nodes are removed, LVM only scans the multipath devices, and the boot finishes successfully:

 ...
 (initramfs) exit
 ...
 (initramfs) ls -l /dev/sd*
 brw------- 1 8, 16 /dev/sdb
 brw------- 1 8, 0 /dev/sda

 (initramfs) dmsetup table
 No devices found

 (initramfs) exit
 ...
 (initramfs) dmsetup table | sort
 mauricfo4--vg-root: 0 63750144 linear 252:3 2048
 mauricfo4--vg-swap_1: 0 2834432 linear 252:3 63752192
 mpatha-part1: 0 14336 linear 252:0 2048
 mpatha-part2: 0 499712 linear 252:0 16384
 mpatha-part3: 0 66590720 linear 252:0 516096
 mpatha: 0 67108864 multipath 0 0 2 1 round-robin 0 1 1 8:0 1 round-robin 0 1 1 8:16 1

 (initramfs) lvm pvdisplay | grep Name
   ...
   PV Name /dev/mapper/mpatha-part3
   VG Name ma...

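
The `mpatha` line in the `dmsetup table` output above encodes the path groups of the multipath map. A minimal parser is sketched below, assuming the kernel's multipath table layout (feature args, hardware-handler args, then per-group selector and paths); the function name is hypothetical.

```python
def parse_multipath_table(line):
    """Parse one 'dmsetup table' line for a multipath target.

    Returns (map name, [(selector, [major:minor, ...]), ...]).
    """
    name, rest = line.split(":", 1)
    tok = rest.split()
    if len(tok) < 3 or tok[2] != "multipath":
        raise ValueError("not a multipath table line")
    i = 3
    i += 1 + int(tok[i])      # number of feature args, then the args
    i += 1 + int(tok[i])      # number of hw handler args, then the args
    n_groups = int(tok[i])
    i += 2                    # skip group count and initial group number
    groups = []
    for _ in range(n_groups):
        selector = tok[i]
        i += 2 + int(tok[i + 1])          # selector name, arg count, args
        n_paths, n_path_args = int(tok[i]), int(tok[i + 1])
        i += 2
        paths = []
        for _ in range(n_paths):
            paths.append(tok[i])
            i += 1 + n_path_args          # device, then its per-path args
        groups.append((selector, paths))
    return name.strip(), groups
```

For the line above this yields two round-robin groups holding 8:0 and 8:16, i.e. sda and sdb.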

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-18 12:07 EDT-------
*** Bug 137290 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-22 12:30 EDT-------
*** Bug 131024 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-22 12:33 EDT-------
*** Bug 129478 has been marked as a duplicate of this bug. ***

------- Comment From <email address hidden> 2016-02-22 12:39 EDT-------
*** Bug 131422 has been marked as a duplicate of this bug. ***

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.5.0-7ubuntu15

---------------
multipath-tools (0.5.0-7ubuntu15) xenial; urgency=medium

  [ Mauricio Faria de Oliveira ]
  * Remove partition device nodes of individual paths (for LVM on multipath)
    (LP: #1540401)
    - debian/multipath-tools.dm-mpath-lvm.udev: udev rule for that.
    - debian/initramfs/hooks: copy the udev rule and partx to the initramfs.
    - debian/initramfs/init-top: load the dm-multipath module for 'multipath -c'.
    - debian/rules: install the udev rule (priority 56: after 55-scsi-sg3_id)
  * debian/rules: don't ship 95-multipath.rules udev rules anymore; they are
    not necessary with multipath-tools listening for udev events directly.

 -- Mathieu Trudel-Lapierre <email address hidden> Thu, 11 Feb 2016 19:08:14 -0500

Changed in multipath-tools (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-23 17:56 EDT-------
Hi @mathieu-tl,

Cool, thanks; glad to see the patch made Xenial.

This is still on the queue for Trusty, correct?
- there's no 'Affects' line for Trusty on the LP bug header yet.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

It's on my list, yes.

However, I haven't uploaded it yet because it is so annoying to test these patches from a PPA. I uploaded the changes for xenial to the archive for further testing, since they looked like they worked well enough not to break too many use cases.

Changed in multipath-tools (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-25 12:18 EDT-------
Hi @mathieu-tl,

(In reply to comment #69)
> It's on my list, yes.
>
> The idea however is that I haven't uploaded it because it is so annoying to
> test these patches from a PPA, I uploaded the changes for xenial to the
> archive for further testing since it looked like it worked sufficiently to
> not break too many use cases.

Ok. Thanks for the update.

If that helps, I've been testing that more manually, either installing the packages at the end of the installation (cd /target/tmp && wget <...>.deb && chroot /target dpkg -i /target/tmp/*.deb), or by installing the packages after the system is already installed (just manually fixing the boot problem at initramfs time, with break=pre-multipath for example).

Regards.

------- Comment From <email address hidden> 2016-02-25 12:21 EDT-------
Correction:

-packages at the end of the installation (cd /target/tmp && wget <...>.deb &&
-chroot /target dpkg -i /target/tmp/*.deb),

+packages at the end of the installation (cd /target/tmp && wget <...>.deb &&
+chroot /target dpkg -i /tmp/*.deb),

BTW, if you have a chance, it'd be nice to have a Trusty 'affects' row for tracking purposes. Otherwise it seems this is already resolved (despite your comment). Thanks.
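
The corrected sequence can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: the function name print_install_steps is made up here, the .deb URL is elided in the comment above and must be supplied by the caller, and the steps are printed rather than executed. One further assumption about shell behaviour: a bare /tmp/*.deb is expanded by the calling shell before chroot runs, so the sketch wraps dpkg in sh -c to let the glob expand inside the chroot, where /target/tmp is visible as /tmp.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the manual install steps.
#   $1 = target root used by the installer (e.g. /target)
#   $2 = URL of the .deb to fetch (elided in the thread; caller must supply it)
print_install_steps() {
    target=$1
    deb_url=$2
    # Download into the target's /tmp from the installer environment.
    printf 'cd %s/tmp\n' "$target"
    printf 'wget %s\n' "$deb_url"
    # Inside the chroot, $target/tmp is seen as /tmp; sh -c makes the glob
    # expand there instead of in the installer's own /tmp.
    printf "chroot %s sh -c 'dpkg -i /tmp/*.deb'\n" "$target"
}

# Usage (the URL variable is a placeholder to be set by the caller):
#   print_install_steps /target "$DEB_URL"
```

Printing the steps first makes it easy to review the paths before running anything inside the installer shell.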

Revision history for this message
bugproxy (bugproxy) wrote : multipath, fdisk, lsblk, sysctl -a output from lpar

Default Comment by Bridge

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: Triaged → Fix Committed
tags: added: verification-needed
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-02 18:41 EDT-------
I verified the problem is resolved w/ the package in -proposed on the QEMU guest scenario.
Waiting on bug reporter's verification, then will mark verification-done.

Thanks

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-03 08:04 EDT-------
(In reply to comment #78)
> I verified the problem is resolved w/ the package in -proposed on the QEMU
> guest scenario.
> Waiting on bug reporter's verification, then will mark verification-done.
>
> Thanks

Thank you Mauricio !!

I verified the same on the PowerVM LPAR scenario; the issue is fixed with the packages in -proposed:

root@miglp2:~# dpkg -l|grep multipath
ii multipath-tools 0.4.9-3ubuntu7.10 ppc64el maintain multipath block device access
ii multipath-tools-boot 0.4.9-3ubuntu7.10 all Support booting from multipath devices

Lpar boots up with /boot :
---------------
root@miglp2:~# df |grep boot
/dev/mapper/mpath3-part2 241965 60518 168955 27% /boot

Thanks,
Manju
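
The package check above can be scripted. A minimal sketch, assuming `dpkg -l` output in the column layout shown (status, name, version, ...); the function name multipath_versions is made up for illustration and is not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print the name and
# version of every installed (status "ii") package whose name starts with
# multipath-tools.
multipath_versions() {
    awk '$1 == "ii" && $2 ~ /^multipath-tools/ { print $2, $3 }'
}

# Usage on a live system:
#   dpkg -l | multipath_versions
```

Comparing the printed versions against the one named in the -proposed announcement confirms which build was actually tested.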

Revision history for this message
bugproxy (bugproxy) wrote : Boot log with excerpts of systemd-udevd --debug

------- Comment on attachment From <email address hidden> 2016-02-03 19:01 EDT-------

Adding an attachment (xz) of the boot log with systemd-udevd --debug running,
in order to verify the udev processing of the rules.

You may notice multipath -c is called several times only for SUBSYSTEM==BLOCK devices, and partx -d is called only for the devices that are part of a multipath device (i.e., multipath -c has exit code 0).
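
The decision described here (run partx -d only when multipath -c succeeds) can be sketched in shell. This is not the udev rule shipped in the package, only an illustration of the same logic: the function name partx_command_for and the MULTIPATH_CHECK override are assumptions added so the logic can be exercised without real multipath hardware, and the partx command is printed rather than run.

```shell
#!/bin/sh
# MULTIPATH_CHECK is a hypothetical override for dry runs; by default the
# check is `multipath -c <device>`, as described in the comment above.
: "${MULTIPATH_CHECK:=multipath -c}"

# Print the partx command that would remove the partition device nodes of an
# individual path, but only if the device is part of a multipath device
# (i.e. the check exits with code 0). Print nothing otherwise.
partx_command_for() {
    dev=$1
    # $MULTIPATH_CHECK is intentionally unquoted so "multipath -c" splits
    # into a command and its option.
    if $MULTIPATH_CHECK "$dev" >/dev/null 2>&1; then
        printf 'partx -d %s\n' "$dev"
    fi
}

# Usage (device name is only an example):
#   partx_command_for /dev/sda
```

Keeping the check and the action separate mirrors what the debug log shows: many multipath -c calls, but partx -d only for the paths that pass.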

tags: added: verification-done
removed: verification-needed
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-03 09:46 EDT-------
Excellent, Manju. Thanks. Marking verification-done.

tags: added: verification-needed
removed: verification-done
bugproxy (bugproxy)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.11 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-done
tags: added: verification-needed
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-04-06 10:16 EDT-------
The verification for this bug has already been done on ...7.10.
The 7.11 update is unrelated to this bug, so I am re-marking it verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.11

---------------
multipath-tools (0.4.9-3ubuntu7.11) trusty; urgency=medium

  * debian/patches/series: add dm-multipath-backlist-nvme-5c412e47.patch to
    the series file; it was missing in the previous upload and thus the patch
    intending to fix bug 1551828 was not applied. (LP: #1551828)

multipath-tools (0.4.9-3ubuntu7.10) trusty; urgency=medium

  [ Mathieu Trudel-Lapierre ]
  * debian/patches/dm-multipath-backlist-nvme-5c412e47.patch: blacklist NVMe
    from multipath, otherwise kpartx calls will hang. This is because mpath
    works at the request level (which NVMe bypasses), so multipathing is not
    supported on NVMe. (LP: #1551828)

  [ Mauricio Faria de Oliveira ]
  * Remove partition device nodes of individual paths (for LVM on multipath)
    (LP: #1540401)
    - debian/multipath-tools.dm-mpath-lvm.udev: udev rule for that.
    - debian/initramfs/hooks: copy the udev rule and partx to the initramfs.
    - debian/initramfs/init-top: load dm-multipath module for 'multipath -c'.
    - debian/rules: install the udev rule and init-top.

 -- Mathieu Trudel-Lapierre <email address hidden> Mon, 21 Mar 2016 10:41:36 -0400

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
bugproxy (bugproxy) wrote : lvdisplay, pvdisplay and vgdisplay output from lpar

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : Details about the problem and the solution

------- Comment on attachment From <email address hidden> 2016-02-03 18:59 EDT-------

Adding attachment w/ description and steps of problem and solution.

Revision history for this message
bugproxy (bugproxy) wrote : [PATCH V2] booting with LVM on multipath devices

------- Comment on attachment From <email address hidden> 2016-02-04 10:20 EDT-------

Hi Canonical,

Please consider this patch V2 for review.

It includes 2 important changes:
- Adds the '--nr 1-1024' option to the partx command.
- Built on top of the patches for multipathd/initramfs (LP #1526984) and kpartx w/ spaces (LP #1432062) -- both required.

Thanks,

Revision history for this message
bugproxy (bugproxy) wrote : Patch for Xenial

Default Comment by Bridge
