LVM boot problem - volumes not activated after upgrade to Xenial

Bug #1573982 reported by Yavor Nikolov
This bug affects 22 people
Affects          Status        Importance  Assigned to  Milestone
MAAS             Invalid       Undecided   Unassigned
curtin           Invalid       Undecided   Unassigned
lvm2 (Ubuntu)    Fix Released  Undecided   Unassigned
  Bionic         Fix Released  Undecided   Unassigned

Bug Description

Soon after upgrading to Xenial (from 15.10), the boot process broke. I'm using LVM for root, swap and other partitions.

===
The current behaviour is:

When I boot, shortly after the GRUB screen I get log messages like:

---
Scanning for Btrfs filesystems
resume: Could not stat the resume device file: '/dev/mapper/VolGroup....'
Please type in the full path...
---

Then I press ENTER; for a few minutes some errors about floppy device access appear (for some reason it tries to scan fd0 even though the floppy drive is empty). And then:

---
Gave up waiting for root device. Common problems: ...
...
ALERT! UUID=xxx-xxx.... does not exist.
Dropping to a shell.
---

From the BusyBox shell I managed to recover the boot by issuing "lvm vgchange -ay", then "exit"; after that the boot continues fine (all LVM file systems are mounted successfully).

===
One workaround so far is creating an /etc/initramfs-tools/scripts/local-top/lvm2-manual script that runs "lvm vgchange -ay". But I'm looking for a cleaner solution.
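
A minimal sketch of that workaround script, assuming the standard initramfs-tools "prereqs" boilerplate is all that is needed and that the lvm2 package's hook has already copied the lvm binary into the initramfs (the filename lvm2-manual is arbitrary):

    #!/bin/sh
    # /etc/initramfs-tools/scripts/local-top/lvm2-manual
    # Force-activate all volume groups before the root file system is mounted.
    PREREQ="lvm2"            # run after the stock lvm2 local-top script
    prereqs() { echo "$PREREQ"; }
    case "$1" in
        prereqs) prereqs; exit 0 ;;
    esac
    lvm vgchange -ay
    exit 0

The script has to be executable and the initramfs rebuilt (update-initramfs -u) for it to take effect.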

Boot used to work fine with 15.10. The first boot after upgrading to Xenial also worked OK; I'm not sure what might have changed in the meantime (I had been fixing some package installations after the MySQL server upgrade failed).

===
# lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lvm2 (Ubuntu):
status: New → Confirmed
Revision history for this message
Uxorious (uxorious) wrote :

I'm not seeing ANY LVM volumes active on system boot.
(I'm not putting any of the necessary boot paths on LVM).

After booting the system, the volume is visible but not active.
If I put one of the drives in fstab, booting Ubuntu breaks.

Is there a workaround to make the system do "vgchange -a y" during boot?

Revision history for this message
Yavor Nikolov (yavor-nikolov) wrote :

My workaround is as I explained in the issue description: I added a script in /etc/initramfs-tools/scripts/local-top/ folder which performs `vgchange -ay`.

Revision history for this message
MatthewHawn (steamraven) wrote :

I just ran into this upgrading from 14.04. My system is a btrfs raid across two LVM Volume Groups. Both volume groups need to be activated at boot, before the "btrfs device scan". The system used to do this.

Putting a vgchange in a script in local-top fixes this.

Thanks!

Revision history for this message
MatthewHawn (steamraven) wrote :

The cause appears to be lvm2 (2.02.133-1ubuntu8). From the changelog (https://launchpad.net/ubuntu/xenial/+source/lvm2/+changelog):

lvm2 (2.02.133-1ubuntu8) xenial; urgency=medium

  * Drop debian/85-lvm2.rules. This is redundant now, VGs are already
    auto-assembled via lvmetad and 69-lvm-metad.rules. This gets rid of using
    watershed, which causes deadlocks due to blocking udev rule processing.
    (LP: #1560710)
  * debian/rules: Put back initramfs-tools script to ensure that the root and
    resume devices are activated (lvmetad is not yet running in the initrd).
  * debian/rules: Put back activation systemd generator, to assemble LVs in
    case the admin disabled lvmetad.
  * Make debian/initramfs-tools/lvm2/scripts/init-premount/lvm2 executable and
    remove spurious chmod +x Ubuntu delta in debian/rules.

 -- Martin Pitt <email address hidden> Wed, 30 Mar 2016 10:56:49 +0200

The initramfs-tools script does not activate all of the logical volumes and its detection is lacking in certain edge cases like mine.

Revision history for this message
Databay (rs-databay) wrote :

I can confirm this bug is also present in lvm2 (2.02.133-1ubuntu10).

I got the affected system (upgraded via do-release-upgrade on 09.08.2016) back up with the above-mentioned workaround:

Creating an /etc/initramfs-tools/scripts/local-top/lvm2 script that runs "lvm vgchange -ay", and making it executable.
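
Roughly, assuming a script with the body shown in the bug description (and remembering that the initramfs has to be rebuilt so the new script actually ends up inside it):

    chmod +x /etc/initramfs-tools/scripts/local-top/lvm2
    update-initramfs -u -k all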

Shouldn't this bug get some priority, since it can make a remote system inaccessible?

Revision history for this message
eulPing (francois-jeanmougin) wrote :

I can confirm the same issue here after an upgrade from 14.04 to 16.04.
Note that on my system, / is not on LVM.

LVM is not brought up at boot time nor at init time, and the system gave up mounting /usr. For me this is even worse: even when / is mounted and we are supposedly in a sort of "userland", LVM is still not up.

I had to (the full sequence is sketched below):
mount --bind proc, run, sys and dev into /root/
Then lvm vgchange -ay
then mount -a
[This is required to run update-initramfs, as that script is in /usr and requires /var]
Then mount -o remount,rw /
Then create an lvm2 script in local-top as described earlier [THANK YOU!]
Then update the initramfs with update-initramfs -k all -u
Then sync and umount
exit the chroot
reboot
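
A rough sketch of that sequence, run from the initramfs/rescue shell with the real root mounted at /root (VG, device and path names will differ per system; the commands after chroot are typed inside the chroot shell):

    lvm vgchange -ay                          # bring every volume group online first
    for d in proc run sys dev; do mount --bind /$d /root/$d; done
    chroot /root /bin/sh
    mount -o remount,rw /                     # inside the chroot: make / writable
    mount -a                                  # update-initramfs lives in /usr and needs /var
    # create the lvm2 script in /etc/initramfs-tools/scripts/local-top/ as described earlier
    update-initramfs -k all -u
    sync
    exit                                      # leave the chroot, optionally umount the bind mounts
    reboot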

This is not an obvious process to follow, especially ending up with an undocumented script in local-top :).

Good luck all!

Revision history for this message
Lisio (lisio) wrote :

I faced the same behaviour yesterday; the only workaround for me was adding the line "vgchange -ay" to /usr/share/initramfs-tools/scripts/local-top/lvm2.

I hadn't changed any configuration for a couple of months before this issue, only ran apt-get upgrade on a regular basis.

However, now I get the following warnings during boot:

Sep 20 12:39:17 server systemd[1]: Started File System Check on /dev/data/data.
Sep 20 12:39:17 server systemd[1]: Mounting /data...
Sep 20 12:39:17 server systemd[1]: dev-disk-by\x2dlabel-web.device: Dev dev-disk-by\x2dlabel-web.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/virtual/block/dm-0
Sep 20 12:39:17 server systemd-fsck[840]: web: clean, 906007/134217728 files, 455468592/536870912 blocks
Sep 20 12:39:17 server systemd[1]: Started File System Check on /dev/data/web.
Sep 20 12:39:17 server systemd[1]: Mounting /web...
Sep 20 12:39:17 server systemd[1]: Mounted /data.
Sep 20 12:39:17 server kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Sep 20 12:39:17 server systemd[1]: dev-disk-by\x2dlabel-web.device: Dev dev-disk-by\x2dlabel-web.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/virtual/block/dm-0
Sep 20 12:39:17 server systemd[1]: Mounted /web.
Sep 20 12:39:17 server kernel: EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

Revision history for this message
Benpro (benpro82) wrote :

I wonder if this is due to the use of systemd. As seen on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774082 for Debian.

Revision history for this message
Jarod (jarod42) wrote :

Last night I ran into the same problem. I upgraded from 12.04 LTS to 16.04.1 LTS Server and got stuck at boot.
The last message complained about a UUID not being present. It turned out it was the /usr FS. Doing an "lvm lvscan" from the initrd prompt showed all but one LV inactive; the only active one was bootvg/root.
I then booted via the rescue system and added "lvm vgchange -ay" to /usr/share/initramfs-tools/scripts/local-top/lvm2 right before "exit 0". After running "update-initramfs -k all -c" and rebooting, the server came up again.
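
In other words, the tail of that packaged script ends up roughly like this (only the added line is shown; the rest of the file is untouched):

    # ...existing contents of /usr/share/initramfs-tools/scripts/local-top/lvm2...
    lvm vgchange -ay    # added: activate every VG, not only the root/resume LVs the script detects
    exit 0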

The bootvg is on a RAID1 disk controlled via mdadm.

mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sat Dec 20 16:49:58 2014
     Raid Level : raid1
     Array Size : 971924032 (926.90 GiB 995.25 GB)
  Used Dev Size : 971924032 (926.90 GiB 995.25 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Jan 23 09:50:47 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : XXX:1
           UUID : xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxx814e
         Events : 21001

    Number Major Minor RaidDevice State
       0 8 19 0 active sync /dev/sdb3
       2 8 3 1 active sync /dev/sda3

pvs
  PV VG Fmt Attr PSize PFree
  /dev/md1 bootvg lvm2 a-- 926.90g 148.90g

The /boot FS is on sda1/sdb1, also via RAID1.
sda2 and sdb2 are swap.

fdisk -l /dev/sda
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 1026047 1024000 500M fd Linux raid autodetect
/dev/sda2 1026048 9414655 8388608 4G 82 Linux swap / Solaris
/dev/sda3 9414656 1953525167 1944110512 927G fd Linux raid autodetect

lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04

dpkg -l lvm2
ii lvm2 2.02.133-1ubuntu amd64 Linux Logical Volume Manager

Revision history for this message
Akshay Moghe (akshay-moghe) wrote :

Facing a similar problem on a debootstrap rootfs.

Even after ensuring that the lvm2 package is installed (and hence the initramfs scripts are present), I still get dropped to a shell in the initramfs. Running `lvchange -ay` makes the volume show up, and the boot then succeeds. I presume "fixing" the script (as described in comment #8) will solve the problem, but I'd like to see a fix where I'm not forced to re-roll my own initrd.

Any pointers as to why this might be happening?

Revision history for this message
Tore Anderson (toreanderson) wrote :

I ran across the same bug. It was caused by the root filesystem being specified on the kernel command line with the root=UUID=<foo> syntax. This is not handled by the `case "$dev" in` stanza in activate() in /usr/share/initramfs-tools/scripts/local-top/lvm2. See the attached screenshot. If I change the kernel command line to say root=/dev/vg0/root instead, it works.
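
For illustration, the kind of extra branch that would cover this case looks roughly like the following. This is a sketch only, not an excerpt of the shipped script or of the patch attached later in this bug:

    # hypothetical additional case in activate() in local-top/lvm2
    case "$dev" in
        UUID=*)
            # No VG/LV path to parse here, so activate every volume group and
            # let the normal by-UUID root lookup succeed afterwards.
            lvm vgchange -ay
            ;;
    esac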

Revision history for this message
Chris Sanders (chris.sanders) wrote :

I've run across this today and it affects MAAS.
MAAS version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)

Configuring an LVM-based drive with a RAID on top of it for the root partition will trigger this. Deploying the default kernel/OS will fail due to inactive volume groups.

The fix as expected:
lvm vgchange -ay
mdadm --assemble --scan
exit

Then apply the above mentioned script to make it stick.

affects: maas (Ubuntu) → maas
Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Chris,

Can you attach the output of:

maas <user> machine get-curtin-config <systemid>

and also attach the curtin log (you can grab it from the UI under the Installation tab).

Also, this seems to be an issue with Ubuntu more widely.

Curtin is the one that writes this configuration, so marking this as Incomplete for MAAS and opening it in curtin.

Changed in maas:
status: New → Incomplete
Ryan Harper (raharper)
Changed in curtin:
status: New → Incomplete
Revision history for this message
Chris Sanders (chris.sanders) wrote :

The machine I was using has been redeployed without LVM. If I get a chance to redeploy I'll grab the requested logs. It's fairly trivial to trigger if you have a machine available to deploy with lvm boot as described above.

Revision history for this message
TJ (tj) wrote :

Attached is a patch (generated on 16.04) that activates volume groups when root=UUID=... is on the kernel command-line.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "activate VGs when root=UUID=" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
deehefem (deehefem) wrote :

The patch works fine for me... Kind of odd that it's been two years and it hasn't been rolled into the release. I have 70 machines to patch after upgrading :/

Revision history for this message
Cameron Paine (cbp) wrote :

This bug report enabled me to recover quickly from a planned upgrade (14.04 -> 16.04) that went south. FWIW I'm able to confirm that it's a live issue.

All of our critical workstations are deployed with LVs on top of md devices. Some, including the one I was upgrading, use md mirrors.

FWIW:

$ cat /proc/cmdline
root=/dev/mapper/sysvg-root ro quiet splash
$ uname -a
Linux lab-netvista 3.13.0-87-generic #133-Ubuntu SMP Tue May 24 18:33:01 UTC 2016 i686 i686 i686 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

If there's anything else I can provide to assist in resolution please let me know.

Cameron

Revision history for this message
Adam Seering (aseering) wrote :

This bug report just enabled me to recover from an upgrade to Ubuntu 18.04.1. So I can confirm that this is still an issue.

Root partition on an LVM volume; LVM physical volume on a software (mdadm) RAID.

The workaround in this comment solved the problem for me:
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1573982/comments/10

Let me know if I can provide any additional useful information.

Revision history for this message
Thomas Stadler (tomina) wrote :

Could someone please describe how to apply the patch from TJ?

Revision history for this message
Steve Dodd (anarchetic) wrote :

Confused to see no movement on this bug?

The logical thing seemed to be to add another case to /usr/share/initramfs-tools/scripts/local-top/lvm2 calling lvchange_activate with no parameters, but it seems that doesn't work - perhaps activation/auto_activation_volume_list needs to be set in lvm.conf?

I decided giving an explicit root=/dev/vg/lv on the command line was probably more transparent than burying a setting in lvm.conf anyway.
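
For anyone experimenting with that route: the setting lives in the activation section of /etc/lvm/lvm.conf, something like the sketch below (the VG/LV names are made up; when the list is left unset, every LV is eligible for auto-activation):

    activation {
        # Only LVs matching an entry here are auto-activated by "vgchange/lvchange -aay".
        auto_activation_volume_list = [ "vg00", "vg00/lv_usr" ]
    }

If the initramfs carries its own copy of lvm.conf, it would need rebuilding as well.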

tags: added: id-5c51da4d8556ee2e7ae3a108
Changed in maas:
status: Incomplete → Invalid
Revision history for this message
alienn (spamme-ubuntu) wrote :

Well. Today I installed a fresh 18.04 server and just ran into this issue.
My disk setup is as follows:
/dev/sda1 - bios
/dev/sda2 - /boot
/dev/sdb (LVM)
  - vg-0/Usr
  - vg-0/Home
  - vg-0/Root
  - vg-0/Swap
/dev/sdc (LVM)
  - vg-1/Var

Upon reboot I get an error within the initramfs that the root was not found. Executing "vgchange -ay" activates the volumes and I can continue with Ctrl-D.
Why is this not fixed? And why is the last state "invalid"?

- Nicki

Revision history for this message
martin short (martin-sk) wrote :

I ran into the same issue yesterday with an up-to-date (May 24, 2019) 18.04.2 LTS. As mentioned above, the problem is /usr being on a separate LV.

packages:
initramfs-tools-bin 0.130ubuntu3.7
linux-image-generic 4.15.0.50.52
udev 237-3ubuntu10.21
libdevmapper1.02.1:amd64 2:1.02.145-4.1ubuntu3

It drops to the initramfs shell because it can't find /usr.
From the initramfs:

init:268

if read_fstab_entry /usr; then
        log_begin_msg "Mounting /usr file system"
        mountfs /usr
        log_end_msg
fi

scripts/local:260

local_mount_fs()

It fails because it can't find the device. Mapper didn't create any device either - there's no /dev/dm-X for that LV.

/etc/fstab entries use devices by UUID by default. No such UUID exists on the system at that point, and it seems the scripts can't handle it and drop to the shell.

If I change the entry to:

/dev/vg00/lv_usr /usr ext4 defaults 0 0

the system is able to activate the device during boot. But it still seems other modules (like btrfs) take precedence, and it takes a few seconds for the system to pick up this LV.

It beats me why the VG is not activated as a whole (vgchange -a y vg00); maybe there's some reasoning behind it.

Revision history for this message
Dan Watkins (oddbloke) wrote :

I don't believe this is a curtin issue; I've marked it as Invalid for curtin. (Please do set it back to New if this is an error!)

Changed in curtin:
status: Incomplete → Invalid
Revision history for this message
Eric Desrochers (slashd) wrote :

Please see (LP: #1854981).

Revision history for this message
Eric Desrochers (slashd) wrote :

Feel free to test lvm2 in bionic-proposed (2.02.176-4.1ubuntu3.18.04.2) and provide feedback on #1854981.

- Eric

Revision history for this message
Eric Desrochers (slashd) wrote :

This bug was fixed in the package lvm2 - 2.02.176-4.1ubuntu3.18.04.2

---------------
lvm2 (2.02.176-4.1ubuntu3.18.04.2) bionic; urgency=medium

  * d/p/fix-auto-activation-at-boot.patch: (LP: #1854981)
    Allow LV auto-activation (e.g. /usr on it's separate LV)
---------------

Changed in lvm2 (Ubuntu):
status: Confirmed → Fix Released
Changed in lvm2 (Ubuntu Bionic):
status: New → Fix Released