Change to mount sequence order breaks persistence on casper-rw partitions

Bug #1489855 reported by Thomas Weissel
128
This bug affects 23 people
Affects: casper (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Michael Hudson-Doyle

Bug Description

the system boots fine when using a casper-rw FILE but drops to a busybox when using a partition

the short log would be:
___________________________________________________
Begin: Running /scripts/casper-premount ... done.
done.
umount: can't umount /cdrom: Device or resource busy
Warning: Unable to find the persistent home medium
umount: can't umount /cdrom: Device or resource busy
Warning: Impossible to include the casper-sn Snapshot
umount: can't umount /cdrom: Device or resource busy
Warning: Impossible to include the home-sn Snapshot
done.
___________________________________________________

removing the "persistence" keyword from the syslinux.cfg works and the live usb drive boots just fine.
using a persistence file instead of the partition and the usb drive boots just fine.

i found older bug reports concerning the same problem but the fix proposed in 2010 is already integrated into the script "casper".

so i started the flashdrive with the casper debug= option and i see the following (after probing several other devices it finally finds the right device and partition, sdb3):

___________________________________________________

+ cow_backing_mp=/cdrom
+ [ -e /cdrom/casper-rw ]
+ umount /cdrom
+ sys2dev /sys/block/sdb/sdb3
+ sysdev=/block/sdb/sdb3
+ udevadm info -q name -p /block/sdb/sdb3
+ echo /dev/sdb3
+ devname=/dev/sdb3
+ /sbin/blkid -s LABEL -o value /dev/sdb3
+ [ casper-rw = casper-rw ]
+ echo /dev/sdb3
+ return
+ cowprobe=/dev/sdb3
+ [ -b /dev/sdb3 ]
+ cowdevice=/dev/sdb3
+ get_fstype /dev/sdb3
+ local FSTYPE
+ local FSSIZE
+ fstype
+ eval FSTYPE=ext4 FSSIZE=8458862592
+ FSTYPE=ext4 FSSIZE=8458862592
+ [ ext4 != unknown ]
+ echo ext4
+ return 0
+ cow_fstype=ext4
+ cow_mountopt=rw,noatime
+ mount -t ext4 -o rw,noatime /dev/sdb3 /cow
+ [ ! -d /cow/upper ]
+ mkdir -p /cow/upper
+ continue
+ continue
+ mkdir -p /cow/work
+ [ -f /cow/format ]
+ [ DEFAULT = DEFAULT ]
+ modprobe -q -b overlay
+ grep -q ^overlay$
+ cut -f2 /proc/filesystems
+ UNIONFS=overlay
+ break
___________________________________________________

this looks fine to me.. it looks like it recognizes everything .. it's ext4 .. label casper-rw.. it's mounting it...

a little bit further down in the loooong log file it states the following:

______________________________________________

+ cow_backing_mp=/home-rw-backing
+ [ -e /home-rw-backing/home-rw ]
+ umount /home-rw-backing
+ sys2dev /sys/block/sdb/sdb3
+ sysdev=/block/sdb/sdb3
+ udevadm info -q name -p /block/sdb/sdb3
+ echo /dev/sdb3
+ devname=/dev/sdb3
+ /sbin/blkid -s LABEL -o value /dev/sdb3
+ [ casper-rw = home-rw ]
+ get_fstype /dev/sdb3
+ local FSTYPE
+ local FSSIZE
+ fstype
+ eval FSTYPE=ext4 FSSIZE=8458862592
+ FSTYPE=ext4 FSSIZE=8458862592
+ [ ext4 != unknown ]
+ echo ext4
+ return 0
+ [ ext4 = vfat ]
+ homecow=
+ [ -b ]
+ [ n != y ]
+ log_warning_msg Unable to find the persistent home medium
+ _log_msg Warning: Unable to find the persistent home medium\n
+ [ n = y ]
+ printf Warning: Unable to find the persistent home medium\n
Warning: Unable to find the persistent home medium
__________________________________________________________

a warning about the home medium is shown in both cases (file and partition) but the persistence file works nevertheless..

my system:

kubuntu linux 15.10 beta1

uname -a
Linux wald 4.1.0-3-generic #3-Ubuntu SMP Tue Jul 28 12:25:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Revision history for this message
JoseStefan (josestefan) wrote :

Did you ever resolve this?

I think I have the same issue with the final release of Ubuntu 15.10
ece816e12f97018fa3d4974b5fd27337 *ubuntu-15.10-desktop-amd64.iso

Here's what I've done so far, I've combined concepts of different guides. I'm starting from a Windows 10 environment.
1) I used diskpart on the Windows command line to wipe the partition table off the USB drive, as Windows' "Disk Management" can be restrictive (next step). Any good partition tool would do.
2) Created a 1.2 GB FAT32 partition, 4096 cluster size. The rest of the USB is left unpartitioned. I determined the size by trial and error; originally I used a guide that suggested installing on the whole USB and resizing/shrinking, but I found this to be easier.
3) Used the tool at pendrive linux. Added about 70mb of persistence.

Until this point everything works as expected. It Boots to the live environment, it has persistence and I'm using only a portion of the USB.

4) I created two ext4 partitions. My USB drive is 16GB, and I wanted to have more control of what would be in and out of casper-rw. So the first one is labeled "casper-rw" and the second one is labeled "data".

At this point I can reboot, and everything is just like before, the system will continue to use the casper FILE, and not take advantage of the PARTITION. Both partitions are automatically mounted on /media/ubuntu

It's when I delete the casper-rw file that the problem happens. After a few failed attempts, I decided to move it out instead. I'm using windows for this, and it sits on my desktop while I test. I can then revert successfully.

Summary:
When I boot without the casper-rw FILE, I get a busybox, even though I have the casper-rw PARTITION created.

I expect it to automatically revert to using the partition.

Revision history for this message
JoseStefan (josestefan) wrote :

I repeated my same steps with 15.04 and the casper-rw PARTITION works.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in casper (Ubuntu):
status: New → Confirmed
Revision history for this message
syscon-hh (syscon-kono) wrote :

My interim solution to use '--- persistent' on a separate partition:

1. *rename* the partition 'casper-rw' into *system*

2. add a file 'casper-rw' of *0 bytes* into the first/main *vfat* partition

3. add a new partition *home-rw* (optional)

4. on UEFI-computers add *persistent* to the first menuentry of the */boot/grub/grub.cfg*

Now boot the USB device until the *busybox* prompt appears, then type at the prompt:

1. blkid -> identify the partition 'system' -> for example '/dev/sdb2'

2. mount /dev/sdb2 /cow -> ignore messages

3. exit

The live session will start and save all changes made to the live session - have fun.

What I'm looking for is a solution such as an integrated entry in the preseed files or ???

Revision history for this message
erasmo52 (erasmo-is) wrote :

Hi, I experienced the same problem. I also tried to install the new Kubuntu 16.04 (xenial) and it has the same bug.
The interim workaround proposed by syscon-hh worked for me too, but we need to find a proper way to fix the problem.
Probably something needs to be modified in one of the scripts handling the boot process at the init-top level.
Is there somebody out there who can study the case and propose a solution?
Where is the Kubuntu staff? Why is this bug still unassigned?
I am trying to study the case, but .... it would be better if somebody more skilled than me could do it.
Anyway thanks to everybody.

Revision history for this message
Andrew Tapia (catgaming) wrote :

This also appears to be a problem with the final release of Lubuntu 15.10. Taking (basically) the same steps as JoseStefan yields the same results.

Revision history for this message
Pelle Johnsen (pelle-johnsen) wrote :

Ran into the same issue with Ubuntu 16.04 beta, trying syscon-hh's workaround.

Revision history for this message
Pelle Johnsen (pelle-johnsen) wrote :

The workaround also works for me .. but it is very tedious to have to go through these steps on every boot. I have tried getting the casper team's attention here: https://answers.launchpad.net/ubuntu/+source/casper/+question/289574

Revision history for this message
JoseStefan (josestefan) wrote :

Note that this used to work on 15.04 and is broken for 15.10 and seems like 16.04 too.
So maybe someone can diff the changes to see what part of the code is likely to cause this?

Even though the original bug report is for Kubuntu, I experienced the bug with regular Ubuntu. Don't know if the bug needs to be updated to reflect that. Also don't know if all the other flavors would be affected.

Revision history for this message
JoseStefan (josestefan) wrote :

Tested with the 14.04.4 LTS (trusty), and also got a busybox.
807fa1f246b719d28d0b362fd2f31855 *ubuntu-14.04.4-desktop-amd64.iso

Compared the change logs, still haven't looked at the source (programming not my thing). They have this change in common:
casper (1.340.2) trusty; urgency=low
casper (1.363) wily; urgency=low

* scripts/casper: migrate existing persistent disk images to the updated upper/work form. Record the actual format used and ensure we use the existing format and abort if not possible. (LP: #1481117)

https://bugs.launchpad.net/bugs/1481117

I guess my next test is to find the older Ubuntu 14.04.3 iso and test with that, as it doesn't have that change implemented, maybe as far back as 14.04.2

Revision history for this message
JoseStefan (josestefan) wrote :

While performing the steps I indicated before:
* Ubuntu 14.04.3 failed (busybox)
* Ubuntu 14.04.2 worked as expected (df confirms correct disk space)

so in theory it was working with
casper (1.340)

and broken with
casper (1.340.1) trusty; urgency=low

but 15.04 (vivid) is working with:
casper (1.360) vivid; urgency=medium
which already implements (1.340.1) as (1.347 and 1.348)

So I don't know what to make of that.

Revision history for this message
Pelle Johnsen (pelle-johnsen) wrote :

I also tested with elementary os 0.3.2 (which is 14.04 based). This was working fine.

Revision history for this message
Max (max1975) wrote :

Hi to all,

I actually found the relevant difference between a working 15.04 and a non-working 16.04. In the function setup_unionfs() in scripts/casper (located in the ramdisk), the mount sequence changed from root partition first, then persistent partition (15.04), to persistent partition first, then root partition (16.04), for whatever reason. If you restore the 15.04 sequence (a simple cut and paste of the section) and then build a new ramdisk, you can run 16.04 with a persistent partition.

br max

Revision history for this message
Thomas Weissel (xapient) wrote :

seriously?

could you please post a more precise description (line numbers from > to)
i just edited the file

/usr/share/initramfs-tools/scripts/casper

then executed

sudo update-initramfs -u

and now i'm rebuilding my live system (this takes a while)

hopefully i interchanged the right lines.. (i only had a 14.04 casper file as reference and the differences are quite big)

Revision history for this message
Thomas Weissel (xapient) wrote :

the patched setup_unionfs() function

Revision history for this message
Max (max1975) wrote :

Apparently you already figured it out by yourself. Have fun.

Joel Ong (joel-ong)
summary: - kubuntu 15.10 beta1 live usb drops to busybox with persistence PARTITION
+ Change to mount sequence order breaks persistence
summary: - Change to mount sequence order breaks persistence
+ Change to mount sequence order breaks persistence on casper-rw
+ partitions
Revision history for this message
Thomas Weissel (valueerror) wrote :

ok... i ran into this bug AGAIN and i was dumb enough to think that this would definitely be resolved by now (since the bugfix is already provided here) i lost hours with debugging until i finally realized that this bug is still there..

could you please fix this ? it's 2 minutes work ... copy and paste ! and casper is not usable in persistent mode.. this bug is critical !

Revision history for this message
Vecdi Burak Bengi (burakbengi) wrote :

please fix this bug with the already proposed fix! this bug is (let's say it one more time) CRITICAL!

Revision history for this message
g (garethic) wrote :

Does this mean I can, say, take the squashfs from a broken Kubuntu 16 live usb & replace the squashfs of a working kubuntu 14 usb? Or will that break everything else in sight?
Because, I am NFI on how/where to find Casper.

Molto congrats on finding the solution!

Revision history for this message
Thomas Weissel (xapient) wrote :

well.. no.. if you change the squashfs file you replace the whole system.. you will then have ubuntu 14.04 and NOT 16.04

just use the information from post #14 (the path to the file in question) and replace the faulty version of the function setup_unionfs() with the one i attached to post #15

glhf

Revision history for this message
Pankaj Mohan (proaudience) wrote :

There is a workaround for avoiding this bug when creating such pen-drives. You should create three partitions : 1) A fat32 formatted 350MB for storing 'boot' and 'EFI' folders, 2) An ext4 formatted 2GB for storing the rest of Ubuntu's ISO content & 3) An ext4 formatted and labelled as 'casper-rw' in whatever size remains available for persistence. Finally enter 'set root=(hd0,2)' (without single quotes) in grub.cfg and the drive starts working as per your wishes.

One could do the same even with two partitions, provided they were ready to use the ISO file itself instead of its extracted content. I've described all this in my blog post linked below. Remember, this is a layman's perspective, and written by gathering information through a trial and error approach, so kindly don't expect any technical explanations for whatever I've been able to list down there...

http://proaudience.com/2017/09/creating-ubuntu-pen-drives-with-persistence/

Revision history for this message
EvilSupahFly (seann-giffin) wrote :

On ext4, the journal is written to the same location over and over again, which can have a seriously negative impact on the lifespan of the flash memory. If you want your USB stick to last longer, consider using ext2 instead because it doesn't use journaling. Otherwise, well done Pankaj!

Revision history for this message
JoseStefan (josestefan) wrote :

I've been tracking this bug for a few years now. For a bug of such HEAT shouldn't it by now at least have an importance other than UNDECIDED? And shouldn't the bug have enough information to be TRIAGED instead of CONFIRMED? It just seems this bug is not moving forward and is just going to sit in its current state forever.

For every new Ubuntu release, I'm afraid to invest more time trying to get persistence to work, because I've already dedicated much time to this subject in the past, and have personally failed. Instead I come back to this bug to check the status before even trying. So I'm not sure of the status of the bug for current releases of Ubuntu.

I'm creating these USB sticks from Windows. There are some workarounds posted here, like Pankaj's. But those steps require that you already be booted into Ubuntu to begin with. So that would at least require creating 2 USB sticks if you are starting from Windows. And like I said, I don't want to invest more time and resources on the subject than I have already. It would be nice if the Windows tools were updated with his observations and created an alternate persistence structure. But that's another subject.

Revision history for this message
Thomas Weissel (xapient) wrote :

this is really disappointing...

   mount: mounting /cow on /root failed: invalid argument overlay mount failed

i just changed the lines in /usr/share/initramfs-tools/scripts/casper and the live system booted without any problems...

could you please finally fix this bug !!

@seann-giffin i thought the same thing and therefore changed to ext2 - unfortunately it happens a lot that the flashdrives are removed without a clean shutdown ... we had a lot of filesystem errors... it seemed that ext4 was more healthy in the long run

Revision history for this message
DC-THINK (libratwo) wrote :

Look into the code; you can also see the fault in the debug log posted above.

When using a USB HDD (/dev/sdb) with the partitions /cdrom (/dev/sdb1, vfat) and casper-rw (/dev/sdb2, ext2):

1. /dev/sdb1 is mounted on /cdrom (/casper exists there; filesystem.squashfs will be loaded from it later)
2. casper-rw is searched for as a file and as a partition in the same pass (this causes the bug)
   casper.setup_unionfs -> casper-helpers.find_cow_device
   for each /dev/sdb*
      a. look for the file
          a.1 if /dev/sdb* is not mounted, mount it on /casper-rw-backing
          a.2 if /dev/sdb1 is already mounted, remount it rw on /cdrom
          then check whether the casper-rw file exists
          *** after the check (file not found) it unmounts /dev/sdb*, so /cdrom gets unmounted; this is the bug ***
      b. look for the partition: just check LABEL = casper-rw
   The devices are processed in the order /dev/sdb1, /dev/sdb2, so /cdrom is unmounted.

A workaround may be:
  1. put the casper-rw partition (/dev/sdb1) before the /cdrom partition (/dev/sdb2)
  2. or modify find_cow_device to look for the partition first and the file second (not both in the same pass)

Revision history for this message
Akeo (pbatard) wrote :

This is a pretty serious and rather obvious bug, once you understand what's going on.

As pointed out by @DC-THINK, whom I will mostly be paraphrasing here, the gist of it is: /usr/share/initramfs-tools/scripts/casper-helpers may unmount a previously mounted device, that it should *NOT* leave unmounted on exit.

The following is an alternate description of what happens, so that it may help devs assess the seriousness of the issue:

For this example, I will assume that you have extracted the installation media on a vfat image (say /dev/sdb1). It doesn't matter whether you actually have a persistent partition or not, as it will fail even with a single vfat partition (which I tested with a GPT/FAT32 single partition drive and ubuntu-19.04-desktop-amd64.iso after adding 'persistent' to the "Try Ubuntu without installing" entry in grub.cfg):

1. /dev/sdb1 is *ALREADY* mounted (as /cdrom) when we enter find_cow_device(), as it was mounted during the init process.
2. As we are processing all (non floppy) block devices, we start processing /dev/sdb*, and therefore start to look at /dev/sdb1.
3. Because we are processing the vfat partition, we don't find label 'casper-rw', so we proceed to look for a 'casper-rw' file.
4. To look for that file, the first thing that the script issues is 'try_mount' which succeeds at remounting /dev/sdb1 "rw".
5. We now look for a 'casper-rw' on the newly mounted /dev/sdb1, and don't find it, since it doesn't exist.
6. [HERE IS THE BUG] /dev/sdb1 is now **UNCONDITIONALLY** unmounted... instead of being remounted to the mountpoint it was using (/cdrom) when we entered the function call.
7. Because /cdrom has become unavailable, all kinds of bad things happen, starting with the casper script complaining...

In other words, the bug is: find_cow_device() can and DOES unmount legitimate devices it has no business of unmounting.

Ergo, find_cow_device() must be fixed, possibly by keeping history of already mounted devices in try_mount() and using a new restore_mount() call instead of the unconditional umount currently used.

Alternatively, try_mount should not degrade write access (i.e. it may do ro -> rw but not rw -> ro) and could optionally return the existing mountpoint of an already mounted device, so that find_cow_device() can determine whether it should umount the device or not.
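
The idea of "only unmount what you mounted yourself" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not casper's code and not the fix that was eventually uploaded: mountpoint_of, probe_for_file and the MOUNTS_FILE override are hypothetical names, and the lookup assumes mountpoints without spaces (which /proc/mounts would escape as \040).

```shell
#!/bin/sh
# Hypothetical helpers, not part of casper. MOUNTS_FILE can be pointed at a
# copy of /proc/mounts so the lookup can be tried without touching real mounts.
MOUNTS_FILE=${MOUNTS_FILE:-/proc/mounts}

# Print the first mountpoint of a device, or nothing if it is not mounted.
mountpoint_of() {
    awk -v dev="$1" '$1 == dev { print $2; exit }' "$MOUNTS_FILE"
}

# Look for a file on a device without disturbing an existing mount.
# Returns 0 if the file is found. Only unmounts what it mounted itself.
probe_for_file() {
    dev=$1 name=$2 scratch=$3
    mp=$(mountpoint_of "$dev")
    if [ -n "$mp" ]; then
        # Already mounted (e.g. on /cdrom): just look, never umount.
        [ -e "$mp/$name" ]
        return
    fi
    mount -o ro "$dev" "$scratch" || return 1
    if [ -e "$scratch/$name" ]; then
        return 0    # leave it mounted so the caller can use the file
    fi
    umount "$scratch"
    return 1
}
```

With something like this, a device that was mounted on entry is still mounted on exit, whether or not the file was found on it.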

Hope this helps. Note that this bug is indeed very problematic for Windows users, and will become even more so as the current Windows recommended tool for Ubuntu installation media (Rufus) is about to introduce persistent partitions [Disclaimer: I am the author of Rufus].

tags: added: id-5cfac2e9c17e1d85a84198b8
Revision history for this message
JoseStefan (josestefan) wrote :

I gave up on this bug. My workaround is to install a full system onto a USB. 2 USBs required for the setup.

I create the boot media as per general instructions. Last time I used Rufus.

And I set my target to 2nd empty USB, using the custom partition options.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

There's a lot going on in this bug but I've just uploaded a change to casper that fixes at least something in the right area. With this change, I can run this script on an ISO:

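#!/bin/bash
# Append an ext4 partition labelled casper-rw to an ISO image (path in $1).
ISO=$1

# Set the image size to 1G (assumes the ISO is smaller than that).
truncate -s 1G "$ISO"
# Highest end sector of the existing partitions.
maxend=$(sfdisk "$ISO" -l -q -o end | tail -n +2 | sort -n | tail -n1)
# First free sector, rounded up to a multiple of 0x1000 sectors.
start=$(((maxend + 1 + 0xfff) & ~0xfff))
echo "start=$start" | sfdisk "$ISO" -a -q
dev=$(sudo losetup -Pf --show "$ISO")
# The new partition is assumed to be number 3.
sudo mkfs -L casper-rw -t ext4 "${dev}p3"
sudo losetup -d "$dev"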

And then boot it attached to a VM with "persistent" on the command line and persistence seems to work as far as I can tell. What else is broken? :)
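
The start= line in the script is the least obvious part, so here is a small worked example of that arithmetic on its own. It takes the last used sector, moves to the next free one, and rounds up to a multiple of 0x1000 (4096). align_up is a hypothetical helper name used only for illustration.

```shell
#!/bin/bash
# Same expression as in the script above, wrapped in a function (hypothetical name).
align_up() {
    local maxend=$1
    echo $(( (maxend + 1 + 0xfff) & ~0xfff ))
}

align_up 4095    # next free sector is 4096, already aligned -> 4096
align_up 4096    # next free sector is 4097, rounded up      -> 8192
```

Adding 0xfff before masking with ~0xfff is the usual round-up-to-a-power-of-two idiom; a value that is already aligned stays where it is.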

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.413

---------------
casper (1.413) eoan; urgency=medium

  * Fix ftbfs by restoring empty conf.d missing from git import.

 -- Michael Hudson-Doyle <email address hidden> Tue, 16 Jul 2019 13:40:33 +1200

Changed in casper (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Akeo (pbatard) wrote :

Thanks for looking into this.

I've been trying to validate the above fix, but I'm still seeing the same issue. In other words, Ubuntu Live still seems to bail out to the busybox console as soon as you add 'persistent' to the kernel options.

Here's what I did:
- Extracted the content of http://cdimage.ubuntu.com/daily-live/current/eoan-desktop-amd64.iso to a FAT32 partition on an USB Flash drive
- Added a 4 GB ext4 casper-rw partition to same drive (mkfs -L casper-rw -t ext4 /dev/sda2)
- Because the current image is dated 2019.07.15, which may be older than your fix, I extracted casper/filesystem.squashfs from the USB media with unsquashfs to a temporary ./squashfs-root/
- I picked up http://archive.ubuntu.com/ubuntu/pool/main/c/casper/casper_1.413.tar.xz, and replaced ./squashfs-root/usr/share/initramfs-tools/scripts/ with the content of ./casper/scripts/ from that archive.
- For good measure I also replaced the 4 files in ./squashfs-root/usr/share/casper/ with their latest version from ./casper/bin/ on the archive.
- I then recreated the casper/filesystem.squashfs image with mksquashfs squashfs-root/ /mnt/usb_media/casper/filesystem.squashfs -noappend -always-use-fragments
- I added 'persistent' to the kernel options in grub.cfg then tried to boot the media on a UEFI system

The end result was still a boot failure with the message "mount: mounting /cow on /root failed: Invalid argument", and I'm pretty confident the same will hold true (without having to go through the whole squashfs update) with the next Live ISO that is generated on http://cdimage.ubuntu.com/daily-live/current/, so I don't believe the issue has been properly fixed...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1489855] Re: Change to mount sequence order breaks persistence on casper-rw partitions

On Thu, 18 Jul 2019 at 23:40, Akeo <email address hidden> wrote:

> Thanks for looking into this.
>
> I've been trying to validate the above fix, but I'm still seeing the
> same issue. In other words, Ubuntu Live still seems to bail out to the
> busybox console as soon as you use add 'persistent' to the Kernel
> options.
>
> Here's what I did:
> - Extracted the content of
> http://cdimage.ubuntu.com/daily-live/current/eoan-desktop-amd64.iso to a
> FAT32 partition on an USB Flash drive
> - Added a 4 GB ext4 casper-rw partition to same drive (mkfs -L casper-rw
> -t ext4
> /dev/sda2)
> - Because the current image is dated 2019.07.15, which may be older than
> your fix, I extracted casper/filesystem.squashfs from the USB media with
> unsquashfs to a temporary ./squashfs-root/
> - I picked up
> http://archive.ubuntu.com/ubuntu/pool/main/c/casper/casper_1.413.tar.xz,
> and replaced ./squashfs-root/usr/share/initramfs-tools/scripts/ with the
> content of ./casper/scripts/ from that archive.
> - For good measure I also replaced the 4 files in
> ./squashfs-root/usr/share/casper/ with their latest version from
> ./casper/bin/ on the archive.
> - I then recreated the casper/filesystem.squashfs image with mksquashfs
> squashfs-root/ /mnt/usb_media/casper/filesystem.squashfs -noappend
> -always-use-fragments
>

This doesn't update the files where they matter though: the ones that
matter are in the initrd (casper/initrd in this case). So you'd need to
unpack that (it's multiple cpio archives concatenated together, although
for testing this we only care about the last one), overwrite the casper
stuff in there and pack it up again.

Something like:

mkdir /tmp/initrd
cd /tmp/initrd
(cpio -t; cpio -t; lz4cat | cpio -i ) < /mnt/casper/initrd
cp /path/to/new/casper scripts/casper # etc
find . | LC_ALL=C sort | cpio -R 0:0 -o -H newc | gzip > ../initrd.new
cp ../initrd.new /mnt/casper/initrd

(this is all extremely obscure, yes)

> - I added 'persistent' to the kernel options in grub.cfg then tried to boot
> the media on a UEFI system
>

> The end result was still a boot failure with the message "mount:
> mounting /cow on /root failed: Invalid argument", and I'm pretty
> confident the same will hold true (without having to go through the
> whole squashfs update) with the next Live ISOs that is generated on
> http://cdimage.ubuntu.com/daily-live/current/, so I don't believe the
> issue has been properly fixed...
>

All the above said, I didn't do anything to fix the problem you've hit here.

I don't understand why find_cow_device is written the way that it is. I
guess it must pre-date us having udev in the initrd because you can check
for a filesystem with the right label now by looking in /dev/disk/by-label.
And when looking at vfat partitions that might or might not have a
casper-rw file, if it finds that the partition is mounted it could check
for the existence of the backing file _before_ it does mount games that it
then might have to undo. I'll see if I can rewrite it :)
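
As a rough illustration of the by-label lookup mentioned above (a sketch only, not the rewrite itself): udev creates a symlink per filesystem label under /dev/disk/by-label, so finding the device needs no mounting at all. find_by_label and the BY_LABEL_DIR override are hypothetical names introduced here.

```shell
#!/bin/sh
# BY_LABEL_DIR is a hypothetical override so this can be tried on a scratch
# directory; on a running system with udev it is /dev/disk/by-label.
BY_LABEL_DIR=${BY_LABEL_DIR:-/dev/disk/by-label}

# Print the device node carrying the given filesystem label; return 1 if none.
find_by_label() {
    link="$BY_LABEL_DIR/$1"
    [ -e "$link" ] || return 1
    readlink -f "$link"
}

if cowdevice=$(find_by_label casper-rw); then
    echo "persistence partition: $cowdevice"
else
    echo "no filesystem labelled casper-rw found"
fi
```

Because nothing is mounted or unmounted during the search, a lookup like this cannot disturb /cdrom.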

Cheers,
mwh


Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

OK, that wasn't so bad: https://code.launchpad.net/~mwhudson/ubuntu/+source/casper/+git/casper/+merge/370351

It seems to work for your test case, my test case (where you dd the image then create a partition), and even for the case where you create a file called casper-rw containing an ext4 filesystem in a vfat partition.

I've made a new initrd with this change in at https://people.canonical.com/~mwh/initrd.new although it's 60 odd megs so it might be easier to follow my instructions above to repack the initrd for yourself.

Changed in casper (Ubuntu):
status: Fix Released → In Progress
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Revision history for this message
Akeo (pbatard) wrote :

> the ones that matter are in the initrd (casper/initrd in this case).

D'oh! Of course they are... The way I was trying to test your changes was indeed pointless.

> OK, that wasn't so bad: https://code.launchpad.net/~mwhudson/ubuntu/+source/casper/+git/casper/+merge/370351

Awesome.

I just tested your changes, by simply updating scripts/casper-helpers from the 2019.07.15 eoan-desktop-amd64.iso in http://cdimage.ubuntu.com/daily-live/current/ with your proposed ones (the instructions on how to recreate initrd were very helpful btw), and I too can confirm that the case I have been testing (vfat partition with copy of the ISO content + casper-rw partition) is fixed. Persistent partitions now seem to work beautifully with Ubuntu Live. Great work!!

Btw, I should point out that the reason I've been pushing for this issue to be fixed is that I am the developer of Rufus, which (currently [1]) is the application Ubuntu recommends when creating a bootable Ubuntu USB Flash Drive on Windows. And because this has been requested by quite a few users, I very recently added the ability to create persistent partitions. However, this bug made it a bit vexing to not be able to properly use that feature with Ubuntu (which is why, at the moment, Rufus does not automatically add 'persistent' to the kernel options when creating the drive -- I'm planning to add that once Eoan is out). So I really appreciate the effort spent on fixing this. Many, many thanks!

/Pete

[1] https://tutorials.ubuntu.com/tutorial/tutorial-create-a-usb-stick-on-windows#1

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Ah great, thanks for testing! I've uploaded the fix to eoan just now. I guess after some testing we should think about backporting to bionic (although it's too late to try to get it into 18.04.3 now)

And it's interesting to hear why you're interested. The reason I am interested in this sort of thing is that I'm thinking about automatically creating a casper-rw partition if there is space for it and using it to store the installer logs even if the user hasn't selected persistence. (Because currently for the server installer there's no easy way for the user to get the logs off the live system in the case of an install failure).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.414

---------------
casper (1.414) eoan; urgency=medium

  * Use udev-created symlinks to find filesystems by label.
  * Fix find_cow_device and find_files to not unmount filesystems that were
    already mounted (LP: #1489855)

 -- Michael Hudson-Doyle <email address hidden> Mon, 22 Jul 2019 10:29:08 +1200

Changed in casper (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Piotr Martycz (pmartycz) wrote :

Hi, is this change effective for other Ubuntu flavors?

I tried booting Xubuntu 19.10 prepared with Rufus but despite persistence being enabled the casper-rw remained unmounted in the live system. The same thing works with the official Ubuntu 19.10 image.

Revision history for this message
xieliwei (xieliwei) wrote :

Since the patch didn't make it in time for 18.04.3 but 18.04 is the latest LTS release available before 20.04, I'd wager there'd be at least a few others looking to create a persistent LTS USB drive facing this issue. Also, I spent a few hours figuring this out so hopefully this saves someone some time.

I've made a patched version of the "casper/initrd" file that one can simply drop into an already-created USB drive as a replacement. Its SHA1 is d70c6189ae6497422b704219c3f8dd1a8fba6fe6.

*Only works from bootable drives made from Ubuntu 18.04.3 ISO!*
https://drive.google.com/file/d/1X25ZafWhUn9ZpDNS1QHWmCaDS7Fl9sOT
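
If you do use the prebuilt file, it is worth comparing its checksum with the SHA1 quoted above before copying it over. A minimal sketch, assuming sha1sum from coreutils is available; verify_sha1 is a hypothetical helper name and the path in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the file's SHA1 matches the expected value.
verify_sha1() {
    file=$1 expected=$2
    actual=$(sha1sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage (example path):
#   verify_sha1 ~/Downloads/initrd d70c6189ae6497422b704219c3f8dd1a8fba6fe6 \
#       && echo "checksum matches" || echo "checksum MISMATCH, do not use"
```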

If you do not trust a random binary, here are the steps required to reproduce it:

1. Extract "casper/initrd" from the ISO or USB drive to a temporary directory
(You can use unmkinitramfs but let's extract manually.)
(Based on mwhudson's incantation a few comments up, but for LZMA used in 18.04.)

$ cd /path/to/temp/dir; (cpio -t; cpio -t; lzcat | cpio -i ) < /path/to/casper/initrd

2. (Optional) Apply 0e52b46 so that the actual patch won't be offset
(This patch is not relevant for our case but precedes the patch we want to apply. It does fix a bug though!)
(Ignore the error about not finding "debian/changelog", it is only included in the casper package and not the initrd.)

$ wget -qO - https://git.launchpad.net/~mwhudson/ubuntu/+source/casper/patch/?id=0e52b46ea2a0eff48e0e8b26cb6a7174989ceb27 | patch -tp1

3. Apply a452b0b, which is what we want
(Again, ignore the error about not finding "debian/changelog", it is only included in the casper package and not the initrd.)

$ wget -qO - https://git.launchpad.net/~mwhudson/ubuntu/+source/casper/patch/?id=a452b0b0d5874ae48ee7ea742e500e48ffd8d0a4 | patch -tp1

4. Repackage into LZMA file
(Again based on mwhudson's incantation a few comments up, though gzip should probably work fine too.)

$ find . | LC_ALL=C sort | cpio -R 0:0 -o -H newc | lzma -c > /tmp/initrd.new.lz

[Here, you should be able to use the new initrd for almost all cases by replacing the one on your USB drive. However, the original initrd has microcode embedded before the actual initrd, so it will be nice if we can restore that.]

5. Here it gets tricky as we try to extract the microcode segment
(If you have binwalk installed, here's a command to automatically do this for you)

$ TGT_FILE=/path/to/casper/initrd; dd if="$TGT_FILE" of=/tmp/initrd.microcode bs="$(binwalk "$TGT_FILE" | grep LZMA | head -1 | cut -d ' ' -f1)" count=1

(Else, you'll have to trust me when I tell you the segment spans from 0-2441216 for the official 18.04.3 ISO)

$ dd if=/path/to/casper/initrd of=/tmp/initrd.microcode bs=2441216 count=1

6. Put the files together to get the initrd that's identical in structure to the original

$ cat /tmp/initrd.microcode /tmp/initrd.new.lz > /tmp/initrd

[You should now be able to replace the initrd in the casper directory on your USB drive with the newly-built one in /tmp]
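
Steps 5 and 6 boil down to cutting the original initrd at a byte offset and gluing a new tail onto the old head. A minimal sketch of that idea using head and tail instead of dd (split_initrd is a hypothetical helper name; the offset is whatever binwalk reports, or 2441216 for the official 18.04.3 ISO as stated above):

```shell
# Hypothetical helper: write the first OFFSET bytes of FILE to PREFIX_OUT
# and everything after them to REST_OUT.
# usage: split_initrd FILE OFFSET PREFIX_OUT REST_OUT
split_initrd() {
    src=$1
    offset=$2
    prefix_out=$3
    rest_out=$4
    head -c "$offset" "$src" > "$prefix_out"
    # tail -c +N starts output at byte N (1-based), hence offset + 1
    tail -c "+$((offset + 1))" "$src" > "$rest_out"
}

# Example (paths are placeholders):
# split_initrd /path/to/casper/initrd 2441216 /tmp/initrd.microcode /tmp/initrd.orig.lz
# cat /tmp/initrd.microcode /tmp/initrd.new.lz > /tmp/initrd
```

A quick sanity check is that concatenating the two pieces of the untouched initrd gives back a byte-identical file (cmp reports no difference). If it does not, the offset is wrong.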

HTH

Revision history for this message
sudodus (nio-wiklund) wrote :

I hoped that you (the developers) had upgraded casper in 18.04.4 LTS. But if it is too late for that, I ask you to prepare the upgrade well in time for 18.04.5 LTS.

I am willing to test that the iso file works correctly, both 'as usual' and to create persistent live drives using the new feature (like in 19.10 and Focal Fossa).

Revision history for this message
Akeo (pbatard) wrote :

Please note that this appears to be broken again in Ubuntu 20.04!

Could the Ubuntu maintainers please treat this bug, which has been very negatively affecting scores of Ubuntu users, a bit more seriously?

By the looks of it, the plight of 18.04 LTS users will continue in 20.04 when it comes to being able to use persistent partitions that reside on the same drive, and the inability for Ubuntu to permanently fix an issue that has already been affecting users for YEARS is going to become all the more damaging.

PLEASE, PLEASE, PLEASE, understand that this issue is VERY NEGATIVELY affecting Ubuntu users! Just perform a search for "mounting /cow on /root failed" on any site such as reddit or superuser or askubuntu, if you need some convincing that this is a MAJOR BUG that needs to be properly fixed.

For crying out loud, do not let this bug, which was fixed in 19.10, rear its ugly head again!

Revision history for this message
Akeo (pbatard) wrote :

After some additional tests, it seems I was a bit too hasty to declare that there is a regression: even though an error similar to the one from the original bug is displayed, the persistent partition appears to be mounted regardless.

For the record however, `/var/log/boot.log` displays the following error while mounting the persistent partition:

--------------------------------------------------------
ln: /tmp/mountroot-fail-hooks.d//scripts/init-premount/lvm2: No such file or directory
mount: mounting /cow on /root/cow failed: No such file or directory
adduser: The user `ubuntu' already exists.
[FAILED] Failed unmounting /cdrom.
--------------------------------------------------------

Also, the introduction of `/casper/vmlinuz$casper_flavour` in `grub.cfg` does add a new hurdle to utilities, such as Rufus, that attempt to add the required 'persistent' keyword to the kernel options, because, of course, the new `$casper_flavour` variable makes it even more difficult to insert the keyword in a location that is not going to cause trouble.

I sure wish persistence on Linux didn't require developers of boot utilities to treat every single release as its own special case, and/or end up with the boot process throwing error messages that seem to indicate that there is an issue.

As such, I would strongly encourage testing at least UEFI boot in the following manner:
- One FAT32 partition where the whole content of the Ubuntu ISO has been extracted (and grub.cfg patched to enable persistence)
- One ext3 or ext4 casper-rw partition following the FAT32 partition (since this has been the longtime recommended way of enabling persistence for Ubuntu)

The above is a very reasonable way to expect persistence to be achieved for 20.04 LTS, so if it does throw errors, as it currently appears to do, I would assert that there are still some improvements that could be made.

Revision history for this message
Akeo (pbatard) wrote :

Actually, no, the success I got was actually a fluke. This is currently broken again in 20.04, as per https://bugs.launchpad.net/ubuntu/+source/casper/+bug/1863672.

Revision history for this message
C.S.Cameron (cscameron) wrote :

I added a persistent partition to a UNetbootin focal-desktop-amd64-20200325 install and it works okay. Just like pre-14.04.

Revision history for this message
C.S.Cameron (cscameron) wrote :

Making a Grub2 booter that uses Persistent partitions is not a problem. Start with a 1MB grub2 core.img partition flagged bios_grub. Add a 250MB FAT32 EFI partition flagged boot,esp. Next, add an ext4 partition large enough for the Ubuntu ISO's contents, and finish with an ext4 casper-rw partition and an NTFS data partition if desired. Copy the ISO's contents to the root partition and recopy the boot and EFI folders to the EFI partition. Mount the EFI partition to mnt and install grub. Add set root=(hd0,3) to grub.cfg. Add " persistent" after ---. I think Grub2 does not like a FAT32 root. I believe the above works on all versions of Ubuntu since 12.04. Bootloading is based on mkusb by Sudodus.
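
For the last grub.cfg step, here is a rough sketch of appending " persistent" to the kernel line without touching anything else. It assumes the kernel line starts with "linux" followed by a space or tab and ends with "---", as in the stock Ubuntu grub.cfg; add_persistent is a made-up helper name, and real grub.cfg files vary, so check the result by hand:

```shell
# Hypothetical helper: read a grub.cfg on stdin and append " persistent"
# to every "linux ..." line that does not already carry the keyword.
add_persistent() {
    awk '
        /^[ \t]*linux[ \t]/ && !/(^|[ \t])persistent([ \t]|$)/ {
            sub(/[ \t]*$/, "")        # drop trailing whitespace first
            print $0 " persistent"
            next
        }
        { print }                     # every other line passes through unchanged
    '
}

# Example (paths are placeholders):
# add_persistent < /path/to/boot/grub/grub.cfg > /tmp/grub.cfg.new
```

Running it twice gives the same output as running it once, so it is safe to reapply. Lines that start with "linuxefi" are deliberately not matched here.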

Revision history for this message
Akeo (pbatard) wrote :

> Making a Grub2 booter that uses Persistent partitions is not a problem [if you create 4 partitions in a very specific way and with this specific file system for the content extracted from the ISO]

I hope you can see the issue with the above because then I could add:

As demonstrated above, making a Grub2 booter that uses Persistent partitions *is* a problem if you create partitions in a different way with only 2 partitions, and without being tied to a specific file system for the ISO extracted content.

You can't just declare that a bug should be minimized, because there is a (rather complex, especially for Windows users, who can't easily copy data to ext file systems) way to work around it. On its own, the presence of a workaround does not invalidate the potential severity of a bug.

> I think Grub2 does not like a FAT32 root.

That's not it. The cause of the bug is explained above, and is entirely self-contained within the Ubuntu casper scripts. It's a bit unfortunate that, in order to try to make a point at https://askubuntu.com/questions/1226318 and deflect your erroneous initial assertion, you seem not to have properly read the data that was provided to you about the exact cause and nature of this bug, which has nothing to do with GRUB.

The root of the issue, which I took great pains to detail and which the person who fixed the bug corroborated, is that in some circumstances the casper scripts will unmount the boot partition and fail to remount it. We expect those circumstances to be the ones that most people trying to create a persistent drive would match, because 2 partitions, one with the ISO content and the other for 'casper-rw', is the *simplest* and *most straightforward* way of creating a persistent media manually.

That's the only issue, which is a rather major and unfortunate problem and one, I will posit, that was left unaddressed for years on account that people like you seem to have been just happy to tell users affected by it that they should go through non-straightforward workarounds, like the one you describe, instead of trying to help raise this bug's visibility and priority so that this rather major issue got fixed.

Revision history for this message
C.S.Cameron (cscameron) wrote :

So is there still a bug with 20.04 as you suggest? If the bug was only squashed with 19.10 are you going to give up on persistence for Rufus and just wait for the bug to get fixed? Perhaps another five years. Seems better to me to work with what we have rather than wait for a perfect world.

Revision history for this message
Akeo (pbatard) wrote :

> So is there still a bug with 20.04 as you suggest?

At the time I posted my comment, yes there absolutely was.

A regression had been introduced in the daily 20.04 builds, that made persistence consistently fail on some machines. This was confirmed by other people too.

Since I added my reports onto it, the most problematic aspect of this issue has been fixed, though there still exists a non-breaking corollary issue (introduction of a new 'writable' name for persistent partitions) that a few of us would like to see addressed.

It's all in the bug report(s) I linked to.

> If the bug was only squashed with 19.10 are you going to give up on persistence for Rufus and just wait for the bug to get fixed?

That was a consideration I had, since, at the time, it looked to me like Ubuntu made little effort to backport the bugfix into 18.04 (which, I will assert, they should have if they actually cared about their LTS users) and had suddenly broken persistence for 20.04 again. But with 20.04 being somewhat usable for persistence again, the only major issue that persists is that 18.04 hasn't been fixed, which is the precise reason we are having this "discussion".

> Perhaps another five years.

And that is exactly my issue.

People who should have made a bigger deal of this issue (including asking for an 18.04 backport) appear to have just been content to declare, as you are attempting to do, that the manner in which Rufus and other methods create persistent partitions that trigger this bug is "unnatural", which is simply ridiculous since it is really the most straightforward way to do it. Therefore, instead of helping raise heat so that maintainers realised that this was a problem for many, many users, they helped ensure that the issue was left unaddressed for years, leading us precisely into the situation we are in now, with user after user reporting `mount: mounting /cow on /root failed: Invalid argument` issues when they try to use 18.04 with persistence.

> Seems better to me to work with what we have rather than wait for a perfect world.

Seems better to me to have people understand the ramification of trying to brush an issue under the carpet and/or the negative impact that an unaddressed bug can have for Ubuntu users.

I'm pretty sure that, if the bug was with Rufus, we'd see quite a different tune from you, with something along the lines of "How can you pretend to care about your users if you are going to leave this Rufus bug unaddressed for years?".

So maybe you want to make an attempt to review the situation a bit more objectively at last.

Plenty of users of Ubuntu are affected by this issue still, since 18.04 has not backported this bugfix. And you should really know this, because you've been seeing that issue pop up in askubuntu over and over again. So, if you actually care about Ubuntu users, you might want to stop this charade of trying to point the finger at anything but the actual bug, and instead, contribute to this report to help the maintainers realize that, maybe, the fix for this is something that they should have backported to 18.04, because it has been ending up affecting many many first time users of...


Revision history for this message
Gabriel Dina (gabriel-joy) wrote :

I have a truly simple idea for how to decide the importance of this bug: just add a field for how much time and money we have lost to this shitty bug.
