15.10beta crashes encrypted swap partition

Bug #1506139 reported by Hadmut Danisch
This bug affects 6 people
Affects: systemd (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Unassigned

Bug Description

Hi,

I'm usually using a setup with three partitions on a disk

Partition 1: plain ext4 boot partition mounted on /boot
Partition 2: luks-encrypted swap
Partition 3: luks-encrypted btrfs for / /home ...

both mentioned in /etc/crypttab like

sda2_crypt UUID=a7976d5c-6191-436d-9cf9-2cedf17d0893 none luks,swap,discard
sda3_crypt UUID=339b9a90-8292-422d-a3cf-eeb0317e9f84 none luks,discard

I have installed 15.10 beta on several machines, and in several cases the swap is not activated at boot time and /dev/disk/by-uuid contains no link to the swap partition. The previously created luks-encrypted swap is destroyed after boot: it is no longer a luks partition and is filled with random (presumably encrypted) bytes without structure.

I first thought that this was a problem of the setup process, and repaired the swap manually. But then I found the partition destroyed again. This happened several times on several machines.

I am not sure yet what exactly would destroy the partition.

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: cryptsetup 2:1.6.6-5ubuntu2
ProcVersionSignature: Ubuntu 4.2.0-16.19-generic 4.2.3
Uname: Linux 4.2.0-16-generic x86_64
ApportVersion: 2.19.1-0ubuntu2
Architecture: amd64
CurrentDesktop: XFCE
Date: Wed Oct 14 18:12:58 2015
InstallationDate: Installed on 2015-10-08 (5 days ago)
InstallationMedia: Xubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150924)
SourcePackage: cryptsetup
UpgradeStatus: No upgrade log present (probably fresh install)
crypttab:
 sda2_crypt UUID=a7976d5c-6191-436d-9cf9-2cedf17d0893 none luks,swap,discard
 sda3_crypt UUID=339b9a90-8292-422d-a3cf-eeb0317e9f84 none luks,discard

Revision history for this message
Hadmut Danisch (hadmut) wrote :
Steve Langasek (vorlon) wrote :

The systemd package has taken over the handling of /etc/crypttab at boot from cryptsetup (without much coordination AFAICS), and it sounds like its interpretation of the crypttab is buggy.

"swap" is not synonymous with "random", and should not result in the device being clobbered, which is what is happening here. In particular, encrypted persistent swap needs to be supportable for users who wish to use this for suspend to disk, and this requires a LUKS header (with UUID).

Note however that for this use case, you *also* don't actually want to use 'swap' as an option in /etc/crypttab, because this is defined as "Run mkswap on the created device", and there's no need to do that if you have a persistent crypted swap.

affects: cryptsetup (Ubuntu) → systemd (Ubuntu)
Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → High
Hadmut Danisch (hadmut) wrote :

Well, life would be much easier if there was some usable documentation about what's going on within systemd.

By the way, I did not put in that 'swap' option manually, it was inserted by the xubuntu 15.10 beta installer on cdrom/usb image. If you choose to encrypt a partition and put a swap inside, it automatically adds that swap option. So at least this crypttab option, the behaviour of the installer, and systemd don't fit together.

Since you mention it: on my other machine with 15.10 I noticed that the machine does not resume from hibernate but performs a fresh boot, which matches your hint that resume does not work with that style of encrypted swap.

Whatever it is that fills the device with random data should honor the luks option in the crypttab and use the device as intended (i.e. configure the device mapper and do a swapon).

Martin Pitt (pitti) wrote :

I tried to reproduce this on today's ubuntu desktop amd64 image (20151014). I think I set up partitions like you described: 1 GB /boot on partition 1, 1 GB LUKS on partition 2 (and put swap on vda2_crypt), 8 GB LUKS on partition 3 (and put btrfs / on vda3_crypt).

Both during install and after a few reboots I see correct partition/file system types in "blkid":
$ blkid
/dev/mapper/vda3_crypt: UUID="5d281986-88e6-4a51-97a3-72f7af49792a" UUID_SUB="f31e41da-5fb9-401e-82c6-c8ba0fc031a6" TYPE="btrfs"
/dev/mapper/vda2_crypt: UUID="1fda8cc2-08b8-4c2c-820a-7ac07014ab3b" TYPE="swap"
/dev/vda1: UUID="947e51a6-196c-40f9-a9fe-9a53429bbaaa" TYPE="ext4" PARTUUID="1d8c299a-01"
/dev/vda2: UUID="7a5a8534-53ca-4cf9-ae69-3d164c9d7ab6" TYPE="crypto_LUKS" PARTUUID="1d8c299a-02"
/dev/vda3: UUID="aa700da9-6f7e-4de9-ae1d-3db0988dd0fe" TYPE="crypto_LUKS" PARTUUID="1d8c299a-03"

The only change was in the UUID of vda2_crypt as that gets re-mkswap-ed every time due to the "swap" option in crypttab. If that's undesired, this needs to be fixed in partman -- however, it doesn't sound like that's the actual issue you see.

My /etc/crypttab looks pretty much like yours:
vda2_crypt UUID=7a5a8534-53ca-4cf9-ae69-3d164c9d7ab6 none luks,swap,discard
vda3_crypt UUID=aa700da9-6f7e-4de9-ae1d-3db0988dd0fe none luks,discard

and /etc/fstab isn't surprising either:
/dev/mapper/vda3_crypt / btrfs defaults,subvol=@ 0 1
# /boot was on /dev/vda1 during installation
UUID=947e51a6-196c-40f9-a9fe-9a53429bbaaa /boot ext4 defaults 0 2
/dev/mapper/vda3_crypt /home btrfs defaults,subvol=@home 0 2
/dev/mapper/vda2_crypt none swap sw 0 0

So I can't reproduce "destroys swap partition" just yet. From your description it sounds like something is destroying sda3 itself (i. e. the outer encrypted LUKS partition), *not* the unencrypted sda3_crypt, right?

Can you please give me some details:

 - What do you precisely do to "repair the swap manually"?
 - After that, please copy&paste the output of "sudo blkid", "sudo swapon -s", "cat /etc/crypttab", and "cat /etc/fstab".
 - Reboot
 - After that, please copy&paste all of the above commands again, so that we can compare.
 - Run "sudo journalctl -b > /tmp/journal.txt" and attach /tmp/journal.txt as well.

Thanks!

Changed in systemd (Ubuntu):
status: New → Incomplete
Hadmut Danisch (hadmut) wrote :

> From your description it sounds like something is destroying sda3 itself (i. e. the outer encrypted LUKS partition), *not* the unencrypted sda3_crypt, right?

Right.

I've created the partitions with the graphical xubuntu installer from xubuntu 15.10 beta 1 cdrom put on a usb stick, and created both sda2 and sda3 as encrypted volumes, then put a swap in sda2_encrypted and btrfs in sda3_encrypted. This worked well with 14.04.

After booting I realized that the machine had no swap, not even links to the partition under /dev/disk/by-uuid, and thus I could not open the device manually.

I found that the partition was completely filled with random data, with no luks header. cryptsetup isLuks said it is not a luks device, and xxd showed no trace of a luks header anymore; it was completely overwritten.
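
The check described above (cryptsetup isLuks, then inspecting the start of the partition with xxd) can be approximated by looking for the LUKS magic bytes at the very beginning of the device. The following is only a minimal sketch: the function names are made up for illustration, and it looks at the six-byte magic only, not at the rest of the header, so it is no substitute for cryptsetup isLuks.

```python
# Minimal sketch: detect the LUKS magic at the start of a device or image.
# Function names are hypothetical; only the six magic bytes are checked.

LUKS_MAGIC = b"LUKS\xba\xbe"  # first six bytes of a LUKS header


def has_luks_magic(first_bytes: bytes) -> bool:
    """Return True if the buffer starts with the LUKS magic."""
    return first_bytes[:len(LUKS_MAGIC)] == LUKS_MAGIC


def device_has_luks_magic(path: str) -> bool:
    """Read the first bytes of a device or image file (needs read permission)."""
    with open(path, "rb") as dev:
        return has_luks_magic(dev.read(len(LUKS_MAGIC)))
```

On a partition overwritten as described here, device_has_luks_magic("/dev/sda2") (run as root) would be expected to return False.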

I assumed it was a problem of the installer, not of the running system. My first suspicion was a corrupted partition table, but I did not find any problem with the partition itself. My next suspicion was a fault in the storage device, since I had replaced the old hard disk with a brand new SSD for the fresh install. But apart from this issue I do not see any problems with storage, and I experienced the issue on two distinct machines. I do not see any problems on the other partitions and their file systems so far.

> - What do you precisely do to "repair the swap manually"?

cryptsetup luksFormat -c aes-xts-plain64 -s 512 /dev/sda2 (and enter the same password as for the root partition sda3)
cryptsetup luksOpen /dev/sda2 xxx
mkswap /dev/mapper/xxx

On one of the two machines (the office machine I'm using right now) this helped and the problem has not recurred so far. That's why I first assumed that it was just a problem of the installation process (graphical xubuntu installer), because I had experienced more trouble with the installer used in the lubuntu 15.10 beta cdrom image.

I did the very same thing on my machine at home, also ran into that problem, again assumed that it was a problem of the xubuntu installer, and fixed it as described above, but it recurred. (Meanwhile there's more trouble with this machine: systemd hangs in the boot process, except when I open an emergency root session.)

>- After that, please copy&paste the output of ...

I'll reply to that once I am back home at that particular machine.

Hadmut Danisch (hadmut) wrote :

OK, I am back at my home machine: the problem occurred again, the machine again destroyed the luks header on /dev/sda2.

Furthermore, I have another problem: during a regular boot, the boot process hangs after systemd has listed the names of several services. In most cases networking.service is the last one printed, which is not very useful since these are, as I understand it, finished services, not the ones that cause trouble. I have not yet found a way to make that damned systemd tell what it's doing.

Strangely enough, the machine boots without problems if I choose recovery mode, choose to activate networking from the menu, and then continue, so it works when recovery mode is part of the boot chain. I guess Ubuntu will have lots of fun with that systemd.

sudo blkid (sda2 currently damaged again)

/dev/mapper/sda3_crypt: UUID="9b9831d9-62f5-4fe0-872a-704bd66d5f7f" UUID_SUB="804b8d81-3f2b-4b24-894f-c63a71f3d442" TYPE="btrfs"
/dev/sda1: UUID="19e9998b-814c-4302-8003-f95e0e6a254e" TYPE="ext4" PARTUUID="04582fbe-af7b-4d10-ad4a-6d2277bbf679"
/dev/sda3: UUID="339b9a90-8292-422d-a3cf-eeb0317e9f84" TYPE="crypto_LUKS" PARTUUID="84cfaf1d-9da4-4c6c-bf45-6af03ba7b265"
/dev/sda2: PARTUUID="ab20073f-29c4-42d6-a971-af2cdb2e2339"

swapon -s : no output

/etc/crypttab:

#sda2_crypt UUID=a7976d5c-6191-436d-9cf9-2cedf17d0893 none luks,swap,discard
sda2_crypt /dev/disk/by-id/ata-SanDisk_SDSSDHII480G_***************-part2 none luks,swap,discard
sda3_crypt UUID=339b9a90-8292-422d-a3cf-eeb0317e9f84 none luks,discard

(I've replaced the serial number of my disk with *********)

/etc/fstab
/dev/mapper/sda3_crypt / btrfs defaults,subvol=@ 0 1
UUID=19e9998b-814c-4302-8003-f95e0e6a254e /boot ext4 defaults 0 2
/dev/mapper/sda3_crypt /home btrfs defaults,subvol=@home 0 2
/dev/mapper/sda2_crypt none swap sw 0 0

I'll now repair the partition as described, reboot, and report back.

Hadmut Danisch (hadmut) wrote :

OK, freshly rebooted. This time, sda2 has survived as a valid and operating luks partition.

crypttab and fstab not changed.

# swapon -s
Filename Type Size Used Priority
/dev/dm-1 partition 16308220 0 -1

# dir /dev/mapper
total 0
crw------- 1 root root 10, 236 Oct 15 22:17 control
lrwxrwxrwx 1 root root 7 Oct 15 22:18 sda2_crypt -> ../dm-1
lrwxrwxrwx 1 root root 7 Oct 15 22:17 sda3_crypt -> ../dm-0

# blkid
/dev/mapper/sda3_crypt: UUID="9b9831d9-62f5-4fe0-872a-704bd66d5f7f" UUID_SUB="804b8d81-3f2b-4b24-894f-c63a71f3d442" TYPE="btrfs"
/dev/sda1: UUID="19e9998b-814c-4302-8003-f95e0e6a254e" TYPE="ext4" PARTUUID="04582fbe-af7b-4d10-ad4a-6d2277bbf679"
/dev/sda2: UUID="e1a46217-7f77-46b8-b109-320d11e47d83" TYPE="crypto_LUKS" PARTUUID="ab20073f-29c4-42d6-a971-af2cdb2e2339"
/dev/sda3: UUID="339b9a90-8292-422d-a3cf-eeb0317e9f84" TYPE="crypto_LUKS" PARTUUID="84cfaf1d-9da4-4c6c-bf45-6af03ba7b265"
/dev/mapper/sda2_crypt: UUID="aa426924-b1ae-478e-9d2a-dc838a2df367" TYPE="swap"

I'll attach journal.txt

Martin Pitt (pitti) wrote :

> This time, sda2 has survived as a valid and operating luks partition.

Then the journal won't show the bits where it destroys it (but it's still useful for comparison). I'd like to see a journal when it does destroy the device. One way would be to just keep rebooting until that happens.

However, there might be a faster and also more useful way. First, stop only the swap partition and luks device:

   sudo systemctl stop systemd-cryptsetup@sda2_crypt.service

Now /dev/mapper/ should not have sda2_crypt any more, just sda3_crypt (for the root partition). Then you can run the commands in /run/systemd/generator/systemd-cryptsetup@sda2_crypt.service manually with extra debugging:

   sudo SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-cryptsetup attach sda2_crypt /dev/sda2 none luks,swap,discard
   (enter passphrase)
   sudo /lib/systemd/systemd-cryptsetup detach sda2_crypt
   # now check if the signature is still correct:
   sudo blkid -p /dev/sda2

You can try running this several times until it destroys your partition (FTR, I ran it successfully some 20 times). Does that reproduce the bug for you? If so, please copy&paste the output from the command cycle that did the destruction. If not, then I guess it's something else in the boot process, and then please reboot until it happens and attach the journal output from this boot.

Thanks!

> Furthermore, I have another problem: When doing a regular boot, but boot process hangs after systemd listed the names of several services

Please file a separate bug report about that. /usr/share/doc/systemd/README.Debian.gz describes how to debug such boot/shutdown hangs. In particular, boot with the debug shell enabled, and when it hangs switch to it, check for running services, and also save the journal. Thanks!

Hadmut Danisch (hadmut) wrote :

I've noticed something and I guess that both my problems - swap problem and systemd hanging while booting - are closely related.

Why?

The system does not hang at boot, when I choose the recovery mode from grub, and in the recovery mode select "network" to enable networking. Important: before activating the network, the console asks me to enter the password for sda2_crypt (swap). System then can boot up the regular way.

So it seems to be something in the systemd service order. The recovery menu's network option does something, the normal boot sequence doesn't.

Hadmut Danisch (hadmut) wrote :

OK,

I have tracked this down and made big progress in identifying the problem.

An important step for debugging was to learn how to debug systemd.

    http://freedesktop.org/wiki/Software/systemd/Debugging/

was quite helpful. In particular,

   systemctl enable debug-shell.service

helps a lot. After that, one can get a root shell when the systemd boot process is hanging.

I have identified *two* problems, both in
/lib/systemd/systemd-cryptsetup

First problem:

The system boot procedure hangs because the process

    /lib/systemd/systemd-cryptsetup attach sda2_crypt /dev/disk/by-id/ata-SanDisk_SDSSDHII480G_**********-part2 none luks,swap,discard

hangs. It waits for password input, but for some reason its prompt and input don't make their way to the boot console or boot splash prompt. There's a problem with the procedure for requesting a password.

Killing that process from the debug console makes the boot process continue immediately (of course without working swap).

Once you know that this is the process causing trouble, debugging gets much easier, since it is no longer necessary to try this within a boot process. You can use a running machine with any test partition for easy debugging.

BTW: systemd does not use /etc/crypttab directly, but converts the contents of /etc/crypttab to dynamically created units first, which can be found under /run/systemd. It shows

ExecStart=/lib/systemd/systemd-cryptsetup attach 'sda2_crypt' '/dev/disk/by-id/ata-SanDisk_SDSSDHII480G_**********-part2' 'none' 'luks,swap,discard'
ExecStop=/lib/systemd/systemd-cryptsetup detach 'sda2_crypt'
ExecStartPost=/sbin/mkswap '/dev/mapper/sda2_crypt'

So one knows what happens right here.

You can easily call the given command from anywhere as root with any partition, without the need to edit /etc/crypttab, because it's all command line parameters here. Makes testing pretty easy now.
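
To make the mapping from a crypttab line to the generated unit lines easier to follow, here is a rough Python sketch. It is not the systemd generator's code: the function name is hypothetical, it only handles the four whitespace-separated fields shown above, and it does not resolve UUID= specifications to device paths the way the real generator does. It merely reproduces the three unit lines quoted above for a plain device path.

```python
# Rough sketch (not systemd's generator): derive the unit command lines
# shown above from one crypttab line. Assumes four whitespace-separated
# fields and a plain device path; UUID= entries are not resolved here.

HELPER = "/lib/systemd/systemd-cryptsetup"


def crypttab_to_commands(line: str) -> dict:
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("crypttab line needs at least a name and a device")
    name, device = fields[0], fields[1]
    keyfile = fields[2] if len(fields) > 2 else "none"
    options = fields[3] if len(fields) > 3 else ""

    unit = {
        "ExecStart": f"{HELPER} attach '{name}' '{device}' '{keyfile}' '{options}'",
        "ExecStop": f"{HELPER} detach '{name}'",
    }
    # The 'swap' option is what adds the mkswap step after attaching.
    if "swap" in options.split(","):
        unit["ExecStartPost"] = f"/sbin/mkswap '/dev/mapper/{name}'"
    return unit


for key, value in crypttab_to_commands(
    "sda2_crypt /dev/sda2 none luks,swap,discard"
).items():
    print(f"{key}={value}")
```

The ExecStart string printed here is the command that can be run by hand as root against a test partition.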

Second problem:

That damned systemd-cryptsetup ignores luks (or is unable to cope with modern luks settings).

This is what the dmsetup table looks like for my root partition as set up in the initramfs:

0 903712768 crypt aes-xts-plain64 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0 8:3 4096 1 allow_discards

This looks good, because it's the same crypt-parameters (aes-xts) as I used when creating the luks partition, and it uses an offset of 4096, allowing the luks header to remain untouched.

But after running systemd-cryptsetup for the sda2 partition (even after freshly formatting it with cryptsetup), dmsetup table shows this:

0 32616448 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:2 0 1 allow_discards

which contains *two* wrong settings:

- it's the wrong cipher

- it's an offset of 0, which overwrites the luks header. That's why I am seeing garbage again and again.
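
The comparison of the two table lines can be automated with a few lines of Python. This is a sketch under assumptions: it expects a single dm-crypt line as printed by dmsetup table for one named device (no "name:" prefix), with the field order start, length, "crypt", cipher, key, IV offset, backing device, data offset, followed by optional parameters. The class and function names are made up for illustration.

```python
# Sketch: pull cipher and data offset out of one dm-crypt table line.
# Field order assumed: start length crypt cipher key iv_offset device offset [opts]
from typing import NamedTuple


class CryptTarget(NamedTuple):
    cipher: str
    device: str
    offset_sectors: int  # offset into the backing device, in 512-byte sectors


def parse_crypt_table(line: str) -> CryptTarget:
    fields = line.split()
    if len(fields) < 8 or fields[2] != "crypt":
        raise ValueError("not a dm-crypt table line")
    return CryptTarget(cipher=fields[3], device=fields[6],
                       offset_sectors=int(fields[7]))


def maps_over_header(target: CryptTarget) -> bool:
    """An offset of 0 means the mapping starts where an on-disk luks header would be."""
    return target.offset_sectors == 0
```

For the two lines above this gives an offset of 4096 sectors with aes-xts-plain64 for the root mapping, and an offset of 0 with aes-cbc-essiv:sha256 for the broken swap mapping. Note that an offset of 0 is only suspicious when the header is expected on the same device.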

So it turns out that systemd-cryptsetup is triple-buggy:

- Password dialog not working in the boot process, in neither splash nor non-splash mode (that's why the boot process hangs)

- wrong cipher

- no offset, thus overwriting the luks header.

Hadmut Danisch (hadmut) wrote :

More info from the source code of systemd-cryptsetup:

 else if (streq(option, "luks"))
                arg_type = CRYPT_LUKS1;
...
 } else if (STR_IN_SET(option, "plain", "swap", "tmp"))
                arg_type = CRYPT_PLAIN;

so the swap argument overrides the luks argument and resets the encryption type to plain.

And indeed, when using

/lib/systemd/systemd-cryptsetup attach sda2_crypt /dev/sda2 none swap,luks,discard

(i.e. just change the order of the parameters: swap,luks,discard instead of luks,swap,discard as the Ubuntu installer creates it), it works and uses the luks partition correctly.
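
The order dependence can be modelled in a few lines of Python. This is only a model of the two branches quoted above, not the actual systemd-cryptsetup code: the function name and the string values "luks", "plain" and "unset" are stand-ins for arg_type, CRYPT_LUKS1 and CRYPT_PLAIN.

```python
# Model of the quoted option handling: each option overwrites the type,
# so whichever of "luks" / "swap" comes last decides the result.
# Names and return values are illustrative stand-ins, not systemd's.

def resolve_type(options: str) -> str:
    arg_type = "unset"
    for option in options.split(","):
        option = option.strip()
        if option == "luks":
            arg_type = "luks"      # stands in for CRYPT_LUKS1
        elif option in ("plain", "swap", "tmp"):
            arg_type = "plain"     # stands in for CRYPT_PLAIN
        # other options (e.g. "discard") do not touch the type
    return arg_type


print(resolve_type("luks,swap,discard"))
print(resolve_type("swap,luks,discard"))
```

With the installer's order the model ends up with the plain type, with the reversed order it ends up with luks, which matches the behaviour observed above.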

Hadmut Danisch (hadmut) wrote :

OK, I've finally found the problem(s). Was a bunch of little nasty problems, that's why it was difficult to debug.

1)

The 15.10 beta installer had filled /etc/initramfs-tools/conf.d/resume with

RESUME=UUID=a18e9ec9-1255-4dda-8298-8e10bdbe6835

which is never updated after first installation. Since I had to repair the swap device several times, this was not correct anymore, and furthermore /usr/share/initramfs-tools/hooks/cryptroot can't deal with it. It's important that

RESUME=sda2_crypt

is entered.

2) That's why /usr/share/initramfs-tools/hooks/cryptroot did not mention the device in /conf/conf.d/cryptroot of the initramfs.

Once it is correctly mentioned in this file (after fixing bug 1), the password is fetched and the device is opened at the initramfs phase, i.e. before systemd takes control. This works well.

3) If sda2_crypt is not mentioned in the initramfs' /conf/conf.d/cryptroot, it is not opened while initramfs has control. But then, once systemd takes control, systemd tries to open it since it is listed in /etc/crypttab.

But this does not work, since both systemd and plymouthd have bugs. plymouthd can go into an endless loop or fail completely, depending on whether you have splash/graphical boot or textual.

Once bug 1 and 2 are solved, this issue does not occur anymore.

4) But then the system hangs while booting for another reason. systemd still tries to create a swap device and hangs forever. I could not reliably figure out why, but it looks as if it waits for systemd-cryptsetup to do something that it doesn't do because the crypt device is already open.

Solution: remove the swap option in /etc/crypttab

5) Finally seems to work.
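
The crypttab change from step 4 can be expressed as a small Python helper. This is a sketch with a hypothetical function name; it only rewrites lines that carry both luks and swap in the fourth field, so a plain (non-luks) swap entry is left alone, and it collapses the whitespace between fields to single spaces. It does not touch /etc/crypttab itself and does not cover the RESUME setting or the initramfs rebuild from steps 1 and 2.

```python
# Sketch: drop the 'swap' option from a crypttab line that also has 'luks'.
# Hypothetical helper; works on one line of text, does not edit any file.

def drop_swap_option(line: str) -> str:
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # leave blank lines and comments untouched
    fields = stripped.split()
    if len(fields) < 4:
        return line  # no options field
    options = fields[3].split(",")
    if "luks" not in options or "swap" not in options:
        return line  # only persistent luks swap entries are rewritten
    fields[3] = ",".join(o for o in options if o != "swap")
    return " ".join(fields)


print(drop_swap_option(
    "sda2_crypt UUID=a7976d5c-6191-436d-9cf9-2cedf17d0893 none luks,swap,discard"
))
```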

Just for the record: systemd (and plymouth) is so buggy and opaque that it is far from being production-ready.

That cost me several evenings of work and headache.

Martin Pitt (pitti)
Changed in systemd (Ubuntu):
status: Incomplete → New
Hadmut Danisch (hadmut) wrote :

Further observations:

Meanwhile I have figured out three modes:

1) putting the swap flag into /etc/crypttab -> crashes the partition every now and then, but not always.

2) removing the flag from /etc/crypttab, but keeping it in /etc/fstab and in /etc/initramfs-tools/conf.d/resume: everything works well, but initramfs asks for the password twice (once per partition), which is sort of annoying. The password sharing scripts that come with cryptsetup might help (I haven't tried them yet with 15.10).

3) as 2, but removing /etc/initramfs-tools/conf.d/resume: interesting effect. The boot process asks for the password only once, for the root partition(!), but systemd nevertheless sets up the swap partition as a second luks device without needing the password. The source code says something about a password cache, so the password from the root device seems to be cached for some time and accessible. Nice, but resume from hibernation does not work.

Unfortunately the ubuntu installer produces mode 1, which does not really work.

This mess should really be fixed for 16.04 LTS. In my eyes it's a major problem of systemd, but the initramfs code could also be extended to use the password cache (which is there and caches anyway) to avoid asking twice.

regards

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti)
Changed in systemd (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
eviljoel (eviljoel-t) wrote :

I'm also having this issue with Ubuntu 16.04.2.

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu):
status: Confirmed → Won't Fix