Fail to boot with LUKS on top of RAID1 if the array is broken/degraded

Bug #1879980 reported by Guilherme G. Piccoli
24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cryptsetup (Debian)
Fix Released
Unknown
cryptsetup (Ubuntu)
Fix Released
Medium
Guilherme G. Piccoli
Xenial
Won't Fix
Medium
Guilherme G. Piccoli
Bionic
Fix Released
Medium
Guilherme G. Piccoli
Focal
Fix Released
Medium
Guilherme G. Piccoli
Groovy
Fix Released
Medium
Guilherme G. Piccoli
initramfs-tools (Ubuntu)
Fix Released
Medium
Guilherme G. Piccoli
Xenial
Won't Fix
Medium
Guilherme G. Piccoli
Bionic
Fix Released
Medium
Guilherme G. Piccoli
Focal
Fix Released
Medium
Guilherme G. Piccoli
Groovy
Fix Released
Medium
Guilherme G. Piccoli
mdadm (Ubuntu)
Opinion
Medium
Guilherme G. Piccoli
Xenial
Won't Fix
Medium
Guilherme G. Piccoli
Bionic
Opinion
Medium
Guilherme G. Piccoli
Focal
Opinion
Medium
Guilherme G. Piccoli
Groovy
Opinion
Medium
Guilherme G. Piccoli

Bug Description

[Impact]
* Considering a setup of a encrypted rootfs on top of md RAID1 device, Ubuntu is currently unable to decrypt the rootfs if the array gets degraded, like for example if one of the array's members gets removed.

* The problem has 2 main aspects: first, cryptsetup initramfs script attempts to decrypt the array only in the local-top boot stage, and in case it fails, it gives-up and show user a shell (boot is aborted).

* Second, mdadm initramfs script that assembles degraded arrays executes later on boot, in the local-block stage. So, in a stacked setup of encrypted root on top of RAID, if the RAID is degraded, cryptsetup fails early in the boot, preventing mdadm to assemble the degraded array.

* The hereby proposed solution has 2 components: first, cryptsetup script is modified to allow a gentle failure on local-top stage, then it retries for a while (according to a heuristic based on ROOTDELAY with minimum of 30 executions) in a later stage (local-block). This gives time to other initramfs scripts to run, like mdadm in local-block stage. And this is meant to work this way according to initramfs-tools documentation (although Ubuntu changed it a bit with wait-for-root, hence we stopped looping on local-block, see next bullet).

* Second, initramfs-tools was adjusted - currently, it runs for a while the mdadm local-block script, in order to assemble the arrays in a non-degraded mode. We extended this approach to also execute cryptsetup, in a way that after mdadm ends its execution, we execute at least once more time cryptsetup. In an ideal world we should loop on local-block as Debian's initramfs (in a way to remove hardcoded mdadm/cryptsetup mentions from initramfs-tools code), but this would be really a big change, non-SRUable probably. I plan to work that for future Ubuntu releases.

[Test case]
* Install Ubuntu in a Virtual Machine with 2 disks. Use the installer to create a RAID1 volume and an encrypted root on top of it.

* Boot the VM, and use "sgdisk"/"wipefs" to erase the partition table from one of the RAID members. Reboot and it will fail to mount rootfs and continue boot process.

* If using the initramfs-toos/cryptsetup patches hereby proposed, the rootfs can be mounted normally.

[Regression potential]

* There are potential for regressions, since this is a change in 2 boot components. The patches were designed in a way to keep the regular case working, it changes the failure case which is not currently working anyway.

* A modification in the behavior of cryptsetup was introduced: right now, if we fail the password 3 times (the default maximum attempts), the script doesn't "panic" and drop to a shell immediately; instead it runs once more (or twice, if mdadm is installed) before failing. This is a minor change given the benefit of the being able to mount rootfs in a degraded RAID1 scenario.

* Other potential regressions could show-up as boot problems, but the change in initramfs-tools specifically is not invasive, it just may delay boot time a bit, given we now run cryptsetup multiple times on local-block, with 1 sec delays between executions.

Related branches

Changed in mdadm (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

The issue basically is about a failure in mounting root if we have a stacked setup of LUKS on top of RAID1, when RAID1 is degraded (like a member missing). What happens in detail is a conjuncture of factors leading to this problem:

(a) The initramfs script for cryptroot currently is present in two initram stages: local-top and local-block. Problem is that if the script fails on local-top phase, it panics and opens a console, not allowing the boot process to continue. In this case, subsequent scripts are not executed automatically.

(b) The mdadm initramfs script to mount degraded arrays runs on local-block stage. It provides a heuristic that tries a regular array assemble for (2/3*ROOTDELAY) times, and then it assembles the array as degraded, in which is called the "poor man last resort" mechanism.

So, the first and far more serious issue is cryptroot early fail at local-top phase. So an idea I've implemented to fix this was to allow some retries on local-block stage, given local-block should loop for a while running its scripts (at least according to documentation and Debian's initramfs code). But guess what ?

(c) In Ubuntu, we have wait-on-root, which aims to speed-up the boot, in my shallow understanding. Basically, we have wait-for-root consuming almost all the ROOTDELAY time (30s as default, if not specified), and local-block scripts run only once. Except...mdadm, which has the previously mentioned heuristic of running 2/3*ROOTDELAY times. And for that, we have a hack on initramfs-tools to cope with mdadm (!), as per commit: salsa.debian.org/kernel-team/initramfs-tools/-/commit/033c948bb0 .

So, to fix the cryptroot inability to mount root device on top a degraded RAID1 is a matter of coordinate mdadm and cryptroot, and (if my approach is taken), loop on local-block. Below are the steps I took to circumvent this long-term issue:

1) Allows cryptsetup to retry on local-block stage, relying in a heuristic based on ROOTDELAY (we try 1/4*ROOTDELAY times) and on initramfs looping at local-block phase.

2) Reduce the heuristic frequency on mdadm, in order it doesn't "beat" the cryptroot attempts, i.e., cryptroot must execute more times. for this, we reduced the heuristic for 1/5*ROOTDELAY.

3) Make local-block on Ubuntu loop again, but still rely on wait-for-root in a first step; also, I removed that mdadm heinous hack from initramfs-tools, it works without...that...if local-block loops.

Below I'll submit groovy debdiffs to gather reviews on my approach. Also, a PPA with packages built, in case somebody else wanna give them a try: https://launchpad.net/~gpiccoli/+archive/ubuntu/lp1879980
Thanks,

Guilherme

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Not a debdiff - I found easier to just add the patches as in my local git repository of the packages.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

List of somewhat duplicate bugs:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/120375
(after comment #74)

https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/251164
(propose some alternative solutions we can think about, like failure hooks)

https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/659899

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003309
(partially related)

blog post: feeding.cloud.geek.nz/posts/the-perils-of-raid-and-full-disk-encryption-on-ubuntu

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

I have a report of a Bionic user that tested the packages on my PPA with success.
I changed a small bit though, from the first proposal (just for consistency): moved the cryptsetup clean-up script to local-bottom instead of init-bottom.

Thanks,

Guilherme

description: updated
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Debian merge request for the cryptsetup patch was just submitted: https://salsa.debian.org/cryptsetup-team/cryptsetup/-/merge_requests/18

Cheers,

Guilherme

Changed in cryptsetup (Ubuntu):
status: Confirmed → In Progress
Changed in cryptsetup (Debian):
status: Unknown → New
Changed in mdadm (Ubuntu):
status: Confirmed → Opinion
Changed in initramfs-tools (Ubuntu):
status: Confirmed → In Progress
Changed in mdadm (Ubuntu Xenial):
status: New → Opinion
Changed in mdadm (Ubuntu Bionic):
status: New → Opinion
Changed in cryptsetup (Ubuntu Xenial):
status: New → Opinion
Changed in cryptsetup (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
status: New → In Progress
Changed in cryptsetup (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
Changed in cryptsetup (Ubuntu Bionic):
importance: Undecided → Medium
Changed in cryptsetup (Ubuntu Xenial):
status: Opinion → Won't Fix
Changed in cryptsetup (Ubuntu Focal):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Changed in initramfs-tools (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → Won't Fix
Changed in initramfs-tools (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Changed in initramfs-tools (Ubuntu Focal):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → In Progress
Changed in mdadm (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: Opinion → Won't Fix
Changed in mdadm (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
Changed in mdadm (Ubuntu Focal):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
importance: Undecided → Medium
status: New → Opinion
description: updated
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

One relevant discussion would be why we decided to not change mdadm code anymore. What happens here is that we have an inter-dependency between mdadm and cryptroot - we first changed the mdadm max counter to "untangle" that relation, in a way cryptroot would run more times than mdadm.

But studying better the initramfs-tools, we notice that we could use the same "hack" currently in code to execute mdadm on local-block for cryptroot, and add an extra cryptroot run if mdadm was executed. This way, we make things work as expected (ab-)using the same code already present on initramfs-tools, without requiring modifying yet another boot component.

I've set mdadm as "Opinion" in this LP because *it is affected*, in fact, it is part of the problem. But...not changing mdadm is a cheaper option in my opinion. At least for now..I plan to try a refactor on initramfs-tools to cope with inter-relations of components on local-block, regardless of their number (and this will require changing mdadm).

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Oh, I forgot to mention - Xenial won't be fixed. It's a release pretty stable with less then a year remaining of regular support, and with older code. So, in my opinion (again) it's safer to keep it as is, and consider that degraded RAID1 + encrypted rootfs is fully supported on Bionic and so on.
Cheers,

Guilherme

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Worth to notice that the initramfs-tools debdiffs include a fix for LP #1879987 - we are doing a single SRU for 2 bugs.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Just got a test result from an user that reported the issue - the packages with the proposed patches [0] fixed the issue to him.
cheers,

Guilherme

[0] https://launchpad.net/~gpiccoli/+archive/ubuntu/lp1879980

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Attaching a new Bionic debdiff, which now includes a fix for a third bug (LP #1820929).

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

An issue on initramfs-tools autopkgtest was found in Groovy and Focal (see LP #1893675) - it's non-related with the fixes proposed here, but we need to make autopkgtest happy or we cannot get the package released, so here goes the V2 of the initramfs-tools debdiffs.
Notice the SRU is mainly driven in this LP and/or LP #1879987.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Worth to notice that the cryptsetup release is blocked on LP #1891473 - there's a failure on building this package from source introduced by a recent PPA builder upgrade (from Xenial to Bionic).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.137ubuntu12

---------------
initramfs-tools (0.137ubuntu12) groovy; urgency=medium

  * d/tests: Add explicit call to partprobe on net test, specially in
    prep-image and run-image. (LP: #1893675)

initramfs-tools (0.137ubuntu11) groovy; urgency=medium

  * scripts/functions: Prevent printf error carry over if the wrong
    console is set. (LP: #1879987)
      The function _log_msg() is "void" typed, returning whatever its
      last command returns. This function is the basic building block
      for all error/warning messages in initramfs-tools. If a bad console
      is provided to kernel on command-line, printf returns error, and so
      this error is carried over in _log_msg(). Happens that checkfs()
      function has a loop that runs forever in this scenario (*if* fsck
      is not present in initramfs and "quiet" is not passed in the
      command-line). If that happens, boot is stuck and cannot progress.
      The simple fix hereby merged is to return zero on _log_msg().

  * scripts/local: Re-execute cryptroot local-block script. (LP: #1879980)
      Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (like a member is removed/failed),
      initramfs-tools cannot mount the rootfs and the boot fails. We fix
      that issue here by allowing cryptroot script to re-run on local-block
      stage, given that mdadm is able to activate a degraded array in that
      point. There is a cryptsetup counter-part for this fix, but alone the
      initramfs-tools portion is innocuous.

 -- <email address hidden> (Guilherme G. Piccoli) Mon, 31 Aug 2020 18:04:00 -0300

Changed in initramfs-tools (Ubuntu Groovy):
status: In Progress → Fix Released
Revision history for this message
Eric Desrochers (slashd) wrote :

[sts-sponsor]

Sponsored in Focal/Bionic.

Thanks for your contribution.

Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Guilherme, or anyone else affected,

Accepted initramfs-tools into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/initramfs-tools/0.136ubuntu6.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in initramfs-tools (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (initramfs-tools/0.136ubuntu6.3)

All autopkgtests for the newly accepted initramfs-tools (0.136ubuntu6.3) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

initramfs-tools/0.136ubuntu6.3 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#initramfs-tools

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

I've managed to verify the initramfs-tools Focal-proposed package (version 0.136ubuntu6.3) by following 2 approaches, given that we don't have its cryptsetup counter-part released yet:

(a) The verification by "negation" aims to check if we don't have regression, by testing if the new package changed the old behavior. And it passed: with the proposed initramfs-tools, if we have a degraded RAID1 holding the encrypted rootfs, Ubuntu fails to boot exactly like using the regular/released initramfs-tools package, so no regressions/behavior-change introduced.

(b) I've tested with a modified cryptsetup package [0] that included the fix counter-part, and in this case, we are able to boot as expected, proving that the initramfs-tools proposed package is working as it should.

Also, the armhf autopkgtest regression mentioned in the above comment is no longer an issue, it's fixed (test was re-executed and succeeded - likely a flaky network case).
Hence, marking this bug as verified for Focal.
Cheers,

Guilherme

[0] https://launchpad.net/~gpiccoli/+archive/ubuntu/lp1879980

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.136ubuntu6.3

---------------
initramfs-tools (0.136ubuntu6.3) focal; urgency=medium

  * scripts/functions: Prevent printf error carry over if the wrong
    console is set. (LP: #1879987)
      The function _log_msg() is "void" typed, returning whatever its
      last command returns. This function is the basic building block
      for all error/warning messages in initramfs-tools. If a bad console
      is provided to kernel on command-line, printf returns error, and so
      this error is carried over in _log_msg(). Happens that checkfs()
      function has a loop that runs forever in this scenario (*if* fsck
      is not present in initramfs and "quiet" is not passed in the
      command-line). If that happens, boot is stuck and cannot progress.
      The simple fix hereby merged is to return zero on _log_msg().

  * scripts/local: Re-execute cryptroot local-block script. (LP: #1879980)
      Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (like a member is removed/failed),
      initramfs-tools cannot mount the rootfs and the boot fails. We fix
      that issue here by allowing cryptroot script to re-run on local-block
      stage, given that mdadm is able to activate a degraded array in that
      point. There is a cryptsetup counter-part for this fix, but alone the
      initramfs-tools portion is innocuous.

  * d/tests: Add explicit call to partprobe on net test, specially in
    prep-image and run-image. (LP: #1893675)

 -- <email address hidden> (Guilherme G. Piccoli) Mon, 31 Aug 2020 13:43:29 -0300

Changed in initramfs-tools (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for initramfs-tools has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

After the cryptsetup FTBFS investigation (on LP #1891473), coincidentally a security fix was released for such package, that included a fix for the FTBFS. So, this is a "rebase" on top of the latest version for Focal/Groovy - Bionic wasn't affected, but I'm re-uploading its debdiff nevertheless.
Thanks,

Guilherme

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :
Revision history for this message
Eric Desrochers (slashd) wrote :

@gpiccoli,

Can you break down everything this debdiff does per file being modified in the d/changelog along with the summary you have already provided ?

It would ease for future reference and make the d/changelog more accurate about the changes.

----
* d/cryptsetup-initramfs.install:
  - <DESCRIPTION>
* d/functions cryptsetup-2.3.3/debian/functions:
  - <DESCRIPTION>
* d/initramfs/scripts/local-block/cryptroot:
  - <DESCRIPTION>
* d/initramfs/scripts/local-bottom/cryptroot:
  - <DESCRIPTION>
* d/initramfs/scripts/local-top/cryptroot:
  - <DESCRIPTION>
----

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Hi Eric, all the changes are part of the same functionality - the local-bottom script for example is a clean-up for the files created in the local-top phase. I think it'd be unnecessary verbosity to explain file by file, and this bug is waiting for a long time to be fixed (especially due to the FTBFS we needed to investigate and fix), so unless you think it's strictly necessary, I'd rather move this one forward without changing the description.
Thanks,

Guilherme

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Guilherme,

I can handle that while you're out.

cheers,
Mauricio

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Eric,

I just updated the changelog with a more detailed description per file.
If it looks good to you for Groovy I'll update the stable releases too.

Thanks!
Mauricio

Revision history for this message
Eric Desrochers (slashd) wrote :

[sts-sponsors]

Sponsored and uploaded into groovy.
Let's now wait until the package lands in groovy-releases before proceeding with the SRU.

Thanks mfo and gpiccoli for your contributions.

- Eric

tags: added: patch
tags: added: sts-sponsor-mfo
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a bunch mfo!!

Revision history for this message
Eric Desrochers (slashd) wrote :

Autopkgtest failure found:

autopkgtest for systemd/246.4-1ubuntu1: amd64: Regression ♻ , arm64: Pass, armhf: Pass, ppc64el: Pass, s390x: Ignored failure

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy/groovy/amd64/s/systemd/20200922_224614_79186@/log.gz

autopkgtest [22:45:59]: @@@@@@@@@@@@@@@@@@@@ summary
timedated PASS
hostnamed PASS
localed-locale PASS
localed-x11-keymap PASS
logind PASS
unit-config PASS
storage PASS
networkd-test.py PASS
build-login PASS
boot-and-services PASS
udev PASS
root-unittests PASS
tests-in-lxd PASS
upstream FAIL timed out
boot-smoke PASS
systemd-fsckd FLAKY non-zero exit status 137

Revision history for this message
Eric Desrochers (slashd) wrote :

I'll retry the test before we investigate further.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cryptsetup - 2:2.3.3-1ubuntu6

---------------
cryptsetup (2:2.3.3-1ubuntu6) groovy; urgency=medium

  * Introduce retry logic for external invocations after mdadm (LP: #1879980)
    - Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (e.g., a member is removed/failed)
      the cryptsetup scripts cannot mount the rootfs, and the boot fails.
      We fix that issue here by allowing the cryptroot script to be re-run
      by initramfs-tools/local-block stage, as mdadm can activate degraded
      arrays at that stage.
      There is an initramfs-tools counter-part for this fix, but alone the
      cryptsetup portion is harmless.
    - d/cryptsetup-initramfs.install: ship the new local-bottom script.
    - d/functions: declare variables for local-top|block|bottom scripts
      (flag that local-block is running and external invocation counter.)
    - d/i/s/local-block/cryptroot: set flag that local-block is running.
    - d/i/s/local-bottom/cryptroot: clean up the flag and counter files.
    - d/i/s/local-top/cryptroot: change the logic from just waiting 180
      seconds to waiting 5 seconds first, then allowing initramfs-tools
      to run mdadm (to activate degraded arrays) and call back at least
      30 times/seconds more.

 -- <email address hidden> (Guilherme G. Piccoli) Wed, 16 Sep 2020 17:35:59 -0300

Changed in cryptsetup (Ubuntu Groovy):
status: In Progress → Fix Released
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Uploaded cryptsetup to Focal and Bionic.
Attaching the updated debdiffs for reference.

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Uploaded initramfs-tools to Bionic.
Attaching the updated debdiff for reference.
(Rebased on top of the more recent -updates.)

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Guilherme, or anyone else affected,

Accepted initramfs-tools into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/initramfs-tools/0.130ubuntu3.11 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in initramfs-tools (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
removed: verification-done
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Ok, I have accepted the initramfs-tools parts of this fix as they seem to make sense (and were carrying some other changes too).

But then when I was looking at the cryptsetup parts, well, I think I'd like to slow down a little. Have all these changes been consulted with the Ubuntu security team? Since I'm worried about the mentioned behavior change in cryptsetup. I would feel much better if the security team gave a +1 for this change before we proceed further, since it is touching sensitive parts of a system's privacy.

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Łukasz,

Thanks for accepting initramfs-tools for now; it's much appreciated.

I'm not sure the security team reviewed this; but I can't confirm.
I'll try to find a reviewer there, to check the concerns you have.

Could you please confirm/correct/add the concerns/review points?
1) Behavior change in timeout waiting for crypto device to appear (180 secs to 5+30=35 secs)
2) Behavior change in how many times crypto passphrase is asked/fails before dropping to shell

Thanks,
Mauricio

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

In the benefit of the initramfs-tools upload to move on:

I have verified that it has no regressions with current
cryptsetup in bionic-updates and works as expected with
the patched version (in upload queue, not -proposed yet.)

Verification details in the next comment and detailed
topology and console output in the log file attached.

Setting verification-done-bionic.

cheers,
Mauricio

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

1) There are no changes with the cryptsetup in bionic-updates.

This is expected, because the changes in initramfs-tools
are gated by a file introduced in the patched cryptsetup.

Thus no regression from the initramfs-tools side.

...

And, additional testing with the _patched_ cryptsetup,
just in case so that we've it tested if/when it lands.

2) It works correctly with the patched crypsetup from PPA.

This validates that the changes in initramfs-tools work
as intended when paired with the new cryptsetup changes.

2.1) There are no changes in the good path/case
i.e., RAID is not degraded, user enters correct password.

2.2) There is one more iteration to ask password 3 times.
i.e., RAID is not degraded, user enters incorrect passwords.

- original behavior: 2x iterations of asking password 3 times
- modified behavior: 3x iterations of asking password 3 times

2.3) It can boot in the bad path/case
i.e., RAID is degraded, user now can enter password and boot.

- original behavior: times out in 180 seconds in local-top,
  checking LVM every 10 seconds, then drops to shell.

- modified behavior: times out in 5 seconds in local-top,
  moves on to local-block, checks for 30 seconds every 1
  second (including LVM); md/raid activated in 20 seconds.
  (asked for password and booted successfully.)

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Ubuntu Security Team,

I've subscribed you to this bug for a patch review asked by the SRU team.
Please find a request summary below, and feel free to ask for details.

There's a change being proposed to the cryptsetup boot logic
(debdiff in comment #44) so to allow an encrypted device on
top of MD RAID1 devices to have a chance to be detected and
boot at all if the raid array is degraded (e.g., lost disk.)

Currently, the system cannot boot in that scenario
(a degraded array is not yet activated when cryptsetup runs,
so the encrypted device with rootfs does not yet exist.)

The proposed patch introduces a retry logic that runs after
mdadm runs (as mdadm can activate the degraded array anyway)
and then re-check for the encrypted device (as it now exists.)

However, that introduces some behavior changes like the time
that is waited for the encrypted device before giving up and
trying mdadm, and the number of times asking for passphrase.

A summary of those changes is in comment #51 and description,
and the request from the SRU team in comment #48 with detail
suggestions in #49.

Thank you,
Mauricio

Revision history for this message
Alex Murray (alexmurray) wrote :

I can't see any potential security impact from this - yes it will now do another round of asking for passwords but 9 tries doesn't really help (from an attacker point-of-view) any more than 6 tries assuming this is a long passphrase - so consider this an ACK from the security team.

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hey Alex, thank you very much for your prompt review.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Thank you Alex! In that case, let me review the change once again.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.130ubuntu3.11

---------------
initramfs-tools (0.130ubuntu3.11) bionic; urgency=medium

  [ Guilherme G. Piccoli ]
  * scripts/functions: Prevent printf error carry over if the wrong
    console is set. (LP: #1879987)
      The function _log_msg() is "void" typed, returning whatever its
      last command returns. This function is the basic building block
      for all error/warning messages in initramfs-tools. If a bad console
      is provided to kernel on command-line, printf returns error, and so
      this error is carried over in _log_msg(). Happens that checkfs()
      function has a loop that runs forever in this scenario (*if* fsck
      is not present in initramfs and "quiet" is not passed in the
      command-line). If that happens, boot is stuck and cannot progress.
      The simple fix hereby merged is to return zero on _log_msg().

  * scripts/local: Re-execute cryptroot local-block script. (LP: #1879980)
      Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (like a member is removed/failed),
      initramfs-tools cannot mount the rootfs and the boot fails. We fix
      that issue here by allowing cryptroot script to re-run on local-block
      stage, given that mdadm is able to activate a degraded array in that
      point. There is a cryptsetup counter-part for this fix, but alone the
      initramfs-tools portion is innocuous.

  [ Jay Vosburgh ]
  * scripts/functions: Change netplan render for net_failover master
    devices. (LP: #1820929)
      Modify the _render_netplan function to check for network interfaces
      that are net_failover master devices. When found, such devices are
      matched only by name, not by MAC address, as the MAC is not a unique
      identifier for the net_failover case. In the net_failover architecture,
      the MAC address is used to manage the membership of the net_failover
      interface set, thus multiple interfaces will be assigned the same MAC
      address.

 -- <email address hidden> (Guilherme G. Piccoli) Wed, 12 Aug 2020 17:12:11 -0300

Changed in initramfs-tools (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks a lot Alex for your review from a security point-of-view. And thanks again Lukasz for dealing with this SRU!
Cheers,

Guilherme

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Guilherme, or anyone else affected,

Accepted cryptsetup into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cryptsetup/2:2.2.2-3ubuntu2.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cryptsetup (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
removed: verification-done verification-done-focal
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Guilherme, or anyone else affected,

Accepted cryptsetup into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cryptsetup/2:2.0.2-1ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cryptsetup (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
removed: verification-done-bionic
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thanks for releasing the packages in -proposed Lukasz. I was able to complete the validation for the Focal version, following the procedure in the description. An user reported internally to me that the verification of Bionic version was also successful, hence I'm hereby marking this LP as verified for both releases.

Bionic version tested: 2:2.0.2-1ubuntu1.2
Focal version tested: 2:2.2.2-3ubuntu2.3

Cheers,

Guilherme

tags: added: verification-done verification-done-bionic verification-done-focal
removed: verification-needed verification-needed-bionic verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cryptsetup - 2:2.2.2-3ubuntu2.3

---------------
cryptsetup (2:2.2.2-3ubuntu2.3) focal; urgency=medium

  * Introduce retry logic for external invocations after mdadm (LP: #1879980)
    - Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (e.g., a member is removed/failed)
      the cryptsetup scripts cannot mount the rootfs, and the boot fails.
      We fix that issue here by allowing the cryptroot script to be re-run
      by initramfs-tools/local-block stage, as mdadm can activate degraded
      arrays at that stage.
      There is an initramfs-tools counter-part for this fix, but alone the
      cryptsetup portion is harmless.
    - d/cryptsetup-initramfs.install: ship the new local-bottom script.
    - d/functions: declare variables for local-top|block|bottom scripts
      (flag that local-block is running and external invocation counter.)
    - d/i/s/local-block/cryptroot: set flag that local-block is running.
    - d/i/s/local-bottom/cryptroot: clean up the flag and counter files.
    - d/i/s/local-top/cryptroot: change the logic from just waiting 180
      seconds to waiting 5 seconds first, then allowing initramfs-tools
      to run mdadm (to activate degraded arrays) and call back at least
      30 times/seconds more.

 -- <email address hidden> (Guilherme G. Piccoli) Wed, 16 Sep 2020 17:40:05 -0300

Changed in cryptsetup (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cryptsetup - 2:2.0.2-1ubuntu1.2

---------------
cryptsetup (2:2.0.2-1ubuntu1.2) bionic; urgency=medium

  * Introduce retry logic for external invocations after mdadm (LP: #1879980)
    - Currently, if an encrypted rootfs is configured on top of a MD RAID1
      array and such array gets degraded (e.g., a member is removed/failed)
      the cryptsetup scripts cannot mount the rootfs, and the boot fails.
      We fix that issue here by allowing the cryptroot script to be re-run
      by initramfs-tools/local-block stage, as mdadm can activate degraded
      arrays at that stage.
      There is an initramfs-tools counter-part for this fix, but alone the
      cryptsetup portion is harmless.
    - d/i/cryptroot-script: set files used by cryptsetup/initramfs-tools
      (flag that local-block is running and external invocation counter);
      change logic from just wait 180 seconds / activating LVM every 10
      seconds to waiting 5 seconds first, then allowing initramfs-tools
      to run mdadm (to activate degraded arrays) and call back at least
      30 times/seconds more / activating LVM every 1 second.
    - d/i/cryptroot-script-block: set flag that local-block is running.
    - d/i/cryptroot-script-bottom: clean up the flag and counter files.
    - d/rules: ship the new local-bottom script.

 -- <email address hidden> (Guilherme G. Piccoli) Mon, 03 Aug 2020 18:28:48 -0300

Changed in cryptsetup (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
John Gilmore (gnu-gilmore) wrote :
Download full text (4.5 KiB)

I am running a laptop with an internal nvme that is partitioned to contain an encrypted LUKS partition that contains an LVM2 PV. That PV contains part of an LV (lvubuntu) which also includes a partition residing on an external USB drive. The theory is that I'll make part of the external PV into a RAID-1 image of the internal partition, which will automatically back up the internal nvme whenever I plug in that drive, making it a USB-bootable bit-for-bit copy. (That part doesn't work yet because of a blocksize mismatch between the 4k hard drive and the 512b nvme partition; I'll need to dump, reformat, and restore the nvme LV into 4k blocks before LVM will succeed in making it part of a RAID set.)

I have been able to boot this laptop (without the external drive attached) even prior to this bug fix. I can't tell you exactly how it worked, but it worked, possibly because the RAID1 isn't set up yet; instead the LV just includes both PV's.

I upgraded to the released fix for this bug in 20.04, as part of installing all current security patches. Now when I boot the system, it detects the encrypted partition, asks for the password, gets it, and succeeds in fsck-ing the root partition. But then it loops and fsck's it again. And again. And again. (I noticed this by the long delay with the circling icon on the splash screen. I switched to ctrl-alt-F1 to see console messages and there I see it doing the fsck over and over.) Eventually THIS behavior times out too, and the system does boot.

But I suspect that this is not exactly what you wanted this patch to do. It used to boot very quickly, now after the patch for this bugfix, it doesn't.

I'm running 20.04.1 on an amd64 (Lenovo Ideapad Flex 5 14", AMD Ryzen 5 4500U) laptop.

# pvdisplay
  WARNING: Couldn't find device with uuid u2T6W5-rI3L-40C8-rZc3-uS6H-rMsV-U58ic5.
  WARNING: VG vgubuntu is missing PV u2T6W5-rI3L-40C8-rZc3-uS6H-rMsV-U58ic5 (last written to /dev/mapper/luks-d43f3ec2-2633-46d0-bd1b-1c9f46e13c4d).
  --- Physical volume ---
  PV Name /dev/mapper/nvme0n1p3_crypt
  VG Name vgubuntu
  PV Size 237.24 GiB / not usable 0
  Allocatable yes
  PE Size 4.00 MiB
  Total PE 60734
  Free PE 253
  Allocated PE 60481
  PV UUID SQ4XNm-dKJQ-ild3-dx2k-lrGL-fqay-5rY8nY

  --- Physical volume ---
  PV Name [unknown]
  VG Name vgubuntu
  PV Size <3.64 TiB / not usable 4.00 MiB
  Allocatable yes
  PE Size 4.00 MiB
  Total PE 953610
  Free PE 817930
  Allocated PE 135680
  PV UUID u2T6W5-rI3L-40C8-rZc3-uS6H-rMsV-U58ic5

# lvdisplay
  WARNING: Couldn't find device with uuid u2T6W5-rI3L-40C8-rZc3-uS6H-rMsV-U58ic5.
  WARNING: VG vgubuntu is missing PV u2T6W5-rI3L-40C8-rZc3-uS6H-rMsV-U58ic5 (last written to /dev/mapper/luks-d43f3ec2-2633-46d0-bd1b-1c9f46e13c4d).
  --- Logical volume ---
  LV Path /dev/vgubuntu/root
  LV Name root
  VG Name vgubuntu
  LV UUID Dy2wIe-nhEd-BZN5-i3bg-AU3P-uenQ-fuPkSR
  LV Writ...

Read more...

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

John, thanks a lot for your report! Definitely we don't want to delay boots - although I'm happy to hear that eventually it boots. Can you send me logs so I can understand what's going on?
My suggestion is to follow the steps below (as root user):

(0) [optional] Force a log rotation, in order we only capture the relevant/latest data:
logrotate -f /etc/logrotate.conf ;

(1) Add "debug ignore_loglevel" to your kernel command-line (usually done by editing /etc/default/grub or /etc/default/grub.d/[somefile]); update grub after editing the conf file (through "update-grub" tool);

(2) Now that'll seem a bit counter-intuitive: reboot the machine, and all initramfs-tools verbose output will go to a file, *including* the password requests for LUKS (I'm not sure why this happens, I feel it's bug but we can live with that for now, to collect your data). So your system might seem hung - write the password and press ENTER how many times it's usually asked (when system appears hung) - hopefully you manage to boot your system, even if takes a while.

(3) Collect 2 files and attach them here please: "/var/log/syslog" and "/run/initramfs/initramfs.debug".

Hopefully with that I can understand exactly what's causing this weird behavior in your setup.
Cheers,

Guilherme

tags: removed: sts-sponsor-mfo
Changed in cryptsetup (Debian):
status: New → Fix Released
Revision history for this message
Apteryx (maxco) wrote :

I'm curious; would this fix other issues such as Btrfs RAID1 on top of fully encrypted disks?

To post a comment you must log in.