aes-xts and aes-cbc failing to initialize on power since 4.10
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Confirmed
|
Critical
|
Unassigned |
Bug Description
Hi,
I got to this having a failing systemd testcase for my qemu upload but we tracked that to a regression with the 4.10 upload IMHO.
TL;DR:
- xts-aes and cbc-aes fail to initialize on power (other arch fine) since kernel 4.10
socket(AF_ALG, SOCK_SEQPACKET, 0) = 5
bind(5, {sa_family=AF_ALG, sa_data=
Details:
Failing "storage" subtest of systemd
https:/
test_luks_
LUKS device by plain device name, empty ... device-mapper: reload ioctl on temporary-
Failed to setup dm-crypt key mapping for device /dev/sda.
Check that kernel supports aes-xts-plain64 cipher (check syslog for more info).
Failed to stop <email address hidden>: Unit <email address hidden> not loaded.
FAIL
[...]
Studying the history of that test shows that it failed rather often with the same error:
http://
I discussed with pitti as systemd and autopkgtest expert:
[10:14] <cpaelzer> pitti: hi, I see in the autopkgtest history of systemd that the "storage" test failed rather often on ppc64el
[10:15] <cpaelzer> pitti: I' currently ran into the same as well, since it failed/worked so often without any visible difference - are these just "retried until passing"
[10:15] <cpaelzer> pitti: or is there some background to learn?
[10:15] --> joedborg (~joedborg@
[10:15] <cpaelzer> FYI - issues like in https:/
[10:16] <pitti> cpaelzer: indeed, "reload ioctl on temporary-
[10:16] <pitti> that should be easy enough to reproduce outside the systemd test suite too, for a kernel bug report
[10:17] <pitti> https:/
[10:17] <cpaelzer> pitti: ok, so in proposed retrying til pass and in a test env trying to debug and understand
[10:17] <cpaelzer> pitti: thanks
[10:18] <cpaelzer> depending which of both is faster the debugging might resolve the retrying :-)
[10:18] <pitti> the test uses scsi_debug as a drive, I suggest to try with both that and a simple loop device to see whether that makes a difference
[10:18] <cpaelzer> thank you, I will do so
[10:19] <pitti> cpaelzer: y and x seem fine, so might be a regression of linux 4.11?
[10:20] <pitti> actually it started failing with trigger linux-meta/
[10:20] <pitti> https:/
[10:20] <cpaelzer> yeah I see that this was the "date" it started
[10:20] <pitti> i. e. 4.9 → 4.10
[10:20] <cpaelzer> former fails had other errors
So I went down debugging this:
1. Zesty guest on ppc64el (isolation-machine, so KVM)
2. repro as-is in this guest
pull-lp-source systemd
cd debian/tests
sudo ./storage (needs-roos, so sudo)
=> reproducable
3. debug
Dmesg shows related errors
[ 712.187600] Failed to allocate transformation for 'xts(aes)': -2
[ 712.249614] Failed to allocate transformation for 'xts(aes)': -2
[ 712.250818] device-mapper: table: 253:0: crypt: Error allocating crypto tfm
[ 712.251990] device-mapper: ioctl: error adding target to table
4. simplify repro
modprobe scsi_debug
# in my case /dev/sda
echo fookey | sudo cryptsetup --batch-mode luksFormat --key-file - /dev/sda
Here we can also confirm that it is not the devtype - e.g. also loop devices fail.
I checked the kernel, but the config has set the algorithm:
$ cat /boot/config-
CONFIG_CRYPTO_XTS=y
There are similar older bugs - but they are only "similar" both are fixed in our kernel and cryptsetup versions:
kernel: https:/
cryptsetup: https:/
Fortunately cryptsetup can check all kind of algorithms easily:
$ cryptsetup benchmark
[...]
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b N/A N/A
serpent-cbc 128b 58,3 MiB/s 68,2 MiB/s
twofish-cbc 128b 118,9 MiB/s 134,6 MiB/s
aes-cbc 256b N/A N/A
serpent-cbc 256b 58,8 MiB/s 68,2 MiB/s
twofish-cbc 256b 119,0 MiB/s 134,4 MiB/s
aes-xts 256b N/A N/A
serpent-xts 256b 62,0 MiB/s 68,1 MiB/s
twofish-xts 256b 127,8 MiB/s 132,6 MiB/s
aes-xts 512b N/A N/A
serpent-xts 512b 62,2 MiB/s 68,1 MiB/s
twofish-xts 512b 128,2 MiB/s 132,5 MiB/s
So aes-xts (the default) is not avail?
Really?
BTW - This at least helps to further simplify the test (no need for a device at all)
$ cryptsetup benchmark --cipher aes-xts
But in /proc/crypto I can see:
name : xts(aes)
driver : p8_aes_xts
module : vmx_crypto
priority : 2000
refcnt : 1
selftest : passed
internal : no
type : blkcipher
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
geniv : <default>
The test is not failing on x86, check there:
=> Working just fine on x86
$ cat /proc/crypto (from x86)
[...]
name : xts(aes)
driver : xts(ecb(aes-asm))
module : kernel
priority : 200
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
chunksize : 16
On the power system the listed module "vmx_crypto" is loaded.
The related output in /proc/crypto did not change since xenial.
Stracing to confirm it is not the old accept before bind issue:
Call (xenial):
socket(PF_ALG, SOCK_SEQPACKET, 0) = 5
bind(5, {sa_family=AF_ALG, sa_data=
Bad (Zesty):
socket(AF_ALG, SOCK_SEQPACKET, 0) = 5
bind(5, {sa_family=AF_ALG, sa_data=
The old kernel bug was about "bind" needed to be (setting key types) before accept.
The new crypsetup does that, so it is NOT the old issue (as we could assume since it was fixed in 4.4 and various stable updates).
Yet it is very close - now the bind itself fails setting the xts-aes type (and cbc-aes fails too btw).
=> kernel update regression from the move to 4.10
ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.10.0-9-generic ppc64le
.var.log.platform: Error: [Errno 13] Permission denied: '/var/log/platform'
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Mär 6 09:26 seq
crw-rw---- 1 root audio 116, 33 Mär 6 09:26 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.4-0ubuntu2
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Mar 6 10:25:00 2017
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 001 Device 003: ID 0627:0001 Adomax Technology Co., Ltd
Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
PciMultimedia:
ProcFB: 0 OFfb vga
ProcKernelCmdLine: BOOT_IMAGE=
ProcLoadAvg: 0.00 0.00 0.00 1/129 24971
ProcLocks:
1: POSIX ADVISORY WRITE 1431 00:13:16508 0 EOF
2: FLOCK ADVISORY WRITE 1423 00:13:16392 0 EOF
3: POSIX ADVISORY WRITE 1422 00:13:16194 0 EOF
4: POSIX ADVISORY WRITE 1387 00:13:15603 0 EOF
5: POSIX ADVISORY WRITE 398 00:13:11036 0 EOF
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.10.0-9-generic (buildd@
RelatedPackageV
linux-
linux-
linux-firmware 1.163
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
cpu_cores: Number of cores present = 1
cpu_coreson: Number of cores online = 1
cpu_smt: Error: command ['ppc64_cpu', '--smt'] failed with exit code 255: Machine is not SMT capable
I know you often need to see if/where things break/fix so I ran a bunch of mainlines (and releases) against the quick test:
good: 4.11.0- 041100rc1- generic 041001- generic 041000- generic generic
bad: 4.10.1-
bad: 4.10.0-9-generic
bad: 4.10.0-
good: 4.9.0-040900-
good: 4.4.0-64-generic
Also I wanted to stress that outside of the actual test what falls out of dmesg really is
"Failed to allocate transformation for 'xts(aes)': -2"
(that might have been lost in the stack of logs before)
IMHO that is what we should search for fixes, upstream, ... especially to differ from the bug in the 4.4 time.