aes-xts and aes-cbc failing to initialize on power since 4.10

Bug #1670311 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Critical
Unassigned

Bug Description

Hi,
I got to this having a failing systemd testcase for my qemu upload but we tracked that to a regression with the 4.10 upload IMHO.

TL;DR:
- xts-aes and cbc-aes fail to initialize on power (other arch fine) since kernel 4.10
  socket(AF_ALG, SOCK_SEQPACKET, 0) = 5
  bind(5, {sa_family=AF_ALG, sa_data="skcipher\0\0\0\0\0\0"}, 88) = -1 ENOENT (No such file or directory)

Details:

Failing "storage" subtest of systemd
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/ppc64el/s/systemd/20170306_005811_4a060@/log.gz

test_luks_by_devname (__main__.CryptsetupTest)
LUKS device by plain device name, empty ... device-mapper: reload ioctl on temporary-cryptsetup-4601 failed: No such file or directory
Failed to setup dm-crypt key mapping for device /dev/sda.
Check that kernel supports aes-xts-plain64 cipher (check syslog for more info).
Failed to stop <email address hidden>: Unit <email address hidden> not loaded.
FAIL
[...]

Studying the history of that test shows that it failed rather often with the same error:
http://autopkgtest.ubuntu.com/packages/systemd/zesty/ppc64el

I discussed with pitti as systemd and autopkgtest expert:
[10:14] <cpaelzer> pitti: hi, I see in the autopkgtest history of systemd that the "storage" test failed rather often on ppc64el
[10:15] <cpaelzer> pitti: I' currently ran into the same as well, since it failed/worked so often without any visible difference - are these just "retried until passing"
[10:15] <cpaelzer> pitti: or is there some background to learn?
[10:15] --> joedborg (~joedborg@2a02:c7f:9638:ab00:438:8be:af59:58e9) has joined this channel.
[10:15] <cpaelzer> FYI - issues like in https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/ppc64el/s/systemd/20170306_005811_4a060@/log.gz
[10:16] <pitti> cpaelzer: indeed, "reload ioctl on temporary-cryptsetup-5254 failed" -- that ioctl seems to be unreliable in cryptsetup somehow
[10:16] <pitti> that should be easy enough to reproduce outside the systemd test suite too, for a kernel bug report
[10:17] <pitti> https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/tree/debian/tests/storage#n74 is the cryptsetup luksFormat call
[10:17] <cpaelzer> pitti: ok, so in proposed retrying til pass and in a test env trying to debug and understand
[10:17] <cpaelzer> pitti: thanks
[10:18] <cpaelzer> depending which of both is faster the debugging might resolve the retrying :-)
[10:18] <pitti> the test uses scsi_debug as a drive, I suggest to try with both that and a simple loop device to see whether that makes a difference
[10:18] <cpaelzer> thank you, I will do so
[10:19] <pitti> cpaelzer: y and x seem fine, so might be a regression of linux 4.11?
[10:20] <pitti> actually it started failing with trigger linux-meta/4.10.0.8.10
[10:20] <pitti> https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/ppc64el/s/systemd/20170217_024850_73931@/log.gz was the last successful one, with autopkgtest [01:48:28]: testbed running kernel: Linux 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC 2017
[10:20] <cpaelzer> yeah I see that this was the "date" it started
[10:20] <pitti> i. e. 4.9 → 4.10
[10:20] <cpaelzer> former fails had other errors

So I went down debugging this:

1. Zesty guest on ppc64el (isolation-machine, so KVM)
2. repro as-is in this guest
   pull-lp-source systemd
   cd debian/tests
   sudo ./storage (needs-roos, so sudo)
   => reproducable
3. debug
   Dmesg shows related errors
[ 712.187600] Failed to allocate transformation for 'xts(aes)': -2
[ 712.249614] Failed to allocate transformation for 'xts(aes)': -2
[ 712.250818] device-mapper: table: 253:0: crypt: Error allocating crypto tfm
[ 712.251990] device-mapper: ioctl: error adding target to table
4. simplify repro
   modprobe scsi_debug
   # in my case /dev/sda
   echo fookey | sudo cryptsetup --batch-mode luksFormat --key-file - /dev/sda

Here we can also confirm that it is not the devtype - e.g. also loop devices fail.

I checked the kernel, but the config has set the algorithm:
$ cat /boot/config-4.10.0-9-generic | grep CRYPTO_XTS
CONFIG_CRYPTO_XTS=y

There are similar older bugs - but they are only "similar" both are fixed in our kernel and cryptsetup versions:
kernel: https://bugzilla.kernel.org/show_bug.cgi?id=112631
cryptsetup: https://gitlab.com/cryptsetup/cryptsetup/issues/284

Fortunately cryptsetup can check all kind of algorithms easily:
$ cryptsetup benchmark
[...]
# Algorithm | Key | Encryption | Decryption
     aes-cbc 128b N/A N/A
 serpent-cbc 128b 58,3 MiB/s 68,2 MiB/s
 twofish-cbc 128b 118,9 MiB/s 134,6 MiB/s
     aes-cbc 256b N/A N/A
 serpent-cbc 256b 58,8 MiB/s 68,2 MiB/s
 twofish-cbc 256b 119,0 MiB/s 134,4 MiB/s
     aes-xts 256b N/A N/A
 serpent-xts 256b 62,0 MiB/s 68,1 MiB/s
 twofish-xts 256b 127,8 MiB/s 132,6 MiB/s
     aes-xts 512b N/A N/A
 serpent-xts 512b 62,2 MiB/s 68,1 MiB/s
 twofish-xts 512b 128,2 MiB/s 132,5 MiB/s

So aes-xts (the default) is not avail?
Really?

BTW - This at least helps to further simplify the test (no need for a device at all)
$ cryptsetup benchmark --cipher aes-xts

But in /proc/crypto I can see:
name : xts(aes)
driver : p8_aes_xts
module : vmx_crypto
priority : 2000
refcnt : 1
selftest : passed
internal : no
type : blkcipher
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
geniv : <default>

The test is not failing on x86, check there:
=> Working just fine on x86

$ cat /proc/crypto (from x86)
[...]
name : xts(aes)
driver : xts(ecb(aes-asm))
module : kernel
priority : 200
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 16
min keysize : 32
max keysize : 64
ivsize : 16
chunksize : 16

On the power system the listed module "vmx_crypto" is loaded.
The related output in /proc/crypto did not change since xenial.

Stracing to confirm it is not the old accept before bind issue:

Call (xenial):
socket(PF_ALG, SOCK_SEQPACKET, 0) = 5
bind(5, {sa_family=AF_ALG, sa_data="skcipher\0\0\0\0\0\0"}, 88) = 0

Bad (Zesty):
socket(AF_ALG, SOCK_SEQPACKET, 0) = 5
bind(5, {sa_family=AF_ALG, sa_data="skcipher\0\0\0\0\0\0"}, 88) = -1 ENOENT (No such file or directory)

The old kernel bug was about "bind" needed to be (setting key types) before accept.
The new crypsetup does that, so it is NOT the old issue (as we could assume since it was fixed in 4.4 and various stable updates).
Yet it is very close - now the bind itself fails setting the xts-aes type (and cbc-aes fails too btw).

=> kernel update regression from the move to 4.10

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-9-generic 4.10.0-9.11
ProcVersionSignature: User Name 4.10.0-9.11-generic 4.10.0
Uname: Linux 4.10.0-9-generic ppc64le
.var.log.platform: Error: [Errno 13] Permission denied: '/var/log/platform'
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mär 6 09:26 seq
 crw-rw---- 1 root audio 116, 33 Mär 6 09:26 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.4-0ubuntu2
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Mar 6 10:25:00 2017
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 003: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
PciMultimedia:

ProcFB: 0 OFfb vga
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-4.10.0-9-generic root=LABEL=cloudimg-rootfs earlyprintk
ProcLoadAvg: 0.00 0.00 0.00 1/129 24971
ProcLocks:
 1: POSIX ADVISORY WRITE 1431 00:13:16508 0 EOF
 2: FLOCK ADVISORY WRITE 1423 00:13:16392 0 EOF
 3: POSIX ADVISORY WRITE 1422 00:13:16194 0 EOF
 4: POSIX ADVISORY WRITE 1387 00:13:15603 0 EOF
 5: POSIX ADVISORY WRITE 398 00:13:11036 0 EOF
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.10.0-9-generic (buildd@bos01-ppc64el-025) (gcc version 6.3.0 20161229 (User Name 6.3.0-2ubuntu1) ) #11-User Name SMP Mon Feb 20 13:45:11 UTC 2017
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-9-generic N/A
 linux-backports-modules-4.10.0-9-generic N/A
 linux-firmware 1.163
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
cpu_cores: Number of cores present = 1
cpu_coreson: Number of cores online = 1
cpu_smt: Error: command ['ppc64_cpu', '--smt'] failed with exit code 255: Machine is not SMT capable

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I know you often need to see if/where things break/fix so I ran a bunch of mainlines (and releases) against the quick test:

good: 4.11.0-041100rc1-generic
bad: 4.10.1-041001-generic
bad: 4.10.0-9-generic
bad: 4.10.0-041000-generic
good: 4.9.0-040900-generic
good: 4.4.0-64-generic

Also I wanted to stress that outside of the actual test what falls out of dmesg really is
"Failed to allocate transformation for 'xts(aes)': -2"
(that might have been lost in the stack of logs before)
IMHO that is what we should search for fixes, upstream, ... especially to differ from the bug in the 4.4 time.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I think this is the fix upstream:
https://github.com/torvalds/linux/commit/33a8b3e99dc68abfe25e140144ad268e70660be3

FYI - This is in 4.11-rc1 but not in 4.10.

Aside of the actual issue (cryptsetup unusable) this also makes all packages that trigger cryptsetup based tests fail on ppc64el so there might be some noise in proposed (e.g. any systemd test as this debugging started with).

Changed in linux (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I think this is crit, but please do not be offended and feel free to re-assess matching your rules.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I just was able to check that at least on s390 this is working (as it was listed as maybe also affected).

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Christian - I think this is a duplicate of bug #1666483

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Tim - I agree, feel free to close as dup if you want - I can comment and repro in the dupped bug then once something to test is available.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.