[UBUNTU 20.04] Include patches to avoid self-detected stall with Secure Execution

Bug #1979296 reported by bugproxy
This bug affects 1 person
Affects                 | Status       | Importance | Assigned to           | Milestone
Ubuntu on IBM z Systems | Fix Released | High       | Skipper Bug Screeners |
linux (Ubuntu)          | Invalid      | Undecided  | Unassigned            |
Focal                   | Fix Released | High       | Canonical Kernel Team |
Jammy                   | Fix Released | High       | Canonical Kernel Team |

Bug Description

SRU Justification:
==================

[Impact]

 * On IBM Z Secure Execution environments under heavy load
   (i.e. with over-committed resources from KVM guests),
   rcu_sched self-detected stalls can occur,
   which lead to LPAR crashes.

[Fix]

 * 57c5df13eca4 (57c5df13eca4017ed28f9375dc1d246ec0f54217) "KVM: s390: pv: add macros for UVC CC values"

 * 1e2aa46de526 (1e2aa46de526a5adafe580bca4c25856bb06f09e) "KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm"

 * f0a1a0615a6f (f0a1a0615a6ff6d38af2c65a522698fb4bb85df6) "KVM: s390: pv: avoid stalls when making pages secure"

[Test Plan]

 * An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution)
   enabled is required.

 * Installation of Ubuntu Server 20.04 LTS (18.04 with hwe-5.4)
   or 22.04 LTS on top.

 * Install a kernel that includes the above patches/commits
   (all three for focal; for jammy, 1e2aa46de526 is already included).

 * Bring the system under high load with KVM guests.

 * Monitor dmesg for 'rcu_sched self-detected stalls'
   and/or look for crashes.

 * Due to hardware requirements this test needs to be conducted by IBM.
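To make the dmesg check in the test plan repeatable after each load run, the log output can be passed through a small filter. The sketch below is a hypothetical helper and is not part of any Ubuntu or kernel tooling. It counts lines that contain the string `self-detected stall`, which matches the wording of the RCU message in the stack trace of this report ("rcu: INFO: rcu_sched self-detected stall on CPU"). The function name and the match string are assumptions of this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count log lines reporting an RCU self-detected
 * stall, e.g. "rcu: INFO: rcu_sched self-detected stall on CPU".
 * Assumes each log line fits into the buffer, which holds for typical
 * dmesg output. */
static unsigned long count_stall_lines(FILE *in)
{
    char line[1024];
    unsigned long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, "self-detected stall") != NULL)
            count++;
    }
    return count;
}
```

For example, you could wrap the function in a small `main` that calls `count_stall_lines(stdin)` and then run it as `dmesg | ./stallcount` after the load test. A non-zero count would mean the stall has reappeared.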

[Where problems could occur]

 * The definitions from 57c5df13eca4 are missing in both jammy
   and focal, but adding them should not cause any harm.

 * The change in 1e2aa46de526 only replaces uv_call with uv_call_sched,
   which should make the system more responsive under high load,
   but may consume somewhat more cycles overall.

 * In f0a1a0615a6f, uv_call cannot simply be replaced by uv_call_sched,
   because locks are held at that point.

 * Instead, uv_call is replaced there by __uv_call, which does not loop.

 * But if these changes to the ultravisor (UV) calls were erroneous,
   they could lead to wrong states, or even to broken ultravisor calls
   and, with that, broken Secure Execution (SE).

 * As a side effect, the UV calls might no longer loop over all pages,
   in the worst case leaving some pages unprotected.

 * All this is s390x-only functionality
   that is only available on IBM z15 / LinuxONE III systems and newer,
   and only if the optional feature 'FC 115' is in place;
   it is limited to 'secure-execution' workloads.
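A small user-space model can illustrate the three call styles discussed above. It is a sketch under stated assumptions and not the kernel implementation. `fake_uvc()`, `yield_hook()` and the global counters are hypothetical stand-ins for the ultravisor instruction, for `cond_resched()`, and for a held spinlock. The condition-code names follow 57c5df13eca4. The looping variants retry while the condition code is greater than 1 (busy or partial completion). The last function only approximates the idea behind f0a1a0615a6f: a non-looping attempt under the lock, with the retry happening once no lock is held. Where the real code performs that retry is not modeled.

```c
/* Condition-code values as named by commit 57c5df13eca4. */
#define UVC_CC_OK      0
#define UVC_CC_ERROR   1
#define UVC_CC_BUSY    2
#define UVC_CC_PARTIAL 3

/* Hypothetical model state (these are not kernel variables). */
static int busy_left;         /* times the fake UV still answers "busy" */
static int lock_held;         /* models a spinlock held around the call */
static int yields;            /* calls to the cond_resched() stand-in */
static int yields_under_lock; /* yields that would be illegal */

/* Hypothetical stand-in for one execution of the UV instruction. */
static int fake_uvc(void)
{
    if (busy_left > 0) {
        busy_left--;
        return UVC_CC_BUSY;
    }
    return UVC_CC_OK;
}

/* Hypothetical stand-in for cond_resched(). */
static void yield_hook(void)
{
    yields++;
    if (lock_held)
        yields_under_lock++; /* sleeping with a spinlock held is a bug */
}

/* Modeled on uv_call(): loops while cc is BUSY or PARTIAL and never
 * gives up the CPU, which under heavy load can trigger an RCU stall. */
static int model_uv_call(void)
{
    int cc;

    do {
        cc = fake_uvc();
    } while (cc > UVC_CC_ERROR);
    return cc;
}

/* Modeled on uv_call_sched(): the same loop, but it yields on every
 * pass. This is only valid when no spinlock is held. */
static int model_uv_call_sched(void)
{
    int cc;

    do {
        cc = fake_uvc();
        yield_hook();
    } while (cc > UVC_CC_ERROR);
    return cc;
}

/* Approximates f0a1a0615a6f: a single non-looping attempt (like
 * __uv_call()) under the lock. The lock is dropped before yielding and
 * retrying. */
static int model_make_secure_with_retry(void)
{
    int cc;

    for (;;) {
        lock_held = 1;
        cc = fake_uvc();
        lock_held = 0;
        if (cc != UVC_CC_BUSY && cc != UVC_CC_PARTIAL)
            return cc;
        yield_hook(); /* safe: the lock is no longer held */
    }
}
```

In this model, `model_uv_call()` never yields. `model_uv_call_sched()` yields on every pass, and it would yield with the lock held if it were called under the lock. `model_make_secure_with_retry()` yields only after the lock has been dropped. In the model, the risk listed above (pages no longer being looped over) would correspond to a caller that gives up on "busy" instead of retrying.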

[Other Info]

 * The patches were accepted upstream in kernel 5.16.

 * Commit 1e2aa46de526 is already included in jammy,
   but 57c5df13eca4 and f0a1a0615a6f are missing.

 * Focal requires all 3 commits 57c5df13eca4, 1e2aa46de526 and f0a1a0615a6f.

 * Since impish is very close to its EOL, it is not covered by this SRU.
__________

---Problem Description---
rcu_sched self-detected stall with Secure Execution

When the system is busy and additional Secure Execution guests are started, the LPAR crashes.
Christian Borntraeger looked at the stack trace and identified two commits which should fix the issue:

1e2aa46de526a5adafe580bca4c25856bb06f09e
and
f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

Please include these two fixes in 20.04 and in 18.04 HWE.

Here is the stack trace:

[592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
[592792.725089] rcu: 4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
[592792.725133] (t=2100 jiffies g=26268505 q=410280)
[592792.725135] Task dump for CPU 4:
[592792.725137] qemu-system-s39 R running task 0 2557923 1644255 0x06000004
[592792.725139] Call Trace:
[592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
[592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
[592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
[592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
[592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
[592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
[592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
[592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
[592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
[592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
[592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
[592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
[592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
[592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
[592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
[592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
[592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
[592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
[592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
[592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
[592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

Contact Information = <email address hidden>, <email address hidden>

---uname output---
5.4.0-90-generic #101-Ubuntu

Machine Type = 8562 A00-GT2

---System Hang---
 LPAR crashed and needed to be rebooted

---Debugger---
 A debugger is not configured

---Steps to Reproduce---
 Cause high load, then start a Secure Execution-enabled KVM guest.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-198658 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes) wrote :

Even though the two commits applied cleanly (for jammy only one of them is needed; see the bug description for details), I get compile errors like the following:

On 22.04:
"
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c: In function ‘make_secure_pte’:
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:198:19: error: ‘UVC_CC_OK’ undeclared (first use in this function)
  198 | if (cc == UVC_CC_OK)
      | ^~~~~~~~~
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:198:19: note: each undeclared identifier is reported only once for each function it appears in
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:200:24: error: ‘UVC_CC_BUSY’ undeclared (first use in this function); did you mean ‘SIGP_CC_BUSY’?
  200 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
      | ^~~~~~~~~~~
      | SIGP_CC_BUSY
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:200:45: error: ‘UVC_CC_PARTIAL’ undeclared (first use in this function)
  200 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
      | ^~~~~~~~~~~~~~
make[4]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:285: arch/s390/kernel/uv.o] Error 1
make[3]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:548: arch/s390/kernel] Error 2
make[2]: *** [/<<PKGBUILDDIR>>/Makefile:1875: arch/s390] Error 2
make[2]: *** Waiting for unfinished jobs....
"

Similar on 20.04:
"
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c: In function ‘make_secure_pte’:
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:195:12: error: ‘UVC_CC_OK’ undeclared (first use in this function)
  195 | if (cc == UVC_CC_OK)
      | ^~~~~~~~~
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:195:12: note: each undeclared identifier is reported only once for each function it appears in
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:197:17: error: ‘UVC_CC_BUSY’ undeclared (first use in this function); did you mean ‘SIGP_CC_BUSY’?
  197 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
      | ^~~~~~~~~~~
      | SIGP_CC_BUSY
/<<PKGBUILDDIR>>/arch/s390/kernel/uv.c:197:38: error: ‘UVC_CC_PARTIAL’ undeclared (first use in this function)
  197 | else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
      | ^~~~~~~~~~~~~~
make[4]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:270: arch/s390/kernel/uv.o] Error 1
make[3]: *** [/<<PKGBUILDDIR>>/scripts/Makefile.build:519: arch/s390/kernel] Error 2
make[2]: *** [/<<PKGBUILDDIR>>/Makefile:1762: arch/s390] Error 2
make[2]: *** Waiting for unfinished jobs....
"

I assume that one or more prerequisite commits are missing?

Changed in ubuntu-z-systems:
status: New → Incomplete
Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-06-21 11:45 EDT-------
> I assume that one or more prerequisite commits are missing?

Right, those defines were added with

commit 57c5df13eca4017ed28f9375dc1d246ec0f54217
Author: Claudio Imbrenda <email address hidden>
AuthorDate: Mon Sep 20 15:24:49 2021 +0200
Commit: Christian Borntraeger <email address hidden>
CommitDate: Mon Oct 25 09:20:38 2021 +0200

KVM: s390: pv: add macros for UVC CC values

Would you prefer to pick this commit too or do you want backports?
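The sketch below illustrates what the missing prerequisite provides and how the failing lines in the build errors use it. The four macro values are the ones introduced by 57c5df13eca4. The function `cc_to_errno()` is hypothetical. It only mirrors the `if`/`else if` lines quoted in the compile errors, and the error codes it returns (-EAGAIN, -EINVAL) are assumptions of this example.

```c
#include <errno.h>

/* Condition-code values of an ultravisor call, as named by commit
 * 57c5df13eca4 ("KVM: s390: pv: add macros for UVC CC values"). */
#define UVC_CC_OK      0
#define UVC_CC_ERROR   1
#define UVC_CC_BUSY    2
#define UVC_CC_PARTIAL 3

/* Hypothetical, simplified version of the check that failed to build in
 * make_secure_pte(): success maps to 0, busy or partial completion maps
 * to -EAGAIN ("try again"), and anything else is treated as an error. */
static int cc_to_errno(int cc)
{
    if (cc == UVC_CC_OK)
        return 0;
    else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
        return -EAGAIN;
    return -EINVAL;
}
```

Without the four `#define` lines, this snippet fails with the same kind of "undeclared (first use in this function)" errors shown above. This is why 57c5df13eca4 has to be picked together with f0a1a0615a6f.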

Frank Heimes (fheimes) wrote :
Frank Heimes (fheimes) wrote :

Looks like there was a little overlap in the last two posts...

If they can be cherry-picked cleanly, then it's all good (and I was able to do so).

description: updated
Frank Heimes (fheimes) wrote :

Test kernels (jammy 5.15, focal 5.4 and bionic hwe-5.4) are currently being built in the PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1979296
https://launchpad.net/~fheimes/+archive/ubuntu/lp1979296/+packages

Frank Heimes (fheimes) wrote :

SRU request submitted to the Ubuntu kernel team mailing list for jammy and focal.
https://lists.ubuntu.com/archives/kernel-team/2022-June/thread.html#131188
Changing status to 'In Progress' for jammy and focal.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Incomplete → In Progress
Changed in linux (Ubuntu Jammy):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Focal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
assignee: Frank Heimes (fheimes) → nobody
importance: High → Undecided
Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-123.139 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-13 11:50 EDT-------
Tested with Secure Execution guests with 1 GB ramdisks that needed to be unpacked during boot, starting 100 of them at the same time. No RCU stall.

Looks good.

tags: added: verification-done-focal
removed: verification-needed-focal
Frank Heimes (fheimes) wrote :

Glad to hear that, thanks for the verification!

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-43.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-15 10:04 EDT-------
(In reply to comment #15)

The same test case as on focal was also successful on jammy.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-43.46

---------------
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Co...


Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-124.140

---------------
linux (5.4.0-124.140) focal; urgency=medium

  * CVE-2022-2586
    - SAUCE: netfilter: nf_tables: do not allow SET_ID to refer to another table
    - SAUCE: netfilter: nf_tables: do not allow RULE_ID to refer to another chain

  * CVE-2022-2588
    - SAUCE: net_sched: cls_route: remove from list when handle is 0

  * CVE-2022-34918
    - netfilter: nf_tables: stricter validation of element data

linux (5.4.0-123.139) focal; urgency=medium

  * focal/linux: 5.4.0-123.139 -proposed tracker (LP: #1981284)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure
    - KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm

  * Focal update: v5.4.195 upstream stable release (LP: #1980407)
    - batman-adv: Don't skb_split skbuffs with frag_list
    - hwmon: (tmp401) Add OF device ID table
    - mac80211: Reset MBSSID parameters upon connection
    - net: Fix features skip in for_each_netdev_feature()
    - ipv4: drop dst in multicast routing path
    - drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()
    - netlink: do not reset transport header in netlink_recvmsg()
    - mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
    - dim: initialize all struct fields
    - hwmon: (ltq-cputemp) restrict it to SOC_XWAY
    - s390/ctcm: fix variable dereferenced before check
    - s390/ctcm: fix potential memory leak
    - s390/lcs: fix variable dereferenced before check
    - net/sched: act_pedit: really ensure the skb is writable
    - net/smc: non blocking recvmsg() return -EAGAIN when no data and
      signal_pending
    - net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
    - gfs2: Fix filesystem block deallocation for short writes
    - hwmon: (f71882fg) Fix negative temperature
    - ASoC: max98090: Reject invalid values in custom control put()
    - ASoC: max98090: Generate notifications on changes for custom control
    - ASoC: ops: Validate input values in snd_soc_put_volsw_range()
    - s390: disable -Warray-bounds
    - net: emaclite: Don't advertise 1000BASE-T and do auto negotiation
    - tcp: resalt the secret every 10 seconds
    - tty: n_gsm: fix mux activation issues in gsm_config()
    - usb: cdc-wdm: fix reading stuck on device close
    - usb: typec: tcpci: Don't skip cleanup in .remove() on error
    - USB: serial: pl2303: add device id for HP LM930 Display
    - USB: serial: qcserial: add support for Sierra Wireless EM7590
    - USB: serial: option: add Fibocom L610 modem
    - USB: serial: option: add Fibocom MA510 modem
    - slimbus: qcom: Fix IRQ check in qcom_slim_probe
    - serial: 8250_mtk: Fix UART_EFR register address
    - serial: 8250_mtk: Fix register address for XON/XOFF character
    - dr...


Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released