[UBUNTU 20.10] Applications running in QEMU/KVM get translation faults

Bug #1906255 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Fix Released | Critical | Skipper Bug Screeners |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Groovy | Fix Released | Medium | Unassigned |
Hirsute | Fix Released | Undecided | Unassigned |

Bug Description

SRU Justification:
==================

[Impact]

* The commit 0b0ed657fe "s390: remove critical section cleanup from entry.S" was added in kernel 5.8, but it introduced a problem where FPU registers are not properly restored when entering the SIE (start interpretive execution) instruction.

* This leads to crashes of applications running inside KVM, as most programs in use nowadays use FPU registers to back up general register contents.

* To fix this, interrupts need to be disabled in load_fpu_regs() - otherwise an interrupt might come in after the registers are loaded, but before CIF_FPU is cleared in load_fpu_regs().

* When the interrupt returns, CIF_FPU will be cleared and the registers will never be restored.
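
The race described in the points above can be illustrated with a small, deterministic Python model. This is only a sketch under stated assumptions, not the kernel code (which is s390 assembly in arch/s390/kernel/entry.S): the class and method names are hypothetical, and the model assumes that an interrupt handler which uses the FPU saves the task's registers only when CIF_FPU is not already set.

```python
class Cpu:
    """Toy model of one CPU; all names here are hypothetical."""

    def __init__(self, task_regs):
        self.saved_regs = list(task_regs)      # task FPU state kept in memory
        self.hw_regs = [0] * len(task_regs)    # current hardware FPU registers
        self.cif_fpu = True                    # True: restore is still pending
        self.irqs_enabled = True
        self.irq_pending = False

    def interrupt_using_fpu(self):
        # Assumption: the handler saves the task state only if it is not
        # already marked as saved, then overwrites the hardware registers.
        if not self.cif_fpu:
            self.saved_regs = list(self.hw_regs)
            self.cif_fpu = True
        self.hw_regs = [-1] * len(self.hw_regs)

    def deliver_pending_irq(self):
        if self.irq_pending and self.irqs_enabled:
            self.irq_pending = False
            self.interrupt_using_fpu()

    def load_fpu_regs(self, disable_irqs):
        if disable_irqs:
            self.irqs_enabled = False
        if self.cif_fpu:
            self.hw_regs = list(self.saved_regs)
            self.deliver_pending_irq()         # the race window
            self.cif_fpu = False
        if disable_irqs:
            self.irqs_enabled = True
            self.deliver_pending_irq()         # interrupt is taken afterwards


def task_state_survives(disable_irqs):
    """Return True if the task's FPU state is still recoverable."""
    task_regs = [1.5, 2.5, 3.5]
    cpu = Cpu(task_regs)
    cpu.irq_pending = True                     # interrupt arrives during the load
    cpu.load_fpu_regs(disable_irqs)
    if cpu.cif_fpu:
        # Restore still pending: the memory copy must hold the task state.
        return cpu.saved_regs == task_regs
    # No restore pending: the hardware registers must hold the task state.
    return cpu.hw_regs == task_regs


if __name__ == "__main__":
    print("without disabling interrupts:", task_state_survives(False))
    print("with interrupts disabled:    ", task_state_survives(True))
```

In this model the task state is lost when the interrupt lands inside the window, and it stays recoverable when interrupts are held off until CIF_FPU has been cleared.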

[Fix]

* 1179f170b6f0af7bb0b3b7628136eaac450ddf31 1179f170b6f0 "s390: fix fpu restore in entry.S"

[Test Case]

* IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.

* A KVM host needs to be set up, as well as a KVM guest (again using 20.10).

* Run a workload (ideally one that causes a lot of context switching) that makes use of FP instructions inside the KVM guest.

* Monitor the health of the guest for crashes (logs).
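
One possible way to generate such a workload inside the guest is sketched below. It is not the test used by IBM or Canonical; the function names, process count and iteration counts are assumptions. Several processes repeat the same floating-point computation and compare each result with a reference value, so oversubscribing the guest CPUs forces context switches while FP registers are in use.

```python
import math
from multiprocessing import Pool


def fp_checksum(seed, iterations=20000):
    """Deterministic floating-point computation used as the workload."""
    acc = 0.0
    x = float(seed)
    for i in range(1, iterations + 1):
        x = math.sqrt(x * x + i) / 1.0000001
        acc += math.sin(x) * math.cos(i)
    return acc


def worker(args):
    """Repeat the computation and count results that differ from the first run."""
    seed, rounds = args
    reference = fp_checksum(seed)
    mismatches = 0
    for _ in range(rounds):
        if fp_checksum(seed) != reference:
            mismatches += 1
    return mismatches


def run_workload(processes=8, rounds=5):
    """Run the workers in parallel and return the total number of mismatches."""
    jobs = [(seed, rounds) for seed in range(1, processes + 1)]
    with Pool(processes) as pool:
        return sum(pool.map(worker, jobs))


if __name__ == "__main__":
    print("mismatches:", run_workload())
```

Choosing more processes than the guest has vCPUs should increase the amount of context switching. A non-zero mismatch count or a crashing worker would be worth correlating with the guest logs.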

[Regression Potential]

* Even though the code changes are fairly small and easy to review, there is still a certain risk of regression, because:

* the modifications affect a critical part of the kernel (arch/s390/kernel/entry.S),

* they affect the handling of the FPU registers,

* and this code is always in use when KVM guests run.

* So in the worst case the changes could have an even bigger impact on FPU workloads in KVM guests

* and could cause crashes not only when the FPU is used, but in KVM in general.

* But the code is purely s390x specific, hence affects IBM Z only,

* and it has already been accepted upstream with v5.10-rc6,

* and a test kernel (based on groovy master-next) was built for further testing.

[Other]

* The patch was accepted upstream with kernel v5.10-rc6, hence it will land sooner or later in Hirsute.

* It was initially planned to address groovy via a 5.8 upstream stable update, and in fact the patch was already marked for this, but it didn't make it because 5.8 had already reached its EOL.

* Hence this SRU is submitted for groovy only.

__________

commit 0b0ed657fe ("s390: remove critical section cleanup from entry.S") introduced a problem where FPU registers were not properly restored when entering SIE. This leads to crashes of applications running inside KVM, as most programs in use nowadays use FPU registers to back up general register contents.

Fix is upstream:
author Sven Schnelle <email address hidden> 2020-11-20 14:17:52 +0100
committer Heiko Carstens <email address hidden> 2020-11-23 11:52:13 +0100
commit 1179f170b6f0af7bb0b3b7628136eaac450ddf31 (patch)
tree 19e8acb64e0968b41de4899cc1315c41b002839e /arch/s390/kernel/entry.S
parent 78d732e1f326f74f240d416af9484928303d9951 (diff)
s390: fix fpu restore in entry.S
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.

The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.

Cc: <email address hidden> # 5.8
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <email address hidden>
Signed-off-by: Sven Schnelle <email address hidden>
Acked-by: Christian Borntraeger <email address hidden>
Reviewed-by: Heiko Carstens <email address hidden>
Signed-off-by: Heiko Carstens <email address hidden>
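
The stack-slot aliasing described in the commit message can be made concrete with a small Python sketch. The numeric offsets, the 64-byte frame size and the stored values are hypothetical; only the relationship between the two slots (identical before the fix, eight bytes apart after it) is taken from the commit message.

```python
import struct

SF_EMPTY = 0                      # hypothetical offset of the __SF_EMPTY slot
CONTROL_BLOCK_PTR = 0x00007FFFDEADB000   # made-up SIE control block address
SAVED_IRQ_STATE = 0x0404000000000000     # made-up saved interrupt state


def control_block_ptr_survives(sie_control_offset):
    """Store the pointer, then save interrupt state at SF_EMPTY, then re-read."""
    frame = bytearray(64)         # toy stack frame
    # sie64a stores the control block pointer in its slot (big-endian, 8 bytes).
    struct.pack_into(">Q", frame, sie_control_offset, CONTROL_BLOCK_PTR)
    # load_fpu_regs() with the fix saves the interrupt state at SF_EMPTY.
    struct.pack_into(">Q", frame, SF_EMPTY, SAVED_IRQ_STATE)
    (ptr_after,) = struct.unpack_from(">Q", frame, sie_control_offset)
    return ptr_after == CONTROL_BLOCK_PTR


if __name__ == "__main__":
    # Old layout: both names refer to the same eight bytes.
    print("same slot:     ", control_block_ptr_survives(SF_EMPTY))
    # New layout: the SIE slots start eight bytes after SF_EMPTY.
    print("shifted by 8:  ", control_block_ptr_survives(SF_EMPTY + 8))
```

With the shared slot the pointer is overwritten by the saved interrupt state; with the shifted layout both values coexist, which is why the offsets had to move together with the interrupt fix.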

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-189961 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

Since "s390: fix fpu restore in entry.S" is needed to fix "s390: remove critical section cleanup from entry.S" and "s390: remove critical section cleanup from entry.S" got introduced with kernel 5.8, this affects kernel 5.8 only - and with that only groovy and later.

And since "s390: fix fpu restore in entry.S" landed upstream in linux-next (with 'next-20201124' and '5.10-rc5') and was also tagged for 5.8 stable ("Cc: <email address hidden> # 5.8"),
this ticket is just a tracker to make sure the fix is really picked up by the kernel team
with a future LP bug like "Groovy update: v5.8.? upstream stable release".

Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Groovy):
assignee: nobody → Frank Heimes (fheimes)
Changed in linux (Ubuntu):
assignee: Frank Heimes (fheimes) → nobody
Frank Heimes (fheimes) wrote :

A kernel test build (based on groovy master-next) is available here:
https://people.canonical.com/~fheimes/lp1906255

Frank Heimes (fheimes) wrote :

Kernel SRU request submitted for groovy:
https://lists.ubuntu.com/archives/kernel-team/2020-December/thread.html#115231
changing status to 'In Progress'.

description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Groovy):
importance: Undecided → Medium
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Groovy):
assignee: Frank Heimes (fheimes) → nobody
Ian May (ian-may)
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Frank Heimes (fheimes) wrote :

So far I haven't seen any such translation faults (crashes in the logs) on a groovy KVM host/guest combo. But I'm not sure whether I have a proper context-switching workload running ...

Frank Heimes (fheimes) wrote :

Since 5.10 landed in hirsute-proposed, I'm updating the hirsute status to Fix Committed.

Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Frank Heimes (fheimes) wrote :

Since it has been running stable for some days now, I'm setting the tag verification-done-groovy.

tags: added: verification-done-groovy
removed: verification-needed-groovy
bugproxy (bugproxy)
tags: added: targetmilestone-inin2010
removed: targetmilestone-inin---
Frank Heimes (fheimes) wrote :

Now that kernel 5.10 landed in hirsute's release pocket:
linux-generic | 5.10.0.14.16 | hirsute
the 'hirsute' part can be updated to 'Fix Released'.

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) wrote :

For groovy the patch is in Ubuntu-5.8.0-44.50
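
A quick check of whether a running groovy kernel is at or beyond that version could look like the following sketch. It assumes that the kernel release string has the usual Ubuntu form such as "5.8.0-44-generic" (the upload number ".50" does not appear in it), and the helper names are hypothetical. It is only meaningful for the groovy 5.8.0 series and later Ubuntu kernels.

```python
import platform
import re

# From "Ubuntu-5.8.0-44.50": version 5.8.0, ABI number 44.
FIXED_GROOVY = (5, 8, 0, 44)


def parse_release(release):
    """Turn a release string like '5.8.0-44-generic' into (5, 8, 0, 44)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unexpected kernel release format: " + release)
    return tuple(int(part) for part in match.groups())


def groovy_kernel_has_fix(release):
    """True if the release is at least the first fixed groovy kernel."""
    return parse_release(release) >= FIXED_GROOVY


if __name__ == "__main__":
    try:
        running = platform.release()
        print(running, "has fix:", groovy_kernel_has_fix(running))
    except ValueError as err:
        print(err)
```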

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-44.50

---------------
linux (5.8.0-44.50) groovy; urgency=medium

  * groovy/linux: 5.8.0-44.50 -proposed tracker (LP: #1914805)

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - Revert "UBUNTU: SAUCE: e1000e: bump up timeout to wait when ME un-configure
      ULP mode"
    - e1000e: Only run S0ix flows if shutdown succeeded
    - Revert "e1000e: disable s0ix entry and exit flows for ME systems"
    - e1000e: Export S0ix flags to ethtool

  * suspend only works once on ThinkPad X1 Carbon gen 7 (LP: #1865570) //
    [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - e1000e: bump up timeout to wait when ME un-configures ULP mode

  * Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to
    IDENTIFY (I/O error, err_mask=0x4) (LP: #1894778)
    - PCI: vmd: Offset Client VMD MSI-X vectors

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * [UBUNTU 21.04] vfio: pass DMA availability information to userspace
    (LP: #1907421)
    - vfio/type1: Refactor vfio_iommu_type1_ioctl()
    - vfio iommu: Add dma available capability

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron /...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-02-24 05:16 EDT-------
IBM Bugzilla status->closed, Fix Released for all requested distros
