[22.04 FEAT] Enhanced Interpretation for PCI Functions on s390x - kernel part

Bug #1853306 reported by bugproxy
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Medium
Skipper Bug Screeners
linux (Ubuntu)
Fix Released
Medium
Canonical Kernel Team
Jammy
Fix Released
Undecided
Canonical Kernel Team
Kinetic
Won't Fix
Undecided
Unassigned
Lunar
Fix Released
Undecided
Unassigned
Mantic
Fix Released
Medium
Canonical Kernel Team

Bug Description

[ Impact ]

 * Currently the PCI passthrough implementation for s390x is based on
   intercepting PCI I/O instructions, which leads to a reduced I/O performance
   compared to the execution of PCI instructions directly in LPAR.

 * Hence users may face I/O bottlenecks when using PCI devices in passthrough
   mode based on the current implementation.

 * To avoid this and to improve performance, interpretive execution
   of the PCI store and PCI load instructions is enabled.

 * A further improvement is achieved by enabling the Adapter-Event-Notification
   Interpretation (AENI).

 * Since LTS releases are the main focus for stable and long-running KVM
   workloads, it is highly desirable to backport this to the jammy kernel
   (especially since the next LTS is still some time away).

[ Test Plan ]

* Hardware used: z14 or greater LPAR, PCI-attached devices
  (RoCE VFs, ISM devices, NVMe drive)

* Setup: Both the kernel and QEMU features are needed for the feature
  to function (an upstream QEMU can be used to verify the kernel early),
  and the facility is only available on z14 or newer.
  When any of those pieces is missing,
  the interpretation facility will not be used.
  When both the kernel and QEMU features are included in their respective
  packages, and running in an LPAR on a z14 or newer machine,
  this feature will be enabled automatically.
  Existing supported devices should behave as before with no changes
  required by an end-user (e.g. no changes to libvirt domain definitions)
  -- but will now make use of the interpretation facility.
  Additionally, ISM devices will now be eligible for vfio-pci passthrough
  (where before QEMU would exit on error if attempting to provide an ISM
  device for vfio-pci passthrough, preventing the guest from starting)

* Testing will include the following scenarios, repeated each for RoCE,
  ISM and NVMe:

  1) Testing of basic device passthrough (create a VM with a vfio-pci
     device as part of the libvirt domain definition, passing through
     a RoCE VF, an ISM device, or an NVMe drive. Verify that the device
     is available in the guest and functioning)
  2) Testing of device hotplug/unplug (create a VM with a vfio-pci device,
     virsh detach-device to remove the device from the running guest,
     verify the device is removed from the guest, then virsh attach-device
     to hotplug the device to the guest again, verify the device functions
     in the guest)
  3) Host power off testing: Power off the device from the host, verify
     that the device is unplugged from the guest as part of the poweroff
  4) Guest power off testing: Power off the device from within the guest,
     verify that the device is unusable in the guest,
     power the device back on within the guest and verify that the device
     is once again usable.
  5) Guest reboot testing: (create a VM with a vfio-pci device,
     verify the device is in working condition, reboot the guest,
     verify that the device is still usable after reboot)

Testing will include the following scenarios specifically for ISM devices:

1) Testing of SMC-D v1 fallback: Using 2 ISM devices on the same VCHID
   that share a PNETID, create 2 guests and pass one ISM device
   via vfio-pci device to each guest.
   Establish TCP connectivity between the 2 guests using the libvirt
   default network, and then use smc_run
   (https://manpages.ubuntu.com/manpages/jammy/man8/smc_run.8.html)
   to run an iperf workload between the 2 guests (will include both
   short workloads and longer-running workloads).
   Verify that SMC-D transfer was used between the guests instead
   of TCP via 'smcd stats'
   (https://manpages.ubuntu.com/manpages/jammy/man8/smcd.8.html)

2) Testing of SMC-D v2: Same as above,
   but using 2 ISM devices on the same VCHID that have no PNETID specified
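
As a rough sketch of the ISM workload described above, the commands for one short and one longer run could be assembled like this. The peer address (taken from the libvirt default network range) and the durations are assumptions chosen for illustration; nothing is executed here.

```python
def smc_iperf_commands(server_ip, seconds):
    """Return (server, client, stats) command lists for one iperf run
    wrapped in smc_run, followed by the SMC-D statistics query."""
    server = ["smc_run", "iperf", "-s"]
    client = ["smc_run", "iperf", "-c", server_ip, "-t", str(seconds)]
    stats = ["smcd", "stats"]
    return server, client, stats


# Hypothetical peer guest address and run lengths (short and longer run).
PEER = "192.168.122.10"
runs = [smc_iperf_commands(PEER, seconds) for seconds in (10, 600)]

server, client, stats = runs[0]
for cmd in (server, client, stats):
    print(" ".join(cmd))
```

The server command runs in one guest and the client command in the other. The statistics are then inspected manually to confirm that SMC-D rather than TCP carried the traffic.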

Testing will include the following scenarios specifically for RoCE devices:

1) Ping testing: Using 2 RoCE VFs that share a common network,
   create 2 guests and pass one RoCE device to each guest.
   Assign IP addresses within each guest to the associated TCP interface,
   perform a ping between the guests to verify connectivity.

2) Iperf testing: Similar to the above, but instead establish an iperf
   connection between the 2 guests and verify that the workload
   is successful / no errors.
   Will include both short workloads and longer-running workloads.
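
The RoCE ping scenario can be sketched in the same way. The subnet (a documentation range) and the interface name `eth1` are assumptions; in a real setup the interface is whatever name the RoCE VF receives in the guest.

```python
import ipaddress


def roce_ping_plan(subnet, iface):
    """Pick the first two host addresses of the subnet and return the
    commands for guest A (configure, then ping B) and guest B (configure)."""
    net = ipaddress.ip_network(subnet)
    hosts = iter(net.hosts())
    addr_a, addr_b = next(hosts), next(hosts)
    prefix = net.prefixlen
    guest_a = [
        ["ip", "addr", "add", f"{addr_a}/{prefix}", "dev", iface],
        ["ping", "-c", "4", str(addr_b)],
    ]
    guest_b = [
        ["ip", "addr", "add", f"{addr_b}/{prefix}", "dev", iface],
    ]
    return guest_a, guest_b


# Hypothetical subnet and interface name.
guest_a, guest_b = roce_ping_plan("192.0.2.0/24", "eth1")
for cmd in guest_a + guest_b:
    print(" ".join(cmd))
```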

Testing will include the following scenario specifically for NVMe devices:

1) Fio testing: Using a NVMe drive passed to the guest via vfio-pci,
   run a series of fio tests against the device from within the guest,
   verifying that the workload is successful / no errors.
   Will include both short workloads and longer-running workloads.
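
A possible shape for the fio runs is sketched below. The device path, job name, block size, queue depth and run times are assumptions for illustration and are not the parameters used for the actual verification. Only read patterns are shown, because writing to the raw device would destroy its contents.

```python
def fio_command(device, rw, runtime_s, block_size="4k", iodepth=32):
    """Return a time-based fio invocation against a block device."""
    return [
        "fio",
        "--name=nvme-passthrough",   # hypothetical job name
        f"--filename={device}",
        f"--rw={rw}",
        f"--bs={block_size}",
        f"--iodepth={iodepth}",
        "--ioengine=libaio",
        "--direct=1",                # bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",
    ]


# Hypothetical guest device node; one short and one longer run per pattern.
DEVICE = "/dev/nvme0n1"
jobs = [
    fio_command(DEVICE, rw, runtime)
    for rw in ("read", "randread")
    for runtime in (60, 1800)
]

for job in jobs:
    print(" ".join(job))
```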

[ Where problems could occur ]

 * The modifications do not change the way users or APIs have to make
   use of PCI passthrough, only the internal implementation got modified.

 * The vast majority of the changed or added code is s390x-specific,
   under arch/s390 and drivers/s390.

 * However, some common code is touched as well:

 * 'kvm: use kvfree() in kvm_arch_free_vm()' touches
   arch/arm64/include/asm/kvm_host.h, arch/arm64/kvm/arm.c,
   arch/x86/include/asm/kvm_host.h, arch/x86/kvm/x86.c and
   include/linux/kvm_host.h. It switches kvm_arch_free_vm() from kfree() to
   kvfree(), which allows the common variant to be used. This has been
   upstream since v5.16 and is therefore well established.

 * 'vfio-pci/zdev: add open/close device hooks' touches
   drivers/vfio/pci/vfio_pci_core.c, drivers/vfio/pci/vfio_pci_zdev.c and
   include/linux/vfio_pci_core.h to introduce the device hooks.
   It has been upstream since kernel 6.0.

 * 'KVM: s390: pci: provide routines for enabling/disabling interrupt
   forwarding' expands a single #if statement in include/linux/sched/user.h.

 * 'KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices'
   adds the s390x-specific KVM_S390_ZPCI_OP and its definition to
   include/uapi/linux/kvm.h.

 * And 'vfio-pci/zdev: different maxstbl for interpreted devices' and
   'vfio-pci/zdev: add function handle to clp base capability' expand
   s390x-specific (aka z-specific aka zdev) device structs in
   include/uapi/linux/vfio_zdev.h.

 * This shows that the vast majority of modifications are s390x specific,
   even in most of the common code files.

 * The remaining modifications in the (generally) common code files are
   related to the newly introduced kernel option 'CONFIG_VFIO_PCI_ZDEV_KVM'
   and documentation.

 * The s390x changes are more significant, and could not only harm
   passthrough itself for zPCI devices, but also KVM virtualization in general.

 * In addition to these kernel changes, qemu modifications are needed
   as well (addressed in LP#1853307); this modified kernel
   must be tested in combination with the updated qemu package.
   - The qemu autopkgtest will be a good fit to identify any regressions,
   also in the kernel.
   - In addition, some passthrough-related tests will be done by IBM.

__________

The PCI Passthrough implementation is based on intercepting PCI I/O instructions which leads to a reduced I/O performance compared to execution of PCI instructions in LPAR.
For improved performance, interpretive execution of the PCI store and PCI load instructions is enabled.
Further improvement is achieved by enabling the Adapter-Event-Notification

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-182254 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Frank Heimes (fheimes) wrote :

Please specify the planned target kernel in which this is going to be accepted upstream.
Changing to Incomplete for now.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in ubuntu-z-systems:
status: New → Incomplete
importance: Undecided → Medium
assignee: nobody → Frank Heimes (frank-heimes)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-11-22 08:07 EDT-------
Planned Target : kernel 5.5

Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: [20.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part

But the planned kernel for 20.04 LTS is 5.4

Revision history for this message
Frank Heimes (fheimes) wrote :

Approach is to get it (commit ID) as early as possible,
and depending on when that actually will be, triaging if it can still land in 20.04 GA, SRU (post GA) or HWE.
But it all depends on the availability...

summary: - [20.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part
+ [20.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-01-28 04:11 EDT-------
Will not make it for 20.04 -> new Target 20.10

tags: added: targetmilestone-inin2010
removed: targetmilestone-inin2004
summary: - [20.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part
+ [21.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-27 02:25 EDT-------
Feature request moved to 21.04. Will not make it in time for 20.10

tags: added: targetmilestone-inin2104
removed: targetmilestone-inin2010
Revision history for this message
Frank Heimes (fheimes) wrote : Re: [21.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part

expected with kernel >= 5.12

summary: - [21.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part
+ [21.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-02-23 03:52 EDT-------
Feature will not make it into 21.04, Moved to 21.10

tags: added: targetmilestone-inin2110
removed: targetmilestone-inin2104
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-09-01 09:11 EDT-------
Feature will not make it into Impish / 21.10, hence moving to 22.04
Changing IBM Bugzilla Target Milestone: 21.10->22.04

tags: added: targetmilestone-inin2204
removed: targetmilestone-inin2110
Frank Heimes (fheimes)
summary: - [21.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part
+ [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-06-13 10:53 EDT-------
Hello Matt, thanks for pointing to your latest upstream version of this item (kernel part) which can be found here:
https://<email address hidden>/

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part

Okay, looks like we are getting closer on this.
(I don't see it yet in linux-next)

Since this is a pretty big patch set (21 commits) may I ask if it's planned to get it upstream into 5.19 still?
That would be great and actually important, since 5.19 is the planned target kernel for Ubuntu 22.10/kinetic.

(And without having looked at the details, it seems that there are common code changes, which /can/ be difficult to get in otherwise ...)

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-09-01 08:26 EDT-------
Thanks to Matt, here are all (except 1 still missing) relevant kernel commits that will be available starting with kernel 6.0, with the oldest at the bottom:

5efab5cdf06b Documentation: kvm: extend KVM_S390_ZPCI_OP subheading underline
4ac34b94a534 MAINTAINERS: additional files related kvm s390 pci passthrough
db1c875e0539 KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices
ba6090ff8ae0 vfio-pci/zdev: different maxstbl for interpreted devices
faf3bfcb8950 vfio-pci/zdev: add function handle to clp base capability
8061d1c31f1a vfio-pci/zdev: add open/close device hooks
09340b2fca00 KVM: s390: pci: add routines to start/stop interpretive execution
3c5a1b6f0a18 KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding
3f4bbb4342ec KVM: s390: mechanism to enable guest zPCI Interpretation
73f91b004321 KVM: s390: pci: enable host forwarding of Adapter Event Notifications
98b1d33dac5f KVM: s390: pci: do initial setup for AEN interpretation
6438e30714ab KVM: s390: pci: add basic kvm_zdev structure
c435c54639aa vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM
d10384677630 s390/pci: stash dtsm and maxstbl
c68468ed3416 s390/pci: stash associated GISA designation
062f002485d4 s390/pci: externalize the SIC operation controls and routine
932b646727f9 s390/airq: allow for airq structure that uses an input vector
d2197485a188 s390/airq: pass more TPI info to airq handlers
b05a870c5e4e s390/sclp: detect the AISI facility
efef0db77c93 s390/sclp: detect the AENI facility
9db153f45230 s390/sclp: detect the AISII facility
e3d27b62110c s390/sclp: detect the zPCI load/store interpretation facility

Please note: There is still one additional fix making its way into 6.0 that has not yet merged (therefore no commit ID available yet):
https://<email address hidden>/

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part

I think I've found the missing commit in 6.0-rc4:
ca922fecda6c ca922fecda6caa5162409406dc3b663062d75089 "KVM: s390: pci: Hook to access KVM lowlevel from VFIO"
With that I'll try to get it into kinetic/22.10 (last minute - if everything cherry-picks cleanly).

Changed in ubuntu-z-systems:
status: Incomplete → New
Changed in linux (Ubuntu):
status: Incomplete → New
Changed in ubuntu-z-systems:
assignee: Frank Heimes (fheimes) → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Frank Heimes (fheimes)
summary: - [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part
+ [22.04 FEAT] Enhanced Interpretation for PCI Functions on s390x - kernel
+ part
Revision history for this message
Frank Heimes (fheimes) wrote :

Pull request was submitted to kernel team's mailing list:
https://lists.ubuntu.com/archives/kernel-team/2022-September/thread.html#133371
changing status to 'In Progress'.

A test kernel was built on all major architectures in PPA and is available here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1853306

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: Frank Heimes (fheimes) → Canonical Kernel Team (canonical-kernel-team)
status: New → In Progress
Changed in ubuntu-z-systems:
status: New → In Progress
information type: Private → Public
Revision history for this message
Frank Heimes (fheimes) wrote :

This is now incl. in linux-generic | 5.19.0.18.18 | kinetic-proposed
hence updating the status to Fix Committed.

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.19.0-18.18

---------------
linux (5.19.0-18.18) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-18.18 -proposed tracker (LP: #1990366)

  * 5.19.0-17.17: kernel NULL pointer dereference, address: 0000000000000084
    (LP: #1990236)
    - Revert "UBUNTU: SAUCE: apparmor: Fix regression in stacking due to label
      flags"
    - Revert "UBUNTU: [Config] disable SECURITY_APPARMOR_RESTRICT_USERNS"
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - add an internal buffer""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't wait on cleanup""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't waste entropy""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - always add a pending
      request""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - unregister device before
      reset""
    - Revert "UBUNTU: SAUCE: Revert "virtio-rng: make device ready before making
      request""
    - Revert "UBUNTU: [Config] update configs after apply new apparmor patch set"
    - Revert "UBUNTU: SAUCE: apparmor: add user namespace creation mediation"
    - Revert "UBUNTU: SAUCE: selinux: Implement userns_create hook"
    - Revert "UBUNTU: SAUCE: bpf-lsm: Make bpf_lsm_userns_create() sleepable"
    - Revert "UBUNTU: SAUCE: security, lsm: Introduce security_create_user_ns()"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: AppArmor: Remove the exclusive
      flag"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add /proc attr entry for full
      LSM context"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Removed scaffolding function
      lsmcontext_init"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: netlabel: Use a struct lsmblob in
      audit data"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple
      object contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: audit: multiple subject lsm values
      for netlabel"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple task
      security contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Allow multiple records in an
      audit_buffer"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add a function to report
      multiple LSMs"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Create audit_stamp
      structure"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Keep multiple LSM data in
      audit_names"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx
      module selection"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: binder: Pass LSM identifier for
      confirmation"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: NET: Store LSM netlabel data in a
      lsmblob"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx in
      netlink netfilter"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_dentry_init_security"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_inode_getsecctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_secid_to_secctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM:...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : backport of c435c54639aa vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM

------- Comment (attachment only) From <email address hidden> 2023-05-17 10:28 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : <backport of 09340b2fca00 KVM: s390: pci: add routines to start/stop interpretive execution>

------- Comment (attachment only) From <email address hidden> 2023-05-17 10:29 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : backport of 8061d1c31f1a vfio-pci/zdev: add open/close device hooks

------- Comment (attachment only) From <email address hidden> 2023-05-17 10:30 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2023-05-17 10:34 EDT-------
For backporting to jammy, the following is required and can be cherry-picked (except where noted a proposed backport has been provided). The most notable backport here is for '8061d1c31f1a vfio-pci/zdev: add open/close device hooks' where the backport makes a specific change to this s390-only code and its interface with vfio rather than pulling in a very large number of vfio pre-reqs.

78b497f2e62d kvm: use kvfree() in kvm_arch_free_vm()
1b553839e132 s390/sclp: add detection of IPL-complete-control facility
4e4dc65ab578 s390/pci: use phys_to_virt() for AIBVs/DIBVs
e3d27b62110c s390/sclp: detect the zPCI load/store interpretation facility
9db153f45230 s390/sclp: detect the AISII facility
efef0db77c93 s390/sclp: detect the AENI facility
b05a870c5e4e s390/sclp: detect the AISI facility
d2197485a188 s390/airq: pass more TPI info to airq handlers
932b646727f9 s390/airq: allow for airq structure that uses an input vector
062f002485d4 s390/pci: externalize the SIC operation controls and routine
c68468ed3416 s390/pci: stash associated GISA designation
d10384677630 s390/pci: stash dtsm and maxstbl
<backport of c435c54639aa vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM>
6438e30714ab KVM: s390: pci: add basic kvm_zdev structure
98b1d33dac5f KVM: s390: pci: do initial setup for AEN interpretation
73f91b004321 KVM: s390: pci: enable host forwarding of Adapter Event Notifications
3f4bbb4342ec KVM: s390: mechanism to enable guest zPCI Interpretation
3c5a1b6f0a18 KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding
<backport of 09340b2fca00 KVM: s390: pci: add routines to start/stop interpretive execution>
<backport of 8061d1c31f1a vfio-pci/zdev: add open/close device hooks>
faf3bfcb8950 vfio-pci/zdev: add function handle to clp base capability
ba6090ff8ae0 vfio-pci/zdev: different maxstbl for interpreted devices
<backport of db1c875e0539 KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices>
4ac34b94a534 MAINTAINERS: additional files related kvm s390 pci passthrough
5efab5cdf06b Documentation: kvm: extend KVM_S390_ZPCI_OP subheading underline
<backport of ca922fecda6c KVM: s390: pci: Hook to access KVM lowlevel from VFIO>
e8c924a4fb6e KVM: s390: pci: fix plain integer as NULL pointer warnings
70ba8fae2775 KVM: s390: pci: fix GAIT physical vs virtual pointers usage
189e7d876e48 KVM: s390: pci: register pci hooks without interpretation

Revision history for this message
bugproxy (bugproxy) wrote : backport of db1c875e0539 KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices

------- Comment (attachment only) From <email address hidden> 2023-05-17 10:31 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : backport of ca922fecda6c KVM: s390: pci: Hook to access KVM lowlevel from VFIO

------- Comment (attachment only) From <email address hidden> 2023-05-17 10:31 EDT-------

Frank Heimes (fheimes)
Changed in linux (Ubuntu Lunar):
status: New → Fix Released
Changed in linux (Ubuntu Kinetic):
status: New → Won't Fix
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Revision history for this message
Frank Heimes (fheimes) wrote :

Test kernels (for all major archs, for regression testing) for jammy/22.04 (with VFIO zPCI pass-through for s390x enabled, i.e. CONFIG_VFIO_PCI_ZDEV_KVM=y, meaning built in, not as a module) are currently building here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1853306
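
Whether a given kernel has the option built in can be checked against its config file. The sketch below parses config text for a single option; the sample text is made up for illustration, and on an installed Ubuntu system the text would typically be read from /boot/config-<kernel release> instead.

```python
def kernel_option(config_text, name):
    """Return the value of a kernel config option ('y', 'm', ...),
    or None if the option is not set in the given config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(f"{name}="):
            return line.split("=", 1)[1]
    return None


# Made-up sample config fragment, not taken from a real kernel package.
sample = """\
CONFIG_VFIO_PCI=m
CONFIG_VFIO_PCI_ZDEV_KVM=y
# CONFIG_EXAMPLE_OPTION is not set
"""

value = kernel_option(sample, "CONFIG_VFIO_PCI_ZDEV_KVM")
if value == "y":
    print("CONFIG_VFIO_PCI_ZDEV_KVM is built in")
elif value == "m":
    print("CONFIG_VFIO_PCI_ZDEV_KVM is built as a module")
else:
    print("CONFIG_VFIO_PCI_ZDEV_KVM is not enabled")
```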

Revision history for this message
Frank Heimes (fheimes) wrote :

PR for the above 30 commits requested for jammy, plus setting a default for a new kernel option

Frank Heimes (fheimes)
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote :

Pull request submitted to kernel team's mailing list:
https://lists.ubuntu.com/archives/kernel-team/2023-June/thread.html#140405
changing status to 'In Progress'.

A test kernel was built in PPA and is available here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1853306

Changed in linux (Ubuntu Jammy):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-79.86 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux verification-needed-jammy
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2023-07-18 07:57 EDT-------
Hi, I'm just returning from vacation and I note that this bug is at the very end of that 5 day window... Is there a way to extend this by a few more days so I have time to do a proper verification?

Revision history for this message
Frank Heimes (fheimes) wrote :

@mjrosato - ok, please take a few more days.

But what's needed is a detailed test plan for this (LP#1853306), in combination with the qemu part (LP#1853307 - see comment #23), that shows how this can be verified.
It should be detailed, step-by-step and e2e - it's needed for this (actually every) SRU and requested by the SRU team, which is responsible for the final acceptance.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-07-18 09:54 EDT-------
Hi Frank -- Thanks a lot. Here's some information about the testing I intend to do (I will duplicate it to the QEMU feature), let me know if you have any questions or if you need more details. I don't see a QEMU package available for testing yet so if need be I can use upstream QEMU to verify the kernel.

Testing will consist of the following (all on s390):

Hardware used: z14 or greater LPAR, PCI-attached devices (RoCE VFs, ISM devices, NVMe drive)

Setup: Both the kernel and QEMU features are needed for the feature to function (an upstream QEMU can be used to verify the kernel early), and the facility is only available on z14 or newer. When any of those pieces is missing, the interpretation facility will not be used.
When both the kernel and QEMU features are included in their respective packages, and running in an LPAR on a z14 or newer machine, this feature will be enabled automatically. Existing supported devices should behave as before with no changes required by an end-user (e.g. no changes to libvirt domain definitions) -- but will now make use of the interpretation facility.
Additionally, ISM devices will now be eligible for vfio-pci passthrough (where before QEMU would exit on error if attempting to provide an ISM device for vfio-pci passthrough, preventing the guest from starting)

Testing will include the following scenarios, repeated each for RoCE, ISM and NVMe:

1) Testing of basic device passthrough (create a VM with a vfio-pci device as part of the libvirt domain definition, passing through a RoCE VF, an ISM device, or an NVMe drive. Verify that the device is available in the guest and functioning)
2) Testing of device hotplug/unplug (create a VM with a vfio-pci device, virsh detach-device to remove the device from the running guest, verify the device is removed from the guest, then virsh attach-device to hotplug the device to the guest again, verify the device functions in the guest)
3) Host power off testing: Power off the device from the host, verify that the device is unplugged from the guest as part of the poweroff
4) Guest power off testing: Power off the device from within the guest, verify that the device is unusable in the guest, power the device back on within the guest and verify that the device is once again usable.
5) Guest reboot testing: (create a VM with a vfio-pci device, verify the device is in working condition, reboot the guest, verify that the device is still usable after reboot)

Testing will include the following scenarios specifically for ISM devices:

1) Testing of SMC-D v1 fallback: Using 2 ISM devices on the same VCHID that share a PNETID, create 2 guests and pass one ISM device via vfio-pci device to each guest. Establish TCP connectivity between the 2 guests using the libvirt default network, and then use smc_run (https://manpages.ubuntu.com/manpages/jammy/man8/smc_run.8.html) to run an iperf workload between the 2 guests (will include both short workloads and longer-running workloads). Verify that SMC-D transfer was used between the guests instead of TCP via 'smcd stats' (https://manpages.ubuntu.com/manpages/jammy/man8...


Frank Heimes (fheimes)
description: updated
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-07-20 13:50 EDT-------
Verification of this kernel feature is complete. All tests were run using the kernel in jammy-proposed (5.15.0-79-generic #86-Ubuntu) on the host as well as in the KVM guest (except where otherwise noted, see below).
Because the necessary qemu changes are not yet in jammy-proposed, I used a custom qemu build based on the launchpad qemu repository + applied the QEMU feature backport. I will do a second verification of the QEMU feature once that package is available.

As for the testing itself: NVMe and RoCE testing went just as expected.

During ISM testing, I did run into one issue that is not caused by this feature (therefore I am marking verification complete) but it is something that will need to be fixed separately once an upstream patch is available. Specifically, when using the jammy-proposed kernel in the KVM guest I noted occasional UBSAN array-index-out-of-bounds warnings from net/smc/af_smc.c -- This is not directly due to the code added by this feature or even due to being run in a KVM guest, but rather a bug in net/smc code (specifically the SMC_STAT_PAYLOAD_SUB macro) that is much more obvious when ubsan is enabled. I've already identified the root cause, verified that it also exists upstream, and reported the issue to the SMC maintainers along with a suggested fix - without UBSAN enabled no warning is issued but an incorrect performance status counter is quietly incremented. I had panic_on_warn enabled in the KVM guest, therefore the warning caused the guest to crash; otherwise I likely would not have noticed this issue at all.

In order to complete all verification tests related to ISM devices, I proceeded with 2 different KVM guest kernel configurations and ran all ISM tests using both:
1) a custom built (upstream 6.4.0) guest kernel that did not include UBSAN
2) the jammy-proposed kernel that includes UBSAN but with panic_on_warn=0

All test scenarios passed with these guest kernels.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Frank Heimes (fheimes) wrote :

@mjrosato Many thx for this successful kernel verification!
(I'll cross-reference this as fyi to the qemu bug, too.)

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1002.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-igx verification-needed-jammy
removed: verification-done-jammy
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-5.15/5.15.0-1016.16~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-nvidia-tegra-5.15 verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1016.16 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra
Frank Heimes (fheimes) wrote :

This was opened for s390x and has s390x-specific changes only.
Hence no need for verification on nvidia tegra - removing the tags to unblock the process ...

tags: added: verification-done-focal verification-done-jammy
removed: verification-needed-focal verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-79.86

---------------
linux (5.15.0-79.86) jammy; urgency=medium

  * jammy/linux: 5.15.0-79.86 -proposed tracker (LP: #2026531)

  * Jammy update: v5.15.111 upstream stable release (LP: #2025095)
    - ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
    - ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
    - x86/hyperv: Block root partition functionality in a Confidential VM
    - iio: adc: palmas_gpadc: fix NULL dereference on rmmod
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
    - selftests mount: Fix mount_setattr_test builds failed
    - asm-generic/io.h: suppress endianness warnings for readq() and writeq()
    - x86/cpu: Add model number for Intel Arrow Lake processor
    - wireguard: timers: cast enum limits members to int in prints
    - wifi: mt76: mt7921e: Set memory space enable in PCI_COMMAND if unset
    - arm64: Always load shadow stack pointer directly from the task struct
    - arm64: Stash shadow stack pointer in the task struct on interrupt
    - PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
    - PCI: qcom: Fix the incorrect register usage in v2.7.0 config
    - IMA: allow/fix UML builds
    - USB: dwc3: fix runtime pm imbalance on probe errors
    - USB: dwc3: fix runtime pm imbalance on unbind
    - hwmon: (k10temp) Check range scale when CUR_TEMP register is read-write
    - hwmon: (adt7475) Use device_property APIs when configuring polarity
    - posix-cpu-timers: Implement the missing timer_wait_running callback
    - blk-mq: release crypto keyslot before reporting I/O complete
    - blk-crypto: make blk_crypto_evict_key() return void
    - blk-crypto: make blk_crypto_evict_key() more robust
    - ext4: use ext4_journal_start/stop for fast commit transactions
    - staging: iio: resolver: ads1210: fix config mode
    - tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSH
    - xhci: fix debugfs register accesses while suspended
    - tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
    - MIPS: fw: Allow firmware to pass a empty env
    - ipmi:ssif: Add send_retries increment
    - ipmi: fix SSIF not responding under certain cond.
    - kheaders: Use array declaration instead of char
    - wifi: mt76: add missing locking to protect against concurrent rx/status
      calls
    - pwm: meson: Fix axg ao mux parents
    - pwm: meson: Fix g12a ao clk81 name
    - soundwire: qcom: correct setting ignore bit on v1.5.1
    - pinctrl: qcom: lpass-lpi: set output value before enabling output
    - ring-buffer: Sync IRQ works before buffer destruction
    - crypto: api - Demote BUG_ON() in crypto_unregister_alg() to a WARN_ON()
    - crypto: safexcel - Cleanup ring IRQ workqueues on load failure
    - rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-
      ed
    - reiserfs: Add security prefix to xattr name in reiserfs_security_write()
    - KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted
    - relayfs: fix out-of-bounds access in relay_file_read
    - writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
 ...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1044.49 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1046.53 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-09-01 10:31 EDT-------
As this has been released to jammy -updates, we can close the bug.

Thanks everyone for all your work.

==> Changing the status to "CLOSED"

tags: removed: verification-needed-jammy-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-5.15/5.15.0-1046.51~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-done-focal-linux-aws-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-failed-focal-linux-aws-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-aws-5.15-v2 verification-needed-focal-linux-aws-5.15
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-xilinx-zynqmp/5.15.0-1024.28 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-done-jammy-linux-xilinx-zynqmp'. If the problem still exists, change the tag 'verification-needed-jammy-linux-xilinx-zynqmp' to 'verification-failed-jammy-linux-xilinx-zynqmp'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-xilinx-zynqmp-v2 verification-needed-jammy-linux-xilinx-zynqmp
Frank Heimes (fheimes)
tags: added: verification-done-focal-linux-aws-5.15 verification-done-jammy-linux-aws verification-done-jammy-linux-xilinx-zynqmp
removed: verification-needed-focal-linux-aws-5.15 verification-needed-jammy-linux-aws verification-needed-jammy-linux-xilinx-zynqmp
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-mtk/5.15.0-1030.34 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-done-jammy-linux-mtk'. If the problem still exists, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-failed-jammy-linux-mtk'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-mtk-v2 verification-needed-jammy-linux-mtk
Frank Heimes (fheimes) wrote :

This bug affects s390x only, hence I'm updating all further verification requests to done, to unblock potential ongoing processes.

tags: added: verification-done-jammy-linux-mtk
removed: verification-needed-jammy-linux-mtk