Add support for mdev_set_iommu_device() kABI in Ubuntu 22.10 kernel

Bug #1988806 reported by Tarun Gupta
This bug affects 2 people
Affects                    Status        Importance  Assigned to  Milestone
Release Notes for Ubuntu   Invalid       Undecided   Unassigned
linux (Ubuntu)             Triaged       Undecided   Unassigned
  Jammy                    Invalid       Undecided   Unassigned
  Kinetic                  Fix Released  Undecided   Unassigned
  Lunar                    Fix Released  Medium      Unassigned
  Mantic                   Triaged       Undecided   Unassigned
linux-hwe-5.19 (Ubuntu)    Invalid       Undecided   Unassigned
  Jammy                    Fix Released  Undecided   Unassigned
  Kinetic                  Invalid       Undecided   Unassigned
  Lunar                    Invalid       Undecided   Unassigned
  Mantic                   Invalid       Undecided   Unassigned

Bug Description

The commit below, merged in the upstream 5.16 kernel, removed support for the mdev_set_iommu_device() kABI because no in-tree drivers were using it.

fda49d97f2c4 ("vfio: remove the unused mdev iommu hook")

This kABI is used by SR-IOV based Nvidia vGPU on Ubuntu 22.04. Ampere and later (SR-IOV based) Nvidia vGPUs use the kernel's mdev framework, and use this kABI to pin all guest memory during VM boot.

Because the HWE kernel for Ubuntu 22.04 will switch to the upstream 5.19 kernel, it will no longer include this kABI, which will break SR-IOV based Nvidia vGPU. This bug requests a custom patch in the HWE kernel that retains support for the mdev_set_iommu_device() kABI.
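
A rough way to see whether a given kernel still offers this interface to out-of-tree drivers is to look for the identifier in the installed kernel headers. The following is a minimal Python sketch and not part of the original report. The helper name `header_mentions` is hypothetical, and the header location (the `build` symlink under `/lib/modules/<release>/`) is an assumption that only holds when the matching kernel headers package is installed. A textual match is only an indicator; it does not prove that a driver will build or work.

```python
import platform
import re
from pathlib import Path


def header_mentions(header_text: str, symbol: str) -> bool:
    """Return True if `symbol` occurs as a whole identifier in the header text."""
    return re.search(r"\b" + re.escape(symbol) + r"\b", header_text) is not None


if __name__ == "__main__":
    # Assumed location: the "build" symlink that installed kernel headers provide.
    header = Path("/lib/modules") / platform.release() / "build/include/linux/mdev.h"
    if header.exists():
        found = header_mentions(header.read_text(), "mdev_set_iommu_device")
        print("mdev_set_iommu_device mentioned in mdev.h:", found)
    else:
        print(f"{header} not found; are the kernel headers installed?")
```

Matching on a whole identifier avoids a false positive from longer names that merely contain the same text.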

Dimitri John Ledkov (xnox) wrote :

It's incorrect to file bugs like these against `linux-meta`; they should be filed against `linux` itself.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Kinetic):
status: New → Fix Released
Changed in linux (Ubuntu Jammy):
status: New → Invalid
Changed in linux-meta-hwe-5.19 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-meta-hwe-5.19 (Ubuntu Lunar):
status: New → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1988806

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
no longer affects: linux-meta-hwe-5.19 (Ubuntu Lunar)
no longer affects: linux-meta-hwe-5.19 (Ubuntu Kinetic)
no longer affects: linux-meta-hwe-5.19 (Ubuntu Jammy)
no longer affects: linux-meta-hwe-5.19 (Ubuntu)
Changed in linux-hwe-5.19 (Ubuntu Lunar):
status: New → Invalid
Changed in linux-hwe-5.19 (Ubuntu Kinetic):
status: New → Invalid
Changed in linux-hwe-5.19 (Ubuntu Jammy):
status: New → Fix Released
Dimitri John Ledkov (xnox) wrote :

@Tarun Gupta

Is https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/kinetic/commit/?h=master-next&id=e605d68b8accf43430144999e8206dd8511d135f still needed in the v6.2 kernel? It no longer applies, and there has been a lot of work by @nvidia.com people on the driver in question (nicolinc, kwankhede, jgg).

Tarun Gupta (tarungupta) wrote :

Hi Dimitri,

Yes, this patch is still needed in Ubuntu 23.04 for vGPU to work.

You're right that the patch will not apply cleanly due to upstream changes in the code. Should I post a new patch for this? Can you also let me know the latest date by which this patch can go into Ubuntu 23.04 (Lunar)?

Dimitri John Ledkov (xnox) wrote :

Yes, it is expected for you / Nvidia to update this patch if there is no other solution.

Please provide a patch against v6.2 for lunar, and one against the latest vanilla kernel RC.

Ideally also submit it upstream, such that a solution acceptable upstream is eventually found.

This patch will need to go through acks and be integrated into an SRU cycle once available. The initial lunar kernel for GA was frozen weeks ago, so this will land in the next possible SRU cycle (April/May). Thus, for a while, 23.04 will have this regression.

I will separately check which rebase dropped this patch, and why this was not communicated before it was identified and escalated to me just today. I do apologise for that.

Changed in linux (Ubuntu Lunar):
status: Incomplete → Triaged
milestone: none → lunar-updates
tags: added: regression-release
Changed in linux (Ubuntu Mantic):
milestone: lunar-updates → ubuntu-23.10-beta
Stefan Bader (smb)
Changed in linux (Ubuntu Lunar):
importance: Undecided → Medium
status: Triaged → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-25.25 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux verification-needed-lunar
Roxana Nicolescu (roxanan) wrote :

Hi Tarun Gupta. Can you verify if this works as expected with the latest lunar version in proposed?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.2.0-25.25

---------------
linux (6.2.0-25.25) lunar; urgency=medium

  * lunar/linux: 6.2.0-25.25 -proposed tracker (LP: #2024167)

  * ftrace in ubuntu_kernel_selftests failed with "check if duplicate events are
    caught" on J-5.15 P9 / J-kvm / L-kvm (LP: #1977827)
    - SAUCE: selftests/ftrace: Add test dependency

  * Add microphone support of the front headphone port on P3 Tower
    (LP: #2023650)
    - ALSA: hda/realtek: Add Lenovo P3 Tower platform

  * Add audio support for ThinkPad P1 Gen 6 and Z16 Gen 2 (LP: #2023539)
    - ALSA: hda/realtek: Add quirk for ThinkPad P1 Gen 6

  * Fix Disable thunderbolt clx make edp-monitor garbage while moving the
    touchpad (LP: #2023004)
    - drm/i915: Use 18 fast wake AUX sync len

  * Fix Monitor lost after replug WD19TBS to SUT port with VGA/DVI to type-C
    dongle (LP: #2021949)
    - thunderbolt: Increase timeout of DP OUT adapter handshake
    - thunderbolt: Do not touch CL state configuration during discovery
    - thunderbolt: Increase DisplayPort Connection Manager handshake timeout

  * Enable Tracing Configs for OSNOISE and TIMERLAT (LP: #2018591)
    - [Config] Enable OSNOISE_TRACER and TIMERLAT_TRACER configs

  * Fix only reach PC3 when ethernet is plugged r8169 (LP: #1946433)
    - r8169: use spinlock to protect mac ocp register access
    - r8169: use spinlock to protect access to registers Config2 and Config5
    - r8169: enable cfg9346 config register access in atomic context
    - r8169: prepare rtl_hw_aspm_clkreq_enable for usage in atomic context
    - r8169: disable ASPM during NAPI poll
    - r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll

  * introduce do_lib_rust=true|false to enable/disable linux-lib-rust package
    (LP: #2021605)
    - [Packaging] introduce do_lib_rust and enable it only on generic amd64

  * System either hang with black screen or rebooted on entering suspend on AMD
    Ryzen 9 PRO 7940HS w/ Radeon 780M Graphics (LP: #2020685)
    - drm/amdgpu: refine get gpu clock counter method
    - drm/amdgpu/gfx11: update gpu_clock_counter logic

  * generate linux-lib-rust only on amd64 (LP: #2020356)
    - [Packaging] generate linux-lib-rust only on amd64

  * No HDMI/DP audio output on dock(Nvidia GPU) (LP: #2020062)
    - ALSA: hda: Add NVIDIA codec IDs a3 through a7 to patch table

  * Add support for mdev_set_iommu_device() kABI in Ubuntu 22.10 kernel
    (LP: #1988806)
    - SAUCE: Add mdev_set_iommu_device() kABI.

  * Enable audio LEDs on HP laptops (LP: #2019915)
    - ALSA: hda/realtek: Fix mute and micmute LEDs for an HP laptop
    - ALSA: hda/realtek: Fix mute and micmute LEDs for yet another HP laptop

  * linux-*: please enable dm-verity kconfigs to allow MoK/db verified root
    images (LP: #2019040)
    - [Config] CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING=y

  * Lunar update: v6.2.13 upstream stable release (LP: #2023929)
    - ARM: dts: rockchip: fix a typo error for rk3288 spdif node
    - arm64: dts: rockchip: Lower sd speed on rk3566-soquartz
    - arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node
    - arm64: dts: qcom: ipq8074-hk10: ...

Changed in linux (Ubuntu Lunar):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-riscv/6.2.0-27.28.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-riscv
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.2.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-hwe-6.2/6.2.0-26.26~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-hwe-6.2 verification-needed-jammy
Jose Ogando Justo (joseogando) wrote :

Hi Tarun,

In preparation for the Mantic release, can you confirm whether this fix is no longer required? Does the driver now contain changes that make the fix unnecessary?

Thanks!

Dimitri John Ledkov (xnox) wrote :

If this patch is still needed, will you submit a forward port by the end of today? Today is Kernel Feature Freeze, with the hardware support freeze on Monday.

Tarun Gupta (tarungupta) wrote :

Hi Dimitri and Jose,

I've posted the patch for the 23.10 kernel for review, as requested. See: https://lists.ubuntu.com/archives/kernel-team/2023-September/142909.html

Please help review.

Thanks,
Tarun

Jose Ogando Justo (joseogando) wrote :

Hello Tarun,

Further discussion of this patch will continue privately via email.

For the record here, this patch missed the deadline.

Regards,

Jose

Changed in ubuntu-release-notes:
status: New → Invalid
Changed in linux (Ubuntu Mantic):
milestone: ubuntu-23.10-beta → none