Fix AMDGPU crash on 6.5 kernel

Bug #2047389 reported by AaronMa
This bug affects 4 people
Affects                  Status        Importance  Assigned to  Milestone
HWE Next                 Fix Released  High        AaronMa
linux (Ubuntu)           Status tracked in Noble
  Jammy                  Invalid       Undecided   Unassigned
  Mantic                 Fix Released  High        Unassigned
  Noble                  Fix Released  High        Unassigned
linux-oem-6.5 (Ubuntu)   Status tracked in Noble
  Jammy                  Fix Released  High        Unassigned
  Mantic                 Invalid       Undecided   Unassigned
  Noble                  Invalid       Undecided   Unassigned

Bug Description

[Impact]
The AMD GPU crashes with the message "cp queue preemption timeout".
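
To check a saved kernel log for this failure, one can search for the message quoted above. This is a minimal sketch that assumes only that the message text appears verbatim somewhere in a log line; the function name and the sample log lines are hypothetical and are not real driver output.

```python
CRASH_MESSAGE = "cp queue preemption timeout"


def find_preemption_timeouts(log_text):
    """Return every log line that contains the crash message from this report."""
    return [line for line in log_text.splitlines() if CRASH_MESSAGE in line]


# Hypothetical log excerpt, for illustration only (not real driver output).
sample_log = "\n".join([
    "example line one: device initialized",
    "example line two: cp queue preemption timeout",
])

matches = find_preemption_timeouts(sample_log)
print(len(matches))
```

On a real system the input would be saved kernel log text rather than the sample string.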

[Fix]
Disable MCBP (mid command buffer preemption) by default, because old Mesa
versions hang with it enabled.
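
To see which MCBP setting a running system uses, the amdgpu module parameter can be read from sysfs. This is a minimal sketch, not the driver's own logic. It assumes the parameter is named `mcbp`, is exposed at `/sys/module/amdgpu/parameters/mcbp`, and uses 1 for enabled, 0 for disabled and -1 for the driver default; please verify these against the amdgpu documentation for your kernel. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the amdgpu "mcbp" module parameter.
MCBP_PARAM = Path("/sys/module/amdgpu/parameters/mcbp")

# Assumed meaning of the values; check the amdgpu docs for your kernel.
MEANINGS = {1: "enabled", 0: "disabled", -1: "auto (driver default)"}


def describe_mcbp(raw):
    """Turn the raw parameter text into a short description."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unknown value {value}")


def read_mcbp(path=MCBP_PARAM):
    """Return a description of the setting, or None if the file is absent."""
    if not path.exists():
        return None  # amdgpu not loaded, or the parameter is not exposed
    return describe_mcbp(path.read_text())


print(read_mcbp())
```

Passing a different `path` makes the helper usable against a copied file when the machine under test is not at hand.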

[Test]
Tested on hardware: web video played for more than 1 hour without problems.

[Where problems could occur]
It may break AMD GPU.

This issue only reproduces on v6.5+ kernels and is fixed in v6.6, so the SRU targets oem-6.5 and mantic.

AaronMa (mapengyu)
Changed in linux (Ubuntu Jammy):
status: New → Invalid
Changed in linux-oem-6.5 (Ubuntu Mantic):
status: New → Invalid
Changed in linux-oem-6.5 (Ubuntu Noble):
status: New → Invalid
tags: added: oem-priority originate-from-2047305 sutton
tags: added: originate-from-2043640
tags: added: originate-from-2045573
Changed in hwe-next:
status: New → In Progress
importance: Undecided → High
Changed in linux (Ubuntu Mantic):
importance: Undecided → High
Changed in linux (Ubuntu Noble):
importance: Undecided → High
Changed in linux-oem-6.5 (Ubuntu Jammy):
importance: Undecided → High
Changed in hwe-next:
assignee: nobody → AaronMa (mapengyu)
AaronMa (mapengyu)
description: updated
summary: - Fix AMDGPU crash on 6.5+ kernel
+ Fix AMDGPU crash on 6.5 kernel
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Mantic):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux-oem-6.5 (Ubuntu Jammy):
status: New → Confirmed
AaronMa (mapengyu)
Changed in linux (Ubuntu Noble):
status: Confirmed → Invalid
Changed in linux (Ubuntu Mantic):
status: Confirmed → Fix Committed
Timo Aaltonen (tjaalton)
Changed in linux-oem-6.5 (Ubuntu Jammy):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.5.0-16.16 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux' to 'verification-done-mantic-linux'. If the problem still exists, change the tag 'verification-needed-mantic-linux' to 'verification-failed-mantic-linux'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-v2 verification-needed-mantic-linux
AaronMa (mapengyu)
tags: added: verification-done-mantic-linux
removed: verification-needed-mantic-linux
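
The tag change requested by the bot and recorded above can be sketched as a small set operation. This is only an illustration of the naming pattern visible in this report (`verification-needed-<series>-<package>` becoming `verification-done-...` or `verification-failed-...`); the function name is hypothetical and this is not how Launchpad itself is driven.

```python
def record_verification(tags, series, package, passed):
    """Replace the 'verification-needed' tag with its 'done' or 'failed' form."""
    suffix = f"{series}-{package}"
    needed = f"verification-needed-{suffix}"
    if needed not in tags:
        raise ValueError(f"{needed} is not set on this bug")
    outcome = "done" if passed else "failed"
    # Other tags (for example the kernel-spammed marker) are left untouched.
    return (set(tags) - {needed}) | {f"verification-{outcome}-{suffix}"}


tags = {"kernel-spammed-mantic-linux-v2", "verification-needed-mantic-linux"}
print(sorted(record_verification(tags, "mantic", "linux", passed=True)))
```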
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-6.5/6.5.0-1013.14 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-oem-6.5' to 'verification-done-jammy-linux-oem-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-oem-6.5' to 'verification-failed-jammy-linux-oem-6.5'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-oem-6.5-v2 verification-needed-jammy-linux-oem-6.5
AaronMa (mapengyu)
tags: added: verification-done-jammy-linux-oem-6.5
removed: verification-needed-jammy-linux-oem-6.5
Timo Aaltonen (tjaalton)
Changed in linux (Ubuntu Noble):
status: Invalid → Fix Released
Changed in linux-oem-6.5 (Ubuntu Mantic):
status: Invalid → Won't Fix
status: Won't Fix → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.5 - 6.5.0-1013.14

---------------
linux-oem-6.5 (6.5.0-1013.14) jammy; urgency=medium

  * jammy/linux-oem-6.5: 6.5.0-1013.14 -proposed tracker (LP: #2049407)

  * Mute/mic LEDs no function on HP ZBook (LP: #2048729)
    - ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook

  * iwlwifi leads to system randomly hangs after suspend (LP: #2049184)
    - SAUCE: wifi: iwlwifi: fix a memory corruption

  * Fix BCM57416 lost after resume (LP: #2047518)
    - bnxt_en: Clear resource reservation during resume

  * Mute/mic LEDs and speaker no function on some HP platforms (LP: #2047504)
    - ALSA: hda/realtek: Add quirks for HP Laptops

  * drm: Update file owner during use (LP: #2047461)
    - drm: Update file owner during use

  * Add missing RPL P/U CPU IDs (LP: #2047398)
    - drm/i915/rpl: Update pci ids for RPL P/U

  * Fix AMDGPU crash on 6.5 kernel (LP: #2047389)
    - drm/amdgpu: disable MCBP by default

  * Audio device is not available, instead it shows dummy output in the settings
    (LP: #2047184)
    - ALSA: hda: intel-nhlt: Ignore vbps when looking for DMIC 32 bps format

  * Fix AMDGPU display on lower resolution modes (LP: #2046504)
    - drm/amd/display: fix mode scaling (RMX_.*)
    - drm/amd/display: fix the ability to use lower resolution modes on eDP

  * RTL8852CE WIFI read country list supporting 6 GHz from BIOS (LP: #2045622)
    - wifi: rtw89: Introduce Time Averaged SAR (TAS) feature
    - wifi: rtw89: acpi: process 6 GHz band policy from DSM
    - wifi: rtw89: regd: handle policy of 6 GHz according to BIOS
    - wifi: rtw89: regd: update regulatory map to R65-R44

  [ Ubuntu: 6.5.0-15.15 ]

  * mantic/linux: 6.5.0-15.15 -proposed tracker (LP: #2048549)
  * CVE-2024-0193
    - netfilter: nf_tables: skip set commit for deleted/destroyed sets
  * CVE-2023-6606
    - smb: client: fix OOB in smbCalcSize()
  * CVE-2023-6817
    - netfilter: nft_set_pipapo: skip inactive elements during set walk
  * CVE-2023-6932
    - ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet
  * CVE-2023-6931
    - perf: Fix perf_event_validate_size()
    - perf: Fix perf_event_validate_size() lockdep splat

 -- Timo Aaltonen <email address hidden> Tue, 16 Jan 2024 09:45:28 +0200

Changed in linux-oem-6.5 (Ubuntu Jammy):
status: Fix Committed → Fix Released
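
Changelog entries like the one above reference their bugs as `(LP: #NNNNNNN)`. The following is a minimal sketch for pulling those numbers out of changelog text, assuming only that pattern; the function name is hypothetical and the sample reuses a few lines from the changelog above.

```python
import re

# Matches the "LP: #<number>" references used in the changelog entries.
LP_REF = re.compile(r"LP: #(\d+)")


def lp_bugs(changelog_text):
    """Return the referenced bug numbers, unique, in order of first appearance."""
    seen = []
    for number in LP_REF.findall(changelog_text):
        if number not in seen:
            seen.append(number)
    return [int(n) for n in seen]


sample = """\
  * Fix AMDGPU crash on 6.5 kernel (LP: #2047389)
    - drm/amdgpu: disable MCBP by default

  * Fix BCM57416 lost after resume (LP: #2047518)
    - bnxt_en: Clear resource reservation during resume
"""

print(2047389 in lp_bugs(sample))
```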
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.5.0-17.17

---------------
linux (6.5.0-17.17) mantic; urgency=medium

  * mantic/linux: 6.5.0-17.17 -proposed tracker (LP: #2049026)

  * [UBUNTU 23.04] Regression: Ubuntu 23.04/23.10 do not include uvdevice
    anymore (LP: #2048919)
    - [Config] Enable S390_UV_UAPI (built-in)

linux (6.5.0-16.16) mantic; urgency=medium

  * mantic/linux: 6.5.0-16.16 -proposed tracker (LP: #2048372)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] remove helper scripts
    - [Packaging] update annotations scripts
    - debian/dkms-versions -- update from kernel-versions (main/2024.01.08)

  * Add missing RPL P/U CPU IDs (LP: #2047398)
    - drm/i915/rpl: Update pci ids for RPL P/U

  * Fix BCM57416 lost after resume (LP: #2047518)
    - bnxt_en: Clear resource reservation during resume

  * Hotplugging SCSI disk in QEMU VM fails (LP: #2047382)
    - Revert "PCI: acpiphp: Reassign resources on bridge if necessary"

  * Update bnxt_en with bug fixes and support for Broadcom 5760X network
    adapters (LP: #2045796)
    - bnxt_en: use dev_consume_skb_any() in bnxt_tx_int
    - eth: bnxt: move and rename reset helpers
    - eth: bnxt: take the bit to set as argument of bnxt_queue_sp_work()
    - eth: bnxt: handle invalid Tx completions more gracefully
    - eth: bnxt: fix one of the W=1 warnings about fortified memcpy()
    - eth: bnxt: fix warning for define in struct_group
    - bnxt_en: Fix W=1 warning in bnxt_dcb.c from fortify memcpy()
    - bnxt_en: Fix W=stringop-overflow warning in bnxt_dcb.c
    - bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP
    - bnxt_en: Let the page pool manage the DMA mapping
    - bnxt_en: Increment rx_resets counter in bnxt_disable_napi()
    - bnxt_en: Save ring error counters across reset
    - bnxt_en: Display the ring error counters under ethtool -S
    - bnxt_en: Add tx_resets ring counter
    - bnxt: use the NAPI skb allocation cache
    - bnxt_en: Update firmware interface to 1.10.2.171
    - bnxt_en: Enhance hwmon temperature reporting
    - bnxt_en: Move hwmon functions into a dedicated file
    - bnxt_en: Modify the driver to use hwmon_device_register_with_info
    - bnxt_en: Expose threshold temperatures through hwmon
    - bnxt_en: Use non-standard attribute to expose shutdown temperature
    - bnxt_en: Event handler for Thermal event
    - bnxt_en: Support QOS and TPID settings for the SRIOV VLAN
    - bnxt_en: Update VNIC resource calculation for VFs
    - Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN"
    - eth: bnxt: fix backward compatibility with older devices
    - bnxt_en: Do not call sleeping hwmon_notify_event() from NAPI
    - bnxt_en: Fix invoking hwmon_notify_event
    - bnxt_en: add infrastructure to lookup ethtool link mode
    - bnxt_en: support lane configuration via ethtool
    - bnxt_en: refactor speed independent ethtool modes
    - bnxt_en: Refactor NRZ/PAM4 link speed related logic
    - bnxt_en: convert to linkmode_set_bit() API
    - bnxt_en: extend media types to supported and autoneg modes
    - bnxt_en: Fix 2...

Changed in linux (Ubuntu Mantic):
status: Fix Committed → Fix Released
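
To compare an installed kernel package against the releases named in the "This bug was fixed in the package" comments, a rough version check can help. This sketch is an assumption-laden simplification: it compares only the numeric fields and is not a full Debian version comparison (no epochs or `~` handling), it covers only mantic `linux` and jammy `linux-oem-6.5`, and -proposed builds such as 6.5.0-16.16 may already carry the fix even though they sort lower. The names are hypothetical.

```python
import re

# Release versions taken from the Launchpad Janitor comments in this report.
FIXED_IN = {
    "linux": "6.5.0-17.17",            # mantic
    "linux-oem-6.5": "6.5.0-1013.14",  # jammy
}


def version_key(version):
    """Split a version such as '6.5.0-17.17' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def at_or_after_fixed_release(package, installed_version):
    """True if the installed version is not older than the released fix."""
    return version_key(installed_version) >= version_key(FIXED_IN[package])


print(at_or_after_fixed_release("linux", "6.5.0-15.15"))
```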
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp-6.5/6.5.0-1013.13~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-gcp-6.5' to 'verification-done-jammy-linux-gcp-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-gcp-6.5' to 'verification-failed-jammy-linux-gcp-6.5'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-gcp-6.5-v2 verification-needed-jammy-linux-gcp-6.5
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.5.0-1013.13 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-azure' to 'verification-done-mantic-linux-azure'. If the problem still exists, change the tag 'verification-needed-mantic-linux-azure' to 'verification-failed-mantic-linux-azure'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-azure-v2 verification-needed-mantic-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-6.5/6.5.0-1013.13~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-done-jammy-linux-aws-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-failed-jammy-linux-aws-6.5'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-6.5-v2 verification-needed-jammy-linux-aws-6.5
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-6.5/6.5.0-1014.14 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-6.5' to 'verification-done-jammy-linux-nvidia-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-6.5' to 'verification-failed-jammy-linux-nvidia-6.5'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-6.5-v2 verification-needed-jammy-linux-nvidia-6.5
AaronMa (mapengyu)
Changed in hwe-next:
status: In Progress → Fix Released