Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment events

Bug #1965241 reported by Sujith Pandel
This bug affects 1 person
Affects          Status         Importance   Assigned to     Milestone
dellserver       New            Undecided    Unassigned
linux (Ubuntu)   Fix Released   Undecided    Michael Reed
  Jammy          Fix Released   Medium       Michael Reed

Bug Description

SRU Justification:

[Impact]
Recovery from DownPort Containment (DPC) events fails in some scenarios, leaving the NVMe endpoint inaccessible.

[Fix]

These DPC fixes handle some of the failure cases of DownPort Containment events.

Upstream kernel patches to be included into Ubuntu 22.04 and into Ubuntu 20.04.5:

Already in Jammy as of Ubuntu-5.15.0-1.1
PCI/portdrv: Enable Bandwidth Notification only if port supports it
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=00823dcbdd415c868390feaca16f0265101efab4

PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=ea401499e943c307e6d44af6c2b4e068643e7884

3134689f98 PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()

[Test Case]

1. Disable the memory space of the NVMe endpoint device.
2. Issue IO to the device.
3. Observe dmesg. It should show that an EDR event is generated, the link is contained, and the NVMe device is recovered.
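
The test steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual procedure: the PCI address `0000:01:00.0`, the block device `/dev/nvme0n1`, the helper names, and the `RUN_DPC_TEST` switch are all placeholders chosen for illustration, and it assumes `setpci` from pciutils plus root access. Clearing Memory Space Enable makes the device unreachable until recovery completes, so this belongs on a test machine only.

```shell
#!/bin/sh
# BDF and DEV are placeholders; set them for the system under test.
BDF="${BDF:-0000:01:00.0}"
DEV="${DEV:-/dev/nvme0n1}"

# Print the given PCI COMMAND register value with bit 1
# (Memory Space Enable, mask 0x0002) cleared.
clear_mse() {
    printf '0x%04x\n' $(( $1 & ~0x0002 & 0xffff ))
}

# Rough filter: keep only log lines that mention DPC or EDR.
filter_dpc_edr() {
    grep -iE 'dpc|edr'
}

run_test() {
    # setpci prints and accepts hexadecimal values without a 0x prefix.
    cur="0x$(setpci -s "$BDF" COMMAND)"
    new="$(clear_mse "$cur")"
    # Step 1: disable the memory space of the endpoint.
    setpci -s "$BDF" COMMAND="${new#0x}"
    # Step 2: issue IO to the device (expected to fail or stall).
    dd if="$DEV" of=/dev/null bs=4k count=1 iflag=direct || true
    # Step 3: give recovery a moment, then look at the kernel log.
    sleep 5
    dmesg | filter_dpc_edr
}

# Only touch hardware when explicitly asked to.
if [ "${RUN_DPC_TEST:-0}" = "1" ]; then
    run_test
fi
```

Run it as root with `RUN_DPC_TEST=1` and `BDF`/`DEV` set for the endpoint being tested; without that variable the script only defines the helpers.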

[Other Info]
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/test_dpc_1965241

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1965241

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Lane  (bladernr)
description: updated
Jeff Lane  (bladernr)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Jeff Lane (bladernr)
description: updated
Michael Reed (mreed8855) wrote : Re: Include DPC Fixes in Ubuntu 22.04 and 20.04

I found that this additional patch was also needed: pcie_port_device_iter() was added in 5.16 and is referenced in the "PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset" patch.

commit 3134689f98f9e09004a4727370adc46e7635b4be
Author: Lukas Wunner <email address hidden>
Date: Fri Oct 15 13:58:40 2021 -0500

    PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()

    Rename pm_iter() to pcie_port_device_iter() and make it visible outside
    CONFIG_PM and portdrv_core.c so it can be used for pciehp slot reset
    recovery.

    [bhelgaas: split into its own patch]
    Link: https://<email address hidden>/
    Link: https://lore.kernel.org<email address hidden>
    Signed-off-by: Lukas Wunner <email address hidden>
    Signed-off-by: Bjorn Helgaas <email address hidden>

Michael Reed (mreed8855) wrote :

I have provided test kernels at the following link:

https://people.canonical.com/~mreed/lp_1965241_DPC_Fix/

Michael Reed (mreed8855) wrote :
Michael Reed (mreed8855)
description: updated
Michael Reed (mreed8855) wrote :

I have updated the test kernel with CONFIG_PCIE_EDR enabled.

https://people.canonical.com/~mreed/lp_1965241_DPC_Fix/

Jeff Lane  (bladernr)
Changed in linux (Ubuntu Jammy):
assignee: Jeff Lane (bladernr) → Michael Reed (mreed8855)
Narendra K (knarendra) wrote :

Michael,

We tried the test kernel from comment #5. In sanity tests, basic functionality works as expected.

On a system where the NVMe endpoint is connected to a root port:

1. When an EDR event occurs, the link is contained and the system does not crash.
2. The config space of the NVMe endpoint device is restored.

The DPC functionality does not work as expected if CONFIG_PCIE_EDR is not enabled.
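
Since the behaviour depends on this option, one way to check a kernel before testing is to look at its build config. This is a minimal sketch: the function name `has_pcie_edr` is made up for illustration, and it assumes the config is installed at Ubuntu's usual `/boot/config-<release>` path.

```shell
#!/bin/sh
# Succeeds when the given kernel config file has EDR support built in.
has_pcie_edr() {
    grep -q '^CONFIG_PCIE_EDR=y$' "$1"
}

# Assumed location of the running kernel's config; override if it differs.
CONFIG_FILE="${CONFIG_FILE:-/boot/config-$(uname -r)}"

if [ -r "$CONFIG_FILE" ] && has_pcie_edr "$CONFIG_FILE"; then
    echo "CONFIG_PCIE_EDR is enabled"
else
    echo "CONFIG_PCIE_EDR is not enabled (or $CONFIG_FILE is unreadable)"
fi
```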

Test Case:

1. Disable the memory space of the NVMe endpoint device.
2. Issue IO to the device.
3. Observe dmesg. It should show that an EDR event is generated, the link is contained, and the NVMe device is recovered.

Michael Reed (mreed8855)
description: updated
Michael Reed (mreed8855)
summary: - Include DPC Fixes in Ubuntu 22.04 and 20.04
+ Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort
+ Containment events
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Jeff Lane  (bladernr)
tags: added: servcert-359
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-43.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
Narendra K (knarendra) wrote :

Basic sanity test shows positive results with 5.15.0-43.46 kernel from -proposed repo.
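
When repeating this verification, it may help to first confirm that the booted kernel really is the -proposed build. The sketch below makes several assumptions: the helper names are invented, `/proc/version_signature` is taken to have the form `Ubuntu <package version>-<flavour> <upstream version>`, and GNU `sort -V` is used as an approximation of Debian version ordering, which is close enough for versions of this shape but not identical in general.

```shell
#!/bin/sh
# Extract the package version (e.g. "5.15.0-43.46") from a signature line
# of the assumed form "Ubuntu 5.15.0-43.46-generic <upstream version>".
sig_version() {
    set -- $1                 # split the line into whitespace-separated fields
    printf '%s\n' "${2%-*}"   # second field, minus the trailing "-<flavour>"
}

# True when version $1 is at least version $2 under GNU `sort -V` ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version named in the verification request for the jammy linux package.
REQUIRED="5.15.0-43.46"

if [ -r /proc/version_signature ]; then
    running="$(sig_version "$(cat /proc/version_signature)")"
    if version_ge "$running" "$REQUIRED"; then
        echo "running $running: at or above $REQUIRED"
    else
        echo "running $running: older than $REQUIRED"
    fi
fi
```

The threshold applies to the main jammy `linux` package only; other kernel flavours carry their own version numbers.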

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-43.46

---------------
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Co...


Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Michael Reed (mreed8855)
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1003.5~20.04.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
