[22.04 FEAT] Transparent PCI device recovery

Bug #1959532 reported by bugproxy
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ubuntu on IBM z Systems | Fix Released | High | Skipper Bug Screeners | |
| linux (Ubuntu) | Fix Released | High | Canonical Kernel Team | |

Bug Description

A Linux on Z admin can keep using PCI-based devices after an error occurs, without having to perform manual recovery.

Details:
Use cooperative recovery strategies that allow drivers to recover from error scenarios without a complete tear-down and re-initialization. See the approach documented for Linux on Power in Linux/Documentation/PCI/pci-error-recovery.rst.
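
To illustrate what "cooperative" means here, the following is a rough sketch of the callback sequence described in pci-error-recovery.rst, written as a small user-space simulation. It is not kernel code: `ToyDriver`, `Result` and `recover` are hypothetical names made up for this example, only the callback names (`error_detected`, `mmio_enabled`, `slot_reset`, `resume`) follow the kernel document, and the optional link-reset step is left out for brevity.

```python
from enum import Enum, auto


class Result(Enum):
    """Toy stand-ins for the kernel's PCI_ERS_RESULT_* values."""
    CAN_RECOVER = auto()
    NEED_RESET = auto()
    DISCONNECT = auto()
    RECOVERED = auto()


class ToyDriver:
    """Hypothetical driver; only the callback names follow the kernel document."""

    def __init__(self, on_detect, on_mmio=Result.RECOVERED, on_reset=Result.RECOVERED):
        self.on_detect = on_detect
        self.on_mmio = on_mmio
        self.on_reset = on_reset
        self.calls = []  # records which callbacks ran, in order

    def error_detected(self):
        self.calls.append("error_detected")
        return self.on_detect

    def mmio_enabled(self):
        self.calls.append("mmio_enabled")
        return self.on_mmio

    def slot_reset(self):
        self.calls.append("slot_reset")
        return self.on_reset

    def resume(self):
        self.calls.append("resume")


def recover(driver):
    """Walk the simplified recovery steps; return True if the device is usable again."""
    result = driver.error_detected()
    if result is Result.CAN_RECOVER:
        # The platform re-enables MMIO and lets the driver inspect the device.
        result = driver.mmio_enabled()
    if result is Result.NEED_RESET:
        # The platform resets the slot, then asks the driver to re-initialize.
        result = driver.slot_reset()
    if result is Result.RECOVERED:
        driver.resume()
        return True
    return False  # DISCONNECT or anything else: fall back to full tear-down


# Example: a driver that can recover without a reset.
soft = ToyDriver(Result.CAN_RECOVER)
print(recover(soft), soft.calls)
```

The point of the sketch is that the driver, not the admin, decides at each step whether the device can continue, needs a reset, or has to be given up.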

Business value:
Improved reliability, reduced down-times.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-196207 severity-high targetmilestone-inin2204
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

Changing ticket to Incomplete until (upstream) kernel version and/or commits are known.

Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-z-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
status: New → Incomplete
Changed in ubuntu-z-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-02-01 11:57 EDT-------
(In reply to comment #6)
> Changing ticket to Incomplete until (upstream) kernel version and/or commits
> are known.

This is upstream in v5.16, finished with commit:

4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")

There are a couple of prerequisite commits, so I'll follow up with the list
needed on top of v5.15 once I've done some testing, but I believe it should
be rather small.

Frank Heimes (fheimes) wrote :

Hi Niklas, well, for this specific patch 4cdf2f4e24ff a clean cherry-pick is possible.
But if other (rather independent ones) are needed on top, I'll wait ...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-03-04 11:03 EDT-------
(In reply to comment #8)
> Hi Niklas, well, for this specific patch 4cdf2f4e24ff a clean cherry-pick is
> possible.
> But if other (rather independent ones) are needed on top, I'll wait ...

Sorry for the late reply. The commits needed for this feature all landed
in v5.16-rc1 and are:

1c8174fdc798489159a79466fca782daa231219a ("s390/pci: tolerate inconsistent handle in recover")
6526a597a2e856df9ae94512f9903caccd5196d6 ("s390/pci: add simpler s390dbf traces for events")
4fe204977096e900cb91a3298b05c794ac24f540 ("s390/pci: refresh function handle in iomap")
da995d538d3a17610d89fea0f5813cf7921b3c2c ("s390/pci: implement reset_slot for hotplug slot")
dfd5bb23ad75bdabde89ac3166705a450bf16acb ("PCI: Export pci_dev_lock()")
4cdf2f4e24ff0d345fc36ef6d6aec059333a261e ("s390/pci: implement minimal PCI error recovery")

The first one isn't a strict dependency, but a trivial bug fix for a commit in v5.15 in the same area.
I must have missed tagging it for stable. If you prefer, I can also create a separate BZ for it.
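
For anyone backporting onto a v5.15 tree, here is a minimal sketch that turns the list above into a single `git cherry-pick -x` invocation. It assumes the order listed above is the intended apply order, which the comment does not state explicitly; `COMMITS` and `cherry_pick_command` are names invented for this example.

```python
import re

# Commit IDs copied from the list above, in the order given there.
COMMITS = [
    "1c8174fdc798489159a79466fca782daa231219a",  # tolerate inconsistent handle in recover
    "6526a597a2e856df9ae94512f9903caccd5196d6",  # add simpler s390dbf traces for events
    "4fe204977096e900cb91a3298b05c794ac24f540",  # refresh function handle in iomap
    "da995d538d3a17610d89fea0f5813cf7921b3c2c",  # implement reset_slot for hotplug slot
    "dfd5bb23ad75bdabde89ac3166705a450bf16acb",  # PCI: Export pci_dev_lock()
    "4cdf2f4e24ff0d345fc36ef6d6aec059333a261e",  # implement minimal PCI error recovery
]


def cherry_pick_command(commits):
    """Build the argument list for one git cherry-pick call, after a sanity check."""
    for sha in commits:
        # Guard against copy/paste damage: each ID must be a full 40-digit hex SHA-1.
        if not re.fullmatch(r"[0-9a-f]{40}", sha):
            raise ValueError(f"not a full commit ID: {sha!r}")
    # -x appends a "(cherry picked from commit ...)" line to each commit message.
    return ["git", "cherry-pick", "-x", *commits]


print(" ".join(cherry_pick_command(COMMITS)))
```

The script only prints the command; running it inside a kernel checkout is left to the reader.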

Frank Heimes (fheimes)
Changed in linux (Ubuntu):
status: Incomplete → New
Changed in ubuntu-z-systems:
status: Incomplete → New
Frank Heimes (fheimes)
information type: Private → Public
Patricia Domingues (patriciasd) wrote :

A test kernel has been built and tested; it is available at: https://people.canonical.com/~patriciasd/kernel-lp1959532/
```
ubuntu@s1lp13:~$ uname -a
Linux s1lp13 5.15.0-23-generic #23 SMP Wed Mar 16 00:21:04 UTC 2022 s390x s390x s390x GNU/Linux
```

Frank Heimes (fheimes)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
status: New → In Progress
Changed in ubuntu-z-systems:
status: New → In Progress
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-hwe-5.15/5.15.0-25.25~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Frank Heimes (fheimes) wrote :

This feature was requested for 22.04, so its presence in the HWE kernel is more of a side effect.
Hence updating the tag to unblock the process.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-25.25

---------------
linux (5.15.0-25.25) jammy; urgency=medium

  * jammy/linux: 5.15.0-25.25 -proposed tracker (LP: #1967146)

  * Miscellaneous Ubuntu changes
    - SAUCE: Revert "scsi: core: Reallocate device's budget map on queue depth
      change"

 -- Paolo Pisati <email address hidden> Wed, 30 Mar 2022 17:28:11 +0200

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Released