[SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client - TP 8010

Bug #1948626 reported by Reshmi Aravind
Affects         Status        Importance  Assigned to   Milestone
linux (Ubuntu)  Invalid       Undecided   Michael Reed
Jammy           Fix Released  Medium      Michael Reed

Bug Description

[Impact]
NVMe-oF suffers from a well-known discovery problem that fundamentally limits the size of realistic deployments. To address this discovery problem, the FMDS working group (within nvme.org) is working on two proposals that will allow NVMe-oF to be managed via a “network centric” provisioning process instead of an “end-node centric” one.
TP-8009 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1948625): will enable the Automated Discovery of NVMe-oF Discovery Controllers in an IP Network and will prevent an end-user from needing to manually configure the IP Address of Discovery Controllers.
TP-8010 (this launchpad): will define the concept of a Centralized Discovery Controller (CDC) and will allow end-users to manage connectivity from a single point of management on an IP Fabric by IP Fabric basis.

Code that implements TP-8009 and TP-8010:
https://github.com/martin-belanger/nvme-stas/ (since pulled upstream as https://github.com/linux-nvme/nvme-stas)

[Fix]
1. Update kernel with TP8010 kernel patches:

a. https://git.infradead.org/git/nvme.git/commit/647b2e01fb2d3394090ed11d1b5238157c52f907

b. https://git.infradead.org/git/nvme.git/commit/de87c02ea9b4d93d1114b912b621ead81f6738e0

c. nvme: add CNTRLTYPE definitions for 'identify controller'
https://<email address hidden>/

[Test Case]
 1. Compile libnvme and nvme-stas packages from github using the kernel
 with these patches.

 2. Test libnvme and nvme-stas packages
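
As a quick sanity check before building the userspace packages, the new sysfs attributes can be read directly. The following is a minimal sketch, assuming the patched kernel exposes `cntrltype` and `dctype` under `/sys/class/nvme/<controller>/` (the attribute names come from the patch title "nvme: expose cntrltype and dctype through sysfs"; the exact strings they contain are not stated in this report). The helper names `read_attr` and `list_controllers` are hypothetical.

```python
from pathlib import Path


def read_attr(ctrl_dir: Path, name: str):
    """Return the stripped contents of a sysfs attribute, or None if absent."""
    try:
        return (ctrl_dir / name).read_text().strip()
    except OSError:
        # Attribute missing (for example, an unpatched kernel) or unreadable.
        return None


def list_controllers(sysfs_root: str = "/sys/class/nvme"):
    """Collect transport, cntrltype and dctype for every NVMe controller.

    sysfs_root is a parameter so the function can be pointed at a
    fake directory tree; the default path is an assumption.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    controllers = []
    for ctrl_dir in sorted(root.iterdir()):
        controllers.append({
            "name": ctrl_dir.name,
            "transport": read_attr(ctrl_dir, "transport"),
            "cntrltype": read_attr(ctrl_dir, "cntrltype"),
            "dctype": read_attr(ctrl_dir, "dctype"),
        })
    return controllers


if __name__ == "__main__":
    for ctrl in list_controllers():
        print(ctrl)
```

On a kernel without the patches, `cntrltype` and `dctype` would be expected to come back as `None`, which gives a simple before/after comparison.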

[Where problems could occur]
Regression risk for these patches is low.

[Other Info]
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/nvme_tcp_patches_2

information type: Public → Private
Sujith Pandel (sujithpandel) wrote :
description: updated
Jeff Lane  (bladernr) wrote :

Why did you mark this private? That all but ensures no one will be able to see it outside a very select group of people... I see nothing in here that looks like it is embargoed information.

Sujith Pandel (sujithpandel) wrote :

TP-8010 code was recently made public. Marking this report public now.

information type: Private → Public
Jeff Lane  (bladernr) wrote :

It doesn't look like any of those patches have been accepted upstream yet. At least, I was unable to find any hint of the kernel patches in mainline or in the torvalds tree.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1948626

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Lane  (bladernr) wrote : Re: Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client - TP 8010

Is there a different staging kernel upstream where these have been merged?

Also, it's too late to get nvme-stas and libnvme updated without a Feature Freeze Exception. Please work with Michael on that. We may need to deal with this after release, though: it's really late in the cycle at this point, and if we can't get the kernel patches in, there's no reason to hurry on the others.

BUT we can at least see if we can get an FFE for libnvme and nvme-stas. You say "get the latest", but we normally only pull from Debian for userspace packages. Without me looking, what is the timeline for getting these updated in Debian?

Charles Rose (charles-rose) wrote :

Jeff,
The patches are queued up to be submitted to 5.18
https://git.infradead.org/git/nvme.git/log/refs/tags/nvme-5.18-2022-03-03

Michael Reed (mreed8855) wrote :

An additional patch was needed to fix the build of the two patches.

commit e15a8a9755659ff5972f30de4dd64867c97f242d
Author: Hannes Reinecke <email address hidden>
Date: Wed Sep 22 08:35:20 2021 +0200

    nvme: add CNTRLTYPE definitions for 'identify controller'

    Update the 'identify controller' structure to define the newly added
    CNTRLTYPE field.

    Signed-off-by: Hannes Reinecke <email address hidden>
    Reviewed-by: Chaitanya Kulkarni <email address hidden>
    Reviewed-by: Himanshu Madhani <email address hidden>
    Signed-off-by: Christoph Hellwig <email address hidden>
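
For readers unfamiliar with the field, here is a minimal sketch of how CNTRLTYPE could be decoded from a raw 4096-byte Identify Controller buffer. It assumes the field sits at byte offset 111 with values 1 (I/O), 2 (Discovery) and 3 (Administrative), as recalled from the NVMe base specification; neither the offset nor the values are stated in this report, and the function name `decode_cntrltype` is hypothetical.

```python
# Assumed from the NVMe base specification, not from this bug report.
CNTRLTYPE_OFFSET = 111
IDENTIFY_DATA_SIZE = 4096

CNTRLTYPE_NAMES = {
    1: "io",
    2: "discovery",
    3: "admin",
}


def decode_cntrltype(identify_data: bytes) -> str:
    """Map the CNTRLTYPE byte of an Identify Controller buffer to a name."""
    if len(identify_data) != IDENTIFY_DATA_SIZE:
        raise ValueError("expected a 4096-byte Identify Controller buffer")
    # Zero and any unlisted value are treated as reserved / not reported.
    return CNTRLTYPE_NAMES.get(identify_data[CNTRLTYPE_OFFSET], "reserved")
```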

Michael Reed (mreed8855) wrote :
summary: - Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client -
- TP 8010
+ [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC
+ Client - TP 8010
Michael Reed (mreed8855) wrote :

Sujith,

Can you provide a test case that will test these patches?

description: updated
description: updated
Changed in linux (Ubuntu Jammy):
assignee: nobody → Michael Reed (mreed8855)
Sujith Pandel (sujithpandel) wrote :

@~reshmi-susheela-aravind - will you be able to look into this?

Sujith Pandel (sujithpandel) wrote :

Michael,
The test kernel listed here is a 5.13 kernel, but Jammy is based on the v5.15 kernel. Is this going to be for Jammy?

https://people.canonical.com/~mreed/lp_1948626_nvme_tcp/

Michael Reed (mreed8855) wrote :

Hi Sujith,

I forgot to apply the latest tag. I have rebuilt the kernel with 5.15 and updated the link.

 https://people.canonical.com/~mreed/lp_1948626_nvme_tcp/

Michael Reed (mreed8855)
description: updated
Reshmi Aravind (reshmi-susheela-aravind) wrote :

Hi Michael, Sujith,

I tested NVMe/TCP initiator support with the kernel mentioned above. Whenever I run nvme connect and mount the LUNs, I observe the same error described in an earlier Launchpad bug that an engineer filed for a PCI NVMe SSD:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1966203

However, no loss of functionality is seen. No other errors were observed during a 24-48 hour fio stress run.

Reshmi Aravind (reshmi-susheela-aravind) wrote :

Michael, Sujith,

I was able to compile and test the following packages on the kernel mentioned above (5.15.0-27-generic). The CDC client is working fine.

1. libnvme - https://github.com/linux-nvme/libnvme (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1948636)
2. nvme-stas - https://github.com/martin-belanger/nvme-stas/

Michael Reed (mreed8855) wrote :

Reshmi,

I need a quick summary of the test plan for this code. I created the following:

 1. Test NVMe/TCP initiator support with the kernel
 2. Create nvme
 3. Connect and mount the LUNs
 4. Compile and test libnvme and nvme-stas packages from GitHub

Does this suffice for testing? I need this information to submit the code to the mailing list.

description: updated
Reshmi Aravind (reshmi-susheela-aravind) wrote :

Michael,

Step 4 is the relevant one here, because this feature is about the CDC client (which includes nvme-stas and libnvme).
Steps 1, 2 and 3 are good sanity tests, but they are unrelated to the patches this bug requests.

Michael Reed (mreed8855)
description: updated
Michael Reed (mreed8855)
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
status: Incomplete → Fix Committed
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-43.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
Sheik Ajith Ali Mohammed Farook (sheikajith) wrote :

Hi Michael,

I have done a basic sanity test by following the test cases below with the linux/5.15.0-43.46 kernel.

 1. Test NVMe/TCP initiator support with the kernel
 2. Create nvme
 3. Connect and mount the LUNs
 4. Compile and test libnvme and nvme-stas packages from GitHub

I have not faced any issues with the NVMe/TCP connection or with the latest nvme packages, except for socket connection errors -111 and -110 at some points. I have attached the sos report for reference.
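
For reference when reading the sos report, the negative numbers in such messages are negated errno values. A minimal sketch for translating them follows; on typical Linux systems 111 and 110 correspond to ECONNREFUSED and ETIMEDOUT, but errno numbering is platform dependent, so the lookup is done at run time rather than hard-coded. The helper name `describe_errno` is hypothetical.

```python
import errno
import os


def describe_errno(code: int):
    """Return (symbolic name, message) for a possibly negated errno value."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # The two values mentioned in the comment above.
    for code in (-111, -110):
        print(code, *describe_errno(code))
```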

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-43.46

---------------
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Co...


Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1003.5~20.04.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Sheik Ajith Ali Mohammed Farook (sheikajith) wrote :

Hi Michael,

Could you please provide the latest nvme-stas 2.0 deb package to test NVMe-oF-TCP CDC Client support with Ubuntu 22.04 Jammy?
