[NEW SRU] backport accel-config for HWE support

Bug #2020769 reported by Dimitri John Ledkov
This bug affects 1 person
Affects                  Status         Importance  Assigned to  Milestone
accel-config (Ubuntu)    Fix Released   High        Unassigned
  Jammy                  Fix Committed  High        Unassigned
  Kinetic                Won't Fix      Wishlist    Unassigned
  Lunar                  Fix Committed  Low         Unassigned
  Mantic                 Fix Released   High        Unassigned

Bug Description

[ Impact ]

 * accel-config is a utility library for controlling and configuring DSA (Intel® Data Streaming Accelerator Architecture) and IAA (Intel® Analytics Accelerator Architecture) sub-systems in the Linux kernel

 * This is the required, and only, userspace tooling for this hardware

 * Backport this package as part of the HWE SRU exception

[ Test Plan ]

 * We don't have access to relevant hardware, thus we will rely on Intel partner (cking) to test and verify proposed packages.

 * Hardware requirements: DSA/IAA enabled platform is required. This means a 4th Generation Intel® Xeon® Scalable Processor is necessary.

 * Kernel requirements:
   - The kernel should have the IDXD module enabled (CONFIG_INTEL_IDXD_BUS=m, CONFIG_INTEL_IDXD=m, CONFIG_INTEL_IDXD_SVM=y).
   - The recommended kernel version that contains support for these features is 5.18+.
   - The kernel configs are enabled for the Lunar and Kinetic generic kernels.
   - Therefore in Lunar and Kinetic the tests can be run with the generic kernels. In Jammy the tests need to be performed with the hwe-5.19 and hwe-6.2 kernels.
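
The kernel requirements above can be checked mechanically before running the tests. The following is a minimal sketch, not part of the SRU tooling: the function name `missing_idxd_options` is hypothetical, and it assumes the kernel config is available as plain text (on Ubuntu it is normally installed as /boot/config-<kernel release>). It compares values strictly against the ones listed above, so a built-in `=y` where `=m` is listed is reported as a mismatch.

```python
import re

# Required options and values, copied from the kernel requirements above.
REQUIRED = {
    "CONFIG_INTEL_IDXD_BUS": "m",
    "CONFIG_INTEL_IDXD": "m",
    "CONFIG_INTEL_IDXD_SVM": "y",
}


def missing_idxd_options(config_text):
    """Return required options that are unset or set to a different value.

    The result maps each offending option name to the value found in the
    config text, or to None when the option is not set at all.
    """
    found = dict(
        re.findall(r"^(CONFIG_\w+)=(\S+)$", config_text, flags=re.MULTILINE)
    )
    return {
        name: found.get(name)
        for name, wanted in REQUIRED.items()
        if found.get(name) != wanted
    }
```

An empty result means all three options match; anything else names the options to look at before attributing a test failure to accel-config itself.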

 * Test procedure:
   1. install accel-config
   2. with the accel-config sources, run the test_all.sh test script (documentation at https://github.com/intel/idxd-config/tree/accel-config-v4.0/test)
   3. the script will report failures if they occur
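
Step 3 only covers failures, while the `accel-config test` output quoted in the comments on this report ends in either "test-libaccfg: PASS" or "test-libaccfg: SKIP". A hedged sketch of how a verifier could tell these apart is shown here; the function names are hypothetical, and treating anything other than an explicit PASS as "needs review" is an assumption, not a rule taken from the test plan.

```python
import re


def libaccfg_verdict(output):
    """Return the word from a 'test-libaccfg: <VERDICT>' line, or None."""
    match = re.search(
        r"^test-libaccfg:\s*(\w+)\s*$", output, flags=re.MULTILINE
    )
    return match.group(1) if match else None


def verification_ok(output):
    """Count only an explicit PASS as success.

    A SKIP verdict or a missing verdict line (for example when the idxd
    module is not loaded) is reported as not OK so that it gets reviewed.
    """
    return libaccfg_verdict(output) == "PASS"
```

Recording the verdict string together with the kernel and accel-config versions makes a skipped run visible in the verification report instead of being read as a pass.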

[ Where problems could occur ]

 * For jammy this is a new package
 * For the lunar upgrade, this should be trivial as the hardware in question is unlikely to be deployed with an interim release
 * As this package provides support for configuring features only provided by a processor family that has been recently launched, this package shouldn't cause any regression for older processors or kernel versions.

[ Other Info ]

 * Requested by Intel partner team.

Lunar backport DONE:
https://launchpad.net/ubuntu/lunar/+queue?queue_state=3&queue_text=accel-config

Kinetic backport DONE:
https://launchpad.net/ubuntu/kinetic/+queue?queue_state=3&queue_text=accel-config

Jammy backport DONE:
https://launchpad.net/ubuntu/jammy/+queue?queue_state=3&queue_text=accel-config

Changed in accel-config (Ubuntu Kinetic):
importance: Undecided → Wishlist
Changed in accel-config (Ubuntu Jammy):
importance: Undecided → High
Changed in accel-config (Ubuntu Lunar):
importance: Undecided → Low
Changed in accel-config (Ubuntu Mantic):
importance: Undecided → High
status: New → Fix Released
summary: - backport accel-config for HWE support
+ [NEW SRU] backport accel-config for HWE support
description: updated
Changed in accel-config (Ubuntu Lunar):
status: New → In Progress
Changed in accel-config (Ubuntu Kinetic):
status: New → Triaged
Changed in accel-config (Ubuntu Jammy):
status: New → Triaged
Robie Basak (racb) wrote :

This isn't a full review as I'm not on shift today. But please could you expand the Test Plan? It's fine if it's going to be tested externally, but the exact steps they intend to follow should be documented. And in particular for this kind of case, please make it explicit that the testing will be performed with a kernel from the archive as well as the package from the archive (proposed is fine) as the importance of that is often missed. Report of a successful verification should state the kernel version used as well as the version of accel-config and the specific steps performed.

I've not looked at the uploads themselves, but hopefully this feedback will speed up the review time.

Timo Aaltonen (tjaalton) wrote :

Colin, could you update the test plan as requested by Robie?

Steve Langasek (vorlon)
Changed in accel-config (Ubuntu Lunar):
status: In Progress → Incomplete
description: updated
description: updated
Changed in accel-config (Ubuntu Lunar):
status: Incomplete → Triaged
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Dimitri, or anyone else affected,

Accepted accel-config into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/accel-config/4.0-2~ubuntu0.23.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in accel-config (Ubuntu Lunar):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-lunar
Changed in accel-config (Ubuntu Kinetic):
status: Triaged → Fix Committed
tags: added: verification-needed-kinetic
Steve Langasek (vorlon) wrote :

Hello Dimitri, or anyone else affected,

Accepted accel-config into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/accel-config/4.0-2~ubuntu0.22.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in accel-config (Ubuntu Jammy):
status: Triaged → Fix Committed
tags: added: verification-needed-jammy
Steve Langasek (vorlon) wrote :

Hello Dimitri, or anyone else affected,

Accepted accel-config into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/accel-config/4.0-2~ubuntu0.22.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

description: updated
Changed in accel-config (Ubuntu Kinetic):
status: Fix Committed → Won't Fix
tags: removed: verification-needed-kinetic
Kleber Sacilotto de Souza (kleber-souza) wrote :

Kinetic has reached EOL, therefore verification not needed for this series.

quanxian (quanxian-wang) wrote :

testing from Intel

Platform: Sapphire Rapids

22.04.3: FAILED
===
Error: idxd kernel module not loaded
Root Cause: 6.2 kernel in 22.04.3 doesn’t contain idxd kernel module.
Kernel package: linux-image-6.2.0-26-generic
===

23.04: unclear whether this passed or not.
===
run test_libaccfg
__accfg_test_skip: explicit skip test_libaccfg:906
device has no pasid support, skipping tests
test-libaccfg: SKIP
libaccfg: accfg_unref: context 0x55e66d7862a0 released
=====

quanxian (quanxian-wang) wrote :

kernel modules for idxd: this is from Ubuntu 23.10 nightly build.
/lib/modules/6.5.0-rc6+/kernel/drivers/dma/idxd
/lib/modules/6.5.0-rc6+/kernel/drivers/dma/idxd/idxd_bus.ko
/lib/modules/6.5.0-rc6+/kernel/drivers/dma/idxd/idxd.ko

quanxian (quanxian-wang) wrote :

We re-ran this accel-config test on the SPR platform with Ubuntu 23.04 after some configuration changes. The result looks good.

There are two parts.

1. accel-config test # general testing

It passed this time. Some ats_disable operations were reported as failed; we assume this matches expectations.

root@test:~# accel-config test
run test_libaccfg

Running accfg-test0: set and get configurations for shared wqs
configuring device dsa0
configuring group group0.0
configuring group group0.1
configuring wq wq0.0
libaccfg: accfg_wq_set_ats_disable: wq0.0: ats_disable attribute write failed: Operation not supported
configuring wq wq0.2
libaccfg: accfg_wq_set_ats_disable: wq0.2: ats_disable attribute write failed: Operation not supported
configuring engine engine0.0
configuring engine engine0.1
configuring engine engine0.2
configuring engine engine0.3
check device dsa0
check group group0.0
check group group0.1
check wq wq0.0
check wq wq0.2
check engine engine0.0
check engine engine0.1
check engine engine0.2
check engine engine0.3
accfg-test0 passed!

Running accfg-test1: set and get configurations for dedicated wqs
configuring device dsa0
configuring group group0.0
configuring group group0.1
configuring wq wq0.1
libaccfg: accfg_wq_set_ats_disable: wq0.1: ats_disable attribute write failed: Operation not supported
configuring wq wq0.3
libaccfg: accfg_wq_set_ats_disable: wq0.3: ats_disable attribute write failed: Operation not supported
configuring engine engine0.0
configuring engine engine0.1
configuring engine engine0.2
configuring engine engine0.3
check device dsa0
check group group0.0
check group group0.1
check wq wq0.1
check wq wq0.3
check engine engine0.0
check engine engine0.1
check engine engine0.2
check engine engine0.3
accfg-test1 passed!

Running accfg-test2: max wq size
configuring group group0.0
configuring wq wq0.1
libaccfg: accfg_wq_set_ats_disable: wq0.1: ats_disable attribute write failed: Operation not supported
configuring wq wq0.3
libaccfg: accfg_wq_set_ats_disable: wq0.3: ats_disable attribute write failed: Operation not supported
trying to set wq size exceeding max wq size
libaccfg: accfg_wq_set_size: wq0.3: size attribute write failed: Invalid argument
wq size exceeding max wq size was not accepted
accfg-test2 passed!

Running accfg-test3: wq boundary conditions
configure device dsa0, group group0.0, wq wq0.1 for bounds test
libaccfg: accfg_wq_set_ats_disable: wq0.1: ats_disable attribute write failed: Operation not supported
trying to set wq max_batch_size = 0
libaccfg: accfg_wq_set_max_batch_size: wq0.1: max_batch_size attribute write failed: Invalid argument
trying to set wq max_transfer_size = 0
libaccfg: accfg_wq_set_max_transfer_size: wq0.1: write failed: Invalid argument
trying to set wq max_batch_size exceeding device max
libaccfg: accfg_wq_set_max_batch_size: wq0.1: max_batch_size attribute write failed: Invalid argument
trying to set wq max_transfer_size exceeding device max
libaccfg: accfg_wq_set_max_transfer_size: wq0.1: write failed: Invalid argument
0 and greater than device max values were not accepted
accfg-test3 passed!
test-libaccfg: PASS
SUCCESS!
libaccfg: accfg_unref: context 0x55aa58bdc2a0 released

2. DS...


Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello quanxian,

Thank you for the tests and the feedback!

The Ubuntu 6.2 kernel builds and ships the 'idxd' and 'idxd_bus' kernel modules as well. However, they are shipped by the 'linux-modules-extra*' package.

Could you please make sure you have this package installed (e.g. linux-modules-extra-6.2.0-26-generic for your kernel version, or a more recent one) and let us know what the test results are with the 6.2 kernel? It would be great if the tests could be performed on both 22.04 and 23.04.

Can you also please elaborate on the config changes you had to make to perform the test? Were they only related to the Intel IOMMU, or did you also need other changes? We have disabled the Intel IOMMU options (CONFIG_INTEL_IOMMU_DEFAULT_ON and CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON) because they were causing boot issues on some platforms. Do they really need to be turned on in the kernel config, or could they be kept as a boot parameter?
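
Whether the modules are actually on disk for a given kernel can be checked without loading them. This is a minimal sketch under stated assumptions: the function name `idxd_modules_present` is hypothetical, the directory layout is taken from the paths quoted earlier in this report (/lib/modules/<release>/kernel/drivers/dma/idxd), and matching compressed variants such as idxd.ko.zst is an assumption about how a given release packages its modules.

```python
from pathlib import Path
import platform


def idxd_modules_present(release=None, root="/lib/modules"):
    """Map each expected module name to whether a matching file exists.

    'release' defaults to the running kernel; 'root' is a parameter so the
    check can also be pointed at an unpacked package tree.
    """
    release = release or platform.release()
    moddir = Path(root) / release / "kernel" / "drivers" / "dma" / "idxd"
    if not moddir.is_dir():
        return {"idxd": False, "idxd_bus": False}
    # "<name>.ko*" matches idxd.ko and compressed forms like idxd.ko.zst.
    return {
        name: any(moddir.glob(f"{name}.ko*"))
        for name in ("idxd", "idxd_bus")
    }
```

If both values are False on a 6.2 kernel, installing the matching linux-modules-extra package is the first thing to try.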

pragyansri.pathi@intel.com (pragyan) wrote :

I will let Quanxian answer your questions.

(It is OK for the Intel IOMMU options not to be turned on in the OS; we enable them on the kernel command line, which is quite straightforward, for all distros.)

quanxian (quanxian-wang) wrote :

Thanks for your feedback.

For Ubuntu 22.04.3, we will set up a system and retest with the extra modules package, which contains the idxd modules.

Regarding the config changes: Pragyan has answered the IOMMU question. By default, CONFIG_INTEL_IOMMU_DEFAULT_ON and CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON are not enabled. We can add the corresponding switches to the kernel command line when we do DSA/IAA testing.

quanxian (quanxian-wang) wrote :

Hi all,

We have finished testing accel-config on Ubuntu 22.04.3. We ran into some issues that took additional effort to investigate. Below is the analysis from our DSA expert, shared with Canonical.

Test Summary
a. accel-config packages are fine, both 3.5.2 and 4.0.2
b. testing for DSA: PASS
c. testing for accel-config: Skipped (kernel doesn't support)

Details:

Ubuntu 23.04:
accel-config - 3.5.2
kernel-6.2
accel-config passed (not reasonable, suggest to skip)
DSA - passed

Ubuntu 22.04.3
accel-config-4.0.2
kernel-6.2
accel-config test skipped
DSA - passed

Kernel command line: add the option ‘intel_iommu=on,sm_on’ to get the effect of CONFIG_INTEL_IOMMU_DEFAULT_ON and CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON at boot.
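
Since the IOMMU settings come from the boot parameters rather than the kernel config, a verification report can confirm them from the running system's command line. A minimal sketch follows; the function names are hypothetical, and the input is assumed to be the contents of /proc/cmdline. It only inspects the command line, so it says nothing about kernels where the options are enabled by default in the config.

```python
def intel_iommu_flags(cmdline):
    """Return the set of values passed via intel_iommu= on a command line."""
    flags = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            # intel_iommu takes a comma-separated list, e.g. "on,sm_on".
            flags.update(v for v in value.split(",") if v)
    return flags


def dsa_iommu_ready(cmdline):
    """True when both 'on' and 'sm_on' were requested on the command line."""
    return {"on", "sm_on"} <= intel_iommu_flags(cmdline)
```

For example, `dsa_iommu_ready(open("/proc/cmdline").read())` reports whether the running kernel was booted with the option given above.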

Root cause of the skipped accel-config test on Ubuntu 22.04.3 (more on the commits below):
1. The 6.2 kernel contains a commit that disables PASID, so the accel-config test should be skipped:
commit 942fd5435dccb273f90176b046ae6bbba60cfbd8
iommu: Remove SVM_FLAG_SUPERVISOR_MODE support
2. An accel-config userspace commit (in version 4.0) makes the PASID check return 0, which causes the accel-config test to be skipped.

That is why accel-config 3.5.0 passed (which is not reasonable) and 4.0.2 skipped.

Therefore, suggestion from Intel:
skip accel-config testing until PASID is enabled, on both Ubuntu 22.04.3 and Ubuntu 23.04
DSA testing - currently all passed

Here are the testing results for Ubuntu 23.04 and Ubuntu 22.04.3

Ubuntu 23.04: read comment #9
Ubuntu 22.04.3:
==========
root@test:~# accel-config test
run test_libaccfg
__accfg_test_skip: explicit skip test_libaccfg:906
device has no pasid support, skipping tests
test-libaccfg: SKIP

root@test:~/idxd-config/test/configs# accel-config load-config -c ./2g2q_user_1.conf -e
Enabling device dsa0
Enabling wq wq0.1
Enabling wq wq0.0

root@test:~/idxd-config/test# ./dsa_test -w 0 -l 4096 -f 0x1 -o 0x3 -t 200 -v
[debug] umwait supported
[ info] alloc wq 1 dedicated size 16 addr 0x7f4e4ed10000 batch sz 0x400 xfer sz 0x80000000
[ info] testmemory: opcode 3 len 0x1000 tflags 0x1 num_desc 1
[debug] initializing task 0x55b1bc783110
[debug] Mem allocated: s1 0x55b1bc7832c0 s2 0 d1 0x55b1bc784320 d2 0
[ info] preparing descriptor for memcpy
[ info] Submitted all memcpy jobs
[debug] desc addr: 0x55b1bc7831d0
[debug] desc[0]: 0x0300000e00000000
[debug] desc[1]: 0x000055b1bc783220
[debug] desc[2]: 0x000055b1bc7832c0
[debug] desc[3]: 0x000055b1bc784320
[debug] desc[4]: 0x0000000000001000
[debug] desc[5]: 0x0000000000000000
[debug] desc[6]: 0x0000000000000000
[debug] desc[7]: 0x0000000000000000
[debug] completion record addr: 0x55b1bc783220
[debug] compl[0]: 0x0000000000000001
[debug] compl[1]: 0x0000000000000000
[debug] compl[2]: 0x0000000000000000
[debug] compl[3]: 0x0000000000000000
[ info] verifying task result for 0x55b1bc783110
=======

Commits:
accel-config(4.0):
commit f13c2cb0ddec6ff8db41e3cac157f5ab05d23819
Author: Ramesh Thomas <email address hidden>
Date: Tue Mar 7 18:52:12 2023 -0500

    accel-config: Fix bug in return value of pasid enabled check

    API to check whether pasid is enabled was returning a character string
    that was read from sysfs. The expected return value is boolean. Store
    the pasid_enabled ...


Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello quanxian,

Thank you again for your quick reply.

The accel-config available in lunar/universe (23.04) is indeed version 3.5.2-1; however, we have uploaded the new 4.0-2~ubuntu0.23.04 version to lunar-proposed/universe. To test the new version, the system must have the -proposed pocket enabled. Could you please enable -proposed on your test system and verify that the new version behaves as expected?

If I understood correctly, the accel-config (skipped) and DSA (passed) test results would be the expected behavior with accel-config 4.0 for 23.04 and 22.04.3 with the 6.2 kernel, is that correct?

quanxian (quanxian-wang) wrote :

Yes, for accel-config 4.0, on both 22.04.3 and 23.04, the accel-config test is skipped and the DSA test passes.

Kleber Sacilotto de Souza (kleber-souza) wrote :

accel-config 4.0 confirmed by Intel to be working as expected. Marking verification as done.

tags: added: verification-done verification-done-jammy verification-done-lunar
removed: verification-needed verification-needed-jammy verification-needed-lunar
Colin Ian King (colin-king) wrote :

Thanks everyone!

Andreas Hasenack (ahasenack) wrote :

Hi Kleber,

it's unclear to me if your request from comment #15 was done. In comment #14, we see that the lunar verification was done with "accel-config - 3.5.2": that's the version in the lunar release pocket, not lunar-proposed, as you noted.

Then in comment #16, we just have a statement that with "accel-config 4.0" the accel-config tests are skipped, and DSA passed.

The test plan doesn't mention any case of a "skipped" test, so I'm unclear whether this is a success or not. Furthermore, it's a bit ambiguous to state just "accel-config 4.0": was it with the packages from proposed? The version in lunar-proposed is "4.0-2~ubuntu0.23.04".

It all comes down again to the simple rule of following the test plan and stating clearly which packages were used, and the clear steps that were followed. Comments #16 and #17 are vague in that regard. Robie asked in comment #1:

"Report of a successful verification should state the kernel version used as well as the version of accel-config and the specific steps performed."

In addition, please clarify unambiguously if skipped tests are OK for these Lunar and Jammy LTS updates, why, and update the test plan stating where they are expected, because right now it just says "3. the script will report failures if they occur".

Chris Halse Rogers (raof) wrote :

I've flipped the tags back to “verification-needed” as Andreas' questions are yet to be answered. Hopefully this makes it more clear what needs to happen for this to move into the -updates pocket.

tags: added: verification-needed verification-needed-jammy verification-needed-lunar
removed: verification-done verification-done-jammy verification-done-lunar