[Ubuntu 20.04] net/mlx5e: Fix endianness handling in pedit mask

Bug #1872726 reported by bugproxy
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Medium
Skipper Bug Screeners
linux (Ubuntu)
Fix Released
Undecided
Canonical Kernel Team
Focal
Fix Released
Undecided
Canonical Kernel Team
Groovy
Fix Released
Undecided
Canonical Kernel Team

Bug Description

SRU Justification:
==================

[Impact]

* An issue with the endianess handling in the Mellanox mlx5 driver was found.

* The mask value is provided as 64 bit and has to be casted in either 32 or 16 bit.

* On big endian systems the wrong half was casted which resulted in an all zero mask.

[Fix]

* Backport: https://launchpadlibrarian.net/476243025/0001-net-mlx5-fix-endianness-handling-in-pedit-mask.patch

[Test Case]

* An s390x system with RoCE Express 2(.1) system is needed and the driver loaded.

* Check whether the mask value stays zero, or if it also get's non-zero values.

* Functional testing is currently only doable by IBM, since we only have RoCE (1) hardware that uses the mlx4 driver.

[Regression Potential]

* There is regression potential is moderate, since:

* the RoCE 2(.1) cards are pretty new and not very wide spread, yet

* the fix got already upstream accepted with 5.6

* However, at the end the patch modifies Mellanox common code (drivers/net/ethernet/mellanox/mlx5/core/en_tc.c) to make the driver work correctly on s390x.

* but the changes were reviewed, signed off by Mellanox engineers and are very limited.

[Other Info]

* The above backport (patch-file) is based on commit 404402abd5f90aa90a134eb9604b1750c1941529 404402abd5f9 "net/mlx5e: Fix endianness handling in pedit mask" - the backport was needed for getting it applied to focal master-next.

* The commit itself got upstream accepted with kernel v5.6, hence should automatically land in 'gorilla', but since gorilla is still based on 5.4, I'm adding 'G' to this SRU.

__________

Issue found in the Mellanox mlx5 device driver:

The mask value is provided as 64 bit and has to be casted in
either 32 or 16 bit. On big endian systems the wrong half was
casted which resulted in an all zero mask.

We need to get the upstream commit picked up for the Ubuntu 20.04 kernel.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=404402abd5f90aa90a134eb9604b1750c1941529

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-185119 severity-medium targetmilestone-inin2004
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Medium
Frank Heimes (fheimes)
no longer affects: ubuntu
Revision history for this message
Frank Heimes (fheimes) wrote :

It's upstream accepted since 5.6:
$ git tag --contains 404402abd5f9 | grep ^v
v5.6
v5.7-rc1

Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Frank Heimes (fheimes) wrote :

There is unfortnately no simple and clean cherry-pick possible of this commit to the ubuntu focal master-next tree. git cherry-pick complains with:

Performing inexact rename detection: 100% (3285744/3285744), done.
Auto-merging drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
CONFLICT (content): Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
error: could not apply 404402abd5f9... net/mlx5e: Fix endianness handling in pedit mask
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

The status is:

On branch master-next
Your branch is ahead of 'origin/master-next' by 2 commits.
  (use "git push" to publish your local commits)

You are currently cherry-picking commit 404402abd5f9.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
 both modified: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

no changes added to commit (use "git add" and/or "git commit -a")

and diff:

diff --cc drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 1f9107d83848,ec5fc52bf572..000000000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@@ -2421,13 -2475,12 +2421,22 @@@ static int offload_pedit_fields(struct
                if (skip)
                        continue;

++<<<<<<< HEAD
 + field_bsize = f->size * BITS_PER_BYTE;
 +
 + if (field_bsize == 32) {
 + mask_be32 = *(__be32 *)&mask;
 + mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32));
 + } else if (field_bsize == 16) {
 + mask_be16 = *(__be16 *)&mask;
++=======
+ if (f->field_bsize == 32) {
+ mask_be32 = (__be32)mask;
+ mask = (__force unsigned long)cpu_to_le32(be32_to_cpu(mask_be32));
+ } else if (f->field_bsize == 16) {
+ mask_be32 = (__be32)mask;
+ mask_be16 = *(__be16 *)&mask_be32;
++>>>>>>> 404402abd5f9... net/mlx5e: Fix endianness handling in pedit mask
                        mask = (__force unsigned long)cpu_to_le16(be16_to_cpu(mask_be16));
                }

Since this touches common code, please double check what else is needed to get it cleanly in.
git mergetool shows a lot of differences between focal master-next and 5.6/5.7 upstream in drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
So it needs to handled very carefully ...

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-04-23 08:56 EDT-------
Sebastian has supplied me with the following information for reproducing the issue and is currently working on a backport:

echo 1 > /sys/class/net/p0/device/sriov_numvfs
echo 0101:00:00.0 > /sys/bus/pci/drivers/mlx5_core/unbind
devlink dev eswitch set pci/0100:00:00.0 mode switchdev
ethtool -K p0 hw-tc-offload on
echo 0101:00:00.0 > /sys/bus/pci/drivers/mlx5_core/bind
tc qdisc add dev p0 ingress
tc qdisc add dev p0v0_r ingress
tc filter add dev p0v0_r protocol ip parent ffff:0 prio 1 flower skip_sw action pedit ex munge eth dst set 11:22:33:44:55:66 action goto chain 1

Without the patch the last command would fail with the following output in dmesg

[ 548.966006] mlx5_core 0100:00:00.0: mlx5_cmd_check:711:(pid 5220): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a)

This was only tested on Z but we suspect that the same problem should also occur on other Big Endian architectures

Revision history for this message
bugproxy (bugproxy) wrote : net/mlx5: fix endianness handling in pedit mask

------- Comment on attachment From <email address hidden> 2020-04-24 11:41 EDT-------

This is the upstream commit

404402abd5f90aa90a134eb9604b1750c1941529 ("net/mlx5: fix endianness handling in pedit mask")

backported for focal-master next. It has been running internally for a while so has seen testing on Z, as mentioned before we believe this should also be necessary for other Big Endian architectures.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Incomplete → Triaged
Revision history for this message
Frank Heimes (fheimes) wrote :

Fix:
404402abd5f9 "net/mlx5e: Fix endianness handling in pedit mask"
upstream with v5.6

Revision history for this message
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-May/thread.html#109623
Changing status to 'In Progress'.

description: updated
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
status: Fix Committed → In Progress
Changed in linux (Ubuntu Groovy):
status: Fix Committed → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu Focal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-05-20 03:28 EDT-------
Fix already verified by IBM. Works as expected.

Frank Heimes (fheimes)
tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-37.41

---------------
linux (5.4.0-37.41) focal; urgency=medium

  * CVE-2020-0543
    - SAUCE: x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2
    - SAUCE: x86/cpu: Add a steppings field to struct x86_cpu_id
    - SAUCE: x86/cpu: Add 'table' argument to cpu_matches()
    - SAUCE: x86/speculation: Add Special Register Buffer Data Sampling (SRBDS)
      mitigation
    - SAUCE: x86/speculation: Add SRBDS vulnerability and mitigation documentation
    - SAUCE: x86/speculation: Add Ivy Bridge to affected list

 -- Marcelo Henrique Cerri <email address hidden> Wed, 03 Jun 2020 11:24:23 -0300

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-oracle-5.4/5.4.0-1019.19~18.04.1)

All autopkgtests for the newly accepted linux-oracle-5.4 (5.4.0-1019.19~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

zfs-linux/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-oracle-5.4

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-42.46

---------------
linux (5.4.0-42.46) focal; urgency=medium

  * focal/linux: 5.4.0-42.46 -proposed tracker (LP: #1887069)

  * linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
    - SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

linux (5.4.0-41.45) focal; urgency=medium

  * focal/linux: 5.4.0-41.45 -proposed tracker (LP: #1885855)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2019-19642
    - kernel/relay.c: handle alloc_percpu returning NULL in relay_open

  * CVE-2019-16089
    - SAUCE: nbd_genl_status: null check for nla_nest_start

  * CVE-2020-11935
    - aufs: do not call i_readcount_inc()

  * ip_defrag.sh in net from ubuntu_kernel_selftests failed with 5.0 / 5.3 / 5.4
    kernel (LP: #1826848)
    - selftests: net: ip_defrag: ignore EPERM

  * Update lockdown patches (LP: #1884159)
    - SAUCE: acpi: disallow loading configfs acpi tables when locked down

  * seccomp_bpf fails on powerpc (LP: #1885757)
    - SAUCE: selftests/seccomp: fix ptrace tests on powerpc

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [packaging] add signed modules for the 418-server and the 440-server
      flavours

 -- Khalid Elmously <email address hidden> Thu, 09 Jul 2020 19:50:26 -0400

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.