geneve overlay network on vlan interface broken with offload enabled

Bug #1914447 reported by James Page
This bug affects 2 people
Affects             Status        Importance  Assigned to   Milestone
linux (Ubuntu)      Triaged       Medium      Unassigned
  Focal             Fix Released  Medium      Stefan Bader
  Groovy            Fix Released  Medium      Stefan Bader

Bug Description

[SRU Justification]

Impact: In upstream v5.2, stateless offload support for Geneve tunnels was added to the mlx5 driver. This broke Geneve traffic over VLANs: the driver inserted the VLAN tag itself even when TX VLAN offload was enabled, but the software parser (SWP) offsets were not adjusted for the extra tag.

Fix: Upstream (v5.11-rc3) commit 378d3783412e38dc3a2b9d524f551c0008ea314a "net/mlx5e: Fix SWP offsets when vlan inserted by driver" was backported (dropping the parts that touch code not yet present in 5.8) and verified to fix the problem.
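
The short Python model below is illustrative only and is not the driver code; it shows the effect named in the fix title. It assumes (from the upstream mlx5e sources, not from this report) that the SWP offsets are counted in 2-byte words from the start of the frame, and that the fix shifts all four of them by one VLAN tag, i.e. VLAN_HLEN / 2 words. The names `SwpOffsets` and `swp_offsets` are hypothetical.

```python
"""Toy model of the SWP offset fix; not the mlx5e driver's code."""
from dataclasses import dataclass

VLAN_HLEN = 4  # size of an 802.1Q VLAN tag in bytes


@dataclass(frozen=True)
class SwpOffsets:
    """Hypothetical container for the four SWP header offsets.

    Assumption: the NIC expects offsets in 2-byte words from frame start.
    """
    outer_l3: int
    outer_l4: int
    inner_l3: int
    inner_l4: int


def swp_offsets(outer_l3: int, outer_l4: int, inner_l3: int, inner_l4: int,
                vlan_inserted_by_driver: bool) -> SwpOffsets:
    """Turn byte offsets taken from the untagged packet (with TX VLAN
    offload the tag lives in packet metadata) into SWP word offsets."""
    words = [off // 2 for off in (outer_l3, outer_l4, inner_l3, inner_l4)]
    if vlan_inserted_by_driver:
        # The driver writes the tag into the inlined headers itself, so
        # every header after the MAC addresses sits VLAN_HLEN bytes later.
        words = [w + VLAN_HLEN // 2 for w in words]
    return SwpOffsets(*words)


if __name__ == "__main__":
    # Geneve (no options) over IPv4 carrying an inner IPv4 packet:
    # outer IP at byte 14, outer UDP at 34, inner IP at 64, inner L4 at 84.
    print(swp_offsets(14, 34, 64, 84, vlan_inserted_by_driver=False))
    # SwpOffsets(outer_l3=7, outer_l4=17, inner_l3=32, inner_l4=42)
    print(swp_offsets(14, 34, 64, 84, vlan_inserted_by_driver=True))
    # SwpOffsets(outer_l3=9, outer_l4=19, inner_l3=34, inner_l4=44)
```

Without the shift, the NIC is pointed at byte 14 for the outer IP header while on the tagged frame it actually starts at byte 18, which would be consistent with the broken traffic described in this report.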

Testcase: Enable Geneve tunnel offload support on an mlx5(e) card and send Geneve traffic over a VLAN interface with TX VLAN offload (txvlan) left enabled.
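
A possible manual setup for this test case is sketched below in Python. It only builds iproute2 and ethtool command lines and does not run anything. The port name `enp1s0f0`, VLAN ID 100, the 192.0.2.0/24 documentation addresses and VNI 1000 are hypothetical placeholders, and which offload feature flags matter on a given card is an assumption to check with `ethtool -k`.

```python
"""Sketch: command lines for a Geneve-over-VLAN test on an mlx5 port.

All names and addresses are hypothetical placeholders; review the commands
before running them (as root) on a test machine.
"""
import shlex
from typing import List


def geneve_over_vlan_cmds(pf: str, vlan_id: int, local_cidr: str,
                          remote_ip: str, vni: int,
                          txvlan: bool = True) -> List[List[str]]:
    vlan_dev = f"{pf}.{vlan_id}"
    gnv_dev = f"gnv{vni}"
    lines = [
        # VLAN interface on top of the mlx5 port; it carries the underlay.
        f"ip link add link {pf} name {vlan_dev} type vlan id {vlan_id}",
        f"ip addr add {local_cidr} dev {vlan_dev}",
        f"ip link set {vlan_dev} up",
        # Geneve tunnel whose remote endpoint is reached via the VLAN.
        f"ip link add name {gnv_dev} type geneve id {vni} remote {remote_ip}",
        f"ip link set {gnv_dev} up",
        # UDP tunnel segmentation offload, plus TX VLAN offload
        # ("txvlan off" is the workaround used before the fix).
        f"ethtool -K {pf} tx-udp_tnl-segmentation on",
        f"ethtool -K {pf} txvlan {'on' if txvlan else 'off'}",
    ]
    return [shlex.split(line) for line in lines]


if __name__ == "__main__":
    for argv in geneve_over_vlan_cmds("enp1s0f0", 100, "192.0.2.1/24",
                                      "192.0.2.2", 1000):
        print(shlex.join(argv))
```

Run the printed commands on both test hosts (swapping the addresses), assign inner addresses to the Geneve devices and send traffic across them; passing `txvlan=False` produces the workaround configuration described in the original report.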

Regression potential: The modified code path handles transmission of VLAN-tagged packets, so any regression would most likely affect outgoing VLAN traffic.

--- original description ---

Mellanox ConnectX-5 network card

When using Geneve overlay networks over a VLAN interface, TX VLAN offload (txvlan) currently has to be disabled, because it interferes with the network traffic and causes general wonkiness.

Mellanox engineering pointed us at:

  https://www.spinics.net/lists/netdev/msg711911.html

as a likely fix for this issue.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.8.0-41-generic 5.8.0-41.46~20.04.1
ProcVersionSignature: Ubuntu 5.8.0-40.45~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-40-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.14
Architecture: amd64
CasperMD5CheckResult: skip
Date: Wed Feb 3 15:34:23 2021
ProcEnviron:
 TERM=screen-256color-bce
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-5.8
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.logrotate.d.apport: [modified]
mtime.conffile..etc.logrotate.d.apport: 2021-02-03T15:17:01.792261

James Page (james-page) wrote :
Junien F (axino)
tags: added: ps5
Stefan Bader (smb) wrote :

I just uploaded a test kernel with a backport of the suggested patch to https://launchpad.net/~smb/+archive/ubuntu/focal (currently building). The 5.8 code seems to differ quite a bit from latest upstream: instead of the several call sites that are adapted upstream, there is basically only one caller that needs to be modified.

affects: linux-signed-hwe-5.8 (Ubuntu) → linux-hwe-5.8 (Ubuntu)
Changed in linux-hwe-5.8 (Ubuntu):
assignee: nobody → Stefan Bader (smb)
status: New → In Progress
importance: Undecided → Medium
tags: added: patch
Frode Nordahl (fnordahl) wrote :

Hello Stefan,

Thank you so much for providing backport+built kernel at such short notice.

We can confirm that, using this kernel on a machine with a Mellanox ConnectX-5 card in legacy mode, we can leverage stateless offload of Geneve tunnel packets sent over a VLAN interface.

Cheers!

Stefan Bader (smb) wrote :

Thanks for the feedback, Frode. To explain the further changes to this bug report: for SRU, this will be targeted at the 20.10/Groovy 5.8 kernel. Once it lands there, it will automatically be included in the next 20.04/Focal HWE kernel.

The patch comes from upstream v5.11-rc3, so it should be included in the final 21.04/Hirsute kernel.

affects: linux-hwe-5.8 (Ubuntu) → linux (Ubuntu)
no longer affects: linux (Ubuntu Focal)
Changed in linux (Ubuntu Groovy):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
assignee: Stefan Bader (smb) → nobody
status: In Progress → Triaged
Stefan Bader (smb)
description: updated
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
James Page (james-page) wrote :

Please can this be considered for Linux 5.4 as well.

Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
importance: Medium → High
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
assignee: nobody → Stefan Bader (smb)
importance: High → Medium
status: New → Triaged
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
status: Triaged → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
tags: added: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-45.51

---------------
linux (5.8.0-45.51) groovy; urgency=medium

  * groovy/linux: 5.8.0-45.51 -proposed tracker (LP: #1916143)

  * Please trust Canonical Livepatch Service kmod signing key (LP: #1898716)
    - [Config] enable CONFIG_MODVERSIONS=y
    - [Packaging] build canonical-certs.pem from branch/arch certs
    - [Config] add Canonical Livepatch Service key to SYSTEM_TRUSTED_KEYS
    - [Config] add ubuntu-drivers key to SYSTEM_TRUSTED_KEYS
    - [Config] Allow ASM_MODVERSIONS and MODULE_REL_CRCS

  * CVE-2021-20194
    - bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
    - bpf, cgroup: Fix problematic bounds check

  * Missing device id for Intel TGL-H ISH [8086:43fc] in intel-ish-hid driver
    (LP: #1914543)
    - HID: intel-ish-hid: ipc: Add Tiger Lake H PCI device ID

  * Prevent thermal shutdown during boot process (LP: #1906168)
    - thermal/core: Emit a warning if the thermal zone is updated without ops
    - thermal/core: Add critical and hot ops
    - thermal/drivers/acpi: Use hot and critical ops
    - thermal/drivers/rcar: Remove notification usage
    - thermal: int340x: Fix unexpected shutdown at critical temperature
    - thermal: intel: pch: Fix unexpected shutdown at critical temperature

  * geneve overlay network on vlan interface broken with offload enabled
    (LP: #1914447)
    - net/mlx5e: Fix SWP offsets when vlan inserted by driver

  * Groovy update: upstream stable patchset 2021-02-11 (LP: #1915473)
    - net: cdc_ncm: correct overhead in delayed_ndp_size
    - net: hns3: fix the number of queues actually used by ARQ
    - net: hns3: fix a phy loopback fail issue
    - net: stmmac: dwmac-sun8i: Balance internal PHY resource references
    - net: stmmac: dwmac-sun8i: Balance internal PHY power
    - net: vlan: avoid leaks on register_vlan_dev() failures
    - net/sonic: Fix some resource leaks in error handling paths
    - net: ipv6: fib: flush exceptions when purging route
    - tools: selftests: add test for changing routes with PTMU exceptions
    - net: fix pmtu check in nopmtudisc mode
    - net: ip: always refragment ip defragmented packets
    - octeontx2-af: fix memory leak of lmac and lmac->name
    - nexthop: Fix off-by-one error in error path
    - nexthop: Unlink nexthop group entry in error path
    - s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
    - net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
    - net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
    - net/mlx5e: ethtool, Fix restriction of autoneg with 56G
    - chtls: Fix hardware tid leak
    - chtls: Remove invalid set_tcb call
    - chtls: Fix panic when route to peer not configured
    - chtls: Replace skb_dequeue with skb_peek
    - chtls: Added a check to avoid NULL pointer dereference
    - chtls: Fix chtls resources release sequence
    - HID: wacom: Fix memory leakage caused by kfifo_alloc
    - ARM: OMAP2+: omap_device: fix idling of devices during probe
    - i2c: sprd: use a specific timeout to avoid system hang up issue
    - dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
    - can: tcan4x5x: fix bittiming const...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-67.75

---------------
linux (5.4.0-67.75) focal; urgency=medium

  * focal/linux: 5.4.0-67.75 -proposed tracker (LP: #1916169)

  * Please trust Canonical Livepatch Service kmod signing key (LP: #1898716)
    - [Config] enable CONFIG_MODVERSIONS=y
    - [Packaging] build canonical-certs.pem from branch/arch certs
    - [Config] add Canonical Livepatch Service key to SYSTEM_TRUSTED_KEYS
    - [Config] add ubuntu-drivers key to SYSTEM_TRUSTED_KEYS
    - [Config] Allow ASM_MODVERSIONS and MODULE_REL_CRCS

  * geneve overlay network on vlan interface broken with offload enabled
    (LP: #1914447)
    - net/mlx5e: Fix SWP offsets when vlan inserted by driver

  * Add support for selective build of special drivers (LP: #1912789)
    - [Packaging] Fix ODM support in actual build

  * devlink: don't do reporter recovery if the state is healthy (LP: #1915403)
    - devlink: don't do reporter recovery if the state is healthy

  * Missing device id for Intel TGL-H ISH [8086:43fc] in intel-ish-hid driver
    (LP: #1914543)
    - HID: intel-ish-hid: ipc: Add Tiger Lake H PCI device ID

  * Focal update: v5.4.94 upstream stable release (LP: #1915200)
    - gpio: mvebu: fix pwm .get_state period calculation
    - futex: Ensure the correct return value from futex_lock_pi()
    - futex: Replace pointless printk in fixup_owner()
    - futex: Provide and use pi_state_update_owner()
    - rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
    - futex: Use pi_state_update_owner() in put_pi_state()
    - futex: Simplify fixup_pi_state_owner()
    - futex: Handle faults correctly for PI futexes
    - HID: wacom: Correct NULL dereference on AES pen proximity
    - io_uring: Fix current->fs handling in io_sq_wq_submit_work()
    - tracing: Fix race in trace_open and buffer resize call
    - arm64: mm: use single quantity to represent the PA to VA translation
    - SMB3.1.1: do not log warning message if server doesn't populate salt
    - tools: Factor HOSTCC, HOSTLD, HOSTAR definitions
    - dm integrity: conditionally disable "recalculate" feature
    - writeback: Drop I_DIRTY_TIME_EXPIRE
    - fs: fix lazytime expiration handling in __writeback_single_inode()
    - Linux 5.4.94

  * Focal update: v5.4.93 upstream stable release (LP: #1915195)
    - i2c: bpmp-tegra: Ignore unknown I2C_M flags
    - platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
    - ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
    - ALSA: hda/via: Add minimum mute flag
    - ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
    - btrfs: don't get an EINTR during drop_snapshot for reloc
    - btrfs: fix lockdep splat in btrfs_recover_relocation
    - btrfs: don't clear ret in btrfs_start_dirty_block_groups
    - btrfs: send: fix invalid clone operations when cloning from the same file
      and root
    - mmc: core: don't initialize block size from ext_csd if not present
    - mmc: sdhci-xenon: fix 1.8v regulator stabilization
    - dm: avoid filesystem lookup in dm_get_dev_t()
    - dm integrity: fix a crash if "recalculate" used without "internal_hash"
    - drm/atomic: put...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Paul Goins (vultaire) wrote :

I found this bug listed in a doc for a customer environment running Bionic, as something that was affecting them as well and for which we were waiting for a fix. I noticed today that this bug is only marked for focal and groovy.

Is there a reason for this to be only focal and newer, or is it simply that it hadn't been reported for bionic yet?
