Kernel crash when setting vxlan tunnel over the mlx4_en when acting as PF

Bug #1407760 reported by Brian Fromme
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  Medium      Unassigned
Trusty          Fix Released  Medium      Unassigned
Utopic          Fix Released  Medium      Unassigned

Bug Description

Eyal Perry (~eyalpe) writes:

[Impact]
When SR-IOV is enabled on Mellanox devices and a vxlan tunnel is set up over the PF device (for example, by adding a vxlan port to an OVS bridge),
the kernel crashes because the PF net device ops do not implement the ndos .ndo_{add,del}_vxlan_port.
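
As a rough illustration of this class of problem (not the actual mlx4_en code or the exact crash path), the userspace sketch below models a device ops table in which the two VXLAN callbacks are either wired up or left unset. Every toy_* name is hypothetical and exists only for this example; only the member names ndo_add_vxlan_port and ndo_del_vxlan_port come from the report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_netdev;

/* Toy stand-in for a net device ops table; only the two callbacks
 * named in the report are modelled (hypothetical, not kernel code). */
struct toy_netdev_ops {
    void (*ndo_add_vxlan_port)(struct toy_netdev *dev, unsigned short port);
    void (*ndo_del_vxlan_port)(struct toy_netdev *dev, unsigned short port);
};

struct toy_netdev {
    const struct toy_netdev_ops *ops;
    int vxlan_ports; /* number of UDP ports currently registered */
};

static void toy_add_vxlan_port(struct toy_netdev *dev, unsigned short port)
{
    (void)port;
    dev->vxlan_ports++;
}

static void toy_del_vxlan_port(struct toy_netdev *dev, unsigned short port)
{
    (void)port;
    if (dev->vxlan_ports > 0)
        dev->vxlan_ports--;
}

/* Ops table with both callbacks wired up. */
static const struct toy_netdev_ops toy_ops_complete = {
    .ndo_add_vxlan_port = toy_add_vxlan_port,
    .ndo_del_vxlan_port = toy_del_vxlan_port,
};

/* Ops table that omits them; both members are implicitly NULL. */
static const struct toy_netdev_ops toy_ops_missing = { 0 };

/* Returns 0 on success, or -EOPNOTSUPP when the callback is absent.
 * Calling through the NULL pointer instead would crash the process. */
int toy_notify_vxlan_add(struct toy_netdev *dev, unsigned short port)
{
    assert(dev != NULL && dev->ops != NULL);
    if (dev->ops->ndo_add_vxlan_port == NULL)
        return -EOPNOTSUPP;
    dev->ops->ndo_add_vxlan_port(dev, port);
    return 0;
}
```

With a device that points at toy_ops_missing, toy_notify_vxlan_add() reports -EOPNOTSUPP and leaves the port count untouched; with toy_ops_complete the callback runs. The upstream fix takes the other route and adds the missing callbacks to the PF ops table.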

[Fix]
This issue was fixed in the upstream kernel:
9737c6a net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too

~/linux$ git describe --contains 9737c6ab7afbc950e997ef80cba2c40dbbd16ea4
v3.18-rc6~9^2~12

I've backported it to 3.16 (by removing the "yet to be introduced" ndo_gso_check) and attached the backported version here.

The patch was built and tested on top of ubuntu-trusty: abb1293 ("UBUNTU: Ubuntu-lts-3.16.0-29.39~14.04.1")

[Test Case]
Use an affected Mellanox device with an Ubuntu 3.16 kernel. Set up a vxlan tunnel over the PF device. This should succeed.

Brian Fromme (brianfromme) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1407760

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Brian Fromme (brianfromme) wrote :

This question is for the Mellanox engineering team.

Do you want this in the Trusty GA 3.13 kernel, or will the 3.16 based lts-utopic kernel backport in Trusty suffice?

Joseph Salisbury (jsalisbury) wrote :

Hi Brian,

Can you also send your backport of the commit and request for SRU to the kernel team mailing list:
<email address hidden>

tags: added: kernel-key utopic
tags: added: bot-stop-nagging
tags: added: kernel-da-key
removed: kernel-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Brian Fromme (brianfromme) wrote :

Here's an answer for the Trusty version question:

The mlx4 driver in 3.13-stable does not include VXLAN support, so does not need this patch. The driver in Ubuntu Trusty does include that support and the affected line of code, so does need this patch.

Chris J Arges (arges)
no longer affects: linux (Ubuntu Trusty)
Chris J Arges (arges)
description: updated
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Utopic):
status: New → In Progress
importance: Undecided → Medium
description: updated
summary: - Kernel crash when setting vxlan tunnel over the mlx4_en when acting as
- PF
+ [SRU][PATCH] [Utopic] Kernel crash when setting vxlan tunnel over the
+ mlx4_en when acting as PF
Brian Fromme (brianfromme) wrote : Re: [SRU][PATCH] [Utopic] Kernel crash when setting vxlan tunnel over the mlx4_en when acting as PF

Updating PATCH as per instructions from Chris re: SRU process.

Brian Fromme (brianfromme) wrote :

In response to Comment #3 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1407760/comments/3):

Mellanox states that this should also be backported to 3.13 (Trusty). Eyal Perry believes that this patch should apply cleanly. If there are any questions about that, he is happy to help.

Brad Figg (brad-figg)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
Eyal Perry (eyalpe)
tags: added: verification-done-utopic
removed: verification-needed-utopic
Chris J Arges (arges)
description: updated
Changed in linux (Ubuntu Trusty):
assignee: nobody → Chris J Arges (arges)
importance: Undecided → Medium
status: New → In Progress
Chris J Arges (arges)
Changed in linux (Ubuntu Trusty):
assignee: Chris J Arges (arges) → nobody
Chris J Arges (arges)
summary: - [SRU][PATCH] [Utopic] Kernel crash when setting vxlan tunnel over the
- mlx4_en when acting as PF
+ Kernel crash when setting vxlan tunnel over the mlx4_en when acting as
+ PF
Brad Figg (brad-figg)
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-30.40

---------------
linux (3.16.0-30.40) utopic; urgency=low

  [ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1409890

  [ Andy Whitcroft ]

  * Revert "SAUCE: scsi: hyper-v storsvc switch up to SPC-3"
  * [Packaging] uploadnum should be the remainder of the version
    - LP: #1407755

  [ K. Y. Srinivasan ]

  * SAUCE: storvsc: force SPC-3 compliance on win8 and win8 r2 hosts
    - LP: #1406867

  [ Upstream Kernel Changes ]

  * Revert "xhci: clear root port wake on bits if controller isn't wake-up
    capable"
    - LP: #1408697
  * KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
    - LP: #1400209
  * powerpc/powernv: Ignore smt-enabled on Power8 and later
    - LP: #1402141
  * net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
    - LP: #1407760
  * net/mlx4_core: Enable CQE/EQE stride support
    - LP: #1400127
  * net/mlx4_core: Cache line EQE size support
    - LP: #1400127
  * net/mlx4_en: Add mlx4_en_get_cqe helper
    - LP: #1400127
  * net/mlx4_core: Introduce mlx4_get_module_info for cable module info
    reading
    - LP: #1400127
  * ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool
    support
    - LP: #1400127
  * net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
    - LP: #1400127
  * net/mlx4_core: Add ethernet backplane autoneg device capability
    - LP: #1400127
  * ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting
    support
    - LP: #1400127
  * net/mlx4_en: Use PTYS register to query ethtool settings
    - LP: #1400127
  * net/mlx4_en: Use PTYS register to set ethtool settings (Speed)
    - LP: #1400127
  * net/mlx4_en: Add support for setting rxvlan offload OFF/ON
    - LP: #1400127
  * net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON
    - LP: #1400127
  * net/mlx4_core: Prevent VF from changing port configuration
    - LP: #1400127
  * net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set
    - LP: #1400127
  * sparc64: Fix constraints on swab helpers.
    - LP: #1408697
  * inetdevice: fixed signed integer overflow
    - LP: #1408697
  * ipv4: Fix incorrect error code when adding an unreachable route
    - LP: #1408697
  * ieee802154: fix error handling in ieee802154fake_probe()
    - LP: #1408697
  * qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
    - LP: #1408697
  * bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
    - LP: #1408697
  * pptp: fix stack info leak in pptp_getname()
    - LP: #1408697
  * ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
    - LP: #1408697
  * net/mlx4_en: Advertize encapsulation offloads features only when VXLAN
    tunnel is set
    - LP: #1408697
  * target: Don't call TFO->write_pending if data_length == 0
    - LP: #1408697
  * vhost-scsi: Take configfs group dependency during
    VHOST_SCSI_SET_ENDPOINT
    - LP: #1408697
  * srp-target: Retry when QP creation fails with ENOMEM
    - LP: #1408697
  * ASoC: fsi: remove unsupported PAUSE flag
    - LP: #1408697
  * ASoC: rsnd: remove unsupported PAUSE flag
    - LP: #1408697
  * ib_isert: Add max_send_sge...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Eyal Perry (eyalpe)
tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-46.75

---------------
linux (3.13.0-46.75) trusty; urgency=low

  [ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1419963

  [ Andy Whitcroft ]

  * [Debian] arm64 -- build ubuntu drivers
    - LP: #1411284
  * hyper-v -- fix comment handing in /etc/network/interfaces
    - LP: #1413020

  [ Kamal Mostafa ]

  * [Packaging] force "dpkg-source -I -i" behavior

  [ Upstream Kernel Changes ]

  * Revert "[SCSI] mpt2sas: Remove phys on topology change."
    - LP: #1419838
  * Revert "[SCSI] mpt3sas: Remove phys on topology change"
    - LP: #1419838
  * Btrfs: fix transaction abortion when remounting btrfs from RW to RO
    - LP: #1411320
  * Btrfs: fix a crash of clone with inline extents's split
    - LP: #1413129
  * net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
    - LP: #1407760
  * KVM: x86: SYSENTER emulation is broken
    - LP: #1414651
    - CVE-2015-0239
  * powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
    - LP: #1415919
  * ipv6: fix swapped ipv4/ipv6 mtu_reduced callbacks
    - LP: #1404558, #1419837
  * usb: gadget: at91_udc: move prepare clk into process context
    - LP: #1419837
  * KVM: x86: Fix far-jump to non-canonical check
    - LP: #1419837
  * x86/tls: Validate TLS entries to protect espfix
    - LP: #1419837
  * userns: Check euid no fsuid when establishing an unprivileged uid
    mapping
    - LP: #1419837
  * userns: Document what the invariant required for safe unprivileged
    mappings.
    - LP: #1419837
  * userns: Only allow the creator of the userns unprivileged mappings
    - LP: #1419837
  * x86_64, switch_to(): Load TLS descriptors before switching DS and ES
    - LP: #1419837
  * isofs: Fix infinite looping over CE entries
    - LP: #1419837
  * batman-adv: Calculate extra tail size based on queued fragments
    - LP: #1419837
  * KEYS: close race between key lookup and freeing
    - LP: #1419837
  * isofs: Fix unchecked printing of ER records
    - LP: #1419837
  * x86_64, vdso: Fix the vdso address randomization algorithm
    - LP: #1419837
  * groups: Consolidate the setgroups permission checks
    - LP: #1419837
  * userns: Don't allow setgroups until a gid mapping has been setablished
    - LP: #1419837
  * userns: Don't allow unprivileged creation of gid mappings
    - LP: #1419837
  * move d_rcu from overlapping d_child to overlapping d_alias
    - LP: #1419837
  * deal with deadlock in d_walk()
    - LP: #1419837
  * Linux 3.13.11-ckt14
    - LP: #1419837
  * gre: fix the inner mac header in nbma tunnel xmit path
    - LP: #1419838
  * netlink: Always copy on mmap TX.
    - LP: #1419838
  * netlink: Don't reorder loads/stores before marking mmap netlink frame
    as available
    - LP: #1419838
  * in6: fix conflict with glibc
    - LP: #1419838
  * tg3: tg3_disable_ints using uninitialized mailbox value to disable
    interrupts
    - LP: #1419838
  * batman-adv: Unify fragment size calculation
    - LP: #1419838
  * batman-adv: avoid NULL dereferences and fix if check
    - LP: #1419838
  * net: Fix stacked vlan offload features computation
    - LP: #1419838
  * net: Reset secmark when scrubbing packet
    - L...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released