BCM57800 SRIOV bug causes interfaces to disappear

Bug #1945707 reported by Andre Ruiz
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | Medium | Thadeu Lima de Souza Cascardo |
Bionic | Fix Committed | Medium | Thadeu Lima de Souza Cascardo |
Focal | Fix Committed | Medium | Thadeu Lima de Souza Cascardo |
Hirsute | Fix Released | Medium | Thadeu Lima de Souza Cascardo |

Bug Description

[Impact]
The bnx2x driver won't add all device ports/interfaces.

[Test case]
Boot system with bnx2x device and verify all ports/interfaces have been added.

[Potential regression]
bnx2x devices won't be properly probed. Devices won't be added or SR-IOV won't be correctly supported.
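
The test case above can be partly automated. The following is a minimal sketch, not part of the original report: it assumes the standard sysfs layout under /sys/bus/pci/devices and flags every Broadcom (vendor 0x14e4) Ethernet function that has no network interface registered. It does not filter by device ID, so it also covers Broadcom NICs driven by other drivers, and functions deliberately handed to another driver (for example for passthrough) would be flagged as well. The function name and the sysfs_pci parameter are hypothetical.

```python
import os

BROADCOM_VENDOR = "0x14e4"        # PCI vendor ID for Broadcom
ETHERNET_CLASS_PREFIX = "0x0200"  # PCI class code for Ethernet controllers


def _read(path):
    """Return the stripped contents of a sysfs attribute, or "" if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def functions_without_netdev(sysfs_pci="/sys/bus/pci/devices"):
    """Return PCI addresses of Broadcom Ethernet functions lacking a net interface."""
    missing = []
    for addr in sorted(os.listdir(sysfs_pci)):
        dev = os.path.join(sysfs_pci, addr)
        if _read(os.path.join(dev, "vendor")) != BROADCOM_VENDOR:
            continue
        if not _read(os.path.join(dev, "class")).startswith(ETHERNET_CLASS_PREFIX):
            continue
        # A successfully probed NIC exposes its interface name(s) under <dev>/net/.
        net_dir = os.path.join(dev, "net")
        if not os.path.isdir(net_dir) or not os.listdir(net_dir):
            missing.append(addr)
    return missing


if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        for addr in functions_without_netdev():
            print(f"{addr}: no network interface registered")
```

On an affected system this would be expected to list the PCI functions that "lspci" still shows but that have no interface in "ip link show"; an empty result means every matching function has an interface.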

--- Original Description ---

Works with focal kernel 5.4.0-80
Broken with focal kernel 5.4.0-88

On a Dell R720 with the BCM57800-based 1/10 Gigabit integrated network cards, kernel 5.11.22-3 causes half of the network interfaces to disappear, specifically the 1 Gb ports. Commands like "ip link show" and "dmesg" no longer show eno3 and eno4, nor any other interface name for these ports. I've read the note in the release notes, and this does not appear to be a case of the interfaces changing names; the 3rd and 4th interfaces don't show up at all.

The card is based on the BCM57800 chipset and has two SFP+ and two gigabit ports on the same card. Commands like "ip link show" no longer show ports 3 and 4. "lspci" still shows four items. dmesg only shows the first two interfaces.

This problem seems to be known upstream, and seems to be a regression.

More information at https://bugzilla.proxmox.com/show_bug.cgi?id=3558

This is being seen at a customer site during an OpenStack install. It would be appreciated if a workaround could be provided or the fix could be prioritized. The system uses the standard Focal 20.04 LTS kernel (it installs fine with the working kernel and then upgrades to the non-working one; this is done through MAAS and is difficult to control).

Tested other kernels like hwe-* and all seem to be affected too.

The client does not want to disable SR-IOV on the whole card and also cannot disable only ports 3/4 (the BIOS will not allow it).

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1945707

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Andre Ruiz (andre-ruiz) wrote :

This seems to be the upstream bug:

https://bugzilla.kernel.org/show_bug.cgi?id=214297

Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Focal):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Hirsute):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Thadeu Lima de Souza Cascardo (cascardo) wrote :

[Impact]
The bnx2x driver won't add all device ports/interfaces.

[Test case]
Boot system with bnx2x device and verify all ports/interfaces have been added.

[Potential regression]
bnx2x devices won't be properly probed. Devices won't be added or SR-IOV won't be correctly supported.

Stefan Bader (smb)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-hwe-5.8/5.8.0-66.74 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Stefan Bader (smb)
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.11.0-39.43 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-21.21 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hey, Andre.

As we discussed, this is applied to 5.4 kernels as well; 5.4.0-90 should have it.

Can you test that it fixes the problem?

Thanks.
Cascardo.

Andre Ruiz (andre-ruiz) wrote :

Sure! I cannot redeploy right now because I'm using the cloud to chase another bug, but I will do so ASAP and report back.

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-40.44

---------------
linux (5.11.0-40.44) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-40.44 -proposed tracker (LP: #1947876)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.10.18)

linux (5.11.0-39.43) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-39.43 -proposed tracker (LP: #1947227)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.10.18)

  * Add final-checks to check certificates (LP: #1947174)
    - [Packaging] Add system trusted and revocation keys final check

  * No sound on Lenovo laptop models Legion 15IMHG05, Yoga 7 14ITL5, and 13s
    Gen2 (LP: #1939052)
    - ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i
      15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops.
    - ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s
      Gen2

  * Fix cold plugged USB device on certain PCIe USB cards (LP: #1945211)
    - Revert "UBUNTU: SAUCE: Revert "usb: core: reduce power-on-good delay time of
      root hub""
    - usb: core: hcd: Add support for deferring roothub registration
    - xhci: Set HCD flag to defer primary roothub registration
    - usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()

  * Hirsute update: upstream stable patchset 2021-10-12 (LP: #1946788)
    - locking/mutex: Fix HANDOFF condition
    - regmap: fix the offset of register error log
    - regulator: tps65910: Silence deferred probe error
    - crypto: mxs-dcp - Check for DMA mapping errors
    - sched/deadline: Fix reset_on_fork reporting of DL tasks
    - power: supply: axp288_fuel_gauge: Report register-address on readb / writeb
      errors
    - crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
    - sched/deadline: Fix missing clock update in migrate_task_rq_dl()
    - rcu/tree: Handle VM stoppage in stall detection
    - EDAC/mce_amd: Do not load edac_mce_amd module on guests
    - hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
    - hrtimer: Ensure timerfd notification for HIGHRES=n
    - udf: Check LVID earlier
    - udf: Fix iocharset=utf8 mount option
    - isofs: joliet: Fix iocharset=utf8 mount option
    - bcache: add proper error unwinding in bcache_device_init
    - blk-throtl: optimize IOPS throttle for large IO scenarios
    - nvme-tcp: don't update queue count when failing to set io queues
    - nvme-rdma: don't update queue count when failing to set io queues
    - nvmet: pass back cntlid on successful completion
    - power: supply: smb347-charger: Add missing pin control activation
    - power: supply: max17042_battery: fix typo in MAx17042_TOFF
    - s390/cio: add dev_busid sysfs entry for each subchannel
    - s390/zcrypt: fix wrong offset index for APKA master key valid state
    - libata: fix ata_host_start()
    - crypto: omap - Fix inconsistent locking of device lists
    - crypto: qat - do not ignore errors from enable_vf2pf_comms()
    - crypto: qat - handle both source of interrupt in VF ISR
    - crypto: qat - fix reuse of completion variable
    -...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-21.21

---------------
linux (5.13.0-21.21) impish; urgency=medium

  * impish/linux: 5.13.0-21.21 -proposed tracker (LP: #1947347)

  * It hangs while booting up with AMD W6800 [1002:73A3] (LP: #1945553)
    - drm/amdgpu: Rename flag which prevents HW access
    - drm/amd/pm: Fix a bug communicating with the SMU (v5)
    - drm/amd/pm: Fix a bug in semaphore double-lock

  * Add final-checks to check certificates (LP: #1947174)
    - [Packaging] Add system trusted and revocation keys final check

  * No sound on Lenovo laptop models Legion 15IMHG05, Yoga 7 14ITL5, and 13s
    Gen2 (LP: #1939052)
    - ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i
      15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops.
    - ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s
      Gen2

  * Check for changes relevant for security certifications (LP: #1945989)
    - [Packaging] Add a new fips-checks script
    - [Packaging] Add fips-checks as part of finalchecks

  * BCM57800 SRIOV bug causes interfaces to disappear (LP: #1945707)
    - bnx2x: Fix enabling network interfaces without VFs

  * CVE-2021-3759
    - memcg: enable accounting of ipc resources

  * [impish] Remove the downstream xr-usb-uart driver (LP: #1945938)
    - SAUCE: xr-usb-serial: remove driver
    - [Config] update modules list

  * Fix A yellow screen pops up in an instant (< 1 second) and then disappears
    before loading the system (LP: #1945932)
    - drm/i915: Stop force enabling pipe bottom color gammma/csc

  * Impish update: v5.13.18 upstream stable release (LP: #1946249)
    - Linux 5.13.18

  * Impish update: v5.13.17 upstream stable release (LP: #1946247)
    - locking/mutex: Fix HANDOFF condition
    - regmap: fix the offset of register error log
    - regulator: tps65910: Silence deferred probe error
    - crypto: mxs-dcp - Check for DMA mapping errors
    - sched/deadline: Fix reset_on_fork reporting of DL tasks
    - power: supply: axp288_fuel_gauge: Report register-address on readb / writeb
      errors
    - crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
    - sched/deadline: Fix missing clock update in migrate_task_rq_dl()
    - rcu/tree: Handle VM stoppage in stall detection
    - EDAC/mce_amd: Do not load edac_mce_amd module on guests
    - hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
    - hrtimer: Ensure timerfd notification for HIGHRES=n
    - udf: Check LVID earlier
    - udf: Fix iocharset=utf8 mount option
    - isofs: joliet: Fix iocharset=utf8 mount option
    - bcache: add proper error unwinding in bcache_device_init
    - nbd: add the check to prevent overflow in __nbd_ioctl()
    - blk-throtl: optimize IOPS throttle for large IO scenarios
    - nvme-tcp: don't update queue count when failing to set io queues
    - nvme-rdma: don't update queue count when failing to set io queues
    - nvmet: pass back cntlid on successful completion
    - power: supply: smb347-charger: Add missing pin control activation
    - power: supply: max17042_battery: fix typo in MAx17042_TOFF
    - s390/cio: add dev_busid sysfs entry f...
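
To confirm that a given changelog actually carries the fix for this bug, the entries can be searched for the "(LP: #1945707)" tag. The following is a rough sketch, assuming the layout seen in the changelogs above: entries start with "  * ", patch subjects with "    - ", and wrapped lines are indented further. The function names are hypothetical, entries tagged with several bug numbers in one parenthesis are not handled, and truncated text ("...") is left as found.

```python
def split_entries(changelog):
    """Return a list of [header, [patch subjects]] for each '  * ' entry."""
    entries = []
    for line in changelog.splitlines():
        if line.startswith("  * "):
            entries.append([line[4:].strip(), []])
        elif not entries or not line.startswith("    "):
            continue  # blank lines, version headers, text before the first entry
        elif line.startswith("    - "):
            entries[-1][1].append(line[6:].strip())
        elif entries[-1][1]:
            entries[-1][1][-1] += " " + line.strip()  # wrapped patch subject
        else:
            entries[-1][0] += " " + line.strip()      # wrapped entry header
    return entries


def patches_for_bug(changelog, bug_id):
    """Return patch subjects listed under entries tagged (LP: #bug_id)."""
    tag = f"(LP: #{bug_id})"
    return [
        patch
        for header, patches in split_entries(changelog)
        if tag in header
        for patch in patches
    ]


SAMPLE = """\
linux (5.13.0-21.21) impish; urgency=medium

  * No sound on Lenovo laptop models Legion 15IMHG05, Yoga 7 14ITL5, and 13s
    Gen2 (LP: #1939052)
    - ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s
      Gen2

  * BCM57800 SRIOV bug causes interfaces to disappear (LP: #1945707)
    - bnx2x: Fix enabling network interfaces without VFs
"""

if __name__ == "__main__":
    for subject in patches_for_bug(SAMPLE, 1945707):
        print(subject)
```

Run against the excerpt of the 5.13.0-21.21 changelog in SAMPLE, this lists the single bnx2x patch attributed to this bug.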

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Andre Ruiz (andre-ruiz) wrote :

I can confirm that this is fixed for the focal GA kernel. Kernel 5.4.0-89 still has the problem and kernel 5.4.0-90 is fixed; I can see all NICs on the card now.

tags: added: verification-done-focal
removed: verification-needed-focal
Andre Ruiz (andre-ruiz) wrote :

I changed the tag for "verification-needed-focal" to "verification-done-focal", but I would like to add that the version of the kernel listed on the "focal" fix above is 5.8, when in fact the GA kernel for focal is 5.4.

Kernel 5.8 has been an HWE kernel for focal in the past, but even then it has already been superseded by 5.11 and now 5.13. I think 5.4 is not mentioned here because the change came from upstream and not from a local patch.

Anyway, 5.4.0-90 has the fix.
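
For anyone checking a fleet of machines, the versions mentioned in this thread can be turned into a small lookup. This is only a sketch: the table holds the first package ABI per kernel series that this thread names as carrying the fix (the 5.8 and 5.11 entries were in -proposed awaiting verification when they were posted), and any series not listed here, such as the Bionic kernels, is reported as unknown. The function name has_fix is hypothetical. A False result only means the kernel predates the fixed build; it does not prove the kernel is affected, since 5.4.0-80 was reported as working.

```python
import re

# First package ABI named in this thread as carrying the fix, per kernel series.
FIRST_FIXED_ABI = {
    (5, 4, 0): 90,   # focal GA, confirmed by the reporter
    (5, 8, 0): 66,   # linux-hwe-5.8 5.8.0-66.74 (-proposed)
    (5, 11, 0): 39,  # hirsute 5.11.0-39.43 (-proposed)
    (5, 13, 0): 21,  # impish 5.13.0-21.21
}

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def has_fix(release):
    """Return True/False for a series listed above, or None when unknown."""
    match = _RELEASE_RE.match(release)
    if match is None:
        return None
    major, minor, patch, abi = (int(group) for group in match.groups())
    first_fixed = FIRST_FIXED_ABI.get((major, minor, patch))
    if first_fixed is None:
        return None
    return abi >= first_fixed
```

The running kernel's release string, as returned by platform.release() or "uname -r" (for example "5.4.0-90-generic"), can be passed directly to has_fix().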
