[SRU] Ice driver causes the kernel to crash with Ubuntu 20.04.2 with ethtool specific register commands

Bug #1939855 reported by Shivani Lalit Changela
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   High         Unassigned
Focal            Fix Released   High         Unassigned

Bug Description

[IMPACT]

On the Focal 5.4 kernel, running ethtool -d <interface_name> on an Intel card that uses the ice driver crashes the kernel; the crash originates in the ice driver.
The same command works on the HWE kernel (5.11), where no crash occurs.

[FIXES]

ice: Fix bad register reads
The "ethtool -d" handler reads registers in the ice_regs_dump_list array
and returns read values back to the userspace.

commitID: 1fba4a8a92706c89716449b1aab1b6879f438d34

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/drivers/net/ethernet/intel/ice?id=1fba4a8a92706c89716449b1aab1b6879f438d34

[TESTING]

1. Install Focal on a system with an E810 network device.
2. Ensure the network device has an IP address and connectivity.
3. Run ethtool -d <interface_name>.

Expected result: ethtool prints a register dump for the specified network device.
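
The test steps above could be scripted roughly as follows. This is only a sketch: the function names driver_name and check_regdump are hypothetical, and it assumes that ethtool -i reports the driver on a line beginning with "driver:". On an affected kernel the ethtool -d call itself may crash the machine, so run it only on a test system.

```shell
#!/bin/sh
# Extract the driver name from `ethtool -i` output read on stdin.
driver_name() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Run the register dump on one interface, but only if it uses the ice driver.
# Needs root; pass the interface name as the first argument.
check_regdump() {
    iface=$1
    drv=$(ethtool -i "$iface" | driver_name)
    if [ "$drv" != "ice" ]; then
        echo "skip: $iface uses driver '$drv', not ice" >&2
        return 2
    fi
    if ethtool -d "$iface" >/dev/null; then
        echo "PASS: register dump printed for $iface"
    else
        echo "FAIL: ethtool -d returned an error for $iface" >&2
        return 1
    fi
}
```

After sourcing the file as root, check_regdump <interface_name> exercises a single interface; a kernel crash will of course never reach the PASS or FAIL message.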

[REGRESSION RISK]

The regression risk is low: the change is confined to the ice driver.

[OTHER INFO]

I applied the fix, built the kernel, and tested it. The branch is available here:

https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/focal/+ref/e810_ethtool_fix_next

affects: ubuntu-terminal-app → dellserver
information type: Private Security → Private
Shivani Lalit Changela (shivani1512) wrote :
Michael Reed (mreed8855) wrote :

I was unable to open the logs in comment #1, but I reproduced the issue and am attaching a log.

Michael Reed (mreed8855)
information type: Private → Public
Jeff Lane  (bladernr) wrote :

Can you give us the firmware version for this failing controller?

Changed in dellserver:
status: New → Incomplete
Jeff Lane  (bladernr) wrote :

Additionally, just for completeness, which cards are these? (I know it's the E810 controller, but I'm wondering about the specific card models.)

Michael Reed (mreed8855) wrote :

This card is using firmware version 2.33

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1939855

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Michael Reed (mreed8855) wrote : Re: ice driver causes the kernel to crash with Ubuntu 20.04.2 with ethtool specific register commands

I am seeing this on 20.04.2.

$ uname -a
Linux 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Michael Reed (mreed8855) wrote :

Output from the commands:

$ ethtool eno12409
Settings for eno12409:
 Supported ports: [ FIBRE ]
 Supported link modes: 25000baseCR/Full
 Supported pause frame use: Symmetric
 Supports auto-negotiation: Yes
 Supported FEC modes: None BaseR RS
 Advertised link modes: 25000baseCR/Full
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Advertised FEC modes: None BaseR RS
 Link partner advertised link modes: Not reported
 Link partner advertised pause frame use: No
 Link partner advertised auto-negotiation: Yes
 Link partner advertised FEC modes: Not reported
 Speed: 25000Mb/s
 Duplex: Full
 Port: Direct Attach Copper
 PHYAD: 0
 Transceiver: internal
 Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
 Current message level: 0x00000007 (7)
          drv probe link
 Link detected: yes
~$ ethtool -d eno12409
Cannot get register dump: Operation not permitted
ubuntu@C6520-E810-30:~$ sudo ethtool -d eno12409
Offset Values
------ ------
0x0000: 00 00 00 00 03 00 00 00 05 00 00 00 01 08 00 40
0x0010: 01 00 00 40 00 00 39 34 00 00 00 00
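
For readers who want register-sized values rather than raw bytes, the dump above can be regrouped into 32-bit words. This is a sketch only: dump_to_words is a hypothetical helper, and it assumes each register is a 32-bit value stored little-endian in the buffer (as would be expected on this x86_64 host). The report itself does not say which register each word corresponds to.

```shell
#!/bin/sh
# Read `ethtool -d` hex-dump lines ("0x0000: 00 00 ...") on stdin and print
# one 32-bit word per line, assuming little-endian byte order.
dump_to_words() {
    awk '
        /^0x[0-9a-fA-F]+:/ {
            # Field 1 is the offset; the bytes start at field 2.
            for (i = 2; i + 3 <= NF; i += 4)
                printf "0x%s%s%s%s\n", $(i + 3), $(i + 2), $(i + 1), $i
        }'
}
```

Piping the dump through it (sudo ethtool -d eno12409 | dump_to_words) would turn the first row above into 0x00000000, 0x00000003, 0x00000005 and 0x40000801; the header lines are skipped because they do not start with an offset.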

Michael Reed (mreed8855) wrote :

I have provided a test kernel at the following link:

https://people.canonical.com/~mreed/lp_1939855_e810_ethtool_d/

Shivani Lalit Changela (shivani1512) wrote :

Hi Michael,

The test kernel seems to give a positive result. I don't see a crash with this.
Attaching the sosreport for reference.

Paul White (paulw2u) wrote :

'focal' is a release name and not the name of a
package currently in a supported release of Ubuntu.

Changed in focal (Ubuntu):
status: New → Invalid
tags: added: focal
Jeff Lane  (bladernr) wrote :

@Paul, correct. Thanks for catching that. I believe the intention was to set up a Focal task for the kernel, but to do so, one must be viewing the bug from the Linux (Ubuntu) project only... I've fixed it with the appropriate focal task now.

Changed in focal (Ubuntu Focal):
status: New → Invalid
Michael Reed (mreed8855)
summary: - ice driver causes the kernel to crash with Ubuntu 20.04.2 with ethtool
- specific register commands
+ [SRU] Ice driver causes the kernel to crash with Ubuntu 20.04.2 with
+ ethtool specific register commands
Michael Reed (mreed8855)
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Shivani Lalit Changela (shivani1512) wrote :

I have verified the proposed kernel and I do not see any issue. Thank you for the help.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-88.99

---------------
linux (5.4.0-88.99) focal; urgency=medium

  * focal/linux: 5.4.0-88.99 -proposed tracker (LP: #1944747)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * please drop virtualbox-guest-dkms virtualbox-guest-source (LP: #1933248)
    - Revert "UBUNTU: [Config] Disable virtualbox dkms build"

linux (5.4.0-87.98) focal; urgency=medium

  * please drop virtualbox-guest-dkms virtualbox-guest-source (LP: #1933248)
    - [Config] Disable virtualbox dkms build

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * LRMv5: switch primary version handling to kernel-versions data set
    (LP: #1928921)
    - [Packaging] switch to kernel-versions

  * disable “CONFIG_HISI_DMA” config for ubuntu version (LP: #1936771)
    - Disable CONFIG_HISI_DMA
    - [Config] Record hisi_dma no longer built for arm64

  * memory leaking when removing a profile (LP: #1939915)
    - apparmor: Fix memory leak of profile proxy

  * CryptoExpress EP11 cards are going offline (LP: #1939618)
    - s390/zcrypt: Support for CCA protected key block version 2
    - s390: Replace zero-length array with flexible-array member
    - s390/zcrypt: Use scnprintf() for avoiding potential buffer overflow
    - s390/zcrypt: replace snprintf/sprintf with scnprintf
    - s390/ap: Remove ap device suspend and resume callbacks
    - s390/zcrypt: use fallthrough;
    - s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
    - s390/ap: remove power management code from ap bus and drivers
    - s390/ap: introduce new ap function ap_get_qdev()
    - s390/zcrypt: use kzalloc
    - s390/zcrypt: fix smatch warnings
    - s390/zcrypt: code beautification and struct field renames
    - s390/zcrypt: split ioctl function into smaller code units
    - s390/ap: rename and clarify ap state machine related stuff
    - s390/zcrypt: provide cex4 cca sysfs attributes for cex3
    - s390/ap: rework crypto config info and default domain code
    - s390/zcrypt: simplify cca_findcard2 loop code
    - s390/zcrypt: remove set_fs() invocation in zcrypt device driver
    - s390/ap: remove unnecessary spin_lock_init()
    - s390/zcrypt: Support for CCA APKA master keys
    - s390/zcrypt: introduce msg tracking in zcrypt functions
    - s390/ap: split ap queue state machine state from device state
    - s390/ap: add error response code field for ap queue devices
    - s390/ap: add card/queue deconfig state
    - s390/sclp: Add support for SCLP AP adapter config/deconfig
    - s390/ap: Support AP card SCLP config and deconfig operations
    - s390/ap/zcrypt: revisit ap and zcrypt error handling
    - s390/zcrypt: move ap_msg param one level up the call chain
    - s390/zcrypt: Introduce Failure Injection feature
    - s390/zcrypt: fix wrong format specifications
    - s390/ap: fix ap devices reference counting
    - s390/zcrypt: return EIO when msg retry limit reached
    - s390/zcrypt: fix zcard and zqueue hot-unplug memleak
    - s390/ap: Fix hanging ioctl caused by wrong msg counter

  * memfd from ubuntu_kernel_s...
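
A quick way to tell whether a Focal generic kernel already carries this fix is to compare its ABI number with 88, the release named in the message above. The sketch below rests on assumptions: has_ice_fix is a hypothetical helper name, and it only understands release strings of the form 5.4.0-<abi>-<flavour> as printed by uname -r. Other kernel series, such as HWE, are reported as unknown rather than guessed at.

```shell
#!/bin/sh
# Return 0 if a 5.4.0-series release string has ABI >= 88 (the fix was
# released in 5.4.0-88.99), 1 if it is older, and 2 if the string is not
# a recognisable 5.4.0-* release.
has_ice_fix() {
    case $1 in
        5.4.0-*) ;;
        *) return 2 ;;
    esac
    abi=${1#5.4.0-}      # strip the "5.4.0-" prefix
    abi=${abi%%-*}       # keep only the ABI number before the flavour
    case $abi in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$abi" -ge 88 ]
}
```

For example, has_ice_fix "$(uname -r)" && echo "fix present" prints nothing on the 5.4.0-81-generic kernel mentioned earlier in this report.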

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
no longer affects: focal (Ubuntu)
no longer affects: focal (Ubuntu Focal)
affects: dellserver → ubuntu-translations
no longer affects: ubuntu-translations
Changed in linux (Ubuntu):
status: Invalid → Fix Released
importance: Undecided → High