hns: under heavy load, NIC may fail and require reboot

Bug #1704146 reported by dann frazier
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Critical
dann frazier
Zesty
Fix Released
Critical
dann frazier

Bug Description

[Impact]
The onboard NIC on HiSilicon D05 boards can fail under heavy load, requiring a reboot to recover.

[Test Case]
There isn't a known reliable reproducer.

[Regression Risk]
This is a driver-specific fix, so regression risk is limited to devices that we can directly test.

dann frazier (dannf)
Changed in linux (Ubuntu Zesty):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
importance: Undecided → Critical
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Revision history for this message
dann frazier (dannf) wrote :

Preemptive verification:

dannf@xps13:~$ ssh 10.XXX.XXX.XXX cat /proc/version
Warning: Permanently added '10.XXX.XXX.XXX' (ECDSA) to the list of known hosts.
Linux version 4.10.0-29-generic (buildd@bos01-arm64-012) (gcc version 6.3.0 20170406 (Ubuntu/Linaro 6.3.0-12ubuntu2) ) #33-Ubuntu SMP Wed Jul 19 13:37:12 UTC 2017

(The 10.XXX.XXX.XXX IP is on an hns NIC on a D05 system)

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
dann frazier (dannf)
tags: added: verification-done-zesty
removed: verification-needed-zesty
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (6.6 KiB)

This bug was fixed in the package linux - 4.10.0-30.34

---------------
linux (4.10.0-30.34) zesty; urgency=low

  * CVE-2017-7533
    - dentry name snapshots

linux (4.10.0-29.33) zesty; urgency=low

  * linux: 4.10.0-29.33 -proposed tracker (LP: #1704961)

  * Opal and POWER9 DD2 (LP: #1702159)
    - powerpc/powernv: Tell OPAL about our MMU mode on POWER9
    - powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()

  * CVE-2017-1000364
    - mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
    - mm/mmap.c: expand_downwards: don't require the gap if !vm_prev

  * [Xenial] nvme: Quirks for PM1725 controllers (LP: #1704435)
    - nvme: Quirks for PM1725 controllers

  * hns: under heavy load, NIC may fail and require reboot (LP: #1704146)
    - net: hns: Bugfix for Tx timeout handling in hns driver

  * New ACPI identifiers for ThunderX SMMU (LP: #1703437)
    - iommu/arm-smmu: Plumb in new ACPI identifiers

  * CVE-2017-7482
    - rxrpc: Fix several cases where a padded len isn't checked in ticket decode

  * CVE-2017-1000365
    - fs/exec.c: account for argv/envp pointers

  * CVE-2017-10810
    - drm/virtio: don't leak bo on drm_gem_object_init failure

  * Data corruption with hio driver (LP: #1701316)
    - SAUCE: hio: Fix incorrect use of enum req_opf values

  * arm64: fix crash reading /proc/kcore (LP: #1702749)
    - fs/proc: kcore: use kcore_list type to check for vmalloc/module address
    - arm64: mm: select CONFIG_ARCH_PROC_KCORE_TEXT

  * cxlflash update request in the Xenial SRU stream (LP: #1702521)
    - scsi: cxlflash: Refactor context reset to share reset logic
    - scsi: cxlflash: Support SQ Command Mode
    - scsi: cxlflash: Cleanup prints
    - scsi: cxlflash: Cancel scheduled workers before stopping AFU
    - scsi: cxlflash: Enable PCI device ID for future IBM CXL Flash AFU
    - scsi: cxlflash: Separate RRQ processing from the RRQ interrupt handler
    - scsi: cxlflash: Serialize RRQ access and support offlevel processing
    - scsi: cxlflash: Implement IRQ polling for RRQ processing
    - scsi: cxlflash: Update sysfs helper routines to pass config structure
    - scsi: cxlflash: Support dynamic number of FC ports
    - scsi: cxlflash: Remove port configuration assumptions
    - scsi: cxlflash: Hide FC internals behind common access routine
    - scsi: cxlflash: SISlite updates to support 4 ports
    - scsi: cxlflash: Support up to 4 ports
    - scsi: cxlflash: Fence EEH during probe
    - scsi: cxlflash: Remove unnecessary DMA mapping
    - scsi: cxlflash: Fix power-of-two validations
    - scsi: cxlflash: Fix warnings/errors
    - scsi: cxlflash: Improve asynchronous interrupt processing
    - scsi: cxlflash: Support multiple hardware queues
    - scsi: cxlflash: Add hardware queues attribute
    - scsi: cxlflash: Introduce hardware queue steering
    - cxl: Enable PCI device IDs for future IBM CXL adapters
    - scsi: cxlflash: Select IRQ_POLL
    - scsi: cxlflash: Combine the send queue locks
    - scsi: cxlflash: Update cxlflash_afu_sync() to return errno
    - scsi: cxlflash: Reset hardware queue context via specified register
    - scsi: cxlflash: Schedule asynchronous res...

Read more...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (24.9 KiB)

This bug was fixed in the package linux - 4.11.0-13.19

---------------
linux (4.11.0-13.19) artful; urgency=low

  * CVE-2017-7533
    - dentry name snapshots

linux (4.11.0-12.18) artful; urgency=low

  * linux: 4.11.0-12.18 -proposed tracker (LP: #1707635)
    - no change rebuild to pick up the new binutils.

  * Adt tests of src:linux time out often on armhf lxc containers (LP: #1705495)
    - [Packaging] tests -- reduce rebuild test to one flavour
    - [Packaging] tests -- reduce rebuild test to one flavour -- use filter

  * [ARM64] config EDAC_GHES=y depends on EDAC_MM_EDAC=y (LP: #1706141)
    - [Config] set EDAC_MM_EDAC=y for ARM64

  * [Hyper-V] hv_netvsc: Exclude non-TCP port numbers from vRSS hashing
    (LP: #1690174)
    - hv_netvsc: Exclude non-TCP port numbers from vRSS hashing

  * ath10k doesn't report full RSSI information (LP: #1706531)
    - ath10k: add per chain RSSI reporting

  * ideapad_laptop don't support v310-14isk (LP: #1705378)
    - platform/x86: ideapad-laptop: Add several models to no_hw_rfkill

  * Ubuntu 16.04.3: Qemu fails on P9 (LP: #1686019)
    - KVM: PPC: Pass kvm* to kvmppc_find_table()
    - KVM: PPC: Use preregistered memory API to access TCE list
    - KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
    - powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
    - powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
    - powerpc/vfio_spapr_tce: Add reference counting to iommu_table
    - powerpc/mmu: Add real mode support for IOMMU preregistered memory
    - KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
    - KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers

  * hns: ethtool selftest crashes system (LP: #1705712)
    - net/hns:bugfix of ethtool -t phy self_test

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - arm64: Add wor...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.