PCI Call Traces hw csum failure in dmesg with 4.4.0-2-generic

Bug #1544978 reported by bugproxy
This bug affects 2 people
Affects                   Status         Importance   Assigned to   Milestone
Ubuntu on IBM z Systems   Fix Released   High         Unassigned
linux (Ubuntu)            Fix Released   Wishlist     Tim Gardner
  Xenial                  Fix Released   High         Unassigned
  Yakkety                 Fix Released   Wishlist     Tim Gardner

Bug Description

== Comment: #0 - Helmut Grauer <email address hidden> - 2016-02-12 03:00:03 ==
Hi
I am getting the following call traces when PCI interfaces are configured:
[ 246.051566] enp0s0: hw csum failure
[ 246.051571] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.4.0-2-generic #16-Ubuntu
[ 246.051573] 00000000f9793778 00000000f9793808 0000000000000002 0000000000000000
                      00000000f97938a8 00000000f9793820 00000000f9793820 0000000000114182
                      0000000000000166 000000000091e9ca 000000000000000a 000000000000000a
                      00000000f9793868 00000000f9793808 0000000000000000 00000000f9d38000
                      0000000000000000 0000000000114182 00000000f9793808 00000000f9793868
[ 246.051581] Call Trace:
[ 246.051589] ([<00000000001140b8>] show_trace+0x140/0x148)
[ 246.051590] [<0000000000114136>] show_stack+0x76/0xe8
[ 246.051595] [<00000000005172d6>] dump_stack+0x6e/0x90
[ 246.051599] [<0000000000673500>] __skb_checksum_complete+0xd0/0xd8
[ 246.051605] [<000000000076ae24>] icmpv6_rcv+0x124/0x500
[ 246.051608] [<0000000000746e60>] ip6_input_finish+0x170/0x4e0
[ 246.051610] [<000000000074775c>] ip6_input+0x4c/0xd0
[ 246.051611] [<00000000007478ee>] ip6_mc_input+0x10e/0x280
[ 246.051612] [<0000000000747538>] ipv6_rcv+0x368/0x540
[ 246.051616] [<000000000067e5d4>] __netif_receive_skb_core+0x6fc/0xaf8
[ 246.051618] [<0000000000681a56>] netif_receive_skb_internal+0x3e/0xd8
[ 246.051619] [<0000000000682314>] napi_gro_frags+0x17c/0x208
[ 246.051627] [<000003ff805f3a2c>] mlx4_en_process_rx_cq+0x8b4/0xbd0 [mlx4_en]
[ 246.051630] [<000003ff805f3e62>] mlx4_en_poll_rx_cq+0xc2/0x1a0 [mlx4_en]
[ 246.051631] [<00000000006839e2>] net_rx_action+0x2a2/0x418
[ 246.051635] [<0000000000162726>] __do_softirq+0x156/0x300
[ 246.051637] [<0000000000162ace>] irq_exit+0xd6/0xf8
[ 246.051641] [<000000000010cc5a>] do_IRQ+0x6a/0x88
[ 246.051644] [<00000000007a99c2>] io_int_handler+0x112/0x220
[ 246.051646] [<0000000000104856>] enabled_wait+0x56/0xa8
[ 246.051649] ([<0000000000ccb888>] cpu_dead_idle+0x0/0x8)
[ 246.051651] [<0000000000104b5a>] arch_cpu_idle+0x32/0x48
[ 246.051669] [<00000000001a8198>] cpu_startup_entry+0x200/0x278
[ 246.051674] [<00000000001156ba>] smp_start_secondary+0xea/0xf8
[ 246.051679] [<00000000007a9f42>] restart_int_handler+0x62/0x78
[ 246.051680] [<0000000000000000>] (null)

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-137072 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
dann frazier (dannf)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
bugproxy (bugproxy)
tags: added: targetmilestone-inin1604
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-04-18 05:46 EDT-------
This problem is known to Mellanox: it is understood and a fix is available,
but the fix has not yet been posted upstream.

This can only be solved if Canonical can get the fix from Mellanox!

dann frazier (dannf)
Changed in ubuntu-z-systems:
importance: Undecided → High
Dimitri John Ledkov (xnox) wrote :

@hws

Is there a contact at Mellanox, or are there any Mellanox-specific Linux trees or mailing lists where this fix is available? Could you please put us in touch? Without a fix we cannot schedule it for inclusion in the Y-series and then SRU it into xenial as per the SRU cadence. For the time being, this bug will be marked Incomplete until a fix is available to us. Please expect this bug to be fixed in an SRU update to the kernel at the earliest.

Regards,

Dimitri.

Changed in linux (Ubuntu):
status: New → Incomplete
importance: High → Wishlist
Changed in ubuntu-z-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-20 09:05 EDT-------
We are still awaiting the link of the upstream commit or mailing list.

tags: added: targetmilestone-inin16041
removed: targetmilestone-inin1604
Talat Batheesh (talat-b87) wrote :

Hi,
This upstream commit should fix the bug

commit 82d69203df634b4dfa765c94f60ce9482bcc44d6
Author: Daniel Jurgens <email address hidden>
Date: Wed May 4 15:00:33 2016 +0300

    net/mlx4_en: Fix endianness bug in IPV6 csum calculation

    Use htons instead of unconditionally byte swapping nexthdr. On a little
    endian systems shifting the byte is correct behavior, but it results in
    incorrect csums on big endian architectures.
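
The byte-order issue described in the commit message can be illustrated outside the kernel. The following is a minimal Python sketch, not the driver code: it emulates how a 16-bit native integer is laid out in memory on little- and big-endian hosts, and compares shifting the next-header byte with an emulated htons(). The helper names (memory_image, emulated_htons, nexthdr_word_shift, nexthdr_word_htons) are hypothetical and exist only for this illustration. The assumption is that the checksum code sums the pseudo-header as 16-bit words in memory, so the next-header word must appear as the bytes 00 3a for ICMPv6 (next header 58).

```python
IPPROTO_ICMPV6 = 58  # IPv6 next-header value for ICMPv6 (0x3a)


def memory_image(value, byteorder):
    """Bytes a 16-bit native integer occupies in memory on a host
    with the given byte order ('little' or 'big')."""
    return (value & 0xFFFF).to_bytes(2, byteorder)


def emulated_htons(value, byteorder):
    """Emulate htons() on a host with the given byte order: return the
    native integer whose memory image is `value` in network order."""
    wire = (value & 0xFFFF).to_bytes(2, "big")  # network order is big endian
    return int.from_bytes(wire, byteorder)


def nexthdr_word_shift(nexthdr, byteorder):
    """Memory image when the byte is unconditionally shifted left by 8."""
    return memory_image(nexthdr << 8, byteorder)


def nexthdr_word_htons(nexthdr, byteorder):
    """Memory image when the byte is converted with htons()."""
    return memory_image(emulated_htons(nexthdr, byteorder), byteorder)


# The pseudo-header word that carries the next-header value on the wire.
expected = bytes([0x00, IPPROTO_ICMPV6])

for order in ("little", "big"):
    shifted = nexthdr_word_shift(IPPROTO_ICMPV6, order)
    converted = nexthdr_word_htons(IPPROTO_ICMPV6, order)
    print(order, "shift ok:", shifted == expected,
          "htons ok:", converted == expected)
```

On the emulated big-endian host, as on s390x, the shifted value lands in memory as 3a 00 instead of 00 3a. That would contribute the wrong amount to the pseudo-header sum, which is consistent with the hw csum failure seen on the ICMPv6 receive path in the trace.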

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-10 04:14 EDT-------
Hi
Could you please apply the fix for the Mellanox call trace problem mentioned in Comment 11? I tried it on an internal driver and it works.

Greetings Helmut

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Mathew Hodson (mhodson)
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
milestone: none → ubuntu-16.04.1
Changed in ubuntu-z-systems:
status: Incomplete → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-23 07:32 EDT-------
checked with kernel
root@s83lp18:~# uname -a

dmesg no longer shows the hw csum failure message:
1636620k SSFS
[ 9.820659] 8021q: 802.1Q VLAN Support v1.8
[ 9.824597] mlx4_en: enP1s41: frag:0 - size:1522 prefix:0 stride:1536
[ 10.204678] audit: type=1400 audit(1464002873.505:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=2127 comm="apparmor_parser"
[ 10.205168] audit: type=1400 audit(1464002873.505:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/dhclient" pid=2125 comm="apparmor_parser"
[ 10.205174] audit: type=1400 audit(1464002873.505:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=2125 comm="apparmor_parser"
[ 10.205177] audit: type=1400 audit(1464002873.505:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=2125 comm="apparmor_parser"
[ 10.205180] audit: type=1400 audit(1464002873.505:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=2125 comm="apparmor_parser"
[ 10.331622] IPv6: ADDRCONF(NETDEV_UP): enP1s41: link is not ready
[ 10.331630] 8021q: adding VLAN 0 to HW filter on device enP1s41
[ 10.334389] mlx4_en: enP1s41: Link Up
[ 10.337072] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s41: link becomes ready
[ 10.340982] 8021q: adding VLAN 0 to HW filter on device enccw0.0.f500
[ 11.370898] mlx4_en: enP1s41: frag:0 - size:1534 prefix:0 stride:1536
[ 11.370902] mlx4_en: enP1s41: frag:1 - size:4096 prefix:1534 stride:4096
[ 11.370903] mlx4_en: enP1s41: frag:2 - size:3392 prefix:5630 stride:3584

Helmut
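
A check like the one above can be scripted. The following is a minimal Python sketch that scans kernel log text for the hw csum failure message and reports the timestamp and interface of each hit. The function name find_csum_failures and the regular expression are assumptions for this illustration, based only on the message format shown in this report.

```python
import re

# Matches lines such as "[ 246.051566] enp0s0: hw csum failure".
CSUM_FAILURE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(\S+): hw csum failure\s*$")


def find_csum_failures(log_text):
    """Return a list of (timestamp, interface) tuples, one per
    'hw csum failure' line found in the given kernel log text."""
    hits = []
    for line in log_text.splitlines():
        match = CSUM_FAILURE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample lines taken from this report, before and after the fix.
before_fix = """[ 246.051566] enp0s0: hw csum failure
[ 246.051581] Call Trace:"""
after_fix = """[ 10.334389] mlx4_en: enP1s41: Link Up
[ 10.337072] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s41: link becomes ready"""

print(find_csum_failures(before_fix))
print(find_csum_failures(after_fix))
```

The log text could be taken, for example, from the output of dmesg. An empty result after bringing the Mellanox interface up would match the verification reported here.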

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-23.41

---------------
linux (4.4.0-23.41) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1582431

  * zfs: disable module checks for zfs when cross-compiling (LP: #1581127)
    - [Packaging] disable zfs module checks when cross-compiling

  * Xenial update to v4.4.10 stable release (LP: #1580754)
    - Revert "UBUNTU: SAUCE: (no-up) ACPICA: Dispatcher: Update thread ID for
      recursive method calls"
    - Revert "UBUNTU: SAUCE: nbd: ratelimit error msgs after socket close"
    - Revert: "powerpc/tm: Check for already reclaimed tasks"
    - RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
    - ipvs: handle ip_vs_fill_iph_skb_off failure
    - ipvs: correct initial offset of Call-ID header search in SIP persistence
      engine
    - ipvs: drop first packet to redirect conntrack
    - mfd: intel-lpss: Remove clock tree on error path
    - nbd: ratelimit error msgs after socket close
    - ata: ahci_xgene: dereferencing uninitialized pointer in probe
    - mwifiex: fix corner case association failure
    - CNS3xxx: Fix PCI cns3xxx_write_config()
    - clk-divider: make sure read-only dividers do not write to their register
    - soc: rockchip: power-domain: fix err handle while probing
    - clk: rockchip: free memory in error cases when registering clock branches
    - clk: meson: Fix meson_clk_register_clks() signature type mismatch
    - clk: qcom: msm8960: fix ce3_core clk enable register
    - clk: versatile: sp810: support reentrance
    - clk: qcom: msm8960: Fix ce3_src register offset
    - lpfc: fix misleading indentation
    - ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p
      initialisation
    - mac80211: fix statistics leak if dev_alloc_name() fails
    - tracing: Don't display trigger file for events that can't be enabled
    - MD: make bio mergeable
    - Minimal fix-up of bad hashing behavior of hash_64()
    - mm, cma: prevent nr_isolated_* counters from going negative
    - mm/zswap: provide unique zpool name
    - ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
    - ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
    - xen: Fix page <-> pfn conversion on 32 bit systems
    - xen/balloon: Fix crash when ballooning on x86 32 bit PAE
    - xen/evtchn: fix ring resize when binding new events
    - HID: wacom: Add support for DTK-1651
    - HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
    - Input: zforce_ts - fix dual touch recognition
    - proc: prevent accessing /proc/<PID>/environ until it's ready
    - mm: update min_free_kbytes from khugepaged after core initialization
    - batman-adv: fix DAT candidate selection (must use vid)
    - batman-adv: Check skb size before using encapsulated ETH+VLAN header
    - batman-adv: Fix broadcast/ogm queue limit on a removed interface
    - batman-adv: Reduce refcnt of removed router when updating route
    - writeback: Fix performance regression in wb_over_bg_thresh()
    - MAINTAINERS: Remove asterisk from EFI directory names
    - x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
    - ARM: cpuidle: Pass on arm_cpuidle_s...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Kamal Mostafa (kamalmostafa) wrote :

Verified per Comment #37.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-24.43

---------------
linux (4.4.0-24.43) xenial; urgency=low

  [ Kamal Mostafa ]

  * CVE-2016-1583 (LP: #1588871)
    - ecryptfs: fix handling of directory opening
    - SAUCE: proc: prevent stacking filesystems on top
    - SAUCE: ecryptfs: forbid opening files without mmap handler
    - SAUCE: sched: panic on corrupted stack end

  * arm64: statically link rtc-efi (LP: #1583738)
    - [Config] Link rtc-efi statically on arm64

 -- Kamal Mostafa <email address hidden> Fri, 03 Jun 2016 10:02:16 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released