Mellanox driver crash fix in legacy EQ mode
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
Trusty | Fix Released | Undecided | Ming Lei | |
Bug Description
1. Reproduction steps:
- ethtool eth0 rx 128
- iperf -s
- on the iperf client side, run:
iperf -c IP_SRV -P128 -t 120
- the Mellanox driver then crashes:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at net/sched/
NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 4 timed out
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.11.
Call trace:
[<ffffffc000088
[<ffffffc000088
[<ffffffc0006bf
[<ffffffc000095
[<ffffffc000095
[<ffffffc0005c1
[<ffffffc0000a0
[<ffffffc0000a0
[<ffffffc000099
[<ffffffc000099
[<ffffffc000085
[<ffffffc000081
Exception stack(0xffffffc
bde0: 009c8000 ffffffc0 00a31300 ffffffc0
be00: 009cbf30 ffffffc0 000855e0 ffffffc0 0037356e 00000000 00000000 00000000
be20: fff87954 ffffffc7 00010000 00000000 a455f900 ffffffc7 00000000 00000000
be40: 00a35000 ffffffc0 005e1080 ffffffc0 009d4c90 ffffffc0 009cbd40 ffffffc0
be60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
be80: 00000000 00000000 aadd05a0 0000007f 0058f434 ffffffc0 aade3c80 0000007f
bea0: aabe8660 0000007f 009c8000 ffffffc0 00a31300 ffffffc0 00a278d8 ffffffc0
bec0: 009c8000 ffffffc0 00a27022 ffffffc0 00000001 00000000 008cbd20 ffffffc0
bee0: 006c9d70 ffffffc0 00080408 ffffffc0 00080200 00000040 009cbf30 ffffffc0
bf00: 000855dc ffffffc0 009cbf30 ffffffc0
[<ffffffc000084
[<ffffffc0000d1
[<ffffffc0006bb
[<ffffffc000993
---[ end trace a66d2f499386c240 ]---
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 7, t=6002 jiffies, g=1924, c=1923, q=1316)
Task dump for CPU 4:
irqbalance R running task 0 999 1 0x00000000
Call trace:
[<ffffffc000085
[<ffffffc000091
[<ffffffc000081
Exception stack(0xffffffc
be20: c3ecbe80 ffffffc7 00163f4c ffffffc0
be40: c69e2f00 ffffffc7 a133f000 0000007f 00000000 00000000 a147db7c 0000007f
be60: 60000000 00000000 00000015 00000000 ffffffff ffffffff a148625c 0000007f
be80: fb6e3b70 0000007f 000849ec ffffffc0 00000008 00000000 a133f000 0000007f
bea0: ffffffff ffffffff 3b215828 00000000 53e18915 00000000 00000008 00000000
bec0: fb6e3ae0 0000007f 00000000 00000000 00000003 00000000 a133f000 0000007f
bee0: 00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 2f2f2f2f 302f2f2f 00000040 00000000 2e312dff 5e6f6c72
bf20: 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000 00000000 00000000
bf40: 00000020 00000000 a15075a0 0000007f
Task dump for CPU 5:
swapper/5 R running task 0 0 1 0x00010002
Call trace:
[<ffffffc000085
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 7, t=24007 jiffies, g=1924, c=1923, q=1404)
Task dump for CPU 4:
irqbalance R running task 0 999 1 0x00000000
Call trace:
[<ffffffc000085
[<ffffffc000091
[<ffffffc000081
Exception stack(0xffffffc
be20: c3ecbe80 ffffffc7 00163f4c ffffffc0
be40: c69e2f00 ffffffc7 a133f000 0000007f 00000000 00000000 a147db7c 0000007f
be60: 60000000 00000000 00000015 00000000 ffffffff ffffffff a148625c 0000007f
be80: fb6e3b70 0000007f 000849ec ffffffc0 00000008 00000000 a133f000 0000007f
bea0: ffffffff ffffffff 3b215828 00000000 53e18915 00000000 00000008 00000000
bec0: fb6e3ae0 0000007f 00000000 00000000 00000003 00000000 a133f000 0000007f
bee0: 00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 2f2f2f2f 302f2f2f 00000040 00000000 2e312dff 5e6f6c72
bf20: 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000 00000000 00000000
bf40: 00000020 00000000 a15075a0 0000007f
Task dump for CPU 5:
swapper/5 R running task 0 0 1 0x00010002
Call trace:
[<ffffffc000085
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 5} (detected by 7, t=42012 jiffies, g=1924, c=1923, q=1553)
Task dump for CPU 4:
irqbalance R running task 0 999 1 0x00000002
Call trace:
[<ffffffc000085
[<ffffffc000091
[<ffffffc000081
Exception stack(0xffffffc
be20: c3ecbe80 ffffffc7 00163f4c ffffffc0
be40: c69e2f00 ffffffc7 a133f000 0000007f 00000000 00000000 a147db7c 0000007f
be60: 60000000 00000000 00000015 00000000 ffffffff ffffffff a148625c 0000007f
be80: fb6e3b70 0000007f 000849ec ffffffc0 00000008 00000000 a133f000 0000007f
bea0: ffffffff ffffffff 3b215828 00000000 53e18915 00000000 00000008 00000000
bec0: fb6e3ae0 0000007f 00000000 00000000 00000003 00000000 a133f000 0000007f
bee0: 00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 2f2f2f2f 302f2f2f 00000040 00000000 2e312dff 5e6f6c72
bf20: 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000 00000000 00000000
bf40: 00000020 00000000 a15075a0 0000007f
Task dump for CPU 5:
swapper/5 R running task 0 0 1 0x00010002
Call trace:
[<ffffffc000085
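To confirm that a reproduction run hit the same symptoms, a captured kernel log can be scanned for the two message types quoted above: the NETDEV WATCHDOG transmit timeout and the rcu_sched stall reports. The following is only a minimal sketch; the function name `scan_kernel_log` and the regular expressions are assumptions derived from the log lines in this report, not part of any existing tool, and other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this report (assumption:
# the kernel under test prints these messages in the same format).
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
RCU_STALL_RE = re.compile(
    r"INFO: rcu_sched detected stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\}.*?t=(?P<jiffies>\d+) jiffies"
)


def scan_kernel_log(text):
    """Return (watchdog_timeouts, rcu_stalls) found in kernel log text.

    watchdog_timeouts: list of (interface, driver, queue) tuples.
    rcu_stalls: list of (stalled_cpu_list, jiffies) tuples.
    """
    watchdogs = []
    stalls = []
    for line in text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            watchdogs.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
            continue
        match = RCU_STALL_RE.search(line)
        if match:
            cpus = [int(cpu) for cpu in match.group("cpus").split()]
            stalls.append((cpus, int(match.group("jiffies"))))
    return watchdogs, stalls


# Two lines copied from the log in this report, used as sample input.
sample = (
    "NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 4 timed out\n"
    "INFO: rcu_sched detected stalls on CPUs/tasks: { 4 5} "
    "(detected by 7, t=6002 jiffies, g=1924, c=1923, q=1316)\n"
)
timeouts, rcu_stalls = scan_kernel_log(sample)
print("watchdog timeouts:", timeouts)
print("rcu stalls:", rcu_stalls)
```

On a kernel carrying the fix, running the same reproduction steps and feeding the saved `dmesg` output to `scan_kernel_log` should return two empty lists.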
Changed in linux (Ubuntu Trusty):
assignee: nobody → Ming Lei (tom-leiming)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
tags: added: bot-stop-nagging trusty
Changed in linux (Ubuntu Trusty):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-done-trusty
The issue is fixed by upstream commit 92df54ee3dde385 ("net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map").