r8169 ethernet sometimes doesn't work after cold boot/boot

Bug #1841040 reported by Alistair Buxton
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Whenever the computer is rebooted there is a roughly 50% chance that the r8169 ethernet cannot negotiate a link. There is no relevant information in dmesg except for the "link down" message. The workaround is to keep rebooting the computer until it eventually works; this can take up to 7 reboots.

The first set of attached log files were generated while the ethernet was working.

The second set were generated on the 5.0 HWE kernel while the ethernet was NOT working.

Note that this is a regression. Ubuntu 16.04 ran on this computer for three years without the ethernet doing this even once.

Unloading and loading the r8169 module will eventually make it work.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-58-generic 4.15.0-58.64
ProcVersionSignature: Ubuntu 4.15.0-58.64-generic 4.15.18
Uname: Linux 4.15.0-58-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-id', '/dev/snd/pcmC2D0c', '/dev/snd/controlC2', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/controlC0', '/dev/snd/by-path', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D2c', '/dev/snd/pcmC1D1p', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Thu Aug 22 10:05:17 2019
HibernationDevice: RESUME=UUID=4ab51820-6203-45a8-a04d-07946b0fbe56
InstallationDate: Installed on 2013-12-22 (2068 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: ASUS All Series
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-58-generic root=UUID=e1e7d1a7-19e8-4203-ad6a-86a144db61bc ro quiet splash vt.handoff=1
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-58-generic N/A
 linux-backports-modules-4.15.0-58-generic N/A
 linux-firmware 1.173.9
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2019-01-30 (203 days ago)
dmi.bios.date: 06/18/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2201
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H87M-E
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2201:bd06/18/2015:svnASUS:pnAllSeries:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH87M-E:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.family: ASUS MB
dmi.product.name: All Series
dmi.product.version: System Version
dmi.sys.vendor: ASUS
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: al 14985 F.... pulseaudio
 /dev/snd/controlC0: al 14985 F.... pulseaudio
 /dev/snd/controlC1: al 14985 F.... pulseaudio
CurrentDesktop: Unity:Unity7:ubuntu
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=4ab51820-6203-45a8-a04d-07946b0fbe56
InstallationDate: Installed on 2013-12-22 (2068 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
MachineType: ASUS All Series
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-25-generic root=UUID=e1e7d1a7-19e8-4203-ad6a-86a144db61bc ro quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 5.0.0-25.26~18.04.1-generic 5.0.18
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-25-generic N/A
 linux-backports-modules-5.0.0-25-generic N/A
 linux-firmware 1.173.9
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: bionic
Uname: Linux 5.0.0-25-generic x86_64
UpgradeStatus: Upgraded to bionic on 2019-01-30 (203 days ago)
UserGroups: adm lpadmin sambashare sudo
_MarkForUpload: True
dmi.bios.date: 06/18/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2201
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H87M-E
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2201:bd06/18/2015:svnASUS:pnAllSeries:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH87M-E:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.family: ASUS MB
dmi.product.name: All Series
dmi.product.sku: All
dmi.product.version: System Version
dmi.sys.vendor: ASUS

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :
description: updated
Revision history for this message
Alistair Buxton (a-j-buxton) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Alistair Buxton (a-j-buxton) wrote : CRDA.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : IwConfig.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : Lspci.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : Lsusb.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : ProcEnviron.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : ProcModules.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : PulseList.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : UdevDb.txt

apport information

Revision history for this message
Alistair Buxton (a-j-buxton) wrote : WifiSyslog.txt

apport information

description: updated
description: updated
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I am not able to test that kernel as it is not signed.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Is it possible to disable secure boot and give the mainline kernel a try?

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I disabled secure boot and I was able to reproduce the problem with mainline on the first attempt.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Is it possible to perform a kernel bisection to find the regression commit?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

First, find the last good -rc kernel and the first bad -rc kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/

Then,
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good <the good version you found>
$ git bisect bad <the bad version you found>
$ make localmodconfig
$ make -j"$(nproc)" deb-pkg
Install the newly built kernel, then reboot with it.
If the issue still happens,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat from the "make deb-pkg" step until you find the commit that causes the regression.
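
When a bisect step produces a kernel that does not build, that commit can be marked with "git bisect skip" rather than guessed as good or bad. The helper below is a minimal sketch of that idea. The name build_or_skip is illustrative and not part of git; it reuses exit status 125, which "git bisect run" treats as "this commit cannot be tested".

```shell
#!/bin/sh
# Sketch: run a build command and map any failure to 125, the exit
# status that `git bisect run` interprets as "skip this commit".
# build_or_skip is an illustrative name, not a git command.
build_or_skip() {
    if "$@"; then
        return 0
    fi
    echo "build failed, this commit should be skipped" >&2
    return 125
}

# Manual use inside the bisect loop:
#   build_or_skip make -j"$(nproc)" deb-pkg || git bisect skip
```

Skipping many commits in a row can leave git unable to narrow the result down to a single commit, so it is worth keeping the number of skips as low as the build failures allow.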

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

Why -rc? The last working kernel I know of is from the 4.4 series.

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I am working on this but it is slow as the bug is difficult to reproduce. It seems to happen more often after cold boots.

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

(and by cold boots I mean very cold - like the computer has been switched off for several hours.)

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I have managed to reproduce this as far back as 2f34c1231bfc9f, which is somewhere between 4.11 and 4.12-rc1. I have been unable to reproduce it with 4.11, but this may just be due to bad luck. I will continue testing tomorrow.

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

It is perhaps worth mentioning that there are only two patches to r8169 in this time frame:

commit 1bcf165ac6556dc55a596d524b8187d1ba7a8c7d
Author: Zhu Yanjun <email address hidden>
Date: Sun Mar 12 05:02:54 2017 -0400

    r8169: replace init_timer with setup_timer

    Replace init_timer with setup_timer to simplify the source code.

    Signed-off-by: Zhu Yanjun <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf
Author: Philippe Reynes <email address hidden>
Date: Thu Feb 23 22:34:43 2017 +0100

    net: realtek: r8169: use new api ethtool_{get|set}_link_ksettings

    The ethtool api {get|set}_settings is deprecated.
    We move this driver to new api {get|set}_link_ksettings.

    As I don't have the hardware, I'd be very pleased if
    someone may test this patch.

    Signed-off-by: Philippe Reynes <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

The difficulty of reproducing indicates it may be a timing-related issue, and the failure to negotiate a link suggests that the second patch may also be involved.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

I think it's unlikely that these two caused the regression.

Please file an upstream bug at https://bugzilla.kernel.org/
Product: Drivers
Component: Network

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

Seems like you may be right. I could not reproduce it with 1bcf165ac6556d after about 10 attempts. This isn't conclusive unfortunately. I will continue with a regular bisect.

Revision history for this message
Vladislav Rubtsov (vladikcomper) wrote :

I've a similar bug occurring on the following kernel versions:
* 5.0.0-23-generic
* 5.0.0-25-generic
* 5.0.0-27-generic

However, quite a few things are different in my case.
As seen in the logs, the "Link is Down" message is quickly followed by a "Link is Up" message, but the interface fails to pass any traffic. According to the packet statistics, packets are transmitted but never received.
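
One way to check the transmitted and received counts is to read the per-interface counters the kernel exposes under /sys/class/net. This is a minimal sketch: show_packet_counters and the SYSFS_NET variable are illustrative names, and the interface name has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: print TX and RX packet counters for one interface from sysfs.
# SYSFS_NET and show_packet_counters are illustrative names.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

show_packet_counters() {
    stats="$SYSFS_NET/$1/statistics"
    # Fail if either counter file cannot be read (e.g. unknown interface).
    tx=$(cat "$stats/tx_packets") || return 1
    rx=$(cat "$stats/rx_packets") || return 1
    echo "$1 tx_packets=$tx rx_packets=$rx"
}
```

Running "show_packet_counters enp3s0f1" twice, a few seconds apart, would show tx_packets growing while rx_packets stays the same if the interface is in the state described here.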

Kernel logs also show this interesting backtrace:

12:04:55 kernel: ------------[ cut here ]------------
12:04:55 kernel: NETDEV WATCHDOG: enp3s0f1 (r8169): transmit queue 0 timed out
12:04:55 kernel: WARNING: CPU: 4 PID: 0 at /build/linux-hwe-zHO4ZF/linux-hwe-5.0.0/net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
12:04:55 kernel: Modules linked in: ccm ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_
12:04:55 kernel: intel_rapl_perf rtsx_pci_ms asus_nb_wmi snd joydev input_leds mei_me asus_wmi soundcore memstick serio_raw sparse_keymap mei mxm_wmi processor_thermal_device intel_pch_the
12:04:55 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: P OE 5.0.0-23-generic #24~18.04.1-Ubuntu
12:04:55 kernel: Hardware name: ASUSTeK COMPUTER INC. X550VX/X550VX, BIOS X550VX.302 05/04/2017
12:04:55 kernel: RIP: 0010:dev_watchdog+0x221/0x230
12:04:55 kernel: Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 ba 0a f0 00 01 e8 03 39 fc ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 a0 8d da a3 e8 4f 63 79 ff <0f> 0b eb c0 90 66 2e 0f 1f 84 00 00 00
12:04:55 kernel: RSP: 0018:ffff968437b03e58 EFLAGS: 00010286
12:04:55 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
12:04:55 kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff968437b16440
12:04:55 kernel: RBP: ffff968437b03e88 R08: 00000000000003ca R09: 0000000000000004
12:04:55 kernel: R10: ffff968437b03ee0 R11: 0000000000000001 R12: 0000000000000001
12:04:55 kernel: R13: ffff9684352b0000 R14: ffff9684352b04c0 R15: ffff968429a4da80
12:04:55 kernel: FS: 0000000000000000(0000) GS:ffff968437b00000(0000) knlGS:0000000000000000
12:04:55 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
12:04:55 kernel: CR2: 00007f8bce55be90 CR3: 0000000230a0e004 CR4: 00000000003606e0
12:04:55 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
12:04:55 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
12:04:55 kernel: Call Trace:
12:04:55 kernel: <IRQ>
12:04:55 kernel: ? pfifo_fast_reset+0x110/0x110
12:04:55 kernel: call_timer_fn+0x30/0x130
12:04:55 kernel: run_timer_softirq+0x3ff/0x450
12:04:55 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: P OE 5.0.0-23-generic #24~18.04.1-Ubuntu
12:04:55 kernel: Hardware name: ASUSTeK COMPUTER INC. X550VX/X550VX, BIOS X550VX.302 05/04/2017
12:04:55 kernel: RIP: 0010:dev_watchdog+0x221/0x230
12:04:55 kernel: Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 ba 0a f0 00 01 e8 03 39 fc ff 89 d9 48 89 c2 4c 89 ee 48 c
12:04:55 kernel: RSP: 0018:ffff968437b03e58 EFLAGS: 00010286
12:04:55 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
12:04:55 kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff968437...


Revision history for this message
Vladislav Rubtsov (vladikcomper) wrote :

I *think* there is a better and more pleasant workaround than rebooting every time this bug affects you.

Try suspending your computer, then waking it up.
In most cases the Ethernet interface fails to start at all. If it does, reloading its driver may help at this point:

# modprobe -r r8169
# modprobe r8169

This worked for me on 5.0.0-25-generic.
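
Since the original report says that reloading the module only "eventually" works, the reload can be wrapped in a small retry loop. The following is a minimal sketch, not a tested fix. link_is_up, reload_until_link and the SYSFS_NET variable are illustrative names. It assumes the interface is brought back up automatically after the module is reloaded (for example by NetworkManager), because the sysfs "carrier" file can only be read while the interface is administratively up.

```shell
#!/bin/sh
# Sketch only: reload r8169 until the link comes up or we give up.
# SYSFS_NET, link_is_up and reload_until_link are illustrative names.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds when the kernel reports carrier (file reads "1") for the interface.
link_is_up() {
    [ "$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null)" = "1" ]
}

# reload_until_link IFACE [MAX_TRIES] [SETTLE_SECONDS]
# Needs root, because it calls modprobe.
reload_until_link() {
    iface=$1
    tries=${2:-5}
    settle=${3:-10}
    n=1
    while [ "$n" -le "$tries" ]; do
        if link_is_up "$iface"; then
            return 0
        fi
        modprobe -r r8169 && modprobe r8169
        # Give the interface time to come up and negotiate a link.
        sleep "$settle"
        n=$((n + 1))
    done
    # Final check after the last reload.
    link_is_up "$iface"
}
```

For example, "reload_until_link eth0 5 10" run as root would try up to five reloads with a ten second pause after each; the interface name and both numbers are placeholders to adjust.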

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

Reloading the module does clear the error for me.

I have never seen the problem where the link comes up but it won't receive packets.

Revision history for this message
Brian Foster (blfoster) wrote :

Make sure your unit really is an r8169, and not an r8168.
Whilst (as I understand it) the r8169 module is supposed
to work with an r8168, at least on my laptop (Acer Aspire E17
(E5-722-425Q), currently 18.04 (up-to-date)),
using the r8169 module has resulted in "flaky" behaviour
for as long as I can remember, including becoming unusable
after a (some?) warm reboot.
(Sorry, I cannot recall any specific details,
albeit I vaguely recall that was not the only problem I had.)
Using the r8168 dkms module, there have not been any(?) problems.
However, I believe the r8168 module (in package `r8168-dkms') is
specific to r8168 hardware and will not work with an r8169 unit.

Using `lspci' my r8168 is listed as:

   Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)
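
The lspci line names the chip, not the driver, so it can help to check which kernel driver is actually bound to the interface. "lspci -k" reports this as "Kernel driver in use", and the same information is available from sysfs. The sketch below reads it from there; bound_driver and the SYSFS_NET variable are illustrative names.

```shell
#!/bin/sh
# Sketch: report which kernel driver is bound to a network interface.
# SYSFS_NET and bound_driver are illustrative names.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

bound_driver() {
    link="$SYSFS_NET/$1/device/driver"
    # The driver entry is a symlink into the bus driver directory.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

Calling "bound_driver eth0" would print r8169 when the in-kernel driver is in use, or r8168 when the dkms module has taken over the device.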

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I finally completed the bisection. An extremely large number of kernels failed to build. This is the result:

# only skipped commits left to test
# possible first bad commit: [4d6ca227c768b50b05cf183974b40abe444e9d0c] Merge branch 'for-4.12/asus' into for-linus
# possible first bad commit: [800f3eef8ebc1264e9c135bfa892c8ae41fa4792] Merge branch 'for-4.12/sony' into for-linus
# possible first bad commit: [18fc2163b8a410d4d36b8f44658580731c0afaa1] Merge branches 'for-4.11/upstream-fixes', 'for-4.12/accutouch', 'for-4.12/cp2112', 'for-4.12/hid-core-null-state-handling', 'for-4.12/hiddev', 'for-4.12/i2c-hid', 'for-4.12/innomedia', 'for-4.12/logitech-hidpp-battery-power-supply', 'for-4.12/multitouch', 'for-4.12/nti', 'for-4.12/upstream' and 'for-4.12/wacom' into for-linus
# possible first bad commit: [d529a4ad91efcf68b65440c6555895fd7ad5a08e] HID: usbhid: Add HID_QUIRK_NOGET for Aten CS-1758 KVM switch
# possible first bad commit: [149f6f6b8ff3288a88dc34755de7719637cc8cc4] HID: wacom: Move wacom_remote_irq and wacom_remote_status_irq
# possible first bad commit: [ed1fa736839eb97b1d066e36150df28251095eef] HID: wacom: generic: sync pad events only for actual packets
# possible first bad commit: [040fc001765d374776353cb4f8b03ea7fa41e3cd] HID: sony: remove redundant check for -ve err
# possible first bad commit: [a676bdc422241822130364443a6a65b6520440ba] HID: sony: Make sure to unregister sensors on failure
# possible first bad commit: [77b499e739ed5561e5026fa7140ae53f6c4d1d8e] HID: sony: Make DS4 bt poll interval adjustable
# possible first bad commit: [5caceb0695d0498b8c931cbc3cdafd99bd37b8ae] HID: sony: Set proper bit flags on DS4 output report
# possible first bad commit: [39254a13d64bc69b83f4097dacc4117d7b865118] HID: sony: DS4 use brighter LED colors
# possible first bad commit: [b8f0970d2c5a03f5a431d51af74dd1a0ec62fe91] HID: sony: Improve navigation controller axis/button mapping
# possible first bad commit: [5a144be39c3a32b3072529ccee79e4ec9eb9b275] HID: sony: Use DS3 MAC address as unique identifier on USB
# possible first bad commit: [a4bf6153b317754e058ad9c7f5f02367e0bfdcc8] HID: logitech-hidpp: add a sysfs file to tell we support power_supply
# possible first bad commit: [7f7ce2a258b47f9510ad613328c046a3ff9426b0] HID: logitech-hidpp: enable HID++ 1.0 battery reporting
# possible first bad commit: [696ecef9b5874a312d74050525217f48d0f1b349] HID: logitech-hidpp: add support for battery status for the K750
# possible first bad commit: [5b036ea18e13e006e99cb197e9aceb09d897d20a] HID: logitech-hidpp: battery: provide CAPACITY_LEVEL
# possible first bad commit: [14f437a1d7b49a2e873f63436526f9aed3a781c3] HID: logitech-hidpp: rename battery level into capacity
# possible first bad commit: [284f8d7592673a7a6dae96d082806d324378f212] HID: logitech-hidpp: battery: provide ONLINE property
# possible first bad commit: [9b9c519f1fe3ec9d2518a99c71c54f5c25eef345] HID: logitech-hidpp: notify battery on connect
# possible first bad commit: [a9525b80feb1b6ae40244b16b0558cbdc64f28cd] HID: logitech-hidpp: return an error if the queried feature is not present
# possible first bad commit: [a52ec107fa81c8f799654b860e262f07bd14d63a] HID: logitech-hidpp: create the b...


Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

It is an r8168 card. The problem continues to happen with the dkms module, however.

Revision history for this message
Alistair Buxton (a-j-buxton) wrote :

I can no longer reproduce this in 22.04.
