r8169 stops working after a while

Bug #121815 reported by Soren Hansen
This bug affects 1 person
Affects                        Status     Importance  Assigned to    Milestone
linux (Ubuntu)                 Expired    Undecided   Unassigned
linux-source-2.6.22 (Ubuntu)   Won't Fix  Medium      Kyle McMartin

Bug Description

My brand new machine has a Realtek NIC in it (using the r8169 driver).

When running Gutsy (2.6.22-6.13), the NIC just stops working after a while. After that, if I remove the driver and load it again, the MAC is read as ff:ff:ff:ff:ff:ff.

Revision history for this message
Soren Hansen (soren) wrote :

I just had it happen to me with the Feisty kernel. Just once, and after using it for quite a while. With the gutsy kernel it stops working within around 10 minutes of booting.

I noticed that when the NIC is detected on boot, it says:
[ 37.723306] eth0: RTL8168b/8111b at 0xffffc2000001c000, 00:13:8f:fc:19:a3, IRQ 17

..but after it has stopped working and I've reloaded the module, it now says:
[ 4638.077249] eth0: RTL8100e at 0xffffc2000001c000, ff:ff:ff:ff:ff:ff, IRQ 17

Note that both the model and the MAC have changed. When I had the debugging options enabled, it also told me that the mac_version had changed from 0x0c to 0x0f, if that means anything to you.
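
The two probe lines above can be compared mechanically. The following is a minimal Python sketch, not part of the original report: the regular expression and the helper names `parse_probe_line` and `looks_wedged` are hypothetical and only cover the line format quoted in this comment. An all-ones MAC is usually a sign that the driver's register reads are no longer being answered by the device, which would also be consistent with the chip being misidentified, though the thread does not confirm that cause.

```python
import re

# Matches r8169 probe lines of the form quoted in this report, e.g.
# "[ 37.723306] eth0: RTL8168b/8111b at 0xffffc2000001c000, 00:13:8f:fc:19:a3, IRQ 17"
# Assumption: this pattern is written for these two lines only.
PROBE_RE = re.compile(
    r"(?P<iface>eth\d+): (?P<chip>\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), IRQ (?P<irq>\d+)",
    re.IGNORECASE,
)


def parse_probe_line(line):
    """Return the interface, chip name, MAC and IRQ from a probe line, or None."""
    m = PROBE_RE.search(line)
    if m is None:
        return None
    return {
        "iface": m.group("iface"),
        "chip": m.group("chip"),
        "mac": m.group("mac").lower(),
        "irq": int(m.group("irq")),
    }


def looks_wedged(probe):
    """True if the reported MAC is all ones, as seen after the module reload."""
    return probe["mac"] == "ff:ff:ff:ff:ff:ff"


at_boot = parse_probe_line(
    "[ 37.723306] eth0: RTL8168b/8111b at 0xffffc2000001c000, 00:13:8f:fc:19:a3, IRQ 17"
)
after_reload = parse_probe_line(
    "[ 4638.077249] eth0: RTL8100e at 0xffffc2000001c000, ff:ff:ff:ff:ff:ff, IRQ 17"
)

for label, probe in (("boot", at_boot), ("reload", after_reload)):
    print(label, probe["chip"], probe["mac"], "wedged" if looks_wedged(probe) else "ok")
```

Feeding it the output of `dmesg` line by line would flag the point at which the card starts reporting the bogus address.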

Revision history for this message
Soren Hansen (soren) wrote :

Applying this patchset http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.22-rc3/ to current Gutsy git seems to have fixed it. I've not seen the crash yet. I'll keep testing it.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Soren - There are 18 patches at that URL. Can you tell which one in particular fixed your problem?

Revision history for this message
Soren Hansen (soren) wrote :

Not yet. Upstream suggested applying the entire patchset (referenced here: http://marc.info/?l=linux-netdev&m=118107299330484&w=2 ), so I just did that. I've now had the machine saturate my internet connection for 8 hours straight and it's still working, so I'd definitely say it's fixed. Tomorrow, I'll see if I can identify which one fixed it.

Soren Hansen (soren)
description: updated
Revision history for this message
Soren Hansen (soren) wrote :

Murphy strikes again. Now I can't even make it fail with a kernel compiled from a clean git tree (from ubuntu-gutsy), and as far as I can see, nothing in it has changed since 6.13?

The issue sounds strikingly similar to this: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/76489 and it's also the same motherboard..

Revision history for this message
Soren Hansen (soren) wrote :

Uh.. No, it's actually not *that* similar to bug #76489. Both deal with the same motherboard and NIC, and in both cases it stops working after a while, but I have *nothing* in dmesg, and I can remove the module and load it again, and TSO is already off.

Also, it seems that rtl8169_vlan_rx_kill_vid (and the matching ref in the net_device) was removed since 6.13. I'm not entirely sure what that function was for to be honest.

Revision history for this message
Kyle McMartin (kyle) wrote :

Hi Soren, can you use a stock kernel and turn TSO off with ethtool:

sudo ethtool -k eth0

See if TCP segmentation offload (TSO) is on; if it is, turn it off with

sudo ethtool -K eth0 tso off

And see if that fixes your problems at all.

Thanks, Kyle
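
The check Kyle describes can be scripted. This is a minimal Python sketch, assuming the `ethtool -k` output has been captured as text; the helper name `tso_enabled` and the sample output are illustrative and do not come from this thread. It accepts both the spaced spelling ("tcp segmentation offload") and the hyphenated one ("tcp-segmentation-offload"), since the label is assumed to vary between ethtool versions.

```python
def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO state in `ethtool -k` text, or None if absent."""
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalise "tcp-segmentation-offload" and "tcp segmentation offload".
        name = key.strip().lower().replace("-", " ")
        if name == "tcp segmentation offload":
            state = value.split()  # tolerates trailing notes such as "[fixed]"
            return bool(state) and state[0] == "on"
    return None


# Illustrative sample only; real output lists more offload settings.
SAMPLE = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
"""

print(tso_enabled(SAMPLE))
```

On a live system the text could be obtained with `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout`; if the function returns True, `sudo ethtool -K eth0 tso off` is the command from the comment above.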

Changed in linux-source-2.6.22:
assignee: nobody → kyle
status: New → Incomplete
Revision history for this message
Soren Hansen (soren) wrote :

I'm 99% sure I checked that and found out it was already off, but I'll be sure to check it when I'm rebooting anyway..

Revision history for this message
Soren Hansen (soren) wrote :

TSO is already off.

Revision history for this message
Soren Hansen (soren) wrote :

Still happens with 8.18 generic image (with TSO off).

Changed in linux-source-2.6.22:
status: Incomplete → Confirmed
Revision history for this message
Henrik Nilsen Omma (henrik) wrote :

Soren, is your nic still dropping out with current gutsy?

Changed in linux-source-2.6.22:
importance: Undecided → Medium
Revision history for this message
Chris Jones (cmsj) wrote :

I have the same motherboard as Soren.

I switched back to the onboard r8169 last night and set some iso transfer loops going.
I got home from work this evening and the interface was dead. ethtool -S just seemed to hang.

Revision history for this message
Soren Hansen (soren) wrote :

..and yes, I also still see it.

Revision history for this message
Martin Emrich (emme) wrote :

Same problem here. As the card is powered via PCI 5VSB, I have to pull the power plug every time it happens (which is whenever the kernel crashes).

Mainboard: ASRock 939Dual-SATAII
NIC: Netgear Gigabit PCI card (RTL8169)
Kernel: linux-image-2.6.22-14-xen on gutsy i386

lspci -v :

04:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
        Subsystem: Netgear Unknown device 311a
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        I/O ports at d800 [size=256]
        Memory at feaffc00 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at 88000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

I'll try disabling TSO in my /etc/rc.local now.

Ciao

Martin

Revision history for this message
Krunoslav Husak (h00s) wrote :

I have this problem too. The network just stops working, and once it has, the only thing I can do to make it work again is reboot :/
TSO is off!

Mainboard: Gigabyte GA-P35-DS3P
On board network
Kernel: 2.6.22-14-generic

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
        Subsystem: Giga-byte Technology Unknown device e000
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at b000 [size=256]
        Memory at e8000000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
        Capabilities: <access denied>

Revision history for this message
albech (ubuntu-albech) wrote :

I have the same problem. Running 2-10 min then a completely dead NIC.

RTL8111/8168B

Guess I'm better off installing some old 3COM NIC in the box.

Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
jasonmartens (me-jasonmartens) wrote :

I also have an r8169 NIC, and I believe that this issue is related to gigabit auto-negotiation. My cabling is not very good, and when the NIC auto-negotiates 1000 Full Duplex I constantly get these errors:

Sep 13 17:14:00 he-man kernel: [ 1928.456477] NETDEV WATCHDOG: eth0: transmit timed out
Sep 13 17:14:00 he-man kernel: [ 1929.271582] r8169: eth0: link up

However, if the NIC auto negotiates 100 Full Duplex, it works fine. The problem is, the system must be rebooted to force the card to re-negotiate. I've tried using ethtool to force the speed/duplex but it appears to not be supported on this card, and pulling the plug also does not force the nic to re-negotiate.

I will try with 2.6.27 if I can get the kernel packages without upgrading to Ibex...
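
The watchdog messages quoted in this comment can be counted per interface to see how often the transmit queue stalls. This is a minimal Python sketch, assuming the kernel log is available as a list of lines; the pattern and the helper name `count_tx_timeouts` are hypothetical and match only the message wording shown here.

```python
import re
from collections import Counter

# Matches "NETDEV WATCHDOG: eth0: transmit timed out" as quoted in this thread.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (?P<iface>[^\s:]+): transmit timed out")


def count_tx_timeouts(lines):
    """Count transmit-timeout watchdog messages per network interface."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group("iface")] += 1
    return counts


sample = [
    "Sep 13 17:14:00 he-man kernel: [ 1928.456477] NETDEV WATCHDOG: eth0: transmit timed out",
    "Sep 13 17:14:00 he-man kernel: [ 1929.271582] r8169: eth0: link up",
]

timeouts = count_tx_timeouts(sample)
print(dict(timeouts))
```

Running it over a full syslog before and after a change (a different cable, a forced link speed, a newer kernel) gives a rough way to compare how frequently the timeouts occur.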

Revision history for this message
Thibouf (thibouf) wrote :

I have a very similar problem (I think), but with the 2.6.24-19-generic kernel:

After a while (it can be very long, or just 5 min...), the Ethernet connection crashes, and I get this message in dmesg:
[ 2334.384327] NETDEV WATCHDOG: eth0: transmit timed out
[ 2335.212774] r8169: eth0: link up

When it happens, the only way to make it work again is to restart the computer.

/etc/init.d/networking restart does nothing.
If I unload/reload the r8169 module, the MAC in ifconfig becomes ff:ff:ff:ff:ff:ff

I noticed that it happens more often when Azureus is running..

Moreover, I do not know if it is the same bug, but if I plug in the Ethernet cable while the system is booting, most of the time my Ethernet connection will not work at all. I need to plug it in before turning on the computer or after being logged in to be sure it will work (at least at the beginning..)

I cannot test Intrepid Alpha right now; I will be able to in 2 weeks, I think.

Here is my lspci -v :

05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
 Subsystem: Micro-Star International Co., Ltd. Unknown device 3fad
 Flags: bus master, fast devsel, latency 0, IRQ 218
 I/O ports at 1000 [size=256]
 Memory at 93120000 (64-bit, non-prefetchable) [size=4K]
 Expansion ROM at 93100000 [disabled] [size=128K]
 Capabilities: [40] Power Management version 2
 Capabilities: [48] Vital Product Data
 Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
 Capabilities: [60] Express Endpoint IRQ 0
 Capabilities: [84] Vendor Specific Information

Revision history for this message
jasonmartens (me-jasonmartens) wrote :

I just tested again with linux-image-2. 2.6.27-3.4, and the network card worked for me. It autonegotiated 1000/FD but it worked this time. I could also manually change the speed and duplex using ethtool. I did not use it for an extended period of time, so it's possible it would crash like Thibouf mentioned above after extended use.

Revision history for this message
Robin (robingape) wrote :

I have a misbehaving RTL8169 NIC in a Jetway mini-ITX Atom board, running 64 bit Hardy Heron server. The live version DVD is currently being downloaded, prior to testing tomorrow. This note is to keep the bug live, which is more than the NIC currently is! (This is written on a different machine, of course)

Revision history for this message
Dimitrios Symeonidis (azimout) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Johan (ignesia) wrote :

This is still broken on 2.6.32-27-generic #49-Ubuntu SMP x86_64 GNU/Linux

Revision history for this message
Roberto C. Morano (rcmorano) wrote :

Still broken on 2.6.35-22-generic under Maverick.
