10ec:8139 Network connection rtl8139 lost after some hours of inactivity and comes up again on user interaction

Bug #997767 reported by Luca Carlon
This bug affects 2 people

Affects          Status      Importance  Assigned to  Milestone
linux (Ubuntu)   Incomplete  Medium      Unassigned

Bug Description

Since upgrading to Kubuntu 12.04, I have been experiencing this issue: after some hours without user activity on my desktop, the network connection suddenly disappears, and the server is no longer reachable from the network. By "inactivity" I mean only the absence of direct user input: the connection sometimes drops even while the machine is streaming data to the network.

Once this happens, plugging in the mouse or simply pressing any key on the keyboard turns the screen back on (as expected), and the system becomes reachable on the network again.
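To pin down exactly when the server drops off the network and when a keypress brings it back, it may help to log reachability from a second machine and compare the timestamps with dmesg. The following is a minimal sketch, assuming bash and the Linux iputils ping; the function names probe, only_changes and watch_host are hypothetical and not part of any tool mentioned in this report.

```bash
# probe HOST: print "up" or "down" after one ping attempt
# (Linux iputils ping: -c 1 sends one echo request, -W 2 waits up to 2 s).
probe() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# only_changes: read one state per line from stdin and print it with a
# timestamp only when it differs from the previous state, so an outage
# shows up as exactly one "down" line followed by one "up" line.
only_changes() {
    local prev="" cur
    while IFS= read -r cur; do
        if [[ $cur != "$prev" ]]; then
            printf '%s %s\n' "$(date '+%F %T')" "$cur"
            prev=$cur
        fi
    done
}

# watch_host HOST [INTERVAL]: probe HOST every INTERVAL seconds (default 60)
# until interrupted with Ctrl-C.
watch_host() {
    local host=$1 interval=${2:-60}
    while true; do
        probe "$host"
        sleep "$interval"
    done | only_changes
}
```

Run it on another machine, e.g. watch_host <server-address> >> outages.log, and leave it overnight; logging only state changes keeps the file short even over several days.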

[Ignore apport collection 9-29: it comes from the wrong machine!]

---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu7
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: luca 2304 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'V8235'/'VIA 8235 with ALC650F at 0xe800, irq 22'
   Mixer name : 'Realtek ALC650F'
   Components : 'AC97a:414c4723'
   Controls : 57
   Simple ctrls : 36
Card1.Amixer.info:
 Card hw:1 'UART'/'MPU-401 UART at 0x330, irq 5'
   Mixer name : ''
   Components : ''
   Controls : 0
   Simple ctrls : 0
Card1.Amixer.values:

DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=042c2f09-8574-4670-8f69-9a284f565571
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.

 eth0 no wireless extensions.
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: root=UUID=fa298373-c25e-4dd6-a83c-5d2c65e4e996 ro quiet splash
ProcVersionSignature: Ubuntu 3.2.0-24.38-generic 3.2.16
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-24-generic N/A
 linux-backports-modules-3.2.0-24-generic N/A
 linux-firmware 1.79
RfKill:

SourcePackage: linux
Tags: precise precise
Uname: Linux 3.2.0-24-generic i686
UpgradeStatus: Upgraded to precise on 2012-04-28 (29 days ago)
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
WpaSupplicantLog:

dmi.bios.date: 12/17/2002
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F9
dmi.board.name: GA-7VAX
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF9:bd12/17/2002:svn:pn:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-7VAX:rvr:cvn:ct3:cvr:

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I attached my dmesg output. It clearly shows that nothing is logged for some time and then, at the end, the USB mouse being plugged in. That is when the connection came back up.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/997767/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
Luca Carlon (carlon-luca) wrote : Re: Network connection is lost after some hours of inactivity and comes up again on user interaction

I ran a new test, and it seems the problem occurs only when KDE/X11 is running. I stopped X and stayed on the text console only, and the network seems to stay up.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I tested with an older kernel (2.6.38-11, which used to work) and reproduced the issue.

Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Is this something as simple as the machine going into suspend/sleep when you aren't typing at it?

I'd go into the KDE system settings->Power Management and check the options; in particular the 'suspend session' option.

Dave

Revision history for this message
Luca Carlon (carlon-luca) wrote :

That was the first thing I checked, days ago: Settings->Power Management is completely disabled on this system. By disabled I mean I can't do anything; I get a message saying:

"Power Management configuration module could not be loaded. The Power Management Service appears not to be running. This can be solved by starting or scheduling it inside "Startup and Shutdown"".

Also, I regularly use suspend on my laptop, and it takes something like 8 to 10 seconds to wake up. Here I hear the disk for only a second and the network is back up (and this is old hardware).

I ran the test you suggested, issuing "while true; do date; ifconfig -a; sleep 60; done", and left it running all night. Then I tried to ping the machine from another one and confirmed that it was unreachable as usual. I checked the command's output and attached it: the system never went to sleep and the IP address was never lost, but there is a large number of dropped packets.
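A lighter-weight variant of that loop logs only the dropped-packet counters, which is where the anomaly showed up. This is a sketch assuming bash, awk and the standard /proc/net/dev column layout (receive drops are the 4th counter, transmit drops the 12th); the helper names drop_counts and watch_drops are hypothetical.

```bash
# drop_counts IFACE [FILE]: print "RX_DROP TX_DROP" for IFACE, read from
# FILE (default /proc/net/dev). Returns non-zero if IFACE is not listed.
drop_counts() {
    local iface=$1 file=${2:-/proc/net/dev}
    awk -v ifc="$iface" '
        # Turn the first ":" into a space so a line like "eth0:123456789"
        # (no gap, which some kernels print for large byte counts) still
        # splits into separate fields.
        { sub(/:/, " ") }
        # Field 1 is the name; fields 2-9 are RX counters, 10-17 TX counters.
        $1 == ifc { print $5, $13; found = 1 }
        END { exit !found }
    ' "$file"
}

# watch_drops IFACE: print a timestamped "RX_DROP TX_DROP" line every
# 60 seconds until interrupted with Ctrl-C.
watch_drops() {
    local iface=${1:-eth0}
    while true; do
        printf '%s %s\n' "$(date '+%F %T')" "$(drop_counts "$iface")"
        sleep 60
    done
}
```

For example, watch_drops eth1 >> drops.log; a drop counter that starts climbing at the moment the host stops answering pings would match what the ifconfig output showed.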

Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

I'm setting this to Linux; given the test output it does look like the interface is still up and the machine hasn't gone to sleep; I can't quite see what would happen after user interaction though.

affects: ubuntu → linux (Ubuntu)
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 997767

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Revision history for this message
Luca Carlon (carlon-luca) wrote : AcpiTables.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Luca Carlon (carlon-luca) wrote : AlsaDevices.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : AplayDevices.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : BootDmesg.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : CRDA.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Codecs.codec.1.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : IwConfig.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Lspci.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Lsusb.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : PciMultimedia.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : ProcModules.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : PulseList.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : RfKill.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : UdevDb.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : UdevLog.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Luca Carlon (carlon-luca) wrote : Re: Network connection is lost after some hours of inactivity and comes up again on user interaction

I ran a new test: booting in failsafe mode keeps the network from going down.

Revision history for this message
Luca Carlon (carlon-luca) wrote : AcpiTables.txt

apport information

description: updated
Revision history for this message
Luca Carlon (carlon-luca) wrote : AlsaDevices.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : AplayDevices.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : ArecordDevices.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : BootDmesg.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Codecs.codec97.0.ac97.0.0.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Card0.Codecs.codec97.0.ac97.0.0.regs.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Luca Carlon (carlon-luca) wrote : Lspci.txt

apport information

description: updated
summary: - Network connection is lost after some hours of inactivity and comes up
- again on user interaction
+ Network connection [rtl8139 / 8139too] is lost after some hours of
+ inactivity and comes up again on user interaction
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: needs-upstream-testing
tags: added: kernel-unable-to-test-upstream
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Mariano Jan (mjan) wrote : UdevLog.txt

apport information

Revision history for this message
Mariano Jan (mjan) wrote : WifiSyslog.txt

apport information

Revision history for this message
Mariano Jan (mjan) wrote : Re: Network connection [rtl8139 / 8139too] is lost after some hours of inactivity and comes up again on user interaction

This report was collected while the network card was working. The failing card is eth1.
I'm thinking of buying a new PCI card, as I've read that the device itself may be faulty.
Thanks!

Revision history for this message
Mariano Jan (mjan) wrote :

Just a suggestion: have you tried adding "acpi=off noapic" to GRUB_CMDLINE_LINUX_DEFAULT (in /etc/default/grub)?
I'm trying it now, and it *seems* to work. I'll report back in a few days.

Revision history for this message
Mariano Jan (mjan) wrote :

(I forgot to mention that after editing /etc/default/grub you need to run "sudo update-grub" to apply the changes.)
The change still doesn't always work. The computer lost connectivity at 06:22 a.m. (when the cron.daily scripts run) and came back online at 06:47 a.m.
So, although adding "acpi=off noapic" to the boot parameters greatly reduced the problem, it does not fully solve it.
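For reference, the edited line in /etc/default/grub would look roughly like this. It is a sketch assuming the file otherwise keeps its stock Ubuntu contents; only this one variable changes.

```bash
# /etc/default/grub (excerpt). grub-mkconfig reads this file as shell, so the
# value must stay quoted. "quiet splash" is the usual Ubuntu default; the
# remaining words are the workaround parameters discussed in this thread.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi=off noapic"
```

After saving, run sudo update-grub and reboot; cat /proc/cmdline should then list the new parameters. For a one-off test that leaves the file untouched, the same parameters can instead be appended to the line starting with linux after pressing e at the GRUB menu.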

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I started this test yesterday: I booted and added the parameters you suggested at the GRUB menu. No outage yet. I'll keep the test running to see whether it only reduces the problem or solves it. The only other workaround I have found so far is booting into recovery mode from GRUB.

penalvch (penalvch)
description: updated
penalvch (penalvch)
summary: - Network connection [rtl8139 / 8139too] is lost after some hours of
- inactivity and comes up again on user interaction
+ 10ec:8139 Network connection rtl8139 lost after some hours of inactivity
+ and comes up again on user interaction
tags: added: needs-bisect regression-release
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Luca Carlon (carlon-luca) wrote :

The problem is that the system does not seem to boot correctly with kernel 3.4.0-030400-generic. I get a splash screen showing Xubuntu (is that expected?) and then an error and a shell; KDE never starts. The last message I see is:

Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/<uuid> does not exist.
Dropping to a shell!

The shell appears to be the initramfs shell rather than one on my root partition. I'm not sure how, but if you need it, I can try to extract the dmesg output from there.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I don't know if I'm doing something wrong, but I downloaded linux-image-3.5.0-030500rc1-generic_3.5.0-030500rc1.201206022335_i386.deb, installed it with dpkg -i, and booted the newly added GRUB entry. The same thing I described before happens: the same message, and I'm dropped into the initramfs shell.

Revision history for this message
penalvch (penalvch) wrote :

Luca Carlon, thank you for trying the mainline kernel. The next step is to perform a bisect to identify the offending commit(s). Could you please do so following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I can try to bisect, but does it make sense to bisect even though kernel 2.6.38-11-generic fails as well? I think that kernel was installed with Kubuntu 11.04, and I don't remember ever encountering issues until 12.04.

Revision history for this message
Brad Figg (brad-figg) wrote :

@penalvch,

You are really not helping by just spamming bugs telling folks to bisect their kernels. Many people just are not prepared to do so.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Luca,

Would it be possible for you to test the latest Lucid kernel[0]? It would be good to know whether 2.6.32-41 also fails, since you reported that 2.6.38-11-generic does.

[0] https://launchpad.net/~canonical-kernel-team/+archive/ppa/+build/3436801/+files/linux-image-2.6.32-41-generic_2.6.32-41.89_i386.deb

Revision history for this message
Brad Figg (brad-figg) wrote :

There is some confusion here. Luca, please correct me if I am wrong:
1. You were originally running Natty, and that was working fine for you.
2. You installed Precise and started to experience the issue.
3. You installed the Natty kernel on your Precise system, and it shows the same problem.

Revision history for this message
Brad Figg (brad-figg) wrote :

On IRC, Luca also mentioned that Oneiric ran just fine for him. I believe he also said that he experiences the issue when using the Oneiric kernel on his Precise install.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

@jsalisbury, yes, I can do that. Let me just finish my current test and I can run on that kernel as well.

@brad-figg: OK, let me summarize, because I created a lot of confusion here:

1. I've been running every Kubuntu release for some years, including Natty, Oneiric, etc. I always upgrade to the latest version as soon as it is available, and I never had issues on this hardware with previous versions of Kubuntu.

2. Yes, I installed 12.04 and started to experience this.

3. I didn't install the Natty kernel; it was already installed from previous versions of Kubuntu. I had 2.6.38 and 3.0.0 available (and 3.2, of course). None of them worked. I tested more than once.

4. Every test I have run so far on this installation has failed (the network went down), including recovery mode and mjan's suggestion.

5. I have noticed that when network load is high, the network seems to go down sooner. Recovery mode, for instance, may stay up for more than a day, but if I run mldonkey in recovery mode, I expect to see the network down after a few hours. This is only an observation, though, and might be a coincidence.

6. The only thing I still haven't seen fail is the Precise live CD. I'm loading the network with iperf and continuous downloads from the Internet, and there has been no failure after a couple of days. I'll wait a few more days. Unfortunately, from the live CD I can't run the servers I normally use on this installation (apache, proftpd, svn, mldonkey, etc.), and the hardware does not support booting from USB.

7. Thanks to a kind person on IRC, I had the chance to try kernel 3.4. That failed as well. I still have to try 3.5.

tags: added: kernel-da-key
removed: kernel-unable-to-test-upstream
Revision history for this message
Brad Figg (brad-figg) wrote :

@luca, thanks for the detailed description.

The point I was trying to make is that when you had Oneiric installed (before Precise), everything worked just fine.

When you upgraded to Precise, you encountered this issue. And now, booting the Oneiric kernel (which was already installed) on Precise, you still have the same issue.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

Some news in case it could be of any help:

1. Tested kernel 2.6.32 as requested. Network fails.

2. Completely reset the BIOS settings, in case the problem was somehow related to them, since the NIC is integrated. No change.

3. Tried an RTL-8029 card in the same system, and that seems to work.

Revision history for this message
Colin Ian King (colin-king) wrote :

As discussed on IRC, USB seems to stop working with ACPI disabled, but then the network fails with it enabled.

Can you boot the machine with acpi enabled and disabled, and for both of these scenarios please report back the output from:

cat /proc/interrupts

and

dmesg

This way I can compare the two and maybe get an idea of what is going on.
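Comparing the two captures by eye is tedious when there are many IRQ lines. A rough sketch, assuming bash, awk and the classic /proc/interrupts layout (a header with one CPUn column per CPU, then lines beginning "NN:"), sums the per-CPU counters so that an interrupt line which never fires stands out. The helper name irq_summary is hypothetical.

```bash
# irq_summary [FILE]: print "IRQ TOTAL CHIP DEVICES..." for every numbered
# IRQ line in FILE (default /proc/interrupts), summing the per-CPU counters.
# Named lines such as NMI: or LOC: are skipped.
irq_summary() {
    awk '
        NR == 1 { ncpu = NF; next }          # header: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {
            irq = $1; sub(/:$/, "", irq)
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            rest = ""                        # chip type plus device names
            for (i = ncpu + 2; i <= NF; i++) rest = rest (rest == "" ? "" : " ") $i
            print irq, total, rest
        }
    ' "${1:-/proc/interrupts}"
}
```

With the two captures saved to files (the names here are only examples), diff <(irq_summary irq-acpi-on.txt) <(irq_summary irq-acpi-off.txt) puts them side by side, and irq_summary | awk '$2 == 0' lists lines that have never fired. A zero count is a hint worth checking against the devices on that line, not proof of a routing fault.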

Changed in linux (Ubuntu):
assignee: nobody → Colin King (colin-king)
Revision history for this message
Luca Carlon (carlon-luca) wrote :

Note that I have kept up to date with new Ubuntu kernels, so I'm now on 3.5.0-19-generic. I attached the output with acpi=off.

The last lines of the acpi=off dmesg relate to a USB hard disk plugged in at boot and a USB mouse plugged in after boot. Interestingly, when plugging in a device I sometimes get those ehci error messages and sometimes don't, and a couple of times I even got the mouse to work (badly: movement was not smooth).

Revision history for this message
Luca Carlon (carlon-luca) wrote :

This is the output with ACPI enabled.

Revision history for this message
Colin Ian King (colin-king) wrote :

With acpi=off, IRQ 21 is not working, so the devices on that interrupt (hci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4) won't work. If you want to run with ACPI disabled, you may want to try:

acpi=off irqpoll

...and see whether USB works. This is quite a gross workaround, though.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

Great! It seems to work correctly now. But what about the fact that acpi=off is needed to stop the interface from dropping packets? Is that expected?

Revision history for this message
Colin Ian King (colin-king) wrote :

I've not yet figured out why acpi=off is required, but it should work with acpi enabled, so it is quite perplexing.

Revision history for this message
Colin Ian King (colin-king) wrote :

You could try removing acpi=off and using:

pci=use_crs

and see what that does.

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I tried that parameter, but the network issue just occurred again. The host was down until I plugged in a USB mouse; within a few seconds it was reachable again.

Revision history for this message
Colin Ian King (colin-king) wrote :

It seems that the ACPI IRQ routing may be the cause. Can you try the kernel parameter:

acpi=noirq

Revision history for this message
Luca Carlon (carlon-luca) wrote :

I tried with this cmdline:
BOOT_IMAGE=/vmlinuz-3.5.0-19-generic root=UUID=f5d18141-1192-4ba1-91ea-15445600f4ad ro quiet splash acpi=noirq apm=off vt.handoff=7

USB is not working, and the network issue has just occurred again.

Revision history for this message
Colin Ian King (colin-king) wrote :

Luca, can you try with just the biosirq kernel option, to see if that helps?

Revision history for this message
Luca Carlon (carlon-luca) wrote :

With biosirq the network is going down as well.

Revision history for this message
Colin Ian King (colin-king) wrote :

I suggest we default to using acpi=off irqpoll (which is not optimal) and I will see if I can find any alternative fixes.

Revision history for this message
Colin Ian King (colin-king) wrote :

Luca, since you said this was working before you upgraded, which release were you using when it worked?

Changed in linux:
importance: Unknown → High
status: Unknown → Fix Released
Revision history for this message
Luca Carlon (carlon-luca) wrote :

Colin, I've been using this hardware and Ubuntu for years, and it only started to fail in the last year. I have already tried some older kernels, but none of them seems to work anymore. I just tried kernel 3.2.0-31-generic-pae and it failed (it took only a few hours). Is there a simple way to install older kernels? Maybe 3.0 or 2.6?

Also, I see this bug was marked as "Fix Released". What does that mean? I see it points to an old bug, but shouldn't a fix for it already be included in recent kernels?

no longer affects: linux
Revision history for this message
Colin Ian King (colin-king) wrote :

Luca, you could try the official older kernels; alternatively, you could try the daily kernel builds found here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/

There are plenty to try out. From these you could perhaps bisect down to the kernel in which the bug first appears, which should let us figure out which patch causes the regression.
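The version-level bisection described above is an ordinary binary search over the ordered list of mainline builds. A minimal sketch, assuming bash; first_bad and ask_user are hypothetical helpers. Because this bug is intermittent, the oracle should only answer "good" after a build has survived several days: a single false "good" sends the search in the wrong direction.

```bash
# first_bad ORACLE VERSION...: binary-search an ordered list of builds whose
# first entry is known good and whose last entry is known bad. ORACLE is a
# command run as "ORACLE VERSION" that must exit 0 if that build is good.
first_bad() {
    local oracle=$1; shift
    (( $# >= 2 )) || { echo "need at least a good and a bad build" >&2; return 2; }
    local -a v=("$@")
    local lo=0 hi=$(( ${#v[@]} - 1 )) mid
    # Invariant: v[lo] is good and v[hi] is bad.
    while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        if "$oracle" "${v[mid]}"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    printf '%s\n' "${v[hi]}"
}

# ask_user VERSION: interactive oracle; succeeds if the tester answers "y"
# after running VERSION long enough to trust the result.
ask_user() {
    local answer
    read -r -p "Did $1 stay reachable for several days? [y/n] " answer
    [[ $answer == y ]]
}
```

Usage: first_bad ask_user <known-good> <candidate>... <known-bad>. Each answer halves the remaining range, so a list of a few dozen builds needs only about five or six test runs.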

Revision history for this message
Luca Carlon (carlon-luca) wrote :

This is very difficult to track down because the connection drops at random, but according to my tests, kernel 2.6.32-02063258 seems OK and 2.6.32-02063259 does not. 2.6.32-02063258 has been running correctly for more than 8 days. I'm still running it, and it still seems OK, with this command line:
BOOT_IMAGE=/vmlinuz-2.6.32-02063258-generic root=UUID=f5d18141-1192-4ba1-91ea-15445600f4ad ro crashkernel=384M-2G:64M,2G-:128M quiet splash apm=off

penalvch (penalvch)
tags: added: bios-outdated-f13
tags: added: needs-upstream-testing
Revision history for this message
penalvch (penalvch) wrote :

Luca Carlon, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
assignee: Colin King (colin-king) → nobody
status: Confirmed → Incomplete