Ntrack dead loop in function get_nl_link_by_index

Bug #755608 reported by csslayer
This bug affects 57 people
Affects          Status        Importance  Assigned to      Milestone
ntrack           Fix Released  High        Alexander Sack
ntrack (Ubuntu)  Fix Released  High        Scott Kitterman
Natty            Fix Released  High        Scott Kitterman

Bug Description

Using Archlinux and KDE 4.6.2, ntrack 013.
Toggling a PPTP VPN on and off causes kded4 to use 100% CPU, and a backtrace shows that it is stuck in an infinite loop in get_nl_link_by_index. It seems the scan over the circular linked list in get_nl_link_by_index never terminates if no entry satisfies the break condition. I don't know whether this is an ntrack bug or a KDE bug.

At least, patching this function so that the loop ends properly after one full cycle fixes this bug for me.
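
To illustrate the kind of change described above, here is a minimal C sketch of a circular-list scan that stops after one full lap. It is not the ntrack source: `struct link_node` and `find_link_by_index` are hypothetical names invented for illustration, and the real function lives in ntrack's libnl module.

```c
#include <stddef.h>

/* Hypothetical stand-in for one link entry in a circular list. */
struct link_node {
    int ifindex;
    struct link_node *next; /* the last node points back to the first */
};

/*
 * Return the node whose ifindex matches, or NULL if a full lap
 * around the list finds no match.
 */
struct link_node *
find_link_by_index(struct link_node *head, int ifindex)
{
    struct link_node *cur = head;

    if (head == NULL)
        return NULL;

    do {
        if (cur->ifindex == ifindex)
            return cur;
        cur = cur->next;
    } while (cur != NULL && cur != head); /* stop once back at the start */

    return NULL; /* no match: report "not found" instead of spinning */
}
```

The important part is the `cur != head` termination test: a scan that only exits on a match never returns when the requested index is absent from the list.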

Related bug report: https://bugs.kde.org/show_bug.cgi?id=268038

TEST CASE: Connect to a VPN when in a KDE session. See high CPU usage when connecting/disconnecting from the VPN. Install the updated package. Rinse. Repeat. See normal CPU usage.

csslayer (wengxt) wrote :
Greg White (gwhite-kupulau) wrote :

This is also present in Kubuntu 11.04.

It can be triggered by suspend/resume and pretty much renders suspend/resume useless, or will quickly drain a notebook's battery if the user doesn't realize that kded4 has run wild in the background.

Please fix!

Alexander Sack (asac) wrote :

thanks for finding this. I see something else is fishy with VPNs now ... investigating.

Changed in ntrack:
importance: Undecided → High
milestone: none → 015
status: New → Confirmed
Alexander Sack (asac) wrote :

fwiw, i think something below the libntrack layer changed that causes us to get routes with null oif/nexthop devs, but ntrack should be more robust and show BLOCKED for as long as that state lasts.
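
A rough C sketch of the robustness idea in this comment: treat a route whose outgoing interface is missing or unknown as BLOCKED instead of searching for a link that cannot be found. All names here (`sketch_state`, `sketch_route`, `sketch_link`, `evaluate_route`) are hypothetical and chosen for illustration only; they are not ntrack or libnl identifiers, and the "0 means not set" convention is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical connectivity states for this sketch. */
enum sketch_state { SKETCH_ONLINE, SKETCH_BLOCKED };

/* Hypothetical route record; oif == 0 is assumed to mean "not set". */
struct sketch_route {
    int oif; /* outgoing interface index */
};

/* Hypothetical link record. */
struct sketch_link {
    int ifindex;
    int up; /* non-zero if the link is up */
};

/*
 * Decide the state for a default route. A missing route, an unset
 * outgoing interface, an unknown interface, or a link that is down
 * all yield BLOCKED; only a known, up link yields ONLINE.
 */
enum sketch_state
evaluate_route(const struct sketch_route *route,
               const struct sketch_link *links, size_t n_links)
{
    size_t i;

    if (route == NULL || route->oif <= 0)
        return SKETCH_BLOCKED; /* nothing to look up */

    for (i = 0; i < n_links; i++) {
        if (links[i].ifindex == route->oif)
            return links[i].up ? SKETCH_ONLINE : SKETCH_BLOCKED;
    }

    return SKETCH_BLOCKED; /* interface not (yet) known */
}
```

The early return for an unset `oif` is the point of the sketch: the lookup is bounded, and an incomplete route produces a well-defined state instead of a search that cannot succeed.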

Alexander Sack (asac) wrote :

fixed this particular bug ... however, there are still some issues with VPN that i am investigating.

------------------------------------------------------------
revno: 312
fixes bug(s): https://launchpad.net/bugs/755608
committer: Alexander Sack <email address hidden>
branch nick: ntrack
timestamp: Mon 2011-04-25 19:25:24 +0200
message:
  modules[libnl]: fix infinite loop if route oif/nhopif is NULL; thx to csslayer for helping - lp:755608
------------------------------------------------------------

Changed in ntrack:
status: Confirmed → Fix Committed
assignee: nobody → Alexander Sack (asac)
Alexander Sack (asac) wrote :

FYI, filed lp:770390 on one of the (NM) VPN issues I see.

Sanjaya Karunasena (sanjayak) wrote :

I am not happy that Kubuntu natty was released without this fix. Unplug the network cable or the USB dongle and kded4 eats up 100% CPU, making the start menu and the task bar unusable.

Greg White (gwhite-kupulau) wrote :

I can confirm this is still happening, and I would like to re-iterate just how serious this bug is.

I am somewhat surprised it wasn't fixed and/or regarded as a show stopper as it essentially renders suspend/resume useless.

Are there plans to resolve this in the near future?

Alexander Sack (asac) wrote :

hi guys, the upstream fix is not complete for VPN yet ... the looping is gone, but VPN can disturb online/offline state; I am travelling for the next two weeks but will see what I can do; once I have a real fix I will try to get a special version of it into natty.

Alexander Sack (asac) wrote :

and yes, ETA is really soon!

Sanjaya Karunasena (sanjayak) wrote :

Thank you for your kind consideration. I will be happy to help you test this. I really appreciate it!

Harald Sitter (apachelogger) wrote :

Perfectly reproducible on Kubuntu 11.04.

* Set up a VPN (at least openvpn and vpnc are known to cause the issue).
* Connect using Plasma networkmanagement widget
* Disconnect once connection is established
* Watch kded4 CPU load rise

Changed in ntrack (Ubuntu):
assignee: nobody → Alexander Sack (asac)
importance: Undecided → High
milestone: none → oneiric-alpha-1
status: New → Triaged
Changed in ntrack (Ubuntu Natty):
assignee: nobody → Alexander Sack (asac)
importance: Undecided → High
milestone: none → natty-updates
status: New → Triaged
Harald Sitter (apachelogger) wrote :
Changed in ntrack (Ubuntu):
assignee: Alexander Sack (asac) → Scott Kitterman (kitterman)
status: Triaged → In Progress
Changed in ntrack (Ubuntu Natty):
assignee: Alexander Sack (asac) → Scott Kitterman (kitterman)
status: Triaged → In Progress
Paul J. Adams (adams-kolabsys) wrote :

Deployed the updated packages from Scott's PPA on Kubuntu Natty. Disconnecting from VPN no longer triggers this bug and kded4 no longer eats 100% CPU as a result.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntrack - 011-1ubuntu2

---------------
ntrack (011-1ubuntu2) oneiric; urgency=low

  * Add debian/patches/dead-loop-fix.patch to fix infinite loop if route
    oif/nhopif is NULL (LP: #755608)
    - One source of high/100% CPU usage in KDE
 -- Scott Kitterman <email address hidden> Tue, 17 May 2011 11:41:56 -0400

Changed in ntrack (Ubuntu):
status: In Progress → Fix Released
Scott Kitterman (kitterman) wrote :

Proposed SRU uploaded for Natty. Waiting for Ubuntu SRU team review.

description: updated
Afiefh (afiefh) wrote :

Using Kubuntu 11.04 when my 3G disconnects I still get kded4 eating up 100% cpu. A side effect is that it also hangs the plasma-netbook workspace. Is this the same issue or something else?

Scott Kitterman (kitterman) wrote :

It'd probably work best for anyone still having problems after getting a package with this fix to file a new bug.

Alexander Sack (asac) wrote :

iirc, this patch on top of the last release should fix this bug and also keep online/offline state properly up to date for the VPNs i tried ... does anyone want to give this a go? (it still has some debugging in it which needs sanitizing)

Alexander Sack (asac) wrote :

the patch initially posted to the bug has the problem that you might get stuck in offline state after you disconnect from VPN ... which isn't that great. my patch (IIRC) ensures that all the netlink caches are properly refilled etc. every time something changes. Let me know if a) the loop is gone and b) you are still online after disconnecting from VPN while keeping the ethernet/wifi link up!

Martin Pitt (pitti) wrote : Please test proposed package

Accepted ntrack into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in ntrack (Ubuntu Natty):
status: In Progress → Fix Committed
tags: added: verification-needed
Scott Kitterman (kitterman) wrote :

@asac: Does that mean we should not be putting your rev 312 into an SRU?

Alexander Sack (asac) wrote :

Scott, rev 312 - similar to the patch attached here - only fixes the loop, but might leave you stuck in offline after VPN disconnect until you reconnect wireless/wired ...

the patch I attached above is a brute-force approach I did initially to fix all issues, but I didn't commit it to trunk because I wanted to have a less forceful solution. personally I think for ubuntu we should back out the patch that was uploaded and apply the complete patch from above without debugging prints ...

let me get back to you tomorrow.

Alexander Sack (asac) wrote :
Alexander Sack (asac) wrote :

011 backport of backend from 015 + patch in https://bugs.launchpad.net/ntrack/+bug/755608/comments/24

Please test if that helps!

Alexander Sack (asac) wrote :

note that common/ntrackarchapi.h is internal API.

Sanjaya Karunasena (sanjayak) wrote :

@pitti: I don't see ntrack in natty-proposed. Should I have to wait few more hours?

Martin Pitt (pitti) wrote : Re: [Bug 755608] Re: Ntrack dead loop in function get_nl_link_by_index

Sanjaya Karunasena [2011-05-20 0:09 -0000]:
> @pitti: I don't see ntrack in natty-proposed. Should I have to wait few
> more hours?

It is now.

Alexander Sack (asac) wrote :

Sanjaya, please keep your eyes open if you are stuck in "offline" after disconnecting from VPN. Thanks!

Sanjaya Karunasena (sanjayak) wrote :

@pitti: Thanks got it.

@asac: Just tested for Wired Ethernet, Mobile Broadband, and a VPN from USAIP. No more 100% cpu issues on disconnect. I didn't get stuck in "offline" after disconnecting from VPN. USAIP is PPTP. Will OpenVPN make a difference?

Scott Kitterman (kitterman) wrote :

I did put an updated package with asac's new patch in my ppa. Would people who've tried the first fix please try this one?

avlas (avlas) wrote :

I tried proposed packages and they worked perfectly here with a vpnc connection, no loop after disconnecting either

Ladislav Nesnera (nesnera) wrote :

Re #31:
Great work! Your package solves my annoying problem which I described in Bug #777526. Thanks a lot (y)

Jens Taprogge (jlt-launchpad) wrote :

Re #25:
The package in natty-proposed (011-1ubuntu1.1) fixes the issue for me.

Greg White (gwhite-kupulau) wrote :

I can confirm this has fixed my issue. Very nice!

Nicola Menegazzi (pugacioff83) wrote :

latest updates fixed the issue for me - using huawei 1550 broadband modem.

thanks!!!

Jose Bernardo (bernardo-bandos) wrote :

I've tested connecting and disconnecting vpn and 3G (Huawei E352), and no problems at all. Looks like the package in natty-proposed has the right patch.

Scott Kitterman (kitterman) wrote :

I did try asac's new patch and kmail seems to hang and crash now and then. It
didn't do this before. I'm tempted to say the one in proposed now is what we
should stick with.

Ladislav Nesnera (nesnera) wrote :

Re #38:
My experience is different. I use Kmail too and it works without troubles although I applied Scott Kitterman's package.
Qt: 4.7.2
KDE Development Platform: 4.6.2 (4.6.2)
KMail: 1.13.6
Linux pc 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 11.04

Sanjaya Karunasena (sanjayak) wrote :

Re #31:

Tested the ppa and it works fine too. However, I don't get any VPN connection related issues in natty-proposed as well.

Re #38 & #39:
I also don't get any KMail (I mean Kontact) issues with Scott Kitterman's package.

However, I reported another issue which was troubling me for some time here: https://bugs.launchpad.net/ntrack/+bug/786049.

Alexander Sack (asac) wrote :

If people don't see a regression, the simple one-liner is probably fine.

Scott, getting a backtrace of the crashes you see would be helpful. I definitely see problems with VPN here that I need to fix, so a patch like mine will be coming sooner or later.

Alexander Sack (asac) wrote :

btw, my case is when disconnecting from PPTP with NM in natty.

Scott Kitterman (kitterman) wrote :

On Monday, May 23, 2011 07:52:01 PM you wrote:
> Scott, getting a backtrace of the crashes you see would be helpful.

See https://bugs.kde.org/show_bug.cgi?id=273736

Michael Wiesner (wiesner-m-rgbg) wrote :

I've updated my system with Scott Kitterman's ppa. After disconnecting my VPNC connection the 100% CPU issue is gone. But my IMAP account in kmail doesn't work correctly after disconnect. Furthermore, shutdown, logoff etc. produce KDE error messages (kmail errors).

I think I should change to gnome...

Scott Kitterman (kitterman) wrote :

I've had mixed results with the version in my PPA. In some cases it's better
than what's in natty-proposed and in some cases it's worse. I have not run
into a case where the natty-proposed change regresses from what we released
with.

I think the -proposed upload ought to get into -updates and then we'll work on
a better fix.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntrack - 011-1ubuntu1.1

---------------
ntrack (011-1ubuntu1.1) natty-proposed; urgency=low

  * Add debian/patches/dead-loop-fix.patch to fix infinite loop if route
    oif/nhopif is NULL (LP: #755608)
    - One source of high/100% CPU usage in KDE
 -- Scott Kitterman <email address hidden> Tue, 17 May 2011 11:41:56 -0400

Changed in ntrack (Ubuntu Natty):
status: Fix Committed → Fix Released
Sanjaya Karunasena (sanjayak) wrote :

Excellent! Thanks

Alexander Sack (asac) wrote :

thanks scott ... in your PPA was my big patch, right? and in -propose/-update we just have the one liner?

Sounds good. I think my problem with PPTP VPN is really in libnlX somehow. route -n and ifconfig -a show the proper values at least for default route and the out IF, but libnl cache doesn't get to know about the out IF until quite a while later (and sometimes never).

Anyway, let's declare this one fixed. The problem I see is tracked in lp:770390

Scott Kitterman (kitterman) wrote :

Alexander Sack <email address hidden> wrote:

>thanks scott ... in your PPA was my big patch, right? and in
>-propose/-update we just have the one liner?

Yes. Exactly.

Alexander Sack (asac) wrote :

ntrack 015 is available including the fix for this bug: https://launchpad.net/ntrack/main/015

Changed in ntrack:
status: Fix Committed → Fix Released