ip neigh flush dev eth0 is hanging

Bug #15413 reported by Herbert Straub
Affects: iproute (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Ubuntu Desktop Bugs

Bug Description

The firewall script generated by fwbuilder "hangs" when I start it after
compiling. The problem is in this line::

  ip neigh flush dev eth0

The top command shows the looping process. The double -s option shows the situation::

  ip -s -s neigh flush dev eth0

 10.165.166.155 lladdr 00:09:5b:ee:72:55 ref 1 used 60/60/60 nud stale

 *** Round 1, deleting 1 entries ***
 10.165.166.155 lladdr 00:09:5b:ee:72:55 ref 1 used 60/60/60 nud stale

 *** Round 2, deleting 1 entries ***
 10.165.166.155 lladdr 00:09:5b:ee:72:55 ref 1 used 60/60/60 nud stale

 *** Round 3, deleting 1 entries ***
 10.165.166.155 lladdr 00:09:5b:ee:72:55 ref 1 used 60/60/60 nud stale

    and so on...

**Workarounds:**

The del command can remove the entry::

  ip neigh del 10.165.166.155 dev eth0
  RTNETLINK answers: Invalid argument

The arp -d command can also stop the looping ip process::

  ip neigh flush dev eth0 &
  [1] 27892
  for a in `arp -n | awk '/^[0-9]/ { print $1; }'`; do arp -d $a; done
  <RETURN>
  [1]+ Done ip neigh flush dev eth0
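
The arp -d loop above can be written a little more defensively. This is a minimal sketch and not part of the original report: the function names neigh_addrs and clear_arp_entries are made up for illustration, and it assumes that the net-tools arp command prints one entry per line with the IPv4 address in the first column, as the awk filter above already does.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the report.

# Read `arp -n` style output on stdin and print the first column of
# every line that starts with a digit, i.e. the IPv4 addresses.
neigh_addrs() {
    awk '/^[0-9]/ { print $1 }'
}

# Delete each ARP entry individually (needs root). A failure on one
# entry is ignored so that the loop still visits the remaining ones.
clear_arp_entries() {
    arp -n | neigh_addrs | while read -r addr; do
        arp -d "$addr" || true
    done
}
```

Calling clear_arp_entries while a looping ip neigh flush runs in the background corresponds to the for loop shown above.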

**Possible Solutions:**

This error is documented in "Debian Bug
#282492":http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=282492 and all the tips
above come from there. Wilfried Weissmann created a
"patch":http://lkml.org/lkml/2005/1/28/55 but I don't know whether it works.

**Fwbuilder:**

The flush functionality came with Firewall Builder 2.0.4. The firewall script
created by fwbuilder contains the following code::

  $IP -4 neigh flush dev eth0 >/dev/null 2>&1
  $IP -4 addr flush dev eth0 secondary label "eth0:FWB*" >/dev/null 2>&1
  $IP -4 neigh flush dev lo >/dev/null 2>&1
  $IP -4 addr flush dev lo secondary label "lo:FWB*" >/dev/null 2>&1
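
A generated script could guard these flush lines so that a looping ip cannot stall the whole firewall start. The following is a minimal sketch under stated assumptions, not what fwbuilder emits: the function name bounded_flush and the 5-second default are invented for illustration, and it assumes that the timeout utility from GNU coreutils is available on the system.

```shell
#!/bin/sh
# Hypothetical wrapper; bounded_flush and the 5 s default are
# assumptions for illustration, not fwbuilder output.
IP=${IP:-ip}

# bounded_flush DEV [SECONDS]
# Run the neighbour flush, but kill it if it runs longer than SECONDS.
# Returns the exit status of timeout: 124 if the limit was hit,
# otherwise the exit status of the flush command itself.
bounded_flush() {
    dev=$1
    limit=${2:-5}
    timeout "$limit" "$IP" -4 neigh flush dev "$dev" >/dev/null 2>&1
}
```

A script would call it as bounded_flush eth0 and then continue regardless of the result. The neighbour table may still hold stale entries afterwards; the wrapper only bounds the time spent trying.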

Herbert Straub (herbert) wrote :

Sorry, I forgot: this is a fresh Hoary installation with the standard packages:

uname -a
Linux fugazzi 2.6.10-5-amd64-generic #1 Tue Apr 5 12:21:57 UTC 2005 x86_64 GNU/Linux

dpkg -l iproute
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-================-================-================================================
ii iproute 20041019-1 Professional tools to control the networking in

This bug is reproducible in my configuration.

Herbert Straub (herbert) wrote :

If this bug is not a kernel problem, then it could be an iproute problem. I
found this workaround.

**Patch for iproute**

  Another possible solution is to change iproute so that it avoids the endless loop.
The source code for flush looks like this::

              for (;;) {
                      if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
                              perror("Cannot send dump request");
                              exit(1);
                      }
                      filter.flushed = 0;
                      if (rtnl_dump_filter(&rth, print_neigh, stdout, NULL, NULL) < 0) {
                              fprintf(stderr, "Flush terminated\n");
                              exit(1);
                      }
                      if (filter.flushed == 0) {
                              if (round == 0) {
                                      fprintf(stderr, "Nothing to flush.\n");
                              } else if (show_stats)
                                      printf("*** Flush is complete after %d round%s ***\n", round, round>1?"s":"");
                              fflush(stdout);
                              return 0;
                      }
                      round++;
                      if (flush_update() < 0)
                              exit(1);
                      if (show_stats) {
                              printf("\n*** Round %d, deleting %d entries ***\n", round, filter.flushed);
                              fflush(stdout);
                      }
              }
      }

  There is no way out of the loop if the ARP entry cannot be flushed. The 'for
(;;)' construction can be found in other .c files as well. In iproute.c there is an
exit condition similar to the following patch::

    --- ip/ipneigh.c.orig 2005-07-26 16:10:40.850647298 +0200
    +++ ip/ipneigh.c 2005-07-26 16:11:09.302486025 +0200
    @@ -410,6 +410,7 @@
                    filter.flushe = sizeof(flushb);
                    filter.rth = &rth;
                    filter.state &= ~NUD_FAILED;
    +               time_t start = time(0);

                    for (;;) {
                            if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
    @@ -432,6 +433,12 @@
                            round++;
                            if (flush_update() < 0)
                                    exit(1);
    +                       if (time(0) - start > 30) {
    +                               printf("\n*** Flush not completed after %ld seconds, %d entries remain ***\n",
    +                                      time(0) - start, filter.flushed);
    +                               exit(1);
    +                       }
    +
                            if (show_stats) {
                                    printf("\n*** Round %d, deleting %d entries ***\n", round, filter.flushed);
                                    fflush(stdout);

  This is not ideal, but it avoids an endless loop. The output of 'ip -s -4 neigh
flush dev eth1' looks like this::

    ...

    *** Round 57934, deleting 1 entries ***

    *** Round 57935, deleting 1 entries ***

    *** Round 57936, deleting 1 entries ***

    *** Flush not complet...

Herbert Straub (herbert) wrote :

I wrote to Stephen Hemminger (at osdl.org), the maintainer of the iproute2
package, and described the error. His answer:

+++
Thanks, this usually shows up when someone tries to run flush
as non-root. Some vendors added a check for getuid() != 0, but that
fails in secure environments with capabilities and no root user.

I'll probably just change it to try 10 times and give up.
+++

iproute2-050816 contains the following ChangeLog entry:

    2005-08-16 Stephen Hemminger <email address hidden>

        * Limit ip route flush to 10 rounds.
        * Cleanup ip rule flush error message

OK, I backported his changes to the Ubuntu Hoary iproute package. The output is now:

localhost:~ # ip neigh flush dev eth1
*** Flush not complete bailing out after 10 rounds

Looks good. The isolated patch:
--- ip/ipneigh.c.ORIG 2005-08-17 22:11:06.000000000 +0200
+++ ip/ipneigh.c 2005-08-17 22:13:02.000000000 +0200
@@ -31,6 +31,7 @@
 #include "ip_common.h"

 #define NUD_VALID (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
+#define MAX_ROUNDS 10

 static struct
 {
@@ -411,7 +412,7 @@
   filter.rth = &rth;
   filter.state &= ~NUD_FAILED;

- for (;;) {
+ while (round < MAX_ROUNDS) {
    if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
     perror("Cannot send dump request");
     exit(1);
@@ -437,6 +438,9 @@
     fflush(stdout);
    }
   }
+ printf("*** Flush not complete bailing out after %d rounds\n",
+ MAX_ROUNDS);
+ return 1;
  }

  if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {

A prebuilt iproute package for Ubuntu Hoary is available on my site:
http://apt-get.linuxhacker.at/ubuntu/dists/hoary/main/pool/
The patch: https://info.linuxhacker.at/Patches/iproute-flush2.patch
The full error log: https://info.linuxhacker.at/wiki/FwbuilderFirewallScrptHanging

Piotr Roszatyck tested the new iproute2 version today with Debian and documented
the result in Debian bug #282492
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=282492).

Here is the test output with my iproute-20041019-1hs1 package. First, with the
original Ubuntu Hoary package on my notebook:
~# ip neigh
10.165.167.1 dev ath0 lladdr 00:09:5b:ee:72:54 nud stale
~# ip neigh flush dev ath0
----> hanging

Now installing the new package:
~# apt-cache policy iproute
iproute:
  Installed: 20041019-1
  Candidate: 20041019-1hs1
  Version table:
     20041019-1hs1 0
        500 http://apt-get.linuxhacker.at hoary/main Packages
 *** 20041019-1 0
        500 http://at.archive.ubuntu.com hoary/main Packages
        100 /var/lib/dpkg/status

apt-get install iproute
~# ip neigh
10.165.167.1 dev ath0 lladdr 00:09:5b:ee:72:54 nud stale
root@faultier:~# ip neigh flush dev ath0
*** Flush not complete bailing out after 10 rounds

Will this patch be integrated into the new Breezy?

Herbert Straub (herbert) wrote :

This bug is also present in a fresh installation of Breezy Colony 3. The output:

root@koala:~# ip neigh
10.33.44.55 dev eth0 lladdr 00:00:0c:07:ac:00 nud reachable
root@koala:~# ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***
...
*** Round 176, deleting 1 entries ***

*** Round 177, deleting 1 entries ***

*** Round 178, deleting 1 entries ***

---> and so on, until interrupted with Ctrl-C

After applying the patch the output of ip neigh flush...:
root@koala:~# ip neigh flush dev eth0
*** Flush not complete bailing out after 10 rounds

Ben Collins (ben-collins) wrote :

This bug has been flagged because it is old and possibly inactive. It may or may
not be fixed in the latest release (Breezy Badger 5.10). It is being marked as
"NEEDSINFO". In two weeks time, if the bug is not updated back to "NEW" and
validated against Breezy, it will be closed.

This is needed in order to help manage the current bug list for the kernel. We
would like to fix all bugs, but need users to test and help with debugging.

If this change was in error for this bug, please respond and make the
appropriate change (or email <email address hidden> if you cannot make the
change).

Thanks for your help.

Herbert Straub (herbert) wrote :

This bug also exists in Breezy (fresh installation with apt-get dist-upgrade).
It is easy to reproduce with

ip -s -s neigh flush dev eth0

The loop never ends. Please see comment #3 for Stephen Hemminger's answer and his
changes. I think the Debian/Ubuntu version of the iproute package is too old and
doesn't contain these changes. The patch prevents the command from looping
endlessly. The endless loop is very bad when using Firewall Builder, because
Firewall Builder creates a script that runs this flush command and therefore never finishes.

Thanks
Herbert Straub

Herbert Straub (herbert) wrote :

Current state: same error on Dapper Flight 4.

I recompiled the current iproute2 package:
~# dpkg -l iproute
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii iproute 2-2.6.15-06011 Professional tools to control the networking

ip -V
ip utility, iproute2-ss060110

But the error is the same:
ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***

*** Round 2, deleting 1 entries ***

...
*** Round 10, deleting 1 entries ***
*** Flush not complete bailing out after 10 rounds

"..bailing out after 10 rounds" comes from the source as modified by Stephen Hemminger (osdl.org). The table still cannot be flushed, but the command no longer hangs at 100% CPU.

# uname -r
2.6.15-15-amd64-generic

Now I'm testing the same situation on Fedora Core 5 test 3:

ip -V
ip utility, iproute2-ss060110

# ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***

uname -r
2.6.15-1.1955_FC5smp

Working.

Now Suse 10.1beta5:
# ip -V
ip utility, iproute2-ss060110

ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***

I think only Ubuntu (and Debian) has problems with the ip neigh flush command.

Best regards
Herbert Straub

Herbert Straub (herbert) wrote :

The following kernel parameters fix this bug:

CONFIG_ATM=m
CONFIG_ATM_CLIP=m

I built a modified linux-image package for Breezy and it looks good:

~# ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***

Tomorrow I will test this under Dapper. I found this solution on the Debian kernel mailing list:

http://lists.debian.org/debian-kernel/2005/08/msg00650.html

Fedora Core 5 test 3 and Suse 10.1beta5 use the same settings.

Best regards
Herbert Straub

Herbert Straub (herbert) wrote :

After recompiling the kernel with

CONFIG_ATM=m
CONFIG_ATM_CLIP=m

on a freshly installed Dapper Flight 4 on an amd64 machine with apt-get -u dist-upgrade (and after some trouble with zd1211), I get this result:

ip -s neigh flush dev eth0

*** Round 1, deleting 1 entries ***

*** Round 2, deleting 1 entries ***

*** Round 3, deleting 1 entries ***
*** Flush is complete after 3 rounds ***

Herbert Straub (herbert) wrote :

Same problem on Dapper Flight 5.

Phil Bull (philbull)
Changed in iproute:
assignee: debzilla → desktop-bugs
status: Unconfirmed → Confirmed
Vitor Choi Feitosa (vchoi) wrote :

Looks like only Herbert, Ben (robot?) and I saw this bug.

Herbert's report on this bug is really helpful, and I believe that after one year it's time to get other people looking at it. I'm adding the Ubuntu server team, as this bug affects Ubuntu's use as a server (as a firewall).

Vitor Choi Feitosa (vchoi) wrote :

I've subscribed the ubuntu server and kernel teams because the solution to this bug involves a change in kernel configuration.

Ben Collins (ben-collins) wrote :

Just a note, the dapper kernel has both CONFIG_ATM and CONFIG_ATM_CLIP enabled (=y, not =m).

Herbert Straub (herbert) wrote :

Status Dapper:

root@test:~# ip neigh flush dev eth0
*** Flush is not complete after 10 rounds ***

Ben Collins (ben-collins) wrote :

Assuming that means it's fixed.

Changed in iproute:
status: Confirmed → Fix Released
Herbert Straub (herbert) wrote :

No, I think it is not fixed, or maybe only partly :-).

An ip neigh flush dev eth0 returns
*** Flush is not complete after 10 rounds ***
and echo $? is 1! As far as I know, 0 means no error.

In Fedora and Suse, CONFIG_ATM and CONFIG_ATM_CLIP are built as modules. In Debian and Ubuntu these two are compiled into the kernel. I think that if these two components were compiled as modules, the ip neigh flush dev eth0 command would return 0 on Debian and Ubuntu as well, not only on Fedora and Suse ;-)
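
For a firewall script, the practical consequence is that the exit status of the flush has to be checked rather than assumed. A minimal sketch, assuming only what is reported above (status 0 when the flush completes, non-zero when ip gives up after 10 rounds); the function name flush_or_warn and the warning text are invented for illustration:

```shell
#!/bin/sh
IP=${IP:-ip}

# flush_or_warn DEV
# Try to flush the neighbour table of DEV. An incomplete flush is
# reported on stderr, but the function still returns 0 so that a
# calling script running under `set -e` carries on.
flush_or_warn() {
    dev=$1
    if "$IP" neigh flush dev "$dev" >/dev/null 2>&1; then
        return 0
    fi
    echo "warning: neighbour flush on $dev did not complete" >&2
    return 0
}
```

Whether an incomplete flush should merely be logged or should abort the script is a policy decision; the sketch logs and continues because the remaining firewall rules do not depend on an empty neighbour table.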
