bonding inside a bridge does not work when using arp monitoring

Bug #736226 reported by Leonardo Borda
This bug affects 6 people
Affects          Status        Importance   Assigned to   Milestone
Linux            Expired       Medium
linux (Fedora)   Won't Fix     High
linux (Ubuntu)   In Progress   Medium       Unassigned

Bug Description

Binary package hint: ifenslave-2.6

Description: Ubuntu 10.04.2 LTS
Release: 10.04

ifenslave-2.6:
  Installed: 1.1.0-14ubuntu2.1
  Candidate: 1.1.0-14ubuntu2.1
  Version table:
 *** 1.1.0-14ubuntu2.1 0
        500 http://ca.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        100 /var/lib/dpkg/status
     1.1.0-14ubuntu2 0
        500 http://ca.archive.ubuntu.com/ubuntu/ lucid/main Packages

Overview:

A bond used as a bridge port does not work when the bonding mode is active-backup and ARP monitoring is enabled (bond_arp_ip_target and bond_arp_interval are set).

Reproducible: 100%

1. Install ifenslave and ethtool
$ sudo apt-get install ifenslave-2.6 ethtool

2. Install bridge-utils and VLAN support
$ sudo apt-get install bridge-utils vlan

3. Use the following /etc/network/interfaces sample.
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bond-slaves none
    bond-mode active-backup
    bond_arp_ip_target 10.153.107.1
    bond_arp_interval 100

auto eth0
allow-bond0 eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
allow-bond0 eth1
iface eth1 inet manual
    bond-master bond0

auto bond0.100
iface bond0.100 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    vlan-raw-device bond0

auto bond0.200
iface bond0.200 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    vlan-raw-device bond0

auto br0
iface br0 inet static
    address 10.153.107.110
    netmask 255.255.255.0
    gateway 10.153.107.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br0-100
iface br0-100 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bridge_ports bond0.100
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br0-200
iface br0-200 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bridge_ports bond0.200
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

4. Reboot the server. After the reboot, the br0 interface can no longer be pinged.
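
To confirm that a running system is in the state described above, the following is a minimal sketch that reads the bonding sysfs attributes. It assumes the standard /sys/class/net layout; the SYSFS_ROOT override and the function name are hypothetical and exist only so the check can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: flag a bond that has ARP monitoring enabled while it is a bridge port.
# SYSFS_ROOT is a hypothetical override, not a kernel or distribution setting.
bond_arp_in_bridge() {
    bond="$1"
    root="${SYSFS_ROOT:-/sys/class/net}"
    interval=0
    if [ -r "$root/$bond/bonding/arp_interval" ]; then
        interval=$(cat "$root/$bond/bonding/arp_interval")
    fi
    # The brport directory only exists while the device is enslaved to a bridge.
    if [ "${interval:-0}" -gt 0 ] && [ -d "$root/$bond/brport" ]; then
        echo "$bond: ARP monitoring (interval $interval ms) on a bridge port"
        return 0
    fi
    echo "$bond: combination not detected"
    return 1
}
```

Usage on the affected host would be: bond_arp_in_bridge bond0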

Workaround:

Remove bond_arp_ip_target and bond_arp_interval and use MII monitoring instead, as follows:

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bond-slaves none
    bond-mode active-backup
    bond-downdelay 250
    bond-updelay 120

auto eth0
allow-bond0 eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
allow-bond0 eth1
iface eth1 inet manual
    bond-master bond0

auto bond0.100
iface bond0.100 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    vlan-raw-device bond0

auto bond0.200
iface bond0.200 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    vlan-raw-device bond0

auto br0
iface br0 inet static
    address 10.153.107.110
    netmask 255.255.255.0
    gateway 10.153.107.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br0-100
iface br0-100 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bridge_ports bond0.100
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

auto br0-200
iface br0-200 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bridge_ports bond0.200
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
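
One way to confirm that a given interfaces file no longer enables ARP monitoring is a small grep wrapper. This is only a sketch: the function name is hypothetical, and it matches both the underscore and hyphen spellings because the samples in this report mix the two styles (bond-mode, bond_arp_interval).

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the file still carries ARP monitoring options.
# Accepts bond_arp_interval / bond-arp-interval and the ip_target equivalents.
has_arp_monitoring() {
    grep -Eq '^[[:space:]]*bond[-_]arp[-_](interval|ip[-_]target)[[:space:]]' "$1"
}
```

Usage: has_arp_monitoring /etc/network/interfaces && echo "ARP monitoring still configured"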

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: ifenslave-2.6 1.1.0-14ubuntu2.1
ProcVersionSignature: Ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
Uname: Linux 2.6.32-29-server x86_64
Architecture: amd64
Date: Wed Mar 16 12:10:55 2011
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ifenslave-2.6
---
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Jan 11 15:35 seq
 crw-rw---T 1 root audio 116, 33 Jan 11 15:35 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
CurrentDmesg: [ 20.512006] eth0: no IPv6 routers present
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=638d7f01-7816-4a20-88fc-0d4f2819d96c
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.

 eth0 no wireless extensions.
MachineType: HP ProLiant DL380 G5
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_CA:en
 PATH=(custom, no user)
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-7-generic root=/dev/mapper/srv--bonding-root ro quiet
ProcVersionSignature: Ubuntu 3.2.0-7.13-generic 3.2.0-rc7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-7-generic N/A
 linux-backports-modules-3.2.0-7-generic N/A
 linux-firmware 1.67
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.2.0-7-generic x86_64
UpgradeStatus: Upgraded to precise on 2012-01-05 (5 days ago)
UserGroups:

dmi.bios.date: 07/10/2009
dmi.bios.vendor: HP
dmi.bios.version: P56
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP56:bd07/10/2009:svnHP:pnProLiantDL380G5:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL380 G5
dmi.sys.vendor: HP

Leonardo Borda (lborda) wrote :

A patch for this problem has been posted at: http://kerneltrap.org/mailarchive/linux-netdev/2010/4/28/6275890
See also Red Hat's related bug: https://bugzilla.redhat.com/show_bug.cgi?id=584872

Leonardo Borda (lborda) wrote :

Here is the correct workaround for the bond0 section:

auto bond0
iface bond0 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    bond-slaves none
    bond-mode active-backup
    bond-downdelay 250
    bond-updelay 120

Leonardo Borda (lborda)
affects: ifenslave-2.6 (Ubuntu) → linux (Ubuntu)
Leonardo Borda (lborda) wrote :

Hi,
The bug is also present in the latest mainline kernel.
A public bug has been opened upstream:
https://bugzilla.kernel.org/show_bug.cgi?id=31822

Leonardo

Leonardo Borda (lborda)
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Leonardo Borda (lborda) wrote :

Hi, I've just tried the latest Natty kernel and the problem persists.
After increasing the bond_arp_interval parameter to 1000, the only messages I get are:

Aug 31 18:53:55 ubuntu kernel: [ 638.650009] bonding: bond0: no path to arp_ip_target 10.153.107.1 via rt.dev br0
Aug 31 18:53:56 ubuntu kernel: [ 639.650012] bonding: bond0: no path to arp_ip_target 10.153.107.1 via rt.dev br0
Aug 31 18:53:57 ubuntu kernel: [ 640.650010] bonding: bond0: no path to arp_ip_target 10.153.107.1 via rt.dev br0

arping can reach the target through br0:

root@ubuntu:/home/ubuntu# arping -i br0 10.153.107.1
ARPING 10.153.107.1
60 bytes from 00:15:17:95:c1:b7 (10.153.107.1): index=0 time=235.081 usec
60 bytes from 00:15:17:95:c1:b7 (10.153.107.1): index=1 time=190.973 usec
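
To see how often the bonding driver reports a missing path, and for which target and device, the "no path to arp_ip_target" kernel messages quoted above can be summarised with a short awk filter. This is a sketch only; the function name is hypothetical and it assumes the message wording shown in this comment.

```shell
#!/bin/sh
# Sketch: count "no path to arp_ip_target" messages per target/device pair.
# Reads kernel log lines on stdin, e.g. from /var/log/kern.log or dmesg.
count_no_path() {
    awk '/bonding: .*no path to arp_ip_target/ {
        for (i = 1; i < NF; i++) {
            if ($i == "arp_ip_target") target = $(i + 1)   # IP follows the keyword
            if ($i == "rt.dev") dev = $(i + 1)             # device follows rt.dev
        }
        seen[target " via " dev]++
    }
    END { for (k in seen) print k ": " seen[k] }'
}
```

Usage: dmesg | count_no_path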

Leonardo Borda (lborda) wrote :

root@ubuntu:/home/ubuntu# ifconfig
bond0 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MASTER MULTICAST MTU:1500 Metric:1
          RX packets:5191 errors:0 dropped:97 overruns:0 frame:0
          TX packets:1828 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:451505 (451.5 KB) TX bytes:274958 (274.9 KB)

bond0.100 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:9204 (9.2 KB)

bond0.200 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:9204 (9.2 KB)

br0 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet addr:10.153.107.110 Bcast:10.153.107.255 Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:3374 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1222 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:229482 (229.4 KB) TX bytes:219778 (219.7 KB)

br0-100 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:1070 (1.0 KB)

br0-200 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          inet6 addr: fe80::21b:78ff:febd:eff0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:1070 (1.0 KB)

eth0 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:5148 errors:0 dropped:51 overruns:0 frame:0
          TX packets:1782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:448233 (448.2 KB) TX bytes:270898 (270.8 KB)
          Interrupt:16 Memory:f8000000-f8012800

eth1 Link encap:Ethernet HWaddr 00:1b:78:bd:ef:f0
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:43 errors:0 dropped:24 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrie...


Leonardo Borda (lborda) wrote :

Tested on the latest Oneiric kernel, 3.0.0.11, and now I can't ping it anymore. The problem still persists.

Leonardo

Jose Plans (jplans) wrote :

#woeng asked Joe Salisbury to help prioritising / ownership.

Jose Plans (jplans) wrote :

Apologies for the comment, wrong case.

tags: added: kernel-key
Chris J Arges (arges) wrote :

@lborda

Can you confirm this bug on the latest precise kernel?
Thanks,

--chris

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Leonardo Borda (lborda) wrote :

@Chris,

I tested the same config on Precise (12.04) and the issue is still there. It looks like the ARP notifications are being sent through br0. We must evaluate whether this is the right workflow, and also whether the current configuration is an acceptable use case given how the code layers are written in the kernel.

I will follow up with the output of apport-collect 736226 from the test environment.
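
The "via rt.dev br0" wording in the kernel messages suggests that the route lookup for the ARP target resolves to br0 rather than to bond0. A minimal sketch for checking which device the routing table picks for the target is shown here; the function name is hypothetical, and it only assumes that the output of "ip route get" contains a "dev NAME" pair.

```shell
#!/bin/sh
# Sketch: print the outgoing device from "ip route get" output read on stdin.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }   # first "dev NAME" pair wins
    }'
}
```

Usage: ip route get 10.153.107.1 | route_dev
If this prints br0 while the ARP monitor runs on bond0, it is consistent with the "no path" messages.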

Leonardo

Leonardo Borda (lborda) wrote :

Looking at the kernel messages, it looks like br0 is not able to send out ARP notifications.
The questions are:

1. Can we do that, given that br0-{1,2}00 are also hooked up to bond0.100 and bond0.200?
2. Also, in the /etc/network/interfaces configuration example the VLANs do not have an IP address, which may play a role according to the bonding.txt documentation [1]:

"
One gratuitous ARP is issued for the bonding master
  interface and each VLAN interfaces configured above
  it, provided that the interface has at least one IP
  address configured. Gratuitous ARPs issued for VLAN
  interfaces are tagged with the appropriate VLAN id.
"

[1] - http://www.kernel.org/doc/Documentation/networking/bonding.txt
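
To see which interfaces in a configuration have no IP address, and so would not qualify under the quoted rule, the stanzas declared as "inet manual" can be listed. This is a rough sketch with a hypothetical function name; it treats "manual" as "no address configured", which ignores addresses added by post-up commands.

```shell
#!/bin/sh
# Sketch: list interfaces declared "inet manual" in an interfaces(5) file.
list_manual_ifaces() {
    awk '$1 == "iface" && $3 == "inet" && $4 == "manual" { print $2 }' "$1"
}
```

Usage: list_manual_ifaces /etc/network/interfaces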

Leonardo Borda (lborda) wrote : apport information

Attached files: AcpiTables.txt, BootDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected precise
description: updated
Chris J Arges (arges) wrote :

Sent an email to the upstream list:
http://www.spinics.net/lists/netdev/msg186033.html

tags: removed: kernel-key
Leonardo Borda (lborda)
description: updated
tags: added: kernel-bug-exists-upstream kernel-da-key
Changed in linux:
status: Confirmed → Expired
Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: nobody → Chris J Arges (christopherarges)
Chris J Arges (arges)
Changed in linux (Ubuntu):
status: Triaged → In Progress
Chris J Arges (arges) wrote :

Latest version of the patch: http://patchwork.ozlabs.org/patch/197479/

Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → nobody
Changed in linux (Fedora):
importance: Unknown → High
status: Unknown → Won't Fix