Implementing a bridge slows 10G network

Bug #894608 reported by Mike Imelfort
This bug affects 2 people
Affects                 Status    Importance  Assigned to  Milestone
bridge-utils (Ubuntu)   Invalid   Medium      Unassigned
linux (Ubuntu)          Expired   Medium      Unassigned

Bug Description

Installing bridge-utils and then setting up a bridge in /etc/network/interfaces decimates my network speed. Uninstalling the package and doing a reboot restores the speed.

This bug was originally posted here: https://bugs.launchpad.net/qemu/+bug/861141 but I now realise it is bridge-utils specific.

Hardware:
I have two identical machines (Dell PowerEdge R815), each with a Broadcom NetXtreme II BCM57711 10-Gigabit PCIe card, connected via a 10G switch.
Each machine has four sockets of 12-core AMD Opteron(tm) Processor 6174 and 128 GB of RAM.

Software:
Both machines are running Ubuntu Server 10.04 LTS. Kernels are:
Linux whitlam 2.6.32-28-server #55-Ubuntu SMP Mon Jan 10 23:57:16 UTC 2011 x86_64 GNU/Linux
Linux fraser 2.6.32-28-server #55-Ubuntu SMP Mon Jan 10 23:57:16 UTC 2011 x86_64 GNU/Linux

I have installed iperf on both machines.

The tests here involve running whitlam as the iperf server and fraser as the client.
On whitlam I run:
bioadmin@whitlam:~# iperf -sm

On fraser I run:
bioadmin@fraser:~# iperf -c whitlam -d

Before installing bridge-utils I get:
bioadmin@whitlam:~# iperf -sm
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 5] local 10.168.48.14 port 5001 connected with 10.168.48.13 port 37960
------------------------------------------------------------
Client connecting to 10.168.48.13, TCP port 5001
TCP window size: 110 KByte (default)
------------------------------------------------------------
[ 6] local 10.168.48.14 port 49627 connected with 10.168.48.13 port 5001
Waiting for server threads to complete. Interrupt again to force quit.
[ 5] 0.0-10.0 sec 10.8 GBytes 9.23 Gbits/sec
[ 5] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[ 6] 0.0-10.0 sec 10.6 GBytes 9.11 Gbits/sec
[ 6] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

Which is nice.

Next I run:

sudo aptitude -y install bridge-utils

and then I add a bridge to the interfaces file like so:

FROM:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth4
iface eth4 inet static
 address xx.xx.xx.xx
 netmask 255.255.255.0
 network xx.xx.xx.0
 broadcast xx.xx.xx.255
 gateway xx.xx.xx.1
 # dns-* options are implemented by the resolvconf package, if installed
 dns-nameservers blah blah

TO:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#auto eth4
#iface eth4 inet manual

auto br0
iface br0 inet static
 address xx.xx.xx.xx
 netmask 255.255.255.0
 network xx.xx.xx.0
 broadcast xx.xx.xx.255
 gateway xx.xx.xx.1
 bridge_ports eth4
 bridge_stp off
 # dns-* options are implemented by the resolvconf package, if installed
 dns-nameservers blah blah

And restart networking

sudo /etc/init.d/networking restart

Which results in:
bioadmin@whitlam:~# iperf -sm
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.168.48.14 port 5001 connected with 10.168.48.13 port 56405
------------------------------------------------------------
Client connecting to 10.168.48.13, TCP port 5001
TCP window size: 1.25 MByte (default)
------------------------------------------------------------
[ 6] local 10.168.48.14 port 57001 connected with 10.168.48.13 port 5001
Waiting for server threads to complete. Interrupt again to force quit.
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 10.5 GBytes 9.00 Gbits/sec
[ 4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[ 6] 0.0-10.0 sec 2.51 GBytes 2.15 Gbits/sec
[ 6] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

Note that the connection from the machine with bridge-utils installed (fraser) to the other machine (whitlam) is not affected. Only the connection from whitlam to fraser.
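
For anyone repeating the comparison, a small filter can pull the bandwidth column out of iperf's summary lines so the two directions are easy to line up across runs. This is only a sketch: the function name iperf_bandwidth is made up for illustration, and it assumes the iperf 2 summary format shown in the outputs above.

```shell
#!/bin/sh
# Hypothetical helper: print "<stream id> <bandwidth> <unit>" for each
# iperf summary line read from stdin. Summary lines look like:
#   [ 5] 0.0-10.0 sec 10.8 GBytes 9.23 Gbits/sec
iperf_bandwidth() {
    awk '/bits\/sec/ {
        id = $0
        sub(/^\[ */, "", id)    # drop the leading "[" and its padding
        sub(/\].*$/, "", id)    # keep only the stream id
        print id, $(NF-1), $NF  # last two fields are the rate and its unit
    }'
}

# Usage (on the client):
#   iperf -c whitlam -d | iperf_bandwidth
```

Header lines such as "[ ID] Interval Transfer Bandwidth" and the MSS lines are skipped because they do not contain a rate.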

Both machines are fresh installs with nfs-common and iperf installed. Fraser also has bridge-utils.

Any help is very much appreciated!
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 (20110211.1)
MachineType: Dell Inc. PowerEdge R815
Package: linux (not installed)
PackageArchitecture: amd64
PciMultimedia:

ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-35-server root=UUID=cfbcc41d-7968-457b-971a-6b7d5d56e6d2 ro quiet
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-35.78-server 2.6.32.46+drm33.20
Regression: No
Reproducible: Yes
Tags: lucid lucid networking needs-upstream-testing
Uname: Linux 2.6.32-35-server x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 08/02/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.1
dmi.board.name: 04Y8PT
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.1:bd08/02/2010:svnDellInc.:pnPowerEdgeR815:pvr:rvnDellInc.:rn04Y8PT:rvrA00:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R815
dmi.sys.vendor: Dell Inc.

description: updated
Changed in bridge-utils (Ubuntu):
importance: Undecided → Medium
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. I've marked it as also affecting the kernel as the bridge driver seems most likely to be the problem.

Serge Hallyn (serge-hallyn) wrote :

Several sources suggest that turning off tso, and perhaps also sg and tx, might help. Could you try

sudo ethtool -K eth4 tso off
sudo ethtool -K eth4 sg off
sudo ethtool -K eth4 tx off

and try again?

Also, could you show the result of 'brctl show' and 'ip link'?
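
To confirm that the settings actually changed, the current offload state can be read back with lowercase `ethtool -k` (uppercase `-K` sets, lowercase `-k` shows). A minimal sketch follows; the helper name offload_state is made up for illustration, and because the label text differs between ethtool versions the pattern is an assumption that tries to cover both the "tcp segmentation offload" and "tcp-segmentation-offload" spellings.

```shell
#!/bin/sh
# Hypothetical helper: reduce `ethtool -k <iface>` output (read from stdin)
# to the three offloads mentioned above: tx checksumming, scatter-gather, TSO.
offload_state() {
    grep -i -E '^[[:space:]]*(tx-checksumming|scatter-gather|tcp[- ]segmentation[- ]offload):'
}

# Usage, before and after the ethtool -K commands:
#   ethtool -k eth4 | offload_state
```

If a feature still reports "on" after being switched off, the driver may not allow it to be changed.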

Changed in bridge-utils (Ubuntu):
status: New → Incomplete
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 894608

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: lucid
Mike Imelfort (mike-mikeimelfort) wrote :

Thanks for getting back to me.

Here is brctl show output:

$ brctl show
bridge name bridge id STP enabled interfaces
br0 8000.0010187ee074 no eth4

and ip link:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f0:4d:a2:3b:9a:9e brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f0:4d:a2:3b:9a:a0 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f0:4d:a2:3b:9a:a2 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f0:4d:a2:3b:9a:a4 brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:10:18:7e:e0:74 brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:10:18:7e:e0:76 brd ff:ff:ff:ff:ff:ff
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:10:18:7e:e0:74 brd ff:ff:ff:ff:ff:ff

Also:

"Several sources suggest that turning off tso, and perhaps also sg and tx, might help. Could you try"

Tried this but it made no noticeable difference.

Thanks for your help so far.

Mike Imelfort (mike-mikeimelfort) wrote : apport information

Attachments: BootDmesg.txt, CurrentDmesg.txt, Dependencies.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt

tags: added: apport-collected
description: updated

Changed in bridge-utils (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Mike Imelfort (mike-mikeimelfort) wrote :

I just wanted to make sure that I can reproduce this from scratch, for anyone who's interested...

Fresh install from 10.04.3 CD, choosing openssh-server as the only "extra" package

$ sudo -s
$ apt-get update
$ apt-get dist-upgrade
$ aptitude install iperf

...

 test network speed -> all OK

...

$ aptitude install bridge-utils

...

make changes to interfaces file as outlined above

$ /etc/init.d/networking restart

...

test network speed -> slow

...

Serge Hallyn (serge-hallyn) wrote :

Thanks, Mike. I tried to reproduce this with two lucid machines. With both
not using bridges, I got:

------------------------------------------------------------
Client connecting to 10.55.55.5, TCP port 5001
TCP window size: 166 KByte (default)
------------------------------------------------------------
[ 5] local 10.55.55.5 port 34214 connected with 10.55.55.5 port 5001
[ 4] local 10.55.55.5 port 5001 connected with 10.55.55.5 port 34214
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 16.3 GBytes 14.0 Gbits/sec
[ 4] 0.0-10.0 sec 16.3 GBytes 14.0 Gbits/sec
root@mabolo:~# iperf -c 10.55.55.5 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.55.55.5, TCP port 5001
TCP window size: 199 KByte (default)
------------------------------------------------------------
[ 5] local 10.55.55.5 port 34215 connected with 10.55.55.5 port 5001
[ 4] local 10.55.55.5 port 5001 connected with 10.55.55.5 port 34215
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 16.4 GBytes 14.1 Gbits/sec
[ 4] 0.0-10.0 sec 16.4 GBytes 14.1 Gbits/sec

With one having a bridge, I got:

------------------------------------------------------------
Client connecting to 10.55.55.5, TCP port 5001
TCP window size: 49.7 KByte (default)
------------------------------------------------------------
[ 3] local 10.55.55.5 port 34216 connected with 10.55.55.5 port 5001
[ 5] local 10.55.55.5 port 5001 connected with 10.55.55.5 port 34216
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 23.0 GBytes 19.7 Gbits/sec
[ 5] 0.0-10.0 sec 23.0 GBytes 19.7 Gbits/sec
root@mabolo:~# iperf -c 10.55.55.5 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.55.55.5, TCP port 5001
TCP window size: 49.7 KByte (default)
------------------------------------------------------------
[ 4] local 10.55.55.5 port 34217 connected with 10.55.55.5 port 5001
[ 5] local 10.55.55.5 port 5001 connected with 10.55.55.5 port 34217
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 23.2 GBytes 20.0 Gbits/sec
[ 5] 0.0-10.0 sec 23.2 GBytes 20.0 Gbits/sec

However, looking back at your results, look at the TCP window sizes. They
are similar to mine in all but the degenerate case, where it is 1.25 MByte.

Could you show the values of the following files:

   /proc/sys/net/ipv4/tcp_rmem
   /proc/sys/net/ipv4/tcp_wmem

on both hosts?
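
For reference, each of those files holds three numbers: the minimum, default and maximum socket buffer size in bytes. A small sketch that labels them is below; the helper name label_tcp_mem is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: label the three fields of a tcp_rmem/tcp_wmem line.
#   $1    = name to print in front of the values
#   stdin = one line such as "4096 87380 4194304"
label_tcp_mem() {
    awk -v name="$1" '{
        printf "%s: min=%d default=%d max=%d bytes\n", name, $1, $2, $3
    }'
}

# Usage:
#   label_tcp_mem tcp_rmem < /proc/sys/net/ipv4/tcp_rmem
#   label_tcp_mem tcp_wmem < /proc/sys/net/ipv4/tcp_wmem
```

A default of 87380 bytes is about 85.3 KByte (87380 / 1024), which matches the "85.3 KByte (default)" window that iperf reports for the listening socket.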

Mike Imelfort (mike-mikeimelfort) wrote :

Here are the results:

(Fraser is the one with bridge utils installed)

fraser:~$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 4194304
fraser:~$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

whitlam:~$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 4194304
whitlam:~$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

Serge Hallyn (serge-hallyn) wrote :

Those look normal and the same as on my systems...

Could you try running iperf with the added '-w 80k' argument?

Changed in bridge-utils (Ubuntu):
status: Confirmed → Incomplete
michael imelfort (michael-imelfort) wrote :

I have tried '-w 80k' now. It's still the same...

Serge Hallyn (serge-hallyn) wrote :

Could you show the results? (I'm curious what it then reports as TCP window size)

Changed in bridge-utils (Ubuntu):
status: Incomplete → New
status: New → Incomplete
michael imelfort (michael-imelfort) wrote :

Oops, sorry. Here 'tis:

whitlam:~$ iperf -sm -w 80k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 160 KByte (WARNING: requested 80.0 KByte)
------------------------------------------------------------
[ 4] local 10.168.48.14 port 5001 connected with 10.168.48.13 port 38575
------------------------------------------------------------
Client connecting to 10.168.48.13, TCP port 5001
TCP window size: 160 KByte (WARNING: requested 80.0 KByte)
------------------------------------------------------------
[ 6] local 10.168.48.14 port 49025 connected with 10.168.48.13 port 5001
Waiting for server threads to complete. Interrupt again to force quit.
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 5.52 GBytes 4.74 Gbits/sec
[ 4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[ 6] 0.0-10.0 sec 3.46 GBytes 2.97 Gbits/sec
[ 6] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
whitlam:~$ iperf -sm
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.168.48.14 port 5001 connected with 10.168.48.13 port 38576
------------------------------------------------------------
Client connecting to 10.168.48.13, TCP port 5001
TCP window size: 259 KByte (default)
------------------------------------------------------------
[ 6] local 10.168.48.14 port 49026 connected with 10.168.48.13 port 5001
Waiting for server threads to complete. Interrupt again to force quit.
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.71 GBytes 1.47 Gbits/sec
[ 4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[ 6] 0.0-10.0 sec 5.40 GBytes 4.64 Gbits/sec
[ 6] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

fraser:~$ iperf -c whitlam -d -w 80k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 160 KByte (WARNING: requested 80.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to whitlam, TCP port 5001
TCP window size: 160 KByte (WARNING: requested 80.0 KByte)
------------------------------------------------------------
[ 5] local 10.168.48.13 port 38575 connected with 10.168.48.14 port 5001
[ 4] local 10.168.48.13 port 5001 connected with 10.168.48.14 port 49025
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 5.52 GBytes 4.74 Gbits/sec
[ 4] 0.0-10.0 sec 3.46 GBytes 2.97 Gbits/sec
fraser:~$ iperf -c whitlam -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to whitlam, TCP port 5001
TCP window size: 279 KByte (default)
------------------------------------------------------------
[ 5] local 10.168.48.13 port 38576 connected with 10.168.48.14 port 5001
[ 4] local 10.168.48.13 port 5001 connected with 10.168.48.14 po...


Serge Hallyn (serge-hallyn) wrote :

Thanks, Michael - that appears to show that the window size has nothing to do with it.

At this point, I can only suggest watching 'top' while the benchmark is running, to see if the cpu is a bottleneck when you are getting the poor results. Hopefully someone more knowledgeable will chime in with a better idea.
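
As an alternative to eyeballing 'top', the per-CPU share of time spent in softirq context (where much of the network receive processing happens) can be worked out from two snapshots of /proc/stat taken around the benchmark. This is a rough sketch rather than a recommended tool; the helper name softirq_share and the snapshot file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compare two /proc/stat snapshots and print, per CPU,
# the percentage of elapsed time spent in softirq.
#   $1 = earlier snapshot file, $2 = later snapshot file
# Columns of a "cpuN" line: user nice system idle iowait irq softirq steal ...
softirq_share() {
    awk '
        /^cpu[0-9]/ {
            total = 0
            # Sum user..steal only; the guest columns are already counted in user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (NR == FNR) { t0[$1] = total; s0[$1] = $8; next }
            dt = total - t0[$1]
            if (dt > 0) printf "%s softirq=%.1f%%\n", $1, 100 * ($8 - s0[$1]) / dt
        }
    ' "$1" "$2"
}

# Usage (file names are arbitrary):
#   cat /proc/stat > /tmp/stat.before
#   iperf -c whitlam -d
#   cat /proc/stat > /tmp/stat.after
#   softirq_share /tmp/stat.before /tmp/stat.after
```

A single CPU sitting near 100% while the others are idle would point at a per-core bottleneck, which the averaged figure in 'top' can hide on a 48-core machine.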

Simon Déziel (sdeziel) wrote :

@Michael, I haven't tried to reproduce your issue, but maybe some of the bridge bottleneck comes from the netfilter hooks?

Maybe you could try setting these sysctl keys:

net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-ip6tables=0
net.bridge.bridge-nf-call-arptables=0

That is assuming you don't need to firewall inside of a bridge.
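
One way to apply those keys is sketched below. The helper name bridge_nf_off is made up for illustration; it only prints the settings so they can be reviewed first, and the commands that actually change the system are left as comments because they need root.

```shell
#!/bin/sh
# Hypothetical helper: emit the three settings in sysctl.conf format.
bridge_nf_off() {
    for table in iptables ip6tables arptables; do
        printf 'net.bridge.bridge-nf-call-%s=0\n' "$table"
    done
}

# Apply at runtime (as root):
#   bridge_nf_off | while read -r kv; do sysctl -w "$kv"; done
# Persist across reboots (as root):
#   bridge_nf_off >> /etc/sysctl.conf
```

These keys only exist once the bridge code is loaded, so entries in /etc/sysctl.conf may be applied too early at boot to take effect; checking with `sysctl net.bridge` after a reboot is a reasonable precaution.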

penalvch (penalvch)
tags: added: needs-upstream-testing
Stéphane Graber (stgraber) wrote :

Marking it invalid against bridge-utils, as the tool used to set up the bridge isn't the problem; it's most likely either a kernel bug or a configuration problem.
I have personally reproduced similar bottlenecks in the past, and Simon's suggestion usually helps quite a lot (if CPU bound).

Changed in bridge-utils (Ubuntu):
status: Incomplete → Invalid
penalvch (penalvch) wrote :

Mike Imelfort, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested and remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the tag: 'kernel-unable-to-test-upstream'.

Please let us know your results. Thanks in advance.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired