trouble with guest network connectivity when host is using a bonded interface
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Expired | High | Unassigned |
qemu-kvm (Ubuntu) | Expired | High | Unassigned |
Bug Description
i'm seeing poor/intermitte
in a nutshell, the network configuration is as follows: the physical interfaces [eth0 and eth1] are bonded together as bond0 [i've tried various bond modes - see below]. a bridge interface [br0] is configured with bond0 attached to it. all guests use br0 as their "forward" interface. my tests have generally included a single host, with two guests running on it. both guests are running ubuntu 12.10.
it depends slightly on the particulars of the configuration, but the most prevalent symptom is that a newly booted guest will at first respond to pings [with little to no loss], and the guest will be able to ping other hosts on the network, but as time passes, more and more packets are dropped. eventually, virtually all ping requests go unanswered. in some cases, it appears that restarting networking on the guest will fix this, partially and temporarily. the guest will begin to reply 4-5 packets after restarting networking, but does not respond consistently, eventually failing again as before. i've also noticed that in some cases where ping against the guest has not yet begun to fail, if i ping something else on the network from the guest, this causes the pings against the guest to abruptly fail.
i know this is all quite abstract - i've spent quite a bit of time trying to isolate the various variables, and while i've made some progress, i think some guidance would be helpful.
what i have noticed specifically is if i attach a physical device [e.g. eth0 or eth1] to the bridge [instead of bond0], things seem to work ok. also, if i use active-backup as the bonding mode, things seem to work ok. i was initially using balance-alb as the bonding mode, and have also tested balance-rr as the bonding mode. both exhibit the above symptoms. i've also tried various network card models for the guests [realtek, e1000, and virtio]. this has not had any impact on the symptoms. lastly, the two guests have been able to ping each other, with no issues, regardless of the various network settings. at the moment, i have switched back to active-backup, so this is reflected in the below information.
here is a bit of configuration info:
host os/package info:
>lsb_release -rd
Description: Ubuntu 12.10
Release: 12.10
>apt-cache policy qemu-kvm
qemu-kvm:
Installed: 1.2.0+noroms-
Candidate: 1.2.0+noroms-
Version table:
*** 1.2.0+noroms-
500 http://
100 /var/lib/
1.
500 http://
1.
500 http://
>dpkg -l | grep -i virt
ii libvirt-bin 0.9.13-0ubuntu12.2 amd64 programs for the libvirt library
ii libvirt0 0.9.13-0ubuntu12.2 amd64 library for interfacing with different virtualization systems
ii python-libvirt 0.9.13-0ubuntu12.2 amd64 libvirt Python bindings
ii qemu-kvm 1.2.0+noroms-
ii virtinst 0.600.2-1ubuntu1 all Programs to create and clone virtual machines
>dpkg -l | grep -i qemu
ii qemu-common 1.2.0+noroms-
ii qemu-kvm 1.2.0+noroms-
ii qemu-utils 1.2.0+noroms-
ii vgabios 0.7a-3ubuntu2 all VGA BIOS software for the Bochs and Qemu emulated VGA card
host network config:
>egrep -v '(^[[:space:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet manual
bond-master bond0
auto eth1
iface eth1 inet manual
bond-master bond0
auto bond0
iface bond0 inet manual
bond-mode active-backup
bond-slaves eth0 eth1
bond-primary eth0
bond-primary_
auto br0
iface br0 inet static
bridge_ports bond0
bridge_stp off
bridge_waitport 0
bridge_maxwait 0
bridge_maxage 0
bridge_fd 0
bridge_ageing 0
address 192.168.1.60
netmask 255.255.255.0
gateway 192.168.1.1
>brctl show
bridge name bridge id STP enabled interfaces
br0 8000.0019b9ec43f3 no bond0
vnet0
>ifconfig
bond0 Link encap:Ethernet HWaddr 00:19:b9:ec:43:f3
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:2527 errors:0 dropped:146 overruns:0 frame:0
TX packets:2129 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:765773 (765.7 KB) TX bytes:1071088 (1.0 MB)
br0 Link encap:Ethernet HWaddr 00:19:b9:ec:43:f3
inet addr:192.168.1.60 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::219:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2189 errors:0 dropped:0 overruns:0 frame:0
TX packets:1643 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:669115 (669.1 KB) TX bytes:1026954 (1.0 MB)
eth0 Link encap:Ethernet HWaddr 00:19:b9:ec:43:f3
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:146 errors:0 dropped:146 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:10568 (10.5 KB) TX bytes:0 (0.0 B)
eth1 Link encap:Ethernet HWaddr 00:19:b9:ec:43:f3
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2381 errors:0 dropped:0 overruns:0 frame:0
TX packets:2129 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:755205 (755.2 KB) TX bytes:1071088 (1.0 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:3704 errors:0 dropped:0 overruns:0 frame:0
TX packets:3704 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1256952 (1.2 MB) TX bytes:1256952 (1.2 MB)
vnet0 Link encap:Ethernet HWaddr fe:54:00:f3:b2:32
inet6 addr: fe80::fc54:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:133 errors:0 dropped:0 overruns:0 frame:0
TX packets:723 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:12368 (12.3 KB) TX bytes:546331 (546.3 KB)
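as a rough cross-check of the counters above, here is a small python sketch that computes the rx drop ratio per interface. the regex and the `rx_drop_ratio` helper are only an illustration [they are not part of any tool]; the numbers are copied from the ifconfig output above.

```python
import re

# Matches the start of an ifconfig "RX packets" line as shown above.
RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+)")


def rx_drop_ratio(rx_line):
    """Return dropped/packets for one ifconfig RX line (0.0 if no packets)."""
    match = RX_LINE.search(rx_line)
    if match is None:
        raise ValueError("not an ifconfig RX packets line: %r" % rx_line)
    packets, _errors, dropped = (int(g) for g in match.groups())
    return dropped / packets if packets else 0.0


# RX lines copied from the host ifconfig output above.
HOST_RX = {
    "bond0": "RX packets:2527 errors:0 dropped:146 overruns:0 frame:0",
    "br0": "RX packets:2189 errors:0 dropped:0 overruns:0 frame:0",
    "eth0": "RX packets:146 errors:0 dropped:146 overruns:0 frame:0",
    "eth1": "RX packets:2381 errors:0 dropped:0 overruns:0 frame:0",
    "vnet0": "RX packets:133 errors:0 dropped:0 overruns:0 frame:0",
}

ratios = {name: rx_drop_ratio(line) for name, line in HOST_RX.items()}

for name in sorted(ratios):
    print("%-6s %.3f" % (name, ratios[name]))
```

all of the drops sit on eth0 [146 of 146 received], and bond0's 2527 rx packets are exactly eth0's 146 plus eth1's 2381. since eth0 has transmitted nothing, this looks like the inactive slave of the active-backup bond discarding what it receives, so these particular drops are probably not related to the guest problem.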
>virsh net-list --all
Name State Autostart
-------
host-bridge active no
direct-macvtap inactive no
>virsh net-info host-bridge
Name host-bridge
UUID ecb96001-
Active: yes
Persistent: yes
Autostart: no
Bridge: br0
>virsh net-dumpxml host-bridge
<network>
<name>
<uuid>
<forward mode='bridge'/>
<bridge name='br0' />
</network>
guest config:
>virsh list --all
Id Name State
-------
1 aurora running
- ecto shut off
- proto shut off
>virsh dominfo aurora
Id: 1
Name: aurora
UUID: 542c39da-
OS Type: hvm
State: running
CPU(s): 1
CPU time: 26.8s
Max memory: 1048576 KiB
Used memory: 1048576 KiB
Persistent: yes
Autostart: disable
Managed save: no
Security model: apparmor
Security DOI: 0
Security label: libvirt-
>virsh dumpxml aurora --security-info
<domain type='kvm' id='1'>
<name>
<uuid>
<description>
<memory unit='KiB'
<currentMemory unit='KiB'
<vcpu placement=
<os>
<type arch='x86_64' machine=
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<cpu mode='custom' match='exact'>
<model fallback=
<vendor>
<feature policy='require' name='pbe'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='est'/>
<feature policy='require' name='ds'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='dca'/>
<feature policy='require' name='lahf_lm'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='cx16'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='acpi'/>
</cpu>
<clock offset='utc'/>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/
<target dev='sda' bus='sata'/>
<alias name='sata0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<target dev='sdb' bus='sata'/>
<readonly/>
<alias name='sata0-0-1'/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
<controller type='sata' index='0'>
<alias name='sata0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</controller>
<controller type='usb' index='0'>
<alias name='usb0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<interface type='network'>
<mac address=
<source network=
<target dev='vnet0'/>
<model type='virtio'/>
<driver name='vhost' txmode='iothread'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
<listen type='address' address='0.0.0.0'/>
</graphics>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>
<imagelabel
</seclabel>
</domain>
guest os networking config:
>hostname
aurora
>ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:f3:b2:32
inet addr:192.168.1.70 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::5054:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2397 errors:0 dropped:8 overruns:0 frame:0
TX packets:544 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1576649 (1.5 MB) TX bytes:54356 (54.3 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
thanks
-ben
---
Architecture: amd64
DistroRelease: Ubuntu 12.10
MarkForUpload: True
Package: qemu-kvm 1.2.0+noroms-
PackageArchitec
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
Uname: Linux 3.5.0-25-generic x86_64
UserGroups:
some more information - while running a ping from another physical host, against a guest, i did a bit of testing with tshark:
192.168.1.123 - other physical host on network
192.168.1.60 - virtual host
192.168.1.70 - virtual guest
on the virtual host, the current active slave is eth0, so i started there:
>cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:19:b9:ec:43:f1
Slave queue ID: 0
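to make it easier to confirm which bond mode a given capture was taken under, here is a minimal python sketch that pulls the mode and active slave out of this kind of status text. the `parse_bonding_status` helper is hypothetical, and the sample is a shortened copy of the output above; on a live host the text would come from /proc/net/bonding/bond0.

```python
def parse_bonding_status(text):
    """Extract mode, active slave, and slave names from bonding status text."""
    info = {"mode": None, "active_slave": None, "slaves": []}
    for line in text.splitlines():
        # Each line has the form "Key: value"; split at the first colon only,
        # so values that contain colons (MAC addresses) stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


# Shortened copy of the status output shown above.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
Slave Interface: eth0
MII Status: up
Permanent HW addr: 00:19:b9:ec:43:f1
"""

status = parse_bonding_status(SAMPLE)
print(status)
```

note that the mode reported here is adaptive load balancing [balance-alb], not the active-backup mode shown in the interfaces file earlier, so the captures below appear to have been taken after switching the bond mode back.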
tshark appears to indicate that the ping requests are reaching the physical interface on the virtual host:
>tshark -i eth0 'icmp[icmptype]==icmp-echo'
Capturing on eth0
0.000000 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=540/7170, ttl=64
1.000273 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=541/7426, ttl=64
2.001328 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=542/7682, ttl=64
3.002381 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=543/7938, ttl=64
^C4 packets captured
next, tshark appears to indicate that the ping requests are reaching the bond interface:
>tshark -i bond0 'icmp[icmptype]==icmp-echo'
Capturing on bond0
0.000000 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=796/7171, ttl=64
1.001077 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=797/7427, ttl=64
2.001996 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=798/7683, ttl=64
3.002751 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=799/7939, ttl=64
^C4 packets captured
continuing on, tshark appears to indicate that the ping requests are reaching the bridge interface:
>tshark -i br0 'icmp[icmptype]==icmp-echo'
Capturing on br0
0.000000 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=665/39170, ttl=64
1.001045 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=666/39426, ttl=64
2.001173 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=667/39682, ttl=64
3.002232 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=668/39938, ttl=64
4.003298 192.168.1.123 -> 192.168.1.70 ICMP 98 Echo (ping) request id=0xa494, seq=669/40194, ttl=64
^C5 packets captured
while doing each of these captures, i was running a matching capture on the guest, and did not see any of these packets. while i'm not quite sure what [if any] the implication is, it would seem that somehow, the packets are getting lost on their way to the guest, after they reach the bridge interface.
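the hop-by-hop reasoning above can be written down as a tiny python sketch. the `last_hop_seen` helper and the `PATH` list are only an illustration; the True/False values record where the echo requests were and were not seen in the captures above.

```python
def last_hop_seen(observations):
    """Return the last hop that saw the packet before the first miss.

    observations is an ordered list of (hop_name, seen) pairs, following the
    path the packet is expected to take. Returns None if the very first hop
    missed it (or the list is empty).
    """
    last = None
    for hop, seen in observations:
        if not seen:
            break
        last = hop
    return last


# Capture results from the tshark runs above, in path order.
PATH = [
    ("eth0", True),         # physical interface on the virtual host
    ("bond0", True),        # bond interface
    ("br0", True),          # bridge interface
    ("guest eth0", False),  # matching capture inside the guest saw nothing
]

print("last hop that saw the echo requests:", last_hop_seen(PATH))
```

no capture was taken on vnet0, the tap device that sits between br0 and the guest. adding a ("vnet0", ...) entry between br0 and the guest would probably narrow down whether the bridge forwards the requests to the guest's port at all.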