OVS agent is not removing VLAN tags before tunnels when configured with native OF interface
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | High | IWAMOTO Toshihiro | | |
Bug Description
While investigating an MTU issue, an unaccounted-for overhead of 4 bytes was discovered. Using tcpdump while attempting to connect to a guest via floating IP revealed a spurious 802.1Q header. The tenant network type is VXLAN and the VXLAN endpoints themselves are on a VLAN. This issue effectively breaks communication with guests via floating IP for some system configurations.
The test system is configured with a default global_physnet_mtu of 1500, and inspection of the router namespace confirms that the tenant network's router interface has been automatically configured with an MTU of 1450. Ping was used to test, e.g. ping -M do -s 1422 192.0.2.58 (1422 is the maximum payload that should fit in the 1450 MTU without fragmentation).
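The 1422 figure follows from subtracting the IPv4 and ICMP headers from the interface MTU. A minimal Python sketch of that arithmetic, assuming a 20-byte IPv4 header without options (the function names are illustrative and not part of neutron):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_ping_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ip_length(ping_size):
    """IPv4 total length produced by `ping -s ping_size`."""
    return ping_size + ICMP_HEADER + IPV4_HEADER


print(max_ping_payload(1450))  # 1422
print(ip_length(1420))         # 1448
```

The second value matches the IP length that tcpdump reports for the failing `ping -s 1420` request.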
With the system configured as described, "ping -s 1420 <floating ip>" fails.
tcpdump on the controller reveals:
root@overcloud-
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
18:32:49.163223 P 52:54:00:01:09:3c (oui Unknown) ethertype IPv4 (0x0800), length 1464: (tos 0x0, ttl 64, id 37535, offset 0, flags [DF], proto ICMP (1), length 1448)
192.0.2.1 > 192.0.2.58: ICMP echo request, id 16083, seq 1, length 1428
18:32:49.163340 In 00:00:00:00:00:00 (oui Ethernet) ethertype IPv4 (0x0800), length 592: (tos 0xc0, ttl 64, id 4395, offset 0, flags [none], proto ICMP (1), length 576)
overcloud-
(tos 0x0, ttl 64, id 22077, offset 0, flags [DF], proto UDP (17), length 1502)
overcloud-
Reducing the ping size to allow for a 4-byte header (e.g. ping -s 1418 <floating ip>) succeeds.
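The 4-byte difference is consistent with one 802.1Q tag carried inside the VXLAN payload. The sketch below works through the encapsulation arithmetic, assuming an IPv4 underlay with no IP options and the standard 8-byte UDP and 8-byte VXLAN headers. The constant and function names are illustrative only.

```python
OUTER_IPV4 = 20      # outer IPv4 header, assuming no options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # inner Ethernet header (dst, src, ethertype)
DOT1Q_TAG = 4        # one 802.1Q tag (TPID + TCI)


def outer_ip_length(inner_ip_length, inner_vlan_tag):
    """IPv4 total length of the VXLAN packet carrying the inner IP packet."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    if inner_vlan_tag:
        overhead += DOT1Q_TAG
    return inner_ip_length + overhead


PHYSICAL_MTU = 1500  # global_physnet_mtu from the report

# Inner IP length -> outer IP length, with and without the stray tag.
for inner in (1446, 1448, 1450):
    tagged = outer_ip_length(inner, inner_vlan_tag=True)
    untagged = outer_ip_length(inner, inner_vlan_tag=False)
    print(inner, tagged, tagged <= PHYSICAL_MTU, untagged, untagged <= PHYSICAL_MTU)
```

With the tag present, a 1448-byte inner packet (ping -s 1420) needs a 1502-byte outer packet and no longer fits in 1500 bytes, while a 1446-byte inner packet (ping -s 1418) fits exactly. Without the tag, the full 1450-byte tenant MTU fits.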
Using an alternate tcpdump command to decode the VXLAN traffic reveals an unexpected extra 802.1Q header with a VLAN ID of 0:
[root@overcloud
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
18:36:48.095985 Out 56:13:19:d8:af:27 ethertype IPv4 (0x0800), length 1516: (tos 0x0, ttl 64, id 22088, offset 0, flags [DF], proto UDP (17), length 1500)
172.
fa:16:3e:99:37:ce > fa:16:3e:06:65:6f, ethertype 802.1Q (0x8100), length 1464: vlan 0, p 0, ethertype IPv4, (tos 0x0, ttl 63, id 37541, offset 0, flags [DF], proto ICMP (1), length 1446)
192.0.2.1 > 192.168.2.101: ICMP echo request, id 16422, seq 1, length 1426
18:36:48.097861 P ea:0c:37:f7:69:5e ethertype 802.1Q (0x8100), length 1520: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 22354, offset 0, flags [DF], proto UDP (17), length 1500)
172.
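For readers unfamiliar with how tcpdump arrives at "vlan 0, p 0", the following Python sketch decodes an 802.1Q tag from a raw Ethernet header. The frame is synthetic: it reuses the inner MAC addresses from the capture, and everything else is an assumption made for illustration.

```python
import struct

ETHERTYPE_DOT1Q = 0x8100
ETHERTYPE_IPV4 = 0x0800


def parse_ethernet(frame):
    """Return the effective ethertype, any 802.1Q tag, and the payload offset."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    vlan = None
    offset = 14
    if ethertype == ETHERTYPE_DOT1Q:
        # The tag is a 16-bit TCI followed by the real ethertype.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan = {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0x0FFF}
        offset = 18
    return {"ethertype": ethertype, "vlan": vlan, "payload_offset": offset}


# Synthetic headers only (no payload), built for illustration.
dst = bytes.fromhex("fa163e06656f")
src = bytes.fromhex("fa163e9937ce")
tagged = dst + src + struct.pack("!HHH", ETHERTYPE_DOT1Q, 0x0000, ETHERTYPE_IPV4)
untagged = dst + src + struct.pack("!H", ETHERTYPE_IPV4)

print(parse_ethernet(tagged))
print(parse_ethernet(untagged))
```

A TCI of 0x0000 decodes to priority 0 and VLAN ID 0, and the payload starts 4 bytes later than in the untagged frame, which is the overhead observed above.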
The flow table is similar to the following (this was taken from the compute node, not the controller, but the br-tun flow tables follow the same form, differing only in the values of the local segment IDs):
[root@overcloud
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
On a hunch, the same trials were performed with the openvswitch agents on the controller and compute nodes configured to use the ovs-ofctl OF interface. With that setting, ping -s 1422 192.0.2.58 succeeds, as do ssh to the guests and copies of large amounts of data. The same tcpdump command shows that the extra 802.1Q header is not present:
#with ofctl instead of native
[root@overcloud
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
19:10:31.570425 Out 56:13:19:d8:af:27 ethertype IPv4 (0x0800), length 1512: (tos 0x0, ttl 64, id 22104, offset 0, flags [DF], proto UDP (17), length 1496)
172.
fa:16:3e:99:37:ce > fa:16:3e:06:65:6f, ethertype IPv4 (0x0800), length 1460: (tos 0x0, ttl 63, id 37549, offset 0, flags [DF], proto ICMP (1), length 1446)
192.0.2.1 > 192.168.2.101: ICMP echo request, id 19062, seq 1, length 1426
19:10:31.572143 P ea:0c:37:f7:69:5e ethertype 802.1Q (0x8100), length 1520: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 22370, offset 0, flags [DF], proto UDP (17), length 1500)
172.
The flow table is also different, using strip_vlan instead of pop_vlan (as well as other obvious differences):
[root@overcloud
NXST_FLOW reply (xid=0x4):
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
cookie=
System details follow:
System info: CentOS Linux release 7.2.1511 (Core)
Kernel version: 3.10.0-
The system is a TripleO deployment using a network isolation environment (see docs for details).
Deployment command line:
openstack overcloud deploy --templates ./tripleo-
-e ~/tripleo-
-e ~/tripleo-
-e ~/for_net_
All templates are stock except for the last, which contains:
parameter_defaults:
EC2MetadataIp: 192.0.2.1
ControlPlaneD
OpenStack packages
openvswitch.x86_64 2.5.0-2.el7 @delorean-
openstack-
[root@overcloud
ovs-vsctl (Open vSwitch) 2.5.0
Compiled Mar 18 2016 15:00:11
DB Schema 7.12.1
[root@overcloud
ovs-ofctl (Open vSwitch) 2.5.0
Compiled Mar 18 2016 15:00:11
OpenFlow versions 0x1:0x4
python-
python2-ryu.noarch 4.3-2.el7 @delorean-
tags: added: ovs
Changed in neutron:
importance: Undecided → High
milestone: none → newton-rc1
Fix proposed to branch: master
Review: https://review.openstack.org/368553