Geneve tunnels don't work when ipv6 is disabled

Bug #1794232 reported by Nivedita Singhvi
This bug affects 1 person
Affects           Status         Importance   Assigned to        Milestone
linux (Ubuntu)    Fix Released   High         Nivedita Singhvi
  Xenial          Fix Released   High         Nivedita Singhvi
  Bionic          Fix Released   High         Nivedita Singhvi
  Cosmic          Fix Released   High         Nivedita Singhvi
  Disco           Fix Released   High         Nivedita Singhvi

Bug Description

SRU Justification

Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically.

Fix:
Fixed by upstream commit in v5.0:
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
"geneve: correctly handle ipv6.disable module parameter"

The fix is therefore already in Disco and later; it is required in X, B, C.

Testcase:
1. Boot with "ipv6.disable=1"
2. Then try to create a geneve tunnel (remote_ip is the IP of the other host):
   # ovs-vsctl add-br br1
   # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 \
       type=geneve options:remote_ip=192.168.x.z

Regression Potential: Low. The change only affects geneve tunnels when
ipv6 is dynamically disabled, a case that currently doesn't work at all.

Other Info:
* Mainline commit msg includes reference to a fix for
  non-metadata tunnels (infrastructure is not yet in
  our tree prior to Disco), hence not being included
  at this time under this case.

  At this time, all geneve tunnels created as above
  are metadata-enabled.

---
[Impact]

When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
an OS environment with Open vSwitch where ipv6 has been disabled,
the create fails with the error:

"ovs-vsctl: Error detected while setting up 'geneve0': could not
add network device geneve0 to ofproto (Address family not supported
by protocol)."

[Fix]
There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels).

"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

This fix is needed on all our series prior to Disco
and the v5.0 kernel: X, B, C. It is identical to the
fix we implemented and tested internally but had
not yet pushed upstream.

[Test Case]
(Best to do this on a kvm guest VM so as not to interfere with
 your system's networking)

1. On any Ubuntu Xenial kernel, disable ipv6. This example
   is shown with the 4.15.0-23-generic kernel (which differs
   slightly from 4.4.x in symptoms):

- Edit /etc/default/grub to add the line:
        GRUB_CMDLINE_LINUX="ipv6.disable=1"
- # update-grub
- Reboot

2. Install OVS
# apt install openvswitch-switch

3. Create a Geneve tunnel
# ovs-vsctl add-br br1
# ovs-vsctl add-port br1 geneve1 -- set interface geneve1 \
    type=geneve options:remote_ip=192.168.x.z

(where remote_ip is the IP of the other host)

You will see the following error message:

"ovs-vsctl: Error detected while setting up 'geneve1'.
See ovs-vswitchd log for details."

From /var/log/openvswitch/ovs-vswitchd.log you will see:

"2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system:
failed to add geneve1 as port: Address family not supported
by protocol"
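
The "Address family not supported by protocol" text is the usual message for errno EAFNOSUPPORT. As a minimal userspace sketch, assuming that a kernel booted with ipv6.disable=1 refuses to create AF_INET6 sockets with that errno, the condition can be probed directly. The helper name ipv6_sockets_available is made up for illustration and is not part of OVS or the kernel:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns 1 if an AF_INET6
 * socket can be created, 0 if the kernel reports EAFNOSUPPORT, and
 * -1 on any other error.
 */
int ipv6_sockets_available(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EAFNOSUPPORT ? 0 : -1;
}
```

A return value of 0 on the test VM would be consistent with the failure logged above.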

You will notice from the "ifconfig" output that the device
genev_sys_6081 is not created.
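
The same check can be done without reading "ifconfig" output. This is a small sketch using the standard if_nametoindex() call; the wrapper name netdev_exists is an assumption for illustration:

```c
#include <net/if.h>

/*
 * Hypothetical wrapper (illustration only): returns 1 if a network
 * interface with the given name exists, 0 otherwise.
 * if_nametoindex() returns 0 when no such interface is found.
 */
int netdev_exists(const char *name)
{
    return if_nametoindex(name) != 0;
}
```

Calling netdev_exists("genev_sys_6081") after the add-port step should return 0 on an affected kernel and 1 once the tunnel device has been created.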

If you do not disable IPv6 (remove ipv6.disable=1 from
/etc/default/grub + update-grub + reboot), the same
'ovs-vsctl add-port' command completes successfully.
You can see that it is working properly by adding an
IP to the br1 and pinging each host.

On kernel 4.4 (4.4.0-128-generic), the 'ovs-vsctl add-port' command
prints no error and no warning appears in ovs-vswitchd.log, but the
device genev_sys_6081 is still not created and the ping test fails.

With the fixed test kernel, the interfaces and tunnel
are created successfully.

[Regression Potential]
* Low -- the change affects only the geneve driver, and only when
  ipv6 is disabled. Since tunnel creation does not work at all in
  that case today, the fix gets the tunnel up and running for the
  common case.

[Other Info]

* Analysis

Geneve tunnels should work with either IPv4 or IPv6 environments
as a design and support principle.

Currently, however, the implementation requires ipv6 support for
metadata-based tunnels (which geneve tunnels created through OVS are),
rather than supporting either of:

a) ipv4 + metadata // whether ipv6 is not compiled in or is dynamically disabled
b) ipv4 + metadata + ipv6

What enforces this in the current 4.4.0-x code when opening a Geneve
tunnel is the following in geneve_open() :

        bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
        bool metadata = geneve->collect_md;
        ...

#if IS_ENABLED(CONFIG_IPV6)
        geneve->sock6 = NULL;
        if (ipv6 || metadata)
                ret = geneve_sock_add(geneve, true);
#endif
        if (!ret && (!ipv6 || metadata))
                ret = geneve_sock_add(geneve, false);

CONFIG_IPV6 is enabled and IPv6 is disabled at boot. Even
though ipv6 is false, metadata is always true for a geneve
open, because it is set unconditionally in ovs:

In lib/dpif-netlink-rtnl.c:

case OVS_VPORT_TYPE_GENEVE:
nl_msg_put_flag(&request, IFLA_GENEVE_COLLECT_METADATA);

The second argument of geneve_sock_add is a boolean
value indicating whether it's an ipv6 address family
socket or not, and we thus incorrectly pass a true
value rather than false.

The current "|| metadata" check is unnecessary and incorrectly
sends the tunnel creation code down the ipv6 path, which
fails subsequently when the code expects an ipv6 family socket.
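
To make the failure path concrete, here is a userspace model of the socket setup quoted above, with CONFIG_IPV6 assumed enabled. It is only a sketch: model_sock_add stands in for the kernel's geneve_sock_add, and model_open_tolerant is a hypothetical variant that treats EAFNOSUPPORT from the IPv6 attempt as non-fatal. It is not the text of the upstream commit, which may differ in detail.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Stand-in for geneve_sock_add(): the IPv6 socket cannot be created
 * when IPv6 is disabled at boot; every other request succeeds.
 */
static int model_sock_add(bool ipv6_socket, bool ipv6_disabled)
{
    if (ipv6_socket && ipv6_disabled)
        return -EAFNOSUPPORT;
    return 0;
}

/* Socket setup as quoted above from the 4.4 geneve_open(). */
int model_open_current(bool ipv6, bool metadata, bool ipv6_disabled)
{
    int ret = 0;

    if (ipv6 || metadata)
        ret = model_sock_add(true, ipv6_disabled);
    if (!ret && (!ipv6 || metadata))
        ret = model_sock_add(false, ipv6_disabled);
    return ret;
}

/*
 * Hypothetical tolerant variant: an EAFNOSUPPORT result from the
 * IPv6 attempt does not prevent the IPv4 socket from being opened.
 */
int model_open_tolerant(bool ipv6, bool metadata, bool ipv6_disabled)
{
    bool want_v6 = ipv6 || metadata;
    bool want_v4 = !ipv6 || metadata;
    int ret = 0;

    if (want_v6) {
        ret = model_sock_add(true, ipv6_disabled);
        if (ret < 0 && ret != -EAFNOSUPPORT)
            want_v4 = false;
    }
    if (want_v4)
        ret = model_sock_add(false, ipv6_disabled);
    return ret;
}
```

With ipv6 = false, metadata = true and IPv6 disabled (the OVS case described above), model_open_current returns -EAFNOSUPPORT while model_open_tolerant returns 0. A pure IPv6 tunnel (ipv6 = true, metadata = false) still fails in both, which is the expected outcome when IPv6 is unavailable.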

* This issue exists in all versions of the kernel up to present
   mainline and net-next trees.

* Testing with a trivial patch to remove that and make
  similar changes to those made for vxlan (which had the
  same issue) has been successful. Patches for various
  versions to be attached here soon.

* Example Versions (bug exists in all versions of Ubuntu
  and mainline):

$ uname -r
4.4.0-135-generic

$ lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

$ dpkg -l | grep openvswitch-switch
ii openvswitch-switch 2.5.4-0ubuntu0.16.04.1

tags: added: geneve kernel-bug
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1794232

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Nivedita Singhvi (niveditasinghvi) wrote :

Logs not necessary at this time, will attach patches and other
information as needed.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: kernel-da-key
Nivedita Singhvi (niveditasinghvi) wrote :

We tested the patch discussed above internally, with success,
although the testing was limited (opening up a geneve tunnel
between 2 kvm guests).

Jiri has now pushed an identical patch upstream which is
available in the v5.0 kernel and later.

"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

Although I do not have testing validation from original
poster, since it has been committed upstream, I'm going
to go ahead and get the SRU request started.

Changed in linux (Ubuntu):
status: Triaged → In Progress
importance: Medium → High
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Changed in linux (Ubuntu Disco):
assignee: nobody → Nivedita Singhvi (niveditasinghvi)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Nivedita Singhvi (niveditasinghvi)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Nivedita Singhvi (niveditasinghvi)
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
description: updated
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Nivedita Singhvi (niveditasinghvi)
description: updated
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Released
description: updated
description: updated
description: updated
Matthew Ruffell (mruffell) wrote :

I tested a fully up to date cosmic VM using the reproducer steps in the description, and found that I could not create a geneve tunnel when ipv6 is disabled.

I compiled a new cosmic kernel off the master-next branch with this commit included:
"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

The commit was a clean cherry-pick, and when the patched kernel was installed, I was able to successfully create a geneve tunnel when ipv6 is disabled.

I also tested the latest disco daily build, and found that disco is not affected, as I can successfully create a geneve tunnel when ipv6 is disabled.

description: updated
description: updated
description: updated
tags: added: cosmic xenial
Nivedita Singhvi (niveditasinghvi) wrote :

Submitted SRU request for Bionic, Cosmic.

Huge thanks for the testing, Matthew!

Nivedita Singhvi (niveditasinghvi) wrote :

Resubmitted SRU for B,C for this kernel cycle.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Nivedita Singhvi (niveditasinghvi) wrote :

A 4.4 test kernel with the fix backported is available at:

https://people.canonical.com/~nivedita/geneve-xenial-test/

if anyone wishes to validate the 4.4 X solution.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
Nivedita Singhvi (niveditasinghvi) wrote :

Bionic, Cosmic kernels successfully tested.
I've updated the tags.

tags: added: verification-done-bionic verification-done-cosmic
removed: verification-needed-bionic verification-needed-cosmic
tags: added: sts
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-51.55

---------------
linux (4.15.0-51.55) bionic; urgency=medium

  * linux: 4.15.0-51.55 -proposed tracker (LP: #1829219)

  * disable a.out support (LP: #1818552)
    - [Config] Disable a.out support

  * [UBUNTU] qdio: clear intparm during shutdown (LP: #1828394)
    - s390/qdio: clear intparm during shutdown

  * ftrace in ubuntu_kernel_selftests hang with Cosmic kernel (LP: #1826385)
    - kprobes/x86: Fix instruction patching corruption when copying more than one
      RIP-relative instruction

  * touchpad not working on lenovo yoga 530 (LP: #1787775)
    - Revert "UBUNTU: SAUCE: i2c:amd Depends on ACPI"
    - Revert "UBUNTU: SAUCE: i2c:amd move out pointer in union i2c_event_base"
    - Revert "UBUNTU: SAUCE: i2c:amd I2C Driver based on PCI Interface for
      upcoming platform"
    - i2c: add helpers to ease DMA handling
    - i2c: add a message flag for DMA safe buffers
    - i2c: add extra check to safe DMA buffer helper
    - i2c: Add drivers for the AMD PCIe MP2 I2C controller
    - [Config] Update config for AMD MP2 I2C driver
    - [Config] Update I2C_AMD_MP2 annotations

  * tm-unavailable in powerpc/tm failed on Bionic Power9 (LP: #1813129)
    - selftests/powerpc: Check for pthread errors in tm-unavailable
    - selftests/powerpc: Skip tm-unavailable if TM is not enabled

  * cp_abort in powerpc/context_switch from ubunut_kernel_selftests failed on
    Bionic P9 (LP: #1813134)
    - selftests/powerpc: Remove redundant cp_abort test

  * bionic/linux: completely remove snapdragon files from sources (LP: #1827880)
    - [Packaging] remove snapdragon dead files
    - [Config] update configs after snapdragon removal

  * The noise keeps occurring when Headset is plugged in on a Dell machine
    (LP: #1827972)
    - ALSA: hda/realtek - Fixed Dell AIO speaker noise

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * There are 4 HDMI/Displayport audio output listed in sound setting without
    attach any HDMI/DP monitor (LP: #1827967)
    - ALSA: hda/hdmi - Read the pin sense from register when repolling
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event

  * Headphone jack switch sense is inverted: plugging in headphones disables
    headphone output (LP: #1824259)
    - ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board

  * CTAUTO:DevOps:860.50:devops4fp1:Error occurred during LINUX Dmesg error
    Checking for all LINUX clients for devops4p10 (LP: #1766201)
    - SAUCE: integrity: downgrade error to warning

  * Screen freeze after resume from S3 when HDMI monitor plugged on Dell
    Precision 7740 (LP: #1825958)
    - PCI: Restore resized BAR state on resume

  * potential memory corruption on arm64 on dev release (LP: #1827437)
    - driver core: Postpone DMA tear-down until after devres release

  * powerpc/pmu/ebb test in ubuntu_kernel_selftest failed with "error while
    loading shared libraries" on Bionic/Cosmic PowerPC (LP: #1812805)
    - selftests/powerpc/pmu: Link ebb tests with -no-pie

  * unnecessary request_queue freeze (LP: #1815733)
    - block: av...


Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-21.22

---------------
linux (4.18.0-21.22) cosmic; urgency=medium

  * linux: 4.18.0-21.22 -proposed tracker (LP: #1829186)

  * disable a.out support (LP: #1818552)
    - [Config] Turn off a.out support

  * ftrace in ubuntu_kernel_selftests hang with Cosmic kernel (LP: #1826385)
    - kprobes/x86: Fix instruction patching corruption when copying more than one
      RIP-relative instruction

  * touchpad not working on lenovo yoga 530 (LP: #1787775)
    - Revert "UBUNTU: SAUCE: i2c:amd Depends on ACPI"
    - Revert "UBUNTU: SAUCE: i2c:amd move out pointer in union i2c_event_base"
    - Revert "UBUNTU: SAUCE: i2c:amd I2C Driver based on PCI Interface for
      upcoming platform"
    - i2c: add extra check to safe DMA buffer helper
    - i2c: Add drivers for the AMD PCIe MP2 I2C controller
    - [Config] Update config for AMD MP2 I2C driver
    - [Config] Update I2C_AMD_MP2 annotations

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * There are 4 HDMI/Displayport audio output listed in sound setting without
    attach any HDMI/DP monitor (LP: #1827967)
    - ALSA: hda/hdmi - Read the pin sense from register when repolling
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event

  * Headphone jack switch sense is inverted: plugging in headphones disables
    headphone output (LP: #1824259)
    - ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board

  * CTAUTO:DevOps:860.50:devops4fp1:Error occurred during LINUX Dmesg error
    Checking for all LINUX clients for devops4p10 (LP: #1766201)
    - SAUCE: integrity: downgrade error to warning

  * potential memory corruption on arm64 on dev release (LP: #1827437)
    - driver core: Postpone DMA tear-down until after devres release

  * powerpc/pmu/ebb test in ubuntu_kernel_selftest failed with "error while
    loading shared libraries" on Bionic/Cosmic PowerPC (LP: #1812805)
    - selftests/powerpc/pmu: Link ebb tests with -no-pie

  * unnecessary request_queue freeze (LP: #1815733)
    - block: avoid setting nr_requests to current value
    - block: avoid setting none scheduler if it's already none

  * Kprobe event string type argument failed in ftrace from
    ubuntu_kernel_selftests on B/C i386 (LP: #1825780)
    - selftests/ftrace: Fix kprobe string testcase to not probe notrace function

  * False positive test result in run_netsocktests from net in
    ubuntu_kernel_selftest (LP: #1825777)
    - selftests/net: correct the return value for run_netsocktests

 -- Stefan Bader <email address hidden> Wed, 15 May 2019 13:18:36 +0200

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Nivedita Singhvi (niveditasinghvi) wrote :

As the test kernel with the backported Xenial fix
has been up for almost 2 months now, I'm submitting
the SRU for Xenial, although I have not received
feedback from original reporter or others.

Backported patch for Xenial varies slightly from the
cherry-picked patch for B, C.

My testing has been successful (see original testing
information in description).

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Confirmed
status: Confirmed → Won't Fix
status: Won't Fix → Fix Committed
Nivedita Singhvi (niveditasinghvi) wrote :

Verified on Xenial

tags: added: verification-done-xenial
removed: verification-needed-xenial
Brad Figg (brad-figg)
tags: added: cscc
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-157.185

---------------
linux (4.4.0-157.185) xenial; urgency=medium

  * linux: 4.4.0-157.185 -proposed tracker (LP: #1837476)

  * systemd 229-4ubuntu21.22 ADT test failure with linux 4.4.0-156.183 (storage)
    (LP: #1837235)
    - Revert "block/bio: Do not zero user pages"
    - Revert "block: Clear kernel memory before copying to user"
    - Revert "bio_copy_from_iter(): get rid of copying iov_iter"

linux (4.4.0-156.183) xenial; urgency=medium

  * linux: 4.4.0-156.183 -proposed tracker (LP: #1836880)

  * BCM43602 802.11ac Wireless regression - PCI ID 14e4:43ba (LP: #1836801)
    - brcmfmac: add eth_type_trans back for PCIe full dongle

linux (4.4.0-155.182) xenial; urgency=medium

  * linux: 4.4.0-155.182 -proposed tracker (LP: #1834918)

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * Handle overflow in proc_get_long of sysctl (LP: #1833935)
    - sysctl: handle overflow in proc_get_long

  * Xenial update: 4.4.181 upstream stable release (LP: #1832661)
    - x86/speculation/mds: Revert CPU buffer clear on double fault exit
    - x86/speculation/mds: Improve CPU buffer clear documentation
    - ARM: exynos: Fix a leaked reference by adding missing of_node_put
    - crypto: vmx - fix copy-paste error in CTR mode
    - crypto: crct10dif-generic - fix use via crypto_shash_digest()
    - crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
    - ALSA: usb-audio: Fix a memory leak bug
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event
    - ALSA: hda/realtek - EAPD turn on later
    - ASoC: max98090: Fix restore of DAPM Muxes
    - ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
    - mm/mincore.c: make mincore() more conservative
    - ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
    - mfd: da9063: Fix OTP control register names to match datasheets for
      DA9063/63L
    - tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
    - ext4: actually request zeroing of inode table after grow
    - ext4: fix ext4_show_options for file systems w/o journal
    - Btrfs: do not start a transaction at iterate_extent_inodes()
    - bcache: fix a race between cache register and cacheset unregister
    - bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
    - ipmi:ssif: compare block number correctly for multi-part return messages
    - crypto: gcm - Fix error return code in crypto_gcm_create_common()
    - crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
    - crypto: chacha20poly1305 - set cra_name correctly
    - crypto: salsa20 - don't access already-freed walk.iv
    - crypto: arm/aes-neonbs - don't access already-freed walk.iv
    - writeback: synchronize sync(2) against cgroup writeback membership switches
    - fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going
      into workqueue when umount
    - ALSA: hda/realtek - Fix for Lenovo B...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released