cloud-init attempts to rename bonds

Bug #1669860 reported by Ryan Harper
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Medium      Unassigned
cloud-init (Ubuntu)  Fix Released  Medium      Unassigned
  Xenial             Fix Released  Medium      Unassigned
  Yakkety            Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
When booting with bonds provided in networking configuration, cloud-init
can fail as it attempts to rename the bond device to an interface.

[Test Case]
 * download ubuntu cloud image
 * mount image, enable proposed, update, upgrade cloud-init
 * run 'bond-rename-launch' as provided.
 * login to kvm guest as 'ubuntu:passw0rd'
 * sudo cloud-init init

Previously, the 'cloud-init init' above would fail while attempting
to rename a bond device. It now succeeds, because it recognizes
that there is nothing to rename.

[Regression Potential]
Should be small. Any regressions would almost certainly be related to
bond or vlan configurations.

=== End SRU Template ===

1. Zesty amd64
2. cloud-init 0.7.9-47-gc81ea53-0ubuntu1

3. Expected: cloud-init boots with a bond network config and does not attempt to rename bond0

4. Actual: cloud-init init (net mode) fails when it attempts to rename a bond interface

Running with the following network config (2 nics):
config:
- mac_address: bc:76:4e:06:96:b3
  name: interface0
  type: physical
- mac_address: bc:76:4e:04:88:41
  name: interface1
  type: physical
- bond_interfaces:
  - interface0
  - interface1
  name: bond0
  params:
    bond_miimon: 100
    bond_mode: 802.3ad
    bond_xmit_hash_policy: layer3+4
  type: bond
- name: bond0.108
  subnets:
  - address: 65.61.151.38
    netmask: 255.255.255.252
    routes:
    - gateway: 65.61.151.37
      netmask: 0.0.0.0
      network: 0.0.0.0
    type: static
  - address: 2001:4800:78ff:1b:be76:4eff:fe06:96b3
    netmask: 'ffff:ffff:ffff:ffff::'
    routes:
    - gateway: 2001:4800:78ff:1b::1
      netmask: '::'
      network: '::'
    type: static
  type: vlan
  vlan_id: '108'
  vlan_link: bond0
- name: bond0.208
  subnets:
  - address: 10.184.225.122
    netmask: 255.255.255.252
    routes:
    - gateway: 10.184.225.121
      netmask: 255.240.0.0
      network: 10.176.0.0
    - gateway: 10.184.225.121
      netmask: 255.240.0.0
      network: 10.208.0.0
    type: static
  type: vlan
  vlan_id: '208'
  vlan_link: bond0
- address: 72.3.128.240
  type: nameserver
- address: 72.3.128.241
  type: nameserver

During cloud-init init --local, the network configuration is rendered and brought up.
bond0 is a virtual interface that uses the MAC address of one of its slaves.

In cloud-init init (net) mode, we check whether the interfaces are named properly.
When cloud-init collects the current_rename_info, it reads the MAC address of
each device listed in /sys/class/net; this includes *virtual* devices such as bonds and bridges.
It then looks up an interface name by MAC. Because the bond and one of its slave interfaces
have the same MAC, cloud-init attempts to rename bond0.

The solution is to not collect MACs of virtual interfaces for rename purposes, since
virtual devices never get renamed; their name is defined by the config.
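
The collision described above can be sketched in a few lines of Python. This is only an illustration, not cloud-init's code: build_bymac and plan_renames are hypothetical helpers, the scan order is assumed, and the bond is assumed to have taken interface0's MAC from the config above.

```python
def build_bymac(devices):
    """Map MAC -> device name the way a naive scan of /sys/class/net would.

    `devices` is a list of (name, mac) pairs. A later entry silently
    replaces an earlier one that has the same MAC.
    """
    bymac = {}
    for name, mac in devices:
        bymac[mac] = name
    return bymac


def plan_renames(bymac, wanted):
    """Return (current_name, wanted_name) pairs for MACs whose name differs."""
    renames = []
    for mac, new_name in wanted.items():
        cur = bymac.get(mac)
        if cur is not None and cur != new_name:
            renames.append((cur, new_name))
    return renames


# Assumed scan order; bond0 is assumed to have inherited interface0's MAC.
devices = [
    ("interface0", "bc:76:4e:06:96:b3"),
    ("interface1", "bc:76:4e:04:88:41"),
    ("bond0", "bc:76:4e:06:96:b3"),
]
# Names requested by the network config, keyed by MAC.
wanted = {
    "bc:76:4e:06:96:b3": "interface0",
    "bc:76:4e:04:88:41": "interface1",
}

renames = plan_renames(build_bymac(devices), wanted)
print(renames)  # [('bond0', 'interface0')]
```

Because bond0 overwrites interface0 in the MAC-keyed dict, the only planned rename is bond0 to interface0, which cannot succeed while interface0 already exists.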

diff --git a/cloudinit/net/__init__.py b/cloudinit/net/__init__.py
index ea649cc..e2a50ad 100755
--- a/cloudinit/net/__init__.py
+++ b/cloudinit/net/__init__.py
@@ -14,6 +14,7 @@ from cloudinit import util

 LOG = logging.getLogger(__name__)
 SYS_CLASS_NET = "/sys/class/net/"
+SYS_DEV_VIRT_NET = "/sys/devices/virtual/net/"
 DEFAULT_PRIMARY_INTERFACE = 'eth0'

@@ -205,7 +206,11 @@ def _get_current_rename_info(check_downable=True):
     """Collect information necessary for rename_interfaces."""
     names = get_devicelist()
     bymac = {}
+    virtual = os.listdir(SYS_DEV_VIRT_NET)
     for n in names:
+        # do not attempt to rename virtual interfaces
+        if n in virtual:
+            continue
         bymac[get_interface_mac(n)] = {
             'name': n, 'up': is_up(n), 'downable': None}

Log file of a failure:
http://paste.ubuntu.com/24084999/

Related bugs:
 * bug 1682871: cloud-init attempts to rename vlans / get_interfaces_by_mac does not filter vlans


Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Scott Moser (smoser) wrote :

The suggested fix doesn't work, as cloud-init may be expected to rename devices that are virtual. An example is inside an lxc container:

# ls /sys/devices/virtual/net/
eth0 lo
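
One way around this, sketched below, is to test each device for bond or vlan markers instead of listing /sys/devices/virtual/net. It assumes that the Linux bonding driver exposes a "bonding" directory under the device in sysfs and that vlan devices report DEVTYPE=vlan in their uevent file. The helper names are illustrative and this is not necessarily the check cloud-init shipped. A fake sysfs tree is built so the example runs anywhere.

```python
import os
import tempfile


def is_bond(devname, sys_class_net="/sys/class/net"):
    """True if the device exposes a 'bonding' directory in sysfs."""
    return os.path.isdir(os.path.join(sys_class_net, devname, "bonding"))


def is_vlan(devname, sys_class_net="/sys/class/net"):
    """True if the device's uevent file declares DEVTYPE=vlan."""
    uevent = os.path.join(sys_class_net, devname, "uevent")
    try:
        with open(uevent) as fp:
            return "DEVTYPE=vlan" in fp.read().splitlines()
    except OSError:
        return False


def rename_candidates(sys_class_net="/sys/class/net"):
    """List devices that are neither bonds nor vlans, sorted by name."""
    return sorted(
        name for name in os.listdir(sys_class_net)
        if os.path.isdir(os.path.join(sys_class_net, name))
        and not is_bond(name, sys_class_net)
        and not is_vlan(name, sys_class_net)
    )


# Build a fake sysfs tree so the sketch runs without real bonds.
fake = tempfile.mkdtemp()
for dev in ("eth0", "bond0", "bond0.108"):
    os.makedirs(os.path.join(fake, dev))
os.makedirs(os.path.join(fake, "bond0", "bonding"))
with open(os.path.join(fake, "bond0.108", "uevent"), "w") as fp:
    fp.write("DEVTYPE=vlan\nINTERFACE=bond0.108\n")

candidates = rename_candidates(fake)
print(candidates)  # ['eth0']
```

With this approach a container's eth0 stays renameable even though it lives under /sys/devices/virtual/net, because it carries neither marker.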

Revision history for this message
Ryan Harper (raharper) wrote :

Then we need to detect duplicate macs and somehow sort out which one to ignore.
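
A minimal sketch of the detection half of that idea follows; deciding which duplicate to ignore is left open. find_duplicate_macs is a hypothetical helper and the device list is an assumed example, not output from a real system.

```python
from collections import defaultdict


def find_duplicate_macs(devices):
    """Return {mac: [names]} for every MAC shared by more than one device."""
    by_mac = defaultdict(list)
    for name, mac in devices:
        by_mac[mac].append(name)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


# Assumed example: bond0 has taken over interface0's MAC.
devices = [
    ("interface0", "52:54:00:12:34:00"),
    ("interface1", "52:54:00:12:34:02"),
    ("bond0", "52:54:00:12:34:00"),
]

dups = find_duplicate_macs(devices)
print(dups)  # {'52:54:00:12:34:00': ['interface0', 'bond0']}
```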

Revision history for this message
Scott Moser (smoser) wrote :

I expected to be able to recreate this in an lxc container like below, but that didn't show any errors at all in /var/log/cloud-init.log.

#!/bin/sh
name=$1
[ -n "$name" ] || { echo "must give name"; exit 1; }
set -ex
lxc init ubuntu-daily:zesty $name
lxc network attach lxdbr0 $name eth1
# pastebinit `which lxc-chroot`
# http://paste.ubuntu.com/24198752/
lxc-chroot "$name" sh -c 'cat > /var/lib/cloud/seed/nocloud-net/network-config' <<EOF
version: 1
config:
  - type: physical
    name: eth0
  - type: physical
    name: eth1
  - type: bond
    name: bond0
    bond_interfaces: [eth0, eth1]
    params:
      bond-mode: active-backup
EOF
lxc start "$name"

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1669860] Re: cloud-init attempts to rename bonds

On Fri, Mar 17, 2017 at 8:59 PM, Scott Moser <email address hidden> wrote:

> I expected to be able to recreate this in a lxc container like below,
> but that didnt show any errors at all in /var/log/cloud-init.log.
>

Try using non-knames, like interface0 and interface1.


Revision history for this message
Scott Moser (smoser) wrote :

The attached script creates a container with 2 nics, and then rewrites the network config to either
have renamed devices (iface0, iface1) or a bonded nic on iface0 and iface1.

The rename works correctly:
  $ lp1669860-lxc zrename rename

But the bond isn't coming up currently, and there are no errors in the log either. I expected to get errors recreating the original issue.

Still, this is useful for being able to start a container with different network configs.

Revision history for this message
Ryan Harper (raharper) wrote :

Getting this to trip is somewhat tricky and racy, depending on how quickly the bond comes up. However, if the bond is up before cloud-init.service runs its net.apply_network_config_names(), then we see the following:

>>> r = net._get_current_rename_info(check_downable=True)
>>> r
{False: {'up': False, 'name': 'bonding_masters', 'downable': True}, '00:00:00:00:00:00': {'up': True, 'name': 'lo', 'downable': False}, '52:54:00:12:34:02': {'up': True, 'name': 'interface1', 'downable': True}, '52:54:00:12:34:00': {'up': False, 'name': 'bond0', 'downable': True}}
>>> r.keys()
dict_keys([False, '00:00:00:00:00:00', '52:54:00:12:34:02', '52:54:00:12:34:00'])
>>> r['52:54:00:12:34:00']
{'up': False, 'name': 'bond0', 'downable': True}
>>> r['52:54:00:12:34:02']
{'up': True, 'name': 'interface1', 'downable': True}

Here we can see that checking /sys/class/net/* for interfaces and mapping each MAC address to an interface picks up bond0 for interface0's MAC.

Then if we attempt to apply the names, we see the error:

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> ncfg = yaml.load(open('/root/network-config'))
>>> netcfg = ncfg.get('network')
>>> netcfg
>>> ncfg.keys()
dict_keys(['version', 'config'])
>>> from cloudinit import net
>>> net.apply_network_config_names(ncfg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 201, in apply_network_config_names
    return _rename_interfaces(renames)
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 336, in _rename_interfaces
    raise Exception('\n'.join(errors))
Exception: [unknown] Error performing rename('bond0', 'interface0') for 52:54:00:12:34:00, interface0: Unexpected error while running command.
Command: ['ip', 'link', 'set', 'bond0', 'name', 'interface0']
Exit code: 2
Reason: -
Stdout: -
Stderr: RTNETLINK answers: File exists

Revision history for this message
Ryan Harper (raharper) wrote :

Download a zesty cloud image and then create a qcow2 overlay:

qemu-img create -b zesty-server-cloudimg-amd64.img -f qcow2 bond-rename.qcow2

Then invoke the script like:

./bond-rename-launch.sh <lp userid>

Revision history for this message
Ryan Harper (raharper) wrote :

Fixed up network config for bond testing.

Scott Moser (smoser)
Changed in cloud-init:
status: Confirmed → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-89-gbf7723e8-0ubuntu1

---------------
cloud-init (0.7.9-89-gbf7723e8-0ubuntu1) zesty; urgency=medium

  * New upstream snapshot.
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.

 -- Scott Moser <email address hidden> Fri, 31 Mar 2017 17:02:28 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser)
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Scott Moser (smoser)
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Ryan, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-90-g61eb03fe-0ubuntu1~16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
tags: added: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-90-g61eb03fe-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
Revision history for this message
Scott Moser (smoser) wrote :

$ burl="http://cloud-images.ubuntu.com/daily/server"
$ for r in yakkety xenial; do
   fname=$r-server-cloudimg-amd64.img
   ofname="$fname"
   [ "$r" = "xenial" ] && ofname=$r-server-cloudimg-amd64-disk1.img
   pfname="${fname%.img}-proposed.img"
   if [ ! -f "$fname" ]; then
      proxy wget "$burl/$r/current/$ofname" -O "$fname.tmp" &&
          mv "$fname.tmp" "$fname" || break
   fi
   if [ ! -f "$pfname" ]; then
       qemu-img create -f qcow2 -b "$fname" "$pfname.tmp" || break
       sudo mount-image-callback --system-resolvconf "$pfname.tmp" -- \
           mchroot sh -ec '
               r=$(lsb_release -sc)
               m="http://archive.ubuntu.com/ubuntu"
               plist="/etc/apt/sources.list.d/proposed.list"
               echo "deb $m $r-proposed main" > "$plist"
               apt-get update -q
               DEBIAN_FRONTEND=noninteractive apt-get -qy install cloud-init
           ' </dev/null || break
       mv $pfname.tmp $pfname
   fi
done

$ for img in *-proposed.img; do
  echo $img
  sudo mount-image-callback "$img" -- mchroot dpkg-query --show cloud-init;
  done
xenial-server-cloudimg-amd64-proposed.img
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1
yakkety-server-cloudimg-amd64-proposed.img
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1

## xenial
$ MODE=bond ./bond-rename-launch.sh xenial-server-cloudimg-amd64-proposed.img
... login as ubuntu:passw0rd ....
% dpkg-query --show cloud-init
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1

% python3 -c 'from cloudinit.net import get_interfaces_by_mac; print(get_interfaces_by_mac())'
{'00:00:00:00:00:00': 'lo', '52:54:00:12:34:02': 'interface1', '52:54:00:12:34:00': 'interface0'}

% sudo cloud-init init
...
no stack trace

## yakkety
$ MODE=bond ./bond-rename-launch.sh yakkety-server-cloudimg-amd64-proposed.img
... login as ubuntu:passw0rd ....
% dpkg-query --show cloud-init
cloud-init 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1
% cat /etc/cloud/build.info
build_name: server
serial: 20170413

% python3 -c 'from cloudinit.net import get_interfaces_by_mac; print(get_interfaces_by_mac())'
{'52:54:00:12:34:02': 'interface1', '00:00:00:00:00:00': 'lo', '52:54:00:12:34:00': 'interface0'}

% sudo cloud-init init
...
no stack trace

Revision history for this message
Scott Moser (smoser) wrote :
tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed
Scott Moser (smoser)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-90-g61eb03fe-0ubuntu1~16.10.1

---------------
cloud-init (0.7.9-90-g61eb03fe-0ubuntu1~16.10.1) yakkety; urgency=medium

  * debian/cloud-init.templates: add Bigstep to list of sources. (LP: #1676460)
  * New upstream snapshot.
    - OpenStack: add 'dvs' to the list of physical link types. (LP: #1674946)
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.
    - net: in netplan renderer delete known image-builtin content.
      (LP: #1675576)
    - doc: correct grammar in capabilities.rst [David Tagatac]
    - ds-identify: fix detecting of maas datasource. (LP: #1677710)
    - netplan: remove debugging prints, add debug logging [Ryan Harper]
    - ds-identify: do not write None twice to datasource_list.
    - support resizing partition and rootfs on system booted without
      initramfs. [Steve Langasek] (LP: #1677376)
    - apt_configure: run only when needed. (LP: #1675185)
    - OpenStack: identify OpenStack by product 'OpenStack Compute'.
      (LP: #1675349)
    - GCE: Search GCE in ds-identify, consider serial number in check.
      (LP: #1674861)
    - Add support for setting hashed passwords [Tore S. Lonoy] (LP: #1570325)
    - Fix filesystem creation when using "partition: auto"
      [Jonathan Ballet] (LP: #1634678)
    - ConfigDrive: support reading config drive data from /config-drive.
      (LP: #1673411)
    - ds-identify: fix detection of Bigstep datasource. (LP: #1674766)
    - test: add running of pylint [Joshua Powers]
    - ds-identify: fix bug where filename expansion was left on.
    - advertise network config v2 support (NETWORK_CONFIG_V2) in features.
    - Bigstep: fix bug when executing in python3. [root]
    - Fix unit test when running in a system deployed with cloud-init.
    - Bounce network interface for Azure when using the built-in path.
      [Brent Baude] (LP: #1674685)
    - cloudinit.net: add network config v2 parsing and rendering [Ryan Harper]
    - net: Fix incorrect call to isfile [Joshua Powers] (LP: #1674317)
    - net: add renderers for automatically selecting the renderer.
    - doc: fix config drive doc with regard to unpartitioned disks.
      (LP: #1673818)
    - test: Adding integratiron test for password as list [Joshua Powers]
    - render_network_state: switch arguments around, do not require target
    - support 'loopback' as a device type.
    - Integration Testing: improve testcase subclassing [Wesley Wiedenmeier]
    - gitignore: adding doc/rtd_html [Joshua Powers]
    - doc: add instructions for running integration tests via tox.
      [Joshua Powers]
    - test: avoid differences in 'date' output due to daylight savings.
    - Fix chef config module in omnibus install. [Jeremy Melvin] (LP: #1583837)
    - Add feature flags to cloudinit.version. [Wesley Wiedenmeier]
    - tox: add a citest environment
    - Support chpasswd/list being a list in addition to a string.
      [Sergio Lystopad] (LP: #1665694)
    - doc: Fix configuration example for cc_set_passwords module.
      [Sergio Lystopad] (LP: #1665773)
    - ...


Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-90-g61eb03fe-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-90-g61eb03fe-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/cloud-init.templates: add Bigstep to list of sources. (LP: #1676460)
  * New upstream snapshot.
    - OpenStack: add 'dvs' to the list of physical link types. (LP: #1674946)
    - Fix bug that resulted in an attempt to rename bonds or vlans.
      (LP: #1669860)
    - tests: update OpenNebula and Digital Ocean to not rely on host
      interfaces.
    - net: in netplan renderer delete known image-builtin content.
      (LP: #1675576)
    - doc: correct grammar in capabilities.rst [David Tagatac]
    - ds-identify: fix detecting of maas datasource. (LP: #1677710)
    - netplan: remove debugging prints, add debug logging [Ryan Harper]
    - ds-identify: do not write None twice to datasource_list.
    - support resizing partition and rootfs on system booted without
      initramfs. [Steve Langasek] (LP: #1677376)
    - apt_configure: run only when needed. (LP: #1675185)
    - OpenStack: identify OpenStack by product 'OpenStack Compute'.
      (LP: #1675349)
    - GCE: Search GCE in ds-identify, consider serial number in check.
      (LP: #1674861)
    - Add support for setting hashed passwords [Tore S. Lonoy] (LP: #1570325)
    - Fix filesystem creation when using "partition: auto"
      [Jonathan Ballet] (LP: #1634678)
    - ConfigDrive: support reading config drive data from /config-drive.
      (LP: #1673411)
    - ds-identify: fix detection of Bigstep datasource. (LP: #1674766)
    - test: add running of pylint [Joshua Powers]
    - ds-identify: fix bug where filename expansion was left on.
    - advertise network config v2 support (NETWORK_CONFIG_V2) in features.
    - Bigstep: fix bug when executing in python3. [root]
    - Fix unit test when running in a system deployed with cloud-init.
    - Bounce network interface for Azure when using the built-in path.
      [Brent Baude] (LP: #1674685)
    - cloudinit.net: add network config v2 parsing and rendering [Ryan Harper]
    - net: Fix incorrect call to isfile [Joshua Powers] (LP: #1674317)
    - net: add renderers for automatically selecting the renderer.
    - doc: fix config drive doc with regard to unpartitioned disks.
      (LP: #1673818)
    - test: Adding integratiron test for password as list [Joshua Powers]
    - render_network_state: switch arguments around, do not require target
    - support 'loopback' as a device type.
    - Integration Testing: improve testcase subclassing [Wesley Wiedenmeier]
    - gitignore: adding doc/rtd_html [Joshua Powers]
    - doc: add instructions for running integration tests via tox.
      [Joshua Powers]
    - test: avoid differences in 'date' output due to daylight savings.
    - Fix chef config module in omnibus install. [Jeremy Melvin] (LP: #1583837)
    - Add feature flags to cloudinit.version. [Wesley Wiedenmeier]
    - tox: add a citest environment
    - Support chpasswd/list being a list in addition to a string.
      [Sergio Lystopad] (LP: #1665694)
    - doc: Fix configuration example for cc_set_passwords module.
      [Sergio Lystopad] (LP: #1665773...


Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote : Fixed in Cloud-init 17.1

This bug is believed to be fixed in cloud-init 17.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released