dhclient can fail if other nics are renamed

Bug #1446767 reported by Scott Moser
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
isc-dhcp (Fedora)    Fix Released   Low
isc-dhcp (Ubuntu)    Fix Released   High         Unassigned
  Trusty             Fix Released   High         Unassigned
  Vivid              Fix Released   High         Unassigned
  Wily               Fix Released   High         Unassigned

Bug Description

=== Begin SRU Information ===
[Impact]
Systems that use dhcp for network config combined with network device re-naming can hit a race condition in dhclient which causes dhcp to fail. Any network device renaming could cause this, but the most likely scenario is boot with udev persistent naming in /etc/udev/rules.d/70-persistent-net.rules.

This can be a fatal error when the affected network devices are required for the system to function properly.

[Test Case]
To recreate the failure:
 * boot an ubuntu system with an interface that can dhcp
 * configure /etc/network/interfaces for dhcp on that interface
   $ grep eth0 /etc/network/interfaces
   auto eth0
   iface eth0 inet dhcp
 * run attached 'nic-go-crazy' as root in one window/shell
   this will create by default 10 tuntap devices and repeatedly rename them.
 * run attached 'ifup-loop eth0'

ifup-loop will exit failure if dhclient failed to bring the network up.

With the fix provided, this will/should run indefinitely.
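
The attached 'ifup-loop' script is not reproduced in this report. As a rough sketch of the same idea (the helper name run_until_failure is hypothetical, and this is not the attached script), a loop can repeat a command until it fails or an iteration limit is reached. It assumes that ifup exits non-zero when dhclient fails.

```shell
#!/bin/sh
# Hypothetical helper, NOT the attached 'ifup-loop' script.
# Usage: run_until_failure MAX_ITERATIONS COMMAND [ARGS...]
# Runs COMMAND repeatedly. Returns 1 as soon as COMMAND fails,
# or 0 if COMMAND succeeded MAX_ITERATIONS times in a row.
run_until_failure() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
    done
    echo "no failure in $max iterations"
    return 0
}

# Example (as root), cycling the interface from the test case.
# Assumption: ifup returns non-zero when dhclient fails.
#   run_until_failure 100 sh -c 'ifdown eth0; ifup eth0'
```

A bounded loop is used here so that a fixed package produces a clear "no failure" result instead of running forever.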

[Regression Potential]
Chance for regression here should be reasonably small. However, a very significant number of systems run dhclient, so any change has to be considered risky.

One thing to note is that Fedora has carried this patch for more than 3 years.

Per getifaddrs(3):
 | The getifaddrs() function first appeared in glibc 2.3, but before glibc
 | 2.3.3, the implementation supported only IPv4 addresses; IPv6 support
 | was added in glibc 2.3.3. Support of address families other than IPv4
 | is available only on kernels that support netlink.

These versions are older than any supported Ubuntu release, so that should not be a problem.
=== End SRU Information ===

Given 3 nics (eth0, eth1, eth2), run:

dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0

While that is in its early phases, if eth1 is renamed, a race condition can cause dhclient to exit with failure.

This can happen in real life when udev and persistent rules are used, i.e. in a system where eth0 is configured for 'auto' and dhcp, and persistent rules cause renaming of devices during boot.

I have set up a recreate of that more complex system at lp:~smoser/+junk/lp1444428, but this recreate is simpler to catch.

For example, while running the attached 'nic-go-crazy' on other nics, I try ifup eth1:

$ sudo ifup eth1
sudo: unable to resolve host ubuntu
Internet Systems Consortium DHCP Client 4.3.1
Copyright 2004-2014 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Error getting interface address for 'nic0317610'; No such device
Error getting interface information.

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug. These pages explain the proper
process and the information we find helpful for debugging..

exiting.
Failed to bring up eth1.
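
Note that the name in the error ('nic0317610') is not eth1 but one of the other devices being renamed. A small sketch for spotting this signature in captured output follows; the function name renamed_nic_from_log is hypothetical, and it assumes the ifup/dhclient output has been saved as text and matches the wording shown above.

```shell
# Hypothetical helper: read captured ifup/dhclient output on stdin and
# print the interface name from the "Error getting interface address"
# line. Exits non-zero if no such line is present.
renamed_nic_from_log() {
    sed -n "s/^Error getting interface address for '\([^']*\)'; No such device\$/\1/p" |
        grep .
}

# Example:
#   sudo ifup eth1 2>&1 | renamed_nic_from_log
```

If the printed name differs from the interface being configured, the failure is likely this race rather than a problem with the interface itself.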

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: isc-dhcp-client 4.3.1-5ubuntu2
ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
Uname: Linux 3.19.0-15-generic x86_64
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
Date: Tue Apr 21 16:35:10 2015
DhclientLeases:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: isc-dhcp
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
Changed in isc-dhcp (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in isc-dhcp (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser)
description: updated
Scott Moser (smoser) wrote :

The solution here was pulled from https://bugzilla.redhat.com/show_bug.cgi?id=449946
The patch really just makes dhclient on Linux use getifaddrs() rather than reading /proc/net/dev for interface information.

The change has been in place in Fedora for ~3 years, so that should stand as some testimony.
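
To illustrate why the /proc/net/dev approach is racy: the file only provides a snapshot of names, and each name then has to be looked up in a separate step. The sketch below is a shell illustration of that two-step pattern, not the dhclient C code; the function names are hypothetical, and the exact lookups dhclient performs are an assumption here. A rename between the two steps makes the second step fail for a name that no longer exists.

```shell
# Step 1: list interface names from a file in /proc/net/dev format
# (two header lines, then "name: counters..." per interface).
list_proc_net_dev() {
    file=${1:-/proc/net/dev}
    sed -n '3,$ s/^[[:space:]]*\([^:]*\):.*/\1/p' "$file"
}

# Step 2: look each name up again, here under /sys/class/net.
# A name that vanished between the steps is reported as stale,
# analogous to the "No such device" error in this bug.
check_names() {
    sysdir=${2:-/sys/class/net}
    rc=0
    for name in $(list_proc_net_dev "$1"); do
        if [ ! -e "$sysdir/$name" ]; then
            echo "stale name: $name" >&2
            rc=1
        fi
    done
    return $rc
}

# Example: run 'check_names' while 'nic-go-crazy' is renaming devices.
```

Both arguments are optional and exist only so the functions can be pointed at sample files instead of the live system.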

Scott Moser (smoser) wrote :

I've also put these attached recreate programs and the original kvm boot recreate at lp:~smoser/+junk/lp1444428/

tags: added: patch
Scott Moser (smoser)
description: updated
Brian Murray (brian-murray) wrote :

This has already ended up in vivid-proposed.

Changed in isc-dhcp (Ubuntu Vivid):
status: Confirmed → Fix Committed
Changed in isc-dhcp (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Scott, or anyone else affected,

Accepted isc-dhcp into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/isc-dhcp/4.2.4-7ubuntu12.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Scott Moser (smoser) wrote :

## host system is vivid, trusty would also do

host $ rel=trusty ; serial=20150417
host $ url=http://cloud-images.ubuntu.com/releases/$rel/release-$serial/ubuntu-14.04-server-cloudimg-amd64-disk1.img
host $ img_dist="$rel-$serial-amd64.img.dist"
host $ img=${img_dist%.dist}

host $ pkgs="qemu-utils qemu-system-x86 cloud-image-utils"
host $ sudo apt-get --assume-yes install $pkgs

## get images and convert qcow to raw
host $ [ -f "$img_dist" ] ||
  { wget "$url" -O "$img_dist.tmp" && mv "$img_dist.tmp" "$img_dist"; }
host $ [ -f "$img" ] ||
   { qemu-img convert -O raw "$img_dist" "$img.tmp" && mv "$img.tmp" "$img"; }

## create a user-data to let you log in
host $ cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
host $ cloud-localds seed.img user-data

## launch a guest with 2 nics, one stable/ssh in, one for testing
## you can ssh in with 'ubuntu/passw0rd' with 'ssh -p 2222 ubuntu@localhost'
host $ qemu-img create -f qcow2 -b $img disk.img
host $ qemu-system-x86_64 -enable-kvm \
   -device virtio-net-pci,netdev=net00 \
   -netdev user,id=net00,hostfwd=tcp::2222-:22 \
   -device virtio-net-pci,netdev=net01 \
   -netdev user,id=net01 \
   -drive if=virtio,file=disk.img \
   -drive if=virtio,file=seed.img \
   -m 768 -curses

## have 2 windows / shells in system.
### setup system
guest $ crazy_url=http://bazaar.launchpad.net/~smoser/+junk/lp1444428/download/head:/nicgocrazy-20150421193756-8xtcmllz0qf4efb4-2/nic-go-crazy
guest $ loop_url=http://bazaar.launchpad.net/~smoser/+junk/lp1444428/download/head:/ifuploop-20150421193756-8xtcmllz0qf4efb4-1/ifup-loop

guest $ wget "$crazy_url" -O nic-go-crazy
guest $ wget "$loop_url" -O loop-ifup
guest $ chmod 755 nic-go-crazy loop-ifup
guest $ echo "iface eth1 inet dhcp" | sudo tee -a /etc/network/interfaces

### recreate failure ####
guest-w1 % sudo ./nic-go-crazy 20 nicfoo
guest-w2 % sudo ./loop-ifup
Internet Systems Consortium DHCP Client 4.2.4
Copyright 2004-2012 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Error getting interface address for 'nic074725'; No such device
Error getting interface information.
Failed to bring up eth1.

### enable proposed and get update ###
guest $ rel=$(lsb_release -sc)
guest $ echo "deb http://archive.ubuntu.com/ubuntu ${rel}-proposed main" |
   sudo tee -a /etc/apt/sources.list.d/proposed.list
guest $ sudo apt-get update --quiet
guest $ sudo apt-cache policy
guest $ sudo apt-cache policy isc-dhcp-client
isc-dhcp-client:
  Installed: 4.2.4-7ubuntu12.1
  Candidate: 4.2.4-7ubuntu12.2
  Version table:
     4.2.4-7ubuntu12.2 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-proposed/main amd64 Packages
 *** 4.2.4-7ubuntu12.1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     4.2.4-7ubuntu12 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

guest $ sudo apt-get install isc-dhcp-client -qy

## retry the 'recreate failure' test above. It should run indefinitely.

Scott Moser (smoser) wrote :

host $ rel=vivid serial=20150422
host $ url=http://cloud-images.ubuntu.com/releases/$rel/release-$serial/ubuntu-15.04-server-cloudimg-amd64-disk1.img
host $ img_dist="$rel-$serial-amd64.img.dist"
host $ img=${img_dist%.dist}

## host
host $ pkgs="qemu-utils qemu-system-x86 cloud-image-utils"
host $ sudo apt-get --assume-yes install $pkgs

host $ [ -f "$img_dist" ] ||
  { wget "$url" -O "$img_dist.tmp" && mv "$img_dist.tmp" "$img_dist"; }
host $ [ -f "$img" ] ||
   { qemu-img convert -O raw "$img_dist" "$img.tmp" && mv "$img.tmp" "$img"; }

## create a user-data to let you log in
host $ cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
host $ cloud-localds seed.img user-data

## launch a guest with 2 nics, one stable/ssh in, one for testing
host $ qemu-img create -f qcow2 -b $img disk.img
host $ qemu-system-x86_64 -enable-kvm \
   -device virtio-net-pci,netdev=net00 \
   -netdev user,id=net00,hostfwd=tcp::2222-:22 \
   -device virtio-net-pci,netdev=net01 \
   -netdev user,id=net01 \
   -drive if=virtio,file=disk.img \
   -drive if=virtio,file=seed.img \
   -m 768 -curses

## have 2 windows / shells in system.
### setup system
guest $ crazy_url=http://bazaar.launchpad.net/~smoser/+junk/lp1444428/download/head:/nicgocrazy-20150421193756-8xtcmllz0qf4efb4-2/nic-go-crazy
guest $ loop_url=http://bazaar.launchpad.net/~smoser/+junk/lp1444428/download/head:/ifuploop-20150421193756-8xtcmllz0qf4efb4-1/ifup-loop

guest $ wget "$crazy_url" -O nic-go-crazy
guest $ wget "$loop_url" -O loop-ifup
guest $ chmod 755 nic-go-crazy loop-ifup
guest $ echo "iface eth1 inet dhcp" | sudo tee -a /etc/network/interfaces

### recreate failure ####
guest-w1 % sudo ./nic-go-crazy 20 nicfoo
guest-w2 % sudo ./loop-ifup
sudo: unable to resolve host ubuntu
Internet Systems Consortium DHCP Client 4.3.1
Copyright 2004-2014 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Error getting interface address for 'nic0927390'; No such device
Error getting interface information.

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug. These pages explain the proper
process and the information we find helpful for debugging..

exiting.
Failed to bring up eth1.

### enable proposed and get update ###
guest $ rel=$(lsb_release -sc)
guest $ echo "deb http://archive.ubuntu.com/ubuntu ${rel}-proposed main" |
   sudo tee -a /etc/apt/sources.list.d/proposed.list
guest $ sudo apt-get update --quiet
guest $ apt-cache policy isc-dhcp-client
isc-dhcp-client:
  Installed: 4.3.1-5ubuntu2
  Candidate: 4.3.1-5ubuntu2.1
  Version table:
     4.3.1-5ubuntu2.1 0
        500 http://archive.ubuntu.com/ubuntu/ vivid-proposed/main amd64 Packages
 *** 4.3.1-5ubuntu2 0
        500 http://archive.ubuntu.com/ubuntu/ vivid/main amd64 Packages
        100 /var/lib/dpkg/status

guest $ sudo apt-get install isc-dhcp-client -qy

## retry the 'recreate failure' test above. It should run indefinitely.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.3.1-5ubuntu2.1

---------------
isc-dhcp (4.3.1-5ubuntu2.1) vivid-proposed; urgency=medium

  * debian/patches/dhcp-getifaddrs.patch: use getifaddrs
    for getting nic addresses rather than /proc/net (LP: #1446767)
 -- Scott Moser <email address hidden> Tue, 21 Apr 2015 18:10:40 +0000

Changed in isc-dhcp (Ubuntu Wily):
status: Triaged → Fix Released
Scott Moser (smoser) wrote :

While doing some other work, I found bug 1272414.
It seems that getifaddrs() is somewhat slow when lots of interfaces are present, so this could have a bit of a performance impact in some places. That said, dhclient is not likely to be a heavy-use path.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.3.1-5ubuntu2.1

---------------
isc-dhcp (4.3.1-5ubuntu2.1) vivid-proposed; urgency=medium

  * debian/patches/dhcp-getifaddrs.patch: use getifaddrs
    for getting nic addresses rather than /proc/net (LP: #1446767)
 -- Scott Moser <email address hidden> Tue, 21 Apr 2015 18:10:40 +0000

Changed in isc-dhcp (Ubuntu Vivid):
status: Fix Committed → Fix Released
Scott Kitterman (kitterman) wrote : Update Released

The verification of the Stable Release Update for isc-dhcp has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Mathieu Trudel-Lapierre (cyphermox) wrote :

So, has anyone verified that the update is properly fixing the bug on Trusty?

Edward Hope-Morley (hopem) wrote :

Yes, sorry for the delay. We have been running repeated deployments with Trusty and this package and are no longer hitting the problem.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.2.4-7ubuntu12.2

---------------
isc-dhcp (4.2.4-7ubuntu12.2) trusty-proposed; urgency=medium

  * debian/patches/dhcp-getifaddrs.patch: use getifaddrs
    for getting nic addresses rather than /proc/net (LP: #1446767)
 -- Scott Moser <email address hidden> Tue, 21 Apr 2015 18:10:40 +0000

Changed in isc-dhcp (Ubuntu Trusty):
status: Fix Committed → Fix Released
Chad Page (chad-page) wrote :

There is a bad memory leak when many (thousands in our case) interfaces are up and dhcpd is started. In addition it is very slow.

For the former, a call to freeifaddrs(ifaddr) is needed before the next interface is looked at, and we got around the latter by only calling getifaddrs() once, but that might regress what this fixed.

Changed in isc-dhcp (Fedora):
importance: Unknown → Low
status: Unknown → Fix Released