networking.service hangs on shutdown -- killing dhclient has no effect any more

Bug #1556175 reported by Martin Pitt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
bind9 (Ubuntu)
Fix Released
High
LaMont Jones
isc-dhcp (Ubuntu)
Fix Released
High
Unassigned

Bug Description

On current xenial, LXD containers take ages to reboot:

lxc launch images:ubuntu/xenial/amd64 x1
lxc exec x1 reboot

"systemctl list-jobs" shows:

117 networking.service stop running

root@x1:~# systemctl status networking
● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: deactivating (stop-sigterm) since Fri 2016-03-11 16:03:51 UTC; 21s ago
     Docs: man:interfaces(5)
  Process: 187 ExecStop=/sbin/ifdown -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 55 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 49 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --r
 Main PID: 55 (code=exited, status=0/SUCCESS)
    Tasks: 1 (limit: 512)
   CGroup: /system.slice/networking.service
           └─146 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0

Mar 11 16:03:52 x1 dhclient[207]: For info, please visit https://www.isc.org/software/dhcp/
Mar 11 16:03:52 x1 dhclient[207]:
Mar 11 16:03:52 x1 dhclient[207]: Listening on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Listening on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Sending on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Sending on Socket/fallback
Mar 11 16:03:52 x1 dhclient[207]: Sending on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 dhclient[207]: Sending on Socket/fallback
Mar 11 16:03:52 x1 dhclient[207]: DHCPRELEASE on eth0 to 10.0.3.1 port 67 (xid=0x6480ef2b)
Mar 11 16:03:52 x1 ifdown[187]: DHCPRELEASE on eth0 to 10.0.3.1 port 67 (xid=0x6480ef2b)

The above is also the last journal line that I get for the usual 90s timeout, then the container finally reboots.

Not many processes are actually running any more:

root 1 0.0 0.0 37076 5304 ? Ss 16:03 0:00 /sbin/init
root 38 0.0 0.0 41572 3316 ? Ss 16:03 0:00 /lib/systemd/systemd-udevd
root 41 0.0 0.0 35276 5668 ? Ss 16:03 0:00 /lib/systemd/systemd-journald
root 146 0.0 0.0 181376 2664 ? Ss 16:03 0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0

but apparently the dhclient instance refuses to die (note that ifdown -a itself is not running any more).

Revision history for this message
Martin Pitt (pitti) wrote :

This is actually unrelated to rebooting -- "ifdown -a" returns immediately and leaves eth0 DOWN, but does not kill dhclient despite saying "Killed old client process".

Changed in ifupdown (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-16.03
summary: - networking.service hangs on shutdown in LXD
+ networking.service hangs on shutdown in LXD / ifdown fails to kill
+ dhclient
summary: - networking.service hangs on shutdown in LXD / ifdown fails to kill
- dhclient
+ networking.service hangs on shutdown -- ifdown fails to kill dhclient
Revision history for this message
Martin Pitt (pitti) wrote : Re: networking.service hangs on shutdown -- ifdown fails to kill dhclient

strace shows that ifdown does read the pid and SIGTERMs dhclient correctly:

201 open("/run/dhclient.eth0.pid", O_RDONLY) = 7
201 read(7, "146\n", 4096) = 4
201 kill(146, SIGTERM) = 0

But after that pid 146 (dhclient) is still running happily. "kill 146" also does not help (just kill -9 of course). This is apparently the real regression, in wily and before you can TERM dhclient.

Martin Pitt (pitti)
affects: ifupdown (Ubuntu) → isc-dhcp (Ubuntu)
Changed in isc-dhcp (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
tags: added: regression-release xenial
Changed in isc-dhcp (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
summary: - networking.service hangs on shutdown -- ifdown fails to kill dhclient
+ networking.service hangs on shutdown -- killing dhclient has no effect
+ any more
Revision history for this message
Martin Pitt (pitti) wrote :

Indeed dhclient now blocks SIGTERM -- after sending it, it gets

ShdPnd: 0000000000004000
SigBlk: 0000000000004003

Downgrading to 4.3.3-5ubuntu1 fixes the bug, so the regression is not in the new upstream version nor in Debian, but in one of our recent Ubuntu changes.

Revision history for this message
Martin Pitt (pitti) wrote :

Further bisection shows that 4.3.3-5ubuntu5 is the last version that works, and 4.3.3-5ubuntu7 is the first version that fails (ubuntu6 cannot be tested as that FTBFSed).

The corresponding diff is http://launchpadlibrarian.net/242313487/isc-dhcp_4.3.3-5ubuntu5_4.3.3-5ubuntu7.diff.gz -- the change was that this now links to the non-export bind libraries. There is no rationale/explanation of that, but supposedly the internal libraries handle SIGTERM differently.

Revision history for this message
Martin Pitt (pitti) wrote :

Lamont apparently knows what's going on. He'll restore the -export versions soon, so that we can switch isc-dhcp back to those.

Changed in isc-dhcp (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
Changed in bind9 (Ubuntu):
assignee: nobody → LaMont Jones (lamont)
importance: Undecided → High
status: New → Triaged
Revision history for this message
Seth Arnold (seth-arnold) wrote :
LaMont Jones (lamont)
Changed in bind9 (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.3.3-5ubuntu10

---------------
isc-dhcp (4.3.3-5ubuntu10) xenial; urgency=low

  [ Stefan Bader ]
  * Revert back to bind against bind9 export libraries
    (LP: #1556175, LP: #1551351)

  [ LaMont Jones ]
  * libirs-export requires libisccfg-export now

 -- Stefan Bader <email address hidden> Thu, 17 Mar 2016 17:45:53 +0100

Changed in isc-dhcp (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bind9 - 1:9.10.3.dfsg.P4-3

---------------
bind9 (1:9.10.3.dfsg.P4-3) experimental; urgency=medium

  * Fix vcs links
  * build in debian/tmp, use bind9.install

 -- LaMont Jones <email address hidden> Fri, 18 Mar 2016 14:46:30 -0600

Changed in bind9 (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.