wifi connection drops, reconnects every 10 minutes

Bug #1664748 reported by Dustin Kirkland 
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
maas (Ubuntu)             Invalid  Undecided   Unassigned
network-manager (Ubuntu)  Expired  Medium      Unassigned

Bug Description

I recently moved my home DHCP and DNS server over to MAAS (2.1.3 from
xenial-updates).

Since doing so, I've noticed that my wifi connection drops and
reconnects (with corresponding Unity pop-up notifications) exactly
every 10 minutes.

I suppose this is due to the fact that MAAS sets DHCP leases to 10
minutes by default?

Has anyone else noticed this behavior?

Is there a suitable workaround? Increasing the DHCP lease time?
Using static addresses? Something else?

Tags: maas-at-home
Andres Rodriguez (andreserl) wrote :

Hi Dustin,

I have /personally/ not seen this type of behavior. However, it is kind of strange that the wifi connection would just drop when a lease is renewed; that would seem like a bug in NetworkManager to me.

That said, I think it is worth investigating on our side.

Andres Rodriguez (andreserl) wrote :

@Dustin,

Can you attach your dhcpd.conf, or at least the relevant sections, so we can see how this hostmap was configured?

Andres Rodriguez (andreserl) wrote :

cat $(ps auxw | grep dhclient | grep -o '\-lf.*' | awk '{ print $2 }')

Mike Pontillo (mpontillo) wrote :

^ That is, please run that command on the client so we can see the full transaction history.

It's hard to tell if the DHCP client isn't renewing fast enough, or possibly it's trying to renew but for some reason the server no longer recognizes the lease.

Andres Rodriguez (andreserl) wrote :

From the ML

lease {
  interface "enp0s25";
  fixed-address 10.1.8.227;
  option subnet-mask 255.255.255.0;
  option dhcp-lease-time 2592000;
  option routers 10.1.8.1;
  option dhcp-message-type 5;
  option dhcp-server-identifier 10.1.8.1;
  option domain-name-servers 8.8.8.8,8.8.4.4;
  option broadcast-address 10.1.8.255;
  option domain-name "medusa.mezzonet.net";
  renew 5 2017/02/10 09:51:53;
  rebind 4 2017/02/23 03:18:17;
  expire 0 2017/02/26 21:18:17;
}
lease {
  interface "enp0s25";
  fixed-address 10.0.0.112;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option dhcp-server-identifier 10.0.0.3;
  option ntp-servers 10.0.0.3;
  option broadcast-address 10.0.0.255;
  option domain-name "maas";
  renew 2 2017/02/14 23:33:53;
  rebind 2 2017/02/14 23:38:13;
  expire 2 2017/02/14 23:39:28;
}
lease {
  interface "enp0s25";
  fixed-address 10.0.0.112;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option dhcp-server-identifier 10.0.0.3;
  option ntp-servers 10.0.0.3;
  option broadcast-address 10.0.0.255;
  option domain-name "maas";
  renew 2 2017/02/14 23:38:02;
  rebind 2 2017/02/14 23:42:39;
  expire 2 2017/02/14 23:43:54;
}
lease {
  interface "enp0s25";
  fixed-address 10.0.0.112;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option dhcp-server-identifier 10.0.0.3;
  option ntp-servers 10.0.0.3;
  option broadcast-address 10.0.0.255;
  option domain-name "maas";
  renew 2 2017/02/14 23:41:55;
  rebind 2 2017/02/14 23:46:47;
  expire 2 2017/02/14 23:48:02;
}
lease {
  interface "enp0s25";
  fixed-address 10.0.0.112;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option dhcp-server-identifier 10.0.0.3;
  option ntp-servers 10.0.0.3;
  option broadcast-address 10.0.0.255;
  option domain-name "maas";
  renew 2 2017/02/14 23:46:17;
  rebind 2 2017/02/14 23:50:40;
  expire 2 2017/02/14 23:51:55;
}
lease {
  interface "wlp3s0";
  fixed-address 10.0.0.46;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option dhcp-lease-time 600;
  option routers 10.0.0.1;
  option dhcp-message-type 5;
  option dhcp-server-identifier 10.0.0.3;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option broadcast-address 10.0.0.255;
  option ntp-servers 10.0.0.3;
  option domain-name "maas";
  renew 2 2017/02/14 23:30:52;
  rebind 2 2017/02/14 23:35:43;
  expire 2 2017/02/14 23:36:58;
}
lease {
  interface "wlp3s0";
  fixed-address 10.0.0.46;
  filename "pxelinux.0";
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.3,10.0.0.1;
  option dhcp-ser...


Mike Pontillo (mpontillo) wrote :

Normal behavior is that the DHCP client begins to renew the lease halfway through the lease time. So, for MAAS's 10-minute lease, it should be after ~5 minutes. After ~85-90% of the lease time (the rebind time), the client will give up on renewing with the original server and try to get a lease from any server, under the assumption that maybe a different DHCP server took over.
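To put numbers on that, here is a minimal sketch of the timer arithmetic. It assumes the RFC 2131 default fractions (renew at 50% of the lease, rebind at 87.5%); the function name `dhcp_timers` is made up for illustration, and a real client may pick slightly different values.

```python
from datetime import timedelta
from typing import Tuple


def dhcp_timers(lease_seconds: int) -> Tuple[timedelta, timedelta]:
    """Return (renew, rebind) offsets from the moment the lease is granted.

    Assumes the RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    renew = timedelta(seconds=lease_seconds * 0.5)
    rebind = timedelta(seconds=lease_seconds * 0.875)
    return renew, rebind


# MAAS's 10-minute lease from this report.
renew, rebind = dhcp_timers(600)
print(renew, rebind)  # 0:05:00 0:08:45
```

The rebind offset agrees with the pasted leases: each 600-second lease has its `rebind` timestamp 75 seconds before `expire` (for example 23:38:13 and 23:39:28), which is 525 seconds into the lease.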

Taking a closer look at the leases file, we can estimate when each lease was granted (that information is not stated directly), and whether or not it was an expected outcome:

[Wired]
Lease 1: Not from MAAS. [10.1.8.227]
Lease 2: Renews 23:33:53 (so granted ~23:28:53) [OK] [10.0.0.112]
Lease 3: Renews 23:38:02 (so granted ~23:33:02) [RENEWED-OK] [10.0.0.112]
Lease 4: Renews 23:41:55 (so granted ~23:36:55) [RENEWED-OK] [10.0.0.112]
Lease 5: Renews 23:46:17 (so granted ~23:41:17) [RENEWED-OK] [10.0.0.112]

[WLAN]
Lease 6: Renews 23:30:52 (so granted ~23:25:52) [OK] [10.0.0.46]
Lease 7: Renews 23:41:27 (so granted ~23:36:27) [REBOUND; granted after rebind time of 23:35:43] [10.0.0.46]

On the wired interface, everything looks great. Leases are being renewed as expected.

The only questionable data point is the last lease (on the WLAN interface); it appears that this lease extended beyond the REBIND timeout, which caused the client to give up on the current lease and try to get a new lease instead.

Since the wired interface is okay, I think it's safe to assume that there is greater packet loss on the wireless interface, leading to the normal DHCP client behavior of giving up on the lease if it hasn't heard back from the server.
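The same check can be scripted. The sketch below is only an illustration: it takes three of the wired leases from the paste above (trimmed to the fields it uses), estimates each grant time as `expire` minus `dhcp-lease-time` (an assumption about how the client computes `expire`), and compares it with the previous lease's rebind time. The helper names `parse_leases` and `classify` are made up.

```python
import re
from datetime import datetime, timedelta

# Wired leases 2-4 from this report, trimmed to the fields used below.
LEASES = """
lease {
  interface "enp0s25";
  option dhcp-lease-time 600;
  renew 2 2017/02/14 23:33:53;
  rebind 2 2017/02/14 23:38:13;
  expire 2 2017/02/14 23:39:28;
}
lease {
  interface "enp0s25";
  option dhcp-lease-time 600;
  renew 2 2017/02/14 23:38:02;
  rebind 2 2017/02/14 23:42:39;
  expire 2 2017/02/14 23:43:54;
}
lease {
  interface "enp0s25";
  option dhcp-lease-time 600;
  renew 2 2017/02/14 23:41:55;
  rebind 2 2017/02/14 23:46:47;
  expire 2 2017/02/14 23:48:02;
}
"""


def parse_leases(text):
    """Parse dhclient lease blocks into dicts with datetime fields."""
    leases = []
    for block in re.findall(r"lease\s*\{(.*?)\}", text, flags=re.S):
        seconds = int(re.search(r"dhcp-lease-time (\d+);", block).group(1))
        lease = {"seconds": seconds}
        for key in ("renew", "rebind", "expire"):
            stamp = re.search(rf"{key} \d ([\d/]+ [\d:]+);", block).group(1)
            lease[key] = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        # Assumption: the client sets expire = grant time + lease time.
        lease["granted"] = lease["expire"] - timedelta(seconds=seconds)
        leases.append(lease)
    return leases


def classify(leases):
    """Label each lease after the first by comparing it with its predecessor."""
    verdicts = []
    for prev, cur in zip(leases, leases[1:]):
        if cur["granted"] < prev["rebind"]:
            verdicts.append("renewed before rebind")
        else:
            verdicts.append("granted after rebind")
    return verdicts


leases = parse_leases(LEASES)
for lease, verdict in zip(leases[1:], classify(leases)):
    print(lease["granted"].time(), verdict)
```

Grant times estimated this way differ a little from the "renew minus five minutes" figures above, because the renew timestamps in these leases fall somewhat before the halfway mark. The conclusion for the wired interface is the same either way.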

So on one hand, everything is operating normally, and maybe you should look at upgrading your WiFi network to prevent packet loss. ;-) On the other hand, yes, if you were to increase the timeout, that might help somewhat with this situation.

Increasing the timeout is a tricky balance; make it too long, and customers with small or highly utilized dynamic ranges will not be able to deploy new machines. Make it too short, and clients on networks experiencing packet loss, and/or poorly-written DHCP clients will lose their leases.

Changed in network-manager (Ubuntu):
status: New → Invalid
Changed in maas (Ubuntu):
status: New → Opinion
Mike Pontillo (mpontillo) wrote :

Marking 'Opinion' for MAAS since we might want to discuss if it's possible to better balance renewal times for clients on unreliable networks.

I do not believe this is a bug in the ISC DHCP client or NetworkManager, so marking 'Invalid' for NM.

Changed in maas (Ubuntu):
status: Opinion → Invalid
Mike Pontillo (mpontillo) wrote :

After further investigation, this is also invalid for MAAS.

The root cause of this issue is: two separate interfaces requesting an address from the same DHCP server cannot be supported. (At least, not without some serious sysctl hacking at a minimum, but I'm not even sure about that.)

If you have a wired and a wireless NIC running alongside NetworkManager, NetworkManager will kindly ensure that the metric of your wireless interface is higher than for your wired interface.

When the DHCP client goes to renew the lease in question, it will send out the renewal on the interface it is currently bound to (the wireless interface). So far so good.

Now the DHCP server will receive the DHCP renewal request, and then create a unicast UDP reply packet. This packet will be addressed to the wireless interface. So the DHCP server will need to ARP for the currently-leased IP address on the wireless interface. So far so good. The ARP request will be sent to the MAC owner of the lease, since that's what should be cached for that IP address. So far so good.

Your laptop (happily, or so it thinks, sitting on both the wireless and wired networks) receives an ARP request on the wireless owned-MAC. So far so good.

Your laptop, in sending its ARP reply, wants to be sure that the requester has the best possible interface to communicate with said IP address on. "Oh, hey", the kernel says to itself, "it says here in my table that wired0 is a better interface than wlan0 to communicate with 10.0.0.3 on". So it constructs an ARP reply to the DHCP server, effectively saying "hey, wait, I have better information about 10.0.0.46. You should talk to it on wired0. Then we won't have to worry about that stupid unreliable radio, OK? OK."

So the DHCP server dutifully processes the ARP reply and blasts the lease renewal ACK to... wired0. Which promptly drops it after saying to itself "I didn't ask for that IP address; go away".

So the poor wireless interface is a third wheel in all this, thinking that the server hates it or the network must have dropped its packets. That lasts until the lease expires: the IP address goes away, the route goes away, and the initial (broadcast-based) discover/offer/request/ack cycle works just fine, giving it a short-lived glimmer of hope.

QED.

Mike Pontillo (mpontillo) wrote :

One last note on this. It might be possible to get this setup to work (on the client) using the following sysctl changes:

net.ipv4.conf.all.arp_filter = 1 (or 0)
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.all.arp_ignore = 2

This is completely untested. But in theory[1]:

 - arp_filter = 1: "allows you to have multiple network interfaces on the same subnet", according to the kernel docs. However, the decision is "based on whether or not the kernel would route a packet from the ARP'd IP out that interface". So that might need to remain set to zero. So worst case, it still wouldn't work, or the DHCP server would get conflicting ARP replies (possibly making the problem worse for the wired interface).

 - rp_filter = 2: the default in Ubuntu is for strict reverse-path filtering, which might cause us to fail to receive unicast DHCP ACK replies, if we see packets coming to a wireless interface [with a higher metric, i.e. lower priority] which we don't expect. Loose reverse-path filtering should allow this, though it would roll back significant security properties that rp_filter=1 adds.

 - arp_ignore = 2: an attempt to mitigate the fact that ARP filtering might allow more than one interface to reply to the ARP by ensuring that only interfaces configured with the address can reply. (Use this if arp_filter=1 isn't doing the trick and you need to try arp_filter=0.)

[1]: based on https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
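Before experimenting, it may help to record what the client is currently using. The following is a rough sketch, not a tested procedure: it reads the three settings through /proc/sys on a Linux machine. The helper names `sysctl_path` and `read_sysctl` are made up, and the dot-to-slash mapping is only good for keys like these (an interface name containing a dot would break it).

```python
from pathlib import Path

# The three settings discussed above.
KEYS = [
    "net.ipv4.conf.all.arp_filter",
    "net.ipv4.conf.all.rp_filter",
    "net.ipv4.conf.all.arp_ignore",
]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


for key in KEYS:
    print(f"{key} = {read_sysctl(key)}")
```

This only reads values. Changing them still needs root and `sysctl -w` (or a file under /etc/sysctl.d).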

Dustin Kirkland  (kirkland) wrote :

So I tried all three of those sysctls and the problem is still manifesting itself (every 10 minutes, Network Manager pops up a disconnected/reconnected message).

Moreover, I'll note that this problem does NOT exist in 14.04 (tested with the same hardware, same wifi+ethernet, using a 14.04.5 LiveISO, which uses the Xenial kernel, very close to the kernel I'm running here on 16.04).

I really believe this problem is likely in Network Manager itself. I'm reopening that task for now.

Changed in network-manager (Ubuntu):
status: Invalid → New
importance: Undecided → Medium
Aron Xu (happyaron) wrote :

Hi Dustin,

I would like to know which version of Ubuntu you are running on your laptop. Specifically, I want to know what is acting as your DHCP client: is it dhclient or systemd-networkd?

Requesting IP addresses for two interfaces on the same machine from one DHCP server sometimes causes trouble, because systemd-networkd sends a Client-Identifier to the server (which is based on your D-Bus machine-id), and some DHCP servers (as far as I know, at least some versions of ISC DHCP Server) give the Client-Identifier precedence over the interface MAC address, thus causing IP address allocation problems.

Changed in network-manager (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for network-manager (Ubuntu) because there has been no activity for 60 days.]

Changed in network-manager (Ubuntu):
status: Incomplete → Expired