VM interface goes down because of short dhcp lease

Bug #887162 reported by PierreF
This bug affects 6 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

We have an issue with our cloud VMs, where eth0 goes down for a short time because the guests do not get a DHCP reply quickly enough.

The cause seems to be dnsmasq not responding to unicast DHCPREQUESTs; the client only gets a reply to a broadcast DHCPREQUEST, which may be sent too late.

Here is the detailed problem and our installation:

* All servers are Ubuntu Lucid.
* nova is version 2011.3 from the PPA (2011.3-0ubuntu2~ppa1~lucid1).
* The lease time is 120 seconds (the default, hard-coded in nova/network/linux_net.py in the dnsmasq startup line).
* From the 120-second lease lifetime, dnsmasq derives a renew (T1) time near 60s (120/2) and a rebind (T2) time near 105s (7/8 * 120).
* We are using VLANs, so we have several dnsmasq processes running, one per VLAN, each serving only its own VLAN.
* The guest is a plain Ubuntu Lucid with the default dhcp3-client.
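For reference, the renew/rebind arithmetic in the bullets above can be checked with a quick shell sketch (nothing nova-specific; T1 and T2 follow the usual RFC 2131 defaults):

```shell
# Derive dnsmasq's renew (T1) and rebind (T2) times from the lease length:
# T1 = lease/2, T2 = lease * 7/8 (RFC 2131 defaults).
lease=120
renew=$((lease / 2))
rebind=$((lease * 7 / 8))
echo "lease=${lease}s renew=${renew}s rebind=${rebind}s"
```

With the hard-coded 120s lease, this prints renew=60s and rebind=105s, leaving only a ~15s window between rebind and expiry.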

First issue: dnsmasq doesn't reply to UNICAST DHCPREQUESTs. When a unicast request arrives at the nova-network node, since we have several dnsmasq processes bound to the same port (0.0.0.0:67), only the last one started receives the message. This is probably more of a dnsmasq bug.

Because of this, the DHCP client gets no answer until it tries to rebind. In the rebind phase it sends a BROADCAST DHCPREQUEST, and in that case all dnsmasq processes receive the message. Since dnsmasq only replies when it knows the host, all of them see the broadcast request but only the right one sends a reply.

In most cases everything is fine: the DHCP client sends the broadcast a few seconds before the lease expires. But sometimes (I'm fairly sure because of the backoff time, the interval between two DHCPREQUESTs) it does not send the broadcast request before the lease expiry time.

This never lasts long, because if the DHCP client would back off past the lease expiry time, it shortens the backoff interval to match the expiry time... plus 1 second. So the interface goes down and a second later a DHCP request is sent.

A solution could be to increase the lease time beyond 120s, but this seems to be hard-coded (I don't know whether there is a reason for that). Also, there seems to be a dhcp_lease_time flag, but it is not used when building the dnsmasq command-line parameters.

We are testing a workaround: limiting dhclient's backoff interval to 10 seconds instead of the default 2 minutes.
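A minimal sketch of that workaround in the guest (assuming lucid's dhcp3-client; `backoff-cutoff` caps the maximum interval between dhclient retries):

```
# /etc/dhcp3/dhclient.conf -- cap dhclient's retransmission backoff at 10s
# so a missed unicast DHCPREQUEST is retried well before the lease expires.
backoff-cutoff 10;
```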

If something is not clear, don't hesitate to ask for more information.

I have also attached details on software versions and an extract of the DHCP client log.

PS: interfaces going down is a problem for us because we add a secondary IP (a VIP) with pacemaker. When the interface goes down, even for the shortest time, the secondary IP is lost.

Revision history for this message
PierreF (pierre-fersing) wrote :
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Boris Deschenes (boris-michel-deschenes) wrote :

I have the same issue, especially on Windows guests: sometimes the DHCPREQUEST is sent too late and the DHCP server answers with a NAK because the lease is not found (it has expired).

We are testing with a longer lease time (yes, you have to overwrite the hard-coded value of 120 in linux_net.py).

Hopefully a longer lease time will fix this issue, and the dhcp_release switch will be used to free up the private IPs of terminated instances before their leases expire, so we don't waste fixed IPs.

I don't think we should work on a client-based solution, as there are many OSes involved, but a switch to override the lease time would be effective and simple to implement.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

FYI, dhcp_lease_type is configurable in essex, so you don't have to overwrite the hard coded value.

Changed in nova:
status: Confirmed → Fix Released
Revision history for this message
Boris Deschenes (boris-michel-deschenes) wrote :

I guess you mean --dhcp_lease_time and not --dhcp_lease_type

I see the switch is actually referenced in diablo but simply not used for the dnsmasq command. If it's fixed in essex then all is well.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

sorry, yes --dhcp_lease_time and it is actually used in essex. :)
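For anyone landing here later, overriding the lease in essex would look roughly like this (the 600s value is purely illustrative, and the exact config syntax depends on your nova.conf style):

```
# nova.conf (essex) -- example only; 600s is an illustrative value,
# not a recommendation
dhcp_lease_time=600
```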

Revision history for this message
tomiles (tomiles-deactivatedaccount) wrote :

Maybe this isn't a duplicate. The bug originally reported by PierreF is exactly what we experience. We raised the lease time slightly (so --dhcp_lease_time does its job) and still have issues with network interfaces going down on VMs under high load.
So the --dhcp_lease_time parameter works, but it doesn't solve the question of why the VM doesn't receive a DHCPOFFER and then loses its interface. So OK, we can raise the lease, but the actual issue is probably a dnsmasq-related bug.

Our syslog:
Jun 27 08:32:26 node1 dhclient: DHCPREQUEST of 192.168.1.13 on eth0 to 192.168.1.7 port 67
Jun 27 08:32:26 node1 dhclient: DHCPACK of 192.168.1.13 from 192.168.1.7
Jun 27 08:32:26 node1 dhclient: bound to 192.168.1.13 -- renewal in 407 seconds.
Jun 27 08:38:24 ---> from this point we get a lot of error lines from daemons complaining that the network interface is down
--> only about 3 minutes later does the VM request a new IP, but it doesn't get an offer back. We checked the node running dnsmasq and it never seems to receive these last requests
Jun 27 08:41:15 pbsnode1 dhclient: DHCPREQUEST of 192.168.1.13 on eth0 to 192.168.1.7 port 67
Jun 27 08:41:27 pbsnode1 dhclient: DHCPREQUEST of 192.168.1.13 on eth0 to 192.168.1.7 port 67
Jun 27 08:44:53 pbsnode1 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67
Jun 27 08:44:53 --> at this time the node with the dnsmasq server receives the DHCPDISCOVER and sends a DHCPACK of 192.168.1.13 back, but this doesn't seem to reach the VM and it keeps sending DHCPDISCOVER requests
Jun 27 08:53:38 pbsnode1 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Jun 27 08:53:46 pbsnode1 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 19
Jun 27 08:54:05 pbsnode1 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Jun 27 08:54:09 pbsnode1 dhclient: No DHCPOFFERS received.

--> at this point we just give up and reboot the VM to get our network back

Revision history for this message
GMi (gmi) wrote :
Revision history for this message
tomiles (tomiles-deactivatedaccount) wrote :

The problems detailed in my comment above occurred despite the fact that we had already upgraded to 2.61.
We even start losing our network interface under high load now, despite having raised our lease time to 600000s.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Can you check syslog for errors on the compute host? You might want to make sure you aren't losing packets due to conntrack tables being full, etc. If you are seeing issues like that, it could be a bug in libvirt / qemu / virtio that needs to be reported upstream.
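A quick, hedged way to check conntrack pressure on the compute host (the proc paths assume the nf_conntrack module is loaded; the fallbacks just print 0 where the files are absent):

```shell
# Check conntrack table usage; a count at or near max means new flows
# (including DHCP packets) can be silently dropped by the kernel.
count=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 0)
echo "conntrack usage: ${count}/${max}"
```

"nf_conntrack: table full, dropping packet" messages in the host's dmesg would point the same way.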

Revision history for this message
tomiles (tomiles-deactivatedaccount) wrote :

The compute host's syslog doesn't contain any errors around the time of the crash.
We are beginning to think DHCP is not the root issue but just a consequence of the network interface going down.
But we are in the dark about what could possibly trigger a VM's network interface to go down.
The usual suspects are glusterfs mounts in the VM, but we don't know if those can take the network down.

Revision history for this message
Narayan Desai (narayan-desai) wrote :

I'm seeing something quite similar on my deployment (essex/precise). I have either network or VM stability problems (haven't nailed down which yet) when the network interface gets busy on the VM. The instance can be rebooted successfully and begins to work properly, modulo network load. The odd thing is that we only see this issue on machines with large amounts of memory. We don't see the same issue on hosts/instances with smaller memory footprints. It is unclear to me if this is an artifact of resource footprint or a difference between node hardware types.
