Dnsmasq 2.81 broke neutron's DHCP service

Bug #1876094 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Harald Jensås

Bug Description

With dnsmasq 2.81, DHCP does not work as it should for instances connected to a network that has both IPv4 and IPv6 (DHCPv6-stateful) subnets. Dnsmasq processes the entries in the "host" file from bottom to top, and because neutron always writes the IPv4 entry first and then the IPv6 entry for the same MAC, DHCP for IPv4 no longer works.
Details are in Harald's email: http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2020q2/014038.html

Basically it's not Neutron's fault but a regression in dnsmasq, but we can work around it by changing how we generate the host file. If it has one line per MAC address, like:

fa:16:3e:b2:5f:a2,host-10-0-0-72.openstacklocal,[fd05:321f:2d56:1::b5]
fa:16:3e:08:f4:1f,host-10-0-0-2.openstacklocal,10.0.0.2,[fd05:321f:2d56:1::2]
fa:16:3e:9d:76:87,host-10-0-0-31.openstacklocal,10.0.0.31,[fd05:321f:2d56:1::8]
fa:16:3e:9f:f9:a5,host-10-0-0-60.openstacklocal,10.0.0.60,[fd05:321f:2d56:1::30a]
fa:16:3e:8c:47:e5,host-10-0-0-1.openstacklocal,10.0.0.1

then it should work fine with both dnsmasq 2.81 and older versions.
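A minimal sketch of that one-line-per-MAC layout, in Python since neutron is written in Python. The `merge_host_entries` helper and its `(mac, hostname, address)` input tuples are hypothetical, not neutron's actual code; it only shows how per-address entries for the same MAC could be collapsed into a single host-file line, with IPv4 first and IPv6 in brackets, as in the example lines above.

```python
import ipaddress


def merge_host_entries(entries):
    """Collapse (mac, hostname, address) tuples into one host-file line per MAC.

    Hypothetical helper: IPv4 addresses are written bare and IPv6 addresses
    in [brackets], following the example host file in this bug report.
    The hostname of the first entry seen for a MAC is the one kept.
    """
    per_mac = {}  # dicts keep insertion order, so MACs stay in first-seen order
    for mac, hostname, address in entries:
        _, addrs = per_mac.setdefault(mac, (hostname, []))
        addrs.append(ipaddress.ip_address(address))

    lines = []
    for mac, (hostname, addrs) in per_mac.items():
        # Stable sort: IPv4 (version 4) before IPv6 (version 6).
        ordered = sorted(addrs, key=lambda a: a.version)
        fields = [mac, hostname]
        fields += [str(a) if a.version == 4 else f"[{a}]" for a in ordered]
        lines.append(",".join(fields))
    return lines


if __name__ == "__main__":
    entries = [
        ("fa:16:3e:08:f4:1f", "host-10-0-0-2.openstacklocal", "10.0.0.2"),
        ("fa:16:3e:08:f4:1f", "host-10-0-0-2.openstacklocal", "fd05:321f:2d56:1::2"),
        ("fa:16:3e:8c:47:e5", "host-10-0-0-1.openstacklocal", "10.0.0.1"),
    ]
    print("\n".join(merge_host_entries(entries)))
```

Sorting by address family inside each MAC keeps the output independent of the order in which a port's fixed IPs happen to be listed.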

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
tags: added: l3-ipam-dhcp
Slawek Kaplonski (slaweq) wrote :

It is also broken in dnsmasq-2.79-11.el8, which is available in CentOS 8.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/725246

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/725369

Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Harald Jensås (harald-jensas)
Akihiro Motoki (amotoki)
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/725246

sean mooney (sean-k-mooney) wrote :

I'm not certain, but I think this has broken Ironic integration as well.
I have been debugging an issue where Ironic introspection works (and therefore all the PXE and IPMI infrastructure), but deploy fails because the neutron DHCP server does not respond.

the neutron port config is correct
http://paste.openstack.org/show/793148/

I have also done a tshark wire capture on the tap device in the DHCP server namespace and confirmed that the packet reaches it correctly, but the server never responds.

If I look in the dnsmasq log, I see this: http://paste.openstack.org/show/793150/

So I suspect that the extra DHCP options are also split across multiple lines, or are otherwise not being processed correctly.

I will try booting a server tomorrow and capturing the contents of the files in
/var/lib/neutron/dhcp/dff5da1b-d7d7-4903-864e-b80c045683b2/ to confirm how they are being populated.

But I would guess this needs to be treated as a potential candidate for an RC, or prioritised for backport, as it will be an upgrade blocker for Ironic clouds if this is in fact the root cause of the issue I am facing.
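One way to do that check is a small script that counts how many host-file lines each MAC has; a MAC spread over more than one line is the layout that triggers the regression described in this bug. This is a sketch that assumes neutron's format, where the MAC is the first comma-separated field of each line; the function name is hypothetical.

```python
from collections import Counter


def macs_with_split_entries(host_file_text):
    """Return the MACs that appear on more than one line of a dnsmasq host file.

    Assumes neutron's layout, where each line starts with the MAC address.
    """
    counts = Counter()
    for line in host_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mac = line.split(",", 1)[0].lower()
        counts[mac] += 1
    return sorted(mac for mac, n in counts.items() if n > 1)
```

For example, read /var/lib/neutron/dhcp/<network-id>/host (where `<network-id>` is a placeholder for the network UUID) into a string, pass it in, and print any MACs returned; an empty list means every MAC has a single line.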

Harald Jensås (harald-jensas) wrote :

@sean,

What is your subnet mask?

In IRC I see you have two VLANs, 192.168.2.1 and 192.168.3.1. I guess your VLANs' subnets are 192.168.2.0/24 and 192.168.3.0/24?

The dnsmasq logs indicate it is listening on:
  May 5 23:14:06 dnsmasq-dhcp[1986]: DHCP, static leases only on 192.168.2.0, lease time 1d

Your server port:
  ip_address='192.168.50.169'

If dnsmasq is serving static leases only on 192.168.2.0/24, then adding a host with 192.168.50.169 to its config will result in 'no address available', since the address is not in 192.168.2.0/24?
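Harald's reasoning can be checked with Python's standard `ipaddress` module. This sketch assumes the served range is a /24 (the log only prints the network address 192.168.2.0), and the helper name is hypothetical.

```python
import ipaddress


def address_in_served_subnets(address, subnets):
    """Return True if `address` lies inside any of the given CIDR subnets."""
    ip = ipaddress.ip_address(address)
    # Membership is simply False when address and network are different IP versions.
    return any(ip in ipaddress.ip_network(net) for net in subnets)


# Assumed /24: the dnsmasq log line only shows "static leases only on 192.168.2.0".
print(address_in_served_subnets("192.168.50.169", ["192.168.2.0/24"]))  # False
```

If Harald's reading is right, a `False` here lines up with the 'no address available' message: a static-leases-only dnsmasq has no range that could contain the port's fixed IP.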

sean mooney (sean-k-mooney) wrote :

VLAN 1 is my API network, 192.168.1.0/24.
VLAN 2 was 192.168.2.0/24.
VLAN 3 was 192.168.50.0/24 in neutron, but I also used it for the IPMI interfaces, so each
server had a static IP on its BMC in the 192.168.3.0/24 subnet.

I have updated it so that 192.168.3.0/24 is the subnet in neutron, and there is no change.

(neutron-dhcp-agent)[neutron@workstation /var/lib/neutron/dhcp/dff5da1b-d7d7-4903-864e-b80c045683b2]$ cat host
fa:16:3e:af:c6:b6,host-192-168-3-150.openstacklocal,192.168.3.150
78:e7:d1:e7:4a:27,host-192-168-3-171.openstacklocal,192.168.3.171,set:port-55912e59-73ba-4c92-b906-3576ea4b2df1

Digging a little deeper, I see that in my case there is a single entry for the MAC in the host file.

The opts file, however, has multiple entries:

(neutron-dhcp-agent)[neutron@workstation /var/lib/neutron/dhcp/dff5da1b-d7d7-4903-864e-b80c045683b2]$ cat opts
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:dns-server,192.168.3.1,192.168.1.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:classless-static-route,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,249,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:router,192.168.3.1
tag:port-55912e59-73ba-4c92-b906-3576ea4b2df1,150,192.168.1.61
tag:port-55912e59-73ba-4c92-b906-3576ea4b2df1,66,192.168.1.61
tag:port-55912e59-73ba-4c92-b906-3576ea4b2df1,67,undionly.kpxe
tag:port-55912e59-73ba-4c92-b906-3576ea4b2df1,210,/httpboot/
tag:port-55912e59-73ba-4c92-b906-3576ea4b2df1,option:server-ip-address,192.168.1.61

So I'm wondering if we have the same issue with the opts file, e.g. does it need to be updated so that all the opts are on one line?

I still get the same message in the dnsmasq log:
May 6 15:45:14 dnsmasq-dhcp[3800]: DHCPDISCOVER(tap415dce2d-68) 78:e7:d1:e7:4a:27 no address available
May 6 15:45:51 dnsmasq-dhcp[3715]: DHCPDISCOVER(tapfe3cb124-f5) 78:e7:d1:e7:4a:26 no address available

But I think I might know what is going on if I compare PID 3715 with PID 3800:

May 6 03:46:13 dnsmasq-dhcp[3715]: DHCP, static leases only on 192.168.3.0, lease time 1d
May 6 03:46:13 dnsmasq[3715]: using nameserver 8.8.4.4#53
May 6 03:46:13 dnsmasq[3715]: using nameserver 8.8.8.8#53
May 6 03:46:13 dnsmasq[3715]: using nameserver 1.1.1.1#53

May 6 03:48:12 dnsmasq-dhcp[3800]: DHCP, static leases only on 192.168.2.0, lease time 1d
May 6 03:48:12 dnsmasq[3800]: using nameserver 8.8.4.4#53
May 6 03:48:12 dnsmasq[3800]: using nameserver 8.8.8.8#53
May 6 03:48:12 dnsmasq[3800]: using nameserver 1.1.1.1#53

Since I have updated my network layout but not re-run introspection, and since I have changed which interface is used for PXE booting, I think I have the wrong interface MAC set in Ironic.
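The PID comparison above can be automated: map each dnsmasq PID to the network from its "static leases only on" start-up line, then annotate each "no address available" line with that network. This is a sketch that relies only on the log line formats quoted in this comment; the regular expressions and the function name are assumptions, not part of neutron or dnsmasq.

```python
import re

# Patterns derived from the log lines quoted in this comment.
START_RE = re.compile(r"dnsmasq-dhcp\[(\d+)\]: DHCP, static leases only on ([^,\s]+),")
NO_ADDR_RE = re.compile(
    r"dnsmasq-dhcp\[(\d+)\]: DHCPDISCOVER\(([^)]+)\) (\S+) no address available"
)


def unanswered_discovers(log_lines):
    """Return (mac, interface, served_network) for each 'no address available' line."""
    served = {}  # dnsmasq PID -> network it serves (a later start line wins)
    for line in log_lines:
        m = START_RE.search(line)
        if m:
            served[m.group(1)] = m.group(2)

    results = []
    for line in log_lines:
        m = NO_ADDR_RE.search(line)
        if m:
            pid, iface, mac = m.groups()
            results.append((mac, iface, served.get(pid)))  # None if PID unknown
    return results


if __name__ == "__main__":
    log = [
        "May 6 03:46:13 dnsmasq-dhcp[3715]: DHCP, static leases only on 192.168.3.0, lease time 1d",
        "May 6 03:48:12 dnsmasq-dhcp[3800]: DHCP, static leases only on 192.168.2.0, lease time 1d",
        "May 6 15:45:14 dnsmasq-dhcp[3800]: DHCPDISCOVER(tap415dce2d-68) 78:e7:d1:e7:4a:27 no address available",
        "May 6 15:45:51 dnsmasq-dhcp[3715]: DHCPDISCOVER(tapfe3cb124-f5) 78:e7:d1:e7:4a:26 no address available",
    ]
    for mac, iface, network in unanswered_discovers(log):
        print(mac, iface, network)
```

On these lines, the DISCOVER from 78:e7:d1:e7:4a:27 is handled by the dnsmasq serving 192.168.2.0, while the host file lists that MAC with 192.168.3.171, which fits the wrong-interface explanation.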

Regarding the 192.168.50.0/24 subnet I was using before, I do have a start message for that too:
May 5 19:10:23 dnsmasq-dhcp[887]: DHCP, static leases only on 192.168.50.0, lease time 1d
May 5 19:10:23 dnsmasq[887]: using nameserver 8.8.4.4#53
May 5 19:10:23 dnsmasq[887]: using nameserver 8.8.8.8#53
May 5 19:10:23 dnsmasq[887]: using nameserver 1.1.1.1#53

s...

sean mooney (sean-k-mooney) wrote :

OK, so yes, it looks like my issue was related to the swapped cable.
After I re-ran introspection it is now booting, although I am still having issues with iPXE.

Specifically, the initial PXE boot gets the iPXE image fine from the TFTP server on 192.168.1.61,

but when the iPXE image loads it does not seem to get the static route...

So:
(neutron-dhcp-agent)[neutron@workstation /var/lib/neutron/dhcp/dff5da1b-d7d7-4903-864e-b80c045683b2]$ cat opts
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:dns-server,192.168.3.1,192.168.1.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:classless-static-route,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,249,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:router,192.168.3.1
tag:port-2c99df54-23a3-4dca-b0f1-ac71ffba8946,67,undionly.kpxe
tag:port-2c99df54-23a3-4dca-b0f1-ac71ffba8946,option:server-ip-address,192.168.1.61
tag:port-2c99df54-23a3-4dca-b0f1-ac71ffba8946,150,192.168.1.61
tag:port-2c99df54-23a3-4dca-b0f1-ac71ffba8946,210,/httpboot/
tag:port-2c99df54-23a3-4dca-b0f1-ac71ffba8946,66,192.168.1.61

works fine for my BMC's PXE ROM, but the static routes

tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,option:classless-static-route,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1
tag:subnet-852a7a69-667e-4f51-9be4-a400a03e2013,249,192.168.1.0/24,192.168.3.1,169.254.169.254/32,192.168.3.150,0.0.0.0/0,192.168.3.1

are not picked up by the iPXE image, and it complains that a network is unreachable.

Looking at the neutron port:

 openstack port show 2c99df54-23a3-4dca-b0f1-ac71ffba8946
+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| allowed_address_pairs | |
| binding_host_id | 31303735-3035-4247-3830-333132534457 |
| binding_profile | |
| binding_vif_details | ...

sean mooney (sean-k-mooney) wrote :

Just an FYI: I'm pretty sure the issue is that the client's request does not ask for DHCP option 121 (classless-static-route), so even though neutron is correctly configuring it, it's not being served in the response. I don't know if this is another break in behavior with the new version of dnsmasq or not, but that is why this is failing for me.
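For context, option 121 is the classless static route option from RFC 3442, and dnsmasq normally sends a configured option only when the client lists it in its parameter request list (unless `--dhcp-option-force` is used). The sketch below encodes the routes from the `option:classless-static-route` line in the opts file into an option 121 payload, to show what the client would receive if it asked; it illustrates the RFC 3442 wire format and is not code from dnsmasq or neutron.

```python
import ipaddress


def encode_classless_routes(routes):
    """Encode (destination CIDR, router) pairs as a DHCP option 121 payload.

    Per RFC 3442 each route is: one byte of prefix length, the significant
    octets of the destination (ceil(prefix / 8) bytes), then the 4-byte router.
    """
    payload = bytearray()
    for cidr, router in routes:
        net = ipaddress.IPv4Network(cidr)
        width = net.prefixlen
        significant = (width + 7) // 8  # a /0 default route carries no destination octets
        payload.append(width)
        payload += net.network_address.packed[:significant]
        payload += ipaddress.IPv4Address(router).packed
    return bytes(payload)


if __name__ == "__main__":
    routes = [
        ("192.168.1.0/24", "192.168.3.1"),
        ("169.254.169.254/32", "192.168.3.150"),
        ("0.0.0.0/0", "192.168.3.1"),
    ]
    print(encode_classless_routes(routes).hex())
```

The same encoding is used for option 249, the Microsoft variant that neutron also writes in the opts file.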

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/725369
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f951871430ba59a148b8cb88e0d1b9e517c0a52e
Submitter: Zuul
Branch: master

commit f951871430ba59a148b8cb88e0d1b9e517c0a52e
Author: Harald Jensås <email address hidden>
Date: Mon May 4 20:01:35 2020 +0200

    Use dhcp-host tag support when supported

    In dnsmasq 2.81 there is a regression (see [1] for details).
    Prior versions of dnsmasq would select a host record where:
    a) no address is present in the host record.
    b) an address matching the address family of the client request
       is present in the host record.

    dnsmasq 2.81 will also use a host record where only an address
    not matching the address family of the client request is present.

    The same issue was also backported to dnsmasq-2.79-11.el8.x86_64,
    which is shipped in e.g. RHEL 8.2 and CentOS 8.

    dnsmasq version 2.81 also adds support for using tags on host
    records. When a DHCPv6 request is received, dnsmasq automatically
    sets the tag 'dhcpv6'.

    This change adds a runtime check that tests for dnsmasq host entry
    tag support, and adds 'tag:dhcpv6' to all IPv6 host records when
    dnsmasq supports this.

    Adding the tag makes dnsmasq prefer the tagged host for DHCPv6
    requests, i.e. it is a workaround fix for the regression.

    [1] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2020q2/014051.html

    Closes-Bug: #1876094
    Change-Id: Ie654c84137914226bdc3e31e16219345c2efaac9
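A toy model of the selection rules as this commit message describes them may help. Records are scanned bottom to top (as the bug description says dnsmasq does), a record carrying `tag:<name>` is skipped unless the request has that tag, and the `regressed` flag switches on the 2.81 behavior of also accepting a record that only has the other address family. The `HostRecord` type and `select_host` function are hypothetical; this approximates the behavior described above and is not dnsmasq's actual matching code.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:
    mac: str
    addresses: tuple  # e.g. ("10.0.0.2",) or ("fd05:321f:2d56:1::2",)
    required_tags: frozenset = frozenset()  # models "tag:<name>" on the record


def select_host(records, mac, family, request_tags, regressed):
    """Return the host record used for a request under the toy rules (or None)."""
    for rec in reversed(records):  # bottom-to-top, per the bug description
        if rec.mac != mac:
            continue
        if rec.required_tags - set(request_tags):
            continue  # record needs a tag this request does not carry
        families = {ipaddress.ip_address(a).version for a in rec.addresses}
        if not families or family in families:
            return rec  # rules a) and b): no address, or matching family
        if regressed:
            return rec  # 2.81 regression: record with only the other family
    return None


if __name__ == "__main__":
    mac = "fa:16:3e:08:f4:1f"
    v4 = HostRecord(mac, ("10.0.0.2",))
    v6 = HostRecord(mac, ("fd05:321f:2d56:1::2",))
    v6_tagged = HostRecord(mac, ("fd05:321f:2d56:1::2",), frozenset({"dhcpv6"}))

    # IPv4 entry written first, IPv6 second; an IPv4 request carries no dhcpv6 tag.
    print(select_host([v4, v6], mac, 4, set(), regressed=True).addresses)
    print(select_host([v4, v6_tagged], mac, 4, set(), regressed=True).addresses)
```

With `regressed=True`, the untagged layout hands the IPv4 request the IPv6-only record, while tagging the IPv6 record makes the same request fall through to the IPv4 record; DHCPv6 requests, which carry the `dhcpv6` tag, still match the tagged record.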

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/726079

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726080

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/726080
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=00dca13b66a50fcdabb7ceb01fb7f4ec9c3b3bf1
Submitter: Zuul
Branch: stable/train

commit 00dca13b66a50fcdabb7ceb01fb7f4ec9c3b3bf1
Author: Harald Jensås <email address hidden>
Date: Mon May 4 20:01:35 2020 +0200

    Use dhcp-host tag support when supported

    Closes-Bug: #1876094
    Change-Id: Ie654c84137914226bdc3e31e16219345c2efaac9
    (cherry picked from commit f951871430ba59a148b8cb88e0d1b9e517c0a52e)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/726079
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=38afccc28a49c383c4649bcd385271cbb4b4c3a6
Submitter: Zuul
Branch: stable/ussuri

commit 38afccc28a49c383c4649bcd385271cbb4b4c3a6
Author: Harald Jensås <email address hidden>
Date: Mon May 4 20:01:35 2020 +0200

    Use dhcp-host tag support when supported

    Closes-Bug: #1876094
    Change-Id: Ie654c84137914226bdc3e31e16219345c2efaac9
    (cherry picked from commit f951871430ba59a148b8cb88e0d1b9e517c0a52e)

tags: added: in-stable-ussuri
tags: added: neutron-proactive-backport-potential
Dan Radez (dradez) wrote :

This appears to be related to:
https://src.fedoraproject.org/rpms/dnsmasq/c/744ba31be775c11b1f52104d6285509b06b81035?branch=master

I'm cleaning my env to do a final verification. It seems that dnsmasq will only listen on lo with this in place.

Maybe we can add interface= to the neutron dnsmasq command to ensure that we always listen on all interfaces, no matter what the system config states?

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Dan Radez (<email address hidden>) on branch: master
Review: https://review.opendev.org/755330
Reason: Need a different change to resolve this bug.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/755356

tags: removed: neutron-proactive-backport-potential