DNS resolution fails when using VPN and routing all traffic over it

Bug #1603898 reported by James Troup
This bug affects 8 people
Affects                   Status     Importance  Assigned to  Milestone
network-manager (Ubuntu)  Confirmed  High        Unassigned
  Xenial                  Confirmed  High        Unassigned

Bug Description

When using our company VPN, the dnsmasq instance configured by
NetworkManager ends up in a weird state where it's unable to answer
queries because it's (incorrectly) sending them to 127.0.0.1:53, where
nothing is listening.

| root@ornery:~# nmcli con show 'Canonical UK - All Traffic' | grep -i dns
| ipv4.dns:
| ipv4.dns-search:
| ipv4.dns-options: (default)
| ipv4.ignore-auto-dns: no
| ipv6.dns:
| ipv6.dns-search:
| ipv6.dns-options: (default)
| ipv6.ignore-auto-dns: no
| IP4.DNS[1]: 10.172.192.1
| root@ornery:~# ps auxfwwwww | grep [4]035
| nobody 4035 0.0 0.0 52872 1620 ? S Jun29 6:39 \_ /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/NetworkManager/dnsmasq.pid --listen-address=127.0.1.1 --cache-size=0 --proxy-dnssec --enable-dbus=org.freedesktop.NetworkManager.dnsmasq --conf-dir=/etc/NetworkManager/dnsmasq.d
| root@ornery:~#

Querying the DNS server provided by the VPN connection works; querying
dnsmasq doesn't:

| root@ornery:~# dig +short @10.172.192.1 www.openbsd.org
| 129.128.5.194
| root@ornery:~# dig @127.0.1.1 www.openbsd.org
|
| ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.1.1 www.openbsd.org
| ; (1 server found)
| ;; global options: +cmd
| ;; Got answer:
| ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 6996
| ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
|
| ;; QUESTION SECTION:
| ;www.openbsd.org. IN A
|
| ;; Query time: 0 msec
| ;; SERVER: 127.0.1.1#53(127.0.1.1)
| ;; WHEN: Mon Jul 18 10:25:48 CEST 2016
| ;; MSG SIZE rcvd: 33
|
| root@ornery:~#

While running 'dig @127.0.1.1 www.openbsd.org':

| root@ornery:~# tcpdump -i lo port 53 -v -n
| tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
| 10:26:04.728905 IP (tos 0x0, ttl 64, id 56577, offset 0, flags [none], proto UDP (17), length 72)
| 127.0.0.1.54917 > 127.0.1.1.53: 32273+ [1au] A? www.openbsd.org. (44)
| 10:26:04.729001 IP (tos 0x0, ttl 64, id 49204, offset 0, flags [DF], proto UDP (17), length 61)
| 127.0.1.1.53 > 127.0.0.1.54917: 32273 Refused$ 0/0/0 (33)

| root@ornery:~# netstat -anp | grep 127.0.[01].1:53
| tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN 4035/dnsmasq
| udp 0 0 127.0.1.1:53 0.0.0.0:* 4035/dnsmasq
| root@ornery:~#

You can see below that a) dnsmasq thinks it is configured to use a DNS
server provided by the VPN, but b) it tries to answer a non-local
query like www.openbsd.org locally.

| root@ornery:~# kill -USR1 4035; tail /var/log/syslog | grep dnsmasq
| Jul 18 09:29:22 ornery dnsmasq[4035]: time 1468830562
| Jul 18 09:29:22 ornery dnsmasq[4035]: cache size 0, 0/0 cache insertions re-used unexpired cache entries.
| Jul 18 09:29:22 ornery dnsmasq[4035]: queries forwarded 1880976, queries answered locally 375041
| Jul 18 09:29:22 ornery dnsmasq[4035]: queries for authoritative zones 0
| Jul 18 09:29:22 ornery dnsmasq[4035]: server 10.172.192.1#53: queries sent 792, retried or failed 0
| root@ornery:~# dig +short @127.0.1.1 www.openbsd.org
| root@ornery:~# kill -USR1 4035; tail /var/log/syslog | grep dnsmasq
| Jul 18 09:29:22 ornery dnsmasq[4035]: queries for authoritative zones 0
| Jul 18 09:29:22 ornery dnsmasq[4035]: server 10.172.192.1#53: queries sent 792, retried or failed 0
| Jul 18 09:29:37 ornery dnsmasq[4035]: time 1468830577
| Jul 18 09:29:37 ornery dnsmasq[4035]: cache size 0, 0/0 cache insertions re-used unexpired cache entries.
| Jul 18 09:29:37 ornery dnsmasq[4035]: queries forwarded 1880976, queries answered locally 375042
| Jul 18 09:29:37 ornery dnsmasq[4035]: queries for authoritative zones 0
| Jul 18 09:29:37 ornery dnsmasq[4035]: server 10.172.192.1#53: queries sent 792, retried or failed 0
| root@ornery:~#
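
The comparison of the two statistics dumps above can also be done mechanically. This is a minimal sketch, assuming the SIGUSR1 output keeps the English wording shown in the syslog excerpt; the function names `parse_stats` and `classify_last_query` are hypothetical and not part of dnsmasq or NetworkManager.

```python
import re

# Matches the counter line dnsmasq writes on SIGUSR1 (wording taken from the log above).
STATS_RE = re.compile(r"queries forwarded (\d+), queries answered locally (\d+)")

def parse_stats(log_text):
    """Return one (forwarded, answered_locally) tuple per statistics dump found."""
    return [(int(fwd), int(local)) for fwd, local in STATS_RE.findall(log_text)]

def classify_last_query(before, after):
    """Compare two snapshots taken immediately before and after a single query."""
    if after[0] > before[0]:
        return "forwarded"
    if after[1] > before[1]:
        return "answered locally"
    return "unknown"

log = """\
Jul 18 09:29:22 ornery dnsmasq[4035]: queries forwarded 1880976, queries answered locally 375041
Jul 18 09:29:37 ornery dnsmasq[4035]: queries forwarded 1880976, queries answered locally 375042
"""
before, after = parse_stats(log)
print(classify_last_query(before, after))  # answered locally
```

Only the "answered locally" counter moves, which matches the observation that the query for www.openbsd.org never reached the VPN's DNS server.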

This is on Ubuntu 16.04, with the following packages:

| james@ornery:~$ COLUMNS=200 dpkg -l dnsmasq-base network-manager network-manager-openvpn | grep ^ii
| ii dnsmasq-base 2.75-1ubuntu0.16.04.1 amd64 Small caching DNS proxy and DHCP/TFTP server
| ii network-manager 1.2.0-0ubuntu0.16.04.2 amd64 network management framework (daemon and userspace tools)
| ii network-manager-openvpn 1.1.93-1ubuntu1 amd64 network management framework (OpenVPN plugin core)
| james@ornery:~$

Tags: xenial
Stéphane Graber (stgraber) wrote :

Could you include what gets written to your syslog while the VPN connection is established?

SIGUSR1 to dnsmasq tells you what server it's talking to, but unfortunately not for what domain...

Based on your described symptoms, I'm suspecting that the openvpn plugin told NM to configure dnsmasq only for the domains pushed by the VPN server rather than for all domains (as it should when you route everything over the VPN).

If that's what happened, you should see log lines like:
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain stgraber.net
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 16.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 17.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 18.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 19.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 22.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain 56.149.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain stgraber.net
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 16.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 17.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 18.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 19.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 22.172.in-addr.arpa
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain 56.149.in-addr.arpa

And resolution of any record that's part of one of those domains would succeed.
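
To check a syslog capture for this condition, the "using nameserver" lines can be split into per-domain and global upstreams. This is a rough sketch only: it assumes a global upstream is logged as "using nameserver ADDR#PORT" with no "for domain" suffix, which is not shown in this report, and the helper name `upstreams` is hypothetical.

```python
import re

# "using nameserver 172.16.20.30#53 for domain stgraber.net" -> per-domain upstream.
# Assumption: a line without the "for domain ..." suffix denotes a global upstream.
NS_RE = re.compile(r"using nameserver (\S+?)#(\d+)(?: for domain (\S+))?\s*$")

def upstreams(log_text):
    """Return (global_servers, per_domain) parsed from dnsmasq syslog lines."""
    global_servers, per_domain = [], {}
    for line in log_text.splitlines():
        match = NS_RE.search(line)
        if not match:
            continue
        server, _port, domain = match.groups()
        if domain is None:
            global_servers.append(server)
        else:
            per_domain.setdefault(domain, []).append(server)
    return global_servers, per_domain

log = """\
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.30#53 for domain stgraber.net
Jul 18 22:14:06 castiana dnsmasq[9394]: using nameserver 172.16.20.31#53 for domain stgraber.net
"""
global_servers, per_domain = upstreams(log)
if not global_servers:
    print("no global upstream; only these domains are forwarded:", sorted(per_domain))
```

An empty list of global servers alongside a non-empty per-domain map would be consistent with the symptom described: names inside the pushed domains resolve, everything else is refused.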

James Troup (elmo) wrote :

Good guess; that's exactly right.

| james@ornery:~$ dig +short @127.0.1.1 osmium-host.ppa
| 10.222.37.176
| james@ornery:~$ dig +short @127.0.1.1 www.openbsd.org
| james@ornery:~$

Chris MacNaughton (chris.macnaughton) wrote :

I just ran into this on a new 16.04 laptop as well.

tags: added: xenial
Mathieu Trudel-Lapierre (cyphermox) wrote :

I was pointed at this bug by Stéphane, and looked again with some extra logging patched in. I'm under the impression that NM is doing exactly as it's told, which also means dnsmasq will do the same: it's simply not configuring a "global" nameserver to go with the per-domain ones.

From what I can tell after careful testing with the debug logs enabled and watching what NM and dnsmasq say to each other, this failure scenario happens when you configure the IPv4 settings to "Use this connection only for the resources on its network" (i.e. split-tunnelling), but don't enable the same option for the IPv6 settings.

That state appears to confuse NM into thinking it shouldn't set the "global" DNS because one of the connections is meant to take the default gateway.

I'm still looking at the code to figure out how best to make this work as expected, but I think in the meantime a good workaround would be to mirror the split-tunnelling option in IPv4 and IPv6 settings (the checkbox "Use this connection..."). You may then put IPv6 back to "Ignore" or leave it as-is, since if there are no IPv6 addresses given by the VPN this will simply be ignored.
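
A quick way to spot the mismatched state described above is to compare the two settings in `nmcli con show` output. This is a sketch under one assumption that is not stated in this report: that the "Use this connection only for resources on its network" checkbox corresponds to the `ipv4.never-default` and `ipv6.never-default` properties. The helper names are hypothetical.

```python
def parse_nmcli(output):
    """Parse 'key: value' lines, as printed by `nmcli con show NAME`, into a dict."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def split_tunnel_mismatch(settings):
    """True when IPv4 and IPv6 disagree on never-default (assumed checkbox mapping)."""
    return settings.get("ipv4.never-default") != settings.get("ipv6.never-default")

# Hypothetical sample output, not taken from any machine in this report.
sample = """\
ipv4.never-default:                     yes
ipv6.never-default:                     no
"""
print(split_tunnel_mismatch(parse_nmcli(sample)))  # True
```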

Stéphane Graber (stgraber) wrote :

So I just ran into the exact same problem with my personal VPN when I do have both IPv4 and IPv6 configured to route all traffic over the VPN.

Mathieu Trudel-Lapierre (cyphermox) wrote :

Please provide any extra information you can to reproduce and debug this issue. I can't reproduce it. If both IPv4 and IPv6 are set to not take the default route, things are behaving correctly here; just like the DNS settings are correctly configured when no split-tunnelling is in use at all.

Also note that this will not work if dns=dnsmasq isn't set; this is important as some versions of NetworkManager have disabled that feature.
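
Whether dns=dnsmasq is set can be checked by reading the [main] section of NetworkManager.conf. The following is a minimal sketch that parses the file contents with Python's configparser; the sample text is hypothetical, and on a real system the contents would typically come from /etc/NetworkManager/NetworkManager.conf.

```python
import configparser

def dns_plugin(conf_text):
    """Return the value of 'dns' in [main], or None when it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("main", "dns", fallback=None)

# Hypothetical file contents for illustration.
sample = """\
[main]
plugins=ifupdown,keyfile
dns=dnsmasq
"""
print(dns_plugin(sample))  # dnsmasq
```

A return value of None would mean the per-domain forwarding discussed in this bug is not in play at all.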

From my logs, after sending kill -USR1 to dnsmasq before doing any tests (5 queries sent to the local DNS server (192.168.0.1), 0 to the VPN DNS server (10.x.x.1)):

Aug 19 14:04:30 demeter NetworkManager[2922]: <debug> [1471629870.6888] dnsmasq[0x5580dd157040]: dnsmasq update successful
Aug 19 14:05:16 demeter dnsmasq[5238]: time 1471629916
Aug 19 14:05:16 demeter dnsmasq[5238]: cache size 0, 0/0 cache insertions re-used unexpired cache entries
Aug 19 14:05:16 demeter dnsmasq[5238]: queries forwarded 40322, queries answered locally 448
Aug 19 14:05:16 demeter dnsmasq[5238]: queries for authoritative zones 0
Aug 19 14:05:16 demeter dnsmasq[5238]: server 192.168.0.1#53: queries sent 5, retried or failed 0
Aug 19 14:05:16 demeter dnsmasq[5238]: server 10.x.x.1#53: queries sent 0, retried or failed 0
Aug 19 14:05:49 demeter dnsmasq[5238]: time 1471629949
Aug 19 14:05:49 demeter dnsmasq[5238]: cache size 0, 0/0 cache insertions re-used unexpired cache entries
Aug 19 14:05:49 demeter dnsmasq[5238]: queries forwarded 40324, queries answered locally 448
Aug 19 14:05:49 demeter dnsmasq[5238]: queries for authoritative zones 0
Aug 19 14:05:49 demeter dnsmasq[5238]: server 192.168.0.1#53: queries sent 7, retried or failed 0
Aug 19 14:05:49 demeter dnsmasq[5238]: server 10.x.x.1#53: queries sent 0, retried or failed 0
Aug 19 14:06:06 demeter dnsmasq[5238]: time 1471629966
Aug 19 14:06:06 demeter dnsmasq[5238]: cache size 0, 0/0 cache insertions re-used unexpired cache entries
Aug 19 14:06:06 demeter dnsmasq[5238]: queries forwarded 40325, queries answered locally 448
Aug 19 14:06:06 demeter dnsmasq[5238]: queries for authoritative zones 0
Aug 19 14:06:06 demeter dnsmasq[5238]: server 192.168.0.1#53: queries sent 7, retried or failed 0
Aug 19 14:06:06 demeter dnsmasq[5238]: server 10.x.x.1#53: queries sent 1, retried or failed 0

After the first dump I resolved www.google.com (local) and www.canonical.com (local), at which point the sent counters read 7/0 (local/VPN); then lcy01.buildd (intended for the VPN), which brought them to 7/1. The only request that went to the VPN was the one for lcy01.buildd; it was rejected with NXDOMAIN (and didn't go to the local DNS at all). Everything happened as intended.

James Troup (elmo) wrote :

I can still reproduce this and I've double-checked that my IPv4 and v6
settings are identical in terms of both the 'Method' field (set to
'Automatic VPN' for both) and that both are set to accept all routes
from the VPN server.

Logs are here: https://pastebin.canonical.com/164434/

James Troup (elmo) wrote :

cyphermox asked me for receipts!

  http://people.canonical.com/~james/nm-settings/

Lee Trager (ltrager) wrote :

I'm running into the same issue. My network doesn't have IPv6, although it's configured to try; turning off IPv6 had no effect.

If I direct all traffic through the VPN ('Use this connection only for resources on its network' in the routes window is left unchecked), I get a DNS server but it's not used by default:

$ dig @127.0.1.1 +short chaos txt servers.bind
"10.172.64.1#53 12 0"
$ dig google.com +short
# No result returned
$ dig google.com +short @10.172.64.1
172.217.4.174

If I only direct traffic for resources on the VPN network through the VPN ('Use this connection only for resources on its network' in the routes window is checked) on BOTH IPv4 and IPv6, I get two DNS servers and DNS seems to work:

$ dig @127.0.1.1 +short chaos txt servers.bind
"192.168.1.1#53 6 0" "10.172.64.1#53 0 0"
$ dig google.com +short
216.58.216.174

So it seems NetworkManager is adding the VPN DNS server, but it's not using it.
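
The servers.bind answers above are easier to compare once split into fields. This is a small sketch, assuming each quoted string has the form "ADDRESS#PORT SENT FAILED" (queries sent, then queries that failed) as the output above suggests; the function name `parse_servers_bind` is hypothetical.

```python
import re

def parse_servers_bind(answer):
    """Split a servers.bind TXT answer into (address, port, sent, failed) tuples."""
    servers = []
    for chunk in re.findall(r'"([^"]*)"', answer):
        endpoint, sent, failed = chunk.split()
        address, _, port = endpoint.rpartition("#")
        servers.append((address, int(port), int(sent), int(failed)))
    return servers

print(parse_servers_bind('"192.168.1.1#53 6 0" "10.172.64.1#53 0 0"'))
# [('192.168.1.1', 53, 6, 0), ('10.172.64.1', 53, 0, 0)]
```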

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu Xenial):
status: New → Confirmed
Changed in network-manager (Ubuntu):
status: New → Confirmed
Stuart Bishop (stub) wrote :

I see this with IPv6 disabled completely on the laptop:

$ cat /etc/sysctl.d/99-noipv6.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

I can confirm that both ipv4 and ipv6 settings have both 'Use this connection only for resources on its network' and 'Ignore automatically obtained routes' disabled, and when I try connecting the ipv6 method is 'ignore'.

Changed in network-manager (Ubuntu Xenial):
importance: Undecided → High
Changed in network-manager (Ubuntu):
importance: Undecided → High