postfix name lookup failed after dist-upgrade (Aug-2018)

Bug #1787739 reported by Mike Dotson
This bug affects 1 person
Affects: bind9 (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Internal DNS cache configured on LXD system:

Upon further investigation, nslookup is having similar issues:
18.04 LTS system:

# nslookup oracle.com - 192.168.0.30
Server: 192.168.0.30
Address: 192.168.0.30#53

Non-authoritative answer:
Name: oracle.com
Address: 137.254.120.50
** server can't find oracle.com: SERVFAIL

# nslookup www.oracle.com - 192.168.0.30
Server: 192.168.0.30
Address: 192.168.0.30#53

Non-authoritative answer:
www.oracle.com canonical name = ds-www.oracle.com.edgekey.net.
ds-www.oracle.com.edgekey.net canonical name = e870.dscx.akamaiedge.net.
Name: e870.dscx.akamaiedge.net
Address: 23.62.67.62

Notice the SERVFAIL on the first lookup. However, an older Ubuntu system (16.10) pointing to the same DNS server does not show it:

# nslookup oracle.com - 192.168.0.30
Server: 192.168.0.30
Address: 192.168.0.30#53

Non-authoritative answer:
Name: oracle.com
Address: 137.254.120.50

# nslookup www.oracle.com - 192.168.0.30
Server: 192.168.0.30
Address: 192.168.0.30#53

Non-authoritative answer:
www.oracle.com canonical name = ds-www.oracle.com.edgekey.net.
ds-www.oracle.com.edgekey.net canonical name = e870.dscx.akamaiedge.net.
Name: e870.dscx.akamaiedge.net
Address: 23.62.67.62

Most lookup requests end in SERVFAIL even though the correct address is returned:

# nslookup www.ubuntu.com - 192.168.0.30
Server: 192.168.0.30
Address: 192.168.0.30#53

Non-authoritative answer:
Name: www.ubuntu.com
Address: 91.189.89.110
** server can't find www.ubuntu.com: SERVFAIL
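
The pattern in these outputs (an address in the answer section followed by a SERVFAIL line) can be detected mechanically. A minimal sketch, assuming the output layout shown in this report; the function name classify_nslookup is made up for illustration, and authoritative answers (which lack the "Non-authoritative answer:" line) are not handled:

```shell
# Classify nslookup output read from stdin.
# Prints "partial" when the answer section lists at least one address but a
# SERVFAIL line is also present, "failed" for SERVFAIL with no address,
# "ok" for addresses without SERVFAIL, and "unknown" otherwise.
classify_nslookup() {
    awk '
        /^Non-authoritative answer:/ { in_answer = 1; next }
        in_answer && /^Address:/     { answers++ }
        /SERVFAIL/                   { servfail = 1 }
        END {
            if (servfail && answers > 0) print "partial"
            else if (servfail)           print "failed"
            else if (answers > 0)        print "ok"
            else                         print "unknown"
        }
    '
}
```

Usage would look like: nslookup www.ubuntu.com - 192.168.0.30 | classify_nslookup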

Internal systems look up without any issues.

To work around the postfix lookup failure, I had to create an entry in the DNS server with the IP address of the external mail server.

This was working until I ran "apt update; apt dist-upgrade -y" around the 15th of August, 2018.

Pointing to an external DNS resolver does not show the issue:

# nslookup www.ubuntu.com - 1.1.1.1
Server: 1.1.1.1
Address: 1.1.1.1#53

Non-authoritative answer:
Name: www.ubuntu.com
Address: 91.189.89.103

So there seems to be some incompatibility between bind9 package/server and the dns library/tools on 18.04.

Can replicate this in virtualbox with the ubuntu-18.04-desktop-amd64.iso live image pointing to the same DNS server.

# lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04

# apt show bind9
Package: bind9
Version: 1:9.11.3+dfsg-1ubuntu1.1

# apt show dnsutils
Package: dnsutils
Version: 1:9.11.3+dfsg-1ubuntu1.1

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: dnsutils 1:9.11.3+dfsg-1ubuntu1.1
ProcVersionSignature: Ubuntu 4.15.0-32.35-generic 4.15.18
Uname: Linux 4.15.0-32-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Sat Aug 18 09:43:02 2018
InstallationDate: Installed on 2018-07-30 (19 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: bind9
UpgradeStatus: No upgrade log present (probably fresh install)

Mike Dotson (mgdotson) wrote :

Attaching Vagrant file to duplicate issue

Changed in bind9 (Ubuntu):
status: New → Incomplete
Mike Dotson (mgdotson) wrote :

Why is this incomplete? What additional information is needed?

Andreas Hasenack (ahasenack) wrote :

Oh no, my big comment was lost?

Andreas Hasenack (ahasenack) wrote :

I worked 1h on this yesterday :/

Mike Dotson (mgdotson) wrote :

Unfortunately I'm not seeing a comment. I did include a vagrant file that should reproduce the issue. I found a workaround for postfix but the nslookup issue is still there.

Andreas Hasenack (ahasenack) wrote :

Ok, let's try again.

I have named.conf.options set like this in a bionic lxd container:
options {
 directory "/var/cache/bind";
 forwarders {
  1.1.1.1;
 };
 dnssec-validation auto;
 auth-nxdomain no; # conform to RFC1035
 listen-on { 10.0.100.137; };
};

10.0.100.137 is the container's eth0 address.

This works all the time:
ubuntu@bionic-bind9:~$ nslookup ubuntu.com - 10.0.100.137
Server: 10.0.100.137
Address: 10.0.100.137#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40

Same with dnssec set to false.

Can you try with dig perhaps? I don't know how to enable debugging in nslookup (-deb or -d2 didn't change anything here).

Something like:
dig @127.0.0.1 +trace ubuntu.com

host also has some debugging available:
host -d ubuntu.com 127.0.0.1 <-- or the actual ip where bind is listening

I wonder if packets are getting truncated somehow, as you got an answer despite the failure status.

Mike Dotson (mgdotson) wrote :

First, I want to apologize, the Vagrant file I uploaded was apparently the incorrect one. I'm attaching the version I'm testing with. I actually found this with my internal server running as an LXD container.

With my options file set to the following (192.168.0.130 eth0 address):
options {
 directory "/var/cache/bind";

 // If there is a firewall between you and nameservers you want
 // to talk to, you may need to fix the firewall to allow multiple
 // ports to talk. See http://www.kb.cert.org/vuls/id/800113

 // If your ISP provided one or more IP addresses for stable
 // nameservers, you probably want to use them as forwarders.
 // Uncomment the following block, and insert the addresses replacing
 // the all-0's placeholder.

 forwarders {
   1.1.1.1;
 };

 //========================================================================
 // If BIND logs error messages about the root key being expired,
 // you will need to update your keys. See https://www.isc.org/bind-keys
 //========================================================================
 dnssec-validation false;

 auth-nxdomain no; # conform to RFC1035
 listen-on-v6 { any; };
 listen-on { 192.168.0.130; };
};

vagrant@ubuntu-bionic:/etc/bind$ nslookup ubuntu.com - 192.168.0.130
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40
** server can't find ubuntu.com: SERVFAIL

vagrant@ubuntu-bionic:/etc/bind$ dig @192.168.0.130 +trace ubuntu.com

; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> @192.168.0.130 +trace ubuntu.com
; (1 server found)
;; global options: +cmd
. 3600000 IN NS A.ROOT-SERVERS.NET.
. 3600000 IN NS E.ROOT-SERVERS.NET.
. 3600000 IN NS L.ROOT-SERVERS.NET.
. 3600000 IN NS D.ROOT-SERVERS.NET.
. 3600000 IN NS G.ROOT-SERVERS.NET.
. 3600000 IN NS F.ROOT-SERVERS.NET.
. 3600000 IN NS J.ROOT-SERVERS.NET.
. 3600000 IN NS B.ROOT-SERVERS.NET.
. 3600000 IN NS K.ROOT-SERVERS.NET.
. 3600000 IN NS I.ROOT-SERVERS.NET.
. 3600000 IN NS H.ROOT-SERVERS.NET.
. 3600000 IN NS M.ROOT-SERVERS.NET.
. 3600000 IN NS C.ROOT-SERVERS.NET.
;; Received 343 bytes from 192.168.0.130#53(192.168.0.130) in 0 ms

;; expected opt record in response
ubuntu.com. 599 IN A 91.189.94.40
. 3574 IN NS c.root-servers.net.
. 3574 IN NS d.root-servers.net.
. 3574 IN NS e.root-servers.net.
. 3574 IN NS f.root-servers.net.
. 3574 IN NS g.root-servers.net.
. 3574 IN NS h.root-servers.net.
. 3574 IN NS i.root-servers.net.
. 3574 IN NS a.root-servers.net.
. 3574 IN NS j.root-servers.net.
. 3574 IN NS k.root-servers.net.
. 3574 IN NS l.root-servers.net.
. 3574 IN NS m.root-servers.net.
. 3574 IN NS b.root-servers.net.
;; Received 271 bytes from 199.9.14.201#53(B.ROOT-SERVERS.NET) in 61 ms

vagrant@ubuntu-bionic:/etc/bind$ host -d ubuntu.com 192.168.0.130
Trying "ubuntu.com"
Using domain server:
Name: 192.168.0.130
Address: 192.168.0.130#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30799
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ubuntu.com. IN A

;; ANSWER SECTION:
ubuntu.com. 445 IN A 91.189.94.40

Received 44 bytes from 192.168.0.13...


Mike Dotson (mgdotson) wrote :

Fixed ip addresses in vagrant file

Mike Dotson (mgdotson) wrote :

Was the previous debug information helpful or will you need additional information?

Andreas Hasenack (ahasenack) wrote :

I'll give the vagrant image a try. It's still odd, though.

Andreas Hasenack (ahasenack) wrote :

I gave the vagrant file a try, and this is the last bit of output:
(...)
    default: Created symlink /etc/systemd/system/multi-user.target.wants/bind9.service → /lib/systemd/system/bind9.service.
    default: bind9-pkcs11.service is a disabled or a static unit, not starting it.
    default: bind9-resolvconf.service is a disabled or a static unit, not starting it.
    default: Processing triggers for ureadahead (0.100.0-20) ...
    default: Processing triggers for systemd (237-3ubuntu10.3) ...
    default: Processing triggers for ufw (0.35-5) ...
    default: Fails to look up host with dnssec-validation auto
    default: Server: 192.168.0.130
    default: Address: 192.168.0.130#53
    default:
    default: Non-authoritative answer:
    default: Name: ubuntu.com
    default: Address: 91.189.94.40
    default: Fails to get entire entry with dnssec-validation false
    default: Server: 192.168.0.130
    default: Address: 192.168.0.130#53
    default:
    default: Non-authoritative answer:
    default: Name: ubuntu.com
    default: Address: 91.189.94.40

I don't see errors from nslookup.

I tried this on a cosmic laptop, with vagrant and virtualbox as shipped in cosmic, but the VM was bionic as specified. It only has a wifi connection, so I gave it the wifi nic name when vagrant asked me which interface it should bridge with (if that matters).

Perhaps you should do a network packet capture, see if packets are being truncated somehow. If using tcpdump, be sure to specify a large size with -s, or just "-s 0" which means the whole packet iirc. And then compare it with another packet capture with the previous version of bind to see what's the difference.

A tcpdump command line to start with could be:
tcpdump -i any -s 0 -w dns.pcap port 53

You could perhaps restrict the interface a bit, instead of "any".
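
A sketch of narrowing the capture to one interface and one resolver, then reading it back. The interface name enp0s8 and the address 192.168.0.130 used in the usage line are the ones from the Vagrant VM in this thread; the helper names (dns_filter, capture_dns, show_capture) are hypothetical:

```shell
# Build a capture filter limited to DNS traffic to or from one resolver.
dns_filter() {
    printf 'host %s and port 53\n' "$1"
}

# Capture full packets (-s 0) on a single interface into a pcap file.
# Needs root privileges; stop with Ctrl-C.
capture_dns() {
    local iface="$1" resolver="$2" outfile="$3"
    tcpdump -i "$iface" -s 0 -w "$outfile" "$(dns_filter "$resolver")"
}

# Print the saved packets verbosely, without resolving names or ports.
show_capture() {
    tcpdump -nn -v -r "$1" port 53
}
```

For example: capture_dns enp0s8 192.168.0.130 dns.pcap, run the failing nslookup in another terminal, then show_capture dns.pcap. tcpdump accepts the filter expression as a single argument, which is why the filter is passed quoted.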

Mike Dotson (mgdotson) wrote :

I'll see if I can get a chance to do some packet captures later this week, however, I did notice something that may be relevant.

The domains that seem to fail do not have IPv6 addresses. The domains that succeed return IPv6 addresses:

vagrant@ubuntu-bionic:~$ nslookup google.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: google.com
Address: 172.217.12.14
Name: google.com
Address: 2607:f8b0:400f:805::200e

vagrant@ubuntu-bionic:~$ nslookup yahoo.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: yahoo.com
Address: 98.138.219.231
Name: yahoo.com
Address: 98.137.246.7
Name: yahoo.com
Address: 98.137.246.8
Name: yahoo.com
Address: 72.30.35.9
Name: yahoo.com
Address: 72.30.35.10
Name: yahoo.com
Address: 98.138.219.232
Name: yahoo.com
Address: 2001:4998:44:41d::4
Name: yahoo.com
Address: 2001:4998:58:1836::11
Name: yahoo.com
Address: 2001:4998:c:1023::4
Name: yahoo.com
Address: 2001:4998:c:1023::5
Name: yahoo.com
Address: 2001:4998:44:41d::3
Name: yahoo.com
Address: 2001:4998:58:1836::10

However, the domains that fail do not return IPv6 information:
vagrant@ubuntu-bionic:~$ nslookup ubuntu.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40
** server can't find ubuntu.com: SERVFAIL

vagrant@ubuntu-bionic:~$ nslookup oracle.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: oracle.com
Address: 137.254.120.50
** server can't find oracle.com: SERVFAIL

vagrant@ubuntu-bionic:~$ nslookup amazon.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: amazon.com
Address: 176.32.103.205
Name: amazon.com
Address: 176.32.98.166
Name: amazon.com
Address: 205.251.242.103
** server can't find amazon.com: SERVFAIL

There's also a pause between the last "Address" output line and the "** server" line, where the IPv6 address would be.

Andreas Hasenack (ahasenack) wrote :

Note that in this last set of examples you were using the systemd resolver (127.0.0.53), which may or may not be set up to query the bind server you are testing.

I suggest always starting from a clean slate in that case. You can clear the systemd resolver cache by running:

sudo systemd-resolve --flush-caches

Another interesting check, if you keep using that resolver, is systemd-resolve --status

Mike Dotson (mgdotson) wrote :

Same results using the vagrant vm configured bind server:

vagrant@ubuntu-bionic:~$ nslookup ubuntu.com - 192.168.0.130
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40
** server can't find ubuntu.com: SERVFAIL

vagrant@ubuntu-bionic:~$ nslookup amazon.com - 192.168.0.130
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: amazon.com
Address: 176.32.103.205
Name: amazon.com
Address: 205.251.242.103
Name: amazon.com
Address: 176.32.98.166
** server can't find amazon.com: SERVFAIL

vagrant@ubuntu-bionic:~$ nslookup google.com - 192.168.0.130
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: google.com
Address: 172.217.3.14
Name: google.com
Address: 2607:f8b0:400f:801::200e

vagrant@ubuntu-bionic:~$

Pauses are the same upon the first nslookup for the domain. After the first lookup, the entry is cached and there isn't a pause between the ipv4 and ipv6 entries.

You can see the pause in the strace output (attached):
vagrant@ubuntu-bionic:~$ strace -ftt nslookup cononical.com - 192.168.0.130 2> strace.out

[pid 1836] 18:39:56.769691 futex(0x7f5ae18c30c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 1838] 18:39:57.576985 <... epoll_wait resumed> [{EPOLLIN, {u32=20, u64=20}}], 64, -1) = 1

I also experimented with nslookup's query type settings:

vagrant@ubuntu-bionic:~$ nslookup
> server 192.168.0.130
Default server: 192.168.0.130
Address: 192.168.0.130#53
> ubuntu.com
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40
** server can't find ubuntu.com: SERVFAIL
> set querytype=a
> ubuntu.com
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: ubuntu.com
Address: 91.189.94.40
> set querytype=aaaa
> ubuntu.com
Server: 192.168.0.130
Address: 192.168.0.130#53

** server can't find ubuntu.com: SERVFAIL
> google.com
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: google.com
Address: 2607:f8b0:400f:800::200e

So definitely something going on with the IPV6. In your configuration, do you get IPV6 records for google.com?
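
The interactive session above suggests the failure is specific to AAAA queries. A minimal sketch for checking the response status per record type with dig, assuming dig from dnsutils is installed; the function names (extract_status, query_status, compare_types) are made up for illustration. Plain queries are used rather than +trace, since with +trace dig resolves iteratively from the root servers and only asks the given server for the initial root NS list:

```shell
# Extract the response code from dig's header line, for example
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30799".
extract_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query one record type and print "<name> <type> <status>".
# Prints NO_RESPONSE when no header could be parsed (timeout, dig missing).
query_status() {
    local server="$1" name="$2" rtype="$3" status
    status=$(dig "@$server" "$name" "$rtype" +noall +comments | extract_status)
    printf '%s %s %s\n' "$name" "$rtype" "${status:-NO_RESPONSE}"
}

# Compare A and AAAA results for each name given after the server address.
compare_types() {
    local server="$1" name
    shift
    for name in "$@"; do
        query_status "$server" "$name" A
        query_status "$server" "$name" AAAA
    done
}
```

For example: compare_types 192.168.0.130 ubuntu.com oracle.com google.com. If the observation in this comment holds, the failing domains would show NOERROR for A and SERVFAIL for AAAA.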

Andreas Hasenack (ahasenack) wrote :

> So definitely something going on with the IPV6. In your configuration,
> do you get IPV6 records for google.com?

I do:

andreas@nsn7:~/vagrant$ ssh -i .vagrant/machines/default/virtualbox/private_key vagrant@localhost -p 2222
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-36-generic x86_64)
...
vagrant@ubuntu-bionic:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
 ...
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:0d:76:ba:5b:1c brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 86162sec preferred_lft 86162sec
    inet6 fe80::d:76ff:feba:5b1c/64 scope link
       valid_lft forever preferred_lft forever
4: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:95:cf:d7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.130/24 brd 192.168.0.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 2804:7f4:xxxx:xxxx:xxx:xxxx:xxxx:xxxx/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591766sec preferred_lft 604566sec
    inet6 fe80::a00:27ff:fe95:cfd7/64 scope link
       valid_lft forever preferred_lft forever

vagrant@ubuntu-bionic:~$ host google.com
google.com has address 172.217.29.14
google.com has IPv6 address 2800:3f0:4001:803::200e
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.

Also when asking bind9 directly:
vagrant@ubuntu-bionic:~$ host google.com 192.168.0.130
Using domain server:
Name: 192.168.0.130
Address: 192.168.0.130#53
Aliases:

google.com has address 172.217.29.142
google.com has IPv6 address 2800:3f0:4001:80f::200e
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.

vagrant@ubuntu-bionic:~$ dig @192.168.0.130 google.com -t aaaa +short
2800:3f0:4001:80f::200e

etc

Andreas Hasenack (ahasenack) wrote :

vagrant@ubuntu-bionic:~$ nslookup google.com - 192.168.0.130
Server: 192.168.0.130
Address: 192.168.0.130#53

Non-authoritative answer:
Name: google.com
Address: 172.217.29.142
Name: google.com
Address: 2800:3f0:4001:80f::200e

Launchpad Janitor (janitor) wrote :

[Expired for bind9 (Ubuntu) because there has been no activity for 60 days.]

Changed in bind9 (Ubuntu):
status: Incomplete → Expired