resolver failures without even sending queries, break Postfix

Bug #777855 reported by Matthias Andree
This bug affects 1 person
Affects             Status         Importance   Assigned to   Milestone
GLibC               Fix Released   Critical
eglibc (Ubuntu)     Fix Released   Undecided    Unassigned
  Lucid             Fix Released   Undecided    Unassigned
  Maverick          Invalid        Undecided    Unassigned
  Natty             Invalid        Undecided    Unassigned
glibc (openSUSE)    Fix Released   High
postfix (Ubuntu)    Fix Released   High         Unassigned
  Lucid             Fix Released   Undecided    Unassigned
  Maverick          Invalid        Undecided    Unassigned
  Natty             Invalid        Undecided    Unassigned

Bug Description

The eglibc resolver is broken and doesn't attempt DNS queries for hostnames without dots if the RES_DEFNAMES option gets stripped from the _res.options (resolver options).

This breaks security-sensitive applications (I'd first observed it with Postfix) trying to resolve, for instance, localhost, thus:

res_init();
_res.options &= ~RES_DEFNAMES;
int result = res_search("localhost", C_IN, T_A, buf, sizeof buf);

returns failure with HOST_NOT_FOUND even if the name server has a localhost zone. FreeBSD and Solaris don't have this bug.

I've reported this upstream as http://sourceware.org/bugzilla/show_bug.cgi?id=12734 where you'll find more details.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: libc6 2.13-0ubuntu13
ProcVersionSignature: Ubuntu 2.6.38-9.43-generic 2.6.38.4
Uname: Linux 2.6.38-9-generic x86_64
NonfreeKernelModules: fglrx
Architecture: amd64
Date: Thu May 5 15:55:40 2011
ProcEnviron:
 LANGUAGE=de:en
 PATH=(custom, no user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: eglibc
UpgradeStatus: Upgraded to natty on 2011-05-02 (3 days ago)

Revision history for this message
In , Matthias Andree (matthias-andree) wrote :

Created attachment 5707
code to demonstrate the bug

(I've observed this on eglibc 2.13 and glibc 2.11.3 and confirmed it's still present in Git.)

Problem: res_search() can return -1 with h_errno == HOST_NOT_FOUND without ever having attempted a nameserver query even when it should have sent one.

In particular, this affects hostname resolution of "localhost" (without dots) if RES_DEFNAMES isn't set. (Use case: a security-sensitive application strips this flag to avoid the domain search and to avoid getting bogus localhost.example.org results that might not point to 127.0.0.1/::1.)

Pseudo code, without error checking:

res_init();
_res.options &= ~RES_DEFNAMES;
int result = res_search("localhost", C_IN, T_A, buf, sizeof buf);

This is an important portability issue from BSD or Solaris to Linux and affects, for instance, Postfix 2.8.X.

Compare the glibc source code lines 323 ff. <http://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/res_query.c;h=5ff352e2fc6056bad92238df1fb0c826f48a2f51;hb=HEAD#l323> against FreeBSD, lines 371 ff. in <http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/resolv/res_query.c?annotate=1.6;only_with_tag=MAIN>.

I've attached a test program, show-resolv.c, to demonstrate the problem.

To compile: gcc -ggdb3 -O -std=gnu99 -pedantic -Wall -o show-resolv show-resolv.c -lresolv

To run: strace -e recv,send,recvfrom,sendto ./show-resolv

You will see that no DNS packets are sent to the nameserver configured in /etc/resolv.conf.

Actual output (no send/recv stuff!):

$ strace -e recv,send,recvfrom,sendto ./show-resolv
default _res.options = 802C1
stripped _res.options = 80241
res search result: -1, h_errno: 1 (Unknown host)

Expected output:

$ strace -e recv,send,recvfrom,sendto ./show-resolv
default _res.options = 802C1
stripped _res.options = 80241
sendto(3, "\34\264\1\0\0\1\0\0\0\0\0\0\tlocalhost\0\0\1\0\1", 27, MSG_NOSIGNAL, NULL, 0) = 27
recvfrom(3, "\34\264\205\200\0\1\0\1\0\0\0\0\tlocalhost\0\0\1\0\1\300\f\0\1\0"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.4")}, [16]) = 43
res search result: 43

Of course the recvfrom details may differ with /etc/resolv.conf configuration.
Instead of 43, any positive number is valid as long as it makes it plausible that we've received a successful reply to a DNS query for localhost IN A; the size can vary if the name server returns gratuitous additional records.
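
The two _res.options values printed in both runs differ by exactly one bit. A minimal sketch of that arithmetic, assuming the glibc value of RES_DEFNAMES (0x80) from <resolv.h>; strip_defnames is a hypothetical helper name used only for illustration, not part of the resolver API:

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/*
 * Hypothetical helper: clear only the RES_DEFNAMES bit and leave every
 * other resolver option untouched. This mirrors the statement
 *     _res.options &= ~RES_DEFNAMES;
 * from the report, applied to a plain value instead of the global state.
 */
unsigned long strip_defnames(unsigned long options)
{
    return options & ~(unsigned long)RES_DEFNAMES;
}
```

With the default value from the output above, strip_defnames(0x802C1) yields 0x80241 on glibc, which matches the "stripped" line. No other option is changed, so the missing query cannot be blamed on some other flag being cleared by accident.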

Please fix the resolver so that it actually sends a query for bare hostnames (without any dots, inner or trailing); localhost is a valid TLD.

Revision history for this message
In , Matthias Andree (matthias-andree) wrote :

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

There is a massive upstream resolver error that causes "HOST_NOT_FOUND" errors without ever trying to resolve a valid host.

res_init();
_res.options &= ~RES_DEFNAMES;
int result = res_search("localhost", C_IN, T_A, buf, sizeof buf);

Upstream report: http://sourceware.org/bugzilla/show_bug.cgi?id=12734

Flagging major because, in combination with Postfix 2.8, this can cause legitimate mail to be rejected (i.e. somewhat of a "data loss", though probably not warranting critical).

Reproducible: Always

Steps to Reproduce:
See the upstream report, which has a test program and instructions.

Revision history for this message
Matthias Andree (matthias-andree) wrote :
Changed in glibc:
importance: Undecided → Unknown
status: New → Unknown
Changed in eglibc (Ubuntu):
status: New → Confirmed
status: Confirmed → New
Revision history for this message
Matthias Andree (matthias-andree) wrote :

Please forward this upstream to Debian.

Revision history for this message
In , Petr Baudis (pasky) wrote :

I'm looking at

 /*
  * If the name has any dots at all, and no earlier 'as-is' query
  * for the name, and "." is not on the search list, then try an as-is
  * query now.
  */
 if (dots && !(tried_as_is || root_on_list)) {

I wonder why the dots check is there?

(However, I also wonder, if you want to ensure no search, wouldn't it be much more natural to use a FQDN rather than a one-off disabling of RES_DEFNAMES?)
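
To see why the quoted dots check matters, here is a simplified model of the pre-fix decision. It is a sketch, not the glibc source: only the final condition comes from the snippet quoted above. The struct, the function names, the "ndots" threshold step and the RES_DNSRCH condition are assumptions made for illustration, and the model assumes "." is not on the search list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant resolver settings. */
struct search_model {
    bool defnames;   /* RES_DEFNAMES is set */
    bool dnsrch;     /* RES_DNSRCH is set (assumed gate for dotted names) */
    unsigned ndots;  /* "ndots" threshold (assumed, default 1) */
};

static unsigned count_dots(const char *name)
{
    unsigned dots = 0;
    for (const char *cp = name; *cp != '\0'; cp++)
        dots += (*cp == '.');
    return dots;
}

static bool has_trailing_dot(const char *name)
{
    size_t len = 0;
    while (name[len] != '\0')
        len++;
    return len > 0 && name[len - 1] == '.';
}

/* Returns true if this model would send at least one query for name. */
bool model_sends_query(const char *name, const struct search_model *m)
{
    unsigned dots = count_dots(name);
    bool trailing_dot = has_trailing_dot(name);

    /* Assumed step: enough dots or a trailing dot means an as-is query. */
    if (dots >= m->ndots || trailing_dot)
        return true;

    /* Assumed step: the search list is consulted only behind these gates. */
    if ((dots == 0 && m->defnames) || (dots > 0 && m->dnsrch))
        return true;

    /* The check quoted above; tried_as_is and root_on_list are both
     * false when this point is reached in the model. */
    if (dots > 0)
        return true;

    /* Nothing was tried: the caller would see HOST_NOT_FOUND. */
    return false;
}
```

In this model, "localhost" with defnames cleared and ndots at 1 falls through every branch without a query, which matches the strace output in the report, while "localhost." or any dotted name still triggers one.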

Revision history for this message
In , Matthias Andree (matthias-andree) wrote :

I haven't checked where the source or the dots check originated.

Using a "fully-qualified domain name" would sidestep the bug, but let's not introduce workarounds if we can fix the bug itself. I also contend that localhost by itself is arguably already an FQDN.

I have not found Internet standards prohibiting single-level domain names (unlikely though they may be), but I have found RFC 1912 and RFC 2606 that sanction localhost.

Revision history for this message
In , Petr Baudis (pasky) wrote :

FQDN has a clear definition - it ends with a dot. Otherwise, it may be by default subject to various relative searches.

(Not disputing that there is a bug, I just wanted to clarify this.)
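
For readers who want the trailing-dot form discussed here, a minimal sketch of building it follows. make_absolute is a hypothetical helper name, not a resolver function; it only appends the dot and leaves the actual lookup to the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy name into out and append a trailing dot if
 * it does not already end with one. Returns 0 on success, or -1 if name
 * is empty or out is too small for the result plus the terminating NUL.
 */
int make_absolute(const char *name, char *out, size_t outlen)
{
    size_t len = strlen(name);
    if (len == 0)
        return -1;

    size_t need_dot = (name[len - 1] != '.') ? 1 : 0;
    if (len + need_dot + 1 > outlen)
        return -1;

    memcpy(out, name, len);
    if (need_dot)
        out[len++] = '.';
    out[len] = '\0';
    return 0;
}
```

Passing "localhost" produces "localhost.", and a name that already ends with a dot is copied unchanged. This only illustrates the workaround; it does not replace fixing the resolver.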

Revision history for this message
In , Matthias Andree (matthias-andree) wrote :

Let's not go hair splitting about FQDN or not: suppressing the relative searches along the "search" list from /etc/resolv.conf is what this is all about. And if it were working, the difference between "localhost" and "localhost." were entirely theoretical, because either way we'd look up "localhost" in the root (".") zone. :-)

tags: added: regression-release
Changed in glibc (openSUSE):
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
In , Drepper-fsp (drepper-fsp) wrote :

Pointing to any *BSD is irrelevant. Code like that is in BIND, too, though and that I backported as far as necessary.

Revision history for this message
In , Matthias Andree (matthias-andree) wrote :

(In reply to comment #5)
> Pointing to any *BSD is irrelevant. Code like that is in BIND, too, though and
> that I backported as far as necessary.

In order to assist distributors with cherry-picking the fix:
Is the patch in <http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f87dfb1f11c01f2ccdc40d81e134cd06b32e28e8> all that's needed?

Revision history for this message
Matthias Andree (matthias-andree) wrote :

Fix appears to be in http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f87dfb1f11c01f2ccdc40d81e134cd06b32e28e8 and slated to appear in glibc 2.14. Not sure about eglibc.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Marking this as confirmed/high per discussion and investigation in bug #777868, which has been marked as a dupe of this one.

Changed in postfix (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in glibc:
importance: Unknown → Critical
status: Unknown → Fix Released
Changed in glibc (openSUSE):
status: Confirmed → In Progress
Changed in eglibc (Ubuntu Natty):
status: New → Confirmed
Changed in postfix (Ubuntu Natty):
status: New → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Matthias, I ran your test program on Natty, and the bug is definitely present there. I also ran it on Oneiric, with libc6 version 2.13-9ubuntu3, and the "expected" result (sending the query to the DNS server) happens. So I believe this has been fixed, though I cannot point to the exact changelog entry that has done it.

I think we can also close the task on Postfix, since this is a glibc issue, unless there is something we can do to postfix to fix this.

Changed in eglibc (Ubuntu):
status: New → Fix Released
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Also confirmed on lucid

Changed in eglibc (Ubuntu Lucid):
status: New → Confirmed
Revision history for this message
Matthias Andree (matthias-andree) wrote : Re: [Bug 777855] Re: resolver failures without even sending queries, break Postfix

On 06.08.2011 17:05, Clint Byrum wrote:

> I think we can also close the task on Postfix, since this is a glibc
> issue, unless there is something we can do to postfix to fix this.

Postfix is one of the few software packages whose default configuration
(in newer Postfix versions) triggers this bug. If you choose to fix
glibc through SRUs, then you can, of course, close the Postfix relation.

Revision history for this message
Scott Kitterman (kitterman) wrote :

Fixed for Oneiric and later via the eglibc fix.

Changed in postfix (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Matthias Andree (matthias-andree) wrote :

So what's the state of the fix for the earlier affected releases, esp. LTS servers?

Revision history for this message
dino99 (9d9) wrote :
Changed in postfix (Ubuntu Maverick):
status: New → Invalid
Changed in postfix (Ubuntu Natty):
status: Confirmed → Invalid
Changed in eglibc (Ubuntu Natty):
status: Confirmed → Invalid
Changed in eglibc (Ubuntu Maverick):
status: New → Invalid
Revision history for this message
dino99 (9d9) wrote :

upstream should have been synced

Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Fix Released
Changed in postfix (Ubuntu Lucid):
status: New → Fix Released
Revision history for this message
In , Schwab-5 (schwab-5) wrote :

openSUSE 11.4 is no longer supported and the bug was fixed in openSUSE 12.1.

Changed in glibc (openSUSE):
status: In Progress → Fix Released