gethostbyname() can't resolve names starting/ending with "-"

Bug #144431 reported by Alexey Ten (Lynn)
This bug affects 12 people
Affects          Status      Importance   Assigned to   Milestone
GLibC            Confirmed   Medium
glibc (Ubuntu)   Invalid     Low          Unassigned

Bug Description

This is a copy of http://sourceware.org/bugzilla/show_bug.cgi?id=4671

gethostbyname() fails to resolve domain names with a hyphen (-) at the beginning or end of a label,
for example "-kol.deviantart.com", while the same name can be resolved using host and nslookup.

example session:

lynn@wastebin:~$ ping -- -kol.deviantart.com
ping: unknown host -kol.deviantart.com

lynn@wastebin:~$ host -- -kol.deviantart.com
-kol.deviantart.com has address 198.172.81.21

I know that RFC3696 says that this hostname is invalid, but 123.deviantart.com is invalid too and it's resolved perfectly.
Also this hostname resolves in Windows and Mac OS X.

Tags: patch
In , Andrey Nikolaev (nikolaeff) wrote :

gethostbyname() fails to resolve domain names with a minus sign at the beginning or end of a label,
for example -kol.deviantart.com, while the same name can be resolved using host and nslookup.

Example session:

insa@devel:~$ ping -- -kol.deviantart.com
ping: unknown host -kol.deviantart.com

insa@devel:~$ host -- -kol.deviantart.com
-kol.deviantart.com has address 69.28.181.43

A brief look at the Linux iputils/ping.c shows that it uses the gethostbyname() function.
So I wrote a C test example that can be found at http://insa.pp.ru/files/bugs/gethost.c

Tested on Debian 3.1, Debian 4, FreeBSD 5.4. All i386.
Mac OS X not affected.

In , Jakub Jelinek (jakub-redhat) wrote :

Such hostnames are invalid, see section 2 of RFC3696.
For hostnames, a hyphen can appear only in the middle, not at the start or at the end.

In , Andrey Nikolaev (nikolaeff) wrote :

(In reply to comment #1)
> Such hostnames are invalid, see section 2 of RFC3696.
> For hostnames, a hyphen can appear only in the middle, not at the start or at the end.
>

Well, it's true, but
1) hostnames starting with a numeric value are also not valid, but can be resolved via gethostbyname()
(e.g. ping 12345.livejournal.com);

2) We are getting inconsistent behaviour across systems and, even worse, on the same machine when using different tools
(nslookup vs. ping).

In , Drepper-fsp (drepper-fsp) wrote :

I looked at this and saw that not even the latest bind version allows - at the
beginning. If anybody allows it this is likely a side effect of not using the
bind code base. I see no reason to diverge here.

Plus, this could have unwanted effects. If somebody makes a mistake when
specifying a host name a parameter might be mistaken for it. This might even be
exploitable.

So, no, this won't change.

Ian Jackson (ijackson) wrote :

RFC3696 section 2 does not say that all-numeric labels are not permitted, only that all-numeric TLDs are not. Even if it did, the counterfactual that some invalid names are accepted would be irrelevant.

gethostbyname is operating entirely correctly.

Changed in glibc:
status: New → Invalid
Alexey Ten (Lynn) (alexeyten) wrote :

RFC1035 section 2.3.1 says that labels start with letters only. So gethostbyname should not accept 123.deviantart.com, but it does.
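
For reference, the grammar cited here can be sketched in a few lines of Python. This is only an illustration of the "preferred name syntax" in RFC 1035 section 2.3.1, not glibc's actual check; the function name is made up for the example.

```python
import re

# RFC 1035 section 2.3.1, "preferred name syntax":
#   <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# i.e. a letter first, a letter or digit last, hyphens only in between.
RFC1035_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_preferred(name: str) -> bool:
    """Return True if every label of `name` fits the RFC 1035 preferred syntax.

    Hypothetical helper for illustration only.
    """
    if name.endswith("."):
        name = name[:-1]  # ignore a single trailing root dot
    return all(
        len(label) <= 63 and RFC1035_LABEL.fullmatch(label) is not None
        for label in name.split(".")
    )
```

Under this reading, both 123.deviantart.com and -kol.deviantart.com fall outside the preferred syntax, which is the inconsistency being pointed out.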

Ian Jackson (ijackson) wrote : Re: [Bug 144431] Re: gethostbyname() cant resolve names starting/ending with "-"

Lynn writes ("[Bug 144431] Re: gethostbyname() cant resolve names starting/ending with "-""):
> RFC1035 section 2.3.1 says that labels start with letters only. So
> gethostbyname should not accept 123.deviantart.com, but it does.

This rule was relaxed a little while later. You may be aware of a
certain very prominent networking company with a digit at the start of
its name.

Ian.

Changed in glibc:
status: Unknown → Invalid
In , Gredhat-nospam-bug (gredhat-nospam-bug) wrote :

If gethostbyname refuses the invalid name in the below request, why does it
query the DNS (as can be seen with e.g. tcpdump)?

links http://szini-.tvn.hu/Koszonjuk.mp3

Mark J. Reed (markjreed) wrote :

This bug report is valid. The RFC does not prohibit labels that don't start with a letter; it merely recommends against them. The definition of "label" mentioned above is part of a guideline introduced with this text:

"The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET)."

Points in favor of supporting domain names that don't necessarily follow those guidelines:

1. There are actual domains out on the Internet running web sites with such domain names (several blogs at blogspot.com spring to mind)
2. Such domains resolve on other OSes (not just Windows, but also OS X).
3. Direct DNS queries (dig, host, nslookup) on Linux work fine with such names.
4. Even gethostbyname() on Linux works for such names when they're in the local /etc/hosts file. Possibly in NIS maps as well.

Point 4 is especially telling; I don't see any reason for gethostbyname() to introduce a restriction between two interfaces that both operate correctly without the restriction. Especially not a restriction that prevents access to actual web sites. Telling users that "The owner of that site shouldn't have named it that" is not helpful.

Changed in glibc:
status: Invalid → New
Mark J. Reed (markjreed) wrote :

Oh, the submitter mentioned RFC 3696. This is from that RFC:

"Any characters, or combination of bits (as octets), are permitted in DNS names."

It then goes on to make several recommendations for restrictions - but these are, again, recommendations only.

Alexander Sack (asac) wrote :

upstream says won't fix.

Changed in glibc:
status: New → Won't Fix
In , Alexey Ten (Lynn) (alexeyten) wrote :

https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/144431

This bug report is valid. The RFC does not prohibit labels that don't start with
a letter; it merely recommends against them. The definition of "label" mentioned
above is part of a guideline introduced with this text:

"The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET)."

Points in favor of supporting domain names that don't necessarily follow those
guidelines:

1. There are actual domains out on the Internet running web sites with such
domain names (several blogs at blogspot.com spring to mind)
2. Such domains resolve on other OSes (not just Windows, but also OS X).
3. Direct DNS queries (dig, host, nslookup) on Linux work fine with such names.
4. Even gethostbyname() on Linux works for such names when they're in the local
/etc/hosts file. Possibly in NIS maps as well.

Point 4 is especially telling; I don't see any reason for gethostbyname() to
introduce a restriction between two interfaces that both operate correctly
without the restriction. Especially not a restriction that prevents access to
actual web sites. Telling users that "The owner of that site shouldn't have
named it that" is not helpful.

Alexey Ten (Lynn) (alexeyten) wrote :

I think Mark J. Reed gives quite enough reasons to fix this bug. Either in upstream or Ubuntu.
There is no technical reason not to fix it, and I do not understand why I can't visit -kol.deviantart.com from Ubuntu while my neighbour with a Mac can.

Changed in glibc:
status: Won't Fix → New
qubicllj (qubicllj-gmail) wrote :

It's ridiculous not to fix this bug.

Changed in glibc:
status: Invalid → Confirmed
Tyler Szabo (szabo) wrote :

The file in question is:

resolv/res_comp.c

The function res_hnok should act like res_dnok. I am going to write a patch and build a package with the changes as soon as I can get around to it; by next Sunday would be a good bet.
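
To illustrate the difference between the two checks, here is a rough Python sketch written from a reading of resolv/res_comp.c rather than copied from it. The names hnok_like and dnok_like are hypothetical, and the exact character classes are an assumption that should be verified against the glibc source.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)


def hnok_like(name: str) -> bool:
    """Host-name style check (sketch of what res_hnok appears to enforce).

    The first and last character of each label must be a letter or digit;
    '-' and '_' are accepted only in the interior of a label.
    """
    for label in name.split("."):
        if not label:
            continue  # empty labels (e.g. a trailing dot) are skipped in this sketch
        if label[0] not in ALNUM or label[-1] not in ALNUM:
            return False
        if any(ch not in ALNUM and ch not in "-_" for ch in label[1:-1]):
            return False
    return True


def dnok_like(name: str) -> bool:
    """Domain-name style check (sketch of what res_dnok appears to enforce).

    Any printable, non-space ASCII character is accepted anywhere.
    """
    return all(0x21 <= ord(ch) <= 0x7E for ch in name)
```

With these definitions, -kol.deviantart.com fails hnok_like but passes dnok_like, which is the behaviour change the proposed patch is after.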

xteejx (xteejx-deactivatedaccount) wrote :

Marking Confirmed, Low - I am also seeing this on Jaunty.

Changed in glibc (Ubuntu):
importance: Undecided → Low
status: New → Confirmed
xteejx (xteejx-deactivatedaccount) wrote :

Note to upstream reporter: You may want to give them a nudge on this one, as there haven't been any replies recently.

In , Bjorn-haxx (bjorn-haxx) wrote :

"The DNS itself places only one restriction on the particular labels that can be used to identify resource
records. That one restriction relates to the length of the label and the full name. [...] Those restrictions
aside, any binary string whatever can be used as the label of any resource record."
  -- RFC 2181, section 11

RFC3696, section 2 verifies this: "Any characters, or combination of bits (as octets), are permitted in
DNS names." Then it describes how the old ARPANET rules worked. But we moved beyond those
rules a long time ago. Just look at the international domain names.
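
As a sketch of how little the DNS layer itself requires, the following checks only the length limits from RFC 2181 section 11 (labels of 1 to 63 octets, a full name of at most 255 octets). The function name is hypothetical, and two things are assumptions of this example: that the 255-octet limit is counted in wire format, and that labels are encoded as UTF-8 to count octets.

```python
def dns_length_ok(name: str) -> bool:
    """Check only the RFC 2181 section 11 length limits.

    Hypothetical helper: no restriction is placed on which characters appear.
    """
    if name.endswith("."):
        name = name[:-1]  # ignore a single trailing root dot
    labels = [label.encode("utf-8") for label in name.split(".")] if name else []
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Wire format: one length octet per label, plus the terminating root label.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= 255
```

By this measure a name such as -kol.deviantart.com is unremarkable; any rejection has to come from a host-name rule applied on top of DNS.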

Renmazuo (siebtzen) wrote :

In case you're unaware, it's not just names beginning with, say, a hyphen, but also with parts ending with a hyphen.
I can't access my Deviantart account (my username ends with a hyphen) on Ubuntu 10.04 because of this.
It seems to me that fixing this sort of bug contributes towards fixing bug #1: I am currently using Windows 7 and trying out Ubuntu, but if I can't do things like access my Deviantart account I'm just going to go back to Windows, and adieu.

In , Bugs-randomguy3 (bugs-randomguy3) wrote :

(In reply to comment #6)
> "The DNS itself places only one restriction on the particular labels that can
> be used to identify resource records. That one restriction relates to the
> length of the label and the full name. [...] Those restrictions aside, any
> binary string whatever can be used as the label of any resource record."
> -- RFC 2181, section 11
>
> RFC3696, section 2 verifies this: "Any characters, or combination of bits (as
> octets), are permitted in DNS names." Then it describes how the old ARPANET
> rules worked. But we moved beyond those rules a long time ago. Just look at
> the international domain names.

Actually, while RFC 2181 states that there are no restrictions on DNS labels, it does not say anything about host names (not all records that can be stored in DNS are host names). In fact, it explicitly says that

"Note however, that the various applications that make use of DNS data can have restrictions imposed on what particular values are acceptable in their environment."

RFC 1123 still constitutes the accepted standard for valid host names, and this is what glibc's gethostbyname() implements. Actually, glibc implements a relaxation of RFC1123 that allows underscores anywhere RFC1123 permits hyphens, presumably to deal with errant Windows machines that like to put underscores in their names.

RFC 3696 is quite woolly on the subject of host names. It describes RFC 1123's restrictions on host names as "a preferred form that is required by most applications".

Also, international domain names are a different matter entirely, as they essentially work (as I understand it) by converting invalid host names to RFC 1123-compatible host names.
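
A small sketch of that conversion, using Python's built-in idna codec (which, as far as I know, implements the older IDNA 2003 rules, so newer registries may differ in edge cases). The helper name is hypothetical.

```python
def to_ascii_hostname(name: str) -> str:
    """Convert a Unicode host name to its ASCII-compatible (Punycode) form.

    Hypothetical helper; relies on the standard library's "idna" codec.
    """
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    # A label with a non-ASCII character becomes an "xn--" label made only of
    # letters, digits and hyphens, so the resolver never sees the raw Unicode.
    print(to_ascii_hostname("b\u00fccher.example"))
```

The resulting xn-- labels start and end with letters or digits, so they pass the host-name check discussed here without any change to the resolver.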

Arguing by RFC is clearly not going to get us anywhere, given the above. The best argument for this change is that there are domains that require gethostbyname() to accept hyphens (and, presumably, underscores) at the start and end of domain segments in order to be resolved. Glibc already relaxes RFC 1123's restrictions to allow underscores, so why not allow hyphens in any position as well?

The argument about mistaking domain names starting with hyphens for options is spurious, by the way. Given that these domains exist, it's perfectly reasonable that they might be passed to a tool regardless of whether or not gethostbyname() accepts them, and the tool will do option parsing before calling gethostbyname().
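
To make the ordering concrete, here is a minimal sketch of a ping-like tool's argument handling using Python's getopt module. The tool and its "-c COUNT" option are hypothetical; the point is only that option parsing accepts or rejects the argument before any resolver call would happen.

```python
import getopt


def parse_ping_like(argv):
    """Parse arguments for a hypothetical ping-like tool that accepts '-c COUNT'.

    Returns (options, hosts). Raises getopt.GetoptError if an argument that
    starts with '-' is not a known option.
    """
    options, hosts = getopt.getopt(argv, "c:")
    return options, hosts


if __name__ == "__main__":
    # With "--", everything after it is treated as a host, never as an option.
    print(parse_ping_like(["--", "-kol.deviantart.com"]))
    # Without "--", the name is rejected as an unknown option before any
    # name resolution is attempted.
    try:
        parse_ping_like(["-kol.deviantart.com"])
    except getopt.GetoptError as err:
        print("option parsing failed:", err)
```

This mirrors the "ping -- -kol.deviantart.com" invocation in the original report: the resolver only ever sees what option parsing has already let through.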

Changed in glibc:
importance: Unknown → Medium
Alexey Osipov (lion-simba) wrote :

Almost four years have passed since this bug was reported and it is still not fixed. Shame.

Tyler Szabo (szabo) wrote :

Both the Glibc upstream (http://sourceware.org/bugzilla/show_bug.cgi?id=4671) and Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575209) seem to not want to fix this.

The patch in the Debian bugreport will allow hyphens; my patch is a little bit more liberal and allows underscores as well (it also cleans up the logic to make res_hnok more like res_dnok).

Tested on my system, and it seems to be working.

Joe Simpson (headbangerkenny) wrote :

Hi, can we please push for this bug to be fixed?

If Ubuntu wants to hit its expected user target then silly bugs like this need to be fixed, or people will say things like, "Lol, you can't even go on my blog http://my-awesome-blog-.tumblr.com because your computer's poo" (probably with more profanity, and this does happen).

It's a ridiculously simple patch to apply, even if only Ubuntu adds it.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "2011-09-02-ubuntu-bug-144431.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Daniel Liebner (dliebner) wrote :

Is this bug still not fixed? I'm getting this issue on 12.04.

Adam Conrad (adconrad) wrote :

This bug is just as much not a bug now as it was when it was filed. If anyone other than deviantart is advertising hostnames that start or end in hyphens, I'd be curious to know. If this is pretty much just a deviantart problem, has anyone considered talking to them about not doing that? The fact that someone does it doesn't make it valid. You can put literally ANY STRING YOU WANT in DNS, but that doesn't make it a valid host name. Period.

Changed in glibc (Ubuntu):
status: Confirmed → Invalid
Daniel Liebner (dliebner) wrote :

Tumblr also has hostnames that end in hyphens.

Regardless, if there is a publicly available web address at a given hostname, albeit with an invalid hostname, this "not a bug" prevents almost any program-level function from accessing said publicly available web addresses. For example, curl, PHP, web browsers are all affected because they utilize gethostbyname() somewhere in their process chain. Do these programs need to stop using gethostbyname() and start adopting a new method of resolving hostnames in order to access said web addresses?

Adam Conrad (adconrad) wrote :

Your argument seems to be that if there's a "publicly available web address", we should support resolving that, no matter what, right? I could set up a publicly available website at "[wow-square-brackets].sili.ca" today. Would that mean every OS that doesn't resolve square-bracket-hostnames is now buggy? Or should someone tell me to stop being silly and not do that?

You're welcome to take this argument back upstream to the bug there. If they are convinced, fine. But I won't carry a distro patch for this, and I would prefer that people who care deeply about this tell tumblr and deviantart that they're breaking spec, rather than telling me that I should care about their spec abuse.

When someone blatantly violates rfc822 or 2822 and postfix/exim/sendmail reject those emails, do you file that as a bug too?

Daniel Liebner (dliebner) wrote :

No, I don't think gethostbyname should accommodate any crazy hostname, and I understand your argument. However, the hyphen is a special character. It *is* allowed in hostnames, just not technically as a border character. Many people have made the argument that hostnames aren't technically supposed to start with a digit, and yet gethostbyname will still resolve those addresses. Why not allow border hyphens for the same practical reason, namely reaching web addresses that actually exist?

Adam Conrad (adconrad) wrote :

Except that the "hostnames with a digit" argument is incorrect, as this was explicitly allowed (with a MUST on support, no less) in RFC 1123, section 2.1. Quoting ancient RFCs when they've been superseded doesn't actually help one's case here.
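
The relaxed rule can be sketched as follows. This is an illustration of the RFC 1123 section 2.1 syntax (first character may be a letter or a digit), not glibc's implementation; the function name is hypothetical and limits on the total name length are left out.

```python
import re

# RFC 1123 section 2.1 relaxes the first character to "letter or digit";
# hyphens are still confined to the interior of a label.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1123_hostname(name: str) -> bool:
    """Return True if every label of `name` fits the RFC 1123 host name syntax.

    Hypothetical helper for illustration only.
    """
    if name.endswith("."):
        name = name[:-1]  # ignore a single trailing root dot
    return all(
        len(label) <= 63 and RFC1123_LABEL.fullmatch(label) is not None
        for label in name.split(".")
    )
```

So 123.deviantart.com is valid under this rule, while a leading or trailing hyphen in a label still is not.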

Daniel Liebner (dliebner) wrote :

To be fair, I'm not an expert on RFCs, just a developer with an unfixable problem. So I suppose you're saying the RFC needs to be amended for this to go through.

Kees Cook (kees) wrote :

It doesn't seem sensible to not support this if BIND supports it.

David Grossberg (davidgro) wrote :

It doesn't seem sensible to not support this if There Are Sites On The Internet I Can't Access Due To My Choice Of OS!
Please forget the precise wording of restrictions in technical specs and remember that this is an actual user facing issue.

Adam Conrad (adconrad) wrote :

David, that circles us back to my argument that you're saying we should support resolving any hostname anybody exposes on the Internet, period. There are any number of reasons that just doesn't work, and it would pretty much toss out the whole point of having standards.

@kees: BIND supports resolving damned near anything, because DNS != hostnames.

David Grossberg (davidgro) wrote :

I must have missed an e-mail with your previous post, sorry. Anyway, what are some of these reasons that there are any number of?

I think the practical solution to deciding which domains to drop and which to support is unfortunately that we should support at minimum any that Windows does. It is (for now at least) the de facto client implementation: If Windows didn't allow accessing sites with the leading or trailing hyphens, then tumblr etc. would not expose them.

As long as we are bickering about RFCs, I would like to propose that this situation is best covered by the first half of RFC 1958 part 3.9:

3.9 Be strict when sending and tolerant when receiving.
   Implementations must follow specifications precisely when sending to
   the network, and tolerate faulty input from the network.

(It goes on to say "When in doubt, discard faulty input silently", but I obviously disagree with that in this instance. I guess it comes down to whether "faulty" means "would cause problems" or merely "out of spec".)
