Comment 6 for bug 1868232

Dan Watkins (oddbloke) wrote : Re: [Bug 1868232] Re: underscores should be stripped from hostnames generated for apt config

On Tue, Mar 24, 2020 at 02:58:49AM -0000, Seth Arnold wrote:
> It would be nice to address the wildcard DNS entries, as those have a
> potential for abuse, and can be endlessly confusing if you're not
> prepared for them.

The wildcard DNS entries are required for the system as-designed to
work, I believe. cloud-init will configure region-specific mirror names
based _only_ on the metadata available to it in the instance (so if
RandomCloud set up a eu-random-1 region, cloud-init would configure
eu-random-1.clouds.archive.ubuntu.com as the mirror for instances in
that new region). So we need some guarantee that all possible
region-named mirrors will be listened on, hence the wildcard DNS
entries.

(We generate per-region mirrors for, I believe, a couple of reasons.
Firstly, it allows us, or the cloud, to spin up in-region mirrors and
have them used by ~all already-deployed Ubuntu instances just by adding
non-wildcard DNS entries pointing at the new mirrors. And, secondly, it
means that clouds don't have an incentive to DNS-hijack
archive.ubuntu.com if they decide they want to host in-cloud mirrors, so
cloud _users_ will have an easy way around the in-cloud mirrors if they
so desire.)
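To make the two mechanisms above concrete, here is a toy model, assuming nothing beyond the "<region>.clouds.archive.ubuntu.com" naming pattern described above. A plain dict stands in for DNS, and `region_mirror_hostname`, `toy_resolve`, `WILDCARD_TARGET` and the `.example` targets are all hypothetical; this is not cloud-init's actual templating code. It shows why an unknown new region still resolves (the wildcard), and how adding one explicit record later redirects instances that already use that hostname.

```python
# Toy model only: a dict stands in for DNS, and every name and target here is
# hypothetical apart from the clouds.archive.ubuntu.com pattern described above.
MIRROR_DOMAIN = "clouds.archive.ubuntu.com"

# Hypothetical fallback target that the wildcard record points at.
WILDCARD_TARGET = "default-mirror.example"

# Explicit (non-wildcard) records; these take precedence over the wildcard.
EXPLICIT_RECORDS: dict[str, str] = {}


def region_mirror_hostname(region: str) -> str:
    """Derive the mirror hostname purely from the region in instance metadata."""
    return f"{region}.{MIRROR_DOMAIN}"


def toy_resolve(hostname: str) -> str:
    """Return an explicit record's target if present, else the wildcard target."""
    if hostname.endswith("." + MIRROR_DOMAIN):
        return EXPLICIT_RECORDS.get(hostname, WILDCARD_TARGET)
    raise KeyError(f"{hostname} is outside the toy zone")


host = region_mirror_hostname("eu-random-1")
print(host, "->", toy_resolve(host))  # no explicit record yet: wildcard answers

# Standing up an in-region mirror only needs a new explicit record;
# already-deployed instances keep using the same hostname.
EXPLICIT_RECORDS[host] = "mirror.eu-random-1.example"
print(host, "->", toy_resolve(host))
```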

> In the meantime though, this plan sounds good to me.

OK, good!

> I'm worried about collisions, where multiple providers may use
> us_west_2, us%west2, uswest~2, etc.

To some extent, we do have this problem to solve already, as clouds
could have regions named identically. That said, this certainly does
increase the chance of collision.
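A quick sketch of the collision concern, assuming the fix simply drops every character that isn't a letter, digit or hyphen (the characters hostname labels allow). `strip_invalid` is a hypothetical helper, not the actual cloud-init change. With it, all three of the example region names quoted above collapse to the same "uswest2" label.

```python
import re

# Hostname labels may only contain letters, digits and hyphens (the "LDH" rule).
# Hypothetical stripping fix: drop every other character from the region name.
_INVALID = re.compile(r"[^A-Za-z0-9-]")


def strip_invalid(region: str) -> str:
    return _INVALID.sub("", region)


for name in ["us_west_2", "us%west2", "uswest~2"]:
    print(f"{name} -> {strip_invalid(name)}")
```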

> Some phrases read differently if the spacing is removed. The usual
> examples are powergen_italia and experts_exchange, but perhaps there are
> more realistic phrases for region names. (This seems quite a small problem
> compared to the overall wildcard DNS entries, though.)
>
> Reversible transformations are usually better but since we're presumably
> doing this with business partners, the trouble cases may fall under
> "don't do that" kinds of categories.

It wouldn't be reversible, but we could convert invalid characters into,
say, "--", which is relatively unlikely to be used in real region
names/URIs. That would at least mean that "useast_1" and "useast1"
wouldn't collapse to the same mirror hostname (although it wouldn't do
anything about useast^1 and useast_1 colliding).
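A minimal sketch of the "--" substitution idea, assuming each character outside letters, digits and hyphen is replaced individually; `replace_invalid` is a hypothetical helper. It keeps "useast_1" and "useast1" apart, but "useast^1" and "useast_1" still produce the same label.

```python
import re

# Anything outside letters, digits and hyphen is invalid in a hostname label.
_INVALID = re.compile(r"[^A-Za-z0-9-]")


def replace_invalid(region: str, sub: str = "--") -> str:
    """Hypothetical variant: replace each invalid character with `sub`."""
    return _INVALID.sub(sub, region)


for name in ["useast_1", "useast1", "useast^1"]:
    print(f"{name} -> {replace_invalid(name)}")
```

Two edge cases may be worth checking before going this way: hostname rules (RFC 952, as relaxed by RFC 1123) don't allow a label to start or end with a hyphen, so a leading or trailing invalid character would still yield an invalid name; and IDNA (RFC 5890) reserves labels with "--" in the third and fourth positions, which is exactly what "us_east" would become ("us--east").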

(I wonder if there are URI/hostname length boundaries that we would risk
running into if we replace single characters with anything more than a
single character, though.)
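On the length question: DNS limits a single label to 63 octets and a full name to 255 octets on the wire (253 characters in dotted text form). The sketch below checks a generated mirror hostname against those limits, assuming the hypothetical "--" substitution and the clouds.archive.ubuntu.com suffix; `mirror_hostname` and `fits_dns_limits` are illustrative names only. A 44-character region with 22 invalid characters fits as-is but grows to a 66-character label after substitution.

```python
import re

MIRROR_DOMAIN = "clouds.archive.ubuntu.com"
MAX_LABEL_LEN = 63   # RFC 1035: a single label is at most 63 octets
MAX_NAME_LEN = 253   # 255-octet wire limit, i.e. 253 characters in dotted text form

_INVALID = re.compile(r"[^A-Za-z0-9-]")


def mirror_hostname(region: str) -> str:
    """Hypothetical: replace each invalid character with "--", then append the domain."""
    return f"{_INVALID.sub('--', region)}.{MIRROR_DOMAIN}"


def fits_dns_limits(hostname: str) -> bool:
    """Check label and total-length limits (ASCII-only names, so len() == octets)."""
    labels = hostname.split(".")
    return len(hostname) <= MAX_NAME_LEN and all(
        0 < len(label) <= MAX_LABEL_LEN for label in labels
    )


short = "useast_1"
padded = "a_" * 22  # 44 input characters, 66 after substitution
print(mirror_hostname(short), fits_dns_limits(mirror_hostname(short)))
print(len(padded), fits_dns_limits(mirror_hostname(padded)))
```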

> Are these actual problems? Probably it's fine but I thought I'd mention
> them just in case someone else with more context or creativity can make
> more of them.

The input is certainly welcome!