Set up a mirror based on zone

Bug #317065 reported by Soren Hansen
Affects            Status        Importance  Assigned to  Milestone
Ubuntu on EC2      Fix Released  Undecided   Chuck Short
ec2-init (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: ec2-init

It's likely that there will be a mirror in each of Amazon's availability zones. We should detect which zone we're in and automatically use the right mirror.

Suggested implementation:

ec2-init installs a symlink: /etc/apt/sources.list.d/amazon -> /var/run/ec2/sources.list

We'll put a template sources.list in /etc/ec2/sources.list.template

The init script will detect the zone in which we're running, and generate /var/run/ec2/sources.list based on the zone and the template.
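
A minimal sketch of the template step described above, not the script that ec2-init ships. The `@MIRROR@` placeholder, the function name `render_sources_list`, and the host `mirror.example.com` are assumptions made for illustration; only the two file paths come from the proposal.

```shell
#!/bin/bash
# Sketch only: render a sources.list from a template by substituting a
# mirror hostname. "@MIRROR@" is a hypothetical placeholder token.

render_sources_list() {
  # $1 = mirror hostname; the template is read from stdin and the result
  # is written to stdout. Assumes the hostname contains no "|" or "&".
  local mirror="$1"
  sed "s|@MIRROR@|${mirror}|g"
}

# Example use at boot (paths as proposed in this report):
#   mkdir -p /var/run/ec2
#   render_sources_list "mirror.example.com" \
#     < /etc/ec2/sources.list.template > /var/run/ec2/sources.list
```

Keeping the function a pure stdin-to-stdout filter means the zone or region detection can change without touching the template handling.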

Eric Hammond (esh) wrote :

There is currently no standard, reliable method for determining what specific availability zone an EC2 instance is running in.

In fact, availability zone us-east-1a for one EC2 account may not be the same availability zone as us-east-1a for another EC2 account. Amazon assigns these names randomly for each account (but keeps them consistent within that account).

There are no public, shared, external names for the EC2 availability zones. They are all relative to the specific account running the instance.

I believe the best an instance can do is to determine what region it is running in and use a mirror name which represents the mirror instance(s) in that region.

When architecting the mirrors, there should still be at least one instance running in each availability zone to provide the level of reliability that EC2 users depend on.

When the instance determines the region it is in (say, us-east-1) it could use something like:
  ec2-us-east-1.archive.canonical.com
or
  us-east-1.ec2.archive.canonical.com
and that could resolve to the internal IP addresses for the mirrors in all the availability zones in that region.

There is a chance that the user's instance might not get the mirror in its own availability zone, which would add a small network transfer charge (currently $0.01/GB), but this is still much less than the cost of accessing a mirror completely outside of EC2 (about $0.10/GB).

Eric Hammond (esh) wrote :

I suppose an instance could examine a traceroute to each of the mirrors to see which one is closer, but I don't know if this is a reasonable solution to determining availability zone.

Eric Hammond (esh) wrote :

Er, my examples should have read:
  ec2-us-east-1.archive.ubuntu.com
or
  us-east-1.ec2.archive.ubuntu.com

polvi (alex-polvi) wrote :

I have not tried this in the EU zone, but it seems to work well in the US one:

root@domU-12-31-39-03-44-37:~# ZONE=`curl http://169.254.169.254/latest/meta-data/placement/availability-zone`
root@domU-12-31-39-03-44-37:~# echo $ZONE
us-east-1c

Eric Hammond (esh) wrote :

polvi: My first sentence was not well phrased. Yes, that command gives you the name of the availability zone for the current account, but that can't be easily matched up with the availability zone for a different account since the names are assigned randomly for each account.
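
Although the zone letter is account-relative, the region part of the name is not, so the region-based mirror idea from earlier in this thread could be sketched roughly as follows. This assumes zone names always have the form region plus one trailing letter (as in us-east-1c); the function names are made up for illustration, and the mirror hostname pattern is only the one proposed above, not a DNS name known to exist.

```shell
#!/bin/bash
# Sketch only: derive the region from an availability-zone name by dropping
# the trailing zone letter, e.g. "us-east-1c" -> "us-east-1".

region_from_zone() {
  # ${1%[a-z]} removes one trailing lowercase letter, if present
  printf '%s\n' "${1%[a-z]}"
}

mirror_for_zone() {
  # Hostname pattern taken from the second example suggested in this thread;
  # it is a proposal, not a confirmed mirror name.
  printf '%s.ec2.archive.ubuntu.com\n' "$(region_from_zone "$1")"
}

# Example use on an instance:
#   ZONE="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"
#   mirror_for_zone "$ZONE"
```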

polvi (alex-polvi) wrote :

I couldn't come up with any better ideas, so here is a bash implementation of the traceroute method. Seems to work:

#!/bin/bash
# Print the candidate host that is the fewest network hops away.

A="ec2-75-101-201-203.compute-1.amazonaws.com" # node in us-east-1a
B="ec2-75-101-228-202.compute-1.amazonaws.com" # node in us-east-1b
C="ec2-174-129-138-71.compute-1.amazonaws.com" # node in us-east-1c

for mirror in "$A" "$B" "$C"; do
  # the final "Resume:" line of tracepath carries the hop count in field 5
  HOPS="$(tracepath -n "$mirror" | grep hops | awk '{ print $5 }')"
  # checking for an int is not pretty in bash
  [ "$HOPS" -gt 0 ] 2>/dev/null && echo "$HOPS $mirror"
done | sort -n | head -n 1 | awk '{ print $2 }'

The host that ran this command was on us-east-1c, and selected $C correctly each time.

root@ubuntu:~# ./close-host.sh
ec2-174-129-138-71.compute-1.amazonaws.com

Example tracepath output:

ec2-75-101-201-203.compute-1.amazonaws.com
 1: domU-12-31-39-00-9D-23.compute-1.internal (10.254.162.209) 0.092ms pmtu 1500
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.099ms
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.055ms
 2: 10.254.160.2 (10.254.160.2) 0.979ms
 3: ec2-75-101-160-160.compute-1.amazonaws.com (75.101.160.160) 0.522ms
 4: othr-216-182-232-13.usma2.compute.amazonaws.com (216.182.232.13) 1.077ms
 5: ec2-75-101-160-33.compute-1.amazonaws.com (75.101.160.33) 1.171ms
 6: dom0-10-252-128-156.compute-1.internal (10.252.128.156) 1.037ms
 7: domU-12-31-38-00-7D-64.compute-1.internal (10.252.130.146) 1.074ms reached
     Resume: pmtu 1500 hops 7 back 58

ec2-75-101-228-202.compute-1.amazonaws.com
 1: domU-12-31-39-00-9D-23.compute-1.internal (10.254.162.209) 0.055ms pmtu 1500
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.061ms
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.053ms
 2: 10.254.160.2 (10.254.160.2) 0.728ms
 3: ec2-75-101-160-176.compute-1.amazonaws.com (75.101.160.176) 0.397ms
 4: othr-216-182-232-15.usma2.compute.amazonaws.com (216.182.232.15) 0.823ms
 5: othr-216-182-224-18.usma1.compute.amazonaws.com (216.182.224.18) 1.191ms
 6: ec2-75-101-160-113.compute-1.amazonaws.com (75.101.160.113) 1.434ms
 7: ip-10-251-124-167.ec2.internal (10.251.124.167) 1.094ms
 8: ip-10-251-127-65.ec2.internal (10.251.127.65) 1.201ms reached
     Resume: pmtu 1500 hops 8 back 57

ec2-174-129-138-71.compute-1.amazonaws.com
 1: domU-12-31-39-00-9D-23.compute-1.internal (10.254.162.209) 0.064ms pmtu 1500
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.069ms
 1: dom0-10-254-160-160.compute-1.internal (10.254.160.160) 0.055ms
 2: 10.254.160.2 (10.254.160.2) 0.889ms
 3: dom0-10-254-164-152.compute-1.internal (10.254.164.152) 0.305ms
 4: domU-12-31-39-00-A1-A3.compute-1.internal (10.254.166.81) 0.467ms reached
     Resume: pmtu 1500 hops 4 back 61
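
The script above picks the hop count by field position on any line containing "hops". A slightly more defensive variant, sketched here under the assumption that the summary line always has the "Resume: ... hops N ..." shape shown in the output above, keys on the word itself. The function name `hops_from_tracepath` is hypothetical.

```shell
#!/bin/bash
# Sketch only: extract the hop count from tracepath output read on stdin.

hops_from_tracepath() {
  # The summary line looks like: "     Resume: pmtu 1500 hops 7 back 58"
  # Print the word that follows "hops" on that line; print nothing if the
  # summary line is missing.
  awk '$1 == "Resume:" {
         for (i = 2; i < NF; i++)
           if ($i == "hops") { print $(i + 1); exit }
       }'
}

# Example use:
#   tracepath -n "$mirror" | hops_from_tracepath
```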

Chuck Short (zulcss) wrote :

I'm looking at this for beta3.

Changed in ubuntu-on-ec2:
milestone: none → beta3
status: New → Triaged
Jeremy Deininger (jeremy-rightscale) wrote :

Opinion:
1) When compared to the cost of running the rest of the mirror infrastructure (such as EBS cost), the cost of traffic between zones is not a significant factor.
2) Using a traceroute method you could still end up with a bad server selection rather easily.

Question:
What's the best way to deal with failure scenarios? At RightScale we are looking at using a sources.list with all the mirrors in it, starting with the 'round-robin' DNS name, which would give us balanced usage during package download. This does increase the number of index files downloaded (during apt-get update), but it also significantly increases redundancy if one of the mirrors is down or unreachable. We preload the results from the first apt-get update onto the image to minimize the impact on launch.

example snip:
deb http://ec2-us-east-mirror.rightscale.com/ubuntu_daily/latest intrepid main restricted multiverse universe
deb http://ec2-us-east-mirror.rightscale.com/ubuntu_daily/latest intrepid-updates main restricted multiverse universe
deb http://ec2-us-east-mirror.rightscale.com/ubuntu_daily/latest intrepid-security main restricted multiverse universe

deb http://ec2-us-east-mirror1.rightscale.com/ubuntu_daily/latest intrepid main restricted multiverse universe
deb http://ec2-us-east-mirror1.rightscale.com/ubuntu_daily/latest intrepid-updates main restricted multiverse universe
deb http://ec2-us-east-mirror1.rightscale.com/ubuntu_daily/latest intrepid-security main restricted multiverse universe
(and so on, for mirror2 and mirror3)
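
A list like the one above could be generated rather than maintained by hand. The following is a rough sketch, assuming the same path and component layout as the snippet; the function name `emit_sources` is hypothetical, and it prints the groups without the blank line between mirrors.

```shell
#!/bin/bash
# Sketch only: print deb lines for every mirror and every suite
# (release, release-updates, release-security).

emit_sources() {
  # $1 = release name (e.g. intrepid); remaining args = mirror hostnames,
  # listed in the order apt should see them.
  local release="$1" mirror suffix
  shift
  for mirror in "$@"; do
    for suffix in "" "-updates" "-security"; do
      printf 'deb http://%s/ubuntu_daily/latest %s%s main restricted multiverse universe\n' \
        "$mirror" "$release" "$suffix"
    done
  done
}

# Example:
#   emit_sources intrepid ec2-us-east-mirror.rightscale.com \
#     ec2-us-east-mirror1.rightscale.com
```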

Rick Clark (dendrobates)
Changed in ubuntu-on-ec2:
assignee: nobody → zulcss
Chuck Short (zulcss) wrote :

Script is being updated as we speak.

Changed in ec2-init:
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ec2-init - 0.3.3ubuntu2

---------------
ec2-init (0.3.3ubuntu2) jaunty; urgency=low

  * debian/ec2-set-apt-sources.py:
    - Use the ec2 mirrors. (LP: #317065, #333897)
    - Update the /etc/apt/sources.list (LP: #333904)
  * debian/ec2-fetch-credentials.py:
    - Better error checking (LP: #325067)

 -- Chuck Short <email address hidden> Tue, 24 Feb 2009 14:02:37 -0500

Changed in ec2-init:
status: In Progress → Fix Released
Soren Hansen (soren)
Changed in ubuntu-on-ec2:
status: Triaged → Fix Released