archive.ubuntu.com config prevents caching

Bug #290921 reported by Roger Binns
Affects: Ubuntu
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I don't know of any better way to report this bug. The configuration of archive.ubuntu.com prevents proxy servers from caching packages. This may be due to poor configuration of archive.ubuntu.com or buggy behaviour of its server software, or it may be an opportunity for the apt client to ignore one part of the web standards to improve things.

To reproduce, configure a Squid proxy server and use it to upgrade one machine, e.g. to Intrepid. Then do the same on a second machine. If you monitor the Squid logs you will see all the files being redownloaded with a TCP_REFRESH_MISS result. This means the cache had the file, but when it checked with the server the cached copy was found to be stale, so the file was fetched again.
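
To make the log check above less manual, here is a minimal sketch that tallies Squid result codes for .deb requests. It assumes Squid's native access.log format, where the fourth whitespace-separated field is `CODE/status` and the seventh is the URL; the function name `count_deb_results` is my own and not part of Squid.

```python
from collections import Counter


def count_deb_results(log_lines):
    """Count Squid result codes (TCP_HIT, TCP_REFRESH_MISS, ...) for .deb URLs.

    Assumes the native access.log layout:
    time elapsed client code/status bytes method URL ...
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete native-format entry; skip it
        result_code = fields[3].split("/")[0]  # "TCP_REFRESH_MISS/200" -> "TCP_REFRESH_MISS"
        url = fields[6]
        if url.endswith(".deb"):
            counts[result_code] += 1
    return counts
```

Passing an open log file (for example /var/log/squid/access.log, though the path depends on your installation) to `count_deb_results` after upgrading the second machine should show mostly TCP_REFRESH_MISS entries if you are hitting this bug.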

The root cause is that a round-robin set of IP addresses is returned for us.archive.ubuntu.com. For example, I see 91.189.88.31, 91.189.88.45 and 91.189.88.46.

You can then query each of these for the same file and look at the headers returned, replacing the IP address as appropriate.

  wget --no-proxy --header="Host: us.archive.ubuntu.com" -O /dev/null -S http://91.189.88.31/ubuntu/pool/main/libh/libhtml-tagset-perl/libhtml-tagset-perl_3.20-2_all.deb

The Last-Modified header is identical across all 3 servers, but the ETag is different. Because the ETag differs, the proxy server has to conclude that its cached copy is stale.
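
The same comparison can be scripted. The following is a rough sketch using only the Python standard library: it sends a HEAD request to one IP address with an explicit Host header (the equivalent of the wget command above) and reports which validators differ between servers. The function names and the dictionary layout are my own choices for illustration.

```python
import http.client

VALIDATORS = ("etag", "last_modified", "content_length")


def fetch_validators(ip, host, path, timeout=10):
    """HEAD one server by IP, sending the given Host header, and return its validators."""
    conn = http.client.HTTPConnection(ip, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"Host": host})
        resp = conn.getresponse()
        return {
            "etag": resp.getheader("ETag"),  # None if the header is absent
            "last_modified": resp.getheader("Last-Modified"),
            "content_length": resp.getheader("Content-Length"),
        }
    finally:
        conn.close()


def differing_validators(per_server):
    """Return the validator names whose values are not identical on every server.

    per_server maps a server label (e.g. its IP) to a dict as returned
    by fetch_validators().
    """
    return [
        name
        for name in VALIDATORS
        if len({headers[name] for headers in per_server.values()}) > 1
    ]
```

Calling `fetch_validators` once per IP address with host `us.archive.ubuntu.com` and the .deb path, then passing the collected results to `differing_validators`, should return only `etag` for the behaviour described here.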

The bad effect is that upgrading N computers at a site that uses a normal proxy server requires N downloads, each of which can be close to 1GB for a dist upgrade. That uses up more of your bandwidth and pointlessly increases server utilization.

Note that the contents of a .deb with a given name do not change anyway.

Some suggested fixes:

* Stop sending back ETag from servers and only rely on Last-Modified/Content-Length to detect cache invalidation

* Don't include inode in ETag calculation: http://httpd.apache.org/docs/2.2/mod/core.html#fileetag

* Calculate the ETag from the file md5/sha

* Use a really long DNS cache timeout (e.g. two weeks) for the round-robin values instead of the current very short interval, so that the same proxy keeps hitting the same server
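
To illustrate why the inode matters and what a content-derived ETag would look like, here is a small sketch. `inode_style_etag` is only an approximation of an inode/size/mtime ETag and is not Apache's exact format; `content_etag` uses SHA-256, which is my own choice rather than anything the archive servers are known to do.

```python
import hashlib
import os


def content_etag(path, chunk_size=1 << 20):
    """ETag derived only from file contents, so identical copies on any mirror match."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large .deb files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return '"' + digest.hexdigest() + '"'


def inode_style_etag(path):
    """Approximate inode-size-mtime ETag (illustrative only, not Apache's exact format)."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, int(st.st_mtime))
```

Two byte-identical copies of a file on different machines will almost always have different inode numbers, so the inode-style value differs while the content-derived one does not. The cost is that the hash has to be computed or cached on the server.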

I haven't been able to work out a way for the apt client to prevent the Squid proxy from paying attention to the ETag. Note that this same issue will affect any caching/proxy server that obeys web standards. One other workaround is to install an apt/deb-specific proxy (which then ignores the web standards), but that requires me to manage two proxy/cache servers when just one would work fine if the ubuntu.com servers were fixed.

Michael Vogt (mvo) wrote :

Thanks for your bugreport.

I forwarded it to the sysadmin team as ticket #32129

Robert Collins (lifeless) wrote :

So, setting
FileETag MTime Size
or
FileETag None
would be reasonable.

Of course, how we get all the mirrors doing that is a separate problem.

Robert Collins (lifeless) wrote :

Oh, I forgot to mention; if the mirror syncing logic doesn't preserve mtime, you would want to remove mtime from the etag too; at that point I'd remove the etag completely.

Roger Binns (ubuntu-rogerbinns) wrote :

Using the wget command the last modified times were all the same. I don't know what they use behind the scenes but if it is rsync (my best guess) then the last modified times will be in sync.

James Troup (elmo) wrote :

{us.,}archive.ubuntu.com (and *.archive.ubuntu.com not provided by mirrors) no longer send ETag headers; thanks for the suggestion.

Roger Binns (ubuntu-rogerbinns) wrote :

Thanks. I can confirm that the servers are returning Last-Modified but not ETag. I am looking to future updates and upgrades to be much quicker across all my machines.

Roger Binns (ubuntu-rogerbinns) wrote :

Just for the record the IP addresses for us.archive.ubuntu.com all seem to resolve to machines in London. Here is one example:

$ tracepath us.archive.ubuntu.com
[first few elided for privacy]
 3: 114.at-5-0-0.gw3.200p-sf.sonic.net (74.220.64.17) 37.975ms asymm 4
 4: 0.as0.gw4.200p-sf.sonic.net (64.142.0.226) 38.212ms
 5: ge-6-22.car1.SanFrancisco1.Level3.net (4.53.128.97) 38.989ms asymm 4
 6: ae-2-4.bar1.SanFrancisco1.Level3.net (4.69.133.150) 44.527ms asymm 13
 7: ae-0-11.bar2.SanFrancisco1.Level3.net (4.69.140.146) 45.756ms asymm 12
 8: ae-6-6.ebr2.SanJose1.Level3.net (4.69.140.154) 52.477ms asymm 10
 9: ae-92-92.csw4.SanJose1.Level3.net (4.69.134.222) 47.267ms
10: ae-94-94.ebr4.SanJose1.Level3.net (4.69.134.253) 42.315ms
11: ae-2.ebr4.NewYork1.Level3.net (4.69.135.186) 114.973ms
12: ae-74-74.csw2.NewYork1.Level3.net (4.69.134.118) 121.746ms
13: ae-81-81.ebr1.NewYork1.Level3.net (4.69.134.73) 108.654ms
14: ae-41-41.ebr2.London1.Level3.net (4.69.137.65) 182.792ms
15: ae-1-100.ebr1.London1.Level3.net (4.69.132.117) 194.723ms
16: ae-2.ebr2.London2.Level3.net (4.69.132.145) 187.519ms
17: ae-26-54.car2.London2.Level3.net (4.68.117.112) 200.600ms
18: 195.50.121.2 (195.50.121.2) 193.253ms asymm 19
19: drescher.canonical.com (91.189.88.40) 195.202ms reached
     Resume: pmtu 1500 hops 19 back 53
