I'm uploading an attachment of the debug log with -Dhttp. The overall download took ~60 minutes. I see that in the first minute we are just probing indices (this could be the revision ancestry graph search) before we start downloading any actual .pack content. After that, I see a lot of reads where we grab the first 40 bytes (the pack header) and then ~540 bytes further on in the file. It does look like we are at least reading new data each time.

I don't understand why the data is all discrete. Perhaps each revision, etc. is in its own group, and thus each one averages approximately the same (small) size. I might poke at it a bit, but I'm guessing it is just that we read one group at a time, while we could be buffering a bit more than that (rough sketch below). The trick is figuring out how to make a bigger request without overflowing the LRUCache, etc. I also don't understand how we could get that many groups in a single pack file without them being collapsed together. Perhaps the autopack code isn't properly grouping revision texts? (I'm assuming it is revs because it is the first item being read.)

Note that we spend from approx 75s to 160s just accessing 585-byte chunks from:

  http readv of 1d0252d21c3632af79515b5e8c497c15.pack

It takes until 350s (~6 min) before we start looking at the .iix, until 556s before we start reading the .cix, and until 2063s before we start reading the .tix. The rough timeline (in seconds):

     3.5  finish reading format etc. files and start reading .rix
    73.9  start reading revision texts
   353.3  start reading .iix
   380.2  read inventory .pack data
   555.9  start reading root .cix
   627.9  reading chk .pack data
   907.1  read layer 2 .cix
  1828.5  read layer 3 .cix
  1911.2  .cix
  2046.8  .cix
  2063.4  read .tix
  2247.8  read .pack
  3651.8  finished reading text content, reading branch.conf and tags
  3683.5  finished

I'm also seeing this, which seems a bit disconcerting:

  384.270 creating new compressed block on-the-fly in 0.000s 11139 bytes => 365 bytes
  384.271 stripping trailing bytes from groupcompress block 11046 => 275
  384.274 creating new compressed block on-the-fly in 0.001s 11139 bytes => 2757 bytes
  384.277 creating new compressed block on-the-fly in 0.000s 11046 bytes => 269 bytes
  384.278 creating new compressed block on-the-fly in 0.000s 11139 bytes => 917 bytes
  384.281 creating new compressed block on-the-fly in 0.001s 11046 bytes => 2832 bytes
  384.298 creating new compressed block on-the-fly in 0.003s 11139 bytes => 5331 bytes
  384.306 creating new compressed block on-the-fly in 0.000s 11046 bytes => 782 bytes
  384.309 creating new compressed block on-the-fly in 0.002s 11139 bytes => 2386 bytes
  384.313 http readv of 0ca910916f100de2c571f97d71c6bf9f.pack offsets => 2 collapsed 2
  384.314 > GET /~launchpad-pqm/launchpad/devel/.bzr/repository/packs/0ca910916f100de2c571f97d71c6bf9f.pack

Basically, it seems to be taking two modest-sized groups (11kB) and splitting them back up into fairly small groups. Perhaps this is a 'groupcompress sort order' issue: because we think the optimal order is slightly different from what was previously on disk, we think we need to break up the groups on the fly. Though we should be doing an 'unordered' fetch at this point, which should avoid that. Anyway, it definitely looks like we are thrashing between 2 groups and breaking them up into a smaller stream because of that. (I'm seeing this fairly often.)

Reading the chk .pack data seems much better behaved than the .iix and .rix data reads. The requests are still small (7.2kB), but bigger than the ~500 bytes from before.
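Going back to the one-group-per-request point above: here's a minimal sketch of the kind of buffering I mean, merging adjacent small group reads into bigger readv ranges, capped so the result still fits in a size-bounded cache. (This is not bzrlib's actual fetch code; the function and its parameters are made up for illustration.)

    def coalesce_group_reads(offsets, max_gap=4096, max_buffer=4 * 1024 * 1024):
        """Collapse (offset, length) pairs into fewer, larger ranges.

        offsets    -- iterable of (offset, length) tuples for the groups we want
        max_gap    -- merge two ranges if the hole between them is at most this big
        max_buffer -- never let a merged range exceed this many bytes, so the
                      fetched blocks don't blow out the LRU-style cache
        """
        merged = []
        for offset, length in sorted(offsets):
            if merged:
                start, size = merged[-1]
                gap = offset - (start + size)
                new_size = (offset + length) - start
                if 0 <= gap <= max_gap and new_size <= max_buffer:
                    merged[-1] = (start, new_size)
                    continue
            merged.append((offset, length))
        return merged

    if __name__ == '__main__':
        # ~540-byte groups packed closely together, as in the trace above.
        reads = [(i * 600, 540) for i in range(100)]
        print(len(coalesce_group_reads(reads)))   # 1 range instead of 100

The interesting knob is max_buffer: big enough to amortize round trips, small enough that what we pull down still fits in the cache.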
On the chk .pack reads, I'm *not* seeing the fragmentation effects that we saw earlier. I think the 7kB chk groups are because of page splitting. The fetch code was written assuming groups would average around 1-4MB. The chk fetch was reworked to create smaller groups (by clustering based on search-key prefix). The clustering is definitely beneficial, but we need to revisit whether we want to start a new group each time. The main trade-off is that the offset reference will probably grow from 2 bytes to 3-4 bytes, since the copy source will be in the middle of a group instead of at the beginning. I certainly don't expect better compression from combining unrelated groups, so if we can just make bigger readv requests, we will probably offset the cost better than if we tried to force bigger groups.

This line is bad:

  1956.154 creating new compressed block on-the-fly in 0.000s 3444823 bytes => 36 bytes

I'm certainly hoping that the 3.4MB block is just in the local buffer. (As near as I can tell it is; at least, I don't see a nearby read that is 3.4MB long. Just before that we had a 10.9MB read of the .cix file.)

Note that we are 'thrashing' on reading the .cix pages for 1d0252d21c3632af79515b5e8c497c15.cix. I see it read:

  569.016 http readv of 1d0252d21c3632af79515b5e8c497c15.cix offsets => 3669 collapsed 1
    ... > Range: bytes=98304-15123801
    15MB (read *all* root pages)

  907.135 http readv of 1d0252d21c3632af79515b5e8c497c15.cix offsets => 2814 collapsed 1
    ... > Range: bytes=98304-11624447
    11MB (read a big cluster of pages)

  1828.483 http readv of 1d0252d21c3632af79515b5e8c497c15.cix offsets => 2726 collapsed 86
    ... > Range: bytes=98304-233471,237568-249855,...
    ... < Content-Length: 11171455
    11MB (read a mix of pages)

  1911.204 http readv of 1d0252d21c3632af79515b5e8c497c15.cix offsets => 2652 collapsed 75
    ... > Range: bytes=98304-139263,143360-303103,...
    ... < Content-Length: 10870012
    11MB (read a mix of pages)

  2046.883 http readv of 1d0252d21c3632af79515b5e8c497c15.cix offsets => 702 collapsed 542
    ... > Range: bytes=110592-114687,122880-126975,...
    ... < Content-Length: 1113169
    1.1MB (read a mix of pages)

I believe the issue has been expressed before:

1) .cix pages don't cluster, so when you go to read, you need to read a random selection across the whole index.

2) We have so many chk pages that the index gets larger than our buffer size. With 4kB pages and an average of 100 records per page, we require ~2.8k btree pages to fit all the .cix records. However, our buffer is set to a hard cap of 1k btree pages. So every time we read a layer of chk pages, we have to re-download the whole index again (in this case ~5 times).
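To put rough numbers on point 2), using the figures above (4kB pages, ~100 records per page, 2.8k pages, and the 1k-page cap):

    # Back-of-the-envelope numbers for point 2) above.  The page size, records
    # per page, page count and buffer cap all come from the text; nothing here
    # is measured from bzrlib itself.

    PAGE_SIZE = 4096            # bytes per btree page
    RECORDS_PER_PAGE = 100      # average .cix records per leaf page
    LEAF_PAGES = 2800           # ~2.8k pages needed to hold all .cix records
    BUFFER_CAP_PAGES = 1000     # hard cap on cached btree pages

    index_bytes = LEAF_PAGES * PAGE_SIZE
    buffer_bytes = BUFFER_CAP_PAGES * PAGE_SIZE
    print("records:     ~%d" % (LEAF_PAGES * RECORDS_PER_PAGE))   # ~280000
    print("index size:  %.1f MB" % (index_bytes / 1e6))           # ~11.5 MB
    print("buffer size: %.1f MB" % (buffer_bytes / 1e6))          # ~4.1 MB

    # The index is roughly 3x the buffer, so each pass over a layer of chk
    # pages evicts what the next pass needs, and each layer ends up re-reading
    # more or less the whole index; that matches the ~11MB readv's above.
    layers = 5
    print("redundant index download: ~%.0f MB" % (layers * index_bytes / 1e6))

So the cache misses alone account for something like 50MB of transfer, against an index file that is only ~15MB in total.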
For the .tix, again I see some thrashing like:

  2351.986 stripping trailing bytes from groupcompress block 620801 => 113468
  2352.004 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 2099 bytes
  2352.005 creating new compressed block on-the-fly in 0.000s 1623061 bytes => 1922 bytes
  2352.008 creating new compressed block on-the-fly in 0.003s 620801 bytes => 7111 bytes
  2352.012 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 948 bytes
  2352.013 creating new compressed block on-the-fly in 0.001s 620801 bytes => 1283 bytes
  2352.015 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 2050 bytes
  2352.016 creating new compressed block on-the-fly in 0.001s 620801 bytes => 2129 bytes
  2352.018 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 1930 bytes
  2352.021 creating new compressed block on-the-fly in 0.002s 620801 bytes => 9805 bytes
  2352.024 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 2978 bytes
  2352.028 creating new compressed block on-the-fly in 0.003s 620801 bytes => 4253 bytes
  2352.031 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 3075 bytes
  2352.040 creating new compressed block on-the-fly in 0.007s 620801 bytes => 14191 bytes
  2352.047 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 687 bytes
  2352.049 creating new compressed block on-the-fly in 0.001s 620801 bytes => 1771 bytes
  2352.052 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 3699 bytes
  2352.062 creating new compressed block on-the-fly in 0.009s 620801 bytes => 29093 bytes
  2352.069 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 2095 bytes
  2352.071 creating new compressed block on-the-fly in 0.001s 620801 bytes => 3612 bytes
  2352.073 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 9214 bytes
  2352.075 creating new compressed block on-the-fly in 0.001s 1623061 bytes => 9520 bytes
  2352.082 creating new compressed block on-the-fly in 0.005s 620801 bytes => 15460 bytes
  2352.087 creating new compressed block on-the-fly in 0.001s 1623061 bytes => 17684 bytes
  2352.098 creating new compressed block on-the-fly in 0.008s 620801 bytes => 27862 bytes
  2352.103 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 1875 bytes
  2352.105 creating new compressed block on-the-fly in 0.001s 620801 bytes => 2493 bytes
  2352.107 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 1342 bytes
  2352.109 creating new compressed block on-the-fly in 0.001s 620801 bytes => 3257 bytes
  2352.111 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 950 bytes
  2352.113 creating new compressed block on-the-fly in 0.001s 620801 bytes => 1291 bytes
  2352.114 creating new compressed block on-the-fly in 0.000s 2096942 bytes => 4507 bytes
  2352.122 creating new compressed block on-the-fly in 0.006s 620801 bytes => 12400 bytes

Again, this doesn't seem to cause us to download extra data, as it all seems to be in our local group cache, but it does mean that we are fragmenting what would otherwise be a nice big group. So it doesn't affect *this* fetch, but any fetch from this location to another would then only have small groups to read. (Such as when pqm creates the commit, does the autopack, and then it gets copied to launchpad: the launchpad copy would be more fragmented than expected.)

Some of the text groups are nicely sized, but some are still surprisingly small. (Like 1kB here and there.)
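To make that thrashing pattern concrete, here's a toy model (purely illustrative, not bzrlib code) of what happens when the stream asks for texts in an order that alternates between two big source blocks and the sender starts a new compressed block at every switch:

    def count_output_blocks(record_sources):
        """Count blocks emitted if a new block starts at each source switch.

        record_sources -- sequence of source-block ids, one per record, in
                          the order the stream requests them.  (Toy model;
                          real groupcompress blocks are not modelled here.)
        """
        blocks = 0
        previous = object()          # sentinel that differs from any real id
        for source in record_sources:
            if source != previous:
                blocks += 1
                previous = source
        return blocks

    if __name__ == '__main__':
        grouped = ['A'] * 50 + ['B'] * 50        # records grouped by source block
        alternating = ['A', 'B'] * 50            # alternating, as in the trace
        print(count_output_blocks(grouped))      # 2 blocks
        print(count_output_blocks(alternating))  # 100 tiny blocks

Two ~2MB source blocks stay two blocks if the records come out grouped, but turn into one small output block per run once the order alternates, which looks a lot like the 1-30kB outputs in the trace above.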
So I certainly think we need to teach fetch to buffer small groups together into a single request. However, we *also* need to figure out what is causing our nice big 4MB groups to be fragmented into lots of 1-10kB groups.
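As a first step on that second question, here's a quick script to tally the "creating new compressed block on-the-fly" lines from the attached log (it assumes the .bzr.log line format quoted above), so we can see which source blocks are being rebuilt and how small the pieces end up:

    import re
    import sys

    # Matches the trace lines quoted above, e.g.
    #   creating new compressed block on-the-fly in 0.003s 620801 bytes => 7111 bytes
    BLOCK_RE = re.compile(
        r'creating new compressed block on-the-fly in [\d.]+s '
        r'(\d+) bytes => (\d+) bytes')

    def summarize(log_file):
        sources = {}                 # source block size -> list of output sizes
        for line in log_file:
            match = BLOCK_RE.search(line)
            if match:
                src, out = int(match.group(1)), int(match.group(2))
                sources.setdefault(src, []).append(out)
        for src in sorted(sources):
            outs = sources[src]
            print("source block %9d bytes: rebuilt %4d times, "
                  "outputs %d-%d bytes (avg %d)"
                  % (src, len(outs), min(outs), max(outs),
                     sum(outs) // len(outs)))

    if __name__ == '__main__':
        summarize(open(sys.argv[1]))

If the 620801/2096942/1623061-byte blocks each show up dozens of times with kB-sized outputs, that gives a rough measure of how much of the fragmentation comes from these on-the-fly rebuilds, as opposed to the groups already being small on disk.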