fetch for --2a formats seems to be fragmenting

Bug #402645 reported by John A Meinel
Affects:      Bazaar
Status:       Fix Released
Importance:   Critical
Assigned to:  John A Meinel
Milestone:    2.0

Bug Description

split out from bug #402114

While debugging the log file for branching over http, we saw entries like this:
384.270 creating new compressed block on-the-fly in 0.000s 11139 bytes => 365 bytes
384.271 stripping trailing bytes from groupcompress block 11046 => 275
384.274 creating new compressed block on-the-fly in 0.001s 11139 bytes => 2757 bytes
384.277 creating new compressed block on-the-fly in 0.000s 11046 bytes => 269 bytes
384.278 creating new compressed block on-the-fly in 0.000s 11139 bytes => 917 bytes
384.281 creating new compressed block on-the-fly in 0.001s 11046 bytes => 2832 bytes
384.298 creating new compressed block on-the-fly in 0.003s 11139 bytes => 5331 bytes
384.306 creating new compressed block on-the-fly in 0.000s 11046 bytes => 782 bytes
384.309 creating new compressed block on-the-fly in 0.002s 11139 bytes => 2386 bytes

It would appear that the stream code believes the optimal ordering is to interleave the bytes from the two groups (one sized 11139 bytes and one sized 11046 bytes). In doing so, it fragments both groups, so that rather than having 2 medium sized groups we end up with 9+ groups.
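The effect can be modeled with a small sketch (hypothetical names, not bzrlib code): if every switch to a key from a different source group forces a new compressed block to be built on the fly, interleaving two groups multiplies the block count, which matches the alternating 11139/11046 entries in the log above.

```python
def count_output_blocks(group_stream):
    """Count on-the-fly blocks created while consuming keys in stream order.

    Simplified model: a new compressed block is started whenever the
    stream switches to a key from a different source group.
    """
    blocks = 0
    current = None
    for group in group_stream:
        if group != current:
            blocks += 1
            current = group
    return blocks

# Keys interleaved between the 11139-byte and 11046-byte groups,
# as in the log above: every switch starts a new block.
interleaved = ["11139", "11046"] * 4
# The same keys in grouped (io-ordered) order: one block per group.
grouped = ["11139"] * 4 + ["11046"] * 4

print(count_output_blocks(interleaved))  # 8 blocks from 2 source groups
print(count_output_blocks(grouped))      # 2 blocks
```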

The fetch order should be "unordered", which should take this code path:

  source_keys = self._get_io_ordered_source_keys(locations,
      unadded_keys, source_result)

which, in theory, yields one group at a time.

One possibility is that the ordering isn't 'unordered' but 'groupcompress' order, and that topo_sort not being 100% stable causes the two implementations to sort differently. (So the fetch that created the repository thought one sort was optimal, while the new fetch thinks a slightly different order is optimal.)
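The instability point can be illustrated with a minimal sketch (this is not bzrlib's topo_sort): a DAG usually admits several valid topological orders, so two correct implementations can disagree purely on how they break ties among ready nodes.

```python
def topo_sort(graph, tie_break):
    """Kahn-style topological sort; graph maps node -> set of parents.

    tie_break picks among the currently-ready nodes; this choice is
    exactly where two "correct" implementations can diverge.
    """
    remaining = {n: set(ps) for n, ps in graph.items()}
    order = []
    while remaining:
        ready = [n for n, ps in remaining.items() if not ps]
        node = tie_break(ready)
        order.append(node)
        del remaining[node]
        for ps in remaining.values():
            ps.discard(node)
    return order

# B and C both depend only on A, so either may legally come second.
graph = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

print(topo_sort(graph, min))  # ['A', 'B', 'C', 'D']
print(topo_sort(graph, max))  # ['A', 'C', 'B', 'D']
```

Both results are valid topological orders of the same graph, but a fetch that groups keys by one order will see the other order as "wrong" and reorder.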

Changed in bzr:
importance: High → Critical
Revision history for this message
Robert Collins (lifeless) wrote :

Targeting for 2.0, because unpacking on fetch will lead to rather bad disk usage, and our users' propensity to look under the hood makes it important not to do that.

Changed in bzr:
milestone: none → 2.0
Martin Pool (mbp)
Changed in bzr:
status: Triaged → Confirmed
John A Meinel (jameinel) wrote :

I'm investigating

Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
status: Confirmed → In Progress
John A Meinel (jameinel) wrote :

Fetching is set to 'unordered' in the 2.0 branch right now, which at least avoids fragmentation.
However, there may be a different fix in 2.0 final based on bug #402652.

Changed in bzr:
status: In Progress → Fix Released