Bazaar

.cix thrashing causes us to re-download the whole index multiple times

Bug #402623 reported by John A Meinel on 2009-07-21

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Fix Released	Critical	John A Meinel	Bazaar 2.0.1 "Stability First"

Bug Description

being split off from bug #402114

When a .cix index gets to be larger than ~100k entries, (more than 1000 btree pages) we no longer can fit everything into the local cache. Because the access pattern is essentially 'random' this destroys our 'hit rate'.

When doing a fetch, we generally have to access the .cix 4-5 times. (read root nodes, read nodes underneath them, read root nodes for the tree shape, read nodes underneath them, etc.)

This means that when doing a fresh branch of a large project, we will re-read the entire .cix for the largest pack files up to 5 times. (in the case of LP, this is about 10MB+ of data that we re-read 5 times.) At 50MB of data, that is 50% of the size of an optimized pack file for the entire history (106MB).

The easiest thing to do is to just increase the buffering for .cix indexes.

Related branches

lp:~jameinel/bzr/2.0-cache-cix

Merged into lp:bzr/2.0

Vincent Ladeuil: Approve on 2009-10-09

lp:ubuntu/karmic-proposed/bzr

John A Meinel (jameinel) on 2009-07-21

Changed in bzr:
importance:	Undecided → High
status:	New → Triaged

Robert Collins (lifeless) on 2009-08-14

Changed in bzr:
importance:	High → Critical

Revision history for this message

Robert Collins (lifeless) wrote on 2009-08-14:

I'm not sure what to do about this :(.

Revision history for this message

Martin Pool (mbp) wrote on 2009-08-14: Re: [Bug 402623] Re: .cix thrashing causes us to re-download the whole index multiple times

2009/8/14 Robert Collins <email address hidden>:
> I'm not sure what to do about this :(.

Can we let the cache grow to fit at least one whole index? Thrashing
into swap is probably better than thrashing over the network.

Is this a release blocker?

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message

Robert Collins (lifeless) wrote on 2009-08-14: Re: [Bug 402623] Re: .cix thrashing causes us to re-download the whole index multiple times

On Fri, 2009-08-14 at 03:15 +0000, Martin Pool wrote:
> 2009/8/14 Robert Collins <email address hidden>:
> > I'm not sure what to do about this :(.
>
> Can we let the cache grow to fit at least one whole index? Thrashing
> into swap is probably better than thrashing over the network.
>
> Is this a release blocker?

I don't think so because:
- few projects are _really big_
- those that aren't should be fine.

-Rob

Revision history for this message

Christian Reis (kiko) wrote on 2009-08-14:

Is there a workaround that projects that run into this can apply? For release-noting if nothing else.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-14:

So the trivial fix is:
=== modified file 'bzrlib/btree_index.py'
--- bzrlib/btree_index.py 2009-08-13 19:56:26 +0000
+++ bzrlib/btree_index.py 2009-08-14 18:55:59 +0000
@@ -46,7 +46,7 @@
_PAGE_SIZE = 4096

# 4K per page: 4MB - 1000 entries
-_NODE_CACHE_SIZE = 1000
+_NODE_CACHE_SIZE = 100000

One could easily put this into a plugin.

Also note that this is only true when reading over a 'dumb' transport. The smart server code reads things locally and streams the content over the wire, the indexes are not read directly.

A more involved option would be to set up a special case for .cix because we know we lose 'locality' there, and to only bump up the node cache size for cix.

Revision history for this message

Martin Pool (mbp) wrote on 2009-08-15: Re: [Bug 402623] Re: .cix thrashing causes us to re-download the whole index multiple times

2009/8/15 John A Meinel <email address hidden>:
> So the trivial fix is:
> === modified file 'bzrlib/btree_index.py'
> --- bzrlib/btree_index.py 2009-08-13 19:56:26 +0000
> +++ bzrlib/btree_index.py 2009-08-14 18:55:59 +0000
> @@ -46,7 +46,7 @@
> _PAGE_SIZE = 4096
>
> # 4K per page: 4MB - 1000 entries
> -_NODE_CACHE_SIZE = 1000
> +_NODE_CACHE_SIZE = 100000

That's the kind of thing I'd look at putting in
-Dindex_cache_size=100000 (which would require a small patch to
support key/value debug flags); obviously it's inferior to just
working but it's more feasible for most users than a patch or even a
plugin.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-15: Re: [Bug 402623] Re: .cix thrashing causes us to re-download the whole index multiple times

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> 2009/8/15 John A Meinel <email address hidden>:
>> So the trivial fix is:
>> === modified file 'bzrlib/btree_index.py'
>> --- bzrlib/btree_index.py 2009-08-13 19:56:26 +0000
>> +++ bzrlib/btree_index.py 2009-08-14 18:55:59 +0000
>> @@ -46,7 +46,7 @@
>> _PAGE_SIZE = 4096
>>
>> # 4K per page: 4MB - 1000 entries
>> -_NODE_CACHE_SIZE = 1000
>> +_NODE_CACHE_SIZE = 100000
>
> That's the kind of thing I'd look at putting in
> -Dindex_cache_size=100000 (which would require a small patch to
> support key/value debug flags); obviously it's inferior to just
> working but it's more feasible for most users than a patch or even a
> plugin.
>

You could make it "-Dbig_index_cache" and then set it to something
ridiculously high.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqGvpcACgkQJdeBCYSNAAPangCgnqyH6l0G6FZgIwLHrImOcwTM
F6gAnigtp4dgM0810lMmPq4q2ZKjFfmC
=AtYU
-----END PGP SIGNATURE-----

Ian Clatworthy (ian-clatworthy) on 2009-08-31

Changed in bzr:
assignee:	nobody → Ian Clatworthy (ian-clatworthy)

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-08-31:

There are several issues here:

1. Can we tell via tools what repositories will be impacted by this, e.g. by running a plugin like repository-details?

2. What operations are impacted? Just branch or also things like pull, update and imports?

3. Solution design. If users aren't going to know that this will impact them until after they do the branch, then adding a command line flag (vs some sort of auto-detection) doesn't sound very useful to me.

It's very easy to extend the BTreeGraphIndex constructor with an optional parameter being the index size. The hard bit is passing that parameter in from a higher layer, given the way these indexes are currently instantiated by the repository layer as best I can tell. And how would the higher layer decide what size is appropriate?

Maybe we need a high level setting of "optimise for memory vs speed" as Windows does? Assuming memory is quite abundant these days, perhaps picking a larger default cache here (40MB vs 4MB) is fine. Those on limited memory laptops could add something to their bazaar.conf saying something like "optimize-for = limited-memory".

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-08-31:

I'll try John's patch on OOo and see what the impact is w.r.t speed vs memory usage of local branching.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-09-03: Re: [Bug 402623] Re: .cix thrashing causes us to re-download the whole index multiple times

#10

Download full text (3.2 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> There are several issues here:
>
> 1. Can we tell via tools what repositories will be impacted by this,
> e.g. by running a plugin like repository-details?

heads -n5 .bzr/repository/indices/*.cix

would be sufficient.

Basically, anything with more than 1000 pages is going to 'thrash' a
bit. Some more than others.

B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=99083
row_lengths=1,5,825

^- 99,083 is the number of keys, 825 is the total number of pages. Note
that this is an old bzr.dev repo with everything in one pack. And we are
already at 825. So getting > 1000 is pretty easy.

For Launchpad, I get:
len=398273
row_lengths=1,23,3671

^- So we have 3.7k chk pages. Meaning the cache hit rate is going to be
1/3.7 ~ 27%.

>
> 2. What operations are impacted? Just branch or also things like pull,
> update and imports?

Pull and update will generally only read a fraction of the chk pages,
and thus aren't very likely to be heavily impacted.

This also only really effects dumb-transport users. It does effect local
operations (and the server side of remote ops), but the actual effect is
just re-reading a local page. Slower than it has to be, but not
re-downloading over a low-bandwidth connection.

>
> 3. Solution design. If users aren't going to know that this will impact
> them until after they do the branch, then adding a command line flag (vs
> some sort of auto-detection) doesn't sound very useful to me.
>
> It's very easy to extend the BTreeGraphIndex constructor with an
> optional parameter being the index size. The hard bit is passing that
> parameter in from a higher layer, given the way these indexes are
> currently instantiated by the repository layer as best I can tell. And
> how would the higher layer decide what size is appropriate?

Arguably we could set chk pages to have 'infinite' size. When a btree
reads its root page, it knows how many pages are available. So it could do:

if self._cache_everything:
self._cache.resize(self._row_offsets[-1]) # + 1?

>
> Maybe we need a high level setting of "optimise for memory vs speed" as
> Windows does? Assuming memory is quite abundant these days, perhaps
> picking a larger default cache here (40MB vs 4MB) is fine. Those on
> limited memory laptops could add something to their bazaar.conf saying
> something like "optimize-for = limited-memory".
>

It would need to be something passed from the repository when it
constructs the indexes.

Around line 1677 of pack_repo.py:
if self.chk_index is not None:
chk_index = self._make_index(name, '.cix')

That is the point at which we end up calling
"self._index_class(transport, index, index_size)" and the point where we
know that we are dealing with a chk index rather than 'just any' index.

So you could pass "make_index" a "should cache everything flag" which
could then be passed down into the BTreeIndex constructor.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqf41EACgkQJdeBCYSNAANlggCgh4UhRK0E6gZNu8uziJd46c0e
mDAAoLlQ5348Yr0rW7PUGc04R...

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> There are several issues here:
> 
> 1. Can we tell via tools what repositories will be impacted by this,
> e.g. by running a plugin like repository-details?

heads -n5 .bzr/repository/indices/*.cix

would be sufficient.

Basically, anything with more than 1000 pages is going to 'thrash' a
bit. Some more than others.

B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=99083
row_lengths=1,5,825

^- 99,083 is the number of keys, 825 is the total number of pages. Note
that this is an old bzr.dev repo with everything in one pack. And we are
already at 825. So getting > 1000 is pretty easy.

For Launchpad, I get:
len=398273
row_lengths=1,23,3671

^- So we have 3.7k chk pages. Meaning the cache hit rate is going to be
1/3.7 ~ 27%.

> 
> 2. What operations are impacted? Just branch or also things like pull,
> update and imports?

Pull and update will generally only read a fraction of the chk pages,
and thus aren't very likely to be heavily impacted.

> 
> 3. Solution design. If users aren't going to know that this will impact
> them until after they do the branch, then adding a command line flag (vs
> some sort of auto-detection) doesn't sound very useful to me.
> 
> It's very easy to extend the BTreeGraphIndex constructor with an
> optional parameter being the index size. The hard bit is passing that
> parameter in from a higher layer, given the way these indexes are
> currently instantiated by the repository layer as best I can tell. And
> how would the higher layer decide what size is appropriate?

Arguably we could set chk pages to have 'infinite' size. When a btree
reads its root page, it knows how many pages are available. So it could do:

if self._cache_everything:
  self._cache.resize(self._row_offsets[-1]) # + 1?

> 
> Maybe we need a high level setting of "optimise for memory vs speed" as
> Windows does? Assuming memory is quite abundant these days, perhaps
> picking a larger default cache here (40MB vs 4MB) is fine. Those on
> limited memory laptops could add something to their bazaar.conf saying
> something like "optimize-for = limited-memory".
>

It would need to be something passed from the repository when it
constructs the indexes.

Around line 1677 of pack_repo.py:
            if self.chk_index is not None:
                chk_index = self._make_index(name, '.cix')

That is the point at which we end up calling
"self._index_class(transport, index, index_size)" and the point where we
know that we are dealing with a chk index rather than 'just any' index.

So you could pass "make_index" a "should cache everything flag" which
could then be passed down into the BTreeIndex constructor.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqf41EACgkQJdeBCYSNAANlggCgh4UhRK0E6gZNu8uziJd46c0e
mDAAoLlQ5348Yr0rW7PUGc04RroJgU4X
=ln8G
-----END PGP SIGNATURE-----

John A Meinel (jameinel) on 2009-09-09

Changed in bzr:
assignee:	Ian Clatworthy (ian-clatworthy) → John A Meinel (jameinel)

John A Meinel (jameinel) on 2009-09-09

Changed in bzr:
milestone:	none → 2.0.1

Revision history for this message

John A Meinel (jameinel) wrote on 2009-09-09:

#11

There is a patch up for review.
In testing this, I went ahead and used bzr.dev to download http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel
I set -Dhttp and then used some silly grep hacks to check Content-Length headers:
grep -A 17 "http.*cix" .bzr.log | grep Content-Length
< Content-Length: 4096
< Content-Length: 94208
< Content-Length: 15846927
< Content-Length: 433
< Content-Length: 17119
< Content-Length: 583
< Content-Length: 298
< Content-Length: 610
< Content-Length: 12349440
< Content-Length: 12004988
< Content-Length: 11701226
< Content-Length: 1092692

Which in total is 50.7MiB

Versus just 'grep Content-Length' which was:
180.5 MiB

So out of 180.5 MiB downloaded, 28% of that was the .cix headers being downloaded 4+ times. (The last one is probably reading the parent_id,basename=>file_id index pages, which are significantly less than the full content.)

In comparison, doing the same action with my patch resulted in:
< Content-Length: 4096
< Content-Length: 94208
< Content-Length: 15846927
< Content-Length: 433
< Content-Length: 17119
< Content-Length: 583
< Content-Length: 298
< Content-Length: 610

15.2 MiB / 143.1 MiB

Peak memory did increase to about 650MiB during the transfer. Though I'm trying
to address that sort of issue with different patches.

Changed in bzr:
status:	Triaged → Fix Committed

Revision history for this message

Andrew Bennetts (spiv) wrote on 2009-10-13:

#12

This fix is now merged to both lp:bzr and lp:bzr/2.0

Changed in bzr:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.