Comment 23 for bug 405317

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 405317] Re: Huge Performance regressions in bazaar 1.16 and higher

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
>> Were you using filters before you upgraded?
> No; 1.16 was the 1st version where content filtering was usable. We have now enabled it to handle line endings more properly.
>
> Would it be helpful if I disabled content filtering on my machine for a
> while?
>
>> Tell me, do you have a pattern where you repeatedly merge from some branch B, but never merge back into that branch?
> Yes, totally. As I told above the 5 branches represent different versions. When I fix something in the lowest version it gets merged "upwards" to the newer versions and finally to the trunk branch. Of course we NEVER merge a "higher version" branch to a "lower version" branch for that would make the versions the same - it would merge back all new development done for the higher version. Most of the trouble you see here is from that trunk branch which collected /all/ fixes on /all/ versions below it. This I think also explains the large #of revisions; a single fix merge here contains all of the commit/merge revisions of all lower branches below it.
>
> The part about heads and having to re-check whether (files?) in the
> other branch had changed is way over my head; I would need to learn your
> terminology 1st. I am surprised a bit, because I expected merge to only
> work with the difference in revisions between the two trees, and then
> only on the files that have changed in those revisions on both sides.
>
> Thanks again for your time and have a good weekend!
>

So I know the merge code is smart enough to use a basis tree to
determine what has actually changed.

However, I don't think the commit code has that luxury. So imagine a
commit graph that looks something like:

 A
 |\
 B C
 |/|
 D E
 |/|
 F G
 |/|
 H I
 |/
 J

The left side is your latest release branch, and the right side is your
previous release branch. Also, lets assume that changes either need a
little bit of tweaking or that the target file is already slightly
different from the right hand side. What this means is that for the file
modified by C it will be slightly different in D.

So what does this get at. Let's assume there were certain changes, and
I'll classify those as C-files, E-files, G-files and I-files.

Now in I, all C-files will have a last-modified value of C, all E-files
will have a last-modified of E, and so on.

However, because of the "tweaking" that I mentioned, in *H*, all of the
C-files will have a last-modified of *D*, all of the E-files will have a
last-modified of F, etc.

So when we merge I into H to get J, the *merge* code will pick G as the
common base, and try to apply only those changes from G => I to H, which
is what we want.

However, the *commit* code will then need to figure out the per-file
graphs, etc. And what it will see is all of the C-files, E-files,
G-files have different last-modified values between that stored in H and
that stored in I. And it isn't immediately obvious that the value in H
supersedes the value in I. So we have to do a heads() check.

I'm not sure if there is much else that we could do. One argument is
that we could see that the working content is identical to the left
parent, and then not do the heads() check. However doing:
  bzr merge ../older-version
  bzr revert one/file # the change is not valid in the new version
  bzr commit -m "commit the valid changes from $old-version"

It is perfectly valid to revert some change that would have otherwise
been included. And our current policy is that refusing changes is, in
itself, an action and thus should be part of the per-file graph.

As for improving heads() speed... I'm sure that KnownGraph.heads() is
significantly faster than Graph.heads() as it gets a chance to use a
much more efficient algorithm. The tradeoff is that it currently always
has to load the whole graph. I'm pretty sure this would be a win for
your case, but there are other cases like OOo, where we have 200k
revisions, and much less graph divergence.

As another thought, I've seen cases where running under "--lsprof"
causes index code to show up significantly higher than it would if you
just timed things directly. Would you be comfortable if we asked you to
get the bzr source and try some modifications to see what the net
effects are?

This would be most effective if you could join us in a more interactive
venue like IRC, rather than sending e-mail back and forth. Though we
certainly could do it that way.

Also, now that you've isolated a real test case, we can just:

 bzr branch current-tip testing
 cd testing
 bzr uncommit
 time bzr commit -m test1
 bzr uncommit
 time bzr commit -m test2
 ...

With the added benefit that we know all of these test runs are going to
be against exactly the same history.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkp3D4AACgkQJdeBCYSNAANSzgCdHV/gzF89S+gy1nbm3meCmvwR
syMAn217146OOaWWuDjpWJU+uOAkoHJz
=44Xz
-----END PGP SIGNATURE-----