Huge data transfers/bad performance OVERALL

Bug #405251 reported by Frits Jalvingh
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Expired
Undecided
Unassigned

Bug Description

(I mailed the mailing list about this problem earlier; see Re: Huge data transfers/bad performance on bzr pull??).

I have a large repository where pulling small changes can easily take almost forever. This morning I did a pull and it took almost half an hour. The actual new commits in that pull changes about 40 small files. The total data pulled was > 100MB. An earlier pull that was stopped had already transfered another 150MB, to in total 250MB was pulled. All the while the bzr "python" process used 100% CPU on an Core i7 920 system!!?!?!

As was requested by the people on the mailing list, I started the 2nd pull with the -Dhpss parameter. I will add the log from that to this report.

The pull was done with a bzr 1.17 client and an 1.16 server (both on Linux).

In addition: since upgrading to 1.16 everywhere I see very bad performance all over the place (on Windows clients); commits taking 2 to 15(!) minutes; status taking > 2 minutes etc. No idea whether this is related; I'll post a separate bug for that.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
Andrew Bennetts (spiv) wrote :

> The actual new commits in that pull changes about 40 small files.

The log you attached contradicts this: it appears to show 100 new files, and a further 99 modified files.

The network protocol overhead in that log is very low (<100kB). i.e. the vast bulk of that transfer was revision data. If too much is transferred it would be some sort of bug in the way bzr selects which revisions to transfer.

Can you try doing a "bzr branch" of your local branch to a new, standalone branch (i.e. not in a shared repository, e.g. "bzr branch /home/jal/bzr/vp-trunk /tmp/vp-trunk --no-tree", and compare the size of the files in .bzr/repository/packs in the original and the copy? It may also be interesting to compare the size of that directory in the original before and after "bzr pack".

The cpu use is almost certainly a bug, but it'll probably be easier to diagnose once we figure out why you are seeing much larger transfers than expected.

Oh, can you also check to see if the repository format on the server and the client matches?

Revision history for this message
Andrew Bennetts (spiv) wrote :

(Actually, nevermind the final question, I can see from the log that both ends are 1.9-rich-root.)

Changed in bzr:
status: New → Incomplete
Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

>> The actual new commits in that pull changes about 40 small files.
>The log you attached contradicts this: it appears to show 100 new files, and a further 99 modified files.
You are of course right; sorry. I did not properly scroll the screen up. Even so, the amount of data transfered was not even close to the size of the files changed.

I am branching the copy now; it is already hanging for about an hour (with 100% cpu) at:
[#########- ] Fetching revisions:Get stream source
without any progress (a fully local action!?). The progress bar itself is not moving at all (the - stays a -)

I did an strace on the process, it seems to be reading files only (very slowly).

I will return as soon as the branch completes (if at all) 8-(

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 405251] [NEW] Huge data transfers/bad performance OVERALL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> Public bug reported:
>
> (I mailed the mailing list about this problem earlier; see Re: Huge data
> transfers/bad performance on bzr pull??).
>
> I have a large repository where pulling small changes can easily take
> almost forever. This morning I did a pull and it took almost half an
> hour. The actual new commits in that pull changes about 40 small files.
> The total data pulled was > 100MB. An earlier pull that was stopped had
> already transfered another 150MB, to in total 250MB was pulled. All the
> while the bzr "python" process used 100% CPU on an Core i7 920
> system!!?!?!
>
> As was requested by the people on the mailing list, I started the 2nd
> pull with the -Dhpss parameter. I will add the log from that to this
> report.
>
> The pull was done with a bzr 1.17 client and an 1.16 server (both on
> Linux).
>
> In addition: since upgrading to 1.16 everywhere I see very bad
> performance all over the place (on Windows clients); commits taking 2 to
> 15(!) minutes; status taking > 2 minutes etc. No idea whether this is
> related; I'll post a separate bug for that.
>
> ** Affects: bzr
> Importance: Undecided
> Status: New
>

So there was a bug in transfers that would cause us to transfer too much
data sometimes, which was this fix in 1.17:

* pack <=> pack fetching is now done via a ``PackStreamSource`` rather
  than the ``Packer`` code. The user visible change is that we now
  properly fetch the minimum number of texts for non-smart fetching.
  (John Arbash Meinel)

However that was only for dumb transfers, not transfers using bzr+ssh.
And if your client was 1.17, that wouldn't be relevant anyway.

I also see some really large content being transferred:
1100.261 638842 byte part read

^- This sort of thing would generally indicate a single content that is
640kB, and I see a lot of them. Did you perhaps have some large objects
added, modified a few times, and then removed? I see some files being
versioned like:
[23682] 2009-07-27 12:26:47.673 INFO: +N CSharpProjects/lib/log4net.dll
[23682] 2009-07-27 12:26:47.673 INFO: +N
CSharpProjects/lib/nunit-console-runner.dll

^- And versioning DLLs often creates a lot of churn in large files.

I would also be curious to find out how large the repository is overall.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkpt0+MACgkQJdeBCYSNAAMuUACfYbdIfaSSgCcZwePABmmLRQBg
GHkAn3GAXKPfFOaIWWs5Ul6k+rnCMn93
=aPhz
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 405251] Re: Huge data transfers/bad performance OVERALL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
>>> The actual new commits in that pull changes about 40 small files.
>> The log you attached contradicts this: it appears to show 100 new files, and a further 99 modified files.
> You are of course right; sorry. I did not properly scroll the screen up. Even so, the amount of data transfered was not even close to the size of the files changed.
>

200 changed items seems like potentially a lot of data. Are you sure it
isn't comparable?

> I am branching the copy now; it is already hanging for about an hour (with 100% cpu) at:
> [#########- ] Fetching revisions:Get stream source
> without any progress (a fully local action!?). The progress bar itself is not moving at all (the - stays a -)
>
> I did an strace on the process, it seems to be reading files only (very
> slowly).
>
> I will return as soon as the branch completes (if at all) 8-(
>

So how did you upgrade to 1.16.1? My first guess might be that you
didn't get the compiled extensions.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkpt1BUACgkQJdeBCYSNAAPgewCgsxnWgBMHnMEcgBhqpPIoWlZR
WxMAn220ADirnwOMFcIZjzV/l18oHGAk
=Whqu
-----END PGP SIGNATURE-----

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Ok, the branch operation completed after taking 1 hour 20 minutes(!) to branch; this used to take 20 secs. It branched 1343 revisions. Most of the hour was spent showing the exact progress bar shown before; only the last 2 minutes or so did the progress bar change.

The new branch has a single .pack file of 409MB. bzr info on that new branch states:
Standalone tree (format: 1.14-rich-root)
Location:
  branch root: .

Related branches:
  parent branch: /home/jal/bzr/vp-trunk

Format:
       control: Meta directory format 1
  working tree: Working tree format 5
        branch: Branch format 7
    repository: Packs 6 rich-root (uses btree indexes, requires bzr 1.9)

In the working tree:
     14956 unchanged
         0 modified
         0 added
         0 removed
         0 renamed
         0 unknown
         0 ignored
      1699 versioned subdirectories

Branch history:
      1343 revisions
       416 days old
   first revision: Thu 2008-06-05 23:32:54 +0200
  latest revision: Fri 2009-07-24 18:59:42 +0200

Repository:
      6706 revisions

The "old" (source) repository is a shared repository (the branch directory there has no .bzr/repository); the shared repo contains the following .pack files:
-rw-r--r-- 1 jal jal 105855630 2009-07-06 12:21 01125177ff170f19a5350760681788d2.pack
-rw-r--r-- 1 jal jal 92508459 2009-07-15 10:13 044a6c79c07933c817bfaa5fe2aefaea.pack
-rw-r--r-- 1 jal jal 3693532 2009-07-15 17:56 134fa02d2b7f2428fef7e2b9d7bf0849.pack
-rw-r--r-- 1 jal jal 6706520 2009-07-08 11:39 3dd7fdd11a10663a6d9c7f05731a96d9.pack
-rw-r--r-- 1 jal jal 76707414 2009-07-27 12:26 3debfa32e17c04c3fce3cd1c7296b24a.pack
-rw-r--r-- 1 jal jal 8902487 2009-07-17 19:16 3f13b74bb82f12fcd01fb72b1cbe96bc.pack
-rw-r--r-- 1 jal jal 13907569 2009-07-06 14:05 4bfb7e9120c6d8d69565ef53f0bb1903.pack
-rw-r--r-- 1 jal jal 21023993 2009-07-06 12:52 74a9c2ca690d44152c80204c4b21b295.pack
-rw-r--r-- 1 jal jal 91094724 2009-07-06 13:57 a007b533ab4cdad5ba747a4842d1fe52.pack
-rw-r--r-- 1 jal jal 13687 2009-07-27 13:47 a1a66860581e95dd23dfdad253a10ac9.pack
-rw-r--r-- 1 jal jal 1782016 2009-07-27 15:59 c14de2d325ad282799f20fa110d15e80.pack
-rw-r--r-- 1 jal jal 22923037 2009-07-06 17:53 d0798f5a75b83fa45bad16ac864f65cb.pack
-rw-r--r-- 1 jal jal 6156224 2009-07-16 17:44 e201f81bcef47df7b627af92d6fc8a5a.pack
-rw-r--r-- 1 jal jal 22157957 2009-07-17 12:44 eec828493036f356003a05938bc734e0.pack
-rw-r--r-- 1 jal jal 1500511 2009-07-08 11:39 f9c37aa04ed5b5aa45f09a35e311079a.pack
-rw-r--r-- 1 jal jal 4255021 2009-07-27 14:54 fcfaf7d3e3ad5813bc77728e30f9abaf.pack
-rw-r--r-- 1 jal jal 1900754 2009-07-17 19:24 fdc2bec64f92070973c037da7fcaf4c0.pack
totalling 459MB. What is remarkable is that there are a few files with todays change date (2009-07-27); the largest is 76MB while the rest is very small.

After the bzr pack (which took around 2 minutes) in the shared repo (not the branch dir again) there was a single .pack file left with a size of 459MB.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

wrt #5:
I am using a smart server on 1.16.1 for the remote part, so no bzr+ssh or "dumb" protocol I'd guess.

As far as the .dll's are concerned: the one you mentioned seems to have been added only once; I did a bzr log --verbose on the CSharpProjects path and it contains the DLL only once. The entire CSharpProjects directory is around 650K so it is the largest DLL there.

The repository's size: the total size of repository + working tree in the standalone (non-shared copy) is 721MB. The .bzr directory there is 456MB. bzr info -v there is reported in comment #7.

wrt #6:
The changes would represent a week worth of work by about 7 developers. My repository was up-to-date with the master on 2009-7-19; I did a "bzr diff -r date:2009-07-18" and it creates a diff of 688KB but this of course does not contain the content of any binaries, sadly enough. Is there any option to create a full patch including the content of added binaries? This would at least give an indication on the size of the delta.

All of the data here was collected using bzr 1.17; I got it for free this morning when I updated my Ubuntu machine (I have the bzr repo added to sources.list). I upgraded to 1.16.1 the same way. I do not know what you mean by "the compiled extensions"; is there any way I can check?

For extra info, please be aware that not just pull is slow here; I have logged bug 405317 for the other, mostly local actions that hardly work anymore. And those problems are also present on the machines of my colleagues which usually have the bad luck of running Windows.

Revision history for this message
Wouter van Heyst (larstiq) wrote :

On Mon, Jul 27, 2009 at 04:55:39PM -0000, Frits Jalvingh wrote:
> wrt #5:
> I am using a smart server on 1.16.1 for the remote part, so no bzr+ssh or "dumb" protocol I'd guess.

Is that bzr:// ? bzr+ssh is also considered a smart transport fwiw.

Wouter van Heyst

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Yes, it is bzr://

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Just did another pull in another copy of the repo on a machine I have at home. This one was up-to-date till friday last week. It pulled some 125MB taking 27 minutes; the result in files was:
+N moca.database/src/nl/itris/viewpoint/db/wfl/
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowActivity.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowActivityCondition.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowActivityDefinition.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowActivityType.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowDefinition.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowDisplayCode.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowItrisCode.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowSignal.java
+N moca.database/src/nl/itris/viewpoint/db/wfl/WorkflowType.java
+N vp-domui-components/src/nl/itris/vp/components/basic/Images.java
+N vp-qd/src/nl/itris/qd/genhib/
+N vp-qd/src/nl/itris/qd/genhib/GenHib.java
+N vp-qd/src/nl/itris/qd/misc/DumpImages.java
 M CSharpProjects/CSharpProjects.suo
 M CSharpProjects/TapiProxy/Model/State/DeviceState.cs
 M CSharpProjects/TapiProxy/Program.cs
 M CSharpProjects/TapiProxy/config.xml
 M viewpoint/Oracle/pck/bae_prolongeren.pck
 M viewpoint/Oracle/pck/bem_bewonermutaties.pck
 M viewpoint/Oracle/pck/tda_pmo.pck
 M viewpoint/Oracle/views/v_isi_huurontvangsten.vw
 M viewpoint/WebContent/iRed/Bae/Facturatie/Bae_facturenEditPage.jsp
 M viewpoint/WebContent/iRed/Bae/Facturatie/Gen_cst_lookupForm.jsp
 M viewpoint/WebContent/iRed/Kon/Maintenance/Con_contractenEditPage.jsp
 M viewpoint/WebContent/iRed/Red/Template/Document_templatesEditPage.jsp
 M viewpoint/src/iRed/Bae/Facturatie/bae_betaaltermijnenBean.java
 M viewpoint/src/iRed/Bae/Facturatie/bae_facturenBean.java
 M viewpoint/src/iRed/Kon/Maintenance/con_contractenBean.java
 M viewpoint/src/iRed/Red/Template/document_templatesBean.java
 M viewpoint/src/nl/itris/vp/telephony/ConcurrentModificationRetry.java
 M viewpoint/src/nl/itris/vp/telephony/Transaction.java
 M vp-domui-components/src/nl/itris/vp/components/basic/VpTitleBar.java
 M vp-telephony/doc/documentatie_telefooncentrale_koppeling.doc

All of these files are small; by far the biggest is the .doc file at 222KB. I did another pull on this machine thursday last week which also transferred a huge amount of data (in the range of 76MB or more).

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I pulled two tiny commits from the main repo in the repository I created for the test above. I could not see how much data it transferred but it took a very long time; the changes to both workspace and repo were tiny: some very small files added and some changes. I am also very sure that this did not contain lots of "history", so no hidden huge files added and deleted in between. What is remarkable in this log is the huge amount of "xxxx byte part read" messages (39900 lines!!) taking a long time in all; if every message here also means a full round trip to the server this at least explains the bad performance here (end-to-end ack delay).
What is also strange is that it starts doing that after 113 seconds; what is it doing before that!?

Revision history for this message
Andrew Bennetts (spiv) wrote :

Frits Jalvingh wrote:
[...]
> between. What is remarkable in this log is the huge amount of "xxxx byte part
> read" messages (39900 lines!!) taking a long time in all; if every message
> here also means a full round trip to the server this at least explains the bad
> performance here (end-to-end ack delay).

Each of those lines corresponds to a single record in bzr's database,
more or less. There isn't a full round trip to receive each one, the server
simply streams them to client continuously. (You should be able to verify this
with tcpdump or similar pretty easily, if you're curious.)

The number of these lines indicates that there's either a lot of revisions
(possibly merged revisions), a lot of new file versions, or both. The size of
some of the byte parts (e.g. 681544) indicates that there have been some very
large files added (or very large modifications to files) somewhere in the
history.

> What is also strange is that it starts doing that after 113 seconds; what is
> it doing before that!?

The server is figuring out what to stream, basically. This phase should be
faster (and there are bugs and changes in progress to address that), but for
such large streams it's pretty common for this to take a couple of minutes.
This is another symptom of whatever causing such large transfers in the first
place.

-Andrew.

Revision history for this message
Andrew Bennetts (spiv) wrote :

Bug #405653, _knit_load_data_pyx not being used is possibly a factor here. It affects 1.17 but not 1.16.1 AIUI.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

> The number of these lines indicates that there's either a lot of revisions (possibly merged revisions),
> a lot of new file versions, or both.
Ok, I'm attaching here the log for the revisions that were added in this pull as displayed by bzr log -n0 --verbose -r1344..1346 (which are the new revisions). I edited out the email adresses 8-/. This was not a large merge in my view, and it most certainly did NOT contain large files. But an earlier commit did (see above; the ones in the CSharpProjects dir)! But these WERE already present in that repo! So it would seem that the code selects something like all revisions, not just the new ones.
The only thing "out of the ordinary" for this merge was that it contained a manual fix for another bazaar bug where it merged incorrectly (reported as bug 405301). This is probably unrelated but at least you know (yesterday was a very bad day as far as Bazaar bugs are concerned 8-/).

> for such large streams it's pretty common for this to take a couple of minutes.
I am not certain what you mean here? You should know that this is an existing repo which has not grown that much in size and has been used for over a year now; and most operations have always been very fast on it (without large data transfers). I noticed the large transfers when upgrading to 1.16, and that was also the version where all other commands got very, very slow.
It looks to me that "the stream is large" because something is wrong in the process which determines which new revisions are needed perhaps?

#14: Please remember that I AM using 1.17. My colleagues are using 1.16 and have much the same problems; but the logs and data you see here are 1.17.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

Andrew, it is unlikely that bug #405653 is causing this. It would only be an effect if he was using a *knit* format repository, as _load_data is used to parse .kndx files and nothing else.

And if you look at the log trace, he is using a KnitPackRepository, which rules out bug #405653.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I don't know whether it is related, but branching is no longer workable either. Locally branching the repository, not using any server or whatnot, now takes more than 2 hours (it is actually still running) using 1.17. While doing it Bazaar grows to use 1.6GB of resident memory.

The branch is done in a new, empty shared repo using init-repo --1.14-rich-root.

I did exactly the same branch using bzr 1.14; it completed in a few minutes (it was slow because the other bzr was still burning 100% CPU and doing IO).

Bazaar now:
- cannot commit in any reasonable time
- cannot pull in any reasonable time
- cannot push in any reasonable time
- cannot branch in any reasonable time
This version is completely unusable.

I am preparing to move back to 1.14.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Ok, I did some tests. I tested some things with different bzr versions, after removing all plugins and the rules file. This should mean that content filtering is disabled. For each version I created a new shared repo (init-repo --1.14-rich-root, --rich-root-pack for bzr 1.8). I then branched the problematic branch into here (full copy of history). Then I branch within that shared repository (this should just create a working tree).

Bazaar 1.17:
Initial branch (creating full history: 152 minutes 21 seconds(!!!) - 2 1/2 hour. The .bzr of the shared repo was 496MB; the working tree is 269MB.
Branch within shared repo: 3 minutes 19 seconds. Usually took max. 20 seconds.

Bazaar 1.14:
Initial branch (full history): 7 minutes 45 seconds. .bzr is 497MB, working tree is 269MB.
Branch within shared repo: 6 to 9 seconds.

Bazaar 1.8: (--rich-root-pack, from copy of the problem repo in rich-root-pack format)
Initial branch (full history): 8 minutes 23 seconds, .bzr is 557MB, working tree is 269MB
branch within shared repo: 6 seconds.

This is on a fast Linux workstation. The poor devils using windows can easily wait half an hour for the 3 minute branch on 1.17 above.

Revision history for this message
Martin Pool (mbp) wrote :

2009/8/5 Frits Jalvingh <email address hidden>:
> Ok, I did some tests. I tested some things with different bzr versions,
> after removing all plugins and the rules file. This should mean that
> content filtering is disabled. For each version I created a new shared
> repo (init-repo --1.14-rich-root, --rich-root-pack for bzr 1.8). I then
> branched the problematic branch into here (full copy of history). Then I
> branch within that shared repository (this should just create a working
> tree).

Thanks for measuring that. Could you now check with --lsprof.

I wonder if this is related to binary modules not being built?

What package did you use to install bzr?

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I installed bzr 1.17 from the ppa repository.

The lsprofs for 1.17 are attached; I'll attach the 1.14 ones after that.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

So interestingly both of the "branch in shared repo" times seem to be dominated by the _build_tree times.

And it would seem that the time is spent in the index level.

Specifically, on "bzr-1.17-secondbranch-3m22s" 75% of the time is spent in _btree_serializer_pyx._parse_leaf_lines

It is also being called 95k times.

If you look at the specific difference between this and the 'bzr.1.14" revision, you can see that in 1.14 we are copying the source files directly from the original tree, rather than extracting the texts from the repository.

Arguably this is incorrect behavior, because the source tree may have a different set of filters than the target, so it isn't necessarily valid to just copy the contents across. (This is at least part of what changed in 1.17 from 1.14, I would assume.)

You could do a few things to test:

1) Instead of "bzr init-repo --1.14..." use "bzr init-repo --no-trees --1.14..." and see how that differs. That would give you a timing run *without* building the working tree. (If you don't want to do the copy again, you can also just do touch .bzr/repository/no-working-trees

And then check to see how the "branch inside shared repository" time changes.

2) Remove the workingtree from the source branch, so that bzr-1.14 won't try to copy the source files from there.

That would give us a data point to see if bzr-1.14 was, in fact, faster at extracting the content than 1.17.

Hitting the index 95k times seems like a bit much for the given operations. Though I'll note that I've fixed at least one bug in this area (where when reading fulltexts from the repository, we would actually read the delta content 2 times, rather than just once).
From what I can tell, there are no more than 13k file texts (13,319 calls to get_bytes_as), but 95,392 calls into the index layer to get content out.

I'm also a bit surprised that it would seem that none of those repeated accesses are already in the cache. I might be reading that wrong. As there looks to be 191k calls to BTreeIndex.iter_entries() (as a generator, that means it is yielding 191k records.)

I'll try looking at the "first copy" cases, to see if that sheds any more light on the subject.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.8 KiB)

Just in my 10s quick look, I can see that we are spending 44% of the time in "_btree_serializer_px._parse_leaf_lines"

I can also see that part of the issue might be that we know use the StreamSource code path, rather than the Packer code path, though I don't have a good feel for why it would be that much slower.

In 1.14, 75% of the time (429k ticks) was spent in "create_pack_from_packs" (aka fetching via Packer)
versus in 1.17 90% of the time is spent in 'insert_record_stream' (which includes the time to read the source data as the source is a generator so isn't consumed until you start inserting the data.)

insert_record_stream is spending a *lot* of time in _has_key_from_parent_map, which I believe is the insert_record_stream code trying to see if the entry it is adding is already present. It turns out to be 52% of the overall time.

for some more side-by-side comparisons:
1) In your bzr-1.14 source, you don't even have the compiled extensions and we have
 37,837 calls to bzrlib._btree_serializer_py._parse_leaf_lines, consuming 95k ticks (2.5ticks each)
2) In bzr-1.17 we have
  469,030 calls to bzrlib._btree_serializer_pyx._parse_leaf_lines consuming 8.8M ticks (18.8 ticks each)

So while we have 12x more calls, they are also costing 7.5x more each, for a total increase of 92x at the index layer while it is trying to read the btree pages.

Now, I don't know how many entries you have in your btree, to have a good idea of how many pages we *should* be reading, or if it is enough pages to be causing us to overflow (and then thrash) a cache.

It is possible, I suppose, that the compiled parsing code is actually slower than the plain python code. That would be easy to test by just deleting the _btree_serializer_pyx.* files (.so on Linux, .pyd on Windows).

The 18x, though, could just be that the function is getting called so much more often and it is causing the python Garbage Collector to run while it is in the middle of processing. (it is a function that would be allocating a lot of objects, so it is more likely to have the GC get run.)

I *am* seeing that it is calling BTreeIndex._iter_smallest, which is a function that is generally only triggered when we have >100k records being written.

And, in fact, I'm seeing an "iter_all_entries" returning 1.4M records. Though I don' t have a feel for whether that is genuine, or is just because we are calling it more than once.

I *can* say that if you have 1.4M keys being transferred, then we will have a lot of index thrashing during insert_record_stream. Because we will try to cap memory consumption and flush 100k records to disk. Which we then query later during a followup insert, to check if the thing we are inserting now was inserted earlier.

I suppose one quick check is to give us the output of:
 head -n 5 .bzr/repository/indices/*

That would give us an indication of how many real records you have stored in the repository. (Which helps track down if you really do have ~1M records, or whether we are just running over the same records over and over again.)

Now, I think I'm seeing 7 calls to _spill_mem_keys_to_disk in the 1.14 callgrind. Which would hint at 700,000 keys being copied. And *that*...

Read more...

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

heads:
==> 5330e96e278ff9ea7408111227e0c8b1.iix <==
B+Tree Graph Index 2
node_ref_lists=2
key_elements=1
len=3
row_lengths=1

==> 5330e96e278ff9ea7408111227e0c8b1.rix <==
B+Tree Graph Index 2
node_ref_lists=1
key_elements=1
len=3
row_lengths=1

==> 5330e96e278ff9ea7408111227e0c8b1.six <==
B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=0
row_lengths=

==> 5330e96e278ff9ea7408111227e0c8b1.tix <==
B+Tree Graph Index 2
node_ref_lists=2
key_elements=2
len=19
row_lengths=1

==> 99a0078750de654f7eee8188ec397f4a.iix <==
B+Tree Graph Index 2
node_ref_lists=2
key_elements=1
len=6789
row_lengths=1,85

==> 99a0078750de654f7eee8188ec397f4a.rix <==
B+Tree Graph Index 2
node_ref_lists=1
key_elements=1
len=6789
row_lengths=1,71

==> 99a0078750de654f7eee8188ec397f4a.six <==
B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=0
row_lengths=

==> 99a0078750de654f7eee8188ec397f4a.tix <==
B+Tree Graph Index 2
node_ref_lists=2
key_elements=2
len=785402
row_lengths=1,83,12950

I will do the patch from #26 in 1.17; and do the full copy again in an empty repo, with trees (like before).

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> heads:

> ==> 5330e96e278ff9ea7408111227e0c8b1.tix <==
> B+Tree Graph Index 2
> node_ref_lists=2
> key_elements=2
> len=19
> row_lengths=1
>

> ==> 99a0078750de654f7eee8188ec397f4a.tix <==
> B+Tree Graph Index 2
> node_ref_lists=2
> key_elements=2
> len=785402
> row_lengths=1,83,12950
>
> I will do the patch from #26 in 1.17; and do the full copy again in an
> empty repo, with trees (like before).
>

so 19 + 785,402 = 785,421 or almost 800k nodes

Also, for indexes, we keep a cache of 1,000 "nodes". And you can see
that this last one is >10,000 nodes, which means our cache hit rate is
going to be quite low.

If you want to try something else, you could do:

=== modified file 'bzrlib/btree_index.py'
- --- bzrlib/btree_index.py 2009-07-01 10:51:47 +0000
+++ bzrlib/btree_index.py 2009-08-06 01:46:03 +0000
@@ -46,7 +46,7 @@
 _PAGE_SIZE = 4096

 # 4K per page: 4MB - 1000 entries
- -_NODE_CACHE_SIZE = 1000
+_NODE_CACHE_SIZE = 20000

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkp6NfUACgkQJdeBCYSNAAMqEQCgqXYhdlt4rbExskc5fy+L1RMl
lqcAoJXCW0W/52XgywuSolHtlkd/zYB1
=PZhe
-----END PGP SIGNATURE-----

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I did the oneliner in #26 and did a full copy into empty repo again, with trees. It took 207 minutes 50 seconds this time. I am attaching the lsprof from that.

I will do the patch in #28 next; so bzr will have 3 patches now.

You mention caching a lot; while bzr is running it grows to something like 1.9 GB resident memory, so it is at least keeping *something* in memory.

I have a few questions in the meantime:
- What is being kept in the indices?
- What do the 800K nodes represent?

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

#28: Ok; the cache size change had a dramatic effect: the full copy branch now completed in 21 minutes 14 seconds.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

#25: Initial branch using the #28 and #26 fix AND with --no-trees took 15 minutes 23 seconds
Second branch took 0.7 seconds (since it does not have to do anything ;-)

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

#25: bzr 1.14 with tree in source branch took 7.8 seconds.
After removing all working tree files from the source branch creating the new branch, including working tree, took 5 minutes 6 seconds. So it would indeed seem that retrieving the sources from the repository is very slow (compared with the direct copy).

Revision history for this message
Matthew Fuller (fullermd) wrote :

> (If you don't want to do the copy again, you can also just do touch
> .bzr/repository/no-working-trees

We have a reconfigure option for that.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Pull and push performance is still abyssmal. I'm currently waiting for a trivial push containing a single small commit (the repo is one commit behind). It has transferred 15MB and is now pausing for no appearent reason; I'm already waiting for 15 minutes. The amount of data changed is very, very small and nowhere near the 15MB transferred.
Yesterday I attempted a pull of a week's worth of commits; it took 4 hours and transferred some 300MB of data - the evening was a total loss. My colleagues work hard but *not* that hard 8-/.

Actually, appearently the pull will never complete. It transferred 15MB then hangs. I did an strace on the process; it hangs on a recvfrom(4); FD(4) is the TCP/IP connection to the smart server (bzr 14087 jal 4u IPv4 884639 TCP 172.22.128.28:41612->192.168.0.110:4155 (ESTABLISHED)]. I did an strace on that smart server too to see what it was doing:
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAKE, 1) = 1
poll([{fd=4, events=POLLIN}], 1, 1000) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAKE, 1) = 1
poll([{fd=4, events=POLLIN}], 1, 1000) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAKE, 1) = 1
poll([{fd=4, events=POLLIN}], 1, 1000) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x937a5a8, FUTEX_WAIT, 0, NULL) = 0
futex(0x937a5a8, FUTEX_WAKE, 1) = 1
poll([{fd=4, events=POLLIN}], 1, 1000) = 0

so essentially it seems to check if data is arriving and it's doing nothing at all. I checked the target directory and it is locked by that smart server process; and it was locked by my push (I checked that beforehand). It seems like a nice deadly embrace: the server is waiting for data from the client and vice versa.

I am ready to start crying now.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.5 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> Pull and push performance is still abyssmal. I'm currently waiting for a trivial push containing a single small commit (the repo is one commit behind). It has transferred 15MB and is now pausing for no appearent reason; I'm already waiting for 15 minutes. The amount of data changed is very, very small and nowhere near the 15MB transferred.
> Yesterday I attempted a pull of a week's worth of commits; it took 4 hours and transferred some 300MB of data - the evening was a total loss. My colleagues work hard but *not* that hard 8-/.
>
...

> so essentially it seems to check if data is arriving and it's doing
> nothing at all. I checked the target directory and it is locked by that
> smart server process; and it was locked by my push (I checked that
> beforehand). It seems like a nice deadly embrace: the server is waiting
> for data from the client and vice versa.
>
> I am ready to start crying now.
>

There should be no case that we take out a write lock on the source
repository in order to read data.

I really don't know why the remote server would be trying to take out a
lock on your local machine. I'll try to investigate the callgrind, but I
can tell you already that 91% of the time is spent in "UNTIL_NO_EINTR"
which is just waiting for data on a socket.

Trying to pull that out, we end up with about 1.8M ticks in
"until_no_eintr", and that leaves ~94k ticks actually doing processing.
I *think* a tick is 0.1ms, so that is 94s or 1m34s sending data.

Now *what* is it doing during that time? It is possible that your push
is triggering an autopack on the server. And because of the size of your
server and memory available (and the earlier caching concern) the server
is basically thrashing on repacking the repository.

One way around it is to go to the server and manually issue "bzr pack".
Which forces a manual one, and then should decrease the frequency with
which you will see future autopacks. With 14k revisions, the next major
pack you should see shouldn't occur until after 20k revisions.

The 785k entries I mentioned earlier. That is the number of "unique
content texts" present in your repository.

So say you had 100k files, and each one was modified (and committed) 7
times. That would be 700k entries (text keys is our internal term for it).

You mentioned that this repository:
1) Used to have a lot more than ~10k files
2) Had multiple independent ancestries pulled together.
3) Has about 10-20k revisions.

I'm not positive about the specifics, but if you have that many text
keys, and only 10k revisions, that means that you average almost 80
files changing for every commit. Which is generally quite surprising.
Almost universally across projects the average is <10, and quite often
it is down around 5 or so. (This tends to happen because Humans don't
usually work on 80-files simultaneously on *every* commit.)

I could be tremendously wrong on how many revisions you have, but if the
'head -n5' result you posted is accurate, I'm only seeing 3+6789
revisions, which would imply 115 changes per commit.

If some of this bloat is due to files that you have since removed from
the t...

Read more...

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

> There should be no case that we take out a write lock on the source repository in order to read data.
I think you misunderstood; it is a push TO the target, and the target server has the lock. That is the server that the push data is being written to.

> 91% of the time is spent in "UNTIL_NO_EINTR" which is just waiting for data on a socket.
This is what I see: the client is deadlocked waiting for the server to sent data (it is in a recv).

> so that is 94s or 1m34s sending data.
That seems correct; it started to transmit 15MB of data to the server and then it blocked. Sending that data will take that time, the server is on a remote link. So the server was not packing at that time probably. (There is some huge problem here where bzr sends or receives way too much data for push/pull; the 200+ MB pull is not an exception but was seen a few times before. I get the impression that it reloads revisions it has already seen from the server time and time again.)

When the client (doing the push) was blocked (after the 2min) I checked the server and did the strace there; that server did not do anything at all. I tried again using bzr+ssh protocol which actually starts a bzr serve remote using an ssh pipe; that one hung at exactly the same time and doing an strace there revealed that it did completely nothing- it hung in a recv like the client (it did not poll or anything and did NO IO at all, the strace showed a single line for 10 minutes, then I cancelled). So this is 100% a protocol error somewhere - the server is *not* packing at all but blocking on recv.

Ok, back to the other stuff:
> You mentioned that this repository:
> 1) Used to have a lot more than ~10k files
No, when experimenting with bzr I found out that the #of files was a huge problem under Windows. So before I created this repository (from scratch, no import or whatever) I removed all files *before* creating the initial import. It has never seen the bigger number of files.

> 2) Had multiple independent ancestries pulled together.
It has been "merged with history" with a single other independent branch. That branch's bzr info (when in a standalone repo, zeroes removed) says:
In the working tree:
      1369 unchanged
       168 versioned subdirectories
Branch history:
       190 revisions
       127 days old
Repository:
       202 revisions

> 3) Has about 10-20k revisions.
I don't know if we are talking about the same thing, but bzr info -v for this repo tells me:
Branch history:
      1380 revisions
       426 days old
Repository:
      7265 revisions

> that means that you average almost 80 files changing for every commit
Well we do not work that hard at all 8-), actually your metrics sound very much like what we do (5..10 files average) - it is nowhere close to 80. There have been only a few (2 or so, but including merges) that changed 3143 files in one commit. But those were exceptional ones (CRLF fixes and source formatting fixes).

So all in all I do not understand the metrics we see here. Is there a way to associate those 100K text changes with their commits so that we can see if this is actually the case?

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

In the meantime I upgraded the server from 1.16.1 to 1.17; this at least allowed the push that was blocking earlier to continue.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I'm sorry, I lied a little.. There are two independent repo's merged with history. The other one's bzr info says:
In the working tree:
       805 unchanged
       130 versioned subdirectories
Branch history:
        15 revisions
       127 days old
Repository:
        15 revisions

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> I'm sorry, I lied a little.. There are two independent repo's merged with history. The other one's bzr info says:
> In the working tree:
> 805 unchanged
> 130 versioned subdirectories
> Branch history:
> 15 revisions
> 127 days old
> Repository:
> 15 revisions
>

Yeah, still tiny. With only 15 revisions, it would require 53k files to
be present and modified *every* revision in order to get 800k changes.

Regardless, the "15 revisions" given in the repository are also included
in the other repository's ~7000 revisions that were listed.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkp7PoEACgkQJdeBCYSNAAOCbQCgpwcQN93CHx9W+Dcmq+/Ou71H
acoAn28UM0N4Ykp+8Pfk1u4zh8CUhBpe
=eo4G
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (5.4 KiB)

Frits Jalvingh wrote:
>> There should be no case that we take out a write lock on the source repository in order to read data.
> I think you misunderstood; it is a push TO the target, and the target server has the lock. That is the server that the push data is being written to.
>

So the server has a lock on the repository you are pushing to. Fair enough.

>> 91% of the time is spent in "UNTIL_NO_EINTR" which is just waiting for data on a socket.
> This is what I see: the client is deadlocked waiting for the server to sent data (it is in a recv).
>
>> so that is 94s or 1m34s sending data.
> That seems correct; it started to transmit 15MB of data to the server and then it blocked. Sending that data will take that time, the server is on a remote link. So the server was not packing at that time probably. (There is some huge problem here where bzr sends or receives way too much data for push/pull; the 200+ MB pull is not an exception but was seen a few times before. I get the impression that it reloads revisions it has already seen from the server time and time again.)
>
> When the client (doing the push) was blocked (after the 2min) I checked
> the server and did the strace there; that server did not do anything at
> all. I tried again using bzr+ssh protocol which actually starts a bzr
> serve remote using an ssh pipe; that one hung at exactly the same time
> and doing an strace there revealed that it did completely nothing- it
> hung in a recv like the client (it did not poll or anything and did NO
> IO at all, the strace showed a single line for 10 minutes, then I
> cancelled). So this is 100% a protocol error somewhere - the server is
> *not* packing at all but blocking on recv.

Hmm... I was certainly starting to lean that way. It could be something
about 1.17 not being 100% compliant with 1.16.1. I don't know off-hand
any reason for that, but it is certainly a possibility.

It would have been nice if we could have gotten a backtrace of what both
the client and server were trying to do at the moment they were stalled.

For future reference, you can send SIGQUIT (Ctrl+\) to a running
process, and it should drop you into a (pdb) debugging shell. You can
use something like "bt" at that point to print the backtrace.

(If you are on Windows, then in the next release [1.18] you should be
able to do Ctrl+Pause, aka SIGBREAK. since SIGQUIT doesn't exist on
Windows.)

>
> Ok, back to the other stuff:
>> You mentioned that this repository:
>> 1) Used to have a lot more than ~10k files
> No, when experimenting with bzr I found out that the #of files was a huge problem under Windows. So before I created this repository (from scratch, no import or whatever) I removed all files *before* creating the initial import. It has never seen the bigger number of files.
>

k

>> 2) Had multiple independent ancestries pulled together.
> It has been "merged with history" with a single other independent branch. That branch's bzr info (when in a standalone repo, zeroes removed) says:
> In the working tree:
> 1369 unchanged
> 168 versioned subdirectories
> Branch history:
> 190 revisions
> 127 days old
> Repository:
> 202 revisions...

Read more...

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

> For future reference, you can send SIGQUIT (Ctrl+\) to a running....
Ok, thanks; that is good to know. I'll do it next time.

> The other really important think to check is if the local and remote repositories are exactly the same format.
Yes, they are; I have checked it several times already.

I will report back the log thing later; it will take about 1 hour to complete.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Ok. The log generated by bzr log -v -n1 -r 1..-1 results in a list from revision 1380 down to 1; the log file is 50249 lines long. That is both files and the header for each change. Would this report every file changed since the start? Will each toplevel commit report the files changed for all merges inside it?

Revision history for this message
Martin Pool (mbp) wrote :

istr spiv recently landed a change to do with a bug in buffering hpss
traffic, and it seems slightly possible this relates to it. I can't
see it in NEWS though.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin: Hmm, perhaps, although the symptoms described here sound much more severe than that would have addressed I think. I'm also fairly sure that fix is in both 1.16 and 1.17, so my guess is that isn't the issue.

A -Dhpss log of the slow push might help.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

This call has a -Dhpss log of a more-than-slow pull at the top. It is both push and pull that are transferring enormous amounts of data. I willl try to add a push -Dhpss log (I need to remember to add the -Dhpss option which I forget all the time).

Revision history for this message
Martin Pool (mbp) wrote :

2009/8/7 Frits Jalvingh <email address hidden>:
> This call has a -Dhpss log of a more-than-slow pull at the top. It is
> both push and pull that are transferring enormous amounts of data. I
> willl try to add a push -Dhpss log (I need to remember to add the -Dhpss
> option which I forget all the time).

You can put it in the debug_flags option in bazaar.conf if you like.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

FYI:
I restored the repository from before the move to 1.16 and the switch to --1.14-rich-root and did some comparisons.
The size of the .bzr shared repository at that point (2009-07-03) was 219MB; this repo was about a year old.
The new .bzr repository in --1.14-rich-root is currently 650+MB(!) - an increase of 431MB in a month.
The revision count in the entire repo has increased by 400; the branch has 88 extra revisions (from 1292 to 1380).

As a quick compare I did a bzr diff -r 1292 and got a 53MB file. Sadly though I cannot find a way to make bzr generate a full diff so this is lacking some binary files. It also is just a small indication as history is not present of course. Is there a way to get the /full/ delta from a commit somehow? Then at least I could go hunting for the big one - if there is one (which I do not really believe, actually).

I reverted to bzr 1.14.1 and branched the original branch (1292) into a --rich-root-pack repo, then pulled the new version repo in it and did a pack. This also results in a big repo of 570MB. So there really seems to be something very big hiding in there.

There might be some relation between this unprecedented huge growth of the repo in a month's time with the large pulls.

I will try to see if I can do anything with fast-export to see what goes wrong where.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

None of the tools to manipulate a repository (rebase, fast-export, fast-import) work at all, so I did the following:
1. Create a standalone branch to the last version that was OK (1292). This causes a .bzr of 220MB.
2. Now pull the higher revisions one by one using pull -r [next] and size the .bzr directory.
This clearly exhibits the problematic pulls at certain revisions. I will add the console history for this action; I stopped testing at revision 1322 because the trend is clear.
Highlights from the log:
pull 1307: repository grows from 234MB to 249MB for a two file fix without any merge at all; the diff for this is 1764 bytes.
pull 1311: from 249 to 275MB for a change containing two small files again
pull 1318: from 276 to 398MB; this actually added 17MB of .jar files in a single commit. The pull took a long time.
pull 1320: from 398 to 414MB, pull took a long time
pull 1322: from 415 to 585MB; the pull took very long and was busy for a long time with copying "content texts". As can be seen from the console log the actual size of the workspace itself grew with 1MB for this pull while the repository grew 170MB..
I stopped at pull 1323 ending in a repository size of 586M and total dir size (repo+working tree) of 850M.

At this point I did a "bzr pack" on this repository. This helpfully created a .bzr repo of 791MB; I assumed I could remove the "obsolete_packs" (395MB) directory; this resulted in a repository of 397MB. Still very big when I see what has changed.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

I created a new call 411349 for the deadlock problem.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

bzr log -v -r1321..1322 --include-merges: for the pull -r 1322; this pull resulted in a 170MB growth of the repo while the working tree grew 1MB.

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

info in the shared repo, containing multiple branches:

jal@odeon:~/bzr$ bzr repository-details
Commits: 7328
                      Raw % Compressed % Objects
Revisions: 3928 KiB 0% 2885 KiB 0% 7328
Inventories: 21695982 KiB 70% 108661 KiB 19% 7328
Texts: 9040343 KiB 29% 453880 KiB 80% 1017118
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 30740255 KiB 100% 565427 KiB 100% 1031774

In a standalone repository that was pulled -r 1322:
jal@odeon:~/vp-1322$ bzr repository-details
Commits: 6622
                      Raw % Compressed % Objects
Revisions: 3573 KiB 0% 2617 KiB 0% 6622
Inventories: 19979626 KiB 84% 55881 KiB 16% 6622
Texts: 3722632 KiB 15% 288029 KiB 83% 388846
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 23705832 KiB 100% 346527 KiB 100% 402090

Standalone repo up to the current commit level:
jal@odeon:~/vpstandalone$ bzr repository-details
Commits: 6812
                      Raw % Compressed % Objects
Revisions: 3675 KiB 0% 2691 KiB 0% 6812
Inventories: 20700521 KiB 73% 90470 KiB 19% 6812
Texts: 7292144 KiB 26% 370923 KiB 79% 812257
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 27996341 KiB 100% 464086 KiB 100% 825881

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.3 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> info in the shared repo, containing multiple branches:
>
> jal@odeon:~/bzr$ bzr repository-details
> Commits: 7328
> Raw % Compressed % Objects
> Revisions: 3928 KiB 0% 2885 KiB 0% 7328
> Inventories: 21695982 KiB 70% 108661 KiB 19% 7328
> Texts: 9040343 KiB 29% 453880 KiB 80% 1017118

^- The raw size of inventories is expected to be quite large in this
format. (it gets a lot better in --2a).

You can see, though that the space on disk is approx 110MB of inventory
metadata and 450MB of file content. You can also see that it is storing
~1 Million file texts.

> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 30740255 KiB 100% 565427 KiB 100% 1031774
>
> In a standalone repository that was pulled -r 1322:
> jal@odeon:~/vp-1322$ bzr repository-details
> Commits: 6622
> Raw % Compressed % Objects
> Revisions: 3573 KiB 0% 2617 KiB 0% 6622
> Inventories: 19979626 KiB 84% 55881 KiB 16% 6622
> Texts: 3722632 KiB 15% 288029 KiB 83% 388846
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 23705832 KiB 100% 346527 KiB 100% 402090
>
>
> Standalone repo up to the current commit level:
> jal@odeon:~/vpstandalone$ bzr repository-details
> Commits: 6812
> Raw % Compressed % Objects
> Revisions: 3675 KiB 0% 2691 KiB 0% 6812
> Inventories: 20700521 KiB 73% 90470 KiB 19% 6812
> Texts: 7292144 KiB 26% 370923 KiB 79% 812257
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 27996341 KiB 100% 464086 KiB 100% 825881
>

^- And after branching we still have 812,000 file texts.

I don't know why your original repo shows about 200k more file texts,
and 1.1k revisions than the "repo up to the current commit level".

However, the numbers are at least consistent, even if they are surprising.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://...

Read more...

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :

Ok, the huge amount of texts is horribly clear. Just for info, the repository one month ago, before moving to bzr 1.16 (revno 1292) has the following stats:

jal@mabillon:~/vp-1292$ bzr repository-details
Commits: 6408
                      Raw % Compressed % Objects
Revisions: 3463 KiB 0% 2534 KiB 1% 6408
Inventories: 19270232 KiB 95% 25390 KiB 11% 6408
Texts: 848675 KiB 4% 185519 KiB 86% 51993
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 20122370 KiB 100% 213445 KiB 100% 64809

So somehow the #of texts has increased from 51993 to a horrible 812257.. Quite an increase. This increase seems to be in some 400 revisions so: (812257-51993)/400=1900 delta files per revision if i'm correct? Quite steep, that.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.2 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frits Jalvingh wrote:
> Ok, the huge amount of texts is horribly clear. Just for info, the
> repository one month ago, before moving to bzr 1.16 (revno 1292) has the
> following stats:
>
> jal@mabillon:~/vp-1292$ bzr repository-details
> Commits: 6408
> Raw % Compressed % Objects
> Revisions: 3463 KiB 0% 2534 KiB 1% 6408
> Inventories: 19270232 KiB 95% 25390 KiB 11% 6408
> Texts: 848675 KiB 4% 185519 KiB 86% 51993
> Signatures: 0 KiB 0% 0 KiB 0% 0
> Total: 20122370 KiB 100% 213445 KiB 100% 64809
>
> So somehow the #of texts has increased from 51993 to a horrible
> 812257.. Quite an increase. This increase seems to be in some 400
> revisions so: (812257-51993)/400=1900 delta files per revision if i'm
> correct? Quite steep, that.
>

I don't have any idea why 1.16 would be causing something like this. Are
you sure something didn't change inside your process?

Anyway, yes. The closest from your earlier postings is:
> Commits: 6622
> Raw % Compressed % Objects
> Revisions: 3573 KiB 0% 2617 KiB 0% 6622
> Inventories: 19979626 KiB 84% 55881 KiB 16% 6622
> Texts: 3722632 KiB 15% 288029 KiB 83% 388846

so 6622 - 6408 = 214 revisions, and 388,846 - 51,993 = 336,853 / 214 =
1,574 changes per revision.

And then on to:
> Commits: 7328
> Raw % Compressed % Objects
> Revisions: 3928 KiB 0% 2885 KiB 0% 7328
> Inventories: 21695982 KiB 70% 108661 KiB 19% 7328
> Texts: 9040343 KiB 29% 453880 KiB 80% 1017118

(1017118 - 388846) / (7328 - 6622) = 628,272 / 706 = 889

Also, you can note that your "inventories" size went from:
19,270,232 to 19,979,626 to 21,696,982

Which doesn't look like it is increasing by much overall. (So it
wouldn't seem that you are adding a lot of new files.) But the
"compressed" size went from 25MB to 108MB, or almost 4x growth,
indicating that it thinks you have a lot of new entries.

Now, it is also possible that we have a bug. For example, what if
'heads()' was giving incorrect information, and causing us to generate a
new revision for texts that didn't actually need it. That would also
point to a lot of the other issues you've been seeing.

Certainly it is worth investigating. Though I'd also be curious as to
why we haven't been seeing this with other projects.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqA7z...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-08-11 at 04:10 +0000, John A Meinel wrote:
>
> Now, it is also possible that we have a bug. For example, what if
> 'heads()' was giving incorrect information, and causing us to generate
> a
> new revision for texts that didn't actually need it. That would also
> point to a lot of the other issues you've been seeing.
>
> Certainly it is worth investigating. Though I'd also be curious as to
> why we haven't been seeing this with other projects.

I wonder if the repeated-merge-from-branch pattern is implicated.

We do have tests for not making new heads we don't need, but its
possible there is a bug.

I'm not sure if we'd show any output when that happens..

-Rob

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Download full text (4.5 KiB)

I was forced by our management to do something quick because all 15 developers could hardly get
any work done. I decided to fully recreate our repositories without using anything from the old
ones, loosing all history in the process. We did this because:
* our old repo was appearently gravely damaged in some way
* we tried for more than a week to get some idea of how to fix it but there was no appearent end date for a fix
* Performance went from 30 minutes for a pull to over 2hours for one in a week - not a good trend
* Commit performance also went downhill very fast
* I guessed that we might find the problem, but it would probably be too much to ask for a "fix program" for
  the failed repository. So we would end up creating new ones anyway.
* Since the impression arose that we were the only ones having this problem we it could be that it
  came from something special we did; the only special things were the conversion to 1.14-rich-root and
  a merge-with-history of another, independent branch which could only be repeated a few times; after
  that it hit bug 364305. We hoped one of these caused the corruption; by starting anew without doing any
  of these we hoped the problem was gone.

We were wrong, sadly enough.
At commit 29 I noticed a very long pull, transferring 15MB again. I had not checked for 2 days because there
was a huge backlog of work to complete. But at this point (commit 29) I already have >100.000 texts in
the repository again.

I created the new repositories by just creating an empty, new repository
and adding the files as an initial commit. I started with our oldest software release (3.1) and created an
initial branch from that; I then branched that one to make the 3.2 branch; did a file sync on it so
the 3.2 branch got all of the files belonging to it and commited that (in effect creating 3.2 as a
single big delta from 3.1). This the same for 3.3 and the new trunk.

So I ended op with a new repository without *any* data from the previous one, and commit 4 or so was
the commit which created the trunk. At that point repository-details reported:
jal@odeon:~/vp-trunk-4/vp-trunk$ bzr repository-details ..
Commits: 4
                      Raw % Compressed % Objects
Revisions: 1 KiB 0% 1 KiB 0% 4
Inventories: 14099 KiB 4% 832 KiB 0% 4
Texts: 293404 KiB 95% 123475 KiB 99% 18320
Signatures: 0 KiB 0% 0 KiB 0% 0
Total: 307506 KiB 100% 124309 KiB 100% 18328
So 18K of texts.

The worst update after that was revision 26. Revision 25's repo-details was:
jal@odeon:~/test/vp$ bzr pull -r 25 ~/bzr/vp-split/vp
Now on revision 25.
jal@odeon:~/test/vp$ bzr repository-details ..
Commits: 39
                      Raw % Compressed % Objects
Revisions: 19 KiB 0% 14 KiB 0% 39
Inventories: 147345 KiB 27% 2956 KiB 2% ...

Read more...

Revision history for this message
Frits Jalvingh (fjalvingh) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

see bug #415508 as sort of the root cause of this.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Bazaar because there has been no activity for 60 days.]

Changed in bzr:
status: Incomplete → Expired
To post a comment you must log in.