Large Repository.insert_stream_1.19 call when fixing up inventory-for-parents.

Bug #534724 reported by William Grant
This bug affects 1 person
Affects   Status     Importance  Assigned to  Milestone
Bazaar    Confirmed  Medium      Unassigned

Bug Description

Pushing new revisions (even empty ones) to branches on LP is slow and uploads about a megabyte each time. Attached is a log from this session:

wgrant@magrathea:~/launchpad/lp-branches$ bzr branch devel devel-push-test
Branched 10456 revision(s).
wgrant@magrathea:~/launchpad/lp-branches$ cd devel-push-test/
wgrant@magrathea:~/launchpad/lp-branches/devel-push-test$ time bzr push -Dhpss
Using saved push location: lp:~wgrant/launchpad/devel-push-test
Using default stacking branch /~launchpad-pqm/launchpad/db-devel at lp-66527888:///~wgrant/launchpad
Created new stacked branch referring to /~launchpad-pqm/launchpad/db-devel.
HPSS calls: 27 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)

real 1m42.351s
user 0m2.324s
sys 0m0.524s
wgrant@magrathea:~/launchpad/lp-branches/devel-push-test$ bzr ci --unchanged -m "Empty commit to test."
Committing to: /home/wgrant/launchpad/lp-branches/devel-push-test/
Committed revision 10457.
wgrant@magrathea:~/launchpad/lp-branches/devel-push-test$ time bzr push -Dhpss
Using saved push location: lp:~wgrant/launchpad/devel-push-test
Pushed up to revision 10457.
HPSS calls: 20 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)

real 1m16.740s
user 0m1.204s
sys 0m0.188s

Martin Pool (mbp) wrote : Re: Large Repository.insert_stream_1.19 call when pushing new stacked branch with no new revisions

There are various places for improvement here (including the 10s ssh startup), but this title describes the most important one.

summary: - Pushing an LP branch with an empty commit is slow and bandwidth-hungry
+ Large Repository.insert_stream_1.19 call when pushing new stacked branch
+ with no new revisions
Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
tags: added: hpss performance stacking
John A Meinel (jameinel) wrote : Re: [Bug 534724] Re: Large Repository.insert_stream_1.19 call when pushing new stacked branch with no new revisions

Martin Pool wrote:
> there are various places for improvement here (including the 10s ssh
> startup) but this title describes the most important one.

I don't know if that is particularly bad. The really bad thing is the
"we don't trust the remote server will tell us about missing
inventories, so we *always* transmit the complete set of parent
inventories for any push to a stacked branch."

As an example:

1) Trunk is at rev 10
2) Push up a rev 11
   a) copies the new texts in 11
   b) the new inventory pages in 11
   c) and all of the inventory pages for 10
3) Push up a rev 12
   a) copies the new texts in 12
   b) the new inventory pages for 12
   c) and all inventory pages in 11 (which are already present)
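
The steps above can be sketched as a toy model. This is not bzr code: the page ids, the `INVENTORIES` table, and the `push` helper are all hypothetical, and the `probe_first` flag stands in for the extra round trip discussed in this thread, which bzr does not currently do.

```python
# Toy model: each inventory is a set of page ids; unchanged pages are
# shared between adjacent revisions. All names here are hypothetical.
INVENTORIES = {
    "rev10": {"a0", "b0", "c0"},
    "rev11": {"a0", "b0", "c1"},  # one page changed relative to rev10
    "rev12": {"a0", "b1", "c1"},  # one page changed relative to rev11
}


def push(inventories, server_pages, parent, child, probe_first=False):
    """Return (pages sent, pages sent that the server already had)."""
    new_pages = inventories[child] - inventories[parent]
    parent_pages = inventories[parent]
    if probe_first:
        # Hypothetical alternative: ask the server what it has first,
        # at the cost of one more round trip.
        parent_pages = parent_pages - server_pages
    sent = new_pages | parent_pages
    redundant = sent & server_pages
    server_pages |= sent  # the server now stores everything it received
    return sent, redundant


stacked = set()  # a fresh stacked branch holds no inventory pages yet
sent11, redundant11 = push(INVENTORIES, stacked, "rev10", "rev11")
sent12, redundant12 = push(INVENTORIES, stacked, "rev11", "rev12")
print(len(sent11), len(redundant11))  # 4 pages sent, 0 redundant
print(len(sent12), len(redundant12))  # 4 pages sent, 3 redundant
```

In this model the second push resends three pages the server already holds; with `probe_first=True` only the single new page would be sent.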

I mentioned this in the past, and the pushback was that Andrew and/or
Robert didn't want to add another round trip to probe for what
inventories are actually present on the remote host. (It would probably
require direct VFS access, since I don't think we have an index RPC.)

John
=:->


John A Meinel (jameinel) wrote :

Martin Pool wrote:
> there are various places for improvement here (including the 10s ssh
> startup) but this title describes the most important one.
>
> ** Summary changed:
>
> - Pushing an LP branch with an empty commit is slow and bandwidth-hungry
> + Large Repository.insert_stream_1.19 call when pushing new stacked branch with no new revisions
>

Oh, I'm pretty sure William was saying that there *was* a single commit,
it just didn't introduce any new content other than the new commit id +
revision.

John
=:->


Martin Pool (mbp)
summary: Large Repository.insert_stream_1.19 call when pushing new stacked branch
- with no new revisions
+ with just one new revision
summary: - Large Repository.insert_stream_1.19 call when pushing new stacked branch
- with just one new revision
+ Large Repository.insert_stream_1.19 call when fixing up inventory-for-
+ parents.
Robert Collins (lifeless) wrote :

So the crucial things here are:
 - we want every 'revision in a repository' to be able to produce a delta-stream for it, for fetching, so that we know what texts to copy, and what texts to error on if they are absent.
 - that means we depend on having *enough* inventory-data-for-parents present to do a set difference against the parent revisions (and ghost revisions mean we have more file texts present to compensate).
 - we don't strictly need to require that a repository have the full inventory for a revision that is present - we can expect stacking to take care of that (stacking just can't be used during 'fetch').
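
As a rough illustration of the set difference described above (not bzr's actual implementation), an inventory can be modelled as a mapping from file id to the revision that last changed that file's text. The `texts_to_copy` helper and the sample data are hypothetical.

```python
def texts_to_copy(inventory, parent_inventories):
    """Return (file_id, text_revision) pairs absent from every parent."""
    needed = set(inventory.items())
    for parent in parent_inventories:
        needed -= set(parent.items())
    return needed


parent = {"README": "rev9", "setup.py": "rev7"}
child = {"README": "rev11", "setup.py": "rev7", "NEWS": "rev11"}

# With the parent inventory available, only the changed texts are needed.
delta = texts_to_copy(child, [parent])

# An empty commit has the same inventory as its parent: nothing to copy.
empty_commit_delta = texts_to_copy(parent, [parent])

# Without any parent inventory data, every text looks new.
no_parent_data = texts_to_copy(child, [])
```

The last case shows why enough parent inventory data has to be present: without it, the difference degenerates to the whole inventory.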

So there are two issues here:
 - a new stacked repository currently has to copy up the full inventory when its first revision is added, because that's what we do today.
 - when we copy up the inventory data for adjacent parents, we take a very simple approach to what is copied. The main thing is simply that no one is focusing on this part of push performance at the moment. We always knew we had more to do: the streaming API has the ability to incrementally request more data, and we do multiple round trips already; the main thing is to not have explosive scaling on the round trip counts. Falling back to VFS will suck immensely - more than a 1MB stream.

Until *both* of these are fixed, the first commit pushed to a new stacked branch will be size(inventory), not size(delta). Once the second is fixed, merges pushed to existing stacked branches will be much smaller (because the inventories for the merged revisions will no longer need to be 'fully copied'), but we'll still want a full inventory for every revision.

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy