bzr reconcile doesn't work on stacked 2a repositories

Bug #441125 reported by Scott James Remnant (Canonical)
18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
High
John A Meinel
Breezy
Triaged
Low
Unassigned

Bug Description

$ bzr reconcile lp:~upstart-devel/upstart/release
Reconciling branch bzr+ssh://bazaar.launchpad.net/~upstart-devel/upstart/release/
revision_history ok.
Reconciling repository bzr+ssh://bazaar.launchpad.net/~upstart-devel/upstart/release/
bzr: ERROR: exceptions.KeyError: ('makefile.am-20080508221105-rbs9wugi1qq76gcs-2', '<email address hidden>')

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    attaching the crash file
        /home/scott/.cache/crash/bzr-20091003065504-30482.crash
    and including a description of the problem.

    The crash file is plain text and you can inspect or edit it to remove
    private information.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :
Revision history for this message
Martin Pool (mbp) wrote :

may be related to bug 437626?

Changed in bzr:
status: New → Confirmed
importance: Undecided → Critical
Martin Pool (mbp)
Changed in bzr:
assignee: nobody → Ian Clatworthy (ian-clatworthy)
status: Confirmed → In Progress
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

I've grabed a copy of the files using lftp and can reproduce this locally now. 'bzr check' fails as well. Curiously, 'bzr info' only finds the unshared repo with no tree - it doesn't think there's a branch there despite the files being present. That could be a red herring though.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Scott,

I'd like to find out if the data problem existed before upgrading. If not, I need to investigate whether the bug is in the upgrade command or something after that.

What's the history of this branch? What version of Bazaar was used to upgrade it to the current format? Do you still have a copy of the pre-upgraded branch?

btw, I can branch from the branch above and the new branch is clean. I'm yet to compare the created branch with the original to see how they differ in history if at all.

Revision history for this message
John A Meinel (jameinel) wrote :

Also note that because of the "hosted" versus "mirrored" sides of Launchpad, it is possible that the broken data only exists on the hosted side. And thus someone who accesses it who is not able to write to it, cannot see that data.

Even further, it is possible to have broken-but-not-referenced data in a repository, which only shows up during things like 'check' and 'upgrade'. (that is what happened to one of Martin Albisetti's repositories, where it had some broken data from 3+ years back.)

We *could* make upgrade only touch referenced data, but in the presence of stuff like stacked branches, it is technically impossible to verify what the minimal set of referenced data is.

It sounds like it would be best if you could get a copy of the original data, to find out any more details.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 441125] Re: bzr: ERROR: exceptions.KeyError: ('makefile.am-20080508221105-rbs9wugi1qq76gcs-2', 'scott@netsplit.com-20090702173125-4nayj8jp8h4f8jnq')

On Wed, 2009-10-07 at 12:50 +0000, Ian Clatworthy wrote:

> I'd like to find out if the data problem existed before upgrading. If
> not, I need to investigate whether the bug is in the upgrade command or
> something after that.
>
I believe it did, but it's hard to be sure.

Unfortunately bzr 2.0 has basically forced everybody to upgrade branches
left right and centre due to the "different rich root" error. As soon
as you end up working with someone else, you hit it.

So I've lost track of which branches were upgraded when.

> What's the history of this branch? What version of Bazaar was used to
> upgrade it to the current format? Do you still have a copy of the pre-
> upgraded branch?
>
It's got a long and complicated history, and I'm afraid I don't really
remember. It's certainly been upgraded several times with different
versions of bzr.

> btw, I can branch from the branch above and the new branch is clean. I'm
> yet to compare the created branch with the original to see how they
> differ in history if at all.
>
I can confirm that. But I'm obviously worried that "bzr branch" isn't
giving me the right thing, since the repository on LP isn't passing
while my branch is.

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Re: [Bug 441125] Re: bzr: ERROR: exceptions.KeyError: ('makefile.am-20080508221105-rbs9wugi1qq76gcs-2', 'scott@netsplit.com-20090702173125-4nayj8jp8h4f8jnq')

Scott James Remnant wrote:
> On Wed, 2009-10-07 at 12:50 +0000, Ian Clatworthy wrote:
>
>> btw, I can branch from the branch above and the new branch is clean. I'm
>> yet to compare the created branch with the original to see how they
>> differ in history if at all.
>>
> I can confirm that. But I'm obviously worried that "bzr branch" isn't
> giving me the right thing, since the repository on LP isn't passing
> while my branch is.

One way to check is to compare the output of 'bzr log -v -p' or 'bzr
fast-export'. I'll do that tomorrow (it's late here) if you like but you
might be better placed to decide whether the differences, if any, are of
consequence.

Ian C.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote : Re: bzr: ERROR: exceptions.KeyError: ('makefile.am-20080508221105-rbs9wugi1qq76gcs-2', 'scott@netsplit.com-20090702173125-4nayj8jp8h4f8jnq')

I'll continue to dig into this with some urgency to understand what's going on. But it doesn't look like it's affecting other projects and it doesn't sound like it ought to prevent us shipping 2.0.1. I'm bumping the priority down to High accordingly.

Changed in bzr:
importance: Critical → High
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Scott,

It looks like the branch is ok. After grabbing the upstart trunk and reconfiguring things in branch.conf to point to the it as the stacked location, I can confirm that "bzr log -v -p" on both the branch on LP and the branch created from it give the same result, despite reconcile falling over on the LP one.

I'll keep digging as to the cause though.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Looking through the output of "bzr log -n0 -v --show-ids", here's the revision of interest ...

revno: 1129
revision-id: <email address hidden>
parent: <email address hidden>
committer: Scott James Remnant <email address hidden>
branch nick: upstart
timestamp: Thu 2009-07-02 18:31:25 +0100
message:
  * dbus/Makefile.am (EXTRA_DIST): Make sure we distribute it
  ...
added:
  dbus/upstart.h upstart.h-20090702170734-rkehkchaggmne1hc-1
modified:
  ChangeLog ChangeLog-20060514172403-c4a8bb77f657a0cb
  dbus/Makefile.am makefile.am-20080508221105-rbs9wugi1qq76gcs-2
  init/control.c control.c-20080418121850-82yk25zzqppnbuks-1
  init/control.h control.h-20080418121850-82yk25zzqppnbuks-2
  init/event.c event.c-20060804042848-c6cfe89fbc8c70be
  init/job.c job.c-20060802025500-97aaaab01a3a1f80
  init/job_class.c job_class.c-20080429175547-d99o8qslpiwrwg2b-1
  init/tests/test_control.c test_control.c-20080418121842-se4le4a4ucu9ifws-1
  init/tests/test_job.c test_job.c-20060802025841-69d13b49cc35d5ec
  init/tests/test_job_class.c test_job_class.c-20080429175550-1dsl7nvtbx4ajydk-1

So the file is dbus/Makefile.am.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

So I've tracked this down to line 1289 in groupcompress_repo.py giving unexpected results:

  file_id_parent_map = source_vf.get_parent_map(self._text_refs)

When run on the stacked branch, self._text_refs has 5024 members in the set, while file_id_parent_map only has 4875 keys. So, lots of them are missing. In comparison, when run on the matching generated branch, self._text_refs has 10372 members and file_id_parent_map has 10372 keys.

At this point, my guess is that this is a bug in reconcile when running on a stacked branch. I'll investigate more next week (unless someone smarter jumps in and solves it).

Revision history for this message
Matt Nordhoff (mnordhoff) wrote :

Um. FWIW, I'm experiencing this too, with lp:~mnordhoff/maria/5.1-2a (and my local copy, and another copy I branched off of LP).

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

I'm not working on this and are unlikely to get back to this anytime soon.

Changed in bzr:
assignee: Ian Clatworthy (ian-clatworthy) → nobody
Parth Malwankar (parthm)
Changed in bzr:
status: In Progress → Confirmed
Martin Pool (mbp)
Changed in bzr:
milestone: none → 2.2.0
Revision history for this message
John A Meinel (jameinel) wrote :

I just did a fresh lftp fetch of both ~scott/upstart/trunk and ~upstart-dev/upstart/release and edited branch.conf to point to my local objects.

I can still confirm that 'bzr reconcile' raises a KeyError midway through, rather than success/graceful failure.

Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
Revision history for this message
John A Meinel (jameinel) wrote :

Simple steps to reproduce:

bzr init base
cd base
echo a > a
echo b > b
bzr commit -m "add a & b"
cd ..
bzr branch base feature
cd feature
echo stuff >> b
bzr commit -m "added stuff"
cd ..
bzr branch base --stacked stacked-on-base
cd stacked-on-base
bzr pull ../feature
bzr reconcile
#boom

Rough analysis:

1) The 'bzr reconcile' command is trying to *only* reconcile the content in the stacked branch. (seems reasonable)
2) It iterates over the ancestry of the branch in _copy_text_texts and uses repo._get_text_key_index which is where it assumes self._text_refs will be appropriately set.
3) It does *not* filter based on the stacking boundary. The fetch 'bzr pull ../feature' doesn't fetch the text for file a revno1 because it is present in the base branch. It *does* fetch the text for file b revno2, but both (a,1) and (b,2) are referenced in revno2.
4) It goes to fetch the referenced text, and *boom*.

I *think* the fix is that it needs to find the text_refs for the inventories of parents which are present in the repository, and remove those from the needed text_refs.

Also note that the code doesn't seem to handle needing the basis inventories. Specifically if you look at "_copy_inventory_texts()" it makes a request for all inventories matching the keys present in revisions. It does not try to fill in basis inventories.

As such the 2a reconcile code seems pretty broken (in general) wrt stacked repositories. I'll see what I can do about it.

Revision history for this message
John A Meinel (jameinel) wrote :

Note that regular 'pack' formats work around this by not regenerating the text index (as near as I can tell).

It requests an 'ideal_index', and then iterates over the texts already present in the repo. For anything in the currently-present texts that aren't in the ideal index, it just skips them. But it doesn't do any sort of check about whether there are nodes in the ideal_index that are not present in the local one.

Note that we could do that here (just read existing text index directly, and assume it is properly filtered), but that seems less ideal that actually generating the correct set of texts that we want to have present in the reconciled inventory.

summary: - bzr: ERROR: exceptions.KeyError:
- ('makefile.am-20080508221105-rbs9wugi1qq76gcs-2',
- 'scott@netsplit.com-20090702173125-4nayj8jp8h4f8jnq')
+ bzr reconcile doesn't work on stacked 2a repositories
Revision history for this message
John A Meinel (jameinel) wrote :

Note that the related branch is based on the 2.0 series, and adds a test case which seems correct, but fails against all stackable format repositories. (Which surprises me, as I would have thought it would only fail with 2a format repos....)

Revision history for this message
John A Meinel (jameinel) wrote :

In all cases it fails with this portion of traceback:
    ideal_index = repo._generate_text_key_index(None, ancestors)
  File "bzrlib/repository.py", line 2214, in _generate_text_key_index
    text_key_references, pb)
  File "bzrlib/repository.py", line 2233, in _do_generate_text_key_index
    revision_keys[text_key[1]].add(text_key)
KeyError: 'jameinel@falco-20100623213009-55b51a60hw58gtkd'

That revision-id matches the 'base' revision-id.

As such, I haven't reproduced exactly the same failure as seen by the simple script I wrote.

Revision history for this message
John A Meinel (jameinel) wrote :

Oddly enough adding this line before doing the reconcile step:
        repo = repo.bzrdir.open_repository()

*Does* reproduce the existing failures. (For 1.9 format repos, it fails because it tries to create a pack file which is identical to an existing one [a safe and semi-expected failure], for 2a repos it fails with the file_id_parent_map[key] KeyError.)

I'm pretty surprised that re-opening the repository has such a dramatic effect. Probably something else that we need to look into.

Revision history for this message
John A Meinel (jameinel) wrote :

bumping the target, since this isn't in the release. Note that we might backport the fix to all of the 2.x series (2.0, 2.1, 2.2)

Changed in bzr:
milestone: 2.2.0 → none
Jelmer Vernooij (jelmer)
tags: added: 2a reconcile
Jelmer Vernooij (jelmer)
tags: added: missing-text
Jelmer Vernooij (jelmer)
tags: added: stacking
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Triaged
importance: Undecided → Low
tags: removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.