'linux' import fails due to deleted librarian file (orig.tar.gz) in .dsc for 2.6.24-5.9 in hardy

Bug #792193 reported by Andrew Bennetts
This bug affects 1 person
Affects: Ubuntu Distributed Development
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

<http://package-import.ubuntu.com/status/linux.html#2011-04-22%2001:24:13.263528>

Traceback (most recent call last):
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 1120, in <module>
    persistent_download_cache=options.persistent_download_cache))
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 1046, in main
    bstore, possible_transports=possible_transports)
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 612, in import_package
    possible_transports=possible_transports)
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 390, in dget
    possible_transports=possible_transports)
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 330, in grab_file
    location_f = transport.do_catching_redirections(get_file, t, redirected)
  File "/var/lib/python-support/python2.5/bzrlib/transport/__init__.py", line 1670, in do_catching_redirections
    return action(transport)
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 317, in get_file
    return transport.get(name)
  File "/var/lib/python-support/python2.5/bzrlib/transport/http/__init__.py", line 126, in get
    code, response_file = self._get(relpath, None)
  File "/var/lib/python-support/python2.5/bzrlib/transport/http/_urllib.py", line 123, in _get
    response = self._perform(request)
  File "/var/lib/python-support/python2.5/bzrlib/transport/http/_urllib.py", line 79, in _perform
    response = self._opener.open(request)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/var/lib/python-support/python2.5/bzrlib/transport/http/_urllib2_wrappers.py", line 1602, in http_response
    code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/var/lib/python-support/python2.5/bzrlib/transport/http/_urllib2_wrappers.py", line 1619, in http_error_default
    % (code, msg))
bzrlib.errors.InvalidHttpResponse: Invalid http response for https://launchpad.net/ubuntu/hardy/%2Bsource/linux/2.6.24-5.9/%2Bfiles/linux_2.6.24.orig.tar.gz: Unable to handle http code 500: Internal Server Error

It's trying to retrieve the orig.tar.gz referenced in the dsc file, but Launchpad gives a 500 error because: “AssertionError: LibraryFileAliasView can not operate on deleted librarian files, since their URL is undefined.”

And sure enough, <https://launchpad.net/ubuntu/hardy/+source/linux/2.6.24-5.9> says “linux_2.6.24.orig.tar.gz (deleted)”. Presumably it's not hard to find a copy of the original 2.6.24 tarball elsewhere… we can probably fix this manually.

Ideally we'd teach the importer to look elsewhere (perhaps just see if other source packages with the same upstream version can provide it?), but so far just this package is affected so it's probably not a big deal.
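
The "look elsewhere" idea could be sketched roughly as below. This is only an illustration, not importer code: `fetch_with_fallback`, the `fetch` callable and the list of candidate URLs are all hypothetical names. The md5 check is there because a file with the same name from another source package may not have the same contents.

```python
import hashlib


def fetch_with_fallback(urls, fetch, expected_md5):
    """Return (url, data) for the first URL whose content matches expected_md5.

    `fetch` is a hypothetical callable that takes a URL, returns bytes and
    raises IOError on failure (for example on an HTTP 500).
    """
    failures = []
    for url in urls:
        try:
            data = fetch(url)
        except IOError as exc:
            failures.append((url, str(exc)))
            continue
        # A file with the right name is not enough: compare against the
        # md5 recorded in the .dsc before accepting it.
        if hashlib.md5(data).hexdigest() == expected_md5:
            return url, data
        failures.append((url, "md5 mismatch"))
    raise IOError("no usable copy found: %r" % (failures,))
```

Here the first entry in `urls` would be the location named by the .dsc, followed by whatever alternative locations the importer knows about.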

Andrew Bennetts (spiv)
Changed in udd:
status: Confirmed → In Progress
Andrew Bennetts (spiv) wrote :

> we can probably fix this manually.

Specifically, if we run this import with --persistent-download-cache, and then manually add the orig.tar.gz to that cache, it should avoid this error.
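
A minimal sketch of seeding such a cache by hand, assuming the cache is a flat directory keyed by file name (an assumption; the real layout of the importer's download cache is not described here). `seed_download_cache` is a hypothetical helper, not part of import_package.py.

```python
import hashlib
import os
import shutil


def seed_download_cache(source_path, cache_dir, expected_md5):
    """Copy source_path into cache_dir after verifying its md5.

    Assumes (hypothetically) that the cache stores files under their
    base name directly inside cache_dir.
    """
    md5 = hashlib.md5()
    with open(source_path, "rb") as f:
        # Read in 1 MiB chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    if md5.hexdigest() != expected_md5:
        raise ValueError("md5 mismatch for %s" % source_path)
    os.makedirs(cache_dir, exist_ok=True)
    dest = os.path.join(cache_dir, os.path.basename(source_path))
    shutil.copyfile(source_path, dest)
    return dest
```

Checking the md5 against the value in the .dsc before copying guards against seeding the cache with the wrong one of two same-named files.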

Andrew Bennetts (spiv) wrote :

Hmm, see also <https://bugs.launchpad.net/launchpad/+bug/663562>; there's some history surrounding linux_2.6.24.orig.tar.gz: apparently there have been two different copies with different contents in Soyuz.

Andrew Bennetts (spiv) wrote :

Thanks to wgrant the “deleted” copy of that file has been found at <http://launchpadlibrarian.net/11930615/linux_2.6.24.orig.tar.gz>. It's probably not sufficient to just put it in the download-cache, as I think it'll tend to get overwritten by the copy that every other version of 2.6.24-* uses, but we'll see how it goes. If it fails it might be simplest to include a one-off hack in the dget function to tell it where to find that md5: this is a one-off problem, or at least it's supposed to be :)

Andrew Bennetts (spiv) wrote :

The import for linux is running in a screen on jubany (rather than via the regular import queue, due to needing --persistent-download-cache). It seems to be succeeding; it's certainly got past the version it was tripping over before.

Andrew Bennetts (spiv) wrote :

I'm a little concerned that the later revisions of 2.6.24 in the import are going to be built on the wrong orig.tar.gz. Hopefully the revisions are old enough (these versions are from hardy) and the difference between the two orig.tar.gz's small enough that we don't care.

Andrew Bennetts (spiv) wrote :

Specifically, although the import_package.py script for later revisions has noticed the md5 mismatch and downloaded the appropriate orig.tar.gz, it still says “We already have the needed upstream part”, because bzr-builddeb's import_dsc.DistributionBranch._do_import_package doesn't pass the upstream_md5 when it does:

            if not self.has_upstream_version(version.upstream_version):

If it did I think it would probably DTRT here. What do other people think? Does this matter? Would adding “, upstream_md5” to that call be a good idea?
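
To make the question concrete, here is a toy model of the difference between a version-only check and an md5-aware one. It is not bzr-builddeb's implementation: `UpstreamRecords` and its storage are hypothetical stand-ins, and the md5 strings are placeholders.

```python
class UpstreamRecords(object):
    """Toy stand-in mapping an upstream version to its tarball's md5."""

    def __init__(self):
        self._md5_by_version = {}

    def record(self, version, md5):
        self._md5_by_version[version] = md5

    def has_upstream_version(self, version, md5=None):
        if version not in self._md5_by_version:
            return False
        if md5 is None:
            # Version-only check: any tarball recorded for this version counts.
            return True
        # md5-aware check: the recorded tarball must be the same file.
        return self._md5_by_version[version] == md5


records = UpstreamRecords()
records.record("2.6.24", "md5-of-first-tarball")
# Without the md5, a second, different 2.6.24 tarball looks already imported.
print(records.has_upstream_version("2.6.24"))
# With the md5, the mismatch is noticed and the upstream part gets imported.
print(records.has_upstream_version("2.6.24", md5="md5-of-second-tarball"))
```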

Andrew Bennetts (spiv) wrote :

I've Ctrl-C'd the linux import for now until we're sure it'll give sensible results in this scenario.

Martin Pool (mbp) wrote :

On 3 June 2011 17:33, Andrew Bennetts <email address hidden> wrote:
> Specifically, although the import_package.py script for later revisions
> has noticed the md5 mismatch and downloaded the appropriate orig.tar.gz,
> it still says “We already have the needed upstream part”, because bzr-
> builddeb's import_dsc.DistributionBranch._do_import_package doesn't pass
> the upstream_md5 when it does:
>
>            if not self.has_upstream_version(version.upstream_version):
>
> If it did I think it would probably DTRT here.  What do other people
> think?  Does this matter?  Would adding “, upstream_md5” to that call be
> a good idea?

It seems plausible.

Max Bowsher (maxb) wrote :

I'm running some local tests to explore this situation.

I think we do need to add the upstream md5 check as stipulated, but I'm also slightly concerned about whether the upstream-2.6.24 tag will end up in the right place.

Max Bowsher (maxb) wrote :

OK, with the patch to bzr-builddeb's import_dsc.py:

=== modified file 'import_dsc.py'
--- import_dsc.py 2011-06-05 00:12:43 +0000
+++ import_dsc.py 2011-06-05 02:07:44 +0000
@@ -1149,7 +1149,8 @@
             # We need to import at least the diff, possibly upstream.
             # Work out if we need the upstream part first.
             imported_upstream = False
-            if not self.has_upstream_version(version.upstream_version):
+            if not self.has_upstream_version(version.upstream_version,
+                    md5=upstream_md5):
                 up_pull_branch = \
                     self.branch_to_pull_upstream_from(version.upstream_version,
                             upstream_md5)

and having first seeded the download-cache with the deleted .orig.tar.gz, and running with --persistent-download-cache, we get history that makes sense, and the upstream-2.6.24 gets moved to the second import of the upstream OK.

So we should get that fix into bzr-builddeb, and then get bzr-builddeb updated on jubany.

When running the manual import, it might be advantageous to use a patched import_package.py:

=== modified file 'import_package.py'
--- a/import_package.py 2011-06-05 03:40:10 +0000
+++ b/import_package.py 2011-06-05 04:04:11 +0000
@@ -233,6 +233,7 @@
     vlist = get_debian_versions(package, extra_debian=extra_debian)
     publications = icommon.lp_call(icommon.call_with_limited_size,
                 u_archive.getPublishedSources,
+                distro_series="/ubuntu/hardy", pocket="Release",
                 source_name=package, exact_match=True)
     pb = ui.ui_factory.nested_progress_bar()
     try:

such that the manual run only has to import hardy-release (which still takes 1½ hours for me). Then, the rest of the importing can hopefully be handled by a normal mass_import.py child.

Max Bowsher (maxb) wrote :

Hmm. A problem.

Turns out that 2.6.24-28.83 is also linked to the bogus .orig.tar.gz. This is more than slightly annoying, as resolving this just got rather less trivial.

My suggestion at this point is that we should follow the previously outlined procedure except that instead of hacking the importer to import only hardy-release, we should hack it to only import versions strictly less than 2.6.24-28.83.

Once it has done so, and pushed the resulting branches, we then cheat, doing a manual "bzr import-dsc" of the problem package revision, then we cheat some more, and use the sqlite3 command line client to insert the new bzr revision's id and testament sha into the importer's meta.db revids table.

Then we can continue importing. But, given the incredibly long time it takes, we probably want to hack the importer so that it does so in incremental batches, rather than attempting to import all the way from hardy to oneiric in one run, and losing all of the progress if it falls over again.
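
The incremental-batch idea might look something like the following sketch. All names are hypothetical: `import_one` stands for importing a single package version, and `checkpoint` for whatever makes progress durable (for example pushing the branches); neither is an existing importer function.

```python
def import_in_batches(versions, import_one, checkpoint,
                      batch_size=10, already_done=()):
    """Import versions in order, checkpointing after each full batch.

    If import_one raises, everything up to the last checkpoint is kept,
    and a later call can resume by passing it as already_done.
    """
    done = list(already_done)
    skip = set(already_done)
    pending = [v for v in versions if v not in skip]
    for start in range(0, len(pending), batch_size):
        for version in pending[start:start + batch_size]:
            import_one(version)
            done.append(version)
        # Persist progress so a later failure does not lose this batch.
        checkpoint(list(done))
    return done
```

A smaller `batch_size` loses less work on failure at the cost of more frequent checkpoints.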

Max Bowsher (maxb) wrote :

Hmm. Actually... perhaps we could be more devious. We could hack the code that reads the .dsc file to say "if version == 2.6.24-28.83 and md5 == the_bad_one: md5 = the_good_one"

Evil. But potentially workable :-)
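
Spelled out as a sketch, that hack might look like this. The checksum values are placeholders, since the real md5s are not quoted in this report, and `effective_md5` is a hypothetical helper rather than existing importer code.

```python
# Placeholder values: the actual checksums are not given in this bug report.
THE_BAD_ONE = "<md5 listed in the 2.6.24-28.83 .dsc>"
THE_GOOD_ONE = "<md5 to use instead>"

# One-off overrides keyed by (package version, md5 found in the .dsc).
MD5_OVERRIDES = {
    ("2.6.24-28.83", THE_BAD_ONE): THE_GOOD_ONE,
}


def effective_md5(version, md5):
    """Return the md5 to use for this version, applying any override."""
    return MD5_OVERRIDES.get((version, md5), md5)
```

Keying on both the version and the md5 keeps the override from touching any other upload.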

Andrew Bennetts (spiv) wrote :

Max: yes, that evil hack is very tempting. As you say it only needs to be temporary.

Max Bowsher (maxb) wrote :

Andrew said:
> bzr-builddeb's import_dsc.DistributionBranch._do_import_package
> doesn't pass the upstream_md5 when it does:
>
> if not self.has_upstream_version(version.upstream_version):
>
> If it did I think it would probably DTRT here. What do other people
> think? Does this matter? Would adding “, upstream_md5” to that
> call be a good idea?

It has just occurred to me that *not* passing the md5 here is what allows you (in a non-importer situation, using bzr-builddeb commands interactively) to set an upstream-foo tag on an arbitrary revision which may not have a deb-md5 revision property at all, and have the importer trust you know what you are doing when you subsequently import-dsc a package using that upstream version.

This may or may not be a desired feature.

Martin Pool (mbp)
Changed in udd:
importance: Undecided → High
Max Bowsher (maxb) wrote :

Is this actually of High Urgency? Surely no one is actually going to use the resultant branch for actual development, since all the kernel bits are in git.

Martin Pool (mbp) wrote :

It's fine with me if you want to downgrade it.
On Jul 6, 2011 9:45 PM, "Max Bowsher" <email address hidden> wrote:
> Is this actually of High Urgency? Surely no one is actually going to use
> the resultant branch for actual development, since all the kernel bits
> are in git.

Max Bowsher (maxb) wrote :

Alright then, I'll set it to Low. The kernel team has well-established maintenance procedures based on git, so although fixing this import has some value, both in getting closer to being able to state that UDD has complete import coverage and as a stress test of UDD on large branches, it seems unlikely to me that the resulting import will ever see active use in packaging maintenance.

Changed in udd:
importance: High → Low
Andrew Bennetts (spiv)
Changed in udd:
assignee: Andrew Bennetts (spiv) → nobody
Vincent Ladeuil (vila)
Changed in udd:
status: In Progress → Confirmed