Comment 2 for bug 1370230

Martin Pitt (pitti) wrote:

Indeed, that was to be expected. Before this change, a well-filled sandbox hardly installed anything at all, due to the "does not track versions" issue that you reported. So merely by having to download and install many more packages, we get longer execution times.

> Using apport version 2830 with no cache, the retrace took approximately 2:32.

Is that actually --cache (i.e. the place to cache downloaded debs and package indexes) or --sandbox-dir? Measuring without --cache isn't a very useful baseline for optimization, as the time for downloading the .debs probably varies a lot. But measuring with and without an existing --sandbox-dir would be interesting, i.e. how much slower the current code got for the same report with a filled cache (to eliminate download times) and an empty sandbox.
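For reference, a rough timing harness for that comparison could look like the sketch below; the report path and the cache/sandbox directories are placeholders, and the exact apport-retrace invocation (in particular -S system and --stdout) may need adjusting for your setup:

#!/usr/bin/python3
# Rough timing sketch for the comparison above; paths are placeholders.
import shutil
import subprocess
import time

REPORT = '/tmp/crash.crash'   # placeholder: the report to retrace
CACHE = '/tmp/retrace-cache'  # pre-filled --cache, to factor out downloads

def timed_retrace(sandbox_dir):
    start = time.time()
    subprocess.check_call(['apport-retrace', '-S', 'system',
                           '--cache', CACHE, '--sandbox-dir', sandbox_dir,
                           '--stdout', REPORT],
                          stdout=subprocess.DEVNULL)
    return time.time() - start

# run 1: filled cache, empty sandbox (the expensive case)
shutil.rmtree('/tmp/sandbox', ignore_errors=True)
print('empty sandbox: %.1f s' % timed_retrace('/tmp/sandbox'))

# run 2: filled cache, existing sandbox (the incremental case)
print('existing sandbox: %.1f s' % timed_retrace('/tmp/sandbox'))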

> However, because that is used we now have to call apport.packaging.get_file_package() for every shared library, which the function indicates is very expensive.

Indeed. The first time you call it, it has to download Contents.gz, which is quite large; the file also gets refreshed after a day. Subsequent calls use the locally cached copy. However, that function still uses zgrep, which is quite slow for repeated lookups. I think this is a good first point to optimize: instead of storing the raw Contents.gz, we could build a path → package Python dict and cache that instead (in RAM during one retrace, and pickled to the cache dir for subsequent invocations), and then use the mtime of the pickle file to decide when to refresh it.
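A minimal sketch of that idea; the function and file names here (get_file_package_cached, contents_map.pickle) are made up for illustration, and a real integration into apport.packaging would also have to deal with the separate Contents files per release/architecture and with download errors:

import gzip
import os
import pickle
import time

MAX_AGE = 86400  # refresh the map after one day, matching Contents.gz

_contents_map = None  # in-RAM cache for the duration of one retrace

def _build_contents_map(contents_gz):
    '''Parse a downloaded Contents.gz into a path -> package dict.'''
    path_map = {}
    with gzip.open(contents_gz, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            # lines look like "<path>   <section/pkg>[,<section/pkg>...]";
            # a real parser would also skip the preamble before the file list
            try:
                path, packages = line.rsplit(None, 1)
            except ValueError:
                continue
            # keep just the package name of the first entry
            path_map[path] = packages.split(',')[0].split('/')[-1]
    return path_map

def get_file_package_cached(path, contents_gz, cache_dir):
    '''Map an absolute file path to its package, using the dict cache.'''
    global _contents_map
    pickle_file = os.path.join(cache_dir, 'contents_map.pickle')

    if _contents_map is None:
        # the pickle's mtime tells us whether it is still fresh
        if (os.path.exists(pickle_file) and
                time.time() - os.path.getmtime(pickle_file) < MAX_AGE):
            with open(pickle_file, 'rb') as f:
                _contents_map = pickle.load(f)
        else:
            _contents_map = _build_contents_map(contents_gz)
            with open(pickle_file, 'wb') as f:
                pickle.dump(_contents_map, f)

    return _contents_map.get(path.lstrip('/'))

That would turn each lookup into a single dict access, so the cost of scanning Contents.gz is paid once per day instead of once per shared library.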