Archive:+copy-packages nearly unusable due to timeouts

Bug #575450 reported by Max Bowsher
106
This bug affects 14 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Colin Watson

Bug Description

  5 https://launchpad.net/ubuntu/+archive/primary/+copy-packages (Archive:+copy-packages)
       OOPS-1800C349, OOPS-1800G2129, OOPS-1800K1139, OOPS-1800M2146, OOPS-1800O1826
      3 https://edge.launchpad.net/ubuntu/+archive/primary/+copy-packages (Archive:+copy-packages)
       OOPS-1800EB307, OOPS-1800ED1901, OOPS-1800ED329
      1 https://launchpad.net/%7Eubuntu-langpack/+archive/ppa/+index (Archive:+index)
       OOPS-1800L840

Picking 1800L840:

Branch: launchpad-rev-12004
Revno: 12004
SQL time: 14566 ms
Non-sql time: 1575 ms
Total time: 16141 ms
Statement Count: 235

Long SQL Statements (Shows up to five statements)

Time Reps Database id Statement
1 3667.0 1 SQL-launchpad-main-master SELECT COUNT(*) FROM SourcePackageName, SourcePackagePublishingHistory, SourcePackageRelease WHERE SourcePackagePublishingHistory.archive = 32 AND SourcePackagePublishingHistory.sourcepackagerelease = SourcePackageRelease.id AND SourcePackageRelease.sourcepackagename = SourcePackageName.id AND (1=1)
2 1919.0 1 SQL-launchpad-main-master SELECT COUNT(*) FROM ((SELECT DISTINCT SourcePackageRelease.user_defined_fields, SourcePackageRelease.architecturehintlist, SourcePackageRelease.build_conflicts, SourcePackageRelease.build_conflicts_indep, SourcePackageRelease.builddepends, SourcePackageRelease.builddependsindep, SourcePackageRelease.changelog, SourcePackageRelease.changelog_entry, SourcePackageRelease.component, SourcePackageRelease.copyright, SourcePackageRelease.creator, SourcePackageRelease.dateuploaded, SourcePackageRelease.dsc, SourcePackageRelease.dsc_binaries, SourcePackageRelease.dsc_format, SourcePackageRelease.dsc_maintainer_rfc822, SourcePackageRelease.dsc_standards_version, SourcePackageRelease.dscsigningkey, SourcePackageRelease.format, SourcePackageRelease.homepage, SourcePackageRelease.id, SourcePackageRelease.maintainer, SourcePackageRelease.section, SourcePackageRelease.sourcepackage_recipe_build, SourcePackageRelease.sourcepackagename, SourcePackageRelease.upload_archive, SourcePackageRelease.upload_distroseries, SourcePackageRelease.urgency, SourcePackageRelease.version FROM BinaryPackageBuild, BuildFarmJob, PackageBuild, SourcePackageRelease WHERE BinaryPackageBuild.source_package_release = SourcePackageRelease.id AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.archive = %s AND PackageBuild.build_farm_job = BuildFarmJob.id AND BuildFarmJob.status = %s) EXCEPT (SELECT DISTINCT SourcePackageRelease.user_defined_fields, SourcePackageRelease.architecturehintlist, SourcePackageRelease.build_conflicts, SourcePackageRelease.build_conflicts_indep, SourcePackageRelease.builddepends, SourcePackageRelease.builddependsindep, SourcePackageRelease.changelog, SourcePackageRelease.changelog_entry, SourcePackageRelease.component, SourcePackageRelease.copyright, SourcePackageRelease.creator, SourcePackageRelease.dateuploaded, SourcePackageRelease.dsc, SourcePackageRelease.dsc_binaries, SourcePackageRelease.dsc_format, SourcePackageRelease.dsc_maintainer_rfc822, SourcePackageRelease.dsc_standards_version, SourcePackageRelease.dscsigningkey, SourcePackageRelease.format, SourcePackageRelease.homepage, SourcePackageRelease.id, SourcePackageRelease.maintainer, SourcePackageRelease.section, SourcePackageRelease.sourcepackage_recipe_build, SourcePackageRelease.sourcepackagename, SourcePackageRelease.upload_archive, SourcePackageRelease.upload_distroseries, SourcePackageRelease.urgency, SourcePackageRelease.version FROM BinaryPackageBuild, BuildFarmJob, PackageBuild, SourcePackageRelease WHERE BinaryPackageBuild.source_package_release = SourcePackageRelease.id AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.archive = %s AND PackageBuild.build_farm_job = BuildFarmJob.id AND BuildFarmJob.status = %s)) AS "_tmp"
3 1117.0 1 SQL-launchpad-main-master SELECT SourcePackageRelease.user_defined_fields, SourcePackageRelease.architecturehintlist, SourcePackageRelease.build_conflicts, SourcePackageRelease.build_conflicts_indep, SourcePackageRelease.builddepends, SourcePackageRelease.builddependsindep, SourcePackageRelease.changelog, SourcePackageRelease.changelog_entry, SourcePackageRelease.component, SourcePackageRelease.copyright, SourcePackageRelease.creator, SourcePackageRelease.dateuploaded, SourcePackageRelease.dsc, SourcePackageRelease.dsc_binaries, SourcePackageRelease.dsc_format, SourcePackageRelease.dsc_maintainer_rfc822, SourcePackageRelease.dsc_standards_version, SourcePackageRelease.dscsigningkey, SourcePackageRelease.format, SourcePackageRelease.homepage, SourcePackageRelease.id, SourcePackageRelease.maintainer, SourcePackageRelease.section, SourcePackageRelease.sourcepackage_recipe_build, SourcePackageRelease.sourcepackagename, SourcePackageRelease.upload_archive, SourcePackageRelease.upload_distroseries, SourcePackageRelease.urgency, SourcePackageRelease.version FROM SourcePackageRelease JOIN ( SELECT spr.id as sourcepackagerelease, MAX(spph.id) FROM SourcePackagePublishingHistory spph, SourcePackageRelease spr WHERE spr.sourcepackagename IN (380, 392, 52690, 396, 376, 378, 370, 406, 408, 52559, 420, 52561, 412, 414, 394, 48171, 52563, 40871, 410, 52562, 384, 52558, 416, 386, 374, 372, 52554, 402, 404, 52560, 52555, 390, 388, 62590, 400, 418, 382, 398) AND spph.sourcepackagerelease = spr.id AND spph.archive IN (1, 534) AND spph.status IN (1, 2) AND spph.distroseries = 103 GROUP BY spr.id) AS spph ON SourcePackageRelease.id = spph.sourcepackagerelease
4 669.0 1 SQL-launchpad-main-master SELECT * FROM ((SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM BinaryPackageBuild, BinaryPackagePublishingHistory, BinaryPackageRelease, BuildFarmJob, DistroArchSeries, PackageBuild, SourcePackagePublishingHistory WHERE BinaryPackageBuild.package_build = PackageBuild.id AND BinaryPackageBuild.distro_arch_series = DistroArchSeries.id AND SourcePackagePublishingHistory.distroseries = DistroArchSeries.distroseries AND SourcePackagePublishingHistory.sourcepackagerelease = BinaryPackageBuild.source_package_release AND SourcePackagePublishingHistory.id IN (%s) AND SourcePackagePublishingHistory.archive != PackageBuild.archive AND BinaryPackagePublishingHistory.archive = SourcePackagePublishingHistory.archive AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageRelease.id AND BinaryPackageRelease.build = BinaryPackageBuild.id AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.build_farm_job = BuildFarmJob.id AND BuildFarmJob.status IN (%s) ORDER BY BinaryPackageBuild.id) UNION (SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM BinaryPackageBuild, BuildFarmJob, DistroArchSeries, PackageBuild, SourcePackagePublishingHistory WHERE BinaryPackageBuild.package_build = PackageBuild.id AND BinaryPackageBuild.distro_arch_series = DistroArchSeries.id AND SourcePackagePublishingHistory.distroseries = DistroArchSeries.distroseries AND SourcePackagePublishingHistory.sourcepackagerelease = BinaryPackageBuild.source_package_release AND SourcePackagePublishingHistory.id IN (%s) AND SourcePackagePublishingHistory.archive = PackageBuild.archive AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.build_farm_job = BuildFarmJob.id AND BuildFarmJob.status IN (%s) ORDER BY BinaryPackageBuild.id)) AS "_21899" ORDER BY id
5 638.0 1 SQL-launchpad-main-master SELECT * FROM ((SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM BinaryPackageBuild, BinaryPackagePublishingHistory, BinaryPackageRelease, DistroArchSeries, PackageBuild, SourcePackagePublishingHistory WHERE BinaryPackageBuild.package_build = PackageBuild.id AND BinaryPackageBuild.distro_arch_series = DistroArchSeries.id AND SourcePackagePublishingHistory.distroseries = DistroArchSeries.distroseries AND SourcePackagePublishingHistory.sourcepackagerelease = BinaryPackageBuild.source_package_release AND SourcePackagePublishingHistory.id IN (%s) AND SourcePackagePublishingHistory.archive != PackageBuild.archive AND BinaryPackagePublishingHistory.archive = SourcePackagePublishingHistory.archive AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageRelease.id AND BinaryPackageRelease.build = BinaryPackageBuild.id AND SourcePackagePublishingHistory.archive = %s ORDER BY BinaryPackageBuild.id) UNION (SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM BinaryPackageBuild, DistroArchSeries, PackageBuild, SourcePackagePublishingHistory WHERE BinaryPackageBuild.package_build = PackageBuild.id AND BinaryPackageBuild.distro_arch_series = DistroArchSeries.id AND SourcePackagePublishingHistory.distroseries = DistroArchSeries.distroseries AND SourcePackagePublishingHistory.sourcepackagerelease = BinaryPackageBuild.source_package_release AND SourcePackagePublishingHistory.id IN (%s) AND SourcePackagePublishingHistory.archive = PackageBuild.archive AND SourcePackagePublishingHistory.archive = %s ORDER BY BinaryPackageBuild.id)) AS "_21896" ORDER BY id

Repeated SQL Statements

Reps Total time Average time Saving Database id Statement
1 79 114 1 113 SQL-launchpad-main-master SELECT PackageBuild.archive, PackageBuild.build_farm_job, PackageBuild.dependencies, PackageBuild.id, PackageBuild.pocket, PackageBuild.upload_log FROM PackageBuild WHERE PackageBuild.id = %s LIMIT 1
2 78 87 1 86 SQL-launchpad-main-master SELECT BuildFarmJob.builder, BuildFarmJob.date_created, BuildFarmJob.date_finished, BuildFarmJob.date_first_dispatched, BuildFarmJob.date_started, BuildFarmJob.failure_count, BuildFarmJob.id, BuildFarmJob.job_type, BuildFarmJob.log, BuildFarmJob.processor, BuildFarmJob.status, BuildFarmJob.virtualized FROM BuildFarmJob WHERE BuildFarmJob.id = %s LIMIT 1
3 8 10 1 9 SQL-launchpad-main-master SELECT TeamParticipation.id, TeamParticipation.person, TeamParticipation.team FROM TeamParticipation WHERE TeamParticipation.person = %s AND TeamParticipation.team = %s

Related branches

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Steve, as I discussed with you previously, I suspect it might be your new copy checker code generating a load more DB queries. Michael can show you how to debug that :)

Changed in soyuz:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Steve Kowalik (stevenk)
tags: added: oops ppa timeout
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Max, do you get the same timeouts in the API? I guess this is a good way of getting you to use it instead of screen scraping :)

Revision history for this message
Micah Gersten (micahg) wrote :

I got another oops today on soyuz:
(Error ID: OOPS-1664EC1507)

tags: added: pg83
removed: oops
Revision history for this message
Robert Collins (lifeless) wrote :

@StevenK are you working on this?

description: updated
tags: removed: pg83
Changed in soyuz:
assignee: Steve Kowalik (stevenk) → nobody
Revision history for this message
Robert Collins (lifeless) wrote :

Steven isn't looking at this at the moment.

summary: - +copy-packages nearly unusable due to timeouts
+ Archive:+copy-packages nearly unusable due to timeouts
description: updated
description: updated
Revision history for this message
William Grant (wgrant) wrote :

The statement log in the description appears to be from a GET, not a POST, so it's mostly due to lazr.batchnavigator wanting a precise count. It's not the less uninteresting actual copying bits.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 575450] Re: Archive:+copy-packages nearly unusable due to timeouts

On Mon, Dec 6, 2010 at 5:30 PM, William Grant <email address hidden> wrote:
> The statement log in the description appears to be from a GET, not a
> POST, so it's mostly due to lazr.batchnavigator wanting a precise count.
> It's not the less uninteresting actual copying bits.

blah yes. OTOH we need to fix that too :) hint, hint.

Changed in launchpad:
importance: High → Critical
j.c.sackett (jcsackett)
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
Revision history for this message
William Grant (wgrant) wrote :

OOPS-1858O745 is a recent relevant OOPS. Nothing really slow -- just death by 1007 queries.

Changed in launchpad:
status: Triaged → In Progress
Revision history for this message
Robert Collins (lifeless) wrote :

The new lp-oops analyses the late evaluation well for us:
256xbinary package file in 800ms
148xBinaryPackagePublishingHistory in 1500ms
148 BinaryPackagePublishingHistory inserts in 400ms
134xlibrary file alias in 1100ms
128xDistroArchSeries in 360ms
87xBinaryPackageName in 170ms
12xSourcePackageReleaseFile in 67ms
9xBinaryPackagePublishingHistory in 170ms
3xLibraryFileAlias in 721ms due to a complex query
2 very wide queries for SourcePackagePublishingHistory + others in 670ms

Looks like this is coming from a copy of -just- 3 packages.

Revision history for this message
William Grant (wgrant) wrote :

My fix for bug #728220 should eliminate all BinaryPackageFile and all but ~9 of the LibraryFileAlias queries.

The 148 BinaryPackagePublishingHistory selects and inserts will be in PublishingSet.publishBinary.

DistroArchSeries queries will be partly copyBinariesTo, partly publishBinary, and partly getBuildByArch (inside createMissingBuilds).

Revision history for this message
William Grant (wgrant) wrote :

This is probably mostly fixed now; the copy operation itself is fairly fast, but some of the UI stuff still takes a few seconds (particularly determining the sources to show in the selection list, and determining the archives to which it is possible to copy). I expect this to stop timing out after the deployment tomorrow.

Revision history for this message
William Grant (wgrant) wrote :

Bug #733071 covers the case of slow conflict checking when some of the sources already exist in the target (OOPS-1896P15).

William Grant (wgrant)
Changed in launchpad:
assignee: William Grant (wgrant) → nobody
status: In Progress → Triaged
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This issue is _NOT_ fixed, it is still virtually impossible to copy even two packages at the same time. Can we please bump the effective priority so that it gets worked on?

Revision history for this message
Robert Collins (lifeless) wrote :

Its critical already.

Revision history for this message
William Grant (wgrant) wrote :

Zygmunt, do you have an OOPS ID? Any number of circumstances could be making your particular case slow.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Once we have a way of putting feedback in the page from the new asynchronous copying code, this bug will be fixed. It would be an easy to medium difficulty patch to make so if anyone wants to scratch that itch, let me know and I will mentor you.

Revision history for this message
Данило Шеган (danilo) wrote :

Not sure what to do with bug 878051: is it a duplicate of this one or bug 741489? Or maybe entirely "new"? I didn't dive deep enough to be able to tell, sorry.

Revision history for this message
Scott Kitterman (kitterman) wrote :

https://launchpad.net/~kubuntu-ppa/+archive/staging/+copy-packages

Here are some fresh OOPS for at least the same symptom as described in the bug:

With 75 packages:

(Error ID: OOPS-5adcfb92094e140d03e76d5d96c55c6a)
(Error ID: OOPS-d7ee0237730fa4cc3c4e58fd0df22fe7)
Error ID: OOPS-af60d0aca0accd71a23aafd2a771bd81)

With roughly half that:

(Error ID: OOPS-29df39a52b2944633a8d220903d1e743)
(Error ID: OOPS-1d7cafbd0169ba5d95ac82d4c5353c28)
(Error ID: OOPS-5255b011f0ee538180e0858703c6981b)

Works with 5 or 10, but not consistently with 15 - failed the first time (Error ID: OOPS-40b737d11c7525b48b219590c0a67d77) and then worked.

Worked with 12 several times and then failed on a different set of packages (perhaps with more history):

(Error ID: OOPS-3608a6bf667310a9f8ae1519a38d94a7)
(Error ID: OOPS-37dd49546bc0fb1aed0e95214bb62b6e)
(Error ID: OOPS-f949f9e00954beafea41395ffaf0f912)

Then 10 worked. Then I got tired of copy/pasting oops numbers, that's probably enough.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Raphaël Badin (rvb)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Steve Kowalik (stevenk)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → In Progress
Revision history for this message
Colin Watson (cjwatson) wrote :

Once deployed, the fix for bug 812869 (addressing Julian's comment 16) should be enough to let us change the feature flags that disabled asynchronous copies, and start seeing whether they work well in practice. If they work well enough now, then the correct way to fix this is ultimately to delete synchronous copies and push everything through the asynchronous path.

Revision history for this message
Colin Watson (cjwatson) wrote :

The asynchronous path is now enabled for members of ~launchpad-beta-testers.

Curtis Hovey (sinzui)
Changed in launchpad:
assignee: Raphaël Badin (rvb) → nobody
status: In Progress → Triaged
Revision history for this message
Colin Watson (cjwatson) wrote :

This is nearly done. The next step is to migrate PCJ processing to Celery or otherwise cause there to be more runners (which will require webops time, so not this week), and then retire the old synchronous copy paths to the sound of much rejoicing from LoC-counters.

Changed in launchpad:
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress
William Grant (wgrant)
tags: added: package-copies
Revision history for this message
Colin Watson (cjwatson) wrote :

More runners turned out to be impractical for various reasons, so I implemented bug 1064274 instead.

We've now rolled the asynchronous copy path out to everyone, in preparation for removing the old code and closing this bug. Please shout if it still times out or otherwise breaks!

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
Changed in launchpad:
status: In Progress → Fix Committed
Steve Kowalik (stevenk)
tags: added: bad-commit-16210 qa-ok
removed: qa-needstesting
Steve Kowalik (stevenk)
tags: removed: bad-commit-16210
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
Revision history for this message
Colin Watson (cjwatson) wrote :

Reopening this so that I can use it as a QA hook for final removal of the synchronous path.

Changed in launchpad:
status: Fix Released → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
Changed in launchpad:
status: In Progress → Fix Committed
Colin Watson (cjwatson)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.