Launchpad itself

No-op snap takes 1.5 min to build, 6.5 minutes to publish

Bug #1689282 reported by Evan on 2017-05-08

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Launchpad itself	Fix Released	High	Colin Watson

Bug Description

Publishing takes an oddly long time (6.5 minutes) for even the smallest snap. Shouldn't this take seconds at most?

See https://github.com/canonical-ols/build.snapcraft.io/issues/717#issuecomment-299431648 for an initial assessment of the issue.

Related branches

lp:~cjwatson/launchpad/snap-upload-better-retry

Merged into lp:launchpad at revision 18426

William Grant: Approve (code) on 2017-07-21

Colin Watson (cjwatson) wrote on 2017-05-08:

Just copying my explanation here so that it isn't dependent on an external site:

The background for this is that publishing involves multiple steps: we have to do the equivalent of snapcraft push and then the equivalent of snapcraft release, and between those we have to wait for the store to finish scanning the upload, which is an asynchronous job and takes an undetermined amount of time. Rather than polling and thus taking up a worker slot in Launchpad for that undetermined amount of time, we retry the job later with a one-minute delay up to a maximum of 20 times.
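The retry-later pattern described above can be sketched roughly as follows. This is an illustration only, not Launchpad's actual code: `FakeStore`, `UploadJob`, `RetryLater`, and `run_with_retries` are hypothetical names, and the scheduler is simulated by adding up delays instead of sleeping. Only the one-minute delay and the limit of 20 retries come from the comment.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_DELAY_SECONDS = 60  # one-minute delay, as described in the comment
MAX_RETRIES = 20          # maximum number of retries, as described in the comment


class RetryLater(Exception):
    """Raised to give up the worker slot and ask to be run again later."""


class FakeStore:
    """Hypothetical stand-in for the store.

    The first `busy_checks` status checks report that scanning is
    still in progress; every later check reports that it has finished.
    """

    def __init__(self, busy_checks):
        self.busy_checks = busy_checks
        self.released = False

    def push(self):
        return "upload-1"  # hypothetical upload identifier

    def scan_finished(self, upload_id):
        self.busy_checks -= 1
        return self.busy_checks < 0

    def release(self, upload_id):
        self.released = True


@dataclass
class UploadJob:
    store: FakeStore
    upload_id: Optional[str] = None
    attempts: int = 0

    def run(self):
        # Push only once; later runs just check the scan status.
        if self.upload_id is None:
            self.upload_id = self.store.push()
        if not self.store.scan_finished(self.upload_id):
            raise RetryLater()
        self.store.release(self.upload_id)


def run_with_retries(job):
    """Simulate the scheduler; return the seconds spent waiting between runs."""
    waited = 0
    while True:
        try:
            job.run()
            return waited
        except RetryLater:
            if job.attempts >= MAX_RETRIES:
                raise RuntimeError("store did not finish scanning in time")
            job.attempts += 1
            waited += RETRY_DELAY_SECONDS
```

In this sketch the worker is occupied only while `run()` executes; the time between runs costs no worker slot, which is the trade-off the comment describes against polling inside the job.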

There are various things that we could look at to reduce the latency of this process (which aren't all mutually exclusive, and we may not know the best strategy until we do some more analysis):

* Unlike the initial job, retries don't seem to be handled by celery for some reason, but instead are picked up by the fallback cron job some time later. This is the source of most of the unnecessary delay, and is probably just a simple bug somewhere. Assuming that the store scans the upload reasonably promptly, we could cut the typical delay for small snaps down to a little over a minute by getting celery to pick up the retries.
* We could consider having the job poll for a short time after it pushes the snap, which would cut out almost all the extra latency in the case that the store manages to scan it immediately. This may be a good idea, but probably only if the store typically does in fact manage quick scans; otherwise we'd be tying up workers for longer and degrading overall system performance.
* We could try some kind of exponential backoff approach, so that the first retries happen more quickly.
* We could look at having the store tell us when it's done by way of a webhook. This seems like the most elegant approach, but it's also a lot of work in that it requires implementing webhook sending in the store and webhook receiving in Launchpad.
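To make the exponential backoff option concrete, here is a minimal sketch that compares a backoff schedule with the fixed one-minute delay. The base delay, growth factor, and cap are assumptions chosen for illustration, and `backoff_delays` and `time_until_noticed` are hypothetical helpers; none of this is taken from Launchpad's code.

```python
def backoff_delays(base=1.0, factor=2.0, cap=60.0, max_retries=20):
    """Return the delay in seconds before each retry.

    Delays grow as base, base * factor, base * factor**2, ... and are
    capped at `cap`. All default values are illustrative assumptions.
    """
    delays = []
    delay = base
    for _ in range(max_retries):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


def time_until_noticed(delays, scan_seconds):
    """Return the elapsed time of the first retry that runs at or after
    the moment the scan finishes, or None if the retries run out first.
    """
    elapsed = 0.0
    for delay in delays:
        elapsed += delay
        if elapsed >= scan_seconds:
            return elapsed
    return None


fixed = [60.0] * 20          # one-minute delay, up to 20 retries
backoff = backoff_delays()   # 1, 2, 4, 8, 16, 32, then capped at 60

# For a scan that finishes after 5 seconds (an assumed figure):
fixed_latency = time_until_noticed(fixed, 5)
backoff_latency = time_until_noticed(backoff, 5)
```

With these assumed numbers the fixed schedule notices the finished scan at 60 seconds and the backoff schedule at 7 seconds. The cost is a few extra status requests early on, while slow scans still settle into the one-minute interval.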

tags:	added: lp-snappy performance
Changed in launchpad:
status:	New → Triaged
importance:	Undecided → High

Colin Watson (cjwatson) on 2017-06-29

Changed in launchpad:
status:	Triaged → In Progress
assignee:	nobody → Colin Watson (cjwatson)

Launchpad QA Bot (lpqabot) wrote on 2017-07-21:

Fixed in stable r18426 <http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/18426>.

tags:	added: qa-needstesting
Changed in launchpad:
status:	In Progress → Fix Committed

Colin Watson (cjwatson) wrote on 2017-07-25:

For the record, I went for options 1 and 3 from my previous comment (i.e. fix the bug that caused retries not to be handled by celery, and perform the first few retries more quickly when we're just polling the status endpoint). That gets the "Store upload in progress" stage down to 15 seconds plus change for small snaps.

tags:	added: qa-ok
	removed: qa-needstesting

Colin Watson (cjwatson) on 2017-07-26

Changed in launchpad:
status:	Fix Committed → Fix Released
