slave-scanner shouldn't block on chroot extraction

Bug #211974 reported by Adam Conrad
Affects: launchpad-buildd
Status: Fix Released
Importance: Medium
Assigned to: Celso Providelo

Bug Description

When the slave-scanner dispatches jobs to launchpad-buildd, it blocks on the buildd returning from the tarball extraction. Not only does this lead to timeouts on cold caches (as seen on the PPA buildds constantly, and now on some hardy-upgraded buildds), but it also means the scanner is blocking for 60+ seconds on one buildd, when it could be looping through them all to dispatch/gather other builds.

Fixing this to make chroot dispatching an async job may or may not require code changes to launchpad-buildd as well as slave-scanner. If that's the case, I'm happy to work with you guys on this, but we do need it fixed -- the overall impact of this is going from "irritating time-waster" to "(relatively) large portions of my day are wasted re-enabling timed-out buildds".

Note that this change would have a positive effect on buildd performance in general, as queues would clear faster due to faster job polling (and we'd even be able to reduce the current timeout to detect broken buildds faster!), so it seems like an all-around win.
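To make the scanner-side benefit concrete, here is a minimal sketch of a non-blocking scan pass. It is only an illustration under stated assumptions: `FakeBuilder`, `scan_once`, the state names, and the poll-count model of unpacking time are all hypothetical and are not the real slave-scanner or launchpad-buildd interfaces.

```python
class FakeBuilder:
    """Stand-in for a buildd (hypothetical interface, not the real XML-RPC one)."""

    def __init__(self, name, polls_to_unpack):
        self.name = name
        # Number of status polls during which the chroot is still unpacking.
        self._remaining = polls_to_unpack
        self.state = "IDLE"

    def start_build(self):
        # Returns at once; unpacking is assumed to continue on the builder.
        self.state = "UNPACKING"

    def status(self):
        # Each poll stands in for some time passing on the builder.
        if self.state == "UNPACKING":
            if self._remaining == 0:
                self.state = "BUILDING"
            else:
                self._remaining -= 1
        return self.state


def scan_once(builders):
    """One scanner pass: dispatch to idle builders, poll the busy ones.

    No call in this loop waits for a chroot to finish unpacking, so a
    slow builder cannot hold up the ones after it.
    """
    seen = {}
    for builder in builders:
        if builder.state == "IDLE":
            builder.start_build()
            seen[builder.name] = builder.state
        else:
            seen[builder.name] = builder.status()
    return seen
```

With two builders where one has a warm cache and the other a cold one, the first pass dispatches to both, and the slow one simply shows up as still unpacking on later passes instead of stalling the loop.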

James Troup (elmo)
Changed in soyuz:
status: New → Confirmed
Celso Providelo (cprov) wrote :

Agreed, but I believe this bug/fix belongs to the launchpad-buildd code, which is the part that actually blocks, not slave-scanner.

It isn't as simple as it looks, because unpacking the chroot is part of the start_build xmlrpc command, and making this call asynchronous will possibly involve postponing the unpacking to build time, i.e. 'start_build' would return right after downloading the chroot.

Anyway, this bug exists and I'm more than happy to help you with the implementation.
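As a rough illustration of the idea above (returning from the start call before the chroot is unpacked), the following sketch runs the extraction in a worker thread and exposes the progress through a status call. Everything here is an assumption made for the example: `SketchBuildSlave`, its method names, the state strings, and the injected `unpack_chroot` callable are hypothetical and do not describe the actual launchpad-buildd code or the fix that was eventually released.

```python
import threading


class SketchBuildSlave:
    """Hypothetical slave; not the real launchpad-buildd API."""

    def __init__(self, unpack_chroot):
        # unpack_chroot: callable that performs the slow tarball extraction.
        self._unpack_chroot = unpack_chroot
        self._lock = threading.Lock()
        self._status = "IDLE"
        self._worker = None

    def start_build(self, chroot_path):
        """Return immediately; the extraction runs in a worker thread."""
        with self._lock:
            if self._status == "UNPACKING":
                raise RuntimeError("slave is busy")
            self._status = "UNPACKING"
        self._worker = threading.Thread(
            target=self._run, args=(chroot_path,), daemon=True)
        self._worker.start()
        return "UNPACKING"

    def _run(self, chroot_path):
        # Runs in the worker thread; record the outcome for later polls.
        try:
            self._unpack_chroot(chroot_path)
            new_status = "READY"
        except Exception:
            new_status = "FAILED"
        with self._lock:
            self._status = new_status

    def status(self):
        """Cheap call the scanner can poll on each pass."""
        with self._lock:
            return self._status

    def wait(self, timeout=None):
        """Helper for tests: block until the worker thread finishes."""
        if self._worker is not None:
            self._worker.join(timeout)
```

The caller would then poll `status()` on later passes rather than waiting inside `start_build`, which is the behaviour the bug description asks for.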

Celso Providelo (cprov)
Changed in soyuz:
assignee: nobody → cprov
importance: Undecided → Medium
status: Confirmed → In Progress
Adam Conrad (adconrad) wrote :

Fixed in launchpad-buildd version 44, rolled out to production buildds today, and reviewed and merged to RF by cprov.

Changed in launchpad-buildd:
assignee: cprov → adconrad
status: In Progress → Fix Released
Celso Providelo (cprov) wrote :

Let's keep it in progress until it reaches RF.

Changed in launchpad-buildd:
assignee: adconrad → cprov
status: Fix Released → In Progress
Celso Providelo (cprov) wrote :

RF 6024

Changed in launchpad-buildd:
status: In Progress → Fix Released