Launchpad itself

Consistent retry failures on web UI

Bug #560422 reported by Scott Kitterman on 2010-04-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Fix Released	High	Unassigned	Launchpad itself 10.04

Bug Description

I have attempted to retry https://launchpad.net/ubuntu/+source/compiz/1:0.8.4-0ubuntu15/+build/1682401/+retry several times and have gotten an oops each time. Error ID: OOPS-1562O184 is the most recent. This impacts the ability of non-Canonical archive-admins to get the archive in a consistent state prior to release.

William Grant (wgrant) wrote on 2010-04-11:

IntegrityError: duplicate key value violates unique constraint "buildpackagejob__build__key"

The build failed without removing the BPJ (BuildPackageJob), BQ (BuildQueue) and Job? I am scared.
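
For illustration only, here is a minimal sketch of how a leftover job row would produce this kind of error on every retry. It uses an in-memory SQLite database rather than Launchpad's actual schema or ORM; the single-column table layout and the `retry()` helper are assumptions made up for this example, and only the table name and the build number come from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real table: one job row per build.
conn.execute(
    "CREATE TABLE buildpackagejob ("
    " id INTEGER PRIMARY KEY,"
    " build INTEGER UNIQUE)"
)


def retry(conn, build_id):
    """Queue a build by inserting a fresh job row (hypothetical helper)."""
    conn.execute("INSERT INTO buildpackagejob (build) VALUES (?)", (build_id,))


# The original dispatch creates the job row.
retry(conn, 1682401)

# The build fails, but the cleanup that should delete the row never runs,
# so the next retry collides with the unique constraint on "build".
try:
    retry(conn, 1682401)
    retry_error = None
except sqlite3.IntegrityError as exc:
    retry_error = str(exc)

# Deleting the stale row by hand lets the retry go through again.
conn.execute("DELETE FROM buildpackagejob WHERE build = ?", (1682401,))
retry(conn, 1682401)
rows_after_cleanup = conn.execute(
    "SELECT COUNT(*) FROM buildpackagejob WHERE build = ?", (1682401,)
).fetchone()[0]
```

Under these assumptions the retry keeps failing until the stale row is removed, which matches the repeated OOPS described in the report.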

Michael Nelson (michael.nelson) wrote on 2010-04-12:

Indeed, that does sound scary... I'd initially hoped for another explanation: that the records had been removed, and the first retry OOPS then left the build in an inconsistent state. But that doesn't seem possible either (build.retry() updates the buildstate before creating the new queue records).

Changed in soyuz:
status:	New → Confirmed
importance:	Undecided → High
milestone:	none → 10.04

Julian Edwards (julian-edwards) on 2010-04-12

Changed in soyuz:
status:	Confirmed → Triaged

Michael Nelson (michael.nelson) wrote on 2010-04-12:

I just got a chance to check the logs, and there's nothing of interest there either, other than the repetitive:

2010-04-09 11:42:25+0100 [QueryWithTimeoutProtocol,client] <floe:http://floe.buildd:8221/> marked as done. [4]

As to cause, at the moment all I can see is a number of things that could fail between setting the buildstate and deleting the related queue record etc.:

{{{
        self.buildstate = BuildStatus.FAILEDTOBUILD
        self.storeBuildInfo(librarian, slave_status)
        self.buildqueue_record.builder.cleanSlave()
        self.notify()
        self.buildqueue_record.destroySelf()
}}}

As to a temporary fix, I'll organise to have the offending records deleted first thing tomorrow (unless someone beats me to it).

Julian Edwards (julian-edwards) wrote on 2010-04-12: Re: [Bug 560422] Re: Consistent retry failures on web UI

On Monday 12 April 2010 17:16:30 Michael Nelson wrote:
> {{{
> self.buildstate = BuildStatus.FAILEDTOBUILD
> self.storeBuildInfo(librarian, slave_status)
> self.buildqueue_record.builder.cleanSlave()
> self.notify()
> self.buildqueue_record.destroySelf()
> }}}

One explanation, if indeed this piece of code is getting called, is that
something is doing a commit() in storeBuildInfo(), cleanSlave() or notify()
before failing and aborting the rest of the txn.
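
To make that hypothesis concrete, here is a minimal sketch of a stray mid-transaction commit, using an in-memory SQLite database. It is not the Launchpad code: the `build` and `buildqueue` tables, and the `store_build_info()`, `notify()` and `handle_failed_build()` functions, are hypothetical stand-ins that only mirror the order of the calls quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables standing in for the build and its queue record.
conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE buildqueue (id INTEGER PRIMARY KEY, build INTEGER UNIQUE)")
conn.execute("INSERT INTO build VALUES (1, 'BUILDING')")
conn.execute("INSERT INTO buildqueue (build) VALUES (1)")
conn.commit()


def store_build_info(conn):
    """Stand-in helper that (wrongly) commits in the middle of the transaction."""
    conn.commit()


def notify():
    """Stand-in helper that fails after the stray commit has happened."""
    raise RuntimeError("notification failed")


def handle_failed_build(conn):
    try:
        conn.execute("UPDATE build SET state = 'FAILEDTOBUILD' WHERE id = 1")
        store_build_info(conn)  # the state change is now permanent
        notify()                # raises, so the cleanup below never runs
        conn.execute("DELETE FROM buildqueue WHERE build = 1")
        conn.commit()
    except RuntimeError:
        conn.rollback()         # too late: the UPDATE was already committed


handle_failed_build(conn)

state = conn.execute("SELECT state FROM build WHERE id = 1").fetchone()[0]
leftover = conn.execute(
    "SELECT COUNT(*) FROM buildqueue WHERE build = 1"
).fetchone()[0]
```

In this sketch the build ends up marked as failed while its queue row survives the rollback, which is the inconsistent combination that would make a later retry hit the unique constraint.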

Michael Nelson (michael.nelson) wrote on 2010-04-13:

Just for the record, the query to gather all the info:
https://pastebin.canonical.com/30473/

The result:
https://pastebin.canonical.com/30474/

And then deleting the offending records:
https://pastebin.canonical.com/30475/

Michael Nelson (michael.nelson) wrote on 2010-04-13:

And the build is now pending.

Julian Edwards (julian-edwards) wrote on 2010-04-29:

Did you get any more of these, Scott?

I've removed the cron job (queue-builder) that kicked off a script to add missing build records; it was conflicting with the build dispatcher in unpredictable ways.

If there were no more problems then I'll close this bug.

Changed in soyuz:
status:	Triaged → Incomplete

Scott Kitterman (kitterman) wrote on 2010-04-29:

It's been intermittent, so it's hard to know. I'd say go ahead and close it and I'll file a new bug if it comes up again.

Julian Edwards (julian-edwards) wrote on 2010-04-29:

Ok thanks Scott.

Changed in soyuz:
status:	Incomplete → Fix Released
