Change stuck in queue when Gearman errors during merge:merge submit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zuul | In Progress | Undecided | Antoine "hashar" Musso |
Bug Description
Zuul gets stuck processing new changes from time to time. Today the Zuul merge client was not able to connect to Zuul's internal Gearman server:
2014-08-18 19:25:42,434 ERROR gear.Client.
<gear.Connection 0x1a8e710 host: x.y.z.a port: 4730>
timed out waiting for a response to a submit job request:
<gear.Job 0x7f195db8ff10 handle: None name: merger:merge unique: 18e85d5f5c9c407
2014-08-18 19:25:42,435 ERROR zuul.Scheduler: Exception in run handler:
Traceback (most recent call last):
File "/usr/local/
while pipeline.
File "/usr/local/
item, nnfi, ready_ahead)
File "/usr/local/
ready = self.prepareRef
File "/usr/local/
item.
File "/usr/local/
self.
File "/usr/local/
self.
File "/usr/lib/
conn = self.getConnect
File "/usr/lib/
raise NoConnectedServ
NoConnectedSer
That causes all merges to fail and the changes to be stuck in the queue. The Zuul status board shows that changes are being enqueued and none are processed.
I disconnected the Jenkins Gearman plugin and reconnected it. I noticed a few of these errors:
2014-08-18 21:04:02,521 ERROR gear.Client.
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/local/
if build._
AttributeError: 'str' object has no attribute '_ZuulGearmanCl
Then I reconnected the Jenkins Gearman plugin.
New changes entering the pipelines were then properly processed, but a lot were still stuck in the queues. I noticed a lot of merge jobs in the log:
2014-08-18 21:16:52,029 INFO zuul.MergeClient: Merge <gear.Job 0x1a9a290 handle: H:208.80.
2014-08-18 21:16:52,029 WARNING zuul.Scheduler: Build set <zuul.model.
It seems Zuul attempts to rerun the merge operations, which succeed, but refuses to update the BuildSet because it is outdated.
The way I fixed it was by using the 'zuul promote' client command on some changes. That apparently resets the BuildSet of a change and unsticks the changes in a given pipeline.
Not sure how helpful this bug report is, nor what to do from here.
python-gear: 0.5.5
Zuul: 9a95e71 (Merge "Update gerrit change attributes even if merged" which is very recent).
summary:
- Internal Gearman server timeout for merge operations
+ Change stuck in queue when Gearman errors during merge:merge submit
We had another occurrence, this time with:
NoConnectedServersError: No connected Gearman servers
It again happened when attempting to submit a merge:merge job.
The issue appears to be in zuul.scheduler.BasePipelineManager.prepareRef(): it sets the merge state to PENDING before the job submission has properly finished. Pseudo code:
def prepareRef():
    if build_set.merge_state == build_set.PENDING:
        return False
    build_set.merge_state = build_set.PENDING
    self.sched.merger.mergeChanges(..)
If an exception is thrown, the build_set is still in the PENDING state and, because of the early return, we will never again attempt to submit a merge:merge job.
prepareRef should thus set the merge state after self.sched.merger.mergeChanges().
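To make the failure mode concrete, here is a minimal, self-contained sketch of the two orderings. It is not Zuul's actual code: BuildSet, FlakyMerger, run_handler and the local NoConnectedServersError class are hypothetical stand-ins, and the assumption is that the scheduler's run handler catches the exception and calls prepareRef again on its next pass.

```python
class NoConnectedServersError(Exception):
    """Local stand-in for the gear exception named in this report."""


class BuildSet:
    # Hypothetical state values; only PENDING matters for this sketch.
    NEW, PENDING = range(2)

    def __init__(self):
        self.merge_state = self.NEW


class FlakyMerger:
    """Hypothetical merger whose first `failures` submissions raise."""

    def __init__(self, failures=1):
        self.failures = failures
        self.submitted = 0

    def mergeChanges(self, build_set):
        if self.failures > 0:
            self.failures -= 1
            raise NoConnectedServersError("No connected Gearman servers")
        self.submitted += 1


def prepare_ref_buggy(build_set, merger):
    if build_set.merge_state == BuildSet.PENDING:
        return False  # early return: a merge is believed to be in flight
    build_set.merge_state = BuildSet.PENDING  # set before the submit
    merger.mergeChanges(build_set)            # may raise, state stays PENDING
    return False


def prepare_ref_fixed(build_set, merger):
    if build_set.merge_state == BuildSet.PENDING:
        return False
    merger.mergeChanges(build_set)            # may raise, state is untouched
    build_set.merge_state = BuildSet.PENDING  # set only after a clean submit
    return False


def run_handler(prepare, passes=3):
    """Call `prepare` repeatedly, swallowing errors as assumed above."""
    build_set, merger = BuildSet(), FlakyMerger(failures=1)
    for _ in range(passes):
        try:
            prepare(build_set, merger)
        except NoConnectedServersError:
            pass  # assumed: the scheduler logs the error and keeps running
    return merger.submitted


if __name__ == "__main__":
    print("buggy ordering, jobs submitted:", run_handler(prepare_ref_buggy))
    print("fixed ordering, jobs submitted:", run_handler(prepare_ref_fixed))
```

With the original ordering the job is never submitted after the first failure, while the reordered version retries on the next pass. An alternative with the same effect would be to keep the original ordering and reset the merge state in an exception handler around the submit call.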
python-gear: 0.5.5
Zuul: c9d11ab (Merge "Rename doc environment to docs") from Sept 16th