pt-stalk --no-stalk and --iterations 1 don't wait for the collect

Bug #1070434 reported by Daniel Nichter
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Medium
Assigned to: Daniel Nichter

Bug Description

A pt-stalk test does:

$trunk/bin/pt-stalk --iterations 1 --dest $dest --variable Uptime --threshold $threshold --cycles 2 --run-time 2 --pid $pid_file -- --defaults-file=$cnf >$log_file 2>&1

It is meant to test --run-time, but it was failing sporadically. It turns out that on _fast_ systems (a rare case where being slow actually makes the test work), pt-stalk runs, triggers, and starts the collect subprocess, then exits because there are no more iterations. When the tool exits, it kills the collect subprocess, so nothing is collected.

Daniel Nichter (daniel-nichter) wrote :

Correction: it doesn't kill the collect subprocess, it just messes up testing because the tool finishes yet there are still collector subprocesses running.
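The race can be reproduced outside pt-stalk with a few lines of shell. This is only a sketch: `collect` here is a hypothetical stand-in for a collector that takes about a second, not pt-stalk's real collect code. It shows why a check made right after the parent moves on finds nothing, while an explicit `wait` finds the output.

```shell
# Hypothetical stand-in for a collector: takes a moment, then writes a file.
collect() {
   sleep 1
   echo "collected" > "$1"
}

dest=$(mktemp -d)

# Without waiting: the check runs while collect is still in the background,
# which is what the test saw when pt-stalk finished before its collectors.
collect "$dest/no-wait" &
[ -e "$dest/no-wait" ] || echo "no-wait: nothing collected yet"

# With waiting: block until the background job finishes, then check.
collect "$dest/waited" &
wait $!
[ -e "$dest/waited" ] && echo "waited: $(cat "$dest/waited")"
```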

Daniel Nichter (daniel-nichter) wrote :

t/pt-stalk passes on all boxes with all the new waiting, though the Ubuntu 12 box is pretty damn slow, slower than we can even reasonably wait for.

summary: - pt-stalk --iterations 1 may not collect
+ pt-stalk --no-stalk or --iterations 1 may not collect
Changed in percona-toolkit:
status: In Progress → Fix Committed
Daniel Nichter (daniel-nichter) wrote : Re: pt-stalk --no-stalk or --iterations 1 may not collect

So the "fix" was to wait --run-time * 3 before exiting. As the new documentation says, this usually won't happen because the tool runs forever by default. Otherwise, if running --no-stalk or --iterations 1, then unless the system is *really* slow, the wait in collect() (bug 1047701) should have already killed anything. At worst it just means the tool takes some more time to exit, but then again, if a process is hung or spinning out of control, this will kill it explicitly (else it may continue to run, then zombie, etc.).
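For readers who want the shape of that wait-then-kill logic, here is a rough sketch. It is not the code from the actual fix: the function name `wait_for_collectors`, its arguments, and the one-second polling interval are all assumptions made for illustration.

```shell
# Hypothetical helper, not pt-stalk's actual code.
# Usage: wait_for_collectors MAX_WAIT PID...
# Polls once per second until every PID has exited or MAX_WAIT seconds pass,
# then kills whatever is still running.
# Returns 0 if everything exited on its own, 1 if something had to be killed.
wait_for_collectors() {
   local max_wait=$1; shift
   local waited=0
   local pid still_running
   while [ "$waited" -lt "$max_wait" ]; do
      still_running=""
      for pid in "$@"; do
         # kill -0 sends no signal; it only tests whether the PID exists.
         kill -0 "$pid" 2>/dev/null && still_running="$still_running $pid"
      done
      [ -z "$still_running" ] && return 0
      sleep 1
      waited=$((waited + 1))
   done
   # Out of time: explicitly kill anything hung or spinning.
   local killed=0
   for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then
         kill "$pid" 2>/dev/null
         killed=1
      fi
   done
   [ "$killed" -eq 0 ]
}
```

With a shell variable `run_time` holding the --run-time value (again an assumed name), the call before exiting would look like `wait_for_collectors $((run_time * 3)) $collector_pids`.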

summary: - pt-stalk --no-stalk or --iterations 1 may not collect
+ pt-stalk --no-stalk and --iterations 1 don't wait for the collect
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-589
