ptc 2.0 --resume with --tables does not always work

Bug #898318 reported by aeva black
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

Version: http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/pt-table-checksum-2.0/revision/240

Description:

When --resume is combined with a limited set of --tables, pttc does nothing when it should resume checksumming a partially checksummed table, if a table with an alphabetically greater name has already been completed.

How to repeat:

Start with an empty --replicate table.

First, checksum table "foo" (using option --tables foo), and allow checksum to complete. Then, checksum table "bar" (with option --tables bar), but stop it before completion. Then, try to "--resume --tables bar"; pttc 2.0 will not do anything. Debug output shows that it is trying to resume table "foo", which was completely checksummed, so it does nothing.

Clear the --replicate table and start again. This time, checksum "bar" to completion, then checksum "foo" and abort partway, then run again with "--resume --tables foo". This time it will resume properly.

I believe this bug is due to --resume assuming an alphabetical order to table checksumming, and not properly limiting itself to the supplied list in --tables.

Suggested fix:

--resume should only consider the specified list of --databases / --tables, and not anything else.
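
As a rough illustration of the suggested fix, the sketch below picks the resume point only from rows that match the requested table list. It is not pt-table-checksum code: the table layout (db, tbl, chunk, ts), the integer timestamps, and the function name resume_candidate are assumptions made for the example, and SQLite stands in for MySQL.

```python
import sqlite3

def resume_candidate(conn, tables=None):
    """Return (db, tbl, chunk) of the newest checksum row.

    If `tables` is given, only rows for those tables are considered,
    mirroring a --tables filter. Hypothetical helper, not the tool's API.
    """
    sql = "SELECT db, tbl, chunk FROM checksums"
    params = []
    if tables:
        placeholders = ", ".join("?" for _ in tables)
        sql += f" WHERE tbl IN ({placeholders})"
        params.extend(tables)
    # Newest row first; chunk breaks ties within one timestamp.
    sql += " ORDER BY ts DESC, chunk DESC LIMIT 1"
    return conn.execute(sql, params).fetchone()

# Assumed stand-in for the --replicate table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checksums (db TEXT, tbl TEXT, chunk INTEGER, ts INTEGER)"
)
conn.executemany(
    "INSERT INTO checksums VALUES (?, ?, ?, ?)",
    [
        ("test", "foo", 1, 100),  # foo was checksummed to completion first
        ("test", "foo", 2, 101),
        ("test", "bar", 1, 200),  # bar was interrupted after its first chunk
    ],
)

print(resume_candidate(conn, ["bar"]))  # resume point for bar only
print(resume_candidate(conn, ["foo"]))  # last chunk recorded for foo
```

With the filter applied, a completed "foo" can no longer be chosen when only "bar" was requested; a --databases filter could be handled the same way with an extra condition on db.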


Changed in percona-toolkit:
importance: Undecided → High
assignee: nobody → Daniel Nichter (daniel-nichter)
milestone: none → 2.0-beta1
tags: added: filters pt-table-checksum resume
summary: - pttc 2.0 --resume with --tables does not always work
+ ptc 2.0 --resume with --tables does not always work
Changed in percona-toolkit:
status: New → In Progress
milestone: 2.0-beta1 → none
Changed in percona-toolkit:
milestone: none → 2.0-beta1
Daniel Nichter (daniel-nichter) wrote :

For the record, this required changing how sub last_chunk() works and the SQL it executes, and we added an index over (ts,db,tbl) to the checksums table. Previously, we selected the max ts from the table, then got the row associated with that ts, but we had to use <= instead of = because of a bug in MySQL. Given that we also had to order those results DESC, in this case that put the wrong table at the top of the list. So we would get the correct max ts but then the wrong table. This is fixed and tested now:

ok 46 - Checksum results partial t1, full t2
ok 47 - Resume from t1 when t2 is done
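
The failure mode described in the comment above can be reproduced in miniature. The report does not show the actual SQL, so both queries below are assumed reconstructions: the table layout, the index name ts_db_tbl, the integer timestamps, and the ORDER BY clauses are illustrative only, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checksums (db TEXT, tbl TEXT, chunk INTEGER, ts INTEGER)"
)
# Index over (ts, db, tbl), as mentioned in the comment; the name is made up.
conn.execute("CREATE INDEX ts_db_tbl ON checksums (ts, db, tbl)")
conn.executemany(
    "INSERT INTO checksums VALUES (?, ?, ?, ?)",
    [
        ("test", "t2", 1, 100),  # t2 fully checksummed, earlier
        ("test", "t2", 2, 101),
        ("test", "t1", 1, 200),  # t1 partially checksummed, most recent
    ],
)

max_ts = conn.execute("SELECT MAX(ts) FROM checksums").fetchone()[0]

# Assumed shape of the old lookup: "ts <= max" matches every row, so a
# descending sort on the table name puts t2 on top.
buggy = conn.execute(
    "SELECT db, tbl, chunk FROM checksums WHERE ts <= ? "
    "ORDER BY db DESC, tbl DESC, chunk DESC LIMIT 1",
    (max_ts,),
).fetchone()

# One possible correction: sort by ts first so the newest row wins.
fixed = conn.execute(
    "SELECT db, tbl, chunk FROM checksums WHERE ts <= ? "
    "ORDER BY ts DESC, db DESC, tbl DESC, chunk DESC LIMIT 1",
    (max_ts,),
).fetchone()

print("old ordering picks:", buggy)
print("ts-first ordering picks:", fixed)
```

In this sketch the old ordering returns the finished table t2 even though max_ts belongs to t1, which matches the "correct max ts but then the wrong table" symptom; sorting by ts first returns t1.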

Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-286
