pt-table-checksum requires recursion when working with an XtraDB Cluster node

Bug #1373937 reported by Andrey Ilyin
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Medium
Assigned to: Frank Cizmich

Bug Description

I'm trying to use pt-table-checksum to checksum the tables on an XtraDB Cluster setup; I don't need it to actually check the results because we have our own system in place to examine the results in percona.checksums. What I need the tool to do is to connect to a node, run the queries to checksum the data and put the results into percona.checksums, and have these queries replicated across to the other nodes so that I can see if the data on the other nodes differs.

As a minimal test case, I've set up a MySQL sandbox with a cluster inside, and then run a command as follows:
$ pt-table-checksum S=/tmp/mysql_sandbox17301.sock,p=msandbox,u=msandbox --recursion-method=none
<hostname> is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.

This is not the behaviour I anticipated, so I'm reporting it as a bug. I have in fact managed to patch the tool to let it run through; the patch is attached as 'pt-table-checksum-pxc-no-recurse.patch'. It wraps the 'are there any other nodes?' check that produces that error message inside an if ( $recursion_method ne 'none' ). After I applied it, the tool ran through the database without error messages, and I could still see the differences I had introduced between the nodes. Please take a look at it.
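
The guard described in this report can be illustrated with a short sketch. It is written in Python rather than the tool's Perl, and the function and parameter names are hypothetical; it models only the decision being discussed, not the attached patch itself.

```python
def must_abort(is_cluster_node: bool, other_nodes_found: int,
               recursion_method: str) -> bool:
    """Return True when the tool should stop with the 'no other nodes' error.

    Illustrative model only: the names here are assumptions, not
    identifiers taken from pt-table-checksum.
    """
    if recursion_method == "none":
        # The user explicitly asked for no discovery, so a lone node is fine.
        return False
    # Otherwise keep the original behaviour: a cluster node with no
    # discovered peers or replicas is treated as an error.
    return is_cluster_node and other_nodes_found == 0
```

With this guard, a cluster node run with --recursion-method=none proceeds to checksum, while any other recursion method that finds no peers still reports the error.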

Andrey Ilyin (jemmix) wrote :
Frank Cizmich (frank-cizmich) wrote :

Hi Andrey,

To use pt-table-checksum on a cluster you have to specify a "--recursion-method".
This must be either "cluster" or "dsn".
If you set it to "none" the tool will not try to discover any other node. I suspect that with your fix, the tool ran without error, but was unable to find any differences because it didn't find the other nodes.

The easiest method to use is --recursion-method=cluster, since this will autodiscover all the other nodes. However, you must make sure that every node has a distinct server_id; otherwise, nodes with repeated server_ids will be discarded.
(We'll post a fix for this very soon, but for the moment the server_ids must be different.)
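
Before relying on autodiscovery, it may help to confirm that no two nodes share a server_id. The sketch below assumes the values have already been collected from each node (for example with SELECT @@server_id); the function name and the node names are hypothetical.

```python
from collections import defaultdict


def duplicate_server_ids(server_ids):
    """Map each server_id used by more than one node to its node names.

    `server_ids` is a dict of node name -> server_id, gathered by
    whatever means the operator prefers (an assumption of this sketch).
    """
    nodes_by_id = defaultdict(list)
    for node, server_id in server_ids.items():
        nodes_by_id[server_id].append(node)
    return {sid: sorted(nodes)
            for sid, nodes in nodes_by_id.items()
            if len(nodes) > 1}
```

An empty result means every node has a distinct server_id, which is the condition stated above.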

The other method is "dsn", which is a bit more involved: it requires creating a table with the addresses of all the nodes and telling the tool where that table is via the dsn parameter. The same restriction applies, though: server_ids must be different.
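
As a rough sketch of the "dsn" setup, the helpers below build the SQL and the command-line argument as strings. The table layout follows the pt-table-checksum documentation as I recall it and should be checked against the docs; the percona.dsns location, the host names, and the helper names are assumptions made for illustration.

```python
# Table layout as recalled from the pt-table-checksum docs; verify before use.
DSN_TABLE_DDL = """CREATE TABLE percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
)"""


def dsn_insert_statements(node_hosts):
    """Build one INSERT per node; hosts are trusted, illustrative values."""
    return [f"INSERT INTO percona.dsns (dsn) VALUES ('h={host}')"
            for host in node_hosts]


def recursion_method_arg(table_host):
    """Build the option that points the tool at the DSN table."""
    return f"--recursion-method=dsn=h={table_host},D=percona,t=dsns"
```

The DDL and INSERT statements would be run once on the node that holds the table, and the generated option passed to pt-table-checksum on each run.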

You can find info about this here:
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html#cmdoption-pt-table-checksum--recursion-method

Andrey Ilyin (jemmix) wrote :

Frank, thank you for your response. However, like I said in the original message, in our specific use case, we don't need pt-table-checksum to actually figure out if there are any inconsistencies. We need it to fill in the checksums table on every node, to be checked later by another script written in-house. To fill in the checksums table, it doesn't actually need to know if there are any other nodes in the cluster. It can send its queries to one node and they will get replicated. This scheme works well for us in non-XtraDB Cluster setups we have.
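
To make the scheme concrete, here is a minimal sketch of what such an external comparison could look like. It is not the in-house script mentioned above, which is not shown in this report; it assumes the rows of percona.checksums have already been fetched from each node (for example with SELECT db, tbl, chunk, this_crc, this_cnt FROM percona.checksums), and the function name is hypothetical.

```python
def differing_chunks(reference, other):
    """Return the (db, tbl, chunk) keys whose checksum data differ.

    Both arguments map (db, tbl, chunk) -> (this_crc, this_cnt), as read
    from percona.checksums on two different nodes. A chunk present on
    only one node is also reported as a difference.
    """
    diffs = []
    for key in sorted(set(reference) | set(other)):
        if reference.get(key) != other.get(key):
            diffs.append(key)
    return diffs
```

Because the comparison only reads the checksums table on each node, it does not depend on pt-table-checksum knowing about the other nodes.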

I realise we can add a --recursion-method=cluster argument (which doesn't work with non-cluster setups) or fill in the dsns table for pt-table-checksum to use, but this is working around the problem. This way, pt-table-checksum would run slower because it would have to query the other cluster nodes, and we're discarding its output anyway because we have another solution to figure out if there are any discrepancies between servers in the replication setup.

Frank Cizmich (frank-cizmich) wrote :

Andrey,

Yes, now I see I misunderstood your point.
It's worth considering modifying this behavior in the tool. Either accepting the "none" option for clusters, or perhaps adding another clearer option, such as "--single-server-check" or similar.

Nilnandan Joshi (nilnandan-joshi) wrote :

Verified with pt-table-checksum 2.2.11 and PXC 5.6

root@deb-pxc56-1:~# pt-table-checksum S=/var/run/mysqld/mysqld.sock,p=root,u=root --recursion-method=none
deb-pxc56-1 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.

root@deb-pxc56-1:~# pt-table-checksum --version
pt-table-checksum 2.2.11
root@deb-pxc56-1:~#

Changed in percona-toolkit:
status: New → Confirmed
Changed in percona-toolkit:
status: Confirmed → In Progress
assignee: nobody → Frank Cizmich (frank-cizmich)
milestone: none → 2.2.12
importance: Undecided → Medium
Changed in percona-toolkit:
status: In Progress → Fix Committed
Frank Cizmich (frank-cizmich) wrote :

--recursion-method=none is now accepted for clusters.
The docs have been changed to reflect this.

Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-659
