innobackupex doesn't copy xtrabackup_logfile if 'UNLOCK TABLES' times out at end of script

Bug #405604 reported by Brad Fino
26
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB
Triaged
High
Valentine Gostev
1.6
Triaged
High
Valentine Gostev
2.0
Triaged
High
Valentine Gostev

Bug Description

After it flushes tables with read lock during the the "innobackupex: Finished backing up .frm, .MRG, .MYD, .MYI, .TRG, .TRN, .ARM, .ARZ and .opt files" phase, UNLOCK TABLES will sometimes timeout due to whatever $process may be occurring within my application at the time (maybe an ALTER, massive traffic, etc).

So, I end up with:

xtrabackup: Transaction log of lsn (574 2452056146) to (574 3374977342) was copied.
innobackupex: Error: mysql child process has died: ERROR 2006 (HY000) at line 19: MySQL server has gone away
while waiting for reply to MySQL request: 'UNLOCK TABLES;' at /usr/bin/innobackupex-1.5.1 line 452.

innobackupex dies waiting for unlock tables to complete, but it doesn't copy the xtrabackup_logfile over to the target server.
It just leaves it behind in /tmp on the source server. If i manually copy it over then everything's fine, and recovery completes as normal.

Tags: innobackupex
Changed in percona-xtrabackup:
importance: Undecided → Medium
milestone: none → release-1.2
Changed in percona-xtrabackup:
assignee: nobody → Aleksandr Kuzminsky (akuzminsky)
Changed in percona-xtrabackup:
milestone: 1.2 → 1.3.1
Stewart Smith (stewart)
Changed in percona-xtrabackup:
milestone: 1.3.1 → none
Revision history for this message
Stewart Smith (stewart) wrote :

Is this still reproducible in current xtrabackup? there have been a lot of bug fixes since this bug was filed. Please try to reproduce with a more recent release.

thanks for your report, and sorry it has stagnated for so long.

Changed in percona-xtrabackup:
status: New → Incomplete
Revision history for this message
Kenny Gryp (gryp) wrote :

------
111026 02:37:53 innobackupex: Finished backing
up .frm, .MRG, .MYD, .MYI, .TRG, .TRN, .ARM, .ARZ, .CSV, .CSM and .opt
files

innobackupex: Resuming ibbackup

xtrabackup: The latest check point (for incremental): '191499091912'
log scanned up to (191499360626)
xtrabackup: Stopping log copying thread.......
xtrabackup: Transaction log of lsn (191487786897) to (191499360626) was
copied.
innobackupex: Error: mysql child process has died: ERROR 2006 (HY000) at
line 13: MySQL server has gone away
while waiting for reply to MySQL request: 'UNLOCK TABLES;'
at /usr/bin/innobackupex line 336.
------

===> so wait_timeout reached before the copying of the FRM files was done.

===> There is some mechanism that checks if the connection should be pinged, but it's not really intelligent:
#
# mysql_keep_alive subroutine tries to keep connection to the mysqld database
# server alive by sending a dummy query when the connection has been idle
# for the specified time.
#
sub mysql_keep_alive {
    if ((time() - $mysql_last_access_time) > $mysql_keep_alive_timeout) {
        # too long idle, send a dummy query
        mysql_check();
    }
}

===> default setting is not that good:
# time in seconds after which a dummy query is sent to mysql server
# in order to keep the database connection alive
my $mysql_keep_alive_timeout = 1800;

===> In this case, wait_timeout was set to 100, which is much lower than that default value.

===> HOWEVER, changing $mysql_keep_alive_timeout is not enough because it does run mysql_keep_alive every time a file has been copied so when the file is big and will take longer than 100 seconds, it will fail

===> a (dirty) way that it can be fixed is by applying the patch below and expecting a file not to take longer than 57600 seconds

Revision history for this message
Kenny Gryp (gryp) wrote :
Changed in percona-xtrabackup:
status: Incomplete → New
Stewart Smith (stewart)
Changed in percona-xtrabackup:
importance: Medium → High
assignee: Aleksandr Kuzminsky (akuzminsky) → nobody
tags: added: innobackupex
Revision history for this message
Stewart Smith (stewart) wrote :

We should probably ignore the result of UNLOCK TABLES as it failing/timing out really isn't anything that affects us (tables would be unlocked when we disconnect from server anyway)

Revision history for this message
Alexey Kopytov (akopytov) wrote :

It looks like the real problem was that the client connection timed out before innobackupex issued UNLOCK TABLES, but rather when copying non-InnoDB diles. It's just that the error message is misleading in such a case.

So it is essentially a duplicate of bug #408803 and will be addressed by the same patch.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.