pt-online-schema-change can break replication

Bug #933232 reported by gu lei
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
High
Daniel Nichter

Bug Description

pt-online-schema-change h=127.0.0.1,P=3306,t=db.tiny_url --alter "drop key url","add key url(url(80))" --sleep 0.1

MySQL version:5.1.55-rel12.6-log Percona Server with XtraDB (GPL), Release 12.6, Revision 200

on slave:

show slave status\G

                 Master_Host: 192.168.1.1
                  Master_User: repuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: crtalk1-bin.009260
          Read_Master_Log_Pos: 125939386
               Relay_Log_File: tw_usr-relay-bin.029288
                Relay_Log_Pos: 122264451
        Relay_Master_Log_File: crtalk1-bin.009260
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1146
                   Last_Error: Error 'Table 'db.__tmp_tiny_url' doesn't exist' on opening tables
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 122264304
              Relay_Log_Space: 125939734
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1146
               Last_SQL_Error: Error 'Table 'db.__tmp_tiny_url' doesn't exist' on opening tables

binlog on master:
mysqlbinlog -v -v --base64-output=DECODE-ROWS crtalk1-bin.009261 | grep -B 1 -A 10 __tmp|more
#120216 0:00:27 server id 152233306 end_log_pos 29260 Table_map: `db`.`tiny_url` mapp
ed to number 278651
#120216 0:00:27 server id 152233306 end_log_pos 29324 Table_map: `db`.`__tmp_tiny_url
` mapped to number 278645
#120216 0:00:27 server id 152233306 end_log_pos 29420 Write_rows: table id 278651
#120216 0:00:27 server id 152233306 end_log_pos 29516 Write_rows: table id 278645 flags: S
TMT_END_F
### INSERT INTO db.tiny_url
### SET
### @1=88801080 /* INT meta=0 nullable=0 is_null=0 */
### @2='http://15449413.blog.hexun.com/62026240_d.html' /* VARSTRING(400) meta=400 nullable=0 is_n
ull=0 */
### @3='--' /* VARSTRING(18) meta=18 nullable=0 is_null=0 */
### @4=0 /* LONGINT meta=0 nullable=0 is_null=0 */
### @5='00:00:27' /* TIME meta=0 nullable=0 is_null=0 */
### INSERT INTO db.__tmp_tiny_url
### SET
### @1=88801080 /* INT meta=0 nullable=0 is_null=0 */
### @2='http://15449413.blog.hexun.com/62026240_d.html' /* VARSTRING(400) meta=400 nullable=0 is_n
ull=0 */
### @3='--' /* VARSTRING(18) meta=18 nullable=0 is_null=0 */
### @4=0 /* LONGINT meta=0 nullable=0 is_null=0 */
### @5='00:00:27' /* TIME meta=0 nullable=0 is_null=0 */
# at 29516
#120216 0:00:27 server id 152233306 end_log_pos 29543 Xid = 38657832443
COMMIT/*!*/;
# at 29543

__tmp_tiny_url created only on master but triggers wrote binlog so error occured on slave.

Related branches

tags: added: pt-online-schema-change risk triggers
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I misread the initial report. There aren't triggers, but pt-online-schema changes create a temp table on a master with SQL_LOG_BIN=0, then when binary logging is re-enabled, it does DROP TABLE <temp table>, which breaks replication. t/pt-online-schema-change/basics.t causes this problem, too.

Changed in percona-toolkit:
importance: Undecided → High
status: New → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
milestone: none → 2.0.4
tags: added: breaks-replication
removed: triggers
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I think I mislead myself again. basics.t was breaking replication but due to itself, not due to pt-online-schema-change. I fixed basics.t and added 2 tests to test that pt-online-schema-change does not replicate anything by default (SQL_LOG_BIN=0), and these tests passed without changing the tool's code.

Therefore, gu lei: can you confirm that the problem is that your table has triggers which caused the DROP TABLE statement to be replicated?

If nothing else, I added "IF EXISTS" to two DROP TABLE statements in the code. However, if triggers are still at work, the RENAME TABLE statements may also break replication, but there's nothing we can do about this.

Changed in percona-toolkit:
status: In Progress → Triaged
summary: - pt-online-schema-change break replication because of binlog of triggers
+ pt-online-schema-change and triggers break replication
Revision history for this message
Baron Schwartz (baron-xaprb) wrote : Re: pt-online-schema-change and triggers break replication

I think that the real fix for this is not to be disabling binary logging. I've seen this go wrong enough times that I really don't want our tools to do it, especially not by default. However, I've been reluctant to suggest changing this until I get enough time to redesign the whole tool, which I believe is needed. I don't want to make a change if it's only a halfway solution.

Revision history for this message
gu lei (gulei-om) wrote :

Hi, Daniel Nicher:

Thanks.

The triggers are created by pt-online-schema-change.

Also the triggers would write binlog if any rows changed when pt-online-schema-change is working.

You can insert one row to the table when pt-online-schema-change is working. Then replication would be broken.

Hi, Baron Schwartz:

Thanks.

I think people may need some guide in http://www.percona.com/doc/percona-toolkit/2.0/pt-online-schema-change.html before you get a good solution. For example, users should create __tmp_<table name> table on slaves before start pt-online-schema-change.

pt-online-shcema-change is a good tool.

Revision history for this message
gu lei (gulei-om) wrote :

Hi, Baron Schwartz:

A suggestion:

Create and alter __tmp table before set sql_log_bin=0.

After rename __tmp table:

set sql_log_bin=0;

create __tmp table

set sql_log_bin=1;

drop __tmp table

Changed in percona-toolkit:
status: Triaged → In Progress
Revision history for this message
gu lei (gulei-om) wrote :

Create and alter __tmp table before set sql_log_bin=0.

set sql_log_bin=0

create triggers

insert into __tmp select from ....

rename __tmp table:

create __tmp table before set sql_log_bin=1

set sql_log_bin=1;

drop __tmp table

So slave would not be broken and __tmp table on slave would be dropped.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I was able to reproduce this by add SET SQL_LOG_BIN=0 to the sql file for alter_active_table.t. At present, the test file loads the table to be altered on the master which replicates to the slave. Therefore, the changes made by the triggers don't cause an error because the slave also has the table.

After talking with Baron, and in view of the fact that we plan to redesign this tool in the future, for the current version (2.0.x), we are going to add an --execute option that the user must specify before the tool will actually work. This is to encourage users to read the docs, which we'll also update to mention that altering a table on a master that doesn't exist on a slave will cause a problem, other issues with replication, etc.

summary: - pt-online-schema-change and triggers break replication
+ pt-online-schema-change breaks replication
Changed in percona-toolkit:
milestone: 2.0.4 → none
Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
milestone: none → 2.0.4
summary: - pt-online-schema-change breaks replication
+ pt-online-schema-change can break replication (add --execute)
summary: - pt-online-schema-change can break replication (add --execute)
+ pt-online-schema-change can break replication
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Is this fix released?

Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-297

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.