When a table has a compound primary key, and the first item of that key is also not unique (either enforced through a constraint or not-enforced, but still unique due to other methods), there is a chance for data to be lost when copying to the new table.
This issue exists with 2.2.13+. It may exist with earlier versions, but I have not tested.
Consider the following schema:
CREATE TABLE `orgFeatures` (
`orgId` char(20) NOT NULL,
`instanceId` char(20) NOT NULL,
`feature` enum('FOO','BAR','BAT','BAZ','CAT','DOG','DERP','HERP','VANILLA','CHOCOLATE','MINT') NOT NULL DEFAULT 'FOO',
`isSupported` bit(1) NOT NULL,
`isEnabled` bit(1) NOT NULL,
PRIMARY KEY (`instanceId`,`feature`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Altering this table using pt-osc results in a small amount of data loss.
Here is some debug output. Notice in the lines that I have commented, the lower bound does not match the upper bound of the line above it.
INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ lower boundary: 0oa1cych5vs3jdto20x7 0oa1cych5vs3jdto20x7 FOO upper boundary: 0oa1kcu1rZUONQAMHFKX 0oa1kcu1rZUONQAMHFKX FOO
# INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ params: 0oa1kcu3bCTCTPDWEVLB, 0oa1kcu3bCTCTPDWEVLB, BAR, 0oa1t5f0ulqpCp1z80x7, 0oa1t5f0ulqpCp1z80x7, BAR
# INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ lower boundary: 0oa1kcu3bCTCTPDWEVLB 0oa1kcu3bCTCTPDWEVLB BAR upper boundary: 0oa1t5f0ulqpCp1z80x7 0oa1t5f0ulqpCp1z80x7 BAR
INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ params: 0oa1t5f0ulqpCp1z80x7, 0oa1t5f0ulqpCp1z80x7, BAZ, 0oa26z4oytDgBtxhM0x7, 0oa26z4oytDgBtxhM0x7, BAT
INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ lower boundary: 0oa1t5f0ulqpCp1z80x7 0oa1t5f0ulqpCp1z80x7 BAZ upper boundary: 0oa26z4oytDgBtxhM0x7 0oa26z4oytDgBtxhM0x7 BAT
# INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ params: 0oa26z56wQCYIZUODNAS, 0oa26z56wQCYIZUODNAS, FOO, 0oa2mruemwNu34cwH0x7, 0oa2mruemwNu34cwH0x7, CAT
# INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ lower boundary: 0oa26z56wQCYIZUODNAS 0oa26z56wQCYIZUODNAS FOO upper boundary: 0oa2mruemwNu34cwH0x7 0oa2mruemwNu34cwH0x7 CAT
INSERT LOW_PRIORITY IGNORE INTO `testing`.`_orgFeatures_new` (`orgid`, `instanceid`, `feature`, `issupported`, `isenabled`) SELECT `orgid`, `instanceid`, `feature`, `issupported`, `isenabled` FROM `testing`.`orgFeatures` FORCE INDEX(`PRIMARY`) WHERE ((`instanceid` > ?) OR (`instanceid` = ? AND `feature` >= ?)) AND ((`instanceid` < ?) OR (`instanceid` = ? AND `feature` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 29182 copy nibble*/ params: 0oa2mruemwNu34cwH0x7, 0oa2mruemwNu34cwH0x7, DOG, 0oa34pwo3dGMtIvOp0x7, 0oa34pwo3dGMtIvOp0x7, VANILLA
Just a comment to clarify the issue.
The problem described here is not because the primary key is compound nor because one of the fields admits duplicates.
The problem is pt-osc uses "SORT" to calculate the chunk's boundaries, like in this example from my tests:
# NibbleIterator:5464 15394 Upper boundary statement: SELECT /*!40001 SQL_NO_CACHE */ `instance_id`, `instance_id`, `feature` FROM `test`.`o1` FORCE INDEX(`PRIMARY`) WHERE ((`instance_
id` > ?) OR (`instance_id` = ? AND `feature` >= ?)) ORDER BY `instance_id`, `feature` LIMIT ?, 2 /*next chunk boundary*/
According to the documentation: http:// dev.mysql. com/doc/ refman/ 5.6/en/ enum.html# enum-sorting
"ENUM values are sorted based on their index numbers, which depend on the order in which the enumeration members were listed in the column specification. For example, 'b' sorts before 'a' for ENUM('b', 'a'). The empty string sorts before nonempty strings, and NULL values sort before all other enumeration values.
To prevent unexpected results when using the ORDER BY clause on an ENUM column, use one of these techniques:
Specify the ENUM list in alphabetic order.
Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col)."
Since in the example the features column was defined as `feature` enum('FOO' ,'BAR', 'BAT',' BAZ','CAT' ,'DOG', 'DERP', 'HERP', 'VANILLA' ,'CHOCOLATE' ,'MINT' ) (non-alphabetical order), pt-osc's Nibbler fails to calculate the boundaries for each chunk.