convert gerrit's mysql tables to utf8

Bug #979227 reported by James E. Blair
26
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Core Infrastructure
Fix Released
Medium
Khai Do

Bug Description

To address the problems noted here: http://code.google.com/p/gerrit/issues/detail?id=1101 we should convert the gerrit mysql tables from latin1 to utf8.

This will involve understanding how any latin1 or utf8 data are currently stored in the tables, and determining the right way to convert that data.

It will also involve making sure that any tools other than gerrit (like the launchpad sync script) do the correct thing.

Tags: gerrit
James E. Blair (corvus)
summary: - convert mysql tables to utf8
+ convert gerrit's mysql tables to utf8
Revision history for this message
Monty Taylor (mordred) wrote :

I feel like this might have been sorted, or a solution for it?

Changed in openstack-ci:
assignee: nobody → Darragh Bailey (dbailey-k)
milestone: none → folsom
Revision history for this message
Darragh Bailey (dbailey-k) wrote :

A partial solution at least. Script attached was used to test and perform the switch over to utf8 within work. I can't confirm whether there may be issues with other services accessing the same DB subsequently, but it did work as expected within Gerrit.

Only place I noticed any charset conversion issues, is within the patch_comments table. However I suspect depending on usage, the following is a more complete list of table.fields that need to be checked for conversion issues:
account_agreements.review_comments
account_group_agreements.review_comments
account_groups.description
change_messages.message
patch_comments.message

Those are the only text fields which are in any way free form. All the remaining varchar fields are restricted to what can be accepted into them due to usage.

Monty Taylor (mordred)
Changed in openstack-ci:
milestone: folsom → grizzly
assignee: Darragh Bailey (dbailey-k) → nobody
importance: Low → Medium
Revision history for this message
arje (arje) wrote :

Hi Darragh,

just wanted to say that you saved me a bit of headache before my end of workday. Works smoothly at my end. Thanks for your good work.

Regards,

Armin

James E. Blair (corvus)
Changed in openstack-ci:
milestone: grizzly → havana
Clark Boylan (cboylan)
Changed in openstack-ci:
milestone: havana → icehouse
Revision history for this message
Clark Boylan (cboylan) wrote :

Maybe we can tack this onto the Gerrit upgrade to 2.8 or 2.9. Since there will already be DB migrations.

Khai Do (zaro0508)
Changed in openstack-ci:
assignee: nobody → Khai Do (zaro0508)
Revision history for this message
Khai Do (zaro0508) wrote :

I've run Darragh's script to convert tables on review-dev.o.o and everything seems to be ok. I ran the script after upgrading to gerrit 2.8

fungi suggested we should convert to tables on review.o.o to UTF8 before upgrading to gerrit 2.8. jeblair says we should ask mordred as well.

Revision history for this message
Khai Do (zaro0508) wrote :

I tested the utf8 conversion from a mysql dump of the review.o.o data. I found that the existing tables are in 'latin1_general_cs' collation. This is a case sensitive collation which would convert well to utf8_general_cs collation however utf8_general_cs collation is NOT available for mysql (http://bugs.mysql.com/bug.php?id=65830). Converting to utf8_general_ci will not work because there are tables that contain duplicate (case insensitive) data so when attempting to insert the data back into the database duplicate entry errors occur. The other option is to convert to utf8_bin collation which will convert just fine. However the order by clause for utf8_bin returns an unnatural order ( https://stackoverflow.com/questions/5526334/what-effects-does-using-a-binary-collation-have) so I'm not sure whether it's safe to use utf8_bin collation.

Revision history for this message
Khai Do (zaro0508) wrote :
Revision history for this message
Khai Do (zaro0508) wrote :

We discussion this topic and have agreed to convert to utf8_bin collation. log of the discussion at http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-03-04-19.01.log.html

Revision history for this message
Khai Do (zaro0508) wrote :

postpone fix until after gerrit upgrade bug #1082781

Revision history for this message
Khai Do (zaro0508) wrote :

Neither I nor pelix can reproduce the problem reported in upstream issue 1101 on review.o.o which runs gerrit 2.8. I think it's probably something to do with the new lucene search backend that's now in ver 2.8. I do not think we need to go thru the trouble of converting to utf8 until we can identify a real problem with the current collation.

Khai Do (zaro0508)
Changed in openstack-ci:
status: Triaged → Fix Released
Revision history for this message
Khai Do (zaro0508) wrote :
Revision history for this message
Khai Do (zaro0508) wrote :

Ignore the previously attached db conversion scripts. I have modified the script further, the latest version is on my github account [1]. I have used it to test db migration to utf8 using openstack's production data. It was tested with MySQL ver 5.5.41-0ubuntu0.12.04.1

[1] https://github.com/zaro0508/gerrit-db-migration

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.