strings should be normalized

Bug #100086 reported by Denis Moyogo Jacquerye on 2007-04-02

Affects            Status      Importance   Assigned to   Milestone
Launchpad itself   Won't Fix   Low          Unassigned

Bug Description

Unicode strings should be normalized in some form, probably NFC for better legacy compatibility.

Right now translators can type with decomposed or precomposed characters, but Launchpad doesn't normalize the strings when saving or when searching.

For example, one translator might use a keyboard that produces precomposed characters such as 'é' (U+00E9), and another a keyboard that produces decomposed sequences such as "é" (U+0065 followed by the combining acute accent U+0301). Launchpad doesn't consider these two to be the same, yet Unicode defines them as equivalent.
Another example is search: searching for "é" (precomposed) and for "é" (decomposed) gives different results when both should give the same results.

NFC is strongly suggested since it is the form used by the W3C Character Model. See http://www.w3.org/TR/charmod-norm/#sec-NormalizationMotivation

See http://www.unicode.org/reports/tr15/ for more info on normalization of equivalent strings.

Matthew Paul Thomas (mpt) on 2007-04-02

Changed in launchpad:
importance:	Undecided → Medium

Jeroen T. Vermeulen (jtv) wrote on 2008-09-02:

See here for a function that can do this for us: http://www.python.org/doc/2.4/lib/module-unicodedata.html
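For illustration, here is a minimal sketch of what the unicodedata module linked above offers. It is written for current Python 3 rather than the Python 2.4 that the link documents, and the variable names are made up for the example; it is not Launchpad code.

```python
import unicodedata

# Two spellings of the same letter (hypothetical sample data).
composed = "\u00e9"       # precomposed: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # decomposed: "e" + COMBINING ACUTE ACCENT

# Without normalization the two strings compare as different.
raw_equal = composed == decomposed

# After normalizing both to NFC they compare as equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)
```

unicodedata.normalize() also accepts "NFD", "NFKC" and "NFKD" as the form argument; NFC is used here only because it is the form suggested in the bug description.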

Changed in rosetta:
status:	New → Confirmed

Данило Шеган (danilo) wrote on 2009-10-31:

I am convinced we should not do it. We are getting translations from multiple sources and it's best if we keep them verbatim so we can better track their history and origin.

Changed in rosetta:
importance:	Medium → Low
status:	Triaged → Won't Fix

Denis Moyogo Jacquerye (moyogo) wrote on 2009-11-02:

> I am convinced we should not do it. We are getting translations from
> multiple sources and it's best if we keep them verbatim so we can
> better track their history and origin.

I don't understand how not normalizing helps, or how normalizing prevents tracking history.
In any case the issue of normalization remains in Launchpad, notably with searches.

If one user translates using one form and another searches that translation using the other form, it won't match.
For example, user A translated using the word "é" (NFC).
User B wants to look for translations using the word "é" (NFD) but doesn't find that of user A.

Strings should be normalized during searches for matching.
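As a rough sketch of the idea (not how Launchpad search is implemented), both the query and the stored text can be normalized only at comparison time, so the stored translations stay verbatim. The function names and the sample strings below are hypothetical.

```python
import unicodedata


def nfc(text):
    """Return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def search_translations(translations, query):
    """Return the stored translations that contain query.

    Both sides are compared in NFC, but the stored strings are
    returned exactly as they were saved.
    """
    needle = nfc(query)
    return [t for t in translations if needle in nfc(t)]


# Hypothetical stored translations: the first uses precomposed
# characters (NFC), the second uses a decomposed sequence (NFD).
stored = ["caf\u00e9 ouvert", "cafe\u0301 ferm\u00e9", "the"]

# User B searches with the decomposed form.
query = "cafe\u0301"

naive_hits = [t for t in stored if query in t]
normalized_hits = search_translations(stored, query)

print(len(naive_hits), len(normalized_hits))
```

The naive substring search only finds the translation typed in the same form as the query, while the normalized comparison finds both and leaves the saved strings untouched.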

Данило Шеган (danilo) wrote on 2009-11-13:

Yes, searches should work better. However, we should rely on our infrastructure to provide that (i.e. Postgres and/or Postgres full-text search, once we start using it).
