PyGettextPO is not able to handle Unicode strings

Bug #404 reported by Carlos Perelló Marín
14
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Carlos Perelló Marín

Bug Description

I saw a problem with pygettextpo validation and thus I wrote a test to know if it was a problem with it or with Rosetta.

The test is:
def testUnicodeString(self):
    """Test that a translation with unicode chars is working."""
    msg = gettextpo.PoMessage()
    msg.set_msgid(u'Carlos Perell\xf3 Mar\xedn')
    msg.se...

I saw a problem with pygettextpo validation and thus I wrote a test to know if it was a problem with it or with Rosetta.

The test is:
def testUnicodeString(self):
    """Test that a translation with unicode chars is working."""
    msg = gettextpo.PoMessage()
    msg.set_msgid(u'Carlos Perell\xf3 Mar\xedn')
    msg.set_msgstr(u'Carlos Perell\xf3 Mar\xedn')

And the test run output:

carlos@frodo:~/Work/dists/launchpad/sourcecode/pygettextpo$ python ./test_gettextpo.py
.......E....
======================================================================
ERROR: Test that a translation with unicode chars is working.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_gettextpo.py", line 115, in testUnicodeString
    msg.set_msgid(u'Carlos Perell\xf3 Mar\xedn')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 13: ordinal not in range(128)

----------------------------------------------------------------------
Ran 12 tests in 0.021s

FAILED (errors=1)

Changed in launchpad:
assignee: nobody → carlos
status: New → PendingUpload
Revision history for this message
Carlos Perelló Marín (carlos) wrote :

I have a patch for this.

It's pending to be reviewed and merged into rocketfuel.

It's located at:
<email address hidden>/pygettextpo--devel--0--patch-3

Changed in launchpad:
status: PendingUpload → Fixed
Revision history for this message
James Henstridge (jamesh) wrote :

Sorry for not looking at this sooner. There are a few things worth thinking about here.

First of all, the printf() family of functions in libc take 8-bit strings, so having to do an explicit conversion to an 8-bit string might be considered a plus since it makes the conversion explicit.

As for the patch, here's a review of the changes.

1. get_pystring_from_pyobject() will cause an error if non-unicode strings
   are passed that include 8-bit characters, since PyUnicode_FromObject()
   will try to do an ASCII -> Unicode conversion on them. It would be better
   to pass them through as is (remember to INCREF the object in this case).

2. It would be clearer to use PyUnicode_AsUTF8String(unicode) rather than
   PyUnicode_EncodeUTF8(), since it is a bit doesn't require poking around
   in the PyUnicode struct.

3. There is no need to raise an exception if string == NULL, since
   PyUnicode_EncodeUTF8 or PyUnicode_AsUTF8String will have already done so.
   In fact, it will probably provide a better error message.

4. pypo_message_set_msgid, set_msgid_plural, etc, need to return NULL if
   get_pystring_from_pyobject() returns NULL. Otherwise we run into weird
   situations like this:
     >>> msg.set_msgid([])
     >>> unrelated_code = 42
     ValueError: not valid Unicode type
   (the exception gets missed the first time, and caught on the next statement
   executed).

Other than that, the patch looks okay. I'd suggest adding a test case for passing a non-convertible object to set_msgid() to make sure it raises an exception.

Curtis Hovey (sinzui)
visibility: private → public
Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/404

tags: added: iso-testing
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Related questions

Remote bug watches

Bug watches keep track of this bug in other bug trackers.