Comment 2 for bug 404

Revision history for this message
James Henstridge (jamesh) wrote :

Sorry for not looking at this sooner. There are a few things worth thinking about here.

First of all, the printf() family of functions in libc take 8-bit strings, so having to do an explicit conversion to an 8-bit string might be considered a plus since it makes the conversion explicit.

As for the patch, here's a review of the changes.

1. get_pystring_from_pyobject() will cause an error if non-unicode strings
   are passed that include 8-bit characters, since PyUnicode_FromObject()
   will try to do an ASCII -> Unicode conversion on them. It would be better
   to pass them through as is (remember to INCREF the object in this case).

2. It would be clearer to use PyUnicode_AsUTF8String(unicode) rather than
   PyUnicode_EncodeUTF8(), since it is a bit doesn't require poking around
   in the PyUnicode struct.

3. There is no need to raise an exception if string == NULL, since
   PyUnicode_EncodeUTF8 or PyUnicode_AsUTF8String will have already done so.
   In fact, it will probably provide a better error message.

4. pypo_message_set_msgid, set_msgid_plural, etc, need to return NULL if
   get_pystring_from_pyobject() returns NULL. Otherwise we run into weird
   situations like this:
     >>> msg.set_msgid([])
     >>> unrelated_code = 42
     ValueError: not valid Unicode type
   (the exception gets missed the first time, and caught on the next statement
   executed).

Other than that, the patch looks okay. I'd suggest adding a test case for passing a non-convertible object to set_msgid() to make sure it raises an exception.