Sorry for not looking at this sooner. There are a few things worth thinking about here.
First of all, the printf() family of functions in libc take 8-bit strings, so having to do an explicit conversion to an 8-bit string might be considered a plus since it makes the conversion explicit.
As for the patch, here's a review of the changes.
1. get_pystring_from_pyobject() will cause an error if non-unicode strings
are passed that include 8-bit characters, since PyUnicode_FromObject()
will try to do an ASCII -> Unicode conversion on them. It would be better
to pass them through as is (remember to INCREF the object in this case).
2. It would be clearer to use PyUnicode_AsUTF8String(unicode) rather than
PyUnicode_EncodeUTF8(), since it is a bit doesn't require poking around
in the PyUnicode struct.
3. There is no need to raise an exception if string == NULL, since
PyUnicode_EncodeUTF8 or PyUnicode_AsUTF8String will have already done so.
In fact, it will probably provide a better error message.
4. pypo_message_set_msgid, set_msgid_plural, etc, need to return NULL if
get_pystring_from_pyobject() returns NULL. Otherwise we run into weird
situations like this:
>>> msg.set_msgid([])
>>> unrelated_code = 42
ValueError: not valid Unicode type
(the exception gets missed the first time, and caught on the next statement
executed).
Other than that, the patch looks okay. I'd suggest adding a test case for passing a non-convertible object to set_msgid() to make sure it raises an exception.
Sorry for not looking at this sooner. There are a few things worth thinking about here.
First of all, the printf() family of functions in libc take 8-bit strings, so having to do an explicit conversion to an 8-bit string might be considered a plus since it makes the conversion explicit.
As for the patch, here's a review of the changes.
1. get_pystring_ from_pyobject( ) will cause an error if non-unicode strings FromObject( )
are passed that include 8-bit characters, since PyUnicode_
will try to do an ASCII -> Unicode conversion on them. It would be better
to pass them through as is (remember to INCREF the object in this case).
2. It would be clearer to use PyUnicode_ AsUTF8String( unicode) rather than EncodeUTF8( ), since it is a bit doesn't require poking around
PyUnicode_
in the PyUnicode struct.
3. There is no need to raise an exception if string == NULL, since EncodeUTF8 or PyUnicode_ AsUTF8String will have already done so.
PyUnicode_
In fact, it will probably provide a better error message.
4. pypo_message_ set_msgid, set_msgid_plural, etc, need to return NULL if pystring_ from_pyobject( ) returns NULL. Otherwise we run into weird
get_
situations like this:
>>> msg.set_msgid([])
>>> unrelated_code = 42
ValueError: not valid Unicode type
(the exception gets missed the first time, and caught on the next statement
executed).
Other than that, the patch looks okay. I'd suggest adding a test case for passing a non-convertible object to set_msgid() to make sure it raises an exception.