Comment 6 for bug 1031954

Darrell Bishop (darrellb) wrote:

Also, I think the choice isn't "do we fix this or not", but "what new (de)serialization scheme will we use". I think JSON would be a horrible choice; I'll just get that out there now.

I see two main types of schemes: using some external module, or writing custom code with no new dependencies. Both have pluses and minuses. I think something like msgpack will be more efficient than comparably flexible pure-Python code (my "custom" deserialization is NOT robust to deserializing a different on-disk format, for instance). With the custom encoding, you'd have to version the on-disk format and worry about backwards compatibility for *deserialization*; see the sketch below. I don't think pickle or msgpack have to worry about that (doing the right thing with what you deserialized is unavoidable and not directly relevant to the serialization format).
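
To make the versioning point concrete, here's a minimal sketch of what a hand-rolled, versioned on-disk format ends up looking like. The magic bytes, version number, and function names here are purely illustrative, not the actual format under discussion:

    import struct

    FORMAT_VERSION = 1  # hypothetical version of a custom on-disk format

    def dump(payload, fp):
        # Write a small header so future readers can tell formats apart.
        fp.write(b'RING')                       # illustrative magic bytes
        fp.write(struct.pack('!H', FORMAT_VERSION))
        fp.write(payload)

    def load(fp):
        magic = fp.read(4)
        if magic != b'RING':
            raise ValueError('not a recognized ring file')
        (version,) = struct.unpack('!H', fp.read(2))
        if version == 1:
            return fp.read()                    # v1: rest of file is the payload
        # Every future format change adds another branch here.
        raise ValueError('unknown on-disk format version %d' % version)

With pickle or msgpack, that header-and-branch bookkeeping is handled by the library's own wire format instead of by us.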

Note that most external tools for (de)serialization (e.g. protocol buffers) will probably not handle array.array data natively (msgpack-python explicitly refuses to try). On the other hand, if an external tool did support it, there's a chance the tool would make a different portability/performance trade-off than we would like. You can see one such difference in the two ways I encoded the array.array data for the two msgpack-based implementations: converting each array.array to a list and letting msgpack encode that, vs. converting each array.array to a string and encoding/decoding it without any Unicode encoding, makes a 10x difference in deserialization performance! One is architecture-independent (the slower, list-based one) and the other is just as architecture-dependent as the Python 2.6 pickling (it's basically a memory dump in the byte ordering of the architecture). Both encodings are sketched below.
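
For reference, here's roughly what those two msgpack encodings look like. The 'H' typecode and the array contents are just placeholders (not the real ring tables), and I'm using the Python 3 tobytes()/frombytes() names where Python 2 would use tostring()/fromstring():

    import array
    import msgpack

    devs = array.array('H', range(1024))  # stand-in for a replica-to-device table

    # Portable but slower: hand msgpack a plain list of ints.
    packed_list = msgpack.packb(devs.tolist())
    restored = array.array('H', msgpack.unpackb(packed_list))

    # Faster but architecture-dependent: dump the raw buffer
    # (native byte order and item size, like the Python 2.6 pickle).
    packed_raw = msgpack.packb(devs.tobytes())
    restored_raw = array.array('H')
    restored_raw.frombytes(msgpack.unpackb(packed_raw))

The list-based one round-trips on any architecture; the raw-bytes one only round-trips on machines with the same endianness and item size as the writer.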

Anyway, just some more food for thought on the issue :)