resulting base64 in http-client is wrong

Bug #1002867 reported by David Graf
This bug affects 1 person
Affects: Zorba
Status: Fix Released
Importance: Critical
Assigned to: Paul J. Lucas
Milestone: (none listed)

Bug Description

I have a query (see attachment) that imports the same image with both the file module and the http-client module. Unfortunately, the resulting base64 values differ. The file module works properly: the base64 string it returns is correct. The one coming from the http-client module is broken.

Before running the query, you need to download the image file into the directory where the query is located, e.g. with: wget http://dl.dropbox.com/u/1004639/square.png


David Graf (davidagraf) wrote :
Paul J. Lucas (paul-lucas) wrote :

The query imports base64, but it's not used explicitly anywhere. How does the query "know" to output in base64?

Changed in zorba:
status: New → Opinion
David Graf (davidagraf) wrote :

>> The query imports base64, but it's not used explicitly anywhere.
Sorry, my fault. I forgot to remove the unused module.

>> How does the query "know" to output in base64?
I don't really understand the question. The results of file:read-binary and http-client:send-request($request)[2] are base64Binary items, so the query outputs base64, no?

Paul J. Lucas (paul-lucas) wrote :

Why should the results be base64 items? The results should be raw binary data. If printed to a terminal, I should expect to get what looks like garbage but is in fact the raw binary data. base64 is a TEXT representation of binary data.
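
To make the distinction concrete, here is a minimal Python sketch (not Zorba code; the sample bytes are simply the 8-byte PNG signature, chosen for illustration). It shows that base64 is a printable text encoding of the bytes, and that decoding the text gives back the raw binary data unchanged.

```python
import base64

# The 8-byte PNG file signature, used here only as sample binary data.
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# base64 turns arbitrary bytes into printable ASCII text.
encoded = base64.b64encode(raw).decode("ascii")

# Decoding that text recovers exactly the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)          # text representation
print(decoded == raw)   # round trip preserves the bytes
```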

David Graf (davidagraf) wrote :

The default serialization of Zorba is XML (I guess), so it cannot return binary data and returns the base64 string instead. I think Zorba doesn't have an option to use binary serialization. Right?

Changed in zorba:
status: Opinion → In Progress
Paul J. Lucas (paul-lucas) wrote :

I figured out part of the problem. The HTTP spec says that the default character encoding is ISO 8859-1, so in my previous change, I always set the character encoding to that. If you specify an override-media-type such as "application/octet-stream" but do NOT include a character encoding, it therefore defaults to ISO 8859-1, which in this case is wrong. The bytes need to pass through untouched.
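
A rough Python sketch of why this breaks binary content (an illustration only, not the http-client code; it assumes the transcoding target is UTF-8, which this report does not state). Any byte at or above 0x80 becomes two bytes when treated as ISO 8859-1 text and re-encoded, so the base64 of the result no longer matches the base64 of the original.

```python
import base64

# Sample binary data: the 8-byte PNG signature (first byte is 0x89).
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

def transcode_latin1_to_utf8(data: bytes) -> bytes:
    """Treat the bytes as ISO 8859-1 text and re-encode them as UTF-8.

    Assumption: UTF-8 is the target encoding; this is for illustration.
    """
    return data.decode("iso-8859-1").encode("utf-8")

transcoded = transcode_latin1_to_utf8(raw)

good = base64.b64encode(raw).decode("ascii")
bad = base64.b64encode(transcoded).decode("ascii")

print(len(raw), len(transcoded))  # the transcoded copy is one byte longer
print(good != bad)                # the base64 strings differ
```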

I therefore think I need to make it so that if you provide NO "charset=..." in the override-media-type, then the character encoding shall remain empty and no transcoding will take place.
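
The proposed rule could be sketched roughly as follows in Python. The helper names `charset_of` and `apply_override` are hypothetical and not part of Zorba, and UTF-8 as the transcoding target is an assumption. The point is only that a missing "charset=..." parameter means the bytes pass through untouched.

```python
from typing import Optional

def charset_of(media_type: str) -> Optional[str]:
    """Return the charset parameter of a media type string, or None if absent."""
    # Parameters follow the type/subtype, separated by semicolons.
    for param in media_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    return None

def apply_override(body: bytes, override_media_type: str) -> bytes:
    """Transcode only when the override media type names a charset."""
    charset = charset_of(override_media_type)
    if charset is None:
        # No charset given: leave the encoding empty, do not transcode.
        return body
    # Assumption: the internal encoding is UTF-8.
    return body.decode(charset).encode("utf-8")

png_signature = b"\x89PNG\r\n\x1a\n"
print(apply_override(png_signature, "application/octet-stream") == png_signature)
```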

Matthias Brantner (matthias-brantner) wrote :

Your investigation seems to make sense. However, I think the solution should be to not do any transcoding for non-textual content at all (even if the user specifies a charset).
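
A sketch of this stricter variant in Python, for comparison. What counts as "textual" is not defined in this report, so the `is_textual` test below (text/*, anything ending in +xml, plus application/xml and application/json) is purely an assumption, as are the function names and the UTF-8 target.

```python
from typing import Optional

def is_textual(media_type: str) -> bool:
    """Guess whether a media type carries text (assumed definition)."""
    essence = media_type.split(";")[0].strip().lower()
    return (
        essence.startswith("text/")
        or essence.endswith("+xml")
        or essence in {"application/xml", "application/json"}
    )

def maybe_transcode(body: bytes, media_type: str, charset: Optional[str]) -> bytes:
    """Transcode textual content only; binary content is never touched."""
    if not is_textual(media_type) or charset is None:
        # Non-textual content passes through even if a charset was given.
        return body
    # Assumption: the internal encoding is UTF-8.
    return body.decode(charset).encode("utf-8")

binary = b"\x89PNG\r\n\x1a\n"
same = maybe_transcode(binary, "application/octet-stream; charset=ISO-8859-1", "ISO-8859-1")
print(same == binary)
```

Compared with relying on the absence of "charset=..." alone, this protects binary responses even when a charset is supplied by mistake, at the cost of having to maintain a definition of textual media types.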

Matthias Brantner (matthias-brantner) wrote :

@David: I don't think we should call the concept of decoding base64Binary a serialization method. IMHO, decoding doesn't have anything to do with the serializer. It's an orthogonal concept. For example, the file module provides file:write-binary functions that implicitly decode base64Binary items.
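
As an analogy only (this is Python, not the file module's implementation, and the `write_binary` helper is hypothetical), implicit decoding on write could look like this: the caller hands over the base64 text, and the raw bytes end up on disk without any serializer being involved.

```python
import base64
import os
import tempfile

def write_binary(path: str, base64_text: str) -> None:
    """Decode base64 text and write the resulting raw bytes to a file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(base64_text))

def demo() -> bytes:
    """Write a sample value to a temporary file and read the bytes back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.bin")
        # "iVBORw0KGgo=" is the base64 form of the 8-byte PNG signature.
        write_binary(path, "iVBORw0KGgo=")
        with open(path, "rb") as f:
            return f.read()

print(demo())
```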

Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released