The description in the bug confuses me: any binary file can be encoded into UTF-8, since UTF-8 is 8-bit clean.
The error message I get is
ERROR: 'ascii' codec can't decode byte 0xdc in position 24: ordinal not in range(128)
To me this sounds as if the code tries to transcode the file from ASCII to UTF-8. Since ASCII only uses 7 bits, any byte with the high bit set cannot be interpreted. You get the same effect with:
$ recode ascii..utf8 < /bin/ping > /dev/null
recode: Invalid input in step `ANSI_X3.4-1968..UTF-8'
You also cannot interpret binary files as UTF-8, because not all byte sequences are valid UTF-8:
$ recode utf8..iso8859-1 < /bin/ping > /dev/null
recode: Invalid input in step `UTF-8..ISO-8859-1'
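Both failures can also be reproduced in Python, which appears to be where the quoted 'ascii' codec error comes from. This is only a minimal sketch: the byte string `data` is a made-up stand-in for the start of a binary such as /bin/ping, and `decode_error` is a hypothetical helper, not anything from the code in question.

```python
# Made-up sample: 24 ASCII bytes followed by 0xdc, standing in for binary data.
data = b"A" * 24 + b"\xdc\x00\xff"


def decode_error(raw: bytes, codec: str):
    """Return the decode error message, or None if decoding succeeds."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


for codec in ("ascii", "utf-8"):
    # Both codecs reject the byte 0xdc at position 24, for different reasons:
    # ASCII has no bytes above 0x7f, and in UTF-8 0xdc must be followed by
    # a continuation byte, which 0x00 is not.
    print(f"{codec}: {decode_error(data, codec)}")
```

With this sample the ASCII case gives the same message as in the bug, just from a synthetic input.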
IMHO the root problem is that a binary file is interpreted as a string in some encoding. If there is a need to interpret it as a string, I suggest using an 8-bit clean charset like ISO-8859-1:
$ recode iso8859-1..utf8 < /bin/ping > /dev/null
works fine. Admittedly this would break multi-byte UTF-8 characters in text files if the content is not merely copied somewhere but actually interpreted as text. But that is the fundamental problem: binary files should be treated as binary and not interpreted as text, while text files should be handled with the encoding (e.g. UTF-8) configured for the platform.
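The trade-off can be sketched in Python as well. This is an illustration under assumptions rather than a proposed patch: `data`, `text`, `utf8_bytes` and `misread` are names made up for the example.

```python
# Every possible byte value, as a stand-in for arbitrary binary content.
data = bytes(range(256))

# ISO-8859-1 maps each byte 0x00-0xff to the code point with the same value,
# so decoding never fails and encoding restores the original bytes exactly.
text = data.decode("iso-8859-1")
assert text.encode("iso-8859-1") == data

# The flip side: genuine UTF-8 text is misread as one character per byte.
utf8_bytes = "\u00dc".encode("utf-8")        # U+00DC becomes b'\xc3\x9c'
misread = utf8_bytes.decode("iso-8859-1")    # two characters instead of one
print(len(misread), repr(misread))
```

If the content only needs to be copied, opening the file in binary mode (`open(path, "rb")`) avoids the decoding step altogether.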