unzip should choose the filename encoding according to the locale, not assume UTF-8

Bug #203609 reported by Barosl LEE
This bug affects 9 people
Affects          Status     Importance  Assigned to  Milestone
unzip (Debian)   Confirmed  Unknown
unzip (Ubuntu)   Confirmed  Wishlist    Unassigned

Bug Description

Because ZIP files usually don't record which encoding their filenames use, most ZIP archivers store names in the native (system) encoding. This is why ZIP files created on Windows can't be extracted with correct filenames on Linux. For example, the Korean version of Windows uses cp949 (an extension of EUC-KR) to zip and unzip files, the Japanese version uses Shift-JIS, and so on.
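
To make the mismatch concrete, here is a small shell sketch that compares the bytes of the filename '한국어' in UTF-8 (how Linux stores it) and in cp949 (how Korean Windows would store it inside a ZIP). It assumes the script is saved as UTF-8 and that the system iconv supports CP949, as glibc's does. Nothing here is part of unzip itself.

```shell
# Sketch: the same filename becomes different bytes depending on the encoding.
# Assumes this file is saved as UTF-8 and iconv knows CP949 (glibc's does).
name='한국어'

# As a UTF-8 filename on Linux: 3 bytes per Hangul syllable, 9 bytes total.
printf '%s' "$name" | od -An -tx1

# As Korean Windows would store it in a ZIP (cp949): 2 bytes per syllable, 6 total.
printf '%s' "$name" | iconv -f UTF-8 -t CP949 | od -An -tx1

# The reverse direction: the cp949 bytes only turn back into '한국어' when the
# reader is told the source encoding, which is what unzip's -O option supplies.
printf '\307\321\261\271\276\356' | iconv -f CP949 -t UTF-8; echo
```

Read as UTF-8, those six cp949 bytes are not valid text, which is why the names come out garbled without an explicit encoding.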

Options for selecting the filename encoding have recently been added to unzip, and their defaults can be set through the UNZIP and ZIPINFO environment variables:

export UNZIP='-O cp949'
export ZIPINFO='-O cp949'

With these settings, unzip and zipinfo decode filenames as cp949 instead of UTF-8, the native Linux encoding, which improves compatibility with archives made on Windows.

So I propose that Ubuntu set these variables according to the system locale: cp949 for ko_KR.UTF-8, shift-jis for ja_JP.UTF-8, and suitable encodings for zh_CN and other locales.
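
A minimal sketch of what such a locale-based default could look like, assuming it is sourced from a login script (a file under /etc/profile.d/ is one hypothetical place). The helper name `unzip_charset_for_locale` is made up for illustration. Only the ko_KR and ja_JP mappings come from this report; zh_CN → gbk is an assumption based on the code page of Simplified Chinese Windows.

```shell
# Hypothetical helper: print the legacy Windows charset for a locale name,
# or nothing when no mapping is known.
# ko_KR and ja_JP come from this report; zh_CN -> gbk is an assumption.
unzip_charset_for_locale() {
    case "$1" in
        ko_KR*) echo cp949 ;;
        ja_JP*) echo shift-jis ;;
        zh_CN*) echo gbk ;;
        *) ;;
    esac
}

# Effective locale, using the usual precedence LC_ALL > LC_CTYPE > LANG.
locale_name=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
charset=$(unzip_charset_for_locale "$locale_name")

# Only set the defaults when a mapping exists, so other locales keep
# unzip's normal behaviour. Note: this overwrites any existing UNZIP/ZIPINFO.
if [ -n "$charset" ]; then
    export UNZIP="-O $charset"
    export ZIPINFO="-O $charset"
fi
```

For a single archive the same option can also be passed directly, e.g. `unzip -O cp949 한국어.zip`, which leaves the environment untouched.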

Barosl LEE (barosl) wrote :

Here is a sample file: '한국어.zip', archived on Windows, containing '한국어.txt' with its filename in cp949 encoding.

Barosl LEE (barosl) wrote :

Here is a sample file: '日本語.zip', archived on the Japanese version of Windows, containing '日本語.txt' with its filename in Shift-JIS encoding.

Emmet Hikory (persia)
Changed in unzip:
importance: Undecided → Wishlist
status: New → Confirmed
Emmet Hikory (persia) wrote :

Some work towards fixing this appears as part of the solution to bug #10979. Defining more locale-to-encoding mappings around line 1700 of unix/unix.c might increase the number of supported encodings.

Dmitry Agafonov (dmitry-agafonov) wrote :

I think this bug should be marked as a duplicate of https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/477755

Changed in unzip (Debian):
status: Unknown → Confirmed
Unxed (unxed) wrote :

I wrote a patch for unzip that fixes this issue:
https://sourceforge.net/p/infozip/patches/29/

The same patch for p7zip:
https://sourceforge.net/p/p7zip/bugs/187/
