bug: 128496
title: Unable to open native working tree with non-ascii filenames
date-reported: Thu, 26 Jul 2007 11:32:47 -0000
date-updated: Tue, 23 Sep 2008 02:35:19 -0000
reporter: Wouter van Heyst (larstiq)
duplicate-of:
duplicates: 185401 245412
attachments:
patches:
tags:
subscribers: Wouter van Heyst (larstiq), Martin von Gagern (gagern)

task: bzr
status: Fix Released
date-created: Sun, 08 Jun 2008 13:22:31 -0000
date-left-new: Fri, 04 Jul 2008 09:20:23 -0000
date-confirmed: Fri, 04 Jul 2008 09:20:23 -0000
date-triaged: Fri, 04 Jul 2008 09:20:23 -0000
date-assigned: Fri, 04 Jul 2008 09:20:23 -0000
date-inprogress: Fri, 04 Jul 2008 09:20:23 -0000
date-closed: Sat, 09 Aug 2008 03:33:30 -0000
date-fix-committed: Fri, 04 Jul 2008 09:20:23 -0000
date-fix-released: Sat, 09 Aug 2008 03:33:30 -0000
reporter: Martin von Gagern (gagern)
importance: Undecided
assignee: Martin von Gagern (gagern)
milestone:

task: bzr-svn
status: Fix Released
date-created: Thu, 26 Jul 2007 11:32:47 -0000
date-confirmed: Sun, 09 Sep 2007 00:30:08 -0000
date-assigned: Sat, 04 Aug 2007 19:04:17 -0000
date-inprogress: Fri, 04 Jul 2008 09:15:21 -0000
date-closed: Sat, 09 Aug 2008 03:56:57 -0000
date-fix-committed: Fri, 04 Jul 2008 09:15:21 -0000
date-fix-released: Sat, 09 Aug 2008 03:56:57 -0000
reporter: Wouter van Heyst (larstiq)
importance: Medium
assignee: Jelmer Vernooij (jelmer)
milestone: 0.4.11

task: subversion
status: Invalid
date-created: Thu, 24 Jan 2008 13:49:41 -0000
date-left-new: Fri, 04 Jul 2008 09:21:30 -0000
date-closed: Fri, 04 Jul 2008 09:21:30 -0000
reporter: Jelmer Vernooij (jelmer)
importance: Undecided
assignee:
milestone:

task: bzr-svn (Ubuntu)
status: Fix Released
date-created: Fri, 04 Jul 2008 10:27:02 -0000
date-left-new: Fri, 04 Jul 2008 10:27:23 -0000
date-confirmed: Fri, 04 Jul 2008 10:27:23 -0000
date-triaged: Sat, 30 Aug 2008 11:16:15 -0000
date-inprogress: Sat, 30 Aug 2008 11:16:15 -0000
date-closed: Sat, 30 Aug 2008 11:16:15 -0000
date-fix-committed: Sat, 30 Aug 2008 11:16:15 -0000
date-fix-released: Sat, 30 Aug 2008 11:16:15 -0000
reporter: Rolf Leggewie (r0lf)
importance: Undecided
component: universe
assignee:
milestone:

Content-Type: multipart/mixed; boundary="===============0778623047030742221=="
MIME-Version: 1.0

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

On feisty:

decoy%~/work/kmx/trunk> bzr st
version of bzr-svn is experimental; output may change between revisions
bzr: ERROR: libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 729, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 691, in run_bzr
    ret = run(*run_argv)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 389, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 701, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 183, in run
    tree, file_list = tree_files(file_list)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 70, in tree_files
    return internal_tree_files(file_list, default_branch)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 94, in internal_tree_files
    return WorkingTree.open_containing(default_branch)[0], file_list
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/workingtree.py", line 340, in open_containing
    return control.open_workingtree(), relpath
  File "/home/wouter/.bazaar/plugins/svn/workingtree.py", line 728, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wouter/.bazaar/plugins/svn/workingtree.py", line 80, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

bzr 0.19.0dev0 on python 2.5.1.final.0 (linux2)
arguments: ['/home/wouter/bin/bzr', 'st']

--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Thu, 26 Jul 2007 16:58:04 -0000
Message-Id: <1185469084.2151.1.camel@ganieda.vernstok.nl>

Can you perhaps provide a test case that demonstrates this bug? I at least can't reproduce it with a simple checkout.

--===============0778623047030742221==
Author: Wouter van Heyst (larstiq)
Date: Thu, 26 Jul 2007 21:30:57 -0000
Message-Id: <20070726213057.20456.88565.malone@gangotri.ubuntu.com>

That repo fails on multiple machines that handle others fine, I'll see if I can narrow it down.

--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Sun, 09 Sep 2007 00:29:18 -0000
Message-Id: <1189297758.8174.23.camel@ganieda.vernstok.nl>

Ok, I can reproduce this now.

 summary "Unable to open native working tree with non-ascii filenames"
 status triaged
 importance medium

Thanks for the bug report.

-- 
Jelmer Vernooij - http://samba.org/~jelmer/
Jabber: jelmer@jabber.fsfe.org

--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Fri, 18 Jan 2008 04:09:37 -0000
Message-Id: <20080118040937.30665.88039.malone@gangotri.ubuntu.com>

Looks like svn.wc.revision_status() should be avoided because it can't deal with non-ASCII stuff.
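The error only shows up in some checkouts, so a reasonable first step towards the requested test case is to find the entries in the working tree whose names are not pure ASCII. The following is a minimal sketch that assumes Python 3 rather than the Python 2.5 used in this report. `find_non_ascii_names` is a hypothetical helper and is not part of bzr or bzr-svn. Skipping `.svn` directories is also an assumption about where the interesting names are likely to be.

```python
import os


def find_non_ascii_names(root):
    """Return the paths (as bytes) under `root` whose names are not pure ASCII.

    The walk works on raw bytes, so listing the tree cannot itself fail
    on names that are undecodable in the current locale.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        # Hypothetical filter: do not descend into Subversion's admin areas.
        dirnames[:] = [d for d in dirnames if d != b".svn"]
        for name in dirnames + filenames:
            try:
                name.decode("ascii")
            except UnicodeDecodeError:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Running it on the root of the failing checkout lists the candidate entries as raw bytes, so nothing is re-encoded on the way out.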
--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Thu, 24 Jan 2008 15:02:43 -0000
Message-Id: <20080124150244.14198.21820.malone@potassium.ubuntu.com>

According to a thread on subversion-dev, this is not a bug but rather a problem caused by not being able to convert from the file system encoding to UTF-8. I guess this means the only thing bzr-svn can do is catch the exception and print a clear error.

--===============0778623047030742221==
Author: Wesley J. Landaker (wjl)
Date: Sun, 27 Jan 2008 21:03:49 -0000
Message-Id: <20080127210349.1643.7624.malone@gangotri.ubuntu.com>

I think you are referring to a thread that I actually started. The conclusion that it's an encoding conversion issue is NOT true (I haven't gotten back to that thread yet to comment, due to being ill for a few days). The locale is UTF-8, the filename is valid UTF-8, and it's trying to convert that to UTF-8. It doesn't fail when libsvn does it; it only fails in the Python bindings.

But I do believe the conclusion is correct that this is not a bzr-svn bug, but a bug somewhere in the svn Python bindings.

--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Mon, 28 Jan 2008 03:36:00 -0000
Message-Id: <20080128033600.7160.1389.malone@gangotri.ubuntu.com>

Yes, that's the thread I'm referring to. I don't think this is specific to the Python bindings though, as it could be reproduced with svnversion, which doesn't use Python at all.

--===============0778623047030742221==
Author: Wesley J. Landaker (wjl)
Date: Mon, 28 Jan 2008 04:22:12 -0000
Message-Id: <20080128042212.16215.23603.malone@gandwana.ubuntu.com>

Not with my reproduction recipe. The person who claimed that it could be reproduced with svnversion showed it failing on a UTF-8 name using a non-UTF-8 locale, which is supposed to fail. If you are using a UTF-8 locale, you should be able to validate this yourself: it will work with svnversion, but fail with python svn.

If you use my tarball with a non-UTF-8 locale, it will always fail, as tar does not transcode filenames, and the tarball I made contains UTF-8 names. (This was the original demonstration I noted on the list. The counter-example someone else posted is, I believe, flawed.)

Anyway, I'll take the rest of this discussion to the SVN mailing list, as at this point I am sure that this is not a bzr-svn bug. I don't know if it's useful to continue a parallel discussion on this linked bug (but we can if someone else thinks it is helpful -- I just want to help get this fixed one way or another! =).

--===============0778623047030742221==
Author: Jelmer Vernooij (jelmer)
Date: Sat, 22 Mar 2008 01:21:23 -0000
Message-Id: <20080322012123.1230.83912.malone@gandwana.canonical.com>

It seems this is fixable by setting the locale appropriately from Python:

    import locale
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

except that:
 - this is valid for the complete process, so I can't just use this in bzr-svn
 - I don't know what the file system encoding is

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Sun, 08 Jun 2008 12:27:15 -0000
Message-Id: <20080608122716.30282.45087.malone@potassium.ubuntu.com>

OK, I hit this as well. Did some heavy debugging.
First the backtrace of where I am, mixed Python and C, the latter re-ordered to match the most-recent-call-last order.

bzr: ERROR: svn.core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 173, in run
    tree, file_list = tree_files(file_list)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 64, in tree_files
    return internal_tree_files(file_list, default_branch)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 88, in internal_tree_files
    return WorkingTree.open_containing(default_branch)[0], file_list
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree.py", line 325, in open_containing
    return control.open_workingtree(), relpath
  File "~/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "~/.bazaar/plugins/svn/workingtree.py", line 88, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/usr/lib/svn-python/libsvn/wc.py", line 2310, in svn_wc_revision_status
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

#11 0xb7725c76 in _wrap_svn_wc_revision_status ()
    from /usr/lib/python2.5/site-packages/libsvn/_wc.so
#10 0xb79083f3 in svn_wc_revision_status (result_p=0xbfe173a0,
    wc_path=0xa10e184 "/home/mvg/src/java/ornament", trail_url=0x0,
    committed=1, cancel_func=0xb7ade7fb ,
    cancel_baton=0x4e5168c0, pool=0xa27b108)
    at subversion/libsvn_wc/revision_status.c:123
#9  0xb7acc313 in close_edit (edit_baton=0xa30ad18, pool=0xa27b108)
    at subversion/libsvn_delta/cancel.c:334
#8  0xb790ba4b in close_edit (edit_baton=0xa30a6f0, pool=0xa27b108)
    at subversion/libsvn_wc/status.c:2033
#7  0xb79092e2 in get_dir_status (eb=0xa30a6f0, parent_entry=0x0,
    adm_access=0xa27b1d8, entry=0x0, ignore_patterns=0xa30a7a0,
    depth=svn_depth_infinity, get_all=1, no_ignore=0, skip_this_dir=0,
    status_func=0xb79080d0 , status_baton=0xbfe17334,
    cancel_func=0xb7ade7fb , cancel_baton=0x4e5168c0,
    pool=0xa27b108) at subversion/libsvn_wc/status.c:828
#6  0xb7976164 in svn_io_get_dirents2 (dirents=0xbfe17168,
    path=0xa27b200 "/home/mvg/src/java/ornament", pool=0xa2870a0)
    at subversion/libsvn_subr/io.c:1976
#5  0xb798263b in svn_path_cstring_to_utf8 (path_utf8=0xbfe17048,
    path_apr=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    pool=0xa2870a0) at subversion/libsvn_subr/path.c:1387
#4  0xb798f611 in svn_utf_cstring_to_utf8 (dest=0xbfe17048,
    src=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png", pool=0xa2870a0)
    at subversion/libsvn_subr/utf.c:752
#3  0xb798f536 in convert_cstring (dest=0xbfe17048,
    src=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    node=0xa287468, pool=0xa2870a0) at subversion/libsvn_subr/utf.c:729
#2  0xb798ed36 in convert_to_stringbuf (node=0xa287468,
    src_data=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    src_length=36, dest=0xbfe16f94, pool=0xa2870a0)
    at subversion/libsvn_subr/utf.c:493
#1  0x41037672 in apr_xlate_conv_buffer () from /usr/lib/libaprutil-1.so.0
#0  iconv (cd=, inbuf=, inbytesleft=Could not find the frame base for "iconv".
) at iconv.c:36

The value of cd is optimized out in my lib. However, it can be inferred from node at frame #2 if you know the memory layout of these structures. According to utf.c from Subversion, node->handle has type apr_xlate_t. This type is incomplete in apr_xlate.h from package apr-utils, and its definition is given in xlate.c: four pointers followed by an iconv_t member. According to iconv.h, iconv_t is a void*, but it's converted to a __gconv_t* in iconv.c, which is part of glibc. So I can use these steps to look at what iconv is actually doing:

(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*)))
$15 = {__nsteps = 2, __steps = 0xa256f80, __data = 0xa268d38}
(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*))).__steps[0]
$16 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
  __from_name = 0xb7f06be3 "ANSI_X3.4-1968//",
  __to_name = 0x42acde2f "INTERNAL",
  __fct = 0x429d5db0 <__gconv_transform_ascii_internal>,
  __btowc_fct = 0x429d47b0 <__gconv_btwoc_ascii>, __init_fct = 0,
  __end_fct = 0, __min_needed_from = 4, __max_needed_from = 4,
  __min_needed_to = 1, __max_needed_to = 1, __stateful = 0, __data = 0x0}
(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*))).__steps[1]
$17 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
  __from_name = 0x42acde2f "INTERNAL",
  __to_name = 0xb7f07cc7 "ISO-10646/UTF8/",
  __fct = 0x429d91e0 <__gconv_transform_internal_utf8>, __btowc_fct = 0,
  __init_fct = 0, __end_fct = 0, __min_needed_from = 4, __max_needed_from = 4,
  __min_needed_to = 1, __max_needed_to = 6, __stateful = 0, __data = 0x0}

According to gconv_builtin.h, "ASCII" is an alias for "ANSI_X3.4-1968//", and the function name __gconv_transform_ascii_internal confirms this. So iconv tries to convert from ASCII to UTF-8, which is bound to fail.
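The failing step can be reproduced without Subversion or APR. The sketch below assumes Python 3 and takes the file name bytes from frames #2 to #5. It decodes them from a source charset and re-encodes them as UTF-8, which is roughly what the iconv conversion does. `convert_to_utf8` is a hypothetical stand-in and is not Subversion's code. With the C locale's charset (ANSI_X3.4-1968, i.e. ASCII), the first byte of "ä" is rejected. With UTF-8, the name passes through unchanged.

```python
# File name bytes from the backtrace; \303\244 is "ä" encoded as UTF-8.
RAW_NAME = b"debugSym_4-z\303\244hlige Drehung_Tile.png"


def convert_to_utf8(data, source_charset):
    """Decode `data` from `source_charset` and re-encode it as UTF-8.

    Raises UnicodeDecodeError if `data` is not valid in `source_charset`,
    the Python analogue of iconv refusing the input.
    """
    return data.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    print(len(RAW_NAME))  # 36, the src_length seen in frame #2
    print(convert_to_utf8(RAW_NAME, "UTF-8") == RAW_NAME)
    try:
        convert_to_utf8(RAW_NAME, "ANSI_X3.4-1968")
    except UnicodeDecodeError as exc:
        print("ASCII conversion failed at byte", exc.start)
```

Python accepts "ANSI_X3.4-1968" as an alias for its ASCII codec, so the sketch can use the same charset name that glibc reported.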
As my system uses UTF-8, I wonder what made it use ASCII as the source charset in the first place. This needs some more debugging, but this comment is long enough as it is, so I'll be back later.

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Sun, 08 Jun 2008 13:21:39 -0000
Message-Id: <20080608132139.30782.46187.malone@potassium.ubuntu.com>

The source locale in APR is specified using apr_os_locale_encoding. That function uses nl_langinfo, which in turn bases its result on the locale currently enabled for the application. Two important breakpoints are "setlocale" and "apr_os_locale_encoding". With them in place from the beginning, I get this sequence:

    setlocale(LC_CTYPE, NULL)  // query only
    setlocale(LC_CTYPE, "")    // set according to environment
    setlocale(LC_CTYPE, "C")   // set to US-ASCII default
    setlocale(LC_CTYPE, NULL)
    setlocale(LC_CTYPE, "")
    setlocale(LC_CTYPE, "C")
    apr_os_locale_encoding     // repeated

So it looks like Python does not set any locale by default, in order to provide maximum compatibility for non-locale-aware applications. bzr itself seems to have no call to setlocale. I would suggest this somewhere in the initialization code of bzr:

    import locale
    locale.setlocale(locale.LC_ALL, '')

So this is neither a bug in Subversion nor a bug in bzr-subversion, but rather a bug in bzr itself, imo. I'll associate a branch containing a suggested fix with this bug. It's against bzr.dev, so it won't work directly with bzr-svn, but I'd hope the bzr people merge it into bzr 1.5 as well.

--===============0778623047030742221==
Author: Wesley J. Landaker (wjl)
Date: Sun, 08 Jun 2008 14:19:26 -0000
Message-Id: <20080608141926.18137.18756.malone@gandwana.canonical.com>

I can confirm that patching this into bzr fixes the problems I personally was having in relation to this bug:

    import locale
    locale.setlocale(locale.LC_ALL, '')

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Sun, 08 Jun 2008 14:41:33 -0000
Message-Id: <20080608144133.18137.96248.malone@gandwana.canonical.com>

I found out that bzr contains quite a bit of code dedicated to locale issues. There is even a test suite, blackbox.test_locale, dealing with this. However, bzr seems to do things in a Pythonic way, using tools provided by the Python library. Those functions seem to be designed to touch the current locale setting as little as possible, and rely instead on the corresponding environment settings.

I can think of three possible approaches:

1. Have bzr-svn wrap all its access to libsvn in a function that temporarily sets the locale according to the environment settings.
   + No modifications to bzr at large
   - Might need to be replicated for other applications as well
   - If another application that imports bzr called setlocale, that setting gets ignored

2. Have bzr call setlocale on a global level if it is run as an application, and leave everything else alone. That's the current approach of my setlocale branch.
   + Consistent locale-aware behaviour for all plugins
   + Locale-specific formatting for dates and so on
   - I guess the Python functions still use environment settings over this locale, so behaviour remains inconsistent if bzr is imported in an application that did setlocale but did not modify the environment

3. Try to make Python-specific functions honour the locale as set by setlocale. I haven't tried any of this, but I guess this would require determining the locale settings and adjusting the environment accordingly.
   + Consistent behaviour even when imported in another application
   - Maybe less locale support when run from within a non-locale-aware application
   - Modifying the environment when run inside an application might be a bad idea

Any input from people with more knowledge about Python and locales is highly appreciated.

--===============0778623047030742221==
Author: Wesley J. Landaker (wjl)
Date: Sun, 08 Jun 2008 14:44:29 -0000
Message-Id: <20080608144429.16359.62493.malone@gangotri.canonical.com>

Well, almost. Adding setlocale to bzr does fix a lot of cases, but it looks like it then exposes [an]other problem[s] in bzr-svn. For example:

# before adding setlocale
$ bzr info
bzr: ERROR: libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1130, in run
    verbose=noise_level, outfile=self.outf)
  File "/usr/lib/python2.5/site-packages/bzrlib/info.py", line 315, in show_bzrdir_info
    recommend_upgrade=False)
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 88, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'info']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bisect     /home/wjlanda/.bazaar/plugins/bisect [1.1.0pre0]
  bzrtools   /usr/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.5.0]
  cvsps      /home/wjlanda/.bazaar/plugins/cvsps [unknown]
  gtk        /home/wjlanda/.bazaar/plugins/gtk [0.95.0dev1]
  launchpad  /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  loom       /home/wjlanda/.bazaar/plugins/loom [1.4.0dev0]
  rebase     /home/wjlanda/.bazaar/plugins/rebase [0.4.0dev0]
  stats      /home/wjlanda/.bazaar/plugins/stats [unknown]
  svn        /home/wjlanda/.bazaar/plugins/svn [0.4.11dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.
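The start-up change Martin proposed is what separates the two runs in this comment. It amounts to adopting the user's locale once, when bzr runs as a program, so that nl_langinfo() (and therefore APR's apr_os_locale_encoding) reports the user's character set instead of the "C" default. The following is a minimal sketch that assumes Python 3 on a Unix-like system, because `locale.nl_langinfo` is not available on Windows. `init_locale` and its fall-back to "C" are hypothetical and are not the code of the actual bzr patch. Note also that Python 3, unlike the Python 2.5 in these reports, already sets LC_CTYPE from the environment at start-up.

```python
import locale


def init_locale():
    """Adopt the locale named by the environment (LANG, LC_ALL, ...).

    Falls back to the "C" locale if the environment names a locale that is
    not installed, rather than aborting start-up. Returns the character set
    the C library now reports for LC_CTYPE.
    """
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        locale.setlocale(locale.LC_ALL, "C")
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    # Only the top-level program should do this; libraries should not
    # change process-wide locale state behind the application's back.
    print(init_locale())
```

Under a UTF-8 locale such as the en_US.UTF-8 shown in these tracebacks, glibc reports "UTF-8" here, which is what APR needs to decode the file names.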
# after adding setlocale
$ bzr info
bzr: ERROR: exceptions.KeyError: 'doc/I\xc2\xb2C'

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1130, in run
    verbose=noise_level, outfile=self.outf)
  File "/usr/lib/python2.5/site-packages/bzrlib/info.py", line 315, in show_bzrdir_info
    recommend_upgrade=False)
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 90, in __init__
    self.base_tree = SvnBasisTree(self)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 276, in __init__
    workingtree.branch.mapping)
  File "/home/wjlanda/.bazaar/plugins/svn/repository.py", line 289, in get_fileid_map
    return self.fileid_map.get_map(self.uuid, revnum, path, mapping)
  File "/home/wjlanda/.bazaar/plugins/svn/fileids.py", line 236, in get_map
    map = self.load(revid)
  File "/home/wjlanda/.bazaar/plugins/svn/fileids.py", line 214, in load
    assert isinstance(map[urllib.unquote(filename)][0], str)
KeyError: 'doc/I\xc2\xb2C'

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'info']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bisect     /home/wjlanda/.bazaar/plugins/bisect [1.1.0pre0]
  bzrtools   /usr/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.5.0]
  cvsps      /home/wjlanda/.bazaar/plugins/cvsps [unknown]
  gtk        /home/wjlanda/.bazaar/plugins/gtk [0.95.0dev1]
  launchpad  /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  loom       /home/wjlanda/.bazaar/plugins/loom [1.4.0dev0]
  rebase     /home/wjlanda/.bazaar/plugins/rebase [0.4.0dev0]
  stats      /home/wjlanda/.bazaar/plugins/stats [unknown]
  svn        /home/wjlanda/.bazaar/plugins/svn [0.4.11dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

But anyway, it does fix a lot of other cases, and in general seems like a step in the right direction.

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Sun, 08 Jun 2008 15:38:55 -0000
Message-Id: <20080608153855.16476.62284.malone@gangotri.canonical.com>

OK, with the modifications in the ~gagern/bzr-svn/bug128496 branch in addition to the setlocale in bzr, I got bzr info working. I am not sure, however, that this is how things should work, and I guess there might be other places needing such adjustments as well.

--===============0778623047030742221==
Author: Wesley J. Landaker (wjl)
Date: Sun, 08 Jun 2008 16:58:37 -0000
Message-Id: <20080608165837.30282.38251.malone@potassium.ubuntu.com>

With the latest bug128496 branch merged, and with setlocale in place, I now get yet another, different error.
I hope this is helpful:

$ bzr info
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 34: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1130, in run
    verbose=noise_level, outfile=self.outf)
  File "/usr/lib/python2.5/site-packages/bzrlib/info.py", line 315, in show_bzrdir_info
    recommend_upgrade=False)
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 90, in __init__
    self.base_tree = SvnBasisTree(self)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 348, in __init__
    add_dir_to_inv(u"", wc, None)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 338, in add_dir_to_inv
    add_dir_to_inv(subrelpath, subwc, id)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 336, in add_dir_to_inv
    False, 0, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 58, in svn_wc_adm_open3
    return apply(_wc.svn_wc_adm_open3, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 34: ordinal not in range(128)

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'info']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bisect     /home/wjlanda/.bazaar/plugins/bisect [1.1.0pre0]
  bzrtools   /usr/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.5.0]
  cvsps      /home/wjlanda/.bazaar/plugins/cvsps [unknown]
  gtk        /home/wjlanda/.bazaar/plugins/gtk [0.95.0dev1]
  launchpad  /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  loom       /home/wjlanda/.bazaar/plugins/loom [1.4.0dev0]
  rebase     /home/wjlanda/.bazaar/plugins/rebase [0.4.0dev0]
  stats      /home/wjlanda/.bazaar/plugins/stats [unknown]
  svn        /home/wjlanda/.bazaar/plugins/svn [0.4.11dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Sun, 08 Jun 2008 17:54:44 -0000
Message-Id: <20080608175444.16359.15864.malone@gangotri.canonical.com>

Now I also added tests to my branch, and ran into problems with the Japanese and Russian locales, the only non-Latin locales available on my system.

It seems that osutils.format_date doesn't return a unicode string, but rather a byte sequence. I fixed one instance in log.LongLogFormatter.log_revision, but others might be affected as well. osutils.format_date(...).decode() didn't work as I would have expected. osutils.format_date(...).decode("utf-8") did work, but I'm not sure how it would do in a non-Latin-1, non-UTF-8 locale, like some of the pre-Unicode Asian locales. Does someone care to check?

Some possible solutions:

1. Make osutils.format_date use the POSIX locale. This is the current behaviour in bzr.dev, but I suppose it's bad for users who don't instantly understand English weekday names. It would reduce the worth of the whole setlocale attempt.

2. Check every occurrence of osutils.format_date and figure out whether the caller needs a unicode string, a UTF-8 string, or a string in the current default or terminal encoding. It would probably be best to have a test case for each of these invocations, so one would have to figure out the command line commands that use them.

I'll need some developer opinion on this before I put any more work into something that might already be doomed.

--===============0778623047030742221==
Author: Martin von Gagern (gagern)
Date: Tue, 10 Jun 2008 18:39:28 -0000
Message-Id: <20080610183928.16824.21860.malone@gandwana.canonical.com>

I investigated the automatic conversion between byte and unicode strings, and found a really interesting thread on the bazaar mailing list called "About encoding issues":
http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/10908

There is a function called sys.setdefaultencoding to set the encoding used for such implicit conversions. Unfortunately, it usually gets removed by site.py and can only be called before that happens. This makes it a bit tricky to use, and it will affect all modules.

By writing your own character encoding, it is possible to trace automatic conversions without throwing an exception each time one happens. I realized that idea as a proof of concept in my branch
https://code.launchpad.net/~gagern/bzr/str-unicode
but I hope the bzr development community will extend this further, as I can't possibly investigate all automatic character conversions in Bazaar all by myself. By specifying a regular expression in the STR_UNICODE environment variable, the output can be restricted to bzr-svn. Even there, however, the number of automatic conversions is astronomical, and a more efficient log format is required.
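The tracing idea can be illustrated with Python's codec registry. The sketch below is a Python 3 analogue built on several assumptions and is not the code in the str-unicode branch. Python 3 no longer converts implicitly between bytes and text, and sys.setdefaultencoding no longer exists, so the hypothetical `trace_ascii` codec has to be named explicitly. It records every conversion and then behaves exactly like ASCII, so failures still surface where they otherwise would.

```python
import codecs

# Every conversion performed through the codec is appended here.
TRACE_LOG = []


def _trace_encode(text, errors="strict"):
    TRACE_LOG.append(("encode", text))
    # Delegate to the built-in ASCII encoder: returns (bytes, length consumed).
    return codecs.ascii_encode(text, errors)


def _trace_decode(data, errors="strict"):
    # `data` may arrive as a memoryview, so copy it to bytes for the log.
    TRACE_LOG.append(("decode", bytes(data)))
    # Delegate to the built-in ASCII decoder: returns (str, length consumed).
    return codecs.ascii_decode(data, errors)


def _search(name):
    # The registry passes the normalised, lower-case encoding name.
    if name == "trace_ascii":
        return codecs.CodecInfo(_trace_encode, _trace_decode, name="trace_ascii")
    return None


codecs.register(_search)
```

Filtering the log, as STR_UNICODE does with a regular expression, would be a straightforward extension of `TRACE_LOG`.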
--===============0778623047030742221== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Author: Jelmer Vernooij (jelmer) Date: Sun, 29 Jun 2008 03:48:38 -0000 Message-Id: <20080629034838.19244.92544.malone@potassium.ubuntu.com> Now that we control the bindings, it may actually be possible to fix this there. I wonder what the most appropriate solution here is. Wrapping all bzr-svn calls by setlocale() may be the best solution here. Unless there's some other way to tell apr the encoding of everything it receives is UTF-8... --===============0778623047030742221== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Author: Martin von Gagern (gagern) Date: Sun, 29 Jun 2008 10:43:30 -0000 Message-Id: <20080629104330.14161.59682.malone@gandwana.canonical.com> Jelmer Vernooij wrote: > Wrapping all bzr-svn calls by setlocale() may be the best solution here. Ugly, as setlocale is not guaranteed to be thread-safe, and can be quite inefficient as well. Quoting from http://docs.python.org/lib/node745.html "It is generally a bad idea to call setlocale() in some library routine, si= nce as a side effect it affects the entire program. Saving and restoring it= is almost as bad: it is expensive and affects other threads that happen to= run before the settings have been restored." > Unless there's some other way to tell apr the encoding of everything it > receives is UTF-8... I can see no alternative route through the sequence of library calls. One possibility might be some interposter library that overrides one of these functions, in order to relay it to the original implementation unless the current thread requested a fixed return value for the next call. Ugly, probably difficult to get right, and I guess none too portable either. Two more options, both of which I would deem superior to those mentioned above: Leave it to bzr. 
I hope to get some setlocale support merged soon:
http://bundlebuggy.aaronbentley.com/request/%3C4863D8D1.405%40gmx.net%3E
I'm against ugly code in one project just to work around the deficiencies of code in another project, as this tends to make code quite unreadable. So you could try to get that fix backported into the next bzr 1.5 release, if there will be one, and otherwise wait for 1.6.

Have bzr-svn modify bzrlib itself.
As you can see, the fix above modifies three functions and adds a bit of top-level code. Maybe bzr-svn can introduce this fix into bzr versions that don't provide it themselves. I don't know enough Python to be sure, but I would have thought that it should be possible to redefine these functions. Either there is a reliable way to get the source and apply the patch, or you rely on the current implementation being appropriate for all previous versions as well, or you keep a list of different implementations to choose from for different versions of bzr. The top-level code could be executed during bzr-svn initialization, under the condition that the locale is still unset (i.e. set to "C"), the bzr version is known not to contain the fix, and bzrlib was actually loaded by the bzr command line tool, to stay consistent with a fix in bzr.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Sun, 29 Jun 2008 11:30:31 -0000
Message-Id: <20080629113031.4012.66327.malone@palladium.canonical.com>

When wrapping, it shouldn't affect other parts of bzr, since we would change the locale before calling a svn function and then change it back afterwards. We also don't use threads. Personally, I think that's a lot cleaner than monkeypatching bzr, which may have potential side-effects in other parts of the code.
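Martin's question whether Python allows redefining bzrlib functions can be illustrated with a small sketch: a module attribute can be rebound at runtime, guarded by a version check. Everything here (the fake_bzrlib_osutils module, path_encoding(), the (1, 6) threshold borrowed from the bzr 1.6 discussion) is hypothetical and does not reproduce the actual fix.

```python
import types


def make_fake_module(version):
    # Hypothetical stand-in for a bzrlib module; the module name, the
    # version tuple and path_encoding() are illustrative only.
    mod = types.ModuleType("fake_bzrlib_osutils")
    mod.version_info = version
    mod.path_encoding = lambda: "ascii"  # behaviour without the fix
    return mod


def fixed_path_encoding():
    # Hypothetical replacement carrying the fix.
    return "utf-8"


def patch_if_needed(mod, fixed_in=(1, 6)):
    """Install the replacement only if mod predates fixed_in; return True if patched."""
    if tuple(mod.version_info[:2]) >= fixed_in:
        return False
    mod.path_encoding = fixed_path_encoding
    return True
```

For example, a module created with version (1, 5, 0) gets patched, so its path_encoding() then returns "utf-8", while a reference saved before patching (e.g. via `from module import path_encoding`) still returns "ascii". A (1, 6, 0) module is left untouched. That stale-reference case is one concrete form of the side effects Jelmer mentions.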
--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Sun, 29 Jun 2008 11:31:55 -0000
Message-Id: <20080629113155.2122.19066.malone@gandwana.canonical.com>

I've merged ~gagern/bzr-svn/bug128496 btw, since it seemed useful no matter how we resolve the other issues. Thanks!

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Sun, 29 Jun 2008 11:44:11 -0000
Message-Id: <20080629114411.2122.13489.malone@gandwana.canonical.com>

Setting the locale is only relevant for the paths specified to the wc module afaik; the rest expects UTF-8. Subversion expects the paths there to be in the file system encoding, which it apparently assumes to be the same as the locale's encoding.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Sun, 29 Jun 2008 11:52:29 -0000
Message-Id: <20080629115230.14797.99577.malone@gangotri.canonical.com>

Hmm, I misunderstood the meaning of setlocale(LC_ALL, "") I think. I'll just wait for your patch to be merged upstream.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Sun, 29 Jun 2008 11:55:19 -0000
Message-Id: <20080629115519.14797.8289.malone@gangotri.canonical.com>

in other words, (2)

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Martin von Gagern (gagern)
Date: Sun, 29 Jun 2008 14:00:48 -0000
Message-Id: <20080629140048.28729.23111.malone@gangotri.canonical.com>

> When wrapping, it shouldn't affect other parts of bzr, since we would change the locale before calling a svn function and then change it back afterwards.
> We also don't use threads.

While bzr the command line tool might not be using threads, there might well be multithreaded client applications using bzrlib. However, those should call setlocale themselves, if they can cope with it, or bzr shouldn't use locales either. At least in the C world that's the way things would be. Maybe you can achieve the setlocale wrapping with a decorator that does nothing if the locale has already been set to something other than C, the POSIX default value.

> Hmm, I misunderstood the meaning of setlocale(LC_ALL, "") I think. I'll just wait for your patch to be merged upstream.

You don't want to use LC_ALL, because you can't reliably restore that to its old value. Only use LC_CTYPE. setlocale(LC_CTYPE, "") sets the current encoding according to the environment settings. setlocale(LC_CTYPE, None) simply queries the current value. A call that changes the locale returns the new value, not the old one, so the old value has to be queried before changing it. Together you could use this to build a decorator like this (untested):

    import functools
    from locale import setlocale, LC_CTYPE

    if setlocale(LC_CTYPE, None) != 'C':
        # The application already chose a locale; leave it alone.
        need_setlocale = False
    else:
        # Probe the environment locale, then switch back to "C".
        env_locale = setlocale(LC_CTYPE, '')
        setlocale(LC_CTYPE, 'C')
        need_setlocale = (env_locale != 'C')

    def setlocale_wrapper(unbound):
        if not need_setlocale:
            return unbound
        @functools.wraps(unbound)  # keeps __doc__ and __name__
        def wrapped(*args, **kwargs):
            old_locale = setlocale(LC_CTYPE, None)  # query before changing
            setlocale(LC_CTYPE, '')
            try:
                return unbound(*args, **kwargs)
            finally:
                setlocale(LC_CTYPE, old_locale)
        return wrapped

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Jelmer Vernooij (jelmer)
Date: Fri, 04 Jul 2008 08:07:20 -0000
Message-Id: <20080704080720.11784.17834.malone@potassium.ubuntu.com>

Should this be considered closed now that the setlocale() fixes have made it into bzr 1.6?
The bzr-svn changes attached to this bug report have already made it into 0.4.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Martin von Gagern (gagern)
Date: Fri, 04 Jul 2008 08:32:01 -0000
Message-Id: <20080704083201.13599.87479.malone@gangotri.canonical.com>

Yes, I guess this can be closed for bzr-svn.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Martin von Gagern (gagern)
Date: Fri, 04 Jul 2008 09:20:22 -0000
Message-Id: <20080704092023.16746.76368.malone@palladium.canonical.com>

Committed in http://bazaar.launchpad.net/~bzr/bzr/trunk/revision/3518

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Martin von Gagern (gagern)
Date: Fri, 04 Jul 2008 09:21:29 -0000
Message-Id: <20080704092130.6279.73007.malone@potassium.ubuntu.com>

This issue is better addressed by calling setlocale in the application, not by some hack in the svn or apr libraries. Therefore it will not be addressed in Subversion.

--===============0778623047030742221==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Author: Martin Pool (mbp)
Date: Tue, 23 Sep 2008 02:35:19 -0000
Message-Id: <20080923023519.22339.47712.malone@gangotri.canonical.com>

Martin von Gagern posted a patch that adds the setlocale call to bzr and then fixes up various issues following on from there. That particular patch is still in bb (Bundle Buggy), but I'm going to mark it superseded because he submitted follow-on patches that address most aspects of it.
--===============0778623047030742221==--