[With patch] Commit and log commands: encoding problems

Bug #5041 reported by Alexander Belchenko
Affects: Bazaar
Status: Fix Released
Importance: High
Assigned to: John A Meinel

Bug Description

1) Bug in the log commands: bzr uses bzrlib.user_encoding instead of sys.stdout.encoding to encode messages. As a result, on a Russian Windows machine (where user_encoding == cp1251, but the console encoding and sys.stdout.encoding == cp866), Russian words are shown as garbled characters.

2) Bug in the commit command: bzr does not decode the message from the external editor to Unicode. As a result, the message is stored "as is", with ampersand coding.

A proposed patch to fix these bugs is here: http://bzr.onembedding.com/bzr.win/patches/patch-bialix27.diff
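To illustrate the first problem, here is a minimal sketch (modern Python, not bzr code) of what happens when text is encoded with one code page and displayed with another. The names `encode_for_stream` and `shown_on_console` are hypothetical helpers, and the two encodings are simply the values quoted in the report.

```python
# Hypothetical helpers for illustration only; they are not part of bzrlib.
USER_ENCODING = "cp1251"     # stands in for bzrlib.user_encoding in the report
CONSOLE_ENCODING = "cp866"   # stands in for sys.stdout.encoding in the report


def encode_for_stream(message, stream_encoding):
    """Turn a Unicode message into the bytes that would be written out."""
    return message.encode(stream_encoding, "replace")


def shown_on_console(raw_bytes):
    """Interpret bytes the way a cp866 console would display them."""
    return raw_bytes.decode(CONSOLE_ENCODING, "replace")


message = "Привет"

# Bytes produced with the locale encoding, but displayed by a cp866 console.
garbled = shown_on_console(encode_for_stream(message, USER_ENCODING))

# Bytes produced with the console's own encoding.
correct = shown_on_console(encode_for_stream(message, CONSOLE_ENCODING))
```

Only the second variant survives the round trip, which is why the report asks for sys.stdout.encoding when writing log output.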

Ivan Vilata i Balaguer (ivilata) wrote :

I find the following patch to be a simpler solution to the *second* problem. Maybe ``edit_commit_message()`` (and not ``cmd_commit.run()``) is the right place to do the conversion to Unicode, since the editor may leave additional hints about the encoding of the file (like Emacs' ``-*- coding: xxx -*-``). Maybe ``edit_commit_message()`` should also get the ``input_encoding`` option, I don't know.

----

--- msgeditor.py.orig 2005-12-06 13:49:06.645099016 +0100
+++ msgeditor.py 2005-12-06 13:57:15.331807344 +0100
@@ -20,8 +20,10 @@
 """Commit message editor support."""

 import os
+import codecs
 from subprocess import call

+import bzrlib
 import bzrlib.config as config
 from bzrlib.errors import BzrError

@@ -95,7 +97,7 @@
         started = False
         msg = []
         lastline, nlines = 0, 0
-        for line in file(msgfilename, "r"):
+        for line in codecs.open(msgfilename, 'rt', bzrlib.user_encoding):
             stripped_line = line.strip()
             # strip empty line before the log message starts
             if not started:
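
The idea behind the patch can be sketched outside bzr as follows. This is an illustration in modern Python, where the built-in `open()` with an `encoding` argument plays the role of `codecs.open()` in the patch; `read_editor_message` and `demo` are hypothetical names, and cp1251 is only an assumed editor encoding.

```python
import os
import tempfile


def read_editor_message(path, encoding):
    """Read the file saved by the editor and return its lines as decoded text."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Write a cp1251 message to a temp file, then read it raw and decoded."""
    original = "Исправлена ошибка"
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Simulate an editor that saves the message in the user's code page.
        with os.fdopen(fd, "wb") as f:
            f.write(original.encode("cp1251") + b"\n")
        # Reading without decoding yields raw bytes, not text.
        with open(path, "rb") as f:
            raw = f.read()
        decoded = read_editor_message(path, "cp1251")
    finally:
        os.remove(path)
    return original, raw, decoded
```

Reading with an explicit encoding returns text that can be stored as Unicode, while the raw read returns bytes whose meaning depends on a code page the rest of the program does not know.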

Ivan Vilata i Balaguer (ivilata) wrote :

I found that merging in a patch with non-ASCII characters in its commit message causes a ``UnicodeDecodeError`` to be raised. The linked patch solves this by encoding ``infotext``, in addition to the previous changes.

http://www.selidor.net/data/bzr_msgeditor_encoding.diff
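
The failing line is not shown in this report, so the following is only a sketch of the general failure mode, under the assumption that non-ASCII bytes were combined with Unicode text. In the Python 2 of that time such mixing triggered an implicit ASCII decode; the sketch makes that decode explicit so it behaves the same in modern Python. `join_template` and `write_template` are hypothetical names, not bzrlib functions.

```python
def join_template(message_bytes, infotext):
    """Mimic an implicit ASCII decode when bytes are joined with Unicode text."""
    # Raises UnicodeDecodeError as soon as message_bytes holds a non-ASCII byte.
    return message_bytes.decode("ascii") + infotext


def write_template(infotext, encoding):
    """Encode Unicode text explicitly before handing it to a byte stream."""
    return infotext.encode(encoding)


def mixing_fails(message_bytes, infotext):
    """Return True if joining the two values raises UnicodeDecodeError."""
    try:
        join_template(message_bytes, infotext)
    except UnicodeDecodeError:
        return True
    return False
```

Encoding explicitly at the boundary, as the linked patch is described as doing for ``infotext``, keeps bytes and text from being mixed in the first place.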

Alexander Belchenko (bialix) wrote :

Your patch seems correct. I'll rework my previous one and send a new patch to the bazaar-ng mailing list.

Ivan Vilata i Balaguer (ivilata) wrote :

I was trying to write a pair of test cases for this in ``test_msgeditor.py``, but I definitely know too little of bzrlib. :(

* First case: create working dir, modify a file, commit with a UTF-8 (or something else) message, then get the log message and compare.

* Second case: create two divergent branches, create a commit in the first one with a UTF-8 message, merge into the second one and check that ``UnicodeDecodeError`` is not raised when committing.

I hope these will help to check that. Thanks!
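
A rough shape for the first case, without using bzrlib at all, might look like the sketch below. `save_message` and `load_message` are hypothetical stand-ins for "the editor saves the file" and "the message is read back as Unicode"; a real test in ``test_msgeditor.py`` would go through the working tree and commit machinery instead, and the second case needs real branches, so it is not sketched here.

```python
import os
import tempfile
import unittest


def save_message(path, message, encoding):
    """Hypothetical stand-in for the editor saving the message file."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(message)


def load_message(path, encoding):
    """Hypothetical stand-in for reading the message back as Unicode."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


class TestMessageRoundTrip(unittest.TestCase):
    def round_trip(self, message, encoding):
        """Save and reload a message through a temporary file."""
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            save_message(path, message, encoding)
            return load_message(path, encoding)
        finally:
            os.remove(path)

    def test_utf8_message(self):
        message = "caf\u00e9 \u043b\u043e\u0433"
        self.assertEqual(self.round_trip(message, "utf-8"), message)

    def test_cp1251_message(self):
        message = "\u043b\u043e\u0433"
        self.assertEqual(self.round_trip(message, "cp1251"), message)
```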

Michael Vogt (mvo) wrote :

This seems to be the relevant thread with the patch:
http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/6425

Changed in bzr:
status: Unconfirmed → Confirmed
John A Meinel (jameinel) wrote :

My encoding branch, which contains the test cases, has been committed, so this bug should now be fixed.

Changed in bzr:
assignee: nobody → jameinel
status: Confirmed → Fix Released