UnicodeDecodeError during backup due to non-utf8 translation

Bug #989496 reported by Pavlo Bohmat
This bug affects 57 people
Affects                  Status         Importance   Assigned to   Milestone
Duplicity                Fix Released   Medium       Unassigned
One Hundred Papercuts    Fix Released   Undecided    Unassigned
duplicity (Ubuntu)       Fix Released   Medium       Unassigned

Bug Description

Backup failed (unknown error)... deja-dup/ubuntuone

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1403, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1396, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1366, in main
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 491, in full_backup
    bytes_written = dummy_backup(tarblock_iter)
  File "/usr/bin/duplicity", line 197, in dummy_backup
    while tarblock_iter.next():
  File "/usr/lib/python2.7/dist-packages/duplicity/diffdir.py", line 507, in next
    result = self.process(self.input_iter.next(), size)
  File "/usr/lib/python2.7/dist-packages/duplicity/diffdir.py", line 188, in get_delta_iter
    for new_path, sig_path in collated:
  File "/usr/lib/python2.7/dist-packages/duplicity/diffdir.py", line 281, in collate2iters
    for relem1 in riter1:
  File "/usr/lib/python2.7/dist-packages/duplicity/selection.py", line 187, in Iterate
    log.Debug(_("Selecting %s") % subpath.name)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 117: invalid continuation byte

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: deja-dup 22.0-0ubuntu2
ProcVersionSignature: Ubuntu 3.2.0-24.37-generic 3.2.14
Uname: Linux 3.2.0-24-generic x86_64
ApportVersion: 2.0.1-0ubuntu6
Architecture: amd64
Date: Fri Apr 27 11:49:30 2012
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=ru_UA.UTF-8
 SHELL=/bin/bash
SourcePackage: deja-dup
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.xdg.autostart.deja.dup.monitor.desktop: [modified]
mtime.conffile..etc.xdg.autostart.deja.dup.monitor.desktop: 2012-02-14T14:17:23.600015

Michael Terry (mterry) wrote :

Ken, this looks new to me. The path may not be utf8?

Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 989496] Re: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 117

Looks like subpath.name is UTF8 but can't be decoded to ASCII for printing. We do not handle Unicode correctly, especially for printing. In Python 2 the console is ASCII, so everything has to be decoded for ASCII.

Yes, it's a new bug.

Michael Terry (mterry)
affects: deja-dup (Ubuntu) → duplicity (Ubuntu)
Launchpad Janitor (janitor) wrote : Re: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 117

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in duplicity (Ubuntu):
status: New → Confirmed
Vv (vivien-perez) wrote :

Hello,

I've got pretty much the same problem also on precise 64 bit, with deja-dup 22.0.

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1403, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1396, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1276, in main
    globals.archive_dir).set_values()
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 691, in set_values
    self.get_backup_chains(partials + backend_filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 814, in get_backup_chains
    map(add_to_sets, filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 808, in add_to_sets
    log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)

Cheers,

Vv

Savvas Radevic (medigeek) wrote :

Ditto from this topic (in Greek): http://forum.ubuntu-gr.org/viewtopic.php?f=5&t=23165

  File "/usr/lib/python2.7/dist-packages/duplicity/selection.py", line 187, in Iterate
    log.Debug(_("Selecting %s") % subpath.name)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x95 in position 45: invalid start byte

Savvas Radevic (medigeek) wrote :

The problem is with the filename encoding you have. Please check if any files or folders have weird characters that look like these: �����

This is an example in python:
>>> unicode('\x95','utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0x95 in position 0: invalid start byte

This seems to work (but I haven't tested it with duplicity):
>>> unicode('\x95','utf-8', "replace")
u'\ufffd'

>>> unicode('\xd1','utf-8', 'replace')
u'\ufffd'

The developers could use either "replace" or "ignore", but I don't know which one suits the purpose best. Or even use "try ... except UnicodeDecodeError" and skip these files/folders for that matter.
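
The three options mentioned above can be compared side by side. This is only a minimal sketch in Python 3 syntax (the session above is Python 2, where unicode(raw, 'utf-8', 'replace') plays the role of raw.decode('utf-8', 'replace')). The helper name describe_name and the sample bytes are invented for illustration and are not duplicity code.

```python
def describe_name(raw, errors="replace"):
    """Return printable text for a filename given as raw bytes.

    errors may be "strict", "replace" or "ignore"; with "strict" an
    undecodable name is reported as None instead of raising.
    """
    try:
        return raw.decode("utf-8", errors)
    except UnicodeDecodeError:
        # Only reachable with errors="strict": the caller may skip the file.
        return None


sample = b"report-\x95.txt"   # 0x95 is not a valid UTF-8 start byte

print(ascii(describe_name(sample, "replace")))  # 'report-\ufffd.txt'
print(ascii(describe_name(sample, "ignore")))   # 'report-.txt'
print(ascii(describe_name(sample, "strict")))   # None
```

Either handler only makes the name printable; the original bytes are still needed to open or restore the file.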

Savvas Radevic (medigeek) wrote :

> The problem is with the filename encoding you have.
I mean... I think that's the problem, I'm not 100% sure. :)

Savvas Radevic (medigeek) wrote :

I have confirmation from the Greek forum (mentioned above) that it is indeed the filename.
The forum member who reported this has renamed the files, and now the program is working as expected.

Pavlo Bohmat (bohm) wrote :

The file names are fine. I excluded all the directories and back up only configuration files and directories, so renaming is nonsense! How can I keep a log of the scan? It crashes, and there is no way to see which directory or file caused it.

Pavlo Bohmat (bohm) wrote :

The program should back up everything; it should not matter what encoding the names use...

Andreas Klust (andreas.klust-deactivatedaccount) wrote :

I have the same problem, very similar error message. It started right after I upgraded from Ubuntu 11.10 to 12.04.

Vincent Laisney (vlaisney) wrote :

I have the same problem. I think it is because some of my files have Arabic names (in Arabic script).
I have also noticed that if I back up to a local file system, everything works perfectly. I hope this helps. I use Ubuntu 12.04.

Changed in duplicity:
status: New → Confirmed
Milan Bouchet-Valat (nalimilan) wrote :

Looks pretty silly to me: the bug happens merely because the file name is printed to the logs? Better to skip printing the name, possibly with a warning, than to have a program that does not work. ;-)

Changed in hundredpapercuts:
status: New → Confirmed
Daniel Hahler (blueyed)
Changed in duplicity (Ubuntu):
status: Confirmed → Triaged
Changed in duplicity:
status: Confirmed → Triaged
Changed in duplicity (Ubuntu):
importance: Undecided → Medium
Daniel Hahler (blueyed) wrote :

The places where log.Debug might get called with an invalidly encoded filename could use unicode() with "replace" or "ignore".

But that will likely just cause a similar error further down in the processing, when the filename gets used again.
Duplicity should make sure to use the invalidly encoded filename when backing up the file, and restore it under the original filename.

I think that the different cases reported here should get added as separate test cases to Duplicity first, so that proper fixes for them can be added then to make the tests pass.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

broderix (broderix)
affects: deja-dup (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Invalid
Changed in ubuntu:
status: New → Confirmed
broderix (broderix)
Changed in duplicity:
status: Triaged → Confirmed
Mace (xmacex) wrote :

I'm getting a decoding error too; it's similar to, but different from, the one in the title of this bug. My error message is:

  File "/usr/lib/python2.7/dist-packages/duplicity/selection.py", line 187, in Iterate
    log.Debug(_("Selecting %s") % subpath.name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

* lsb_release -d
Description: Ubuntu quantal (development branch)

* dpkg-query -W deja-dup duplicity
deja-dup 23.2-0ubuntu1
duplicity 0.6.19-0ubuntu1

Andreas Klust (andreas.klust-deactivatedaccount) wrote :

I was able to track the problem down to one file in my home directory whose name contains a character that does not conform to UTF-8. I believe the basic problem is that Linux file names can be made up of arbitrary bytes. Python tries to decode those bytes according to the locale's encoding. Consequently, you need the same files AND the same locale configuration to reproduce the error. In my case, the filename contained the byte 0xF1, which is not valid UTF-8 in that position. The file is very old and likely dates from a time when I used a character encoding other than UTF-8.

The general problem is described in PEP 383: http://www.python.org/dev/peps/pep-0383/ . PEP 383 also proposes a solution for Python 3 but not for Python 2. See also http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html for further insight in the general problem.
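
To illustrate what PEP 383 proposes, here is a small sketch in Python 3, where the "surrogateescape" error handler is available. The sample name is hypothetical; it only stands in for an old file name containing the byte 0xF1.

```python
raw = b"caf\xf1.txt"   # hypothetical pre-UTF-8 name containing byte 0xF1

# Strict UTF-8 decoding rejects the name.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# PEP 383: the undecodable byte is carried through as a lone surrogate...
text = raw.decode("utf-8", "surrogateescape")

# ...and comes back unchanged when the text is encoded again.
round_trip = text.encode("utf-8", "surrogateescape")

print(strict_ok)           # False
print(ascii(text))         # 'caf\udcf1.txt'
print(round_trip == raw)   # True
```

The round trip is the point: the name can be handled as text and still be turned back into the exact bytes stored on disk.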

Pekka Kilponen (pkilpo) wrote :

This is what I get:
...
selection.py", line 187, in Iterate
    log.Debug(_("Selecting %s") % subpath.name)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 43: invalid continuation byte

if this is of any help

Mace (xmacex) wrote :

Andreas Klust wrote (#18)
> I was able to track the problem down to one file in my home directory with a character that is not conforming to UTF-8.

I wonder if you could share how you found the files that Duplicity chokes on? After having my backup system dysfunctional for two months, I would love to rename some files as a workaround for the bug. Duplicity itself (well, I use the Deja-Dup interface) gives just an "unknown error", and the details unfortunately don't pin down which character string is causing the conflict.

Cheers.

Milan Bouchet-Valat (nalimilan) wrote : Re: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 117

Maybe you should run duplicity with --verbose and find what's the problematic file from the last path that was printed. The wrong file must be right after the one that triggered the bug.

Milan Bouchet-Valat (nalimilan) wrote :

Hm, let me rephrase it: the wrong file must be right after the one that was printed when the bug was triggered (since it's the last one that was successful).

Andreas Klust (andreas.klust-deactivatedaccount) wrote :

Hello Mace,

To find the file causing the trouble in my case, I added a few lines to the duplicity source code. The error message tells you the Python source file and line number where the error occurs:

File "/usr/lib/python2.7/dist-packages/duplicity/selection.py", line 187, in Iterate
    log.Debug(_("Selecting %s") % subpath.name)

I added the following lines just before the call to log.Debug (line 187 in this example; you will have to adapt this a little depending on the exact error message and the arguments passed to log.Debug):

# Append each path to a side log before log.Debug gets a chance to raise.
with open("/tmp/debug.log", "a") as dbg_file:
    dbg_file.write("%s\n" % subpath.name)

The last filename written to the log file /tmp/debug.log after duplicity fails is the one causing the trouble.
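
An alternative that does not require patching duplicity is to scan the tree directly for names that are not valid UTF-8. The following is a rough Python 3 sketch under that assumption; the function names are made up here, and it only detects the "filename is not UTF-8" case, not the gettext-related failures discussed elsewhere in this report.

```python
import os


def is_valid_utf8(name):
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_names(root):
    """Walk root (given as bytes) and list entries whose name is not UTF-8."""
    bad = []
    # A bytes root makes os.walk yield bytes, so nothing is decoded for us.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad


print(is_valid_utf8(b"p\xc3\xa4iv\xc3\xa4.txt"))  # True: UTF-8 encoded
print(is_valid_utf8(b"p\xe4iv\xe4.txt"))          # False: Latin-1 encoded

# Example call (path is a placeholder): find_non_utf8_names(b"/home/user")
```

Printing the results with ascii() or repr() avoids running into the same decoding problem while reporting the offending names.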

Mace (xmacex) wrote :

Thanks Andreas, with this help I found out that the character "ä" does indeed cause the problem. It wasn't only one conflicting file (with some weird character in its name); all other occurrences of "ä" cause it too. It's quite a common character in my language and I cannot really live without it.

Milan Bouchet-Valat (nalimilan) wrote :

The "ä" character in itself should not be a problem. It's probably not in the right encoding, but you don't see the problem because the text editor you used to read the file (gedit?) has automatically chosen the right encoding. You can open the text in LibreOffice and select UTF-8 for encoding to make sure the characters are correct.

If that's not the case, install the convmv package and use it to convert the invalid filenames to UTF-8. You have to know the encoding the filenames are using. Testing several settings is possible without any risk for the files as long as you do not add the --notest option.
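
The idea of testing a candidate source encoding without touching any file can be sketched in Python 3 as well. This is not convmv, only an illustration of the same dry-run principle; preview_recode and the sample name are hypothetical.

```python
def preview_recode(raw, source_encoding):
    """Show what a raw filename would become if re-encoded to UTF-8.

    Nothing is renamed; the function only returns the candidate bytes,
    or None when raw is not valid in source_encoding.
    """
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None


old = b"p\xe4iv\xe4.txt"   # hypothetical name written under ISO-8859-1

print(preview_recode(old, "iso-8859-1"))  # b'p\xc3\xa4iv\xc3\xa4.txt'
print(preview_recode(old, "utf-8"))       # None: not UTF-8 to begin with
```

A single-byte encoding such as ISO-8859-1 accepts any byte sequence, so a successful preview still has to be checked by eye before renaming anything.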

Michael Terry (mterry)
no longer affects: ubuntu
Michael Terry (mterry)
Changed in duplicity (Ubuntu):
assignee: nobody → Michael Terry (mterry)
Michael Terry (mterry) wrote :

I think there are at least two things going on here.

1) The way duplicity is currently using it, gettext will return strings in whatever codeset the translation file happens to define. Which is hard to predict. If it is returned as unicode (not utf8, but actual wide byte unicode), then we get the "'ascii' codec can't decode" message. This can be forced by using the unicode=True flag for gettext.install. And it can be fixed by using the codeset='utf8' flag instead.

2) Filenames may not be in utf8. I think this is actually a pretty rare thing, but I'm betting we don't handle it well and it leads to "'utf8' codec can't decode" messages.

Looking further. I seem to be getting situation #1 in more cases than I would naively expect (like, no translation file at all).

Michael Terry (mterry)
summary: - UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 117
+ UnicodeDecodeError during backup
Michael Terry (mterry) wrote : Re: UnicodeDecodeError during backup

And a new way.

3) Only when backing up to Ubuntu One, even filenames in legal utf8 cause "'ascii' codec can't decode". I think some import caused by the u1 backend is changing the global gettext _() function to use unicode by default.

Michael Terry (mterry) wrote :

OK, for issue #3 (which I actually think is 90% of these reported crashes), I've filed bug 1050061 against ubuntu-sso-client and have a branch:
https://code.launchpad.net/~mterry/ubuntu-sso-client/no-gettext-install/+merge/124204

For issue #1, I have a branch:
https://code.launchpad.net/~mterry/duplicity/utf8-po/+merge/124209

Issue #2 is still outstanding, but I think it's a vanishingly small issue. Still a bug, but not as high priority as these other things (and at least it has a workaround the user can do something about -- fix the filename to be valid utf8).

Michael Terry (mterry) wrote :

OK, I split off bug 1050509 for the non-utf8 filesystem case (#2). I already mentioned splitting off ubuntu-sso-client bug 1050061 for the more common case here (#3).

I'll leave this bug for case #1, which is fixed in trunk now.

Changed in duplicity:
status: Confirmed → Fix Committed
summary: - UnicodeDecodeError during backup
+ UnicodeDecodeError during backup due to non-utf8 translation
Michael Terry (mterry) wrote :

This is fixed now, so marking done for hundredpapercuts.

Changed in hundredpapercuts:
status: Confirmed → Fix Released
Changed in duplicity (Ubuntu):
assignee: Michael Terry (mterry) → nobody
Roman Yepishev (rye) wrote :

It looks like the fix is incomplete.

Russian locale, UTF-8 causes basically the same error here:

log.Notice(_("Copying %s to local cache.") % fn)

fn is a unicode string, while _("Copying %s to local cache.") returns a str with utf-8 contents. Mixing the two in the % operation causes the failure.
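
In Python 2, formatting a byte str template with a unicode argument makes the interpreter first decode the template with the default ASCII codec. Below is a rough emulation in Python 3 syntax, where that hidden step has to be written out explicitly. The Russian template is an invented stand-in for a translated message, and both helper names are hypothetical.

```python
def format_like_python2(template_bytes, arg_text):
    """Mimic Python 2's `str % unicode`: ASCII-decode the template, then format."""
    return template_bytes.decode("ascii") % arg_text


def format_fixed(template_bytes, arg_text):
    """Decode the template with its real encoding before formatting."""
    return template_bytes.decode("utf-8") % arg_text


# Stand-in for a translated message returned as UTF-8 bytes.
template = "Копирование %s".encode("utf-8")

try:
    format_like_python2(template, "file.gpg")
    error = None
except UnicodeDecodeError as exc:
    error = exc

result = format_fixed(template, "file.gpg")

print(error.reason)                      # ordinal not in range(128)
print(result == "Копирование file.gpg")  # True
```

Either both sides are text, or both are bytes in the same encoding; the crash comes from combining the two.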

Michael Terry (mterry) wrote :

Roman, if a filename isn't in utf8, that's bug 1050509. Can you repeat your comment there, and explain how you got unicode filenames? Your filesystem locale is utf16?

Mace (xmacex) wrote :

Ok, I have been using Deja-dup and not duplicity directly. My Ubuntu backup routine was working before this bug appeared, and then it broke. I updated yesterday, and this is now fixed for me. My backup just finished successfully, and I'm happy.

Vv (vivien-perez) wrote :

Hello,

this is still not working for me after upgrading to 12.10 and Deja-dup 24.0. I got the same error message as with the previous versions :

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1404, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1397, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1277, in main
    globals.archive_dir).set_values()
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 691, in set_values
    self.get_backup_chains(partials + backend_filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 814, in get_backup_chains
    map(add_to_sets, filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 808, in add_to_sets
    log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)

Any help would be appreciated on this.

Thanks,

Vv

Changed in hundredpapercuts:
milestone: none → raring-misc
Alejandro Benitez (benitezagm) wrote :

How about reporting the offending filename in the error message until a suitable fix is written?

Changed in hundredpapercuts:
status: Fix Released → Triaged
Timothy Arceri (t-fridey) wrote :

According to comment #30 this bug is fixed. The remaining issues are being tracked in bug #1050509

Changed in hundredpapercuts:
status: Triaged → Fix Released
Changed in duplicity:
status: Fix Committed → Fix Released
Changed in duplicity (Ubuntu):
status: Triaged → Fix Released
Changed in hundredpapercuts:
milestone: raring-misc → none
Michael Terry (mterry) wrote :

For those of you using Ubuntu One and still experiencing this error, see bug 1080423. It's a bug in duplicity's Ubuntu One backend where it returns unicode instead of utf8.

Oleg "Nightwing" Lomakin (nightwing666) wrote :

I'm running Ubuntu 12.04 with duplicity 0.6.20 (I had the same error with duplicity 0.6.18 and 0.6.19) and have the same problem. I back up to WebDAV.
duplicity --include-globbing-filelist /etc/backup-files.txt / webdavs://login:<email address hidden>/Bakeups/nightserv/
Чтение подстановочного списка файлов /etc/backup-files.txt
Локальные и удалённые метаданные синхронизированы, синхронизация не требуется.
Время последней полной резервной копии: нету
Сигнатуры не найдены, переключение на полную резервную копию.
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1403, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1396, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1371, in main
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 513, in full_backup
    col_stats.set_values(sig_chain_warning=None)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 698, in set_values
    self.get_backup_chains(partials + backend_filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 821, in get_backup_chains
    map(add_to_sets, filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 815, in add_to_sets
    log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)
But with LANG=C it works pretty well:
LANG=C duplicity --include-globbing-filelist /etc/backup-files.txt / webdavs://login:<email address hidden>/Bakeups/nightserv/
Reading globbing filelist /etc/backup-files.txt
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1358088013.80 (Sun Jan 13 18:40:13 2013)
EndTime 1358088057.13 (Sun Jan 13 18:40:57 2013)
ElapsedTime 43.32 (43.32 seconds)
SourceFiles 3127
SourceFileSize 58626935 (55.9 MB)
NewFiles 3127
NewFileSize 58626935 (55.9 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 3127
RawDeltaSize 57059559 (54.4 MB)
TotalDestinationSizeChange 46463663 (44.3 MB)
Errors 0
-------------------------------------------------
My locales:
locale
LANG=ru_RU.UTF-8
LANGUAGE=
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
So the bug is still here, at least in 0.6.20.

Michael Terry (mterry) wrote :

Oleg, you may be talking about bug 1080423.

Oleg "Nightwing" Lomakin (nightwing666) wrote : Re: [Bug 989496] Re: UnicodeDecodeError during backup due to non-utf8 translation

Michael, there is a difference: I use WebDAV instead of Ubuntu One.

az (az-debian) wrote :

i've dug into this a bit more deeply and found an explanation for the logging gotchas under certain locales:

as per the python wiki (http://web.archive.org/web/20120425192131/http://wiki.python.org/moin/UnicodeEncodeError), when you run somestring.decode(whicheverencoding), python2 does weird encode-and-then-decode things if your somestring is already unicode.

log.py uses s.decode("utf8", "ignore") - which fails to ignore errors on some locales (apparently in the down-EN-coding step before the decode...).

check out the attached test script, which contains an iso8859 string to log. if you run it (at least under python 2.6) with LC_CTYPE=anything utf8 or plain C/POSIX then it works fine. if your CTYPE is iso88591, then the first decode of x works but the second decode fails, and we get the "'ascii' codec can't encode" complaint.

i've just changed the debian version (0.6.20-2) to do the decode in log.py conditionally:

_logger.log(DupToLoggerLevel(verb_level), s if (isinstance(s,unicode)) else s.decode("utf8", "ignore"))

regards,
az
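
The mechanism described above, and the conditional decode, can be mimicked in Python 3 syntax, with str standing in for Python 2's unicode and bytes for Python 2's str. This is only an emulation of the mechanism under that assumption, not of the locale dependence; both function names are hypothetical and this is not the code in log.py.

```python
def decode_like_python2(s):
    """Mimic Python 2's `.decode("utf8", "ignore")` on text or bytes.

    On an already-decoded string, Python 2 first encodes with the default
    ASCII codec (strictly, whatever the "ignore" argument says), then decodes.
    """
    if isinstance(s, str):
        s = s.encode("ascii")          # the hidden step that can raise
    return s.decode("utf8", "ignore")


def to_text(s):
    """Conditional decode, in the spirit of the change quoted above."""
    return s if isinstance(s, str) else s.decode("utf8", "ignore")


message = "r\u00e9pertoire"            # already text, contains U+00E9

try:
    decode_like_python2(message)
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)                                     # True
print(to_text(message) == message)                # True: text passes through
print(to_text(b"r\xc3\xa9pertoire") == message)   # True: bytes are decoded
```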

François Marier (fmarier) wrote :

0.6.20-2 is still affected by this problem, at least using the fr_CA.utf8 locale.

Christian Meisenbichler (chmberg) wrote :

deja-dup 24.0 is still affected by this problem.

Alfonso de Cala (alfem) wrote :

I also get this error doing a backup to webdav

The filenames contain no strange characters (although my locale is "es_ES.UTF-8").

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1411, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1404, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1374, in main
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 521, in full_backup
    col_stats.set_values(sig_chain_warning=None)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 693, in set_values
    self.get_backup_chains(partials + backend_filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 816, in get_backup_chains
    map(add_to_sets, filename_list)
  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py", line 815, in add_to_sets
    log.Debug(_("Ignoring file (rejected by backup set) '%s'") % filename)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)

Alfonso de Cala (alfem) wrote :

I changed the line (/usr/lib/python2.7/dist-packages/duplicity/collections.py):

                    log.Debug(_("Ignoring file (rejected by backup set) '%s'") % filename)

to this:
                     print "Ignoring file (rejected by backup set", filename

and backups now work perfectly!

So it seems the debug logging is the only part affected (?)

Milan Bouchet-Valat (nalimilan) wrote :

I think I have found a fix.

The bug does not happen only with invalid UTF-8 filenames; you simply need UTF-8 filenames and a UTF-8 locale.

For example, in collections.py:810, there is:
                log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))

On my system, when this fails (see error below), the _() string is a str object encoded in UTF-8; filename is a unicode object. The error below happens while Python encodes filename into an ASCII str object. If the _() string is a unicode object too, no encoding into a str object happens at this stage, and everything works. This can be achieved by setting gettext up differently in __init__.py, by passing unicode=True to gettext.install(). This is the solution recommended by the author of gettext for Python:
http://www.wefearchange.org/2012/06/the-right-way-to-internationalize-your.html

This change requires a few modifications in other places so that only unicode strings are passed to the logger. I'm attaching a diff of quick and dirty changes I applied to demonstrate the idea.

Any chance of getting some attention for this bug? It has made duplicity completely unusable on my system for more than a year.

This is with duplicity 0.6.21 on Fedora 19.

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1411, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1404, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1257, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 981, in ProcessCommandLine
    args = parse_cmdline_options(cmdline_list)
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 644, in parse_cmdline_options
    log.Info(_("Using archive dir: %s") % (globals.archive_dir.name,))
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 106, in Info
    Log(s, INFO, code, extra)
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 74, in Log
    _logger.log(DupToLoggerLevel(verb_level), s.decode("utf8", "ignore"))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)

Michael Terry (mterry) wrote :

For those experiencing the "'ascii' codec can't encode character" issue, I'm curious if lp:~mterry/duplicity/encoding solves the problem for you. Please report back.

Milan Bouchet-Valat (nalimilan) wrote :

Hey, thanks for working on this! ;-)

Unfortunately, with your branch I get a crash directly on start (FWIW, I've only built duplicity in-tree, and not installed it).
$ bin/duplicity --help
Traceback (most recent call last):
  File "bin/duplicity", line 1470, in <module>
    with_tempdir(main)
  File "bin/duplicity", line 1463, in with_tempdir
    fn()
  File "bin/duplicity", line 1316, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/home/milan/Dev/duplicity/duplicity/commandline.py", line 999, in ProcessCommandLine
    args = parse_cmdline_options(cmdline_list)
  File "/home/milan/Dev/duplicity/duplicity/commandline.py", line 560, in parse_cmdline_options
    (options, args) = parser.parse_args()
  File "/usr/lib64/python2.7/optparse.py", line 1399, in parse_args
    stop = self._process_args(largs, rargs, values)
  File "/usr/lib64/python2.7/optparse.py", line 1439, in _process_args
    self._process_long_opt(rargs, values)
  File "/usr/lib64/python2.7/optparse.py", line 1514, in _process_long_opt
    option.process(opt, value, values, self)
  File "/usr/lib64/python2.7/optparse.py", line 788, in process
    self.action, self.dest, opt, value, values, parser)
  File "/home/milan/Dev/duplicity/duplicity/commandline.py", line 171, in take_action
    self, action, dest, opt, value, values, parser)
  File "/usr/lib64/python2.7/optparse.py", line 810, in take_action
    parser.print_help()
  File "/home/milan/Dev/duplicity/duplicity/commandline.py", line 198, in print_help
    file.write(self.format_help().decode('utf-8').encode(encoding, "replace"))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1997: ordinal not in range(128)

I think you can replicate the problem by using the fr_FR.utf8 locale. With LC_ALL=C I don't get the problem at all.

Have you considered the following change? I don't know anything about gettext with Python, but this guy seems to have strong feelings about it.
http://www.wefearchange.org/2012/06/the-right-way-to-internationalize-your.html

diff -u duplicity/__init__.py /usr/lib64/python2.7/site-packages/duplicity/__init__.py
--- duplicity/__init__.py 2013-08-08 11:15:50.648693111 +0200
+++ /usr/lib64/python2.7/site-packages/duplicity/__init__.py 2013-08-07 23:20:59.716631994 +0200
@@ -20,4 +20,4 @@
 # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

 import gettext
-gettext.install('duplicity', codeset='utf8')
+gettext.install('duplicity', codeset='utf8', unicode=True, names='ngettext')

Revision history for this message
Michael Terry (mterry) wrote :

Milan, thanks for testing! I've updated the branch to fix that issue. Can you try again?

You'll see that I do fix the gettext.install() line in duplicity/__init__.py, but slightly differently than you suggest (I avoid using the names= argument, because that only appeared in Python 2.5; but I achieve the same result another way).

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Great, now it works -- at least it fixes the crash I fixed with my quick patch (and most probably much better).

(Sorry for missing the relevant gettext lines in the commit.)

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Utilisation du répertoire d’archive : /home/milan/.cache/duplicity/e5f4f9b85e256f59787be25a63b7fdbf
Utilisation du nom de sauvegarde : e5f4f9b85e256f59787be25a63b7fdbf
Import of duplicity.backends.cfbackend Succeeded
Import of duplicity.backends.dpbxbackend Failed: No module named dropbox
Import of duplicity.backends.botobackend Succeeded
Import of duplicity.backends.ftpbackend Succeeded
Import of duplicity.backends.hsibackend Succeeded
Import of duplicity.backends.imapbackend Succeeded
Import of duplicity.backends.localbackend Succeeded
Import of duplicity.backends.rsyncbackend Succeeded
Import of duplicity.backends.sshbackend Succeeded
Import of duplicity.backends.tahoebackend Succeeded
Import of duplicity.backends.webdavbackend Succeeded
Import of duplicity.backends.ftpsbackend Succeeded
Import of duplicity.backends.gdocsbackend Succeeded
Import of duplicity.backends.megabackend Succeeded
Import of duplicity.backends.swiftbackend Succeeded
When doing a different backup to an external drive, I got an error which was not present before:
duplicity incremental ~/ -v Info --exclude ~/... [skipped] file:///run/media/milan/SOMETHING --allow-source-mismatch
Import of duplicity.backends.u1backend Succeeded
Main action: inc
================================================================================
duplicity $version ($reldate)
Using temporary directory /tmp/duplicity-LmaRwW-tempdir
Traceback (most recent call last):
  File "/home/milan/Dev/duplicity-encoding/bin/duplicity", line 1470, in <module>
    with_tempdir(main)
  File "/home/milan/Dev/duplicity-encoding/bin/duplicity", line 1463, in with_tempdir
    fn()
  File "/home/milan/Dev/duplicity-encoding/bin/duplicity", line 1334, in main
    log_startup_parms(log.INFO)
  File "/home/milan/Dev/duplicity-encoding/bin/duplicity", line 1223, in log_startup_parms
    log.Log(u"Args: %s" % (' '.join(sys.argv),), verbosity)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 264: ordinal not in range(128)

Commenting out the line was enough to fix the problem.
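
As background: in Python 2, sys.argv holds byte strings, so interpolating them into a u"..." format string triggers an implicit ASCII decode, and byte 0xc3 is a typical first byte of a UTF-8 encoded accented character. An alternative to commenting the line out would be to decode each argument explicitly before joining. This is only a sketch under that assumption; argv_to_text is a hypothetical name, not a function from duplicity.

```python
import sys


def argv_to_text(argv, encoding=None):
    """Join command-line arguments into one text string for logging.

    Hypothetical helper: byte-string arguments are decoded explicitly,
    and undecodable bytes become U+FFFD instead of raising.
    """
    # Assumption: the filesystem encoding is a reasonable default for argv.
    encoding = encoding or sys.getfilesystemencoding() or "utf-8"
    parts = []
    for arg in argv:
        if isinstance(arg, bytes):
            arg = arg.decode(encoding, "replace")
        parts.append(arg)
    return u" ".join(parts)
```

The log call could then take the form log.Log(u"Args: %s" % argv_to_text(sys.argv), verbosity).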

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Whoops, I wrote my remarks in the middle of the log. The previous comment should have started with:

When doing a different backup to an external drive, I got an error which was not present before:
duplicity incremental ~/ -v Info --exclude ~/... [skipped] file:///run/media/milan/SOMETHING --allow-source-mismatch

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

...and when passing a path with spaces, like "file:///run/media/milan/TOSHIBA\ EXT/", I get:
Command line error : Expected 2 args, got 3

Looks like the path is split into two arguments.
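
One way to check whether the shell itself would split such a path is Python's shlex module, which approximates POSIX shell tokenization. The sketch below only illustrates quoting rules; it does not establish where the split in this report happens. The helper name count_tokens is made up for the example.

```python
import shlex


def count_tokens(command_fragment):
    """Return how many arguments a POSIX-style shell would produce."""
    return len(shlex.split(command_fragment))


# Backslash-space inside double quotes: one argument, backslash kept literally.
quoted = r'"file:///run/media/milan/TOSHIBA\ EXT/"'
# Backslash-space without quotes: one argument, backslash removed.
escaped = r'file:///run/media/milan/TOSHIBA\ EXT/'
# Bare space: two arguments.
bare = 'file:///run/media/milan/TOSHIBA EXT/'

for fragment in (quoted, escaped, bare):
    print(count_tokens(fragment), shlex.split(fragment))
```

If the quoted form reaches the program as a single argument, a later "Expected 2 args, got 3" would point at something re-splitting the string after the shell, but that remains a guess.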

Changed in duplicity:
milestone: none → 0.6.23
Changed in duplicity:
importance: Undecided → Medium
Revision history for this message
Pavlo Bohmat (bohm) wrote :

deja-dup_30.0-0ubuntu4 (trusty)
duplicity_0.6.23-1ubuntu2 (trusty)

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1493, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1487, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1336, in main
    do_backup(action)
  File "/usr/bin/duplicity", line 1457, in do_backup
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 564, in full_backup
    print_statistics(diffdir.stats, bytes_written)
  File "/usr/bin/duplicity", line 594, in print_statistics
    print diffdir.stats.get_stats_logstring(_("Backup Statistics"))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-25: ordinal not in range(128)

Revision history for this message
Michael Terry (mterry) wrote :

That more recent error (the one with "Backup Statistics" in its crash report) is bug 1286845.
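
For readers landing here with that traceback: in Python 2, printing a unicode string to a stdout that reports no encoding (for example when output is piped) falls back to the ASCII codec, which matches the 'ascii' codec error shown. A minimal sketch of one common workaround is to wrap the byte stream in an explicit encoder. The function name make_text_writer and the sample string are illustrative assumptions, and this is not necessarily how bug 1286845 was resolved.

```python
import codecs
import io


def make_text_writer(byte_stream, encoding=None):
    """Wrap a byte stream so that text is encoded explicitly on write."""
    # Assumption: fall back to UTF-8 when the stream reports no encoding.
    # "replace" avoids raising on characters the encoding cannot represent.
    return codecs.getwriter(encoding or "utf-8")(byte_stream, "replace")


# io.BytesIO stands in for a piped stdout in this sketch.
buffer = io.BytesIO()
writer = make_text_writer(buffer, "ascii")
writer.write(u"\u0421\u0442\u0430\u0442 ok\n")  # hypothetical translated text
```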
