Backup unfinished due to error in data.decode

Bug #1419694 reported by Andreas Hohenegger
This bug affects 2 people
Affects: Back In Time
Status: Fix Released
Importance: High
Assigned to: Germar
Milestone: 1.1.4

Bug Description

I am using backintime v1.1.2 under Kubuntu 14.04. I run it with kdesudo, which worked with older versions. Now the backup does not complete. Here is the relevant output of "kdesudo backintime-qt4":

INFO: Save config file
INFO: Command "cp /root/.config/backintime/config /root/.local/share/backintime/mnt/1_6067/backintime/einstein/root/1/new_snapshot/backup/.." returns 0
INFO: Save permissions
Traceback (most recent call last):
  File "/usr/share/backintime/common/backintime.py", line 437, in <module>
    start_app()
  File "/usr/share/backintime/common/backintime.py", line 192, in start_app
    ret = take_snapshot( cfg, True )
  File "/usr/share/backintime/common/backintime.py", line 53, in take_snapshot
    ret = snapshots.Snapshots( cfg ).take_snapshot( force )
  File "/usr/share/backintime/common/snapshots.py", line 918, in take_snapshot
    ret_val, ret_error = self._take_snapshot( snapshot_id, now, include_folders )
  File "/usr/share/backintime/common/snapshots.py", line 1296, in _take_snapshot
    output = find.communicate()[0]
  File "/usr/lib/python3.4/subprocess.py", line 949, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.4/subprocess.py", line 1628, in _communicate
    self.stdout.encoding)
  File "/usr/lib/python3.4/subprocess.py", line 877, in _translate_newlines
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 22496141: invalid continuation byte
INFO: [qt4systrayicon] begin loop
INFO: [qt4systrayicon] end loop
mountpoint: /root/.local/share/backintime/mnt/B623A9EF/mountpoint: No such file or directory

Back In Time
Version: 1.1.2

A quick Google search suggests that this might be related to the locale setting.

kdesudo locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=

Germar (germar) wrote :

First of all, with 1.1.2 I switched from kdesudo to PolicyKit/pkexec to become root, because kdesudo/gksudo are deprecated and shouldn't be used anymore. So please use 'pkexec backintime-qt4'.

I wasn't able to reproduce this here but I don't think this is related to your locale. At least not the locale on your local machine. It might be related to the locale on your remote host. But I think it's because of a special character in one filename. The command that was processed in there is something like (all in one line):

ssh -p 22 -o ServerAliveInterval=240 USER@REMOTEHOST find path/to/remote/backintime/einstein/root/1/new_snapshot/backup/ -name \* -print

The byte '0xe4' stated in the exception indicates that it is a Chinese/Japanese/Korean character. Do you have (new) files with CJK characters? If so, could you please post the filename so I can try to fix this?

Andreas Hohenegger (hohenegger) wrote :

Thanks. Yes, it is likely not related to the locale on the local machine. The remote host uses Busybox and I need to figure it out. I had meanwhile tried to "export LC_ALL="en_US.UTF-8"" and got the same result. I will try running with pkexec. It is possible that there is a bad character somewhere. How could I find it?

Germar (germar) wrote :

Was this your first snapshot with BIT version > 1.0.40? If not, the file with the bad char must be new and you should easily find it in the last snapshot log.

Otherwise you could search for unusual chars with:

find ./ ! -regex ".*/[a-zA-Z0-9\.,:=_\!~+ \(\)\-]*"

Andreas Hohenegger (hohenegger) wrote :

It was my first snapshot with a newer version. Since the snapshot was bigger, I unfortunately could not identify a specific file that caused problems, but I found several with a broken/wrong filename encoding. I identified these with

grep-invalid-utf8 () {
  perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find | grep-invalid-utf8

and converted them to utf-8 with convmv -f iso-8859-1 -t utf8 or changed the name manually when the encoding was broken completely. Then it worked. Perhaps the conclusion is that backintime should output some information about which files caused the error.
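
The same check can be sketched in Python by walking the tree with bytes paths, so that no filename is ever decoded implicitly. This is an illustration only, not code from Back In Time: the names is_valid_utf8, find_invalid_utf8_names and latin1_to_utf8 are made up for this example, and the ISO-8859-1 guess simply mirrors the convmv call above. Python's UTF-8 decoder is stricter than the Perl regex (it rejects 5- and 6-byte sequences), so it may flag slightly more names.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_utf8_names(root: bytes = b"."):
    """Yield full paths (as bytes) whose last component is not valid UTF-8."""
    # Passing bytes to os.walk makes it return bytes, so nothing is decoded.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


def latin1_to_utf8(name: bytes) -> bytes:
    """Re-encode a name, assuming (hypothetically) it was ISO-8859-1."""
    return name.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    # Dry run: print what would be renamed, change nothing on disk.
    for path in find_invalid_utf8_names(b"."):
        head, tail = os.path.split(path)
        print(path, b"->", os.path.join(head, latin1_to_utf8(tail)))
```

The script only prints candidates; whether ISO-8859-1 is the right source encoding still has to be checked per file, as with convmv.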

toroettg (toroettg) wrote :

Hi,

it seems that I am also affected by this bug. With Andreas' Perl regex, I could determine the files with non-utf8-encoded names. Thank you! It looks like previous versions of backintime kinda skipped those files, since they weren't new.

Kind regards
Tobias

Germar (germar) wrote :

Hi,

the Perl regex seems to be from this site: https://gist.github.com/ThomasG77/5971236
There is a command to rename the files automatically, too.

This bug is caused by the switch from Python 2.x to 3.x and the use of the bytes type instead of the string type. Previous BIT versions and the current version still back up those files. Only the current version (>1.1.0) in combination with SSH mode fails with the above exception while saving file permissions.

Could you please post some of those files (just 0 byte files with the problematic filename) so I can test and fix this?

Kind regards,
Germar

Changed in backintime:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Germar (germar)
milestone: none → 1.1.4
toroettg (toroettg) wrote :

Hi,

I already renamed the problematic files, but I could reproduce the error by renaming a newly created file with a tool called 'convmv' [1] (quick Google hit). My guess is, that in my case, the corrupt files were originally created on a Windows system and that their filenames contained German umlauts. Thus, I executed the following command to create/rename such a corrupt filename on my system: 'convmv -f utf-8 -t iso-8859-1 Glücklich.testfile --notest'. I've also attached an archive containing the test files for convenience. Unfortunately, I'm currently short of time and can't look into the code myself to suggest a fix. So, thank you for your efforts!

Kind regards
Tobias

[1] https://www.j3e.de/linux/convmv/man/

Germar (germar) wrote :

Thanks Tobias for the test files.

Yes, I'd now guess the same. My assumption above about CJK chars was incorrect: '0xe4' is lower-case ä in ISO-8859-1. I remember converting a couple hundred files some years ago that had been created through Samba, too. I also changed 'unix charset' and 'display charset' from ISO-8859-15 to UTF-8 in the global section of smb.conf, which fixed this for the future.

Kind regards,
Germar
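
For illustration, the behaviour of the byte 0xe4 can be reproduced in a few lines of Python. The filename "Käse.txt" is a made-up example, not one of the attached test files. The sketch also shows why the CJK guess was plausible: in UTF-8, 0xe4 is the lead byte of many three-byte CJK sequences.

```python
# Hypothetical filename; any name containing "ä" behaves the same way.
name = "Käse.txt"

latin1 = name.encode("iso-8859-1")   # b'K\xe4se.txt', one byte for "ä"
utf8 = name.encode("utf-8")          # b'K\xc3\xa4se.txt', two bytes for "ä"

# In ISO-8859-1 the single byte 0xe4 is "ä".
assert bytes([0xE4]).decode("iso-8859-1") == "ä"

# In UTF-8, 0xe4 starts a three-byte sequence, e.g. the CJK character "中".
assert "中".encode("utf-8") == b"\xe4\xb8\xad"

# Decoding the Latin-1 bytes as UTF-8 fails, because the byte after 0xe4
# ("s") is not a valid continuation byte.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
    # 'utf-8' codec can't decode byte 0xe4 in position 1: invalid continuation byte
```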

Andreas Hohenegger (hohenegger) wrote :

Great!
After successfully creating two backups to an external hard disk connected locally to a USB port, I have now tried to create the next snapshot on the same hard disk connected to a remote server. In both cases the connection is through ssh (to localhost in the first case). Only the remote server runs a different system and a different ssh server. Now I get the same error as before during "Saving permissions". Locally, no files with wrong encoding are found anymore. Remotely, at least the new file names are printed properly. Is it possible that this error is now caused by files from previous snapshots taken with older versions of backintime and broken encoding?

Germar (germar) wrote :

'Save permissions' only scans the files in the new snapshot. Previous snapshots don't matter. I'm not sure why it was working through localhost but not on the remote host. But I'm working on a fix for this which will handle the path as bytes instead of string, so it doesn't need to be decoded any more. Just give me a few more days and I'll have a patch for you.
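
The idea of keeping paths as bytes can be sketched as follows. This is not the attached patch, only a minimal illustration under assumptions: list_paths_as_bytes and to_display are hypothetical names, and a small Python child process stands in for the remote 'find' so the example runs without ssh.

```python
import subprocess
import sys


def list_paths_as_bytes(cmd):
    """Run cmd and return its stdout lines as raw bytes (no decoding)."""
    # No universal_newlines/text mode here, so communicate() returns bytes
    # and never calls data.decode() on the output.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output = proc.communicate()[0]
    return [line for line in output.split(b"\n") if line]


def to_display(path):
    """Decode for logging only; undecodable bytes become lone surrogates."""
    return path.decode("utf-8", errors="surrogateescape")


# Stand-in for the remote `find`: prints one ISO-8859-1 encoded filename.
child = [sys.executable, "-c",
         r"import sys; sys.stdout.buffer.write(b'Gl\xfccklich.testfile\n')"]

paths = list_paths_as_bytes(child)
shown = [to_display(p) for p in paths]
```

Because the surrogateescape handler is reversible, encoding the display string again with the same handler gives back the original bytes, so the real filename is never lost.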

Changed in backintime:
status: Triaged → In Progress
Germar (germar) wrote :

Hi,
please try out attached patch with 'cd /usr/share/backintime; sudo patch -p0 < /path/to/broken_charset.patch'

Regards,
Germar

Changed in backintime:
status: In Progress → Fix Committed
Andreas Hohenegger (hohenegger) wrote :

The patch mostly solved my problem. I can now create snapshots to the remote server. Thanks! (I have not yet tested whether a new file with broken encoding will stay in the snapshot.) I get a new error which I have not seen before (but it may be unrelated to your patch, because the server system is relatively new). That is

INFO: Keep min free disk space: 1024 Mb
Traceback (most recent call last):
  File "/usr/share/backintime/common/backintime.py", line 437, in <module>
    start_app()
  File "/usr/share/backintime/common/backintime.py", line 192, in start_app
    ret = take_snapshot( cfg, True )
  File "/usr/share/backintime/common/backintime.py", line 53, in take_snapshot
    ret = snapshots.Snapshots( cfg ).take_snapshot( force )
  File "/usr/share/backintime/common/snapshots.py", line 939, in take_snapshot
    self._free_space( now )
  File "/usr/share/backintime/common/snapshots.py", line 1520, in _free_space
    info = os.statvfs( self.config.get_snapshots_path() )
OSError: [Errno 95] Operation not supported: '/root/.local/share/backintime/mnt/1_2906'
INFO: [qt4systrayicon] end loop

Germar (germar) wrote :

Hi Andreas,

please create a new bug report for this and add this information:
- remote system (distribution or embedded NAS?)
- remote sshd version
- local ssh version
- output of 'sudo stat /root/.local/share/backintime/mnt/1_*' while BackInTime is running
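
To check independently of Back In Time whether the mount point supports the call from the traceback, a small Python sketch like the following could be run as root while the snapshot folder is mounted. The function name free_space_mib and the MiB calculation are assumptions for this example, not necessarily how Back In Time computes free space.

```python
import os


def free_space_mib(path):
    """Return free space in MiB for path, or None if os.statvfs fails."""
    try:
        info = os.statvfs(path)
    except OSError as exc:
        # For example errno 95 (Operation not supported) on a mount whose
        # backend does not implement statvfs.
        print("statvfs failed for %s: %s" % (path, exc))
        return None
    # Fragment size times blocks available to unprivileged users.
    return info.f_frsize * info.f_bavail // (1024 * 1024)
```

Calling free_space_mib() with the mount point path from the error message would show whether the OSError comes from the mount itself rather than from Back In Time.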

Germar (germar)
Changed in backintime:
status: Fix Committed → Fix Released