SSH backup mode: option to launch smart removal "in the background"

Bug #1457210 reported by Mauro
Affects: Back In Time
Status: Fix Released
Importance: Wishlist
Assigned to: Germar
Milestone: 1.1.6

Bug Description

This is an enhancement idea.

I currently have a profile that backs up my files to a home NAS, connecting to it via SSH. I also have smart removal enabled.
What I'm seeing is that the smart removal phase takes a LOT of time (I have many files, I know), much more than the (incremental) backup itself.
The backup process does not actually end until that phase terminates, so my BIT icon stays in the system tray for a very long time, giving the impression that the backup takes ages to complete. This also dissuades me from closing my laptop until the backup has finished, because I fear I may leave the current backup as "unfinished".

Actually, if I understand how BIT works correctly, it probably doesn't hurt if I shut down the system before this phase terminates (the last snapshot is shown as complete by the BIT GUI, and if I browse the backup target I see that the last_snapshot symlink correctly points to a complete snapshot).
However, shutting down my client system will certainly stop the smart removal process, with the risk of having to start over on the next backup. In the end, the number of snapshots to delete may grow a lot if the process is very slow and the uptime of the client system is relatively short.

So, my idea is this: since we're talking about a backup to a remote system, which is most probably supposed to stay on 24/7, BIT could offer an option to launch the smart removal deletions in the background, so that:
- the backup process on my client system fully terminates
- the smart removal process actually continues on the NAS, without the need to keep my client system on in order to make it complete

The only challenge in making this work well is that if a new backup starts before the smart deletion process has ended on the NAS, the new backup should not launch a new smart removal job; otherwise it may interfere with the previous one (still running) and/or kill the NAS performance with many deletions running in parallel. If a smart way to handle this is found, I think this feature would be very useful and relatively simple to implement.

Revision history for this message
Germar (germar) wrote :

:D Mauro, do you have some kind of telepathic skills? :D

I just added this feature two days ago! But I haven't pushed it to trunk yet because I haven't solved the concurrent-processes problem yet.

Changed in backintime:
importance: Undecided → Wishlist
status: New → In Progress
assignee: nobody → Germar (germar)
milestone: none → 1.1.6
Revision history for this message
Mauro (mauromol) wrote :

:-D Eheh, if I had skills like that, I'd better use them with women! :-P

I don't know whether it's viable, but what about writing some sort of "pid" file in which you save the process ID of the smart removal job, and then scheduling a task so that this "pid" file is deleted only AFTER the deletion job has ended?
I think the challenge may be finding a way to do such a "deferred" deletion of that pid file.
Then, if a new backup starts, it should not perform any smart removal operation if that pid file exists AND the process ID it contains is still alive (and its command line matches some kind of pattern, such as including the backup target folder). In this way, you may miss some removals that should have been performed by the second backup, but you'll certainly do them on a subsequent backup schedule once the very first removal job has ended.
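
Roughly, the guard could be sketched like this (the pid file name, the paths and the removal command are purely illustrative, not anything BIT actually uses):

# hypothetical guard around the smart removal job
pidfile=/backup/target/smartremove.pid
if [ -e "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "previous smart remove still running, skipping this run"
else
    # the removal job deletes its own pid file once it has finished
    ( rm -rf /backup/target/20150101-000000-000; rm -f "$pidfile" ) &
    echo $! > "$pidfile"
fi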

Revision history for this message
Germar (germar) wrote :

:D

Jepp, that would be one way, but that's quite a lot of code which would all need to fit into one line, plus the 'find ... -exec cmd {} +; rm ...' stuff for each smart remove. This would easily blow past the maximum command length ("Error: argument list too long").

So for now I'm thinking about a blocking flock approach which starts one nohup command for every snapshot that should be removed. Each command would wait until it can acquire the flock, check whether the snapshot in question still exists, and remove it.
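
A simplified sketch of what such a per-snapshot command could look like (the paths are illustrative; the real commands BIT generates show up in the syslog excerpts further down):

nohup sh -c '(
    flock -x 9                                            # block until the exclusive lock is ours
    test -e /backup/target/20150101-000000-000 || exit 0  # snapshot already removed?
    rm -rf /backup/target/20150101-000000-000             # remove it
) 9>/backup/target/smartremove.lck' >/dev/null 2>&1 &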

Revision history for this message
Germar (germar) wrote :

I added this in our testing ppa (ppa:bit-team/testing). If you would like to check it out, please make sure the flock command is available on your NAS (there is no test for this yet). You'll find the option in the Auto-remove tab.

Revision history for this message
Mauro (mauromol) wrote :

I installed this yesterday and enabled the "background" option. The backup job it ran yesterday didn't seem to perform any removal operation, but that may just be because there was no removal rule to apply.
How can I check on the NAS whether a background removal operation is in progress? "ps aux | grep <something>"? Or are any log entries written anywhere under the NAS's /var/log?

By the way, flock is already available on my NAS.

Revision history for this message
Germar (germar) wrote :

If smart remove did remove a snapshot you will find a line "INFO: [smart remove] remove snapshots in background: ['20150517-174502-378', '20150422-174501-641']" in your LOCAL /var/log/syslog.

On the remote side there is no logging, but you should see one nohup process for each snapshot to be removed with 'ps ax | grep nohup'.

Revision history for this message
Mauro (mauromol) wrote :

Ok, so I found this in my local /var/log/syslog:
INFO: [smart remove] remove snapshots in background: ['20150426-220002-656', '20150426-210002-793', etc.

However, if I look at my backup target, those folders are still there. So I tried to launch one of the commands listed in the syslog after the above entry, i.e. "ssh -p 22 ... user@server nohup sh -c etc. &".
The command is immediately sent to the background, but nothing happens on the NAS.
So I brought the job back with fg and... I was asked to enter the passphrase for the RSA key!!!
This might be related to bug 1431305, but the strange thing is that my current session was started with the workaround I had applied so far to the backintime autostart desktop file (where I specified to allocate a terminal - that is, before applying the fix you suggested yesterday). This workaround has been working for me when scheduled backups start, so I can't understand why it doesn't work when running this command.

Anyway, let's forget about the previous problem for a moment. After I typed "fg" and entered my RSA key passphrase, what I see is that the command BLOCKS on the local system while the removal process runs on the NAS. However, this is not what I meant in my original report.
I mean, even if I didn't have to call "fg", if I now shut down my client system the removal process will end, because the command is running in the background, but in my CLIENT system's background.
What would be needed, instead, is a command that is launched on the NAS and continues to work in the NAS's background, while the call performed by the client system returns immediately. This way, the client system could be shut down with no problem while the job keeps running on the NAS.

Revision history for this message
Mauro (mauromol) wrote :

I'm not a Linux guru, but what I mean is something like running the command on the remote system with "&" followed by "disown", or something like that.
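
In other words, something along these lines (host and path are made up), where the job is detached on the NAS itself and the ssh call returns immediately:

# the '&' and the output redirection sit inside the quotes, so they apply in the
# REMOTE shell: the removal keeps running on the NAS even after the client disconnects
ssh user@nas 'nohup rm -rf /backup/target/old-snapshot >/dev/null 2>&1 &'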

Revision history for this message
Germar (germar) wrote :

Jepp, that's exactly how this should work :D

nohup is used to run processes in the background and to disconnect them from the tty so the ssh connection can exit immediately.

Do you have nohup on your NAS? I thought it was shipped with busybox, but it's not; it's from the coreutils package.

Revision history for this message
Mauro (mauromol) wrote :

Yes, the command nohup is available on the NAS.

Revision history for this message
Germar (germar) wrote :

I added some rudimentary logging for the scripts on the remote host (just do an update from the testing ppa). In your snapshot folder you'll find a file smartremove.lck, which is the actual lock (no content), and smartremove.lck.log, to which the scripts write all output and some useful information.

It should look like this:
1432770262.184404847 - aaf/backintime/ubu1404vm/germar/3/20150525-184351-961 - start
1432770262.190179989 - aaf/backintime/ubu1404vm/germar/3/20150525-184351-961 - got ex_lock
1432770262.251283480 - aaf/backintime/ubu1404vm/germar/3/20150525-184351-961 - finished
1432770262.256490987 - aaf/backintime/ubu1404vm/germar/3/20150522-223717-756 - start
1432770262.260122429 - aaf/backintime/ubu1404vm/germar/3/20150318-223546-818 - start
1432770262.261456539 - aaf/backintime/ubu1404vm/germar/3/20150318-203546-818 - start
1432770262.263580746 - aaf/backintime/ubu1404vm/germar/3/20150318-213546-818 - start
1432770262.268218706 - aaf/backintime/ubu1404vm/germar/3/20150522-223717-756 - got ex_lock
1432770262.339806289 - aaf/backintime/ubu1404vm/germar/3/20150522-223717-756 - finished
1432770262.340950276 - aaf/backintime/ubu1404vm/germar/3/20150318-193546-818 - start
1432770262.348621431 - aaf/backintime/ubu1404vm/germar/3/20150318-223546-818 - got ex_lock
1432770262.350096243 - aaf/backintime/ubu1404vm/germar/3/20150318-173546-818 - start
1432770262.350581102 - aaf/backintime/ubu1404vm/germar/3/20150318-163546-818 - start
1432770262.351450745 - aaf/backintime/ubu1404vm/germar/3/20150318-153546-818 - start
1432770262.353158645 - aaf/backintime/ubu1404vm/germar/3/20150318-183546-818 - start
1432770262.361490961 - aaf/backintime/ubu1404vm/germar/3/20150318-223546-818 - finished
1432770262.364445462 - aaf/backintime/ubu1404vm/germar/3/20150318-203546-818 - got ex_lock
1432770262.382475434 - aaf/backintime/ubu1404vm/germar/3/20150318-203546-818 - finished
1432770262.396404909 - aaf/backintime/ubu1404vm/germar/3/20150318-213546-818 - got ex_lock
1432770262.411775369 - aaf/backintime/ubu1404vm/germar/3/20150318-123546-818 - start
1432770262.412837751 - aaf/backintime/ubu1404vm/germar/3/20150318-113546-818 - start
1432770262.413240759 - aaf/backintime/ubu1404vm/germar/3/20150318-103546-818 - start
1432770262.413891987 - aaf/backintime/ubu1404vm/germar/3/20150318-143546-818 - start
1432770262.414614064 - aaf/backintime/ubu1404vm/germar/3/20150318-133546-818 - start
1432770262.419740549 - aaf/backintime/ubu1404vm/germar/3/20150318-213546-818 - finished
1432770262.421974034 - aaf/backintime/ubu1404vm/germar/3/20150318-193546-818 - got ex_lock
1432770262.440160546 - aaf/backintime/ubu1404vm/germar/3/20150318-193546-818 - finished
1432770262.442458605 - aaf/backintime/ubu1404vm/germar/3/20150318-173546-818 - got ex_lock
1432770262.457332326 - aaf/backintime/ubu1404vm/germar/3/20150318-173546-818 - finished
1432770262.459507940 - aaf/backintime/ubu1404vm/germar/3/20150318-163546-818 - got ex_lock
1432770262.467164434 - aaf/backintime/ubu1404vm/germar/3/20150318-163546-818 - finished
1432770262.469374510 - aaf/backintime/ubu1404vm/germar/3/20150318-153546-818 - got ex_lock
1432770262.476176089 - aaf/backintime/ubu1404vm/germar/3/20150318-153546-818 - finishe...


Revision history for this message
Mauro (mauromol) wrote :

Hi Germar,
here is the result of my test.

In /var/log/syslog of the client system:

May 29 23:11:23 hppb backintime (mauro): INFO: [smart remove] remove snapshots in background: ['20150429-220001-563', '20150426-210002-793', '20150426-180001-831', '20150426-170002-814', '20150426-160002-378', '20150426-150002-618', '20150426-130002-184', '20150426-120002-638', '20150327-000001-773']
May 29 23:11:23 hppb backintime (mauro): INFO: Command "ssh -p 22 -o ServerAliveInterval=240 -c arcfour <email address hidden> nohup sh -c ':; (echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563 - start"; flock -x 9; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563 - got ex_lock"; test -e /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563 || exit 0; find "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563" -type d -exec chmod u+wx "{}" \;; rm -rf "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563"; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150429-220001-563 - finished") 9>/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/smartremove.lck 2>&1 >>/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/smartremove.lck.log' &" returns 0
May 29 23:11:23 hppb backintime (mauro): INFO: Command "ssh -p 22 -o ServerAliveInterval=240 -c arcfour <email address hidden> nohup sh -c ':; (echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793 - start"; flock -x 9; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793 - got ex_lock"; test -e /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793 || exit 0; find "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793" -type d -exec chmod u+wx "{}" \;; rm -rf "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793"; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-210002-793 - finished") 9>/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/smartremove.lck 2>&1 >>/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/smartremove.lck.log' &" returns 0
May 29 23:11:23 hppb backintime (mauro): INFO: Command "ssh -p 22 -o ServerAliveInterval=240 -c arcfour <email address hidden> nohup sh -c ':; (echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831 - start"; flock -x 9; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831 - got ex_lock"; test -e /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831 || exit 0; find "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831" -type d -exec chmod u+wx "{}" \;; rm -rf "/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831"; echo "$(date +%s.%N) - /shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/20150426-180001-831 - finished") 9>/shares/BackupCustom/Dati/backintime/hppb/mauro/Documenti/smartremo...

Revision history for this message
Mauro (mauromol) wrote :

It remains true that, if I run one of the commands reported in /var/log/syslog and bring it back to the foreground with fg, I am prompted for the RSA key passphrase... So maybe this is the source of the problem?

Revision history for this message
Germar (germar) wrote :

No, the prompt for your passphrase is not the problem. That's only caused by a workaround I did in BIT for a bug in ssh-agent. BIT starts its own instance of ssh-agent and unlocks your private key there, so all commands executed by BIT will not ask for the passphrase. But that ssh-agent instance closes together with BIT, and you don't have access to it either. That's why you get asked for the passphrase when running the command manually.

But when you wrote that you had to use fg to bring it to the foreground, I realised that I didn't quote the command correctly. It's not the ssh command that should be sent to the background but the nohup command on the remote side. An update has been pushed to the ppa and should be ready in about an hour.
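
For illustration, the quoting difference is roughly the following (host and path are made up):

# '&' outside the quotes: only the local ssh call is backgrounded,
# the remote job stays tied to that ssh session
ssh user@nas nohup rm -rf /backup/target/old-snapshot &

# '&' inside the quotes: the remote shell backgrounds the job,
# so ssh can return immediately
ssh user@nas 'nohup rm -rf /backup/target/old-snapshot >/dev/null 2>&1 &'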

Thanks for testing!

Revision history for this message
Germar (germar) wrote :

Hmmm. I cancelled the build because this doesn't work either. I need to take a deeper look tomorrow.

Revision history for this message
Mauro (mauromol) wrote :

Hi Germar,
I see there's a 1.1.5.5 version: does it have something for this?

Revision history for this message
Germar (germar) wrote :

Jepp. This should work now. I ditched nohup because it didn't work the way I thought. I also added debug logging to syslog. Please make sure screen, logger, bash and flock are available on the remote host. I'll add checks for those later on.
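
A rough sketch of what a screen-based launch can look like (session name, paths and the logger message are illustrative, not BIT's actual command):

# screen -d -m starts a detached session on the NAS, so the ssh call returns at once;
# flock serialises the removals and logger leaves a trace in the remote syslog
ssh user@nas "screen -d -m -S smartremove bash -c 'flock -x /backup/target/smartremove.lck rm -rf /backup/target/old-snapshot; logger smart remove finished'"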

Revision history for this message
Mauro (mauromol) wrote :

Thank you Germar! I updated BIT on my client system.
On the NAS I had to install screen, because it was missing.

However, I'm currently in a situation where there is no snapshot for smart removal to delete, so I can't test whether it's working right... Anyway, I'll keep an eye on it and let you know how it goes.

Revision history for this message
Mauro (mauromol) wrote :

Hi Germar,
it seems like this is working perfectly now! Great job!! :-)

Revision history for this message
Germar (germar) wrote :

Thanks for testing!

Germar (germar)
Changed in backintime:
status: In Progress → Fix Committed
Germar (germar)
Changed in backintime:
status: Fix Committed → Fix Released