
Recovering from an interrupted rdiff-backup

posted May 22, 2013, 4:53 AM by Warren Howard

I use rdiff-backup to script nightly backups for the servers I set up. It is a good tool that does its job well, except when a backup has been interrupted midway. Across networks this happens occasionally, and when it does, rdiff-backup will not run normally again without administrator intervention.

Recovery steps in detail

I'm using rdiff-backup 1.1.5 and Python 2.4.4. The error I get looks like this:

Exception '' raised of class 'exceptions.AssertionError':
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 295, in error_check_Main
    try: Main(arglist)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 315, in Main
    take_action(rps)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 273, in take_action
    elif action == "check-destination-dir": CheckDest(rps[0])
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 774, in CheckDest
    need_check = checkdest_need_check(dest_rp)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 810, in checkdest_need_check
    if not force: curmir_incs[0].conn.regress.check_pids(curmir_incs)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 448, in __call__
    return apply(self.connection.reval, (self.name,) + args)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 367, in reval
    for arg in args: self._put(arg, req_num)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 139, in _put
    else: self._putobj(obj, req_num)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 144, in _putobj
    self._write("o", pickle.dumps(obj, 1), req_num)
  File "pickle.py", line 1386, in dumps
    Pickler(file, protocol, bin).dump(obj)
  File "pickle.py", line 231, in dump
    self.save(obj)
  File "pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "pickle.py", line 614, in save_list
    self._batch_appends(iter(obj))
  File "pickle.py", line 647, in _batch_appends
    save(x)
  File "pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "pickle.py", line 737, in save_inst
    stuff = getstate()
  File "/var/lib/python-support/python2.4/rdiff_backup/rpath.py", line 754, in __getstate__
    assert self.conn is Globals.local_connection

Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in ?
    rdiff_backup.Main.error_check_Main(sys.argv[1:])
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 295, in error_check_Main
    try: Main(arglist)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 315, in Main
    take_action(rps)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 273, in take_action
    elif action == "check-destination-dir": CheckDest(rps[0])
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 774, in CheckDest
    need_check = checkdest_need_check(dest_rp)
  File "/var/lib/python-support/python2.4/rdiff_backup/Main.py", line 810, in checkdest_need_check
    if not force: curmir_incs[0].conn.regress.check_pids(curmir_incs)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 448, in __call__
    return apply(self.connection.reval, (self.name,) + args)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 367, in reval
    for arg in args: self._put(arg, req_num)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 139, in _put
    else: self._putobj(obj, req_num)
  File "/var/lib/python-support/python2.4/rdiff_backup/connection.py", line 144, in _putobj
    self._write("o", pickle.dumps(obj, 1), req_num)
  File "/usr/lib/python2.4/pickle.py", line 1386, in dumps
    Pickler(file, protocol, bin).dump(obj)
  File "/usr/lib/python2.4/pickle.py", line 231, in dump
    self.save(obj)
  File "/usr/lib/python2.4/pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.4/pickle.py", line 614, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.4/pickle.py", line 647, in _batch_appends
    save(x)
  File "/usr/lib/python2.4/pickle.py", line 293, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.4/pickle.py", line 737, in save_inst
    stuff = getstate()
  File "/var/lib/python-support/python2.4/rdiff_backup/rpath.py", line 754, in __getstate__
    assert self.conn is Globals.local_connection
AssertionError
Fatal Error: Lost connection to the remote system

My first attempt at fixing this problem was to use the --check-destination-dir option, since the rdiff-backup man page claims that “running rdiff-backup with this option on the destination dir will undo the failed directory”:

# /usr/bin/rdiff-backup --check-destination-dir root@remote.backup.host::/path/to/backup

But this didn't work; it produced an error similar to (or maybe even the same as) the one above. Next I decided to try the “force” option, since I had read reports from others who had success fixing rdiff-backup errors with it. According to the rdiff-backup man page, the force option will “Authorize a more drastic modification of a directory than usual”.

# /usr/bin/rdiff-backup --force --check-destination-dir root@remote.backup.host::/path/to/backup
# echo $?
0

OK, that completed successfully. For completeness, list the available backups:

# /usr/bin/rdiff-backup -l root@remote.backup.host::/path/to/backup

Found 239 increments:
    increments.2007-10-07T02:11:45-07:00.dir   Sun Oct  7 02:11:45 2007
    increments.2007-10-18T09:55:29-07:00.dir   Thu Oct 18 09:55:29 2007
    increments.2007-10-18T10:56:32-07:00.dir   Thu Oct 18 10:56:32 2007
.
.
.
    increments.2008-06-09T19:00:02-07:00.dir   Mon Jun  9 19:00:02 2008
    increments.2008-06-10T19:00:03-07:00.dir   Tue Jun 10 19:00:03 2008
    increments.2008-06-11T19:00:03-07:00.dir   Wed Jun 11 19:00:03 2008
Current mirror: Thu Jun 12 19:00:03 2008
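
In a script, the same check can be automated by testing the listing for its closing "Current mirror:" line. This is a minimal sketch that assumes the output format shown above; has_current_mirror is a hypothetical helper name, not something rdiff-backup provides.

```shell
#!/bin/sh
# Hypothetical helper: reads an "rdiff-backup -l" listing on stdin and
# succeeds only if it contains a "Current mirror:" line, as in the
# output shown in this post.
has_current_mirror() {
    grep -q '^Current mirror: '
}

# Example use (the destination is a placeholder):
# /usr/bin/rdiff-backup -l root@remote.backup.host::/path/to/backup \
#     | has_current_mirror && echo "backup location is readable"
```

If the listing fails the way the backup did, no "Current mirror:" line is printed and the helper returns non-zero.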

To summarize, the error message was long and bewildering, but the only thing that had changed since the last successful backup was that a normal backup had been interrupted. The backup location had been left in a corrupted state, and it took the “force” option to make it usable again.
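
For a nightly script, the recovery could be folded into a wrapper that repairs the destination and retries once. The following is only a sketch under assumptions: RDIFF, SRC, DEST, and backup_with_recovery are placeholder names, and it treats every failed run as an interrupted backup, which will not always be true. Because “force” authorizes drastic changes, you may prefer to keep that step manual.

```shell
#!/bin/sh
# Sketch: run the backup, and if it fails, repair the destination with
# --force --check-destination-dir (the command used in this post) and retry once.
# RDIFF, SRC and DEST are hypothetical placeholders; adjust them to your setup.
RDIFF="${RDIFF:-/usr/bin/rdiff-backup}"
SRC="${SRC:-/path/to/source}"
DEST="${DEST:-root@remote.backup.host::/path/to/backup}"

backup_with_recovery() {
    # Normal nightly run first.
    if "$RDIFF" "$SRC" "$DEST"; then
        return 0
    fi
    echo "backup failed; trying a forced destination check" >&2

    # Assumption: the failure was caused by an interrupted earlier run.
    if ! "$RDIFF" --force --check-destination-dir "$DEST"; then
        echo "forced check failed; manual intervention needed" >&2
        return 1
    fi

    # Retry the backup once; its exit status becomes the function's status.
    "$RDIFF" "$SRC" "$DEST"
}
```

Calling backup_with_recovery from cron and alerting on a non-zero exit status keeps the administrator informed when the automatic repair is not enough.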

Original post written and posted on Tue, 2008/10/21 - 10:38