www.postcogito.org
          ...sine propero notiones

Kiko

The deltacp.pl program

deltacp copies a file (usually a block device) to another machine while striving to send only the differences between them.

Discussion List

See the deltacp discussion list.

License and Downloads

This program is licensed under the CC-GNU GPL 2.0.

Synopsis

  • Normal ("sender") mode:
    deltacp.pl options source destination

  • Patch ("receiver") mode:
    deltacp.pl options -O file

source is a local file, typically a block device.

destination is of the form user@host:file and points to the remote machine and destination file. If both user and host are omitted, it is simply a local file name.

Possible options are zero or more of the following:

-v Increase verbosity, repeat for more
-b value Set block size in bytes (default is 65536)
-L filename Set the name of the log file on the remote end. No log file is generated if unspecified
-O filename Puts the program in patch mode and specifies the name of the file to be patched.
The patching instructions must come from the standard input.
-d filename Save original blocks in the specified filename (i.e., creates a reverse diff)
-D filename Save patched blocks in the specified filename (i.e., creates a forward diff)
-x args Pass args to ssh (such as -C to enable compression or -o 'something')
-F Force: disable safety checks (such as writing to a mounted device)
-q Quiet mode: suppresses all output except error messages
-n Dry run: do it all except actually writing to the output file (but does generate diffs)
-c In patch mode, create the output file if it doesn't exist
-e Erase diff files if they turn out to be empty
-t Truncate the output file to be the exact same size as the input
-a Append to diff files
-s block Start at this block number
-r retries How many times to retry if the connection breaks. Default: 50
-p seconds How many seconds to wait between retries. Doubles (up to 300) each retry.

Description

Normal mode

This mode is invoked when you specify both the source and destination files. The program opens an ssh session to the remote host, logging in as the specified user, and runs another instance of itself there in patch mode. If you omit both host and user, the program operates locally and does not use ssh.

When invoking the other instance, either locally or over ssh, the sending end passes the -D, -d, -n, -e, -t, -F and -b options to the receiving end. It also adds -qc and, of course, -O.
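The following Python sketch shows one way the sender could assemble the receiver's command line from the rules above. It is a hypothetical reconstruction, not the program's Perl code: the function name build_receiver_command, the shape of the opts dictionary and the quoting strategy are assumptions; only the list of forwarded options, the added -qc and -O, and the local/ssh distinction come from the text.

```python
import shlex

# Options forwarded to the receiving instance, per the description above.
# Flags without an argument are given as True; -D, -d and -b carry a value.
FORWARDED = ["-D", "-d", "-n", "-e", "-t", "-F", "-b"]


def build_receiver_command(opts, destination, ssh_args=()):
    """Return the argv that would start the patch-mode instance.

    opts: hypothetical dict such as {"-d": "rdiff", "-n": True, "-b": 65536}
    destination: "user@host:file", or a plain path for local operation
    ssh_args: extra ssh arguments, as passed with -x
    """
    receiver = ["deltacp.pl"]
    for flag in FORWARDED:
        value = opts.get(flag)
        if value is True:
            receiver.append(flag)
        elif value is not None and value is not False:
            receiver += [flag, str(value)]

    if ":" in destination:
        target, path = destination.split(":", 1)
    else:
        target, path = None, destination  # no user@host: run locally
    receiver += ["-qc", "-O", path]

    if target is None:
        return receiver
    # ssh passes the command to the remote shell, so quote every word.
    return ["ssh", *ssh_args, target, shlex.join(receiver)]
```

For instance, build_receiver_command({"-d": "rdiff", "-b": 65536}, "user@remote:backup.img", ["-C"]) yields ['ssh', '-C', 'user@remote', 'deltacp.pl -d rdiff -b 65536 -qc -O backup.img'].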

The local instance reads the source file block by block, computes an MD5 hash of each block and sends the hashes to the remote instance. The remote instance performs the same hash calculation on the destination file and, whenever a hash differs, requests the full block. Upon receiving it, the remote instance patches the destination file with the received block.
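As an in-memory illustration of this block-by-block comparison, the Python sketch below hashes each source block and the matching destination block and rewrites only the blocks whose MD5 digests differ. It is not the program's code: the real sender and receiver run on different machines and exchange messages, while here both sides live in one process. The names iter_blocks and delta_sync are hypothetical.

```python
import hashlib


def iter_blocks(data, block_size):
    """Yield (block_number, block); the last block may be shorter."""
    for offset in range(0, len(data), block_size):
        yield offset // block_size, data[offset:offset + block_size]


def delta_sync(source, dest, block_size=65536):
    """Make the bytearray `dest` match `source`, rewriting only changed blocks.

    Returns (sent, skipped), like the progress statistics. A longer dest
    keeps its extra tail, since nothing here truncates it (cf. -t).
    """
    sent = skipped = 0
    for number, block in iter_blocks(source, block_size):
        offset = number * block_size
        theirs = dest[offset:offset + len(block)]
        # Each side hashes its own copy; only the digests are compared.
        if hashlib.md5(block).digest() == hashlib.md5(theirs).digest():
            skipped += 1
        else:
            # Blocks are visited in order, so dest is never shorter than
            # `offset` here and the slice assignment grows it when needed.
            dest[offset:offset + len(block)] = block
            sent += 1
    return sent, skipped
```

With source = b"A" * 10 + b"B" * 10 + b"C" * 5 and dest = bytearray(b"A" * 10 + b"X" * 10), delta_sync(source, dest, 10) returns (2, 1): the first block is skipped, the second is rewritten and the third is appended.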

If the -d option has been specified, the remote instance will save the data blocks before patching them, so the resulting file will be a "binary reverse diff" that can be used to undo the operation.

If the -D option has been specified, the remote instance will save the data blocks after patching them, so the resulting file will be a "binary forward diff" that can be used to redo the operation.
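To make the -d and -D semantics concrete, here is an in-memory Python sketch (hypothetical names, not the program's code) that records each block's old contents before overwriting it and its new contents afterwards. Diffs are modelled as lists of (block_number, data) pairs; the real on-disk diff format is not reproduced.

```python
def patch_blocks(target, blocks, block_size, reverse_diff=None, forward_diff=None):
    """Write (block_number, data) pairs into the bytearray `target`.

    reverse_diff, if given, collects each block's contents before it is
    overwritten (like -d); forward_diff collects the new contents (like -D).
    """
    for number, data in blocks:
        offset = number * block_size
        if reverse_diff is not None:
            reverse_diff.append((number, bytes(target[offset:offset + len(data)])))
        target[offset:offset + len(data)] = data
        if forward_diff is not None:
            forward_diff.append((number, bytes(data)))
```

Applying the reverse list restores the previous image and applying the forward list redoes the change. The sketch does not handle blocks that extend the file, whose saved "old contents" would be shorter than the new data.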

By default, the program doesn't write to diff files that already exist, to prevent users from accidentally overwriting important diffs. This can be overridden with -F (force).

You can also append to existing diff files by using -a. This is useful for continuing an interrupted session.

As the program runs, it prints progress statistics, including how many blocks have been transferred so far. If the file size is known in advance, it also prints the total number of blocks and the current block as a percentage of the total. Once more than 5 seconds have elapsed since the beginning of the transfer, an estimate of the remaining time is shown as well.

Other stats include how many blocks have actually been transferred in full and how many have been skipped because they were found to be identical on both machines.
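The program's exact estimator isn't documented here. The Python sketch below assumes a simple linear extrapolation from the blocks processed so far and renders it in a days plus hh:mm:ss field modelled on the '---d --:--:--' placeholder that appears in the example output. Both function names are hypothetical.

```python
def eta_seconds(elapsed, done, total):
    """Linear estimate of the remaining seconds, or None when not shown.

    Returns None if the total is unknown, nothing is done yet, or no more
    than 5 seconds have elapsed, following the rule described above.
    """
    if total is None or done == 0 or elapsed <= 5:
        return None
    return elapsed * (total - done) / done


def format_eta(seconds):
    """Render an estimate in a days + hh:mm:ss field like '---d --:--:--'."""
    if seconds is None:
        return "---d --:--:--"
    days, rest = divmod(int(round(seconds)), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days:3d}d {hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, 50 of 250 blocks done after 10 seconds gives eta_seconds(10, 50, 250) == 40.0.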

If the remote destination is an ordinary file larger than the original, its size is preserved: it will not be truncated to the size of the original unless you explicitly specify the -t option.

If the source file is larger than the destination file and the latter is a normal file, it will grow to the exact size of the source. Of course, most block devices can't grow (logical volumes can, though not automatically), so if the destination is a block device smaller than the original, you will get an error.

The program tries to prevent the user from doing dangerous things, such as writing to mounted devices or running several instances on the same files or devices.

If the connection breaks during a transfer, the program calls itself again to continue from where it stopped, pausing between retries (see -r and -p). This provides resilience against links that keep going down and up again, such as crappy broadband or dialup connections in hotels.
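The retry policy described by -r and -p (a pause that doubles on every retry, capped at 300 seconds) can be sketched in Python as follows. The name run_with_retries is hypothetical, and the attempt callable stands in for the program re-invoking itself to resume where it stopped. The initial pause of 10 seconds is a placeholder, since the default for -p is not stated.

```python
import time


def run_with_retries(attempt, retries=50, pause=10, sleep=time.sleep):
    """Call attempt() until it returns True, retrying at most `retries` times.

    The wait starts at `pause` seconds and doubles on every retry, capped
    at 300 seconds. pause=10 is a placeholder, not the real -p default.
    """
    delay = pause
    if attempt():
        return True
    for _ in range(retries):
        sleep(delay)
        delay = min(delay * 2, 300)
        if attempt():
            return True
    return False
```

Passing a custom sleep function makes the schedule easy to inspect: with pause=10, seven failed retries wait 10, 20, 40, 80, 160, 300 and 300 seconds.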

Patching mode

This mode is automatically invoked by the sending instance, but it can also be used by hand to apply forward or reverse diff files.

What defines the patching mode is the presence of the -O option, which specifies the name of the file to be patched. In this mode, the sequence of block commands comes from the standard input.

The -d and -D options can be used to save the blocks before and after patching, respectively, so as to generate diff files. If you specify '-' as the diff filename, a name including the current date will be generated automatically.

If -e is also specified, diff files that turn out to be empty are deleted.

If you specify -n, the file specified in -O will not be modified. This is useful when you just want to generate the diffs without actually changing the image file.

The -t option forces the file specified in -O to be the same length as the original. This will work even if you say -n.

The -c option will create the file specified in -O if it does not exist already.

Examples

In the examples below, what you type follows the prompts (local$, local#, remote$, remote#); the rest is program output.

The examples require that your kernel support the loopback device so we can pretend a file is a block device. If your kernel doesn't, you can replace the *.img files with the block devices of real spare hard disks, although things will probably take a lot longer because your disks will almost certainly be much larger than the 16MB we use below.

Simple Volume Backup

Create a 16MB zero-filled file to serve as our playground ('local$ ' is the non-root prompt on the first machine; don't type it):

local$ dd if=/dev/zero of=example.img bs=16k count=1000
1000+0 records in
1000+0 records out

Do likewise in the remote machine. This space will be our "remote backup volume":

remote$ dd if=/dev/zero of=backup.img bs=16k count=1000
1000+0 records in
1000+0 records out
16384000 bytes (16 MB) copied, 0,584972 seconds, 28,0 MB/s

Create an ext2 filesystem on it (your mkfs.ext2 output may be a little different):

local$ yes | /sbin/mkfs.ext2 example.img
mke2fs 1.35 (28-Feb-2004)
example.img is not a block special device.
Proceed anyway? (y,n) Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
4000 inodes, 16000 blocks
800 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=16515072
2 block groups
8192 blocks per group, 8192 fragments per group
2000 inodes per group
Superblock backups stored on blocks:
        8193

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mount it using the loopback device (notice the '#' prompt meaning you need root for this):

local$ /bin/su
Password: (your root password)
local# mkdir /mnt/example
local# mount -t ext2 -o loop example.img /mnt/example

Put some stuff in it:

local# cp /bin/b* /mnt/example
local# ls -l /mnt/example
total 733
-rwxr-xr-x  1 root root  15444 Feb 23 12:02 basename
-rwxr-xr-x  1 root root 616248 Feb 23 12:02 bash
-rwxr-xr-x  1 root root  98356 Feb 23 12:02 bsh
drwx------  2 root root  12288 Feb 23 12:01 lost+found

In my example there are only three files because I generated it on a fresh, minimally installed machine. Yours will probably have more.

Unmount it so that the image is synced and consistent:

local# umount /mnt/example
local# exit

Now let's diffsync them for the first time using deltacp.pl:

local$ deltacp.pl example.img user@remote:backup.img
  current/total   current% | skips%     skips/sent
      250/250       100.0% |  92.8%        18/232       ---d --:--:--

Only 18 blocks out of 250 were actually transferred; we avoided transmitting 92.8% of the 250 blocks (that is, 232 blocks). Your exact values will most likely be different, of course.

Let's make sure that the two files are the same. On the local machine, run:

local$ md5sum example.img
6d7adfd8306c35ff6322a8060e0417f7  example.img

And on the remote machine:

remote$ md5sum backup.img
6d7adfd8306c35ff6322a8060e0417f7  backup.img

The hashes match, so the files are identical. (The exact value of your hashes will be different, of course).

Recovering overwritten files

Let's simulate a common disaster, overwriting an important file, and then use the reverse diff feature to restore the filesystem to a previous state so we can access the original file as it was before being overwritten. For the sake of illustration, this example uses bash and tar, but imagine that bash is something very important to you, such as the final report due tomorrow that you overwrote by mistake after too many sleepless nights.

We will reuse the filesystem we created in the previous example. First we remount it, then we overwrite the super-important bash executable and finally we unmount the filesystem.

local$ /bin/su
Password: (your root password)
local# mount -t ext2 -o loop example.img /mnt/example
local# cp /bin/tar /mnt/example/bash
cp: overwrite `/mnt/example/bash'? y
local# umount /mnt/example
local# exit

Now we use deltacp.pl to diffsync it. This time, however, we will be creating a timestamped reverse diff:

local$ deltacp.pl -d backup.img-diff-`date +%y%m%d-%H%M%S` example.img user@remote:backup.img
  current/total   current% | skips%     skips/sent
      250/250       100.0% |  98.0%         5/245       ---d --:--:--

Now let's see what we have on the remote machine:

remote$ ls -l backup.img*
-rw-r--r-- 1 kiko kiko 16384000 2007-02-23 12:38 backup.img
-rw-r--r-- 1 kiko kiko   327805 2007-02-23 12:38 backup.img-diff-070223-113928

We have both the updated image and the reverse diff.
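The size of the reverse diff is worth a quick check. Assuming, as an inference from the numbers rather than something this page documents, that each saved block is stored as a 25-byte block message (the format described under Protocol) followed by the 64KB block itself, the 5 changed blocks account for exactly the size shown by ls -l:

```python
HEADER = 25          # block message size, see Protocol below
BLOCK_SIZE = 65536   # default -b value
BLOCKS_SENT = 5      # from the status line above (5 sent, 245 skipped)

diff_size = BLOCKS_SENT * (HEADER + BLOCK_SIZE)
print(diff_size)     # 327805, matching the ls -l listing
```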

Let's mount the backup image on the remote machine in read-only mode and examine our 'bash' program:

remote$ /bin/su
Password: (your root password)
remote# mkdir /mnt/backup
remote# mount -t ext2 -o ro,loop backup.img /mnt/backup
remote# ls -l /mnt/backup
total 286
-rwxr-xr-x 1 root root  15444 2007-02-23 11:27 basename
-rwxr-xr-x 1 root root 161188 2007-02-23 11:36 bash
-rwxr-xr-x 1 root root  98356 2007-02-23 11:27 bsh
drwx------ 2 root root  12288 2007-02-23 11:27 lost+found
remote# /mnt/backup/bash --version
tar (GNU tar) 1.14
Copyright (C) 2004 Free Software Foundation, Inc.
This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute it under the terms of the GNU General Public License;
see the file named COPYING for details.
Written by John Gilmore and Jay Fenlason.

Oops! bash isn't bash, it's tar! Of course, we already knew that. In a real disaster, though, you would probably be surprised or even shocked to discover that your precious file had been clobbered, which leaves undeletion tools with almost no chance of success.

Let's unmount the filesystem, use the reverse diff to make the filesystem 'go back in time', remount it and see what we get:

remote# umount /mnt/backup
remote# exit
remote$ deltacp.pl -O backup.img < backup.img-diff-070223-113928
remote$ /bin/su
Password: (your root password)
remote# mount -t ext2 -o loop backup.img /mnt/backup
remote# /mnt/backup/bash --version
GNU bash, version 3.00.15(1)-release (i386-redhat-linux-gnu)
Copyright (C) 2004 Free Software Foundation, Inc.
remote# umount /mnt/backup

Now bash is bash again.

Coalescence

Suppose we have lots of reverse diffs generated by our daily backup routine and we want to go back a few days in time to recover some deleted stuff. Let's first take a look at them and mark the current state of our image.

local$ ls -l rdiff-07022*
-rw-r--r--   1 kiko kiko    6490539 2007-02-20 01:29 rdiff-070220
-rw-r--r--   1 kiko kiko   46220530 2007-02-21 23:37 rdiff-070221
-rw-r--r--   1 kiko kiko  102275185 2007-02-22 22:20 rdiff-070222
local$ md5sum /dev/sda3
99118e13e6e535bd4c3f5e0df8fbe2ba  /dev/sda3

Now let's apply them in reverse order (newest first), at the same time generating a coalesced reverse diff so we can bring the image back to the present afterwards:

local$ cat `ls rdiff-07022* | sort -r` | deltacp.pl -d rdiff-coalesced -O /dev/sda3

The coalesced diff is smaller than the sum of the individual diffs because the program saves each block's original contents only the first time that block is changed:

local$ ls -l rdiff-coalesced
-rw-r--r--   1 kiko kiko  118009825 2007-02-23 09:11 rdiff-coalesced
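The first-change-only rule can be sketched in a few lines of Python. The function name and the in-memory representation of diffs as lists of (block_number, data) pairs are hypothetical; the reverse diffs are applied in the order given (newest first), and a block's original contents are recorded only the first time any diff touches it.

```python
def coalesce_reverse(diffs, block_size, target):
    """Apply each diff in order to the bytearray `target` and return one
    coalesced reverse diff that undoes all of them at once.

    A block's pre-existing contents are saved only the first time the block
    is touched, so the result holds at most one copy of each block.
    """
    saved = {}
    for diff in diffs:
        for number, data in diff:
            offset = number * block_size
            if number not in saved:
                saved[number] = bytes(target[offset:offset + len(data)])
            target[offset:offset + len(data)] = data
    return sorted(saved.items())
```

In a small run where block 1 is changed by two successive reverse diffs, the coalesced result still holds a single copy of it, which is why the coalesced file can be smaller than the sum of its inputs.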

At this point we could mount the filesystem on /dev/sda3, recover our stuff and so on. When we're done, we unmount it and restore it to the previous state using the coalesced diff:

local$ deltacp.pl -O /dev/sda3 < rdiff-coalesced
local$ md5sum /dev/sda3
99118e13e6e535bd4c3f5e0df8fbe2ba  /dev/sda3
local$ rm rdiff-coalesced

Motivation

rsync is a full-featured differential backup/mirroring tool that does something similar and a lot more, but it will neither read from nor write to block devices: you get a 'skipping non-regular file' message if you try.

Browsing the rsync mailing list archives I learned that this issue pops up every now and then. A patch for that has been proposed but it's for an old 2.5.5 version lacking in many of the latest features and security updates. The consensus, however, seems to be that this feature is best left to a standalone utility and that rsync won't support it.

Protocol

The file is divided into equal-sized blocks; only the last one may be shorter, when the file size isn't an exact multiple of the block size.

For each block, the sender transmits a 25-byte message to the receiver:

  • 1 byte command
  • 4-byte block number as a 32-bit long integer
  • 4-byte block length as a 32-bit long integer
  • 16-byte MD5 hash of the block

The receiver computes the hash of the specified block of its own file. If the computed hash matches the received one, it responds with 0x00, meaning "don't send me the block". Otherwise, it responds with 0x01, meaning "do send this block".

If the sender gets 0x01, it sends the whole block next. Otherwise, it sends the 25-byte message describing the next block.
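A Python sketch of the message layout and the stop-and-wait exchange follows. The field sizes and the 0x00/0x01 replies come from the description above; the big-endian byte order and the command value 1 are assumptions, since the page specifies neither. The function names are hypothetical.

```python
import hashlib
import struct

# 1-byte command + 4-byte block number + 4-byte length + 16-byte MD5 = 25 bytes.
# ASSUMPTION: big-endian integers and the command value used below; the
# description above fixes neither the byte order nor the command codes.
MESSAGE = struct.Struct(">BII16s")
SKIP, SEND = b"\x00", b"\x01"


def encode_message(command, number, block):
    """Sender side: describe one block without sending its data."""
    return MESSAGE.pack(command, number, len(block), hashlib.md5(block).digest())


def reply_for(message, local_block):
    """Receiver side: 0x00 if our block hashes the same, else 0x01."""
    _command, _number, _length, digest = MESSAGE.unpack(message)
    return SKIP if hashlib.md5(local_block).digest() == digest else SEND


def bytes_sender_to_receiver(source_blocks, dest_blocks, command=1):
    """Stop-and-wait: one message, one reply, then the block only if asked.
    Counts only the bytes flowing from sender to receiver."""
    total = 0
    for number, block in enumerate(source_blocks):
        message = encode_message(command, number, block)
        total += len(message)
        theirs = dest_blocks[number] if number < len(dest_blocks) else b""
        if reply_for(message, theirs) == SEND:
            total += len(block)
    return total
```

Each block waits for its reply before the next message goes out, which is the stop-and-wait behaviour whose latency cost is discussed further below.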

Overhead

If the source and destination files are completely different (the worst case), the program adds, on top of the data itself, an overhead of exactly

25*int((file_size+block_size-1)/block_size)

bytes. This amounts to about 15MB of overhead for every 40GB of data with the default 64KB block size, or about 0.038%.

This also means that about 15MB per 40GB will be transferred even if nothing has changed.
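The overhead formula above translates directly into Python; the ceiling division is the integer form of int((file_size+block_size-1)/block_size). Like the formula, this sketch counts only the 25-byte messages from sender to receiver, not the one-byte replies. The function name is illustrative.

```python
def protocol_overhead(file_size, block_size=65536):
    """Bytes of 25-byte block messages sent for a file of file_size bytes."""
    blocks = (file_size + block_size - 1) // block_size  # ceiling division
    return 25 * blocks


size = 40 * 2**30                        # 40GB
overhead = protocol_overhead(size)
print(overhead)                          # 16384000 bytes, about 15.6MB
print(f"{100 * overhead / size:.3f}%")   # 0.038%
```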

Dynamic Characteristics

Because it is a stop-and-wait protocol, its efficiency is dominated by latency: it works best in local area networks, where latency is usually below a millisecond.

On the other hand, the large default block size, much larger than a single TCP packet, means only one round trip per 64KB of data, which amortizes the latency cost on Internet or VPN connections with round-trip times in the hundreds of milliseconds.
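A back-of-the-envelope sketch of why block size matters here: since every block costs at least one round trip, latency alone caps the rate at which unchanged data can be compared at block_size / RTT, ignoring hashing time and bandwidth. The function name and the sample round-trip times are illustrative assumptions.

```python
def max_compare_rate(block_size, rtt_us):
    """Upper bound, in bytes/second, on how fast unchanged blocks can be
    compared when each block costs one round trip of rtt_us microseconds.
    Hashing time and link bandwidth are ignored."""
    return block_size * 1_000_000 // rtt_us


for rtt_us in (500, 100_000):          # a LAN and a long-haul link
    for block_size in (4096, 65536):
        rate = max_compare_rate(block_size, rtt_us)
        print(f"RTT {rtt_us:>7,d} us, {block_size:>6,d}-byte blocks: {rate:>13,d} B/s")
```

With these figures, 64KB blocks at a 100 ms round trip allow at most 655,360 bytes/s of unchanged data to be compared, against 40,960 bytes/s with 4KB blocks, a sixteen-fold difference.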

When doing online remote offsite backups, use a traffic shaper to keep bandwidth usage in check.

Prerequisites and Installation

The program requires Perl 5, preferably with large file support.

To check whether your Perl installation satisfies this requirement, type:

perl -V | grep -ic USE_LARGE_FILES

If you get 1, your Perl is almost certainly fine. If you get 0, the program will still work, but the maximum file size it can handle is only 2GB, which limits its usefulness.

By default the program uses SSH: if you can ssh into the remote user's account on the remote machine, so can the program.

The program must also be installed on the remote machine, in a directory on the remote user's default search path.

To install, unzip it to /usr/local/bin and set its permissions to 755. This must be done on both the sender and the receiver. Make sure this directory is listed in the remote user's PATH environment variable.

The content of this site is made available under the terms of a Creative Commons License, except where otherwise noted.