City College of San Francisco - CS260A
Unix/Linux System Administration

Module: Backups
rsync

rsync is a simple, fast way to synchronize a set of files and directories between two systems. It allows you to easily have a mirror of your work area in another location which you can synchronize with a simple command that can be automated using cron(1).

In our discussion, the system that you are actively working on (the local system) is the rsync client. This is the system you run rsync on. The system that holds the backup copy (the remote system) is the rsync server. To back up your current work, you run rsync (on the client) and send some data to the server with a request to write it to your backup repository. If you need to retrieve your backup (because you messed up the current version), you would use rsync to request to read the backup from the server. You can also use rsync to manage a repository of files for download.
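As a minimal sketch of the two directions, the only difference between backing up and restoring is which side of the command names the remote host. The function names backup and restore are made up for illustration; the directory asmt04 and the host hills.ccsf.edu are borrowed from the examples later on this page.

```shell
#!/bin/sh
# Hypothetical helpers: backup DIR HOST and restore DIR HOST.

# Client -> server: write DIR into our home directory on HOST.
backup() {
    rsync -rav "$1" "$2:"
}

# Server -> client: read DIR from HOST back into the current directory.
restore() {
    rsync -rav "$2:$1" .
}
```

For example, backup asmt04 hills.ccsf.edu would run rsync -rav asmt04 hills.ccsf.edu: and restore asmt04 hills.ccsf.edu would run rsync -rav hills.ccsf.edu:asmt04 .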

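rsync is a better solution than scp for transferring data to/from a remote repository for several reasons: it transfers only what has changed, which saves time and bandwidth; it can delete files on the server that no longer exist on the client; and it is straightforward to automate.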

Let's look at a simple example before we continue:

The CS260A online notes consist of 2.5MB of html and jpeg files: 153 files and 17 directories. I have just updated a handful of them, and I don't remember which ones. I could:
  1. use scp to copy the entire tree to my remote repository OR
  2. use rsync to synchronize the tree with my remote repository.

The first solution, of course, transferred 2.5MB and took 67 seconds. The second solution transferred 32 kB and took 3.5 seconds. You can multiply this out to a larger tree to get an idea of the savings. (Besides this, the rsync command can delete files on the server that no longer exist on the client.) The command was not much more complex than an scp command:

rsync -ravz * fog.ccsf.edu:public_html/cs260a/online

This light bandwidth makes it so you can even rsync your entire home directory - only the parts that change will be transferred. Pretty nice.

Besides this speed and bandwidth savings, it is fairly straightforward to set up rsync to do nightly transfers. This way you can go home and be assured that your remote repository matches your local one at the beginning of each day.
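A nightly transfer could be sketched roughly as follows. The function name nightly_sync, the script path, the log file and the 2:30 schedule are all assumptions chosen for illustration. Because cron runs unattended, this only works if the ssh connection can authenticate without prompting, which is the subject of the next section.

```shell
#!/bin/sh
# Hypothetical helper: nightly_sync SRC DEST LOG
# Runs one rsync pass and appends a timestamp plus rsync's output to LOG.
nightly_sync() {
    {
        date
        rsync -raz "$1" "$2"
    } >> "$3" 2>&1
}

# A script containing the call below could be scheduled with a crontab
# line such as this one (edit your crontab with: crontab -e):
#   30 2 * * * $HOME/bin/nightly-sync.sh
#
# nightly_sync "$HOME/asmts/asmt04" hills.ccsf.edu: "$HOME/rsync-nightly.log"
```

Keeping the output in a log file gives you something to check in the morning if a transfer failed overnight.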

Modes of operation

There are two ways to run rsync. Both involve running rsync on the client, albeit with slightly different syntax. They do, however, differ significantly in the server-side implementation. The two ways are differentiated on the command-line by whether the remote host is followed by a single or double colon.

The first way, which is simpler, uses ssh as the transport agent. Easy. The only downside here is automating nightly transfers: it would require using a passphrase-less key pair for the connection. As we have discussed, this is not the best idea - having a passphrase-less key pair lying around makes your remote account vulnerable if you leave yourself logged in, if you have a root user willing to masquerade as you, or if your private key is stolen. You can, however, limit this exposure by restricting the key so that it is accepted only from a given IP address, cannot be used for port forwarding, and cannot be given a terminal.

This would require prepending the following string to the public key (that corresponds to your passphrase-less private key) in the .ssh/authorized_keys on the server:

from="ipaddr",no-port-forwarding,no-pty

Note that this still leaves your account open: although a masquerading user cannot start an interactive shell (due to the no-pty option) they can still issue a single command.

You can take this a step further: OpenSSH allows you to specify the command that is run on the server when the connection is established. You can create your own shell script, then add the option command="pathtomyshellscript". In your shell script you can examine the command the client asked to run, which is saved for you in the environment variable SSH_ORIGINAL_COMMAND. After examining it, you can decide whether to run the command or not.
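A minimal sketch of such a script might look like this. It assumes that an rsync client using ssh asks the server to run a command beginning with "rsync --server", which is how rsync normally starts its remote half. The function name is_rsync_request and the exact pattern accepted are illustrative assumptions, not a vetted security policy.

```shell
#!/bin/sh
# Hypothetical forced-command wrapper; the name is_rsync_request and the
# accepted pattern are assumptions for illustration, not an audited policy.

# Succeed only if the requested command looks like the server side of an
# rsync-over-ssh transfer and contains no shell metacharacters.
is_rsync_request() {
    case "$1" in
        *";"*|*"&"*|*"|"*|*'`'*|*'$'*|*"<"*|*">"*) return 1 ;;
        "rsync --server "*) return 0 ;;
        *) return 1 ;;
    esac
}

# sshd sets SSH_ORIGINAL_COMMAND when a forced command is in effect and the
# client requested a command; do nothing if it is unset or empty.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    if is_rsync_request "$SSH_ORIGINAL_COMMAND"; then
        set -f                        # no filename expansion on the arguments
        exec $SSH_ORIGINAL_COMMAND    # deliberately unquoted: split into words
    else
        echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
        exit 1
    fi
fi
```

With this in place, a request to run anything other than the rsync server command is refused. A client that asks for no command at all (an interactive login attempt) simply has its connection closed when the script ends.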

The problem with using ssh and a passphrase-less key is that if someone co-opts your account you are really giving them the store. Since you only want to transfer files, it would be nice to have a way to limit any damage to just file transfer. In order to do this easily, you will have to use the second way to run rsync - by configuring an rsync server. Unfortunately, configuring an rsync server has become complicated in recent releases. Because of this, its use will be deferred to the later advanced class.

Synchronizing directories

You can synchronize the backup directory on the server with the one on the client by adding the --delete option to the rsync command. Then any files or directories that have been deleted on the client since the last rsync will also be deleted on the server. Note that this is a dangerous option if you are not careful when specifying your rsync path! If your client path is incorrect (and does not point to the root of the tree you are synchronizing) you will delete your entire rsync copy on the server!
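One way to reduce this risk is to preview the deletions first. The sketch below uses rsync's --dry-run option, which reports what would be transferred and deleted without changing anything on the server. The helper name preview_delete_sync and its sanity check on the source directory are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: preview_delete_sync SRC DEST
# Shows what "rsync --delete" would do without modifying the server.
preview_delete_sync() {
    src=$1
    dest=$2
    # A missing or empty source is the classic way to wipe out the backup,
    # so refuse to go any further in that case.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "refusing to sync: '$src' is missing or empty" >&2
        return 1
    fi
    # --dry-run lists the files that would be sent and the "deleting ..."
    # lines, but performs no transfer and no deletion.
    rsync -rav --delete --dry-run "$src" "$dest"
}
```

After reading the "deleting" lines in the preview and confirming they are what you expect, you would repeat the same rsync command without --dry-run.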

Examples

Let's take a very simple example. This example is drawn from a current assignment - the sysmond program. This program is written on linux, and we will use rsync to keep a backup copy of all the files on our student server, hills. Note that there is not much data here, so scp would be a simple option - but we will keep track of the data transferred and show the smaller bandwidth requirements of rsync. You can generalize to a larger example.

The area consists of an asmt04 directory with a handful of files. Here is what it looks like on linux (the rsync client):

[gboyd@sideshowmel asmt04]$ ls -Rl
.:
total 44
-rwxr-xr-x. 1 gboyd users 2852 Aug 28 14:50 sysmond
-rw-r--r--. 1 gboyd users  247 Aug 28 18:30 sysmond.conf
-rw-r--r--. 1 gboyd users  404 Aug 28 14:34 sysmond-config
-rw-r--r--. 1 gboyd users 5571 Aug 28 14:34 sysmond.old
-rw-r--r--. 1 gboyd users 3851 Aug 28 14:34 sysmond.skel
[gboyd@sideshowmel asmt04]$

Then we start with the first backup of the entire area to hills. We will rsync the entire asmt04 directory to hills, placing it in our home directory:

[gboyd@sideshowmel asmt04]$  cd ..
[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond
asmt04/sysmond-config
asmt04/sysmond.conf
asmt04/sysmond.old
asmt04/sysmond.skel

sent 13286 bytes  received 111 bytes  26794.00 bytes/sec
total size is 12925  speedup is 0.96
[gboyd@sideshowmel asmts]$

Back on hills, we list the asmt04 directory and its contents. Note that the dates have been reset on hills (the server) to conform to the dates on the original (the client).

gboyd@hills[~]$ ls -Rl asmt04
total 30
-rwxr-xr-x   1 gboyd      cisdept       2852 Aug 28 14:50 sysmond
-rw-r--r--   1 gboyd      cisdept        404 Aug 28 14:34 sysmond-config
-rw-r--r--   1 gboyd      cisdept        247 Aug 28 18:30 sysmond.conf
-rw-r--r--   1 gboyd      cisdept       5571 Aug 28 14:34 sysmond.old
-rw-r--r--   1 gboyd      cisdept       3851 Aug 28 14:34 sysmond.skel
gboyd@hills[~]$

Next, we will make a single-line change in the sysmond script and rsync the directory again. Note that only the differences in the single changed file are transferred.

[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond

sent 943 bytes  received 65 bytes  2016.00 bytes/sec
total size is 12963  speedup is 12.86
[gboyd@sideshowmel asmts]$

Here, rsync reports that 943 bytes have been sent instead of the 12963 bytes that would have been sent using scp.
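The speedup figure in the last line of the output appears to be the total size of the files divided by the number of bytes that actually crossed the network (sent plus received); the numbers rsync printed above are consistent with that. As a quick check, using a throwaway function named speedup (a name chosen here for illustration):

```shell
#!/bin/sh
# speedup TOTAL SENT RECEIVED
# Prints TOTAL / (SENT + RECEIVED) to two decimal places.
speedup() {
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 12963 943 65    # prints 12.86
```

A speedup below 1, such as the 0.96 reported for the very first transfer, means rsync's protocol overhead made the transfer slightly larger than the data itself.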

For our next change, we save a copy of our current sysmond version in a new directory old, and make a bigger change in the master version. We then rsync the area again. First, on the client:

[gboyd@sideshowmel asmts]$ ls -lR asmt04
asmt04:
total 52
drwxr-xr-x. 2 gboyd users 4096 Nov 13 12:12 old
-rwxr-xr-x. 1 gboyd users 3226 Nov 13 12:12 sysmond
-rw-r--r--. 1 gboyd users  247 Aug 28 18:30 sysmond.conf
-rw-r--r--. 1 gboyd users  404 Aug 28 14:34 sysmond-config
-rw-r--r--. 1 gboyd users 5571 Aug 28 14:34 sysmond.old
-rw-r--r--. 1 gboyd users 3851 Aug 28 14:34 sysmond.skel

asmt04/old:
total 8
-rwxr-xr-x. 1 gboyd users 2890 Nov 13 12:06 sysmond
[gboyd@sideshowmel asmts]$

Here's the rsync command and output:

[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond
asmt04/old/
asmt04/old/sysmond

sent 4230 bytes  received 88 bytes  8636.00 bytes/sec
total size is 16189  speedup is 3.75
[gboyd@sideshowmel asmts]$ 

And on the server we have:

gboyd@hills[~]$ ls -Rl asmt04
total 32
drwxr-xr-x   2 gboyd      cisdept       4096 Nov 13 12:12 old
-rwxr-xr-x   1 gboyd      cisdept       3226 Nov 13 12:12 sysmond
-rw-r--r--   1 gboyd      cisdept        404 Aug 28 14:34 sysmond-config
-rw-r--r--   1 gboyd      cisdept        247 Aug 28 18:30 sysmond.conf
-rw-r--r--   1 gboyd      cisdept       5571 Aug 28 14:34 sysmond.old
-rw-r--r--   1 gboyd      cisdept       3851 Aug 28 14:34 sysmond.skel

asmt04/old:
total 6
-rwxr-xr-x   1 gboyd      cisdept       2890 Nov 13 12:06 sysmond
gboyd@hills[~]$ 

Later, we decide the old version is no longer necessary and delete it on the client. When we re-rsync the area, nothing is transferred and the old directory is not removed from the server:

[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/

sent 146 bytes  received 16 bytes  324.00 bytes/sec
total size is 13299  speedup is 82.09
[gboyd@sideshowmel asmts]$

But if we add the --delete option, files and directories no longer on the client are deleted on the server:

[gboyd@sideshowmel asmts]$ rsync -rav --delete asmt04 hills.ccsf.edu:
sending incremental file list
deleting asmt04/old/sysmond
deleting asmt04/old/

sent 147 bytes  received 13 bytes  320.00 bytes/sec
total size is 13299  speedup is 83.12
[gboyd@sideshowmel asmts]$


Copyright 2012 Greg Boyd - All Rights Reserved.