City College of San Francisco - CS260A Unix/Linux System Administration Module: Backups
rsync is a simple, fast way to synchronize a set of files and directories between two systems. It allows you to easily have a mirror of your work area in another location which you can synchronize with a simple command that can be automated using cron(1).
In our discussion, the system that you are actively working on (the local system) is the rsync client. This is the system you run rsync on. The system that holds the backup copy (the remote system) is the rsync server. To back up your current work, you run rsync (on the client) and send some data to the server with a request to write it to your backup repository. If you need to retrieve your backup (because you messed up the current version), you would use rsync to request to read the backup from the server. You can also use rsync to manage a repository of files for download.
rsync is a better solution than scp for transferring data to/from a remote repository for several reasons:
it has an archive mode (-a) that will preserve (as much as possible) dates, permissions, and links.
you can transfer compressed data (-z), reducing network bandwidth
rsync only transfers differences. By default, it decides whether a file has changed based on the file's modification time and size
you can delete files on the server that no longer exist on the client using --delete
you can test your rsync runs before you commit them by adding the -n option (dry run). This is especially useful if you use the --delete option
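The combination of archive mode, change detection, and dry run can be tried safely in a local sandbox. The following is a minimal sketch, assuming rsync is installed; the temporary directories src and dst stand in for the client and server, and the file names are made up for illustration.

```shell
# Local sandbox: "src" plays the client, "dst" plays the server.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "one" > "$work/src/a.txt"
echo "two" > "$work/src/b.txt"

# Initial copy in archive mode. The trailing slash means "the contents of src".
rsync -a "$work/src/" "$work/dst/"

# Change one file so that its size differs from the backup copy.
echo "one, edited" > "$work/src/a.txt"

# Dry run: -n reports what would be transferred without writing anything.
pending=$(rsync -a -n -v "$work/src/" "$work/dst/")
echo "$pending"

# The backup still holds the old contents, because -n changed nothing.
after=$(cat "$work/dst/a.txt")
echo "backup copy of a.txt: $after"

rm -rf "$work"
```

The dry-run report should name a.txt but not b.txt, since b.txt still has the same size and modification time on both sides.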
Let's look at a simple example before we continue:
The CS260A online notes consist of 2.5MB of html and jpeg files: 153 files and 17 directories. I have just updated a handful of them, and I don't remember which ones. I could copy the entire tree again with scp, or I could use rsync to send only what has changed. The first solution, of course, transfers 2.5MB and takes 67 seconds. The second solution transferred 32 kB and took 3.5 seconds. You can multiply this out to a larger tree to get an idea of the savings. (Besides this, the rsync command can delete files on the server that no longer exist on the client.) The command was not much more complex than an scp command:
rsync -ravz * fog.ccsf.edu:public_html/cs260a/online
This light bandwidth use means you can even rsync your entire home directory: only the parts that change will be transferred. Pretty nice.
Besides this speed and bandwidth savings, it is fairly straightforward to set up rsync to do nightly transfers. This way you can go home and be assured that your remote repository matches your local one at the beginning of each day.
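A nightly transfer of this kind could be scheduled with a crontab entry along the following lines. This is only a sketch: the 02:30 schedule, the $HOME/asmts location, and the log file name are assumptions for illustration, and it presumes that rsync can authenticate without prompting for a passphrase (see Modes of operation).

```shell
# Hypothetical crontab entry, installed with "crontab -e" on the client.
# Fields: minute hour day-of-month month day-of-week command
# Runs at 02:30 every night and appends rsync's output to a log file.
30 2 * * * cd "$HOME/asmts" && rsync -raz asmt04 hills.ccsf.edu: >> "$HOME/rsync-nightly.log" 2>&1
```

Redirecting the output to a file keeps cron from mailing it to you after every run.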
Modes of operation
There are two ways to run rsync. Both involve running rsync on the client, albeit with slightly different syntax. They do, however, differ significantly in the server-side implementation. The two ways are differentiated on the command-line by whether the remote host is followed by a single or double colon.
The first way, which is simpler, uses ssh as the transport agent. Easy. The only downside here is automating nightly transfers: it would require using a passphrase-less key pair for the connection. As we have discussed, this is not the best idea - having a passphrase-less key pair lying around makes your remote account vulnerable if you leave yourself logged in or you have a root user willing to masquerade as you or if your private key is stolen. There are, however, a few extra security measures you can use to limit this exposure:
use a hidden key as the passphrase-less one. Your default key should always have a passphrase.
On the server, add a few options to limit the kinds of things that can be done with the [passphrase-less] ssh connection:
use the from=host option (as we discussed in the ssh section)
use the no-port-forwarding option to rule out jumping to another system
use the no-pty option to disallow starting an interactive session.
This would require prepending the following string to the public key (that corresponds to your passphrase-less private key) in the .ssh/authorized_keys on the server:
from="ipaddr",no-port-forwarding,no-pty
Note that this still leaves your account open: although a masquerading user cannot start an interactive shell (due to the no-pty option) they can still issue a single command.
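Put together, the restricted entry and the matching client command might look roughly like the sketch below. The address 192.0.2.10, the key file name backup_key, the key type, and the trailing comment are placeholders invented for illustration, and the key body is deliberately elided.

```shell
# On the server: one line in ~/.ssh/authorized_keys.
# The options come first, then the key type, the key itself, and a comment.
from="192.0.2.10",no-port-forwarding,no-pty ssh-ed25519 AAAA... backup-key

# On the client: tell rsync to run ssh with the dedicated passphrase-less key
# (hypothetical file name) instead of your default key.
rsync -rav -e "ssh -i $HOME/.ssh/backup_key" asmt04 hills.ccsf.edu:
```

rsync does not need a terminal on the server, so the no-pty restriction does not interfere with the transfer.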
The problem with using ssh and a passphrase-less key is that if someone co-opts your account you are really giving them the store. Since you only want to transfer files, it would be nice to have a way to limit any damage to just file transfer. In order to do this easily, you will have to use the second way to run rsync: by configuring an rsync server. Unfortunately, configuring an rsync server has become complicated in recent releases. Because of this, its use will be deferred to the later advanced class.
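The single-colon versus double-colon distinction described in this section can be summarized with a small helper. This is a simplified sketch, not rsync's actual parsing: the function name rsync_mode and the module name backup are hypothetical, and real rsync applies further rules (for example, a colon that appears after a slash is treated as part of a local path).

```shell
# Classify an rsync source or destination argument by its colon syntax.
rsync_mode() {
    case "$1" in
        *::*) echo "daemon" ;;        # host::module/path talks to an rsync server
        *:*)  echo "remote-shell" ;;  # host:path is carried over ssh
        *)    echo "local" ;;         # no colon: an ordinary local path
    esac
}

rsync_mode "hills.ccsf.edu:asmt04"          # prints remote-shell
rsync_mode "hills.ccsf.edu::backup/asmt04"  # prints daemon
rsync_mode "asmt04"                         # prints local
```

The double-colon pattern is tested first because such an argument would also match the single-colon pattern.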
Synchronizing directories
You can synchronize the backup directory on the server with the one on the client by adding the --delete option to the rsync command. Then any files or directories that have been deleted on the client since the last rsync will also be deleted on the server. Note that this is a dangerous option if you are not careful when specifying your rsync path! If your client path is incorrect (and does not point to the root of the tree you are synchronizing) you will delete your entire rsync copy on the server!
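The danger of a wrong client path can be seen without risking any data by combining --delete with the -n (dry run) option. The sketch below assumes rsync is installed and uses local temporary directories in place of the real client and server; the directory name empty is a made-up stand-in for a mistyped source path.

```shell
# Local sandbox with a "client" tree, a "server" copy, and a wrong source dir.
work=$(mktemp -d)
mkdir -p "$work/client/asmt04" "$work/client/empty" "$work/server/asmt04"
echo "script" > "$work/client/asmt04/sysmond"
rsync -a "$work/client/asmt04/" "$work/server/asmt04/"

# Mistake: the source points at the wrong (empty) directory.
# With -n, rsync only reports the deletions it would have performed.
report=$(rsync -a -v -n --delete "$work/client/empty/" "$work/server/asmt04/")
echo "$report"

# The server copy is untouched, because this was a dry run.
survivors=$(ls "$work/server/asmt04")
echo "still on the server: $survivors"

rm -rf "$work"
```

A "deleting" line in the dry-run report for a file you meant to keep is the signal to recheck the source path before running the command for real.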
Examples
Let's take a very simple example. This example is drawn from a current assignment - the sysmond program. This program is written on linux, and we will use rsync to keep a backup copy of all the files on our student server, hills. Note that there is not much data here, so scp would be a simple option - but we will keep track of the data transferred and show the smaller bandwidth requirements of rsync. You can generalize to a larger example.
The area consists of an asmt04 directory with a handful of files. Here is what it looks like on linux (the rsync client):
[gboyd@sideshowmel asmt04]$ ls -Rl
.:
total 44
-rwxr-xr-x. 1 gboyd users 2852 Aug 28 14:50 sysmond
-rw-r--r--. 1 gboyd users  247 Aug 28 18:30 sysmond.conf
-rw-r--r--. 1 gboyd users  404 Aug 28 14:34 sysmond-config
-rw-r--r--. 1 gboyd users 5571 Aug 28 14:34 sysmond.old
-rw-r--r--. 1 gboyd users 3851 Aug 28 14:34 sysmond.skel
[gboyd@sideshowmel asmt04]$
Then we start with the first backup of the entire area to hills. We will rsync the entire asmt04 directory to hills, placing it in our home directory:
[gboyd@sideshowmel asmt04]$ cd ..
[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond
asmt04/sysmond-config
asmt04/sysmond.conf
asmt04/sysmond.old
asmt04/sysmond.skel
sent 13286 bytes received 111 bytes 26794.00 bytes/sec
total size is 12925 speedup is 0.96
[gboyd@sideshowmel asmts]$
Back on hills, we list the asmt04 directory and its contents. Note that the dates have been reset on hills (the server) to conform to the dates on the original (the client):
gboyd@hills[~]$ ls -Rl asmt04
total 30
-rwxr-xr-x 1 gboyd cisdept 2852 Aug 28 14:50 sysmond
-rw-r--r-- 1 gboyd cisdept  404 Aug 28 14:34 sysmond-config
-rw-r--r-- 1 gboyd cisdept  247 Aug 28 18:30 sysmond.conf
-rw-r--r-- 1 gboyd cisdept 5571 Aug 28 14:34 sysmond.old
-rw-r--r-- 1 gboyd cisdept 3851 Aug 28 14:34 sysmond.skel
gboyd@hills[~]$
Next, we will make a single-line change in the sysmond script and rsync the directory again. Note that only the differences in the single changed file are transferred.
[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond
sent 943 bytes received 65 bytes 2016.00 bytes/sec
total size is 12963 speedup is 12.86
[gboyd@sideshowmel asmts]$
Here, rsync reports that 943 bytes have been sent instead of 12963 which would have been sent using scp.
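The speedup figure in these summaries is consistent with dividing the total size of the tree by the bytes actually exchanged (sent plus received). The arithmetic can be checked with a small helper; the function name speedup is made up for this sketch, and the numbers are taken from the rsync runs in this section.

```shell
# speedup = total size / (bytes sent + bytes received)
speedup() {
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 12925 13286 111   # first full copy: prints 0.96
speedup 12963 943 65      # after the one-line change: prints 12.86
```

A value below 1 on the first run reflects protocol overhead: everything had to be sent, plus the file list. Later runs exchange far less than the size of the tree, so the ratio climbs.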
For our next change, we save a copy of our current sysmond version in a new directory old, and make a bigger change in the master version. We then rsync the area again. First, on the client:
[gboyd@sideshowmel asmts]$ ls -lR asmt04
asmt04:
total 52
drwxr-xr-x. 2 gboyd users 4096 Nov 13 12:12 old
-rwxr-xr-x. 1 gboyd users 3226 Nov 13 12:12 sysmond
-rw-r--r--. 1 gboyd users  247 Aug 28 18:30 sysmond.conf
-rw-r--r--. 1 gboyd users  404 Aug 28 14:34 sysmond-config
-rw-r--r--. 1 gboyd users 5571 Aug 28 14:34 sysmond.old
-rw-r--r--. 1 gboyd users 3851 Aug 28 14:34 sysmond.skel
asmt04/old:
total 8
-rwxr-xr-x. 1 gboyd users 2890 Nov 13 12:06 sysmond
[gboyd@sideshowmel asmts]$
Here's the rsync command and output:
[gboyd@sideshowmel asmts]$ rsync -rav asmt04 hills.ccsf.edu:
sending incremental file list
asmt04/
asmt04/sysmond
asmt04/old/
asmt04/old/sysmond
sent 4230 bytes received 88 bytes 8636.00 bytes/sec
total size is 16189 speedup is 3.75
[gboyd@sideshowmel asmts]$
and on the server we have:
gboyd@hills[~]$ ls -Rl asmt04
total 32
drwxr-xr-x 2 gboyd cisdept 4096 Nov 13 12:12 old
-rwxr-xr-x 1 gboyd cisdept 3226 Nov 13 12:12 sysmond
-rw-r--r-- 1 gboyd cisdept  404 Aug 28 14:34 sysmond-config
-rw-r--r-- 1 gboyd cisdept  247 Aug 28 18:30 sysmond.conf
-rw-r--r-- 1 gboyd cisdept 5571 Aug 28 14:34 sysmond.old
-rw-r--r-- 1 gboyd cisdept 3851 Aug 28 14:34 sysmond.skel
asmt04/old:
total 6
-rwxr-xr-x 1 gboyd cisdept 2890 Nov 13 12:06 sysmond
gboyd@hills[~]$
Later, we decide the old version is no longer necessary and delete it on the client. When we re-rsync the area, nothing happens.
But if we add the --delete option, files and directories no longer on the client are deleted on the server:
[gboyd@sideshowmel asmts]$ rsync -rav --delete asmt04 hills.ccsf.edu:
sending incremental file list
deleting asmt04/old/sysmond
deleting asmt04/old/
sent 147 bytes received 13 bytes 320.00 bytes/sec
total size is 13299 speedup is 83.12
[gboyd@sideshowmel asmts]$
This page was made entirely with free software on linux: the Mozilla Project and Openoffice.org