City College of San Francisco - CS260A
Unix/Linux System Administration

Module: Backups

tar

If you encounter Unix data on the Internet, it is probably provided as a tar archive. This is due to several reasons:

Although many modern Unix (and especially linux) systems have versions of tar that are fully functional, with new options to govern how it performs, few users use them. We will cover tar in its most basic form and use only the options that everyone knows about. tar was intended to be simple: let it do what it does well! As we will see, we can use our other archive utility cpio as a good complement to round out our archiving capabilities.

tar is one of the oldest Unix programs. It is so old, in fact, that it is customarily used with the original Unix commandline format, where the first word on the commandline consisted of a sequence of single-letter options, without a preceding dash. Hence, tar cv dir1 is often used instead of the current tar -c -v dir1 or tar -cv dir1. tar, along with some other older Unix utilities, will accept either format, but adding the dashes may be taken as an indication that you are a less-seasoned Unix user.

tar syntax

tar { -c | -t | -x } [ -v ] [ -f pathtotarball ]  [ files and directories ]

tar, whose name stands for tape archiver, was meant to write to magnetic tapes. Thus, by default, it places its output on the default tape device. Today, most users choose to place the archive in a file (lovingly called a tarball), which can be specified on the commandline as -f path, where path is where to place the tar archive. (As with all Unix files, a .tar extension is not necessary, but is customary, and serves as a clue to the contents.)

One of the options -c (create a new archive), -t (output a table of contents of an existing archive), or -x (extract from an archive) must be given. There is also a less-used -r option to replace files in an existing archive. (-r actually appends new copies to the end of the archive. Because tar extracts entries in order, the later copy overwrites the earlier one, so the newer version effectively replaces the older version when the archive is extracted.)
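The append behavior of -r can be seen in a short sketch. It assumes a POSIX shell with tar and mktemp available; notes.txt and notes.tar are hypothetical names in a scratch directory. The archive ends up holding two copies of notes.txt, and the later copy is the one left on disk after extraction.

```shell
# Work in a throwaway directory (assumes mktemp is available)
cd "$(mktemp -d)"

echo v1 > notes.txt
tar cf notes.tar notes.txt     # the archive holds the first version

echo v2 > notes.txt
tar rf notes.tar notes.txt     # -r appends a second copy; nothing is removed

tar tf notes.tar               # notes.txt is now listed twice

rm notes.txt
tar xf notes.tar               # both copies are extracted in order; the later one overwrites the first
cat notes.txt                  # v2
```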

-v tells tar to be verbose. Otherwise, tar silently does its work when using -c, -x, or -r. If -v is used with -t, a long-style listing (including dates and permissions as with ls -l) of the archive contents is produced.

Note: by default, tar is recursive.

Creating archives

tar cvf xxx.tar dir1 dir2 file1 file2 file3

creates a new archive named xxx.tar in the current directory and places copies of dir1 and dir2 (recursively) and the three files in it. Since the v option is used, tar will output a list of the paths archived. 

It is highly recommended that paths given to tar be relative. Old versions of tar stored absolute paths as given (and so extracted them to those exact locations), but most current versions delete the leading / of an absolute path; GNU tar prints a warning when it does this. To restore such files to their original locations, the user must extract the archive while connected to the root directory /.
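To see the leading / being removed, here is a minimal sketch that archives a file by its absolute path and then extracts it somewhere harmless. It assumes GNU tar or bsdtar plus mktemp; the variables src, archive and restore are hypothetical scratch locations, not anything tar requires.

```shell
# Scratch file with an absolute path, plus a separate place for the archive
src=$(mktemp -d)
echo data > "$src/file.txt"
archive="$(mktemp -d)/abs.tar"

# Give tar an absolute path; tar warns that it is removing the leading /
tar cf "$archive" "$src/file.txt"

# The stored member name is relative (no leading /)
tar tf "$archive"

# Extracting elsewhere recreates the full path beneath the current directory;
# running the same tar xf while connected to / would put the file back in place
restore=$(mktemp -d)
cd "$restore" && tar xf "$archive"
ls "$restore$src/file.txt"
```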

Listing a table of contents

tar tf xxx.tar [ paths ]

Lists the paths of each item found in the archive. If paths are given on the commandline, only paths matching those are listed. If the v option is added, an ls -l style listing is produced.

Extracting data from an archive

tar xf xxx.tar [ paths ]

Extract the entire archive [the default] or only the pieces indicated by paths from the archive. The data is extracted unconditionally, creating missing directories if necessary, and the attributes contained on the archive are restored if possible. Thus, if you ask tar to extract xyz from the archive and there is an xyz already present (in the way), tar will simply overwrite it without asking and irrespective of whether the copy on the archive is newer than the existing one. tar will set the modification date of the data extracted to the one on the archive. It will try to restore the permissions, owner and group from the archive, but because regular users cannot give files away, the files a regular user extracts are owned by that user, with the group any new file of theirs would normally get. (For root, the owner and group are restored from the archive.) In addition, unless the user has reset it, the umask will alter the permissions of the extracted files.
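The "no questions asked" overwrite and the restored modification date can both be checked with a small sketch. It assumes a POSIX shell, tar and touch; xyz matches the name used above, and the 2020 timestamp is an arbitrary illustrative value.

```shell
cd "$(mktemp -d)"

echo 'archived text' > xyz
touch -t 202001010000 xyz        # give the file a known old timestamp (Jan 1 2020, local time)
tar cf xyz.tar xyz

echo 'newer text' > xyz          # xyz on disk is now newer than the archived copy

tar xf xyz.tar xyz               # no prompt and no date comparison: xyz is overwritten
cat xyz                          # archived text
ls -l xyz                        # the modification date is back to Jan 1 2020
```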

tar can read/write an archive to standard input or output by using - in the position of the archive path.
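A minimal sketch of using - for the archive path, assuming a POSIX shell; dir1 is a hypothetical directory. The first pipeline writes an archive to standard output and a second tar reads it from standard input, so no archive file is ever created.

```shell
cd "$(mktemp -d)"
mkdir -p dir1
echo hello > dir1/a.txt

# Write the archive to standard output and list it from standard input
tar cf - dir1 | tar tf -

# Measure the size the archive would have, without writing it to disk
tar cf - dir1 | wc -c
```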

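Linux versions of tar (GNU tar) include a -z option that filters the archive through gzip: when extracting or listing, it essentially runs gunzip on the archive first, and when creating, it compresses the archive with gzip.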
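Here is a sketch of -z in both directions, assuming GNU tar and gzip are installed; dir1 is a hypothetical directory. The last step shows the pipe that -z essentially replaces, which also works with a tar that lacks -z.

```shell
cd "$(mktemp -d)"
mkdir -p dir1
echo hello > dir1/a.txt

tar czf dir1.tar.gz dir1          # c with z: compress with gzip while creating
tar tzf dir1.tar.gz               # t with z: list a compressed archive

rm -r dir1
gunzip -c dir1.tar.gz | tar xf -  # the portable equivalent of tar xzf dir1.tar.gz
cat dir1/a.txt                    # hello
```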

Examples (as a normal user):

creates an archive myproject.tar in the current directory and places a recursive copy of myproject in it.
lists the paths (only) in the archive myproject.tar
restores (unconditionally) myproject/Makefile from the archive, setting the permissions, owner and group the same way as a copy would. The modification date of Makefile is set to the modification date on the archive. The directory myproject would be created if necessary.

Note: remember that this syntax is shorthand for tar -x -v -f myproject.tar myproject/Makefile. In our example, we have failed to place the f last in the options string so that it is followed immediately by the name of the archive. Some versions of tar will accept this and some will not.

extracts everything from the archive and places it relative to the current directory
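Archiving the current directory while placing the archive in that same directory (for example, tar cf xxx.tar .) appears to work, but with some versions of tar an incomplete copy of xxx.tar is placed in the archive. This is a problem when you extract the archive, as you overwrite xxx.tar while you are using it. (GNU tar notices this case, warns, and skips the archive file.) You should never archive the current directory and place the archive in the current directory at the same time.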
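Commands matching the example descriptions above can be sketched as follows, assuming a POSIX shell; the myproject directory and its files are hypothetical stand-ins. The extract-one-file command puts f last so the archive name follows it directly, and the last two lines show one safe way to archive the current directory: write the archive to the parent directory instead.

```shell
# Scratch directory with a small hypothetical project
cd "$(mktemp -d)"
mkdir -p myproject
echo 'all:' > myproject/Makefile
echo 'int main(void) { return 0; }' > myproject/main.c

tar cf myproject.tar myproject            # create: recursive copy of myproject
tar tf myproject.tar                      # list the paths only
tar xvf myproject.tar myproject/Makefile  # extract one path; f is last, so the archive name follows it
tar xf myproject.tar                      # extract everything relative to the current directory

# Archive the contents of myproject, placing the archive outside the directory being archived
(cd myproject && tar cf ../myproject-contents.tar .)
tar tf myproject-contents.tar
```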

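Don't forget, tar archives are not compressed! Remember to compress your archive by running it through a compression utility after creating it (or use the -z option when creating it). An uncompressed tar archive is in fact slightly larger than the source data, because tar adds a 512-byte header for each file and pads each file's data out to a multiple of 512 bytes.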
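Compressing after the fact might look like this sketch, which assumes gzip is installed and uses a hypothetical, deliberately repetitive notes.txt so the difference in size is easy to see. gzip replaces myproject.tar with myproject.tar.gz.

```shell
cd "$(mktemp -d)"
mkdir -p myproject
# Hypothetical sample data; repetitive text compresses well
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "line $i of some repetitive project text" >> myproject/notes.txt
done

tar cf myproject.tar myproject
wc -c < myproject.tar        # uncompressed: headers and 512-byte padding included

gzip myproject.tar           # replaces myproject.tar with myproject.tar.gz
wc -c < myproject.tar.gz     # far smaller for data like this
```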

Using tar to restore a file from a simple backup

Jamal is using tar to periodically take a snapshot of his project. He keeps his tar archive in ~jamal/project.tar. It is an archive of his project created last week, the last time the project code was stable. It was made by connecting to the directory above his project directory and using the command tar cf ~/project.tar project. Unfortunately, Jamal has just deleted one of his project files, project/foo.java, and he wants to restore the saved version. Thinking that tar is truly a backup utility and will not overwrite newer versions, Jamal types tar xf ~/project.tar and discovers that his entire project directory is overwritten, returning his project to where it was last week! Of course, Jamal angrily reports this bug in tar to the system administrator. Can you patiently tell Jamal what his misunderstanding was and how he can safely use tar to restore a single file?

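Jamal's misunderstanding is that tar does not compare dates the way a backup utility might: it extracts unconditionally, overwriting existing files whether or not they are newer, so extracting the whole archive replaced every file in his project with last week's copy. He should have specified the single file to restore on the tar command: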

tar xvf ~/project.tar project/foo.java 

Further, perhaps Jamal should add the v option in future so that he can try to limit the damage by quickly using control-C.
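Jamal's situation can be reproduced safely in a sketch, assuming a POSIX shell. The files are hypothetical, and the archive is kept in the scratch directory rather than in ~ so nothing real is touched. Listing the archive first confirms the exact stored path, and extracting only that path leaves the newer bar.java alone.

```shell
# Hypothetical reproduction of Jamal's setup in a scratch directory
cd "$(mktemp -d)"
mkdir -p project
echo 'class Foo {}' > project/foo.java
echo 'class Bar {}' > project/bar.java
tar cf project.tar project                          # last week's snapshot

echo 'class Bar { int edited; }' > project/bar.java   # this week's newer work
rm project/foo.java                                   # the accident

# Confirm the exact path stored in the archive before extracting
tar tf project.tar | grep 'foo.java'

# Restore only that one path; bar.java keeps its newer contents
tar xvf project.tar project/foo.java
```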

Using tar to copy a directory tree with attributes (as root)

You have discovered that your /usr partition is too small. Since you used hard partitions when you were configuring your system, this is a problem. However, one solution is to 'patch' a new partition into the /usr tree at a reasonable point. Examining the sizes of various directories beneath /usr, you discover that the /usr/src directory (which is being used to build linux installed executables from source for customization) is huge. So, you create a new partition large enough to hold /usr/src, create a filesystem on it, and mount it at /mnt/tmp temporarily. The goal is to copy all of /usr/src to the new partition without disturbing it, delete the original, then mount the new partition at /usr/src. This frees up space on /usr. The modern way to do the copy operation is with cp -a on linux (cp -Rp elsewhere), which copies recursively and preserves permissions, ownership and modification times. You can, however, do it with tar. Both require you to be root.

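umask 0                  # keep umask from masking off permission bits during extraction
cd /usr/src &&
tar cvf - .  |  (cd /mnt/tmp && tar xf -)
# && (not ;) stops the copy if a cd fails, so tar never extracts on top of the original tree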
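The tar-pipe copy can be tried without root, using a sketch in which two scratch directories named src and dst (hypothetical) stand in for /usr/src and /mnt/tmp. It assumes a POSIX shell; the file is given a whole-second timestamp so that its modification time round-trips exactly through tar.

```shell
# Scratch directories standing in for /usr/src and /mnt/tmp
cd "$(mktemp -d)"
mkdir -p src/sub dst
echo 'hello' > src/sub/file.txt
printf '#!/bin/sh\necho hi\n' > src/tool.sh
chmod 755 src/tool.sh
touch -t 202001010000 src/sub/file.txt   # whole-second mtime, so it round-trips exactly

umask 0
# Each side runs in its own subshell; extraction happens only if cd into dst succeeded
(cd src && tar cf - .) | (cd dst && tar xf -)

diff -r src dst && echo "trees match"
```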

Example:

You have downloaded a new version of firefox, as /tmp/firefox-2.0.23.tar.gz, which is needed to fix a bug in the version installed on your system. You notice that the archive contains a single directory named firefox, which contains an executable named firefox. You know that if you execute the firefox executable it runs the new version, but, since this is an archive rather than an RPM file, the firefox files do not replace the default ones on the system, such as /usr/bin/firefox. Thus, when someone (or some GUI launcher) runs firefox, it will run the old installed version rather than your new version. Without figuring out where to drop each of the 1000 new files to replace those currently on the system, how can you upgrade your version?

Solution:

Unpack the replacement firefox in /usr/local/src. Then create a symbolic link as /usr/local/bin/firefox that points to the firefox executable in /usr/local/src/firefox. Since most linux systems list /usr/local/bin in users' PATH ahead of system directories such as /usr/bin, your replacement firefox will be found before the installed firefox.

cd /usr/local/src
tar xzf /tmp/firefox-2.0.23.tar.gz
ln -s /usr/local/src/firefox/firefox /usr/local/bin/firefox
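Why the symlink wins can be checked with a sketch that uses a scratch copy of the directory layout, so it needs no root access. The two firefox programs here are hypothetical one-line scripts, and PATH is set by hand to mimic a system that lists /usr/local/bin first. On a real system, echo $PATH shows whether /usr/local/bin comes ahead of /usr/bin.

```shell
# Hypothetical scratch layout standing in for /usr/local and /usr
cd "$(mktemp -d)"
root=$(pwd)
mkdir -p usr/bin usr/local/bin usr/local/src/firefox

# The "installed" old version and the unpacked new version (fake scripts)
printf '#!/bin/sh\necho old\n' > usr/bin/firefox
printf '#!/bin/sh\necho new\n' > usr/local/src/firefox/firefox
chmod 755 usr/bin/firefox usr/local/src/firefox/firefox

# The symlink from the solution above
ln -s "$root/usr/local/src/firefox/firefox" usr/local/bin/firefox

# With the local bin directory listed first in PATH, the shell finds the new version first
PATH="$root/usr/local/bin:$root/usr/bin:$PATH"
command -v firefox     # the symlink in usr/local/bin
firefox                # new
```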

Preview question: In your examination of a linux download site in preparation for this section, you probably never encountered a cpio archive. However, you should have encountered many RPM files. Would it surprise you to know that these are cpio archives in disguise? Do you still think that tar files are the most popular download format for linux?

This page was made entirely with free software on linux: Nvu, Kompozer and Openoffice.org

Copyright 2008 Greg Boyd - All Rights Reserved.