City College of San Francisco - CS260A
Unix/Linux System Administration

Module: Backups

Comparing tar and cpio

Simple backup plans can be implemented using either tar or cpio. Both are archive utilities: they take some filesystem objects as input and create a single output file containing everything necessary to recreate those objects, including their structure and attributes. Although many of us are used to a single program that does both archiving and compression (e.g., zip), these are two separate functions, and on a Unix or Linux system they are traditionally done in two separate steps, although some versions of tar contain an option to run gzip as well.

Archive programs make a copy of the specified data; they never delete the original. Compression programs, on the other hand, usually replace the original with the compressed version.
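The two-step process can be sketched as follows. This is a minimal illustration, not a backup procedure: the directory name project and the scratch directory are made up for the example.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# A small, hypothetical directory tree to archive.
mkdir project
echo "hello" > project/notes.txt

# Step 1: archive. tar copies the data into project.tar; project/ is untouched.
tar -cf project.tar project

# Step 2: compress. gzip replaces project.tar with project.tar.gz.
gzip project.tar

ls    # shows project and project.tar.gz, but no project.tar
```

On a tar that supports the z option (GNU tar is one), the two steps can be combined as tar -czf project.tar.gz project.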

tar and cpio work differently, and each has its strengths and weaknesses, making them complementary. Some applications naturally work better with cpio; some with tar. As is often the case in the Unix community, each program has its followers, and the users of tar greatly outnumber the users of cpio. I believe that is because cpio is harder to use, and most users don't understand its benefits. In this section we will prepare for our study of each of these tools by discussing the overall task of each.

Basic use - tar

tar is very useful for archiving and restoring complete directory trees. It uses a list of paths on the command line to tell it what should be archived. Each item on the command line is archived recursively - in traditional versions of tar the user has no control over this. By default, tar, whose name stands for tape archiver, places its output on the default tape device. Today, most users choose to place the archive in a file (lovingly called a tarball), which can be specified on the command line.

tar is very easy to use, and everyone who uses Unix at all knows how to use it, making [gzipped] tar archives the universal format for transferring files between Unix systems. Although modern versions have numerous options, most users only know four: create archive (c), extract archive (x), use a named archive file, i.e. a tarball (f), and verbose (v).
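A short sketch of those four options in use. The directory site and its contents are hypothetical; note that only the top-level directory is named on the command line, and tar picks up everything beneath it.

```shell
# Scratch directory with a small, made-up tree.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p site/docs
echo "index" > site/index.html
echo "readme" > site/docs/readme.txt

# c = create, v = verbose (list each path as it is archived),
# f = the next argument is the archive file to write.
tar -cvf site.tar site

# x = extract. The archived paths are relative, so the tree is
# recreated under whatever directory we are in when we extract.
mkdir restore
cd restore || exit 1
tar -xvf ../site.tar
```

After this runs, restore/site is a copy of the original tree, and the original site directory is still in place.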

Basic use of tar is absolutely essential to anyone running their own Linux system, since the standard form of a non-distribution-specific release of software is a gzipped tar archive with the suffix .tar.gz or .tgz.

The real limitation of tar becomes apparent when it is used to extract data selectively. tar's job is to restore the entire archive to where it should be, according to the archived paths. It makes no attempt to look at what is there. Thus, it is not a true backup utility, as it happily overwrites new data with old.
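The overwrite behavior is easy to reproduce. The following sketch uses a hypothetical directory data and a made-up file; it assumes a tar that overwrites existing files on extraction by default, which is the usual behavior.

```shell
# Scratch directory with one file, then a backup of it.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "old version" > data/report.txt
tar -cf backup.tar data

# The file changes after the backup was taken.
echo "new version" > data/report.txt

# Restoring the archive silently puts the old contents back.
tar -xf backup.tar
cat data/report.txt    # prints "old version"
```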

Basic use - cpio

cpio is useful for archiving bits and pieces of data. In order to tell it to archive every file in a directory, it must be given the path of each file in the directory. Because of this large amount of input, cpio takes its list of things to archive from standard input instead of the command line. For consistency with its name (copy-in, copy-out), cpio also outputs the resulting archive to standard output. This means that you normally create a cpio archive in concert with another program. The other program produces the list of things to archive. In the case of archiving a directory recursively, the other program would be find.
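A minimal sketch of the find and cpio pairing described above. The directory home and the archive names are hypothetical, and the example assumes cpio is installed.

```shell
# Scratch directory with a small, made-up tree.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p home/docs
echo "a" > home/a.txt
echo "b" > home/docs/b.txt
echo "c" > home/docs/c.log

# find prints one path per line; cpio -o (copy-out) reads the paths on
# standard input and writes the archive to standard output.
find home -print | cpio -o > home.cpio

# Because find chooses the paths, archiving only some of the files
# is just a matter of changing the find expression.
find home -name '*.txt' -print | cpio -o > txt.cpio

# cpio -it lists the contents of an archive read from standard input.
cpio -it < txt.cpio
```

The second archive contains only the two .txt files, which is the kind of selective archiving that tar's recursive behavior makes awkward.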

cpio behaves more like a backup utility than tar does. Using its defaults, cpio refuses to overwrite newer data, refuses to create new directories (unless the archive contains the directory entry itself), and resets the attributes of the data as if it were just created.
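The refusal to overwrite newer data can be seen with the same experiment that tar fails. This is a sketch assuming GNU cpio; the directory data and its file are made up for the example.

```shell
# Scratch directory with one file, then a cpio backup of it.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "old version" > data/report.txt
find data -print | cpio -o > backup.cpio

# Change the file after the backup. The sleep makes sure the file on
# disk gets a later modification time than the archived copy.
sleep 1
echo "new version" > data/report.txt

# cpio -i (copy-in) extracts. Without further options it declines to
# replace the newer file on disk and reports that it was not created.
cpio -i < backup.cpio
cat data/report.txt    # still prints "new version"
```

The defaults can be overridden when that is really what is wanted: the u option extracts unconditionally, and the d option creates missing leading directories.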

As you might expect, cpio is not as simple to use as tar, and this is the reason that not as many Unix users know how to use it. However, its strengths for certain kinds of tasks cannot be ignored. In fact, two very important Linux data formats use cpio underneath them: RPM and initrd. As we will see, cpio can also easily be used to implement a simple incremental backup scheme.
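One way such an incremental scheme might look is sketched below. It is only an assumption about the general approach, not the method developed later in this module: a timestamp file (here the hypothetical last_backup) records when the previous backup ran, and find's -newer test selects what has changed since then.

```shell
# Scratch directory with one file, then a full backup.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "one" > data/one.txt
find data -print | cpio -o > full.cpio

# Record when the full backup was taken.
touch last_backup
sleep 1

# Work continues: a new file appears after the full backup.
echo "two" > data/two.txt

# Incremental backup: only regular files modified since the timestamp.
find data -type f -newer last_backup -print | cpio -o > incr.cpio

cpio -it < incr.cpio    # lists data/two.txt only
```

Restoring would mean extracting the full archive first and then each incremental archive in order.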

Preview question: Go to a Unix or Linux site where programs are available for download. What are the data formats you find the downloadable files in, as indicated by their extensions?


Copyright 2011 Greg Boyd - All Rights Reserved.