City College of San Francisco - CS260A Unix/Linux System Administration
Module: Administration Basics I
Preview question: If you want to download some data from a Unix system, you will often encounter the extension .tar.gz (or .tgz). Would you know what to do with the file?
Most of us have used zip files. The zip file format is a compressed archive: it both combines many files into one and compresses them.
Traditionally, these two operations (archiving and compressing)
were separate, and they continue to be separate on Unix systems
today. One program performs the archiving function (tar or cpio) and another
performs the compression (gzip or bzip2). (zip programs exist on many
versions of Unix now, but they are not often used, except to share data
with non-Unix systems. Instead, the newer xz compression is used; its
files can be opened on a non-Unix system using 7zip.)
We will cover archiving programs in detail much later. However, tar files are so common on Unix that we should learn the simple use of tar much sooner, so we will use it to explain archiving.
Simple archiving with tar
Archiving is the process of combining the contents of one or more directory structures, and all the files in them, into one archive file, affectionately called a tarball. The resulting tarball has everything needed to reproduce the source structure and contents. Creating an archive does not delete the source data. Obviously, the archive file should be placed outside the directory being archived.
Example: create an archive of the directory Doc and all its contents. Name the archive Doc.tar
tar -cf Doc.tar Doc
After this command is finished, Doc.tar can be moved to a new location (or new system) and re-expanded to reproduce the Doc directory using
tar -xf Doc.tar
(Note: -cf is a combination of -c and -f. -f takes an argument, the
name of the archive, so Doc.tar must come immediately after the f.
Although it should be acceptable for the options to be given separately,
some older versions of tar insist on a single - with all the options
after it. Linux tar is not so fussy. Also, tar is so old that it is often
used without the leading dash for the options, so tar xf Doc.tar is
acceptable as well.)
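The round trip described above can be tried safely in a scratch directory. The following is a minimal sketch, assuming a POSIX shell with tar, mktemp and diff available; the names src, dest, notes.txt and more.txt are invented for illustration.

```shell
#!/bin/sh
# Scratch area so that no real data is touched.
work=$(mktemp -d)
mkdir -p "$work/src/Doc/sub" "$work/dest"
echo "hello" > "$work/src/Doc/notes.txt"
echo "nested" > "$work/src/Doc/sub/more.txt"

# Create the archive from inside src so the stored paths start with Doc/.
(cd "$work/src" && tar -cf "$work/Doc.tar" Doc)

# The source directory is still present after archiving.
[ -d "$work/src/Doc" ] && source_kept=yes || source_kept=no

# Extract somewhere else; the dashless option style is used here
# to show that it behaves like -xf.
(cd "$work/dest" && tar xf "$work/Doc.tar")

# diff -r succeeds silently when the two trees are identical.
if diff -r "$work/src/Doc" "$work/dest/Doc"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "source kept: $source_kept, round trip: $roundtrip"
rm -rf "$work"
```

The archive is written outside the directory being archived, as recommended above.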
Information on tar options
option | meaning |
-v | verbose: lists the files archived or extracted (with -c or -x), or gives ls -l type output (with -t) |
-c | create archive |
-x | extract archive |
-t | list archive (t stands for table of contents) |
-f archivepath | path to tarball |
-z | archive is compressed with gzip |
-j | archive is compressed with bzip2 |
-J | archive is compressed with xz |
One of the options -c, -x, or -t must be given. The -f archivepath option must also be given, since the default location of the archive is usually a device that does not exist (it used to be the default tape drive).
You can combine the options in a single string, but, for portability, the f option must come last, so that it is immediately followed by the archivepath.
When extracting, the results are placed relative to the current
directory. When archiving, the path to the directory to archive
should be given relative to the current directory. tar overwrites anything that is in
its way, even if the object in its way is read-only.
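To see that extraction is relative to the current directory and that an existing file is replaced, the following sketch may help. It assumes a POSIX shell with tar and mktemp; the directory names src and target are hypothetical, and the sketch only tests an ordinary writable file.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/abc" "$work/target/abc"
echo "from archive" > "$work/src/abc/foo"
(cd "$work/src" && tar -cf "$work/x.tar" abc)

# A different foo already exists where the archive will be unpacked.
echo "old local copy" > "$work/target/abc/foo"

# Extraction is relative to the current directory, so cd there first.
(cd "$work/target" && tar -xf "$work/x.tar")

# The pre-existing file was replaced without any prompt.
after=$(cat "$work/target/abc/foo")
echo "abc/foo now contains: $after"
rm -rf "$work"
```

Because of this behaviour, it is usually worth extracting an unfamiliar archive in an empty directory.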
Examples
tar xvf x.tar
Extract the archive x.tar,
giving a list of data extracted. If the archive contained a file abc/foo, a directory abc would be created in
the current directory and foo
placed in it. If foo
previously existed, it would be overwritten. The archive is not
deleted.
tar -czf abc.tgz abc
Create the archive abc.tgz
to contain the directory abc
and all its contents. The resulting archive is gzipped. Here, tar
is silent. The directory abc is not removed.
tar -t -v -f x.tar
Output a table of contents of the archive x.tar, using an ls -l style format.
Note that the leading dash is redundant when the options are given as a
single string, as in tar xvf x.tar above.
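Listing an archive before extracting it is a cheap way to check what it would write. A minimal sketch, assuming a POSIX shell with tar and mktemp, and reusing the abc/foo layout from the examples above:

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/abc"
echo "data" > "$work/abc/foo"
(cd "$work" && tar -cf x.tar abc)

# -t prints the member names only; nothing is written to disk.
listing=$(tar -tf "$work/x.tar")
echo "$listing"

# Adding -v gives the long, ls -l style listing.
tar -tvf "$work/x.tar"
rm -rf "$work"
```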
Since an archive contains all of the information in the original data, it is approximately the same size:
$ du -sk Doc
25508 Doc
$ du -sk Doc.tar
21064 Doc.tar
Because of this, archives are often compressed before they are transported to another system.
Compression
Compression programs work by finding redundant patterns in data and replacing them with shorter descriptions of those patterns. There are several compression utilities on Linux, but gzip is now the most commonly used. All of them compress the input file, then replace it with the compressed version. By default, the compressed file is named by adding an extension to the input name.
compressor input --> produces input.xx (and deletes input!)
compressor | xx | uncompressor |
gzip | gz | gunzip |
bzip2 | bz2 | bunzip2 |
xz | xz | unxz |
The command
gzip Doc.tar
compresses Doc.tar and replaces it with Doc.tar.gz. When uncompressing Doc.tar.gz, you can give the base name as input, although the entire name is also acceptable. Either of
gunzip Doc.tar
gunzip Doc.tar.gz
looks for Doc.tar.gz, uncompresses it and replaces it with Doc.tar.
uncompress works similarly, but bunzip2 requires the entire name of the compressed file when uncompressing it.
You can avoid this naming restriction by using standard input and standard output, as in the xxx.tgz example below.
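The way gzip replaces its input, and the way gunzip accepts the base name, can be observed directly. This is a small sketch, assuming a POSIX shell with gzip, mktemp, cmp and the seq utility; data.txt and original are hypothetical file names.

```shell
#!/bin/sh
work=$(mktemp -d)
# Repetitive sample data that compresses well.
seq 1 5000 > "$work/data.txt"
cp "$work/data.txt" "$work/original"

gzip "$work/data.txt"
# After compression only the .gz file remains.
[ -e "$work/data.txt" ] && input_after_gzip=present || input_after_gzip=gone
[ -e "$work/data.txt.gz" ] && gz_created=yes || gz_created=no

# gunzip is given the base name and finds data.txt.gz itself.
gunzip "$work/data.txt"
if cmp -s "$work/data.txt" "$work/original"; then
    restored=yes
else
    restored=no
fi
echo "input after gzip: $input_after_gzip, .gz created: $gz_created, restored: $restored"
rm -rf "$work"
```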
Question: You find xxx.tar.gz on the Internet. How do you extract it?
Answer: The extensions show that this is a gzipped tarball. First, uncompress it using gunzip xxx.tar. Then extract the archive using tar -x -f xxx.tar (remember, gunzip removed the .gz extension).
gunzip xxx.tar
tar -x -f xxx.tar
If the input was named xxx.tgz, you can use standard input and output:
gunzip < xxx.tgz > xxx.tar
tar -x -f xxx.tar
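The intermediate xxx.tar file can also be avoided altogether by connecting the two programs with a pipe. The sketch below assumes a tar that accepts - as the archive name to mean standard input (GNU tar and BSD tar both do) and uses gunzip -c, which writes to standard output instead of replacing the file. The file and directory names are invented for the demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/xxx" "$work/out"
echo "payload" > "$work/src/xxx/file.txt"

# Build a gzipped tarball in two explicit steps, then give it a .tgz name.
(cd "$work/src" && tar -cf "$work/xxx.tar" xxx)
gzip "$work/xxx.tar"
mv "$work/xxx.tar.gz" "$work/xxx.tgz"

# Decompress to standard output and let tar read the archive
# from standard input; no xxx.tar is written to disk.
(cd "$work/out" && gunzip -c "$work/xxx.tgz" | tar -xf -)

extracted=$(cat "$work/out/xxx/file.txt")
# gunzip -c leaves the compressed file in place.
[ -e "$work/xxx.tgz" ] && tgz_kept=yes || tgz_kept=no
echo "extracted: $extracted, tgz kept: $tgz_kept"
rm -rf "$work"
```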
On Linux, tar has the gzip compressor built into it. You can skip the gzip/gunzip step by adding the z option to tar:
tar -cvzf Doc.tgz Doc
produces a gzipped tarball of the Doc directory. It can be uncompressed and restored using
tar -xvzf Doc.tgz
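As a quick check that the z option really produces ordinary gzip data, the one-step commands can be combined with gzip -t, which tests a compressed file without extracting it. This is a sketch under the assumption that tar supports -z (as Linux tar does); the file ch1.txt is hypothetical.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/Doc" "$work/out"
echo "chapter one" > "$work/src/Doc/ch1.txt"

# One step: archive and gzip.
(cd "$work/src" && tar -czf "$work/Doc.tgz" Doc)

# gzip -t verifies that the file is valid gzip data.
if gzip -t "$work/Doc.tgz"; then is_gzip=yes; else is_gzip=no; fi

# One step: gunzip and extract.
(cd "$work/out" && tar -xzf "$work/Doc.tgz")
if diff -r "$work/src/Doc" "$work/out/Doc"; then same=yes; else same=no; fi

echo "valid gzip: $is_gzip, trees identical: $same"
rm -rf "$work"
```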
Notes for the compression programs
gzip - has compression levels 1 through 9, where 9 is the most compression.
(You specify this using -N, where N is the compression level desired.)
As you might expect, the more compression you ask for, the longer it
takes and the more space it saves. The default level (6) is the best
tradeoff between time and space. gzip can also be used to uncompress
(with gzip -d, rather than gunzip), as gunzip simply calls gzip!
$ ls -l /bin/{gzip,gunzip}
-rwxr-xr-x. 1 root root    61 Jun 29 2010 /bin/gunzip
-rwxr-xr-x. 1 root root 68616 Jun 29 2010 /bin/gzip
$ file /bin/gunzip
/bin/gunzip: POSIX shell script text executable
$ cat /bin/gunzip
#!/bin/sh
PATH=${GZIP_BINDIR-'/bin'}:$PATH
exec gzip -d "$@"
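The effect of the compression level can be measured on any file you have at hand. The following is a minimal sketch, assuming a POSIX shell with gzip, mktemp, cmp, wc and seq; the sample data and the names fast.gz and best.gz are made up, and the sizes you see will depend on your data and gzip version.

```shell
#!/bin/sh
work=$(mktemp -d)
seq 1 20000 > "$work/data.txt"

# -c writes to standard output, so data.txt itself is kept.
gzip -1 -c "$work/data.txt" > "$work/fast.gz"
gzip -9 -c "$work/data.txt" > "$work/best.gz"

# $(( )) strips any padding that wc adds on some systems.
orig_size=$(($(wc -c < "$work/data.txt")))
fast_size=$(($(wc -c < "$work/fast.gz")))
best_size=$(($(wc -c < "$work/best.gz")))
echo "original: $orig_size bytes, -1: $fast_size bytes, -9: $best_size bytes"

# gzip -d is what gunzip runs; the output must match the original data.
if gzip -dc "$work/best.gz" | cmp -s - "$work/data.txt"; then
    best_ok=yes
else
    best_ok=no
fi
echo "level 9 output decompresses correctly: $best_ok"
rm -rf "$work"
```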
compress - older, faster compressor (not normally installed on Linux). gzip can uncompress files that have been compressed with compress.
bzip2 - compresses better than gzip, but is slower. You will see .tar.bz2 (or .tbz2) files on the Internet, but not nearly as often as .tar.gz files.