City College of San Francisco - CS260A Unix/Linux System Administration Module: Administration Basics I
Preview question: You are preparing for a new release and encounter a configuration file that you modified. As you should, you saved the original version, but you cannot remember what changes you made. The files are 246 lines each. How do you compare them so that you can decide if the changes are needed when you reinstall the system?
diff is used to compare two text files
diff first second
outputs information about what changes need to be made to first so that it matches second. For reasons we will see later, we will refer to the file that we want to change (first in the example above) as the file on the left. Similarly, the file we are comparing it against (second in the example above) will be referred to as the file on the right.
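diff distinguishes three types of alterations to the file on the left: adding lines (an add block), deleting lines (a delete block), and changing lines (a change block).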
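Before we look at what diff outputs for each of these three types of differences, let's consider what information we would need in order to implement an add block. In this alteration, we need to know the line in the file on the left after which the new lines go, and which lines of the file on the right should be copied there.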
This is exactly the information that diff outputs. To enable the viewer to decide whether she wants to incorporate the alteration, diff also outputs a copy of the lines from the file on the right that should be copied.
The example below shows part of the comparison of two files currently on our linux boxes:
Here is how to interpret this difference:
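As a small self-contained illustration of an add block, the sketch below builds two hypothetical files (left.txt and right.txt are made-up names, not the files discussed above) and compares them. The expected output in the comments assumes a diff that uses the traditional default format, such as GNU diff.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example files: right.txt has two extra lines at the end.
printf 'alpha\nbeta\n' > left.txt
printf 'alpha\nbeta\ngamma\ndelta\n' > right.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
add_block=$(diff left.txt right.txt || true)
printf '%s\n' "$add_block"
# 2a3,4
# > gamma
# > delta
```

Read 2a3,4 as: after line 2 of the file on the left, add lines 3 through 4 of the file on the right. The lines marked with > are the copies taken from the file on the right.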
We will now just reverse the comparison to create a delete block:
In this example, the d indicates delete, the line numbers to delete from the file on the left are indicated to the left of the d, and the number to the right of the d indicates the line number in the file on the right that you will be at after the deletion (i.e., these lines would have appeared at this line in the file on the right). Again, the left-pointing symbol indicates the lines shown are from the file on the left.
In this case, the fact that diff outputs copies of the lines in question in addition to their line numbers lets us see at a glance that this change is inconsequential, as the changes appear in a comment.
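A delete block can be reproduced with a minimal sketch as well. The files below are hypothetical (left.txt and right.txt are made-up names); the file on the left has two lines that the file on the right lacks. The expected output assumes the traditional default format of diff.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example files: left.txt has two extra lines at the end.
printf 'alpha\nbeta\ngamma\ndelta\n' > left.txt
printf 'alpha\nbeta\n' > right.txt

# diff exits with status 1 when the files differ; "|| true" ignores that.
delete_block=$(diff left.txt right.txt || true)
printf '%s\n' "$delete_block"
# 3,4d2
# < gamma
# < delta
```

Here 3,4d2 says to delete lines 3 through 4 from the file on the left, which leaves you at line 2 of the file on the right.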
Last, we will look at a change block from a different file (in /etc):
The change block above indicates that one line has changed. The c indicates a change block. The line number to the left of the c indicates the number of the line changed in the file on the left, and the line itself is output below with the left-facing symbol. The line number to the right of the c indicates the number of the line changed in the file on the right. Again, the line itself is output below with the right-facing symbol. The lines from the two files are separated by a dashed line. Of course, this change block consists of a single line, and the change did not insert or delete lines from the original, so the line number in each file is the same. If the block consisted of multiple sequential lines, each line number indicator would be in the form startline,stopline (see a later example).
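The startline,stopline form can be seen in a short sketch with two hypothetical files (left.txt and right.txt are made-up names) in which two adjacent lines differ. The expected output assumes the traditional default format of diff.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example files: lines 2 and 3 differ, lines 1 and 4 match.
printf 'one\ntwo\nthree\nfour\n' > left.txt
printf 'one\nTWO\nTHREE\nfour\n' > right.txt

# diff exits with status 1 when the files differ; "|| true" ignores that.
change_block=$(diff left.txt right.txt || true)
printf '%s\n' "$change_block"
# 2,3c2,3
# < two
# < three
# ---
# > TWO
# > THREE
```

Because no lines were inserted or deleted, the range is 2,3 on both sides of the c.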
Similarly, looking at a set of change blocks from a file comparison can show a pattern of changes:
-bash$ diff classify1.bash classify1.bash.sv
10c10
< echo "$progname: need exactly one argument" >&2
---
> echo "$progname: need exactly one argument"
15c15
< echo -e "$progname: \"$thisfile\" is neither a directory nor a file." >&2
---
> echo -e "$progname: \"$thisfile\" is neither a directory nor a file."
19c19
< echo -e "$progname: \"$thisfile\" is not readable." >&2
---
> echo -e "$progname: \"$thisfile\" is not readable."
Here we can see that the author edited classify1.bash to send error messages to standard error instead of standard output. Once you become comfortable with the output format of diff you can easily spot patterns of changes, even when they are mixed in with other output.
Options to diff
As with most Unix programs, diff has a number of options. There are only four that are important enough to mention here. We will illustrate the first two options and how they differ (!) with an example:
-bash$ diff classify1.bash classify1.bash.sv
26,30c26,28
< *directory*) echo "$thisfile is a directory" ;;
< *ascii*) echo "$thisfile is an ascii file" ;;
<
< *commands*) echo "$thisfile is a commands file" ;;
<
---
> *directory*) echo "$thisfile is a directory" ;;
> *ascii*) echo "$thisfile is an ascii file" ;;
> *commands*) echo "$thisfile is a commands file" ;;
-bash$
A quick look at this change block shows that the differences here are all due to whitespace. This is the most important type of information to suppress when using diff, as any change in indentation can pollute the output with a deluge of useless information.
There are two options to diff that suppress the output of differences that are only due to whitespace: -b, which ignores changes in the amount of whitespace, and -w, which ignores all whitespace.
Let's look at the above change block when we use each of these options:
With -b the differences due to amounts of whitespace between words are suppressed.
-bash$ diff -w classify1.bash classify1.bash.sv
28d27
<
30d28
<
-bash$
With -w, the differences due to the amount of leading whitespace are suppressed as well. Note that neither -w nor -b suppresses the appearance of additional blank lines in classify1.bash.
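The difference between the two options can be checked with a small sketch. The files are hypothetical (left.txt and right.txt are made-up names): line 1 differs only in the number of spaces between words, and line 2 differs only in leading whitespace. The behavior described in the comments is that of GNU diff.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Line 1: extra spaces between words. Line 2: leading spaces on the left only.
printf 'echo    "hello"\n    indented line\n' > left.txt
printf 'echo "hello"\nindented line\n' > right.txt

# -b treats runs of whitespace as equivalent, but whitespace versus
# no whitespace at the start of a line still counts as a difference.
with_b=$(diff -b left.txt right.txt || true)
printf '%s\n' "$with_b"
# 2c2
# <     indented line
# ---
# > indented line

# -w ignores all whitespace, so nothing is reported.
with_w=$(diff -w left.txt right.txt || true)
printf '%s\n' "$with_w"
```

With -b only the indentation change on line 2 remains; with -w the output is empty.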
Another option we will quickly show adds lines of context to the output of diff, so that you can review a few lines before and after the difference. This option (-CN, where N is the number of lines of context you want) also changes the output format:
Personally, I think this output format is more complicated, but here we go:
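As a rough illustration of the context format, the sketch below asks for one line of context around a single changed line. The files are hypothetical (left.txt and right.txt are made-up names), and the output shown in the comments leaves out the two header lines, which name the files and their modification times.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example files: only line 3 differs.
printf 'one\ntwo\nthree\nfour\nfive\n' > left.txt
printf 'one\ntwo\nTHREE\nfour\nfive\n' > right.txt

# -C 1 requests one line of context before and after each difference.
context=$(diff -C 1 left.txt right.txt || true)
printf '%s\n' "$context"
# (two header lines with file names and timestamps come first)
# ***************
# *** 2,4 ****
#   two
# ! three
#   four
# --- 2,4 ----
#   two
# ! THREE
#   four
```

In this format the section between the starred range and the dashed range comes from the file on the left, the section after the dashed range comes from the file on the right, and changed lines are marked with ! rather than < and >.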
The last option -s (that's lower-case s) has diff report files that are identical. Normally, running diff on two identical files produces no output. If you add the -s option, however, diff tells you the files are identical:
$ diff x.html x.html
$
$ diff -s x.html x.html
Files x.html and x.html are identical
$
Although getting used to even the basic output format of diff can be confusing, it is an extremely useful tool and is essential for keeping track of such things as changes to configuration files or differing configurations between releases of some package.
Using diff on a pair of directories
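Running diff on a pair of directories reports files that exist in only one of the two directories and, for each file that exists in both but differs, the usual diff output for that pair.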
No information is output about a file that is the same in both directories.
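A minimal sketch of a directory comparison follows. The directory and file names (left_dir, right_dir, common.txt, changed.txt, only_left.txt) are hypothetical, and the exact wording of the report in the comments is what GNU diff prints; other implementations may word it differently.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir left_dir right_dir
printf 'same\n'  > left_dir/common.txt      # identical in both directories
printf 'same\n'  > right_dir/common.txt
printf 'old\n'   > left_dir/changed.txt     # present in both, but different
printf 'new\n'   > right_dir/changed.txt
printf 'extra\n' > left_dir/only_left.txt   # present on the left only

# diff exits with status 1 when anything differs; "|| true" ignores that.
dir_report=$(diff left_dir right_dir || true)
printf '%s\n' "$dir_report"
# diff left_dir/changed.txt right_dir/changed.txt
# 1c1
# < old
# ---
# > new
# Only in left_dir: only_left.txt
```

Note that common.txt, which is the same in both directories, does not appear in the report at all.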
The comm utility
comm [option] file1 file2
comm compares two sorted files line by line and outputs three columns of information: lines found only in file1, lines found only in file2, and lines found in both files.
The options -1, -2, and -3 can be used to suppress the corresponding column. Thus,
comm -12 file1 file2
outputs lines that are common to both file1 and file2 (i.e., columns 1 and 2 are suppressed, leaving only column 3).
Remember, file1 and file2 must be sorted.
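The three columns and the effect of suppressing them can be sketched with two short, already sorted files. The file contents are made up for illustration; columns in the comm output are separated by tab characters.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Use one fixed collation order so "sorted" means the same thing to every tool.
export LC_ALL=C

# Two small sorted files with made-up contents.
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n'  > file2

# All three columns: only in file1, only in file2 (one tab), in both (two tabs).
comm file1 file2

# Lines common to both files (columns 1 and 2 suppressed).
both=$(comm -12 file1 file2)
printf '%s\n' "$both"
# banana
# cherry

# Lines found only in file1 (columns 2 and 3 suppressed).
only_first=$(comm -23 file1 file2)
printf '%s\n' "$only_first"
# apple
```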
comm is very useful when comparing two directories to find missing or common files. For example, consider the ls listing of two similar directories:
Obviously, these two directories have some common files. If you produce a listing of each directory and then run comm on the listings, you can see the comparison.
The directories have the same files except for one, which is only in the directory on the left. (It is not obvious from this listing, but column 2 is missing, which indicates that the directory on the right does not contain any files that the directory on the left does not contain.)
(Note that in this case, it would probably have been easier to run diff on the two listings.)
If you want to list only the common files, use comm -12:
$ comm -12 a.ls a.sv.ls
compression.html
diff.html
extracting_info.html
file_info.html
gathering_info.html
index.html
madewithNvu80x15clear.png
template.html
transferring_files.html
$
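The whole listing-and-compare workflow can be sketched end to end as follows. The directory, file, and listing names (left_dir, right_dir, left.ls, right.ls, notes.txt) are hypothetical stand-ins, not the directories used above.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ls and comm must agree on sort order, so fix the locale for both.
export LC_ALL=C

# Two hypothetical directories; notes.txt exists on the left only.
mkdir left_dir right_dir
touch left_dir/diff.html left_dir/index.html left_dir/notes.txt
touch right_dir/diff.html right_dir/index.html

# ls sorts its output, which is what comm requires.
ls left_dir  > left.ls
ls right_dir > right.ls

# Full three-column comparison of the two listings.
comm left.ls right.ls

# Files present in both directories.
common=$(comm -12 left.ls right.ls)
printf '%s\n' "$common"
# diff.html
# index.html

# Files present only in the directory on the left.
only_left=$(comm -23 left.ls right.ls)
printf '%s\n' "$only_left"
# notes.txt
```

Saving the listings to files keeps the example close to the a.ls and a.sv.ls listings used above.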