The while read loop
In introductory Unix, we experimented with standard input and standard output. One of the experiments you probably did was with the cat command:

    cat < file1 > file2

Here, cat reads each line from standard input and writes the line to standard output. It continues until the end-of-file condition is reached on file1, then it stops.
In shell programming, we can read a line from standard input using the read command. The read command succeeds as long as a line can be read. If the input reaches end-of-file, however, the next read fails. This is exactly what happens in the cat command.
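A quick sketch of this behavior (the file name f is made up for the demo): the first read succeeds with exit status 0; once end-of-file is reached, read returns a nonzero status.

```shell
#!/bin/bash
# demonstrate read's exit status at end-of-file
printf 'only line\n' > f
{
    read line
    echo "first read: status=$?, line='$line'"
    read line
    # this second read finds no more input, so its status is nonzero
    echo "second read: status=$? (end-of-file)"
} < f
rm -f f
```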
This 'failure' status can be tested in a while statement like this:

    while read line; do
        echo "standard input has not reached end-of-file"
    done
If you type this at the terminal, it is kind of silly, as the read command will read a line from the keyboard. Suppose we want to attach this to a file. One way to do this is:

    while read line < file1; do
        echo "'file1' has not reached end-of-file"
    done
This is perfectly legal. Here, however, file1 is re-opened each time the read command is encountered, so the loop reads the first line of file1 over and over. We want to attach file1 to the loop so that it is opened when the loop starts, and each successive iteration reads another line from the file. We could do this:

    cat file1 | while read line; do
    ...

but, if you recall, there are some issues with piping to loops and read commands, since the loop will now be executed in a child process. As long as you do not set any variables in the loop that you need after the loop has completed, this is fine. Alternately, we could redirect standard input to the loop:

    while read line; do
    ...
    done < file1
This looks strange, but it is consistent with other Unix commands: redirection normally appears at the end of the command, and the end of the while command is done. I can't emphasize the importance of this enough, folks: it allows us to process anything we want, so long as we can create a file of the things to process, one-per-line.
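Here is a small sketch of the child-process issue mentioned above (the file name lines.txt is made up for the demo): a count incremented inside a piped loop is lost when the loop ends, while the redirected loop keeps it.

```shell
#!/bin/bash
printf 'one\ntwo\nthree\n' > lines.txt

# piped loop: runs in a child process, so the change to count is lost
count=0
cat lines.txt | while read line; do
    count=$((count + 1))
done
echo "after piped loop: count=$count"          # prints count=0 in bash

# redirected loop: runs in the current shell, so count survives
count=0
while read line; do
    count=$((count + 1))
done < lines.txt
echo "after redirected loop: count=$count"     # prints count=3

rm -f lines.txt
```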
Processing files with while read
We said previously that a for loop is useful for processing files, but that the list must be created with a wildcard, and may not be "too large". We can also process the files in a directory using a while loop.
Example: Ensure that each text file in the directory $dir has the extension .txt
We will iterate over the files in the directory. For each file that is a text file, we'll see if the filename ends in .txt (using grep to match the pattern), and, if it doesn't, rename it.

    # note that ls outputs names one-per-line if it is writing to a file
    ls "$dir" > $$tmp
    while read file; do
        # remember, the filename does not have the directory component!
        # we must add it
        if file "$dir/$file" | grep ':.*text' > /dev/null; then
            if ! echo "$file" | grep '\.txt$' > /dev/null; then
                mv "$dir/$file" "$dir/$file.txt"
            fi
        fi
    done < $$tmp
    rm -f $$tmp

(Note the > /dev/null on both greps: we only want their exit status, not their output.)
Note that we could have used a pipe here, as we do not set any variables in our loop that are needed when it exits. The loop would then look like:

    # note that ls outputs names one-per-line if it is writing to a file
    ls "$dir" | while read file; do
        # remember, the filename does not have the directory component!
        # we must add it
        if file "$dir/$file" | grep ':.*text' > /dev/null; then
            if ! echo "$file" | grep '\.txt$' > /dev/null; then
                mv "$dir/$file" "$dir/$file.txt"
            fi
        fi
    done
There are two problems with our code:
- we should ensure that the directory is writable before attempting the loop. If it isn't writable, we cannot rename any files!
- we did not check for a name collision before performing the mv. Thus, if we had two files, file1 and file1.txt, when the loop starts, we just deleted file1.txt and renamed file1 to file1.txt!

How would you fix these problems? (See the Examples for this module for the solution.)
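One possible sketch of the two checks, using a throwaway directory for the demo (this is just one approach; see the module Examples for the official solution, and note it assumes the file command reports "text" for these files):

```shell
#!/bin/bash
# demo setup: a throwaway directory with a name collision waiting to happen
dir=$(mktemp -d)
echo hello > "$dir/file1"
echo world > "$dir/file1.txt"

# fix 1: make sure the directory is writable before attempting the loop
if [ ! -w "$dir" ]; then
    echo "cannot rename files: '$dir' is not writable" >&2
    exit 1
fi

ls "$dir" > $$tmp
while read file; do
    if file "$dir/$file" | grep ':.*text' > /dev/null; then
        if ! echo "$file" | grep '\.txt$' > /dev/null; then
            # fix 2: refuse to clobber an existing file
            if [ -e "$dir/$file.txt" ]; then
                echo "skipping '$file': '$file.txt' already exists" >&2
            else
                mv "$dir/$file" "$dir/$file.txt"
            fi
        fi
    fi
done < $$tmp
rm -f $$tmp

ls "$dir"     # both file1 and file1.txt are still there
rm -rf "$dir"
```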
You may say: it was easier with a for loop. That is correct. But suppose we wanted to do a job like this recursively to, say, all the files beneath our home directory. We could modify this loop easily for the task. You should not use a for loop for this, because the find command may output "too much" data for a for loop:

    find "$dir" -type f > $$tmp
    # then remove the value for dir, as find outputs paths with the directory prefix
    dir=
    # the loop is the same as that above!
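Putting the pieces together, the recursive version might be sketched like this (the tree layout is made up for the demo; since find printed full paths here, emptying dir just leaves a harmless extra leading slash in "$dir/$file"):

```shell
#!/bin/bash
# demo setup: a throwaway tree with a nested text file
top=$(mktemp -d)
mkdir "$top/sub"
echo hello > "$top/file1"
echo nested > "$top/sub/file2"

dir=$top
find "$dir" -type f > $$tmp
# remove the value for dir, as find outputs paths with the directory prefix
dir=
# the loop is the same as that above
while read file; do
    if file "$dir/$file" | grep ':.*text' > /dev/null; then
        if ! echo "$file" | grep '\.txt$' > /dev/null; then
            mv "$dir/$file" "$dir/$file.txt"
        fi
    fi
done < $$tmp
rm -f $$tmp

find "$top" -type f   # both files now end in .txt
rm -rf "$top"
```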
Processing command-line arguments
The most general loop for processing command-line arguments is the while loop. Most shell scripts do not have complex command-line arguments, and, if they do, they use the facility called getopts. getopts is covered pretty well in your book, so we won't cover it in the notes. We will, instead, go through a few examples of processing complex command-line arguments by hand. First, let's discuss the problems:
- options may be in any order, but they must precede filenames.
- some options may have an argument. Examples you may be familiar with are the -o option to sort, the -t option to sort, and the -f option to cut.
- some options are mutually exclusive. For example, the -f and -c options to cut are mutually exclusive: if you have one, you can't have the other.
Our example will be for the fictitious command moo. moo has the following syntax:
- it can have any combination of the options -a, -b, and -c
- it must have one of the options -x or -y, but not both.
- it can have the option -o, which must be followed by a filename. The filename may or may not exist.
- the options must be followed by at least one input filename.
Here is how we will handle these options:
- each of -a, -b, -c, -x, and -y will have a flag variable (aflag, etc.), which we will set to the string "true" if the option was seen.
- if we encounter -x or -y, we will check if the other one was seen already. If so, we will output a warning, then turn off the previous one and turn on the new one. (You could, alternately, only honor the first one.)
- if we encounter -o, the argument following it will be placed in the variable ofile.
- redundant options are ignored.
- after the options are processed, we will check that one of -x and -y was seen. $1 will point to the first filename in the input files. (Note that we cannot run through the input files, since doing so would remove them from the arguments, and we would need an array to hold them for later.)
Here is the code for moo's argument handling. It is in this module's directory on hills.

    #!/bin/bash
    #
    # moo - sample of command-line argument handling
    #
    # this fictitious program accepts the following arguments
    #   options a, b, c, which can appear in any combination
    #   options x and y that are mutually exclusive. One of them is required.
    #   option o has a following argument
    #   input files must follow the options.
    #

    error () {
        # error: output an error message and return
        # the error messages contain the program name
        # and are written to standard error
        # modifies the global variable errors
        local prog=$(basename $0)
        echo -e "$prog:ERROR: $*" >&2
        errors=true
    }

    fatal() {
        # fatal: output an error message and then exit
        # the error() function is used.
        error "$*"
        exit 1
    }

    #
    # your program goes here
    #
    while [ $# -gt 0 ]; do
        case "$1" in
            -a) aflag=true;;
            -b) bflag=true;;
            -c) cflag=true;;
            -x) xflag=true
                if [ "$yflag" = true ]; then
                    error "-x conflicts with -y. -y ignored"
                    yflag=
                fi;;
            -y) yflag=true
                if [ "$xflag" = true ]; then
                    error "-y conflicts with -x. -x ignored"
                    xflag=
                fi;;
            -o) if [ $# -eq 1 ]; then
                    fatal "-o requires an argument"
                fi
                shift
                ofile="$1";;
            -*) error "illegal option '$1'";;
            *)  break;;
        esac
        shift
    done

    if [ "$xflag" != true -a "$yflag" != true ]; then
        error "one of -x or -y must be specified"
    fi

    echo "aflag=$aflag"
    echo "bflag=$bflag"
    echo "cflag=$cflag"
    echo "xflag=$xflag"
    echo "yflag=$yflag"
    echo "ofile=$ofile"
    echo "filelist=$*"
    echo "errors=$errors"

    if [ "$errors" = true ]; then
        exit 1
    fi
    exit 0
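To see how $1 ends up pointing at the first input filename, here is a runnable sketch using a stripped-down subset of moo's case statement (the file name moo_demo.sh and the reduced option set are made up for the demo; the full script above handles more cases):

```shell
#!/bin/bash
# write a simplified version of moo's argument loop, then run it
cat > moo_demo.sh <<'EOF'
while [ $# -gt 0 ]; do
    case "$1" in
        -a) aflag=true;;
        -x) xflag=true;;
        -o) shift; ofile="$1";;
        -*) echo "illegal option '$1'" >&2;;
        *)  break;;
    esac
    shift
done
echo "xflag=$xflag ofile=$ofile first_file=$1"
EOF

bash moo_demo.sh -a -x -o out.txt file1 file2
# prints: xflag=true ofile=out.txt first_file=file1
rm -f moo_demo.sh
```

Note how the -o case shifts once to consume its argument, and the outer shift then moves past it; when the loop breaks on file1, the remaining positional parameters are exactly the input files.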
Preview question:
In the next module we will discuss aliases, shell options, and substrings. Preview the information on aliases by looking at the bash man page and searching for the string Alias. (You can do this by typing /Alias followed by the ENTER key after you type man bash.)
Copyright 2010 Greg Boyd - All Rights Reserved.
