City College of San Francisco - CS160B
Linux Shell Scripting
Module: Loops2

The while read loop

In introductory Linux, we experimented with standard input and standard output. One of the experiments you probably did used the cat command:

cat < file1 > file2

Here, cat reads each line from standard input and writes the line to standard output. It continues until the end-of-file condition is reached on file1, then it stops.

In shell programming, we can read a line from standard input using the read command. The read command succeeds (returns a zero exit status) as long as a line can be read. If the input reaches end-of-file, however, the next read fails. This is exactly what happens in the cat command.
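You can watch read's exit status directly. This short demo uses a scratch file name chosen just for the demo, and shows the third read failing at end-of-file:

```shell
# create a two-line scratch file (the name is just for this demo)
tmp=/tmp/readdemo.$$
printf 'alpha\nbeta\n' > "$tmp"

exec 3< "$tmp"      # open the file on descriptor 3
read line1 <&3      # succeeds: exit status 0
read line2 <&3      # succeeds: exit status 0
read line3 <&3      # fails: end-of-file, nonzero exit status
status=$?

echo "line1=$line1 line2=$line2 status=$status"
exec 3<&-           # close descriptor 3
rm -f "$tmp"
```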

This 'failure' status can be tested in a while statement like this:

while read line; do
    echo "standard input has not reached end-of-file"
done

If you type this at the terminal it is kind of silly, as the read command will just read lines from the keyboard. Suppose we want to attach the loop to a file instead. One way to do this is:

while read line < file1; do
    echo "'file1' has not reached end-of-file"
done

This is perfectly legal. Here, however, file1 is re-opened each time the read command is encountered, so the loop reads the first line of file1 over and over, forever. We want to attach file1 to the loop itself, so that the file is opened once when the loop starts and each successive iteration reads another line from it. We could do this:

cat file1 | while read line; do
...

but if you recall, there are some issues with piping to loops and read commands, since the loop will now be executed in a child process. So long as you do not set any variables in the loop that you need after the loop has completed, this is fine. Alternately, we could redirect standard input to the loop:

while read line; do
    ...
done < file1

This looks strange, but it is consistent with other Unix commands: redirection normally appears at the end of a command, and the end of the while command is the word done.
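The variable-visibility difference between the two attachments can be demonstrated directly. In this sketch (using printf to generate three lines of input, and a scratch file name chosen for the demo), the piped loop's count is lost, while the redirected loop's count survives:

```shell
# piping: the loop runs in a child process, so its variables vanish
count=0
printf 'a\nb\nc\n' |
while read line; do
    count=$((count + 1))
done
piped_count=$count            # still 0: the child's count was discarded

# redirecting: the loop runs in the current shell
count=0
tmp=/tmp/countdemo.$$         # scratch file name chosen for this demo
printf 'a\nb\nc\n' > "$tmp"
while read line; do
    count=$((count + 1))
done < "$tmp"
redirected_count=$count       # 3: the variable survived the loop
rm -f "$tmp"

echo "piped=$piped_count redirected=$redirected_count"
```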

I can't emphasize the importance of this enough, folks: this technique allows us to process anything we want, so long as we can create a file listing the things to process, one per line.

Processing files with while read

We said previously that a for loop is useful for processing files, but that the list must be created with a wildcard, and may not be "too large". We can also process the files in a directory using a while loop.

Example: Ensure that each text file in the directory $dir has the extension .txt

We will iterate over the files in the directory. For each file that is a text file, we'll see if the filename ends in .txt and, if it doesn't, rename it. We will use the file utility to tell us whether the file is a text file or not. file outputs the word text as part of the description of any file that is text, but some things (such as bash shell scripts) are identified as text and should not be renamed. We will discriminate between 'pure text' and 'other text' by saying that if the last word of the description is text, the file should have the extension .txt.
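The file-plus-grep test can be tried on its own before building the loop. This demo creates two scratch files of our own choosing; on a typical Linux system file describes the first as something like "ASCII text" and the second as "Bourne-Again shell script, ASCII text executable", so the anchored pattern matches only the first:

```shell
tmp=/tmp/filetest.$$                       # scratch names for this demo
printf 'just plain words\n' > "$tmp.plain"
printf '#!/bin/bash\necho hi\n' > "$tmp.script"

plain_is_text=false
script_is_text=false
# the anchor $ requires "text" to be the last word of the description
file "$tmp.plain"  | grep ':.*text$' > /dev/null && plain_is_text=true
file "$tmp.script" | grep ':.*text$' > /dev/null && script_is_text=true

echo "plain_is_text=$plain_is_text script_is_text=$script_is_text"
rm -f "$tmp.plain" "$tmp.script"
```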

# note that ls outputs names one-per-line if it is writing to a file
ls "$dir" > $$tmp

while read file ; do
    # remember, the filename does not have the directory component!
    # we must add it
    if file "$dir/$file" | grep ':.*text$' > /dev/null; then
        if ! [[ "$file" =~ .txt$ ]]; then
            mv "$dir/$file" "$dir/$file.txt"
        fi
    fi
done < $$tmp
rm -f $$tmp

Note that we could have used a pipe here, as we do not set any variables in our loop that are needed when it exits. The loop would then look like:

# note that ls outputs names one-per-line if it is not writing to the terminal
ls "$dir" |
while read file ; do
    # remember, the filename does not have the directory component!
    # we must add it
    if file "$dir/$file" | grep ':.*text$' > /dev/null; then
        if ! [[ "$file" =~ .txt$ ]]; then
            mv "$dir/$file" "$dir/$file.txt"
        fi
    fi
done

There are two problems with our code:

How would you fix these problems? (see the Examples for this module for the solution.)

You may say: it was easier with a for loop. That is correct. But suppose we wanted to do a job like this recursively to, say, all the files beneath our home directory. We could easily modify this loop for that task. You should not use a for loop for it, for two reasons: 1) since you cannot generate the list with a single wildcard, you would have to command-substitute the output of the find command to form the for loop's list, and then you lose the difference between whitespace embedded in a name and the whitespace that separates names; and 2) the find command may output "too much" data for a for loop.

find "$dir" -type f > $$tmp
# then remove the value for dir, as find outputs paths with the directory prefix
dir=
# the loop is the same as that above!
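Putting the pieces together, the recursive job might be sketched like this. The directory, the scratch file names, and the sample file are all assumptions made for the demo; note also that this sketch escapes the period in the regular expression (\.txt$) so that it matches a literal period rather than any character:

```shell
# sketch: give every plain-text file under $dir a .txt extension
demo_dir=/tmp/txtdemo.$$      # assumption: a throwaway directory for the demo
mkdir -p "$demo_dir/sub"
printf 'plain words\n' > "$demo_dir/sub/notes"

dir=$demo_dir
tmp=/tmp/txtfix.$$            # scratch file holding the list of names
find "$dir" -type f > "$tmp"
dir=                          # find already prefixed each name with the directory

while read file ; do
    if file "$dir/$file" | grep ':.*text$' > /dev/null; then
        if ! [[ "$file" =~ \.txt$ ]]; then    # backslash: match a literal period
            mv "$dir/$file" "$dir/$file.txt"
        fi
    fi
done < "$tmp"
rm -f "$tmp"

renamed=false
[ -f "$demo_dir/sub/notes.txt" ] && renamed=true
echo "renamed=$renamed"
```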

An alternate method

Recently (or as recently as bash gets), a new syntax was added to bash that allows you to feed data to a loop without using a pipe. This newer syntax, called a here string, lets you use command-substitution to generate a fake file (akin to a here document) that is placed on standard input for the loop to read. This avoids the problem of the pipe hiding the loop's variables. The loop above is rewritten below to use this new syntax, the triple input-redirection operator <<<


while read file ; do
    # remember, the filename does not have the directory component!
    # we must add it
    if file "$dir/$file" | grep ':.*text$' > /dev/null; then
        if ! [[ "$file" =~ .txt$ ]]; then
            mv "$dir/$file" "$dir/$file.txt"
        fi
    fi
done <<< "$(ls "$dir")"

Note the quotes around the command-substitution, which may seem counter-intuitive. They preserve the line breaks in ls's output, so the list does not "appear" to be on a single line.
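Because the loop is no longer on the end of a pipe, variables set inside it survive. A minimal sketch, using printf in place of ls to generate three lines of input:

```shell
count=0
last=
while read name; do
    count=$((count + 1))
    last=$name
done <<< "$(printf 'red\ngreen\nblue\n')"

# both variables are still visible after the loop
echo "read $count lines; last was $last"    # read 3 lines; last was blue
```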

Processing command-line arguments

The most general loop for processing command-line arguments is the while loop. Most shell scripts do not have complex command-line arguments, and, if they do, they use the facility called getopts. getopts is covered pretty well in your book, so we won't cover it in the notes. We will, instead, go through a few examples of processing complex command-line arguments by hand. First, let's discuss the problems:

Our example will be for the fictitious command moo. moo has the following syntax:

moo [-a] [-b] [-c] -x|-y [-o file] file ...

Here is how we will handle these options:

- If both -x and -y appear, we report an error; the later option wins and the earlier one is ignored.
- If -o appears without a following argument, that is a fatal error and the script exits immediately.
- An unrecognized option (anything else beginning with -) is reported as an error.
- The first argument that does not begin with - ends option processing; it and everything after it are the input files.
- If neither -x nor -y was given, that is an error.
- If any non-fatal errors occurred, the script exits with status 1 after reporting all of them.

Here is the code for moo's argument handling. It is in this module's directory on hills.

#!/bin/bash
#
#
# moo - sample of command-line argument handling
#
# this fictitious program accepts the following arguments
# options a,b,c, which can appear in any combination
# options x and y that are mutually exclusive. One of them is required.
# option o has a following argument
# input files must follow the options.
#
error () {
    # error: output an error message and return
    # the error messages contain the program name
    # and are written to standard error
    # modifies the global variable errors
    local prog=$(basename "$0")
    echo -e "$prog:ERROR: $*" >&2
    errors=true
}

fatal() {
    # fatal: output an error message and then exit
    # the error() function is used.
    error "$*"
    exit 1
}

#
# your program goes here
#
while [ $# -gt 0 ]; do
    case "$1" in
    -a)     aflag=true;;
    -b)     bflag=true;;
    -c)     cflag=true;;
    -x)     xflag=true
            if [ "$yflag" = true ]; then
                error "-x conflicts with -y. -y ignored"
                yflag=
            fi;;
    -y)     yflag=true
            if [ "$xflag" = true ]; then
                error "-y conflicts with -x. -x ignored"
                xflag=
            fi;;
    -o)     if [ $# -eq 1 ]; then
                fatal "-o requires an argument"
            fi
            shift
            ofile="$1";;
    -*)     error "illegal option '$1'" ;;
    *)      break;;
    esac
    shift
done
if [ "$xflag" != true -a "$yflag" != true ]; then
    error "one of -x or -y must be specified"
fi
echo "aflag=$aflag"
echo "bflag=$bflag"
echo "cflag=$cflag"
echo "xflag=$xflag"
echo "yflag=$yflag"
echo "ofile=$ofile"
echo "filelist=$*"
echo "errors=$errors"
if  [ "$errors" = true ];then
    exit 1
fi
exit 0
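The shift-and-case pattern can be exercised on its own with set --, which replaces the positional parameters. This sketch (a trimmed-down copy of moo's loop, with a made-up argument list) shows how the options are consumed and the file list is left behind:

```shell
# simulate a command line: two flags, an option with an argument, two files
set -- -a -o out.txt -x file1 file2

aflag= ; xflag= ; yflag= ; ofile=
while [ $# -gt 0 ]; do
    case "$1" in
    -a) aflag=true;;
    -x) xflag=true;;
    -y) yflag=true;;
    -o) shift                 # the option's argument is the next parameter
        ofile="$1";;
    -*) echo "illegal option '$1'" >&2;;
    *)  break;;               # first non-option: the file list starts here
    esac
    shift
done

echo "aflag=$aflag xflag=$xflag ofile=$ofile files=$*"
# aflag=true xflag=true ofile=out.txt files=file1 file2
```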


This page was made entirely with free software on Linux: Kompozer and LibreOffice.

Copyright 2016 Greg Boyd - All Rights Reserved.
