City College of San Francisco - CS160B
Linux Shell Scripting
Module: Loops1

for loop

for loops rely on generating a list of items to work on. Before we discuss for loops, then, we must define a term that has appeared before but is crucial to understanding the for loop: a list.

Lists

When these notes refer to a list, they mean a sequence of text words. You can think of a list as what the shell creates from the Linux command-line.

Example: if the command-line arguments were "hello there" shell script, the list created from it has three "words": hello there, shell, and script.

arguments: "hello there" shell script
list (3 items): hello there, shell, script

A listing of the current directory shows:

$ ls -1
file1
hello
this file
to day

If this list is created using *, the list items correspond exactly to the filenames (four items): file1, hello, this file, and to day.

If the list is created using command-substitution, we may have some problems. This is discussed soon.
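
One way to check how the shell divides a command line into list items is to count them. The following is a minimal sketch; count_items is a hypothetical helper name, not something defined elsewhere in these notes.

```shell
#!/bin/bash
# count_items (hypothetical helper) reports how many list items it received,
# then prints each one in brackets so the item boundaries are visible.
# "$@" is covered later in this module.
count_items() {
    echo "$# items"
    for item in "$@"; do
        echo "[$item]"
    done
}

# The quoted words form a single item, so this list has three items.
count_items "hello there" shell script
```

The brackets make it easy to see that hello there arrives as one item rather than two.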

The for loop

The for loop is the loop of choice if you need to process a list of items. The most common use of a for loop is to process a set of files:

for file in *; do
do something to "$file"
done

In this loop, the next item in the list is assigned to the variable file (the iteration variable) each time through the loop. In this example, the list is generated by the wildcard *, which expands to every non-hidden name in the current directory. The loop stops when the list is exhausted. The for loop is therefore the loop of choice if you can list the things you want to process, as there is no danger of the loop being infinite.

There is one caveat: the list cannot be too big. The shell builds the entire list in memory before the loop starts, and any list you hand to an external command must also fit within the maximum length of a Linux command line. This limit is system-dependent. A conservative rule of thumb is that if you can fit the list in a standard-sized terminal window, you should be safe. We will discuss a more general loop for processing unlimited amounts of data in the next module.
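
Here is a runnable version of the loop above, as a rough sketch. The helper name show_names and the scratch filenames are made up for illustration, and echo stands in for "do something".

```shell
#!/bin/bash
# show_names (hypothetical helper) runs the loop in the directory given
# as $1. The parentheses create a subshell, so the cd does not affect
# the caller's current directory.
show_names() {
    (
        cd "$1" || exit 1
        for file in *; do
            echo "processing '$file'"
        done
    )
}

# Try it on a scratch directory so no real files are involved.
scratch=$(mktemp -d)
touch "$scratch/notes.txt" "$scratch/my report" "$scratch/.hidden"
show_names "$scratch"
rm -rf "$scratch"
```

The hidden file .hidden is not processed, because * does not match names that begin with a dot.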

Exit condition

The for loop stops when the list is exhausted. You can obtain further control of the loop by using break and continue:

Example:

Copy all regular files from the current directory to the directory $TGT

for file in *; do
[ ! -f "$file" ] && continue
if cp "$file" "$TGT" 2>/dev/null; then
echo "'$file' copied"
fi
done
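
The example above uses continue. For comparison, here is a small sketch that also uses break to leave the loop early. The helper name first_regular_file is hypothetical.

```shell
#!/bin/bash
# first_regular_file (hypothetical helper) prints the first regular file
# in directory $1, then stops looking.
first_regular_file() {
    (
        cd "$1" || exit 1
        for file in *; do
            [ ! -f "$file" ] && continue   # not a regular file: try the next item
            echo "$file"
            break                          # one match is enough: leave the loop
        done
    )
}

# Scratch directory containing one subdirectory and two regular files.
scratch=$(mktemp -d)
mkdir "$scratch/a_dir"
touch "$scratch/b file" "$scratch/c"
first_regular_file "$scratch"
rm -rf "$scratch"
```

continue skips the rest of the body for the current item only, while break abandons the remaining items entirely.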

List generation

The most important thing about the for loop is the generation of the list. Going over a few suggestions can ease this process and make your for loops easier to write. Remember, the list must be tokenized (i.e., divided into list items), so you should not enclose it in quotes:

for item in a b c; do      # processes three items: a, b, and c

for item in "a b c"; do    # processes one item: a b c
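
The difference is easy to see when each item is printed in brackets. This is only a sketch, and the two function names are invented for the demonstration.

```shell
#!/bin/bash
# Unquoted: the shell splits the text into three list items.
unquoted_demo() {
    for item in a b c; do
        echo "[$item]"
    done
}

# Quoted: the whole string is a single list item.
quoted_demo() {
    for item in "a b c"; do
        echo "[$item]"
    done
}

unquoted_demo    # prints [a], [b] and [c] on separate lines
quoted_demo      # prints [a b c]
```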

The list can, of course, just be text, like the examples above. This is rarely useful, but it does happen:

for item in item1 item2 item3 .... ; do

Here, item is set successively to item1, item2, item3, etc.

The list is usually generated by command- or variable-substitution. The list is then broken apart using the delimiters in $IFS. For example, if the third field of a colon-delimited file $datafile contains an integer and you want to add together all the integers in this field, you could command-substitute the command cut -d: -f3 "$datafile" to generate the list:

for num in $(cut -d: -f3 "$datafile")

The list here contains the third field of each line in $datafile. Each field is followed by a newline, which is an IFS delimiter. 
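
Putting the pieces together, the addition described above might look like the following sketch. The function name sum_third_field and the sample data are invented for illustration.

```shell
#!/bin/bash
# sum_third_field (hypothetical helper) adds up the integers found in the
# third colon-delimited field of the file named by $1.
sum_third_field() {
    total=0
    for num in $(cut -d: -f3 "$1"); do
        total=$((total + num))
    done
    echo "$total"
}

# Sample data, made up for this example.
sample=$(mktemp)
printf '%s\n' 'alice:x:10' 'bob:x:25' 'carol:x:7' > "$sample"
sum_third_field "$sample"    # prints 42
rm -f "$sample"
```

Leaving $(cut ...) unquoted is deliberate here: the output must be split into one list item per number.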

Since the result of command- or variable- substitution is broken apart on the delimiters in $IFS, you cannot easily substitute names that contain embedded spaces. This creates issues with filenames imported from other systems. One easy solution to this is to rely on wildcards to generate filenames. Wildcards are expanded after substituted text is retokenized. When the wildcard is expanded, the boundaries between the generated filenames are preserved, rather than being subject to retokenization. Let's look at a simple example of this to see what I mean.

The current directory contains:

$ ls -1
file2
file3
the other file
$

We will now process the list when it is generated using command-substitution:

$ for file in $(ls); do
> echo "$file"
> done
file2
file3
the
other
file
$

Now we will process the list when it is generated by a wildcard:

$ for file in *; do
> echo "$file"
> done
file2
file3
the other file
$

When the list of files is generated using command-substitution, the resulting text is broken apart using the delimiters in $IFS. When it is generated using the wildcard, the boundaries between the filenames are preserved. Note that the quotes around the iteration variable "$file" in the echo command are very important.

I know this is confusing. So just remember: use a wildcard to generate a list of files.

(Note: If a command- or variable-substitution is used to generate the list, the list is affected by changes in the $IFS characters. In the example above, if IFS had contained only a newline, the for loop using * and the for loop using $(ls) would have functioned the same. Yes, I know, now you're really confused. If this is the case, just use wildcards when you can.)
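
For the curious, the IFS effect described in the note can be tried out with a sketch like this one. The helper name list_with_newline_ifs is hypothetical, and the scratch directory recreates the three files from the example.

```shell
#!/bin/bash
# list_with_newline_ifs (hypothetical helper) lists directory $1 using
# $(ls), but with IFS set to a newline only. The subshell keeps both the
# cd and the IFS change from leaking into the rest of the script.
list_with_newline_ifs() {
    (
        cd "$1" || exit 1
        IFS='
'
        for file in $(ls); do
            echo "[$file]"
        done
    )
}

scratch=$(mktemp -d)
touch "$scratch/file2" "$scratch/file3" "$scratch/the other file"
list_with_newline_ifs "$scratch"    # the other file stays in one piece
rm -rf "$scratch"
```

A filename that itself contains a newline would still be split, so the wildcard remains the safer choice.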

Processing command-line arguments

A for loop can easily be used to process command-line arguments, since the default list is the command-line arguments. This means you do not have to specify a list.

Suppose our shell script myss was invoked like this:

myss 'these are the' command-line arguments

Inside myss, of course, there are three arguments. The most obvious way to write a for loop would be

for arg in $*; do
echo "$arg"
done

but since $* is a variable that contains all the command-line arguments as a string, the result of the substitution is broken up on spaces, giving the result

these
are
the
command-line
arguments

The simplest way around this is to just delete the list:

for arg; do
echo "$arg"
done

The result now is

these are the
command-line
arguments

You can also get this result with an explicit list by using the other special parameter, $@.

Using $@

$@ is similar to $*. It is all the command-line arguments in one variable. However, $* is a string (or a list) that contains the text of each command-line argument separated from the next by a space. $@ is an array of the command-line arguments that preserves the actual boundaries between arguments. The trick is how to preserve this information when you substitute its value.

By default, substituting $@ is exactly the same as substituting $* - the result of the substitution is broken up using $IFS delimiters. Thus, the two loops 

for arg in $*; do

and

for arg in $@; do

produce exactly the same results. If you want to preserve the boundaries between the command-line arguments (in other words, its quality as an array), you need to use a counter-intuitive trick: put double-quotes around $@. Without the quotes,

for arg in $@; do
echo "$arg"
done

generates

these
are
the
command-line
arguments

while 

for arg in "$@"; do
echo "$arg"
done

generates

these are the
command-line
arguments
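
To compare the forms side by side, you can count how many items each one produces. This is a sketch only; count_args and compare_forms are hypothetical names, and a function stands in for the script myss, since inside a function the positional parameters are the function's own arguments.

```shell
#!/bin/bash
# count_args (hypothetical helper) prints the number of arguments it received.
count_args() {
    echo "$#"
}

# compare_forms (hypothetical helper) passes its own arguments along in
# three different ways and reports how many items each way produces.
compare_forms() {
    printf '%s: %s items\n' 'unquoted $*' "$(count_args $*)"
    printf '%s: %s items\n' 'unquoted $@' "$(count_args $@)"
    printf '%s: %s items\n' 'quoted "$@"' "$(count_args "$@")"
}

compare_forms 'these are the' command-line arguments
```

The two unquoted forms report 5 items and the quoted form reports 3, matching the loop outputs shown above.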

I know this is counter-intuitive, but think of $@ as being

$1 $2 $3 $4 $5 ...

and "$@" as being

"$1" "$2" "$3" "$4" "$5" ...

$@ is actually an array. The book covers arrays in general, but they are not very useful in the shell and are really ugly. Since you can gain access to the array $@ easily using set --, you can use it as the only array you really need. Returning to our example

$ ls -1
file2
file3
the other file
$

The command 

set -- *

fills the positional parameters correctly:

$ echo "$3"
the other file
$
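
The same idea can be tried safely in a scratch directory. In this sketch, positional_demo is a hypothetical helper name, and the subshell keeps the new positional parameters from replacing those of the calling script.

```shell
#!/bin/bash
# positional_demo (hypothetical helper) loads the filenames in directory $1
# into the positional parameters, then reports the count and the third name.
positional_demo() {
    (
        cd "$1" || exit 1
        set -- *              # the positional parameters now hold the filenames
        echo "count: $#"
        echo "third: $3"
    )
}

scratch=$(mktemp -d)
touch "$scratch/file2" "$scratch/file3" "$scratch/the other file"
positional_demo "$scratch"
rm -rf "$scratch"
```

After set -- *, you can walk through the filenames with for arg; do or "$@", exactly as with command-line arguments.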

Iterating using an integer sequence

In some cases, you really do want to iterate a for loop through an integer sequence. As long as you really have a reason to do this, and are not just a Java or C++ programmer trying to get the shell for loop to work like a for loop in those languages, this is fine. There is a C-style for loop available in the shell. I have never used it, so you'll have to research it on your own.
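
As a starting point for that research, here is a minimal sketch of the C-style form. It is a bash feature rather than part of the standard POSIX shell, and count_up is a hypothetical helper name.

```shell
#!/bin/bash
# count_up (hypothetical helper) prints the integers from 1 to $1 using
# bash's arithmetic for loop: initializer; test; update.
count_up() {
    local i
    for (( i = 1; i <= $1; i++ )); do
        echo "$i"
    done
}

count_up 3    # prints 1, 2 and 3 on separate lines
```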

You can, however, generate a list of integers to iterate on and use a standard for loop to process the list. You can do this using the seq command (see the man page) or just using {x..y} where x and y are integers:

$ for s in {1..5}; do
> echo "$s"
> done
1
2
3
4
5
$

The sequence does not have to increase, and you can specify the interval between each list item:

$ for s in {10..1..-2}; do
> echo "$s"
> done
10
8
6
4
2
$
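
The seq command mentioned above produces the same kinds of lists. One practical difference is that seq accepts a variable as a limit, whereas {1..$n} does not work because brace expansion happens before variables are substituted. A short sketch, with count_to as a hypothetical helper name:

```shell
#!/bin/bash
# count_to (hypothetical helper) prints the integers from 1 to $1.
# The limit comes from a parameter, which {x..y} cannot handle.
count_to() {
    for s in $(seq 1 "$1"); do
        echo "$s"
    done
}

count_to 3    # prints 1, 2 and 3 on separate lines

# With three arguments, seq takes FIRST INCREMENT LAST.
# This counts down from 10 by twos: 10 8 6 4 2.
for s in $(seq 10 -2 1); do
    echo "$s"
done
```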



Copyright 2016 Greg Boyd - All Rights Reserved.
