Quoting

City College of San Francisco - CS160B
Unix/Linux Shell Scripting
Module: Scripting Basics2

Command-line processing

Before we discuss quoting, we must talk a bit about how the shell processes the command line. This process is quite complex, so we will simplify it, which unfortunately involves a bit of hand-waving.

Every character that the shell processes may be classified as a delimiter, a metacharacter, both, or neither. A metacharacter has special meaning to the shell. A delimiter marks the end of one thing and the start of another.

1. Processing of a command begins by breaking it into tokens using a fixed set of delimiters: space, tab, newline, semicolon, parentheses, <, >, pipe (|), and &.
2. Tokens are examined for quoting. Tokens are merged as indicated by quoting into larger tokens. Quoted tokens skip part of the remainder of the processing. Aliases and compound command keywords (like if) are handled specially. Likewise, shell directives (like standard output redirection) and the tokens that accompany them are merged (or at least this is the way we will think of it).
3. Brace and tilde expansion are performed.
4. Variable, command, and arithmetic substitution are performed.
5. The results of expansions are re-tokenized using the delimiters in $IFS (by default, space, tab, and newline).
6. Tokens that are shell directives are run, setting up output redirection and possibly backgrounding.
7. The first remaining token is used as the name (or path) of the command, and the command is found if possible.
8. The command is started and given the list of tokens.
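The re-tokenizing in step 5 is easy to observe for yourself. The following is only a minimal sketch for bash: count_args and LIST are hypothetical names made up for this illustration, and the comments describe what we expect under the simplified model above.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

LIST="a:b c:d"

# Default $IFS (space, tab, newline): the unquoted expansion is split at the space.
default_count=$(count_args $LIST)     # tokens: "a:b" and "c:d"

# With $IFS set to a colon, the same expansion is split at the colons instead.
OLDIFS=$IFS
IFS=:
colon_count=$(count_args $LIST)       # tokens: "a", "b c", and "d"
IFS=$OLDIFS

# Inside double quotes, step 5 is skipped and the value stays in one piece.
quoted_count=$(count_args "$LIST")

echo "default=$default_count colon=$colon_count quoted=$quoted_count"
```

Saving and restoring $IFS, as done here, keeps the change from affecting the rest of a script.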

Let's go through a simple example:

FILE1="my file"

cat "$FILE1" > cat.out

(step one) The following tokens are found using the standard delimiters:

cat
"$FILE1"
>
cat.out

(step two) Examination of the second token reveals a double quote. This means that this token will skip steps 3, 5, and 6.
Examination of the third and fourth tokens reveals an output redirection operator. These two tokens are merged. The result is now:

cat
"$FILE1"
> cat.out
        

(step four) The variable is substituted. Since it is surrounded by double-quotes, it is not scanned and retokenized, so step five is skipped.
(step six) The output redirection is processed, opening standard output to cat.out. This token is then removed. The result is now:

cat
my file
        

(step seven) Finally, the cat program is found using the $PATH variable. It is started and given the single argument my file. cat dutifully reads the file and writes it to standard output, not knowing that standard output has been redirected to the file cat.out.

Notice that the result would be exactly the same if the command were rewritten in either of these two forms:

cat > cat.out "$FILE1"

> cat.out cat "$FILE1"
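If you would like to convince yourself of this, the sketch below runs all three forms and compares the results. It is only an illustration: the scratch directory from mktemp and the names out1, out2, out3, and SAME are assumptions made up for this example.

```shell
#!/bin/bash
# Work in a scratch directory so nothing of yours is overwritten.
WORKDIR=$(mktemp -d)

FILE1="$WORKDIR/my file"
echo "hello from my file" > "$FILE1"

# The same command with the redirection in three different positions.
cat "$FILE1" > "$WORKDIR/out1"
cat > "$WORKDIR/out2" "$FILE1"
> "$WORKDIR/out3" cat "$FILE1"

# cmp -s prints nothing and succeeds only if the two files are identical.
if cmp -s "$WORKDIR/out1" "$WORKDIR/out2" && cmp -s "$WORKDIR/out1" "$WORKDIR/out3"; then
    SAME=yes
else
    SAME=no
fi
echo "identical: $SAME"

# Clean up the scratch directory.
rm -rf "$WORKDIR"
```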

How would this sequence be different if "$FILE1" had not been quoted? The only difference would be that the middle token would be re-split after the variable was substituted, as there were no quotes around it to "hold it together" or "protect it". In this case, the final command after evaluation would be

cat
my
file
> cat.out
        

which is certainly a valid command, but not what the user intended. (cat will attempt to concatenate two files into cat.out)
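You can see the difference without involving cat at all. In this sketch, show_args is a hypothetical helper (it is not a standard command) that prints each argument it receives on its own line inside brackets.

```shell
#!/bin/bash
# show_args is a hypothetical helper: printf reuses its format for every
# argument, so each argument appears on its own line inside brackets.
show_args() { printf '[%s]\n' "$@"; }

FILE1="my file"

# Quoted: the value is held together and arrives as one argument.
QUOTED=$(show_args "$FILE1")

# Unquoted: the value is re-split at the space and arrives as two arguments.
UNQUOTED=$(show_args $FILE1)

echo "quoted:"
echo "$QUOTED"
echo "unquoted:"
echo "$UNQUOTED"
```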

Quoting

Rather than go into an explanation of quoting rules, which your book does, we will discuss common situations where quoting is needed and which types of quotes should be used:

As you could see from the treatment of the space in my file above, quotes hide characters that otherwise might be interpreted. When you have control over the characters being exposed to the shell, it is fairly obvious when quoting should be used. For example, the following command:

echo Enter name (or id):

has an obvious problem: parentheses are special characters to the shell. If this command is executed in bash, we get a syntax error:

$ echo Enter name (or id):
bash: syntax error near unexpected token `('
$

Most of you learned early that messages output using echo should be quoted, so this may not have been a surprise. In fact, that's a good start for a first rule:

You should always enclose any data given to echo in some kind of quotes.

In fact, most of you can determine just by looking at a string of text whether it should be quoted. However, if some text is contained in a variable, you don't necessarily know what is in it. Consider the example of a variable that contains the path to a file. You execute the command

cp $FILE1 /tmp

This command may succeed every time you run your program. Of course, when you turn it in and I run it, I get a failure:

$ cp $FILE1 /tmp
cp: cannot stat `my': No such file or directory
cp: cannot stat `file': No such file or directory
$

This was using a filename that contained a special character - in this case a space. Because Unix often shares data with other systems, and other systems allow certain characters (like whitespace) in filenames that cause problems on Unix, we come to our second rule:

Ensure every variable is expanded within double-quotes.

There are exceptions to this rule, as we will see later, but it is correct in the great majority of cases. If we had used quotes in the example above, the file my file would have been correctly copied to /tmp.
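Here is a minimal sketch of the second rule in action. To keep it harmless, it copies between two scratch directories created with mktemp rather than into /tmp itself; SRC, DEST, and the result variables are names made up for this illustration.

```shell
#!/bin/bash
# Scratch directories: SRC holds the file, DEST stands in for /tmp.
SRC=$(mktemp -d)
DEST=$(mktemp -d)

FILE1="$SRC/my file"
echo "some data" > "$FILE1"

# Unquoted: cp is handed two names, neither of which is the real file,
# so it reports errors (hidden here) and exits with a failure status.
if cp $FILE1 "$DEST" 2>/dev/null; then
    UNQUOTED_RESULT=copied
else
    UNQUOTED_RESULT=failed
fi

# Quoted: cp is handed the single, correct name.
if cp "$FILE1" "$DEST"; then
    QUOTED_RESULT=copied
else
    QUOTED_RESULT=failed
fi

# Confirm that the copy really arrived before cleaning up.
if [ -f "$DEST/my file" ]; then COPY_PRESENT=yes; else COPY_PRESENT=no; fi

echo "unquoted: $UNQUOTED_RESULT, quoted: $QUOTED_RESULT, copy present: $COPY_PRESENT"
rm -rf "$SRC" "$DEST"
```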

From time to time, using quotes gets in your way. You can control this with a little thought. Suppose you wanted to output the following messages:

can't format device xxx

you earned $xxx this month

where the string xxx is the contents of a variable VAR. The first one is easy: since you must use double-quotes around the message in order to expand $VAR, the single-quote doesn't get in the way:

$ VAR=/dev/ft0
$ echo "can't format device $VAR"
can't format device /dev/ft0

The second one is a little harder. You want to prefix the substituted value of $VAR with a dollar sign. If you try it the simple way, you get a surprise:

$ VAR=400.00
$ echo "you earned $$VAR this month"
you earned 3595VAR this month
$

Can you explain this? You might even try a little harder:

$ echo "you earned $${VAR} this month"
you earned 3595{VAR} this month
$

The problem is that the first $, which you want to be literal, is being interpreted as the $ of a substitution, and the name of the variable to substitute is then $, which is your current process id. The solution of course is to use a backslash before the first $, as you want it to be a literal $ rather than to start a substitution.

$ echo "you earned \$$VAR this month"
you earned $400.00 this month
$

You could alternatively do it like this, but most would consider it uglier:

$ echo 'you earned $'"$VAR this month"

An interesting thing to learn from this example is that quotes are not delimiters. This means that the shell does not use a quote character to break the input line into tokens the way it uses a space.
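A short sketch makes this concrete: a single-quoted piece placed directly against a double-quoted piece forms one token, and both ways of writing the message produce the same text. The names count_args, MSG1, MSG2, and PIECES are hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

VAR=400.00

# Backslash form: the first $ is literal, the second starts the substitution.
MSG1="you earned \$$VAR this month"

# Mixed form: a single-quoted piece immediately followed by a double-quoted piece.
MSG2='you earned $'"$VAR this month"

# The two adjacent quoted pieces reach the command as one argument, not two.
PIECES=$(count_args 'you earned $'"$VAR this month")

echo "$MSG1"
echo "$MSG2"
echo "number of arguments: $PIECES"
```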

Tricks

If you understand quoting, you can use quoting rules to simplify your work. For example, suppose you are processing the output of a Unix command that has an unfortunate output format. Below is a sample line from the output of the ps command on another system and on Linux.

linux:
29839 ? 00:00:00 sshd

another system:
26042 ? 0:07 sshd

If your job is to extract the process id from this line of text and you want your code to work on both platforms, you have a problem! Process ids on both systems have a maximum of five digits and are right-justified in the output of ps. Unfortunately, on this other system, they are right-justified in a six-character field, and on Linux they are right-justified in a five-character field.

You can use the shell to remove whitespace for you. If you place this line of text in a variable PIDLINE using command substitution and then echo $PIDLINE (without quotes!) you get

linux:
29839 ? 00:00:00 sshd

another system:
26042 ? 0:07 sshd

Now the process id can be extracted using simple tools. (Note: you must be careful here as ? will be interpreted as a wildcard. We will learn how to turn off wildcard expansion later.)
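The whole trick can be sketched as follows. This is an illustration under stated assumptions: the two sample lines are typed in by hand, with column spacing we have assumed, rather than taken from a real ps; squeeze is a hypothetical helper; and set -f is used here as one way to keep the ? from being treated as a wildcard.

```shell
#!/bin/bash
# Hand-typed sample lines; the exact column spacing is an assumption.
LINUX_LINE='29839 ?        00:00:00 sshd'
OTHER_LINE=' 26042 ?            0:07 sshd'

# squeeze is a hypothetical helper. Its body runs in a subshell (note the
# parentheses), so set -f does not leak into the rest of the script.
squeeze() (
    set -f      # turn off wildcard expansion so ? stays literal
    echo $1     # unquoted on purpose: the shell re-splits and drops extra whitespace
)

# After squeezing, the fields are separated by single spaces on both systems.
SQUEEZED_OTHER=$(squeeze "$OTHER_LINE")

# With the leading and repeated whitespace gone, cut can pick out the first field.
LINUX_PID=$(squeeze "$LINUX_LINE" | cut -d' ' -f1)
OTHER_PID=$(squeeze "$OTHER_LINE" | cut -d' ' -f1)

echo "squeezed: $SQUEEZED_OTHER"
echo "linux pid: $LINUX_PID"
echo "other pid: $OTHER_PID"
```

Without the squeeze step, cut -d' ' -f1 would return an empty first field for the line that begins with a space.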

Preview question: Command substitution can be done two ways: using the old backquotes or the new POSIX syntax. Are there any differences?
