City College of San Francisco - CS160B
Unix/Linux Shell Scripting
Module: Scripting Basics2

Quoting

Command-line processing

Before we discuss quoting, we must talk a bit about how the shell processes the command line. This process is quite complex, so we will simplify it, which, unfortunately, involves a bit of hand-waving.

Every character the shell processes may be classified as a delimiter, a metacharacter, both, or neither. A metacharacter has special meaning to the shell. A delimiter marks the end of one thing and the start of another.

  1. Processing of a command begins by breaking it into tokens using a fixed set of delimiters: space, tab, newline, semicolon, parentheses, <, >, pipe (|), and &.

  2. Tokens are examined for quoting, and tokens are merged into larger tokens as the quoting indicates. Quoted tokens skip part of the remaining processing. Aliases and compound-command keywords (like if) are handled specially. Likewise, shell directives (like standard output redirection) and the tokens that accompany them are merged (or at least this is the way we will think of it).

  3. Brace and tilde expansion are performed.

  4. Variable, command, and arithmetic substitution are performed.

  5. The results of expansions are re-tokenized using the delimiters in $IFS (by default, space, tab, and newline), and any unquoted wildcards are then expanded into matching filenames.

  6. Tokens that are shell directives are carried out, setting up output redirection and possibly backgrounding.

  7. The first remaining token is used as the name (or path) of the command, and the command is found if possible.

  8. The command is started and given the list of tokens as its arguments.
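The ordering of steps 3 through 5 can be observed directly. The following is a small sketch, assuming bash; count_words is a hypothetical helper name chosen for this illustration, not a standard command.

```shell
#!/bin/bash
# Steps 3 and 4: brace expansion runs before variable substitution, so
# when the braces are examined, $N has not been replaced yet. {1..$N} is
# not a valid range at that point and is left alone; then $N becomes 3.
N=3
echo {1..$N}              # prints {1..3}, not 1 2 3

# Step 5: the result of an unquoted substitution is re-split using $IFS.
# count_words (hypothetical) splits $1 using the characters in $2 as IFS
# and prints how many tokens result.
count_words() {
    local IFS="$2"        # change IFS only inside this function
    set -- $1             # unquoted on purpose, so it is re-split
    echo "$#"
}
count_words "a:b:c" ":"   # prints 3
count_words "a:b:c" " "   # prints 1 (no spaces, so nothing to split on)
```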

Let's go through a simple example:

FILE1="my file"

cat "$FILE1" > cat.out

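The tokens at each stage (brackets mark token boundaries):

After step 1 (break into tokens):               [cat] ["$FILE1"] [>] [cat.out]
After step 2 (quoting and directives merged):   [cat] ["$FILE1"] [> cat.out]
After steps 4 to 6 (substitution, redirection): [cat] [my file]   (output now goes to cat.out)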

Notice that the result would be exactly the same if the command were rewritten in either of these two forms:

cat > cat.out "$FILE1" 

> cat.out cat "$FILE1"
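To check that the position of the redirection does not matter, here is a small sketch, assuming bash plus the common mktemp and cmp utilities; the scratch directory and the names out1, out2, and out3 are made up for the demonstration.

```shell
#!/bin/bash
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

FILE1="my file"
echo "hello" > "$FILE1"      # a file whose name contains a space

# The same command, with the redirection in three different positions.
cat "$FILE1" > out1
cat > out2 "$FILE1"
> out3 cat "$FILE1"

# cmp is silent and succeeds when two files are identical.
cmp out1 out2 && cmp out2 out3 && echo "all three forms produced the same output"
```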

How would the sequence of steps above be different if "$FILE1" had not been quoted? The only difference would be that the middle token would be re-split after the variable was substituted (step 5), as there would be no quotes around it to "hold it together" or "protect it". In this case, the final command after evaluation would be:

[cat] [my] [file]   (output now goes to cat.out)

This is a valid command, but not what the user intended: cat will attempt to concatenate two files, my and file, into cat.out.
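A quick way to see the difference is to count the arguments a command actually receives. This is a minimal sketch, assuming bash; count_args is a hypothetical helper defined only for this example.

```shell
#!/bin/bash
# count_args (hypothetical) prints how many arguments it received.
count_args() {
    echo "$#"
}

FILE1="my file"
count_args "$FILE1"   # quoted: one token survives step 5, prints 1
count_args $FILE1     # unquoted: re-split at the space, prints 2
```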

Quoting

Rather than explain the quoting rules, which your book does, we will discuss common situations that call for quoting and which type of quotes to use in each:

As you saw from the treatment of the space in my file above, quotes hide characters that the shell would otherwise interpret. When you control the text being passed to the shell, it is usually obvious when quoting is needed. For example, the following command:

echo Enter name (or id):

has an obvious problem: parentheses are special characters to the shell. If this command is executed in bash, we get a syntax error:

$ echo Enter name (or id):
bash: syntax error near unexpected token `('
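Quoting the message fixes it. The sketch below, assuming bash is installed, prints the message with either kind of quote and then runs the unquoted line in a child bash so the syntax error can be reported without stopping the script.

```shell
#!/bin/bash
# Inside either kind of quotes the parentheses are ordinary characters.
echo "Enter name (or id):"
echo 'Enter name (or id):'

# Unquoted, the same line is a syntax error. Running it in a child bash
# lets this script report the failure instead of stopping.
if ! bash -c 'echo Enter name (or id):' 2>/dev/null; then
    echo "unquoted version: syntax error"
fi
```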

Most of you learned early that messages output using echo should be quoted, so this may not have been a surprise. In fact, that's a good start for a first rule:

You should always enclose any data given to echo in some kind of quotes.

Most of you can tell just by looking at a string of text whether it should be quoted. However, if the text is stored in a variable, you don't necessarily know what it contains. Consider a variable that holds the path to a file. You execute the command:

cp $FILE1 /tmp

This command may succeed every time you run your program. Of course, when you turn it in and I run it, I get a failure:

$ cp $FILE1 /tmp
cp: cannot stat `my': No such file or directory
cp: cannot stat `file': No such file or directory

This failure was caused by a filename containing a special character, in this case a space. Because Unix often shares data with other systems, and those systems commonly allow characters (like whitespace) in filenames that cause problems for the Unix shell, we come to our second rule:

Ensure every variable is expanded within double-quotes.

There are exceptions to this rule, as we will see later, but it is correct in the great majority of cases. If we had quoted $FILE1 in the example above, the file my file would have been copied to /tmp correctly.
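The failure and the fix can be reproduced side by side. This is a sketch, assuming bash and mktemp; two scratch directories stand in for the working directory and /tmp so the example leaves nothing behind.

```shell
#!/bin/bash
# Scratch directories stand in for the working directory and /tmp.
src=$(mktemp -d)
dest=$(mktemp -d)
trap 'rm -rf "$src" "$dest"' EXIT
cd "$src" || exit 1

FILE1="my file"
echo "data" > "$FILE1"

# Unquoted: cp receives "my" and "file" as two separate (missing) files.
cp $FILE1 "$dest" 2>/dev/null || echo "unquoted: cp failed"

# Quoted: cp receives the single name "my file" and the copy succeeds.
cp "$FILE1" "$dest" && echo "quoted: copied"
```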

From time to time, quotes get in your way, but a little thought lets you work around them. Suppose you wanted to output the following messages:

can't format device xxx

you earned $xxx this month

where the string xxx is the contents of a variable VAR. The first one is easy: since you must use double-quotes around the message in order to expand $VAR, the single-quote doesn't get in the way:

$ VAR=/dev/ft0
$ echo "can't format device $VAR"
can't format device /dev/ft0

The second one is a little harder. You want to prefix the substituted value of $VAR with a dollar sign. If you try it the simple way, you get a surprise:

$ VAR=400.00
$ echo "you earned $$VAR this month"
you earned 3595VAR this month
$

Can you explain this? You might even try a little harder:

$ echo "you earned $${VAR} this month"
you earned 3595{VAR} this month
$

The problem is that the first $, which you want to be literal, is interpreted as the start of a substitution, and the name of the variable substituted is then $, which holds your current process id. The solution, of course, is to put a backslash before the first $, since you want it to be a literal $ rather than the start of a substitution.

$ echo "you earned \$$VAR this month"
you earned $400.00 this month
$
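Both versions can be run as a script. A minimal sketch, assuming bash; the first echo prints whatever process id the shell happens to have, so its exact output varies from run to run.

```shell
#!/bin/bash
VAR=400.00

# Unescaped: $$ expands to this shell's process id, and "VAR" is then
# just ordinary text, so the output is a number followed by VAR.
echo "you earned $$VAR this month"

# Escaped: \$ is a literal dollar sign, and $VAR then expands normally.
echo "you earned \$$VAR this month"   # prints: you earned $400.00 this month
```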

You could alternatively do it like this, but most would consider it uglier:

$ echo 'you earned $'"$VAR this month"

An interesting lesson from this example, however, is that quotes are not delimiters: the shell does not use a quote character to break the input line into tokens the way it uses a space. Because the single-quoted and double-quoted parts are written with no space between them, they form a single token.
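Counting arguments makes the point concrete. A minimal sketch, assuming bash; count_args is a hypothetical helper defined just for this example.

```shell
#!/bin/bash
# count_args (hypothetical) prints how many arguments it received.
count_args() { echo "$#"; }

VAR=400.00
# Single-quoted and double-quoted parts with no space between them:
# quotes are not delimiters, so the shell builds ONE token from both.
count_args 'you earned $'"$VAR this month"    # prints 1
echo 'you earned $'"$VAR this month"          # prints: you earned $400.00 this month

# With a space between the parts, the space delimits two tokens.
count_args 'you earned $' "$VAR this month"   # prints 2
```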

Tricks

If you understand quoting, you can use the quoting rules to simplify your work. For example, suppose you are processing the output of a Unix command whose output format is inconvenient. Below are sample lines from the output of the ps command on Linux and on another system:

Linux:
29839 ?        00:00:00 sshd


another system:
 26042 ?         0:07 sshd

If your job is to extract the process id from this line and you want your code to work on both platforms, you have a problem! Process ids on both systems have at most five digits and are right-justified in the output of ps. Unfortunately, on the other system they are right-justified in a six-character field, while on Linux they are right-justified in a five-character field, so extracting a fixed range of character positions will not work on both.

You can use the shell to remove the extra whitespace for you. If you place this line of text in a variable PIDLINE using command substitution and then echo $PIDLINE (without quotes!), you get:

Linux:
29839 ? 00:00:00 sshd

another system:
26042 ? 0:07 sshd

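Because $PIDLINE was not quoted, its value was re-split into words on whitespace, discarding the leading and repeated blanks, and echo printed those words separated by single spaces. Now the process id is simply the first space-separated field and can be extracted using simple tools. (Note: you must be careful here, because an unquoted ? is a wildcard that matches any single-character filename in the current directory. We will learn how to turn off wildcard expansion later.)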
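Here is a sketch of the trick, assuming bash and cut. To keep it self-contained, the two sample lines from above are hard-coded instead of coming from a real ps command substitution; extract_pid, linux_line, and other_line are hypothetical names.

```shell
#!/bin/bash
# Sample lines copied from the ps output above (one per system).
linux_line='29839 ?        00:00:00 sshd'
other_line=' 26042 ?         0:07 sshd'

# extract_pid (hypothetical) prints the process id from one ps line.
extract_pid() {
    local PIDLINE="$1"
    # Unquoted $PIDLINE is re-split on whitespace, so leading and repeated
    # blanks vanish and echo rejoins the words with single spaces.
    # The unquoted ? may match a one-character filename, but field 1,
    # the process id, is unaffected.
    echo $PIDLINE | cut -d' ' -f1
}
extract_pid "$linux_line"   # prints 29839
extract_pid "$other_line"   # prints 26042
```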

Preview question: Command substitution can be done in two ways: with the old backquotes or with the newer POSIX $(...) syntax. Are there any differences?

This page was made entirely with free software on Linux: the Mozilla Project and OpenOffice.org.

Copyright 2012 Greg Boyd - All Rights Reserved.