City College of San Francisco - CS160B Unix/Linux Shell Scripting Module: Scripting Basics2
Command-line processing
Before we go into a discussion of quoting, we must talk a bit about how the shell processes the command line. This process is quite complex, so we will simplify it, which unfortunately involves a bit of hand-waving.
Every character the shell processes may be classified as a delimiter, as a metacharacter, as both, or as neither. A metacharacter has special meaning to the shell. A delimiter marks the end of one thing and the start of another.
1. Processing of a command begins by breaking it into tokens using a fixed set of delimiters: space, tab, newline, semicolon, parentheses, <, >, pipe and &.
2. Tokens are examined for quoting. Tokens are merged as indicated by quoting into larger tokens. Quoted tokens skip part of the remainder of the processing. Aliases and compound command keywords (like if) are handled specially. Likewise, shell directives (like standard output redirection) and the tokens that accompany them are merged (or at least this is the way we will think of it).
3. Brace and tilde expansion are performed.
4. Variable, command, and arithmetic substitution are performed.
5. The results of expansions are re-tokenized using the delimiters in $IFS (by default, space, tab and newline).
6. Tokens that are shell directives are run, setting up output redirection and possibly backgrounding.
7. The first remaining token is used as the name (or path) of the command, and the command is found if possible.
8. The command is started and given the list of tokens.
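The re-tokenizing step that uses $IFS can be watched in isolation. The following is only a sketch under our simplified model: count_args is a hypothetical helper (not a standard command) that reports how many tokens it was given, and LIST is a made-up variable.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many tokens it received.
count_args() { echo "$#"; }

LIST="a:b:c d"

# Default $IFS (space, tab, newline): the value is re-split at the space.
default_count=$(count_args $LIST)

# With IFS set to a colon, the same value is re-split at the colons instead.
OLD_IFS=$IFS
IFS=:
colon_count=$(count_args $LIST)
IFS=$OLD_IFS

# Inside double-quotes the result of the expansion is not re-tokenized at all.
quoted_count=$(count_args "$LIST")

echo "default IFS: $default_count, IFS=':': $colon_count, quoted: $quoted_count"
```

With the default $IFS the helper sees two tokens, with a colon it sees three, and with double-quotes it sees one.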
Let's go through a simple example:
FILE1="my file"
cat "$FILE1" > cat.out
(step one) The second command line is broken into four tokens:
| cat | "$FILE1" | > | cat.out |
(step two) Examination of the third and fourth tokens reveals an output redirection operator. These two tokens are merged. The result is now
| cat | "$FILE1" | > cat.out |
(step four) The variable is substituted. Since it is surrounded by double-quotes, it is not scanned and retokenized, so step five is skipped.
(step six) The output redirection is processed, opening standard output to cat.out. This token is then removed. The result is now
| cat | my file |
Notice that the result would be exactly the same if the command was rewritten in either of these two forms:
cat > cat.out "$FILE1"
> cat.out cat "$FILE1"
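If you would like to convince yourself, a quick experiment along these lines should do. This is a sketch, not part of the original example: workdir, out1, out2 and out3 are names invented here, and the test file is created in a scratch directory so nothing of yours is overwritten.

```shell
#!/bin/bash
# workdir is a scratch directory created just for this experiment.
workdir=$(mktemp -d)
FILE1="$workdir/my file"
echo "some text" > "$FILE1"

# The redirection may appear after, between, or before the other tokens.
cat "$FILE1" > "$workdir/out1"
cat > "$workdir/out2" "$FILE1"
> "$workdir/out3" cat "$FILE1"

# cmp -s is silent and succeeds only when the two files are identical.
if cmp -s "$workdir/out1" "$workdir/out2" && cmp -s "$workdir/out1" "$workdir/out3"
then
    same=yes
else
    same=no
fi
echo "identical output: $same"
```

Because the redirection token is removed before the command is started, cat receives the same token list in all three cases.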
How would this sequence be different if "$FILE1" had not been quoted? The only difference would be that the middle token would be re-split after the variable was substituted, as there were no quotes around it to "hold it together" or "protect it". In this case, the final command after evaluation would be
| cat | my | file | > cat.out |
which is certainly a valid command, but not what the user intended. (cat will attempt to concatenate two files, my and file, into cat.out.)
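You can make the difference visible without involving cat at all. In the sketch below, show_tokens is a hypothetical helper written for this illustration; it simply prints every token it receives inside angle brackets.

```shell
#!/bin/bash
# show_tokens is a hypothetical helper: it prints each token it receives
# wrapped in angle brackets, so token boundaries are visible.
show_tokens() { printf '<%s>' "$@"; echo; }

FILE1="my file"

quoted=$(show_tokens cat "$FILE1")
unquoted=$(show_tokens cat $FILE1)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The quoted form shows <cat><my file>, while the unquoted form shows <cat><my><file>.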
Quoting
Rather than go into an explanation of quoting rules, which your book does, we will discuss common situations that call for quoting and which types of quotes should be used in each.
As you could see from the treatment of the space in my file above, quotes hide characters that otherwise might be interpreted. When you have control over the characters being exposed to the shell, it is fairly obvious when quoting should be used. For example, the following command:
echo Enter name (or id):
has an obvious problem: parentheses are special characters to the shell. If this command is executed on bash, we get a syntax error:
$ echo Enter name (or id):
bash: syntax error near unexpected token `('
$
Most of you learned early that messages output using echo should be quoted, so this may not have been a surprise. In fact, that's a good start for a first rule:
You should always enclose any data given to echo in some kind of quotes.
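For the prompt above, any of the following forms should keep the shell from interpreting the parentheses. This is a small sketch; the variable names are made up here so that the three results can be compared.

```shell
#!/bin/bash
# Each form hides the parentheses from the shell, so echo receives them as text.
double_quoted=$(echo "Enter name (or id):")
single_quoted=$(echo 'Enter name (or id):')
backslashed=$(echo Enter name \(or id\):)

echo "$double_quoted"
echo "$single_quoted"
echo "$backslashed"
```

All three print the same message. The quoted forms are easier to read, which is one reason to prefer them over backslashes.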
In fact, most of you can determine just by looking at a string of text whether it should be quoted. However, if some text is contained in a variable, you don't necessarily know what is in it. Consider the example of a variable that contains the path to a file. You execute the command
cp $FILE1 /tmp
This command may succeed every time you run your program. Of course, when you turn it in and I run it, I get a failure:
$ cp $FILE1 /tmp
cp: cannot stat `my': No such file or directory
cp: cannot stat `file': No such file or directory
$
This was using a filename that contained a special character - in this case a space. Because Unix often shares data with other systems, and other systems allow certain characters (like whitespace) in filenames that cause problems on Unix, we come to our second rule:
Ensure every variable is expanded within double-quotes.
There are exceptions to this rule, as we will see later, but it is correct in the great majority of cases. If we had used quotes in the example above, the file my file would have been correctly copied to /tmp.
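Here is a sketch of the same copy done both ways. It uses scratch directories (src and dest, names invented for this illustration) instead of /tmp itself, and records whether each cp succeeded.

```shell
#!/bin/bash
# src and dest are scratch directories created just for this experiment.
src=$(mktemp -d)
dest=$(mktemp -d)
FILE1="$src/my file"
echo "some data" > "$FILE1"

# Unquoted: cp is handed two names, ".../my" and "file", instead of one.
if cp $FILE1 "$dest" 2>/dev/null
then unquoted_result=copied
else unquoted_result=failed
fi

# Quoted: cp is handed the single name ".../my file".
if cp "$FILE1" "$dest"
then quoted_result=copied
else quoted_result=failed
fi

echo "unquoted: $unquoted_result, quoted: $quoted_result"
```

Only the quoted form leaves a copy of my file in the destination directory.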
From time to time, using quotes gets in your way. You can control this with a little thought. Suppose you wanted to output the following messages:
can't format device xxx
you earned $xxx this month
where the string xxx is the contents of a variable VAR. The first one is easy: since you must use double-quotes around the message in order to expand $VAR, the single-quote doesn't get in the way:
$ VAR=/dev/ft0
$ echo "can't format device $VAR"
can't format device /dev/ft0
The second one is a little harder. You want to prefix the substituted value of $VAR with a dollar sign. If you try it the simple way, you get a surprise:
$ VAR=400.00
$ echo "you earned $$VAR this month"
you earned 3595VAR this month
$
Can you explain this? You might even try a little harder:
$ echo "you earned $${VAR} this month"
you earned 3595{VAR} this month
$
The problem is that the first $, which you want to be literal, is being interpreted as the $ of a substitution, and the name of the variable to substitute is then $, which is your current process id. The solution of course is to use a backslash before the first $, as you want it to be a literal $ rather than to start a substitution.
$ echo "you earned \$$VAR this month"
you earned $400.00 this month
$
You could alternatively do it like this, but most would consider it uglier:
$ echo 'you earned $'"$VAR this month"
An interesting thing to learn from this example is that quotes are not delimiters. This means that a quote character does not cause the shell to break the input line into tokens the way a space does.
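The variations discussed above can be compared side by side. The sketch below stores each message in a variable (the variable names are invented for this illustration) rather than echoing it directly, so the results can be checked against each other.

```shell
#!/bin/bash
VAR=400.00

# A backslash makes the first $ literal; the second $ still starts a substitution.
with_backslash="you earned \$$VAR this month"

# A single-quoted piece and a double-quoted piece, with no delimiter between
# them, are merged into one token.
with_two_quotes='you earned $'"$VAR this month"

# Without protection, $$ is replaced by the process id of the current shell.
with_pid="you earned $$VAR this month"

echo "$with_backslash"
echo "$with_two_quotes"
echo "$with_pid"
```

The first two lines of output are identical. The third shows your own process id followed by the letters VAR, so the number will differ from the 3595 seen earlier.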
Tricks
If you understand quoting, you can use quoting rules to simplify your work. For example, suppose you are processing the output of a Unix command that has an unfortunate output format. Below is a sample line from the output of the ps command on another system and on Linux.
linux:
If your job is to extract the process id from this line of text and you want your code to work on both platforms, you have a problem! Process ids on both systems have a maximum of five digits and are right-justified in the output of ps. Unfortunately, on this other system, they are right-justified in a six-character field and on Linux they are right-justified in a five-character field.
You can use the shell to remove whitespace for you. If you place this line of text in a variable PIDLINE using command substitution and then echo $PIDLINE (without quotes!) you get
linux:
Now the process id can be extracted using simple tools. (Note: you must be careful here, as ? will be interpreted as a wildcard. We will learn how to turn off wildcard expansion later.)
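Here is a sketch of the trick. The two lines are hypothetical, made up to mimic the two field widths (they are not real ps output), and they are assigned directly rather than captured with command substitution. They were also chosen to contain no wildcard characters, and cut is just one example of a simple tool.

```shell
#!/bin/bash
# Hypothetical ps-style lines, made up for illustration (not real ps output).
# The process id is right-justified in a six-character field in the first
# line and in a five-character field in the second.
LINE_SIX="   412 pts/0    00:00:01 bash"
LINE_FIVE="  412 pts/0    00:00:01 bash"

# Unquoted expansion re-tokenizes each line on whitespace; echo then prints
# the tokens separated by single spaces, so the first field is the process id.
pid_six=$(echo $LINE_SIX | cut -d' ' -f1)
pid_five=$(echo $LINE_FIVE | cut -d' ' -f1)

echo "pid from six-character field:  $pid_six"
echo "pid from five-character field: $pid_five"
```

Both extractions give 412, even though the two lines start with a different number of spaces.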
Preview question: Command substitution can be done two ways: using the old backquotes or the new POSIX syntax. Are there any differences?