substring operators

City College of San Francisco - CS160B
Unix/Linux Shell Scripting
Module: Advanced Topics

Substrings

In previous sections, we learned about some operators on the value of a variable, specifically

${VAR:?message}

${VAR:-value}

and

${#VAR}

In this section we will add new operators - the substring operators. These operators output the variable's value after extracting, deleting, or modifying part of it. In all cases, the value of the variable is unchanged unless you reassign to it. Also, in all cases, the result is substituted on the command-line, so you must do something with it, either use it in another command or assign it to a variable.

Extract characters by offset and length

The simplest substring operator allows you to output certain characters from the value of a variable. The characters output are indicated by the starting character and a length:

${VAR:offset:length}

As in all of these operators, VAR is the name of the variable. Here, offset and length are integers. offset indicates the first character to output, where characters are numbered starting at 0. length indicates how many characters to output.

$ numbers=0123456789
$ echo ${numbers:2:4}
2345

If length is missing, the remainder of the value is output:

$ echo ${numbers:2}
23456789

The offset and/or length can be variables themselves:

$ off=2
$ echo ${numbers:$off:4}
2345

VAR, however, must be a variable name. You cannot use the output of command-substitution, for example, unless you first save the result in a variable. For example

$ echo ${$(ls -ld $file):0:1}
bash: ${$(ls -ld $file):0:1}: bad substitution

does not work, but

$ lsout=$(ls -ld $file)
$ echo ${lsout:0:1}
-

does.
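As a further illustration, here is a minimal sketch that splits a fixed-width date stamp into its fields with the offset and length operator. The variable names stamp, year, month and day, and the sample value, are assumptions made for this example. It needs bash, since this operator is not available in every Bourne-style shell.

```shell
#!/bin/bash
# Hypothetical date stamp in YYYYMMDD form (not from the lesson).
stamp=20240315

year=${stamp:0:4}    # 4 characters starting at offset 0
month=${stamp:4:2}   # 2 characters starting at offset 4
day=${stamp:6}       # everything from offset 6 to the end

echo "$year-$month-$day"   # prints 2024-03-15
echo "$stamp"              # prints 20240315: the original value is unchanged
```

Note that stamp keeps its value throughout. Only an explicit assignment such as stamp=${stamp:0:4} would change it.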

Substitute text in variable's value

The substring operator

${VAR/pat/str}

can be used to substitute str for the first instance of pat in the value of VAR. Here, pat is a wildcard pattern, and str is a string. Let's look at a few examples:

$ line=abcdefdefabc
$ echo ${line/abcd/X}
Xefdefabc

Here, of course, pat is just a string. It could just as easily be a wildcard:

$ echo ${line/*d/X}
Xefabc

Note that there were two choices to match the pattern *d: abcd and abcdefd. The substring operator, like grep, chooses the longest match. Another way to remember this is that patterns are greedy.

You can modify this simple behavior of substituting the first match by a few changes to your substring operators:

${VAR//pat/str}

changes all occurrences of pat to str.

${VAR/#pat/str}

still substitutes str for pat, but pat must be anchored at the left end of the string. Similarly, pat in

${VAR/%pat/str}

must be anchored at the right end of the string. Let's look at a few examples:

$ echo ${line/??c/}
defdefabc

This deletes the first instance of ??c in line. Because that first match happens to sit at the left end of the value, the result is the same as

$ echo ${line/#??c/}
defdefabc

but

$ echo ${line/%??c/}
abcdefdef

substitutes only if the value ends with a match for ??c, that is, only if the last character in the value is c.
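The variants can be compared side by side on the same value of line. The following is a small sketch for bash; the variable names all, left, right and none are chosen only for this illustration.

```shell
#!/bin/bash
line=abcdefdefabc

all=${line//abc/X}     # every abc is replaced
left=${line/#abc/X}    # abc is replaced only at the left end
right=${line/%abc/X}   # abc is replaced only at the right end
none=${line/#def/X}    # def is not at the left end, so nothing changes

echo "$all"     # prints XdefdefX
echo "$left"    # prints Xdefdefabc
echo "$right"   # prints abcdefdefX
echo "$none"    # prints abcdefdefabc
```

When an anchored pattern does not match, as in the last case, the original value is substituted unchanged.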

The hardest thing about these operators is remembering which of # and % anchors on which end. An easy way to remember: the % symbol always comes to the right of a number, so % anchors on the right. Similarly, # always comes to the left of a comment, so # anchors on the left! (These characters and their meaning will be more important in other substring operators.)

Delete pattern operators

The last set of substring operators takes a little getting used to. These operators delete part of a variable's value using a pattern, either on the left or on the right end, then substitute what remains. This result is often compared against the original value to see if the deletion worked, or, in other words, to see if the variable contained the pattern. Although these operators are very useful, the convoluted logic is difficult at first. Let's look at an example:

Suppose you have a path in $file and want to know if the path ends in .pdf. Here is what you do:

try to delete .pdf from the right end of the variable's value
compare it to original value
if the comparison is not equal, the deletion succeeded and $file ended in .pdf

variable $file	after deleting .pdf on right	is it different than the original?
ends in .pdf	value changed	yes
doesn't end in .pdf	value unchanged	no

In words:

if [ $file after trying to delete .pdf != $file ]; then
    echo $file ended in .pdf
fi

These deletion operators have the following form

${VARoppat}

where VAR is the variable's name, op is the operator (see below), and pat is a wildcard pattern that matches the text to be deleted. If pat does not match, nothing is deleted, and the original variable's value is substituted.

operator	meaning
#	delete shortest match anchored on left
##	delete longest match anchored on left
%	delete shortest match anchored on right
%%	delete longest match anchored on right

Before we take up our .pdf problem, let's use our new operators on a variable named var:

$ var=abcdefabcdef
$ echo ${var#*c} # delete shortest match of *c anchored on left
defabcdef
$
$ echo ${var##*c} # delete longest match of *c anchored on left
def
$

Since the variable's value does not end in c, it is silly to use the same pattern and variable when the pattern is anchored on the right. Instead we will use c*

$ echo ${var%c*}
abcdefab
$
$ echo ${var%%c*}
ab
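A common place to see all four operators together is in taking apart a file name with more than one dot. The sketch below uses a hypothetical file name, archive.tar.gz, which is not part of the lesson.

```shell
file=archive.tar.gz   # hypothetical file name

echo "${file%.*}"     # prints archive.tar (shortest match of .* deleted on the right)
echo "${file%%.*}"    # prints archive     (longest match of .* deleted on the right)
echo "${file#*.}"     # prints tar.gz      (shortest match of *. deleted on the left)
echo "${file##*.}"    # prints gz          (longest match of *. deleted on the left)
```

Note how the position of the wildcard follows the anchor: the * goes on the side away from the anchored end, just as c* replaced *c above.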

Returning to our .pdf problem, let's fill out our code:

if [ "${file%.pdf}" != "$file" ]; then
    echo "'$file' ended in .pdf"
fi

Again, here is what is happening:

we try to delete .pdf from the right end of the value
we compare the result to the original. If they differ, the deletion worked, so
we conclude the value ended in .pdf
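The same logic can be wrapped in a small function so the test reads more naturally. This is only a sketch: ends_in_pdf is a hypothetical helper name, and the three sample values are made up for the illustration.

```shell
# ends_in_pdf is a hypothetical helper name, not part of the lesson.
ends_in_pdf() {
    # Succeeds (exit status 0) when deleting .pdf on the right changes the value.
    [ "${1%.pdf}" != "$1" ]
}

for file in report.pdf notes.txt pdf; do
    if ends_in_pdf "$file"; then
        echo "'$file' ended in .pdf"
    else
        echo "'$file' did not end in .pdf"
    fi
done
```

Only report.pdf passes the test. The value pdf does not, because the pattern includes the dot.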

Examples:

1. If $name contains a person's name in the form last,first or last,first middle(s), write a sequence using substring operators to output their name as first last or first middle(s) last.

First, we separate the last name from the rest:

last=${name%,*}

and the rest from the last name

rest=${name#*,}

then output the result

echo "$rest $last"

Alternately, we could do this in one step

echo "${name#*,} ${name%,*}"
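To try the one-step version on a couple of values, it can be placed in a function. In this sketch, first_last is a hypothetical helper name and the sample names are invented. It assumes the value contains exactly one comma with no space after it.

```shell
# first_last is a hypothetical helper name, not part of the lesson.
first_last() {
    # ${1#*,} deletes through the comma on the left, leaving first (and middle) names.
    # ${1%,*} deletes from the comma on the right, leaving the last name.
    echo "${1#*,} ${1%,*}"
}

first_last "Smith,John"          # prints John Smith
first_last "Smith,John Quincy"   # prints John Quincy Smith
```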

2. Mimic the command basename using substring operators, using the variable $path

The simplest solution for this is ${path##*/}, but there is a caveat:

$ basename /
/

The substring operator alone would output an empty string here, so you would need an if statement in addition to the substring operator.
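One possible way to combine the if statement with the operator is sketched below. The function name my_basename is hypothetical. The sketch handles only the special case of / shown above and assumes the path has no trailing slash, which the real basename command also deals with.

```shell
# my_basename is a hypothetical name; this is a sketch, not a full replacement.
my_basename() {
    if [ "$1" = "/" ]; then
        echo /                 # special case: the root directory
    else
        echo "${1##*/}"        # delete the longest match of */ on the left
    fi
}

my_basename /usr/bin/ls   # prints ls
my_basename file1         # prints file1
my_basename /             # prints /
```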

3. Mimic the command dirname using substring operators, using the variable $path

The simplest solution for this is ${path%/*}, but, again, there are special cases:

$ dirname /file1
/
$ dirname file1
.

The substring operator alone would output an empty string for /file1 and file1 for file1, so you would, again, need a couple of if statements.
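A sketch of how those if statements might look follows. The function name my_dirname is hypothetical. The sketch covers the two special cases shown above and assumes the path has no trailing slash. It reuses the technique of comparing the result with the original value to detect whether anything was deleted.

```shell
# my_dirname is a hypothetical name; this is a sketch, not a full replacement.
my_dirname() {
    dir=${1%/*}                    # delete the shortest match of /* on the right
    if [ "$dir" = "$1" ]; then
        echo .                     # nothing was deleted: the path has no slash
    elif [ -z "$dir" ]; then
        echo /                     # everything was deleted: the only slash was the leading one
    else
        echo "$dir"
    fi
}

my_dirname /usr/bin/ls   # prints /usr/bin
my_dirname /file1        # prints /
my_dirname file1         # prints .
```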

4. In Example 1 from module 7, two lines of our shell script were:

if file "$dir/$file" | grep ':.*text' > /dev/null; then
if ! echo "$file" | grep -q '\.txt$'; then

Rewrite this code using substring operators.

Assuming the string text output by file always comes at the end of the description, we can rewrite this code as follows:

ftype=$(file "$dir/$file")
if [ "${ftype%text}" != "$ftype" ]; then

if [ "${file%.txt}" = "$file" ]; then
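If the assumption that text comes at the end does not hold, the pattern can be widened. The sketch below deletes the pattern :*text* on the right, which changes the value whenever text appears anywhere after a colon, much like the original grep ':.*text'. The helper name describes_text is hypothetical, and the sample strings are written in the style of file output rather than captured from a real run.

```shell
# describes_text is a hypothetical helper; its argument stands in for
# one line of output from the file command.
describes_text() {
    # Delete the shortest right-anchored match of ":", anything, "text", anything.
    # If the value changed, "text" appeared somewhere after a colon.
    [ "${1%:*text*}" != "$1" ]
}

# Sample strings in the style of file output (assumed for illustration).
for ftype in "notes: ASCII text, with CRLF line terminators" \
             "photo.png: PNG image data"; do
    if describes_text "$ftype"; then
        echo "text:     $ftype"
    else
        echo "not text: $ftype"
    fi
done
```

Only the first sample is reported as text. A file whose name merely contains text before the colon is not matched, because the deleted part must begin with a colon.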

Conclusion

After looking at these substring operators, you might ask: Why would I want to use them? You are correct: they are confusing. Even seasoned shell programmers must stop when they encounter substring operators so they can decipher them. There are three reasons we learn them:

they are faster than calling an external process like grep. In fact, a simple test I performed showed them to be 40 times faster. This is not an issue, however, unless your shell script processes a lot of data.
they are cryptic, and real shell programmers love cryptic code. Just as the && operator is preferred over a simple if statement, shell programmers love short code sequences, even if they are cryptic.
that last reason actually sums it up: you must understand these operators because they appear in real shell scripts. If you haven't ever practiced with them, figuring them out when you come across them can be daunting, and can cripple your understanding of how code works.
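The speed difference in the first reason can be explored with a sketch like the following, which counts names ending in .pdf two ways. The function names and the sample list in names are assumptions made for this illustration, and the sketch makes no claim about the size of the speedup on any particular system.

```shell
# Hypothetical sample names; a real test would use a much longer list.
names="a.pdf b.txt c.pdf d"

# Counts matches by starting a grep process for every name.
count_with_grep() {
    n=0
    for f in $names; do
        if echo "$f" | grep -q '\.pdf$'; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Counts matches inside the shell, with no external process.
count_with_operator() {
    n=0
    for f in $names; do
        if [ "${f%.pdf}" != "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

count_with_grep       # prints 2
count_with_operator   # prints 2
```

In bash, prefixing each call with the time keyword (for example, time count_with_grep) reports how long each version takes. The difference only becomes visible when the list is long.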
