Extracting Information from Unix Commands

City College of San Francisco - CS260A
Unix/Linux System Administration
Module: Review

Extracting Information

If you think of running a Unix command as asking a question, the answer is often a lot more information than you want. However, since Unix commands output information in a standard way, we can use simple tools to extract the information we want.

Isolating lines

The single most useful tool for isolating information is our old friend grep. Some of you may not think of grep as a friend. You should start. It is one of the most useful tools on Unix.

If you are working at the command-line and examining the output by eye, it is usually sufficient to use a simple string with grep:

-bash$ grep "gboyd" /etc/passwd
gboyd:x:3496:208:God of Pluto,L413,,home:/users/gboyd:/bin/bash
gboyd02:x:22200:551:Programmer Extraordinaire,LinuxCentral,,:/students/gboyd02:/bin/bash
-bash$

To get more exact output, you must be more careful in specifying a regular expression as your pattern. In our example above, ensuring that a single record is found requires use of the beginning of line anchor and the field delimiter:

-bash$ grep "^gboyd:" /etc/passwd
gboyd:x:3496:208:God of Pluto,L413,,home:/users/gboyd:/bin/bash
-bash$

You can read the above command as "output the lines whose first field is exactly gboyd."

As you know, regular expressions can be very complex. In the great majority of cases, however, patterns like the one above are sufficient for a task. Of course you must be more careful in the specification of your regular expression if you are writing a program than if you are examining the result at the command-line.
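In a program, the anchored pattern pairs well with grep -q, which prints nothing and reports the result through its exit status. The following is a minimal sketch, not a definitive implementation: the sample file and the helper name user_exists are assumptions made for illustration, and the helper assumes the name contains no regular-expression metacharacters.

```shell
# Build a small passwd-format sample so the example does not touch the real /etc/passwd.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
gboyd:x:3496:208:God of Pluto,L413,,home:/users/gboyd:/bin/bash
gboyd02:x:22200:551:Programmer Extraordinaire,LinuxCentral,,:/students/gboyd02:/bin/bash
EOF

# user_exists NAME FILE (hypothetical helper):
# succeed only if the first field of some line in FILE is exactly NAME.
user_exists() {
    grep -q "^$1:" "$2"
}

loose=$(grep -c "gboyd" "$sample")     # unanchored: counts both records
exact=$(grep -c "^gboyd:" "$sample")   # anchored: counts one record
echo "loose=$loose exact=$exact"

user_exists gboyd "$sample" && echo "gboyd has an account"
user_exists gboy "$sample" || echo "gboy does not"
```

The unanchored pattern would have reported that gboy exists, which is exactly the kind of mistake a program cannot notice by eye.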

If you know there must be two separate strings on the line you want you could write a complex regular expression to describe the line. It is easier to use two instances of grep:

grep 'string1' file1 | grep 'string2'

Two useful tools for isolating the first or last line of some output are our friends head -n 1 and tail -n 1 (often written in the older form head -1 and tail -1).
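Both ideas can be tried on a small sample. This is only a sketch: the log lines below are invented for illustration, and -n 1 is the POSIX spelling of the older -1 option.

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
Jan 10 09:00:01 host sshd: Accepted password for gboyd
Jan 10 09:05:12 host sshd: Failed password for root
Jan 10 09:07:45 host cron: job started for root
Jan 10 09:09:30 host sshd: Failed password for gboyd
EOF

# Lines that contain both strings, wherever they appear on the line.
both=$(grep 'sshd' "$log" | grep 'Failed')

# The first and the last of those lines.
first=$(printf '%s\n' "$both" | head -n 1)
last=$(printf '%s\n' "$both" | tail -n 1)
echo "$first"
echo "$last"
```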

Isolating part of a line of output

In a shell program you often need to extract part of an isolated line. There are several tools to accomplish this. In order of increasing complexity, they are:

cut and tr
awk
shell substring operators

Of this set, we will cover a few of the shell substring operators. These operators are cryptic, terse, counter-intuitive and powerful. These qualities make them a favorite among shell script writers, and you will often see them in system shell scripts.
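Although cut, tr and awk are not covered further here, a brief sketch shows the kind of job each one does. The sample values are assumptions chosen for illustration.

```shell
line='gboyd:x:3496:208:God of Pluto,L413,,home:/users/gboyd:/bin/bash'

# cut: field 6 of a colon-delimited line (the home directory).
home=$(printf '%s\n' "$line" | cut -d: -f6)

# awk: the same idea, here selecting field 3 (the numeric user id).
uid=$(printf '%s\n' "$line" | awk -F: '{ print $3 }')

# tr: squeeze runs of spaces so that cut can split on a single space.
padded='total     42   blocks'
count=$(printf '%s\n' "$padded" | tr -s ' ' | cut -d' ' -f2)

echo "home=$home uid=$uid count=$count"
```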

Watching for changes in files and directories

Often a system administrator needs to monitor a file and watch as lines are appended to it. The primary example of this is a log file. For this purpose the -f option to tail is very useful.

tail -f /var/log/messages

The above command will output the last portion of the file /var/log/messages and monitor the file (until interrupted by ^C), outputting any new lines that are added. (On most systems you must be root to look at this system log file, but tail -f can be used on any file that might be appended to.)
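You can watch tail -f at work without being root by following a file of your own. The sketch below is one way to try it non-interactively: it runs tail -f in the background, appends a line, then stops tail with kill instead of ^C. The temporary files and the sleep intervals are assumptions for the demonstration.

```shell
log=$(mktemp)
out=$(mktemp)
trap 'rm -f "$log" "$out"' EXIT

echo "first line" > "$log"

# Follow the file in the background, saving whatever tail prints.
tail -f "$log" > "$out" &
tailpid=$!

sleep 1                          # give tail time to start
echo "second line" >> "$log"     # this append should show up in tail's output
sleep 2                          # give tail time to notice the new line

kill "$tailpid"                  # the scripted equivalent of pressing ^C
wait "$tailpid" 2>/dev/null || true

cat "$out"
```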

Similarly, the system administrator might be interested in which file(s) in a directory changed most recently. This is provided by the -t option to ls, which sorts by modification time, most recent first. To make the result useful you usually want the date in the output as well, so add -l. ls -lt, then, outputs a list of the items in the current directory in ls -l format, sorted with the most recently changed first. If you want to sort on access time instead, simply add the -u option.
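In a program, the sort order alone is often enough to pick out the newest item. The following sketch assumes file names without embedded newlines; the directory and file names are invented, and touch -t is used only to give the files known modification times.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Create three files with explicit modification times (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 202001010000 "$dir/old.log"
touch -t 202101010000 "$dir/middle.log"
touch -t 202201010000 "$dir/recent.log"

# ls -t sorts newest first, so the first line names the most recently changed file.
newest=$(ls -t "$dir" | head -n 1)
oldest=$(ls -t "$dir" | tail -n 1)
echo "newest=$newest oldest=$oldest"
```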

Shell substring operators

All of the shell substring operators work on a copy of the value of a variable, substituting the result back on the command-line. Thus, they are usually part of an assignment operation. The operators use the general variable-substitution syntax, ${VAR}, with the operator and its arguments placed to the right of the variable's name, before the right brace.

We will not cover all of the shell substring operators: just the most-often used ones. These allow us to

delete a pattern at the start or end of a string
isolate a sequence of characters from a string, given the offset of the start of the sequence and its length

Isolating a sequence of characters

${VAR:offset:length}

substitutes length characters of "$VAR", beginning at character offset (counting from zero). Thus

firstthree=${VAR:0:3}

assigns the first three characters of "$VAR" to firstthree.

You can use embedded substitutions to perform calculations if you like. For example,

lastthree=${VAR:$((${#VAR}-3)):3}

assigns the last three characters of "$VAR" to lastthree. This could also be accomplished by

lastthree=${VAR:(-3):3}

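Here, a negative offset counts back from the end of the string. The parentheses (or a space before the minus sign) are required so that the shell does not read :- as the use-a-default-value operator.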
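A few more variations, shown on a concrete value, may help. This sketch assumes bash: the offset and length form is a bash extension (also found in ksh and zsh) and is not part of the POSIX shell. The variable names are only for illustration.

```shell
# Requires bash (not plain sh).
VAR=abcdefgh

first3=${VAR:0:3}     # offset 0, length 3
middle=${VAR:2:4}     # offset 2, length 4
rest=${VAR:5}         # length omitted: everything from offset 5 onward
last3=${VAR: -3}      # negative offset counts back from the end; the space
                      # keeps :- from being read as the default-value operator

echo "$first3 $middle $rest $last3"
```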

You could use this operator in a shell program like this:

# ensure the path is absolute
[ "${path:0:1}" != '/' ] && path="$PWD/$path"

(By the way, in keeping with their affinity for cryptic constructs, shell programmers love && and ||. A single && or || with a single statement attached is fine; however, chaining several && or || operators quickly obscures the logic. Part of your grade is writing clear code.)
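For comparison, here is the same check written out with if and wrapped in a function. This is a sketch under assumptions: the function name make_absolute is hypothetical, and the substring operator requires bash.

```shell
# Requires bash (not plain sh).
# make_absolute PATH (hypothetical helper): print PATH unchanged if it begins
# with /, otherwise print it prefixed with the current directory.
make_absolute() {
    local path=$1
    if [ "${path:0:1}" != '/' ]; then
        path="$PWD/$path"
    fi
    printf '%s\n' "$path"
}

make_absolute /etc/passwd       # already absolute: printed as-is
make_absolute notes/todo.txt    # relative: printed with $PWD in front
```

The if form is longer than the one-line && version, but it leaves room to add an else branch or an error message later.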

Deleting a pattern on one end of a variable's value

Just the title of this set of substring operators sounds strange. These are, however, the operators that are most often used for matching patterns in variables. Read on.

Each of these operators uses a wildcard pattern to describe the right- or left- end of a variable's value. When the operator is applied to the value, the result substituted is the variable's value after deleting the part that matches. If the pattern did not match, nothing is deleted, and the variable's original value is substituted. The operators are

# - match on the left of the value
% - match on the right of the value

Let's look at a few examples

$ string=abcdbc
$ echo ${string#?} # delete any one character on the left
bcdbc
$ echo ${string%?} # delete any one character on the right
abcdb
$

Note that we have not modified the contents of the variable string.

Most often a sequence of characters must be deleted from the left edge up to a particular character. For example, we could delete all the characters up to the d character like this

$ echo ${string#*d}
bc
$

Alternately, a sequence to be deleted from the right begins with a particular character and continues to the end of the string. We could delete all the characters beginning with the d like this

$ echo ${string%d*}
abc
$

If we wanted to use the b character to anchor the pattern, we have an issue. If we delete all the characters on the left side up until the b character, and our string is abcdbc, one must ask up to which b? This is determined by the operator. A single operator deletes the shortest match. A double operator deletes the longest match. Thus,

$ echo ${string%b*}
abcd
$ echo ${string%%b*}
a
$ echo ${string#*b}
cdbc
$ echo ${string##*b}
c
$
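A common practical use of the shortest and longest forms is taking a path apart. The sketch below uses an invented path; the variable names are assumptions for illustration.

```shell
path=/var/log/httpd/access.log.gz

file=${path##*/}     # longest match of */ on the left: strip the directory
dir=${path%/*}       # shortest match of /* on the right: strip the file name
base=${file%%.*}     # longest match of .* on the right: strip all extensions
ext=${file##*.}      # longest match of *. on the left: keep the last extension

echo "$dir | $file | $base | $ext"
```

As before, path itself is left unchanged; each operator works on a copy of the value.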

As if specifying the pattern you want to delete wasn't bad enough, let's take a look at how these substring operators are used in an actual system shell program:

[ "${dst#\#}" != "$dst" ] && continue

This says "if $dst starts with a #, continue." However, we must read it more carefully:

Try to delete a # from the left of $dst, substituting the result. If the result is different from the original, the deletion was successful, so the # was there! In this case, the continue statement is executed.
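Since continue only makes sense inside a loop, it may help to see the test in one. The loop below is a sketch of how such a line might be used to skip comment lines; the input file and the variable kept are assumptions for illustration, not taken from the system script.

```shell
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
# destinations to check
alpha
#beta
gamma
EOF

kept=""
while read -r dst; do
    # If deleting a leading # changes the value, the line was a comment: skip it.
    [ "${dst#\#}" != "$dst" ] && continue
    kept="$kept $dst"
done < "$conf"

echo "kept:$kept"
```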

A system administration example

Let's look at a line from a function used in all subsystem scripts on Linux. (Subsystem scripts control subsystems. We will learn about subsystems later, but an example subsystem is httpd, the web server.)

[ "${1#*$2*}" = "$1" ] && return 1

Can you tell what this line does? Perhaps the context will help:

# succeeds if $1 contains $2
strstr() {
    [ "${1#*$2*}" = "$1" ] && return 1
    return 0
```
}
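One way to read it: try to delete, from the left of $1, anything that matches the pattern *$2*. If the value comes back unchanged, nothing matched, so $2 does not occur in $1 and the function fails. The sketch below repeats the function so that it can be run on its own; the sample strings are assumptions for illustration.

```shell
# succeeds if $1 contains $2  (repeated from the function above)
strstr() {
    [ "${1#*$2*}" = "$1" ] && return 1
    return 0
}

if strstr "/usr/sbin/httpd" "httpd"; then
    echo "contains httpd"
fi

if ! strstr "/usr/sbin/httpd" "nginx"; then
    echo "does not contain nginx"
fi
```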
