Introduction
Awk processes text files; it can format reports and perform string and arithmetic operations. An awk program does something at the beginning, reads a line from a file (or from a pipe), performs operations on it, reads the next line and, once the input is exhausted, performs an end operation.
Sample awk command:
$ ls -l | awk '{ sum += $5 } END { print sum }'
783
In the above lines we use "ls -l" to list
all the files in the folder; the fifth column
holds the size of each file in bytes. "$5"
refers to the 5th field, which we add to a
variable named "sum". The input is processed
line by line, and the statement after "END"
runs once all the lines have been processed,
printing the total number of bytes.
The word "awk" does not come from the word "awkward" but from the names of its authors: Alfred Aho, Peter J. Weinberger and Brian Kernighan. Awk has its own scripting language, which may look similar to the "C" programming language. It uses regular expressions for the pattern-matching parts of a program; these are the Unix regular expressions covered in a previous section. The awk language is not the same as the Unix shell language. It is its own language.
The structure of an awk program is usually:
Do the BEGIN section
For every line read from a file
    If the pattern matches, execute the command
Do the END section
The command section consists of
pattern/action pairs. The BEGIN section
is optional, as is the END section. The pattern
can be omitted, in which case the command runs
for every line, or the command can be omitted,
in which case the matching lines are printed.
Both can be absent, in which case nothing is
printed out.
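The order in which the three parts run can be seen with a small sketch. The three input lines come from "printf" rather than from a real file, and the variable names "count" and "report" are chosen only for illustration.

```shell
# BEGIN runs once before any input, the middle rule runs once per line,
# and END runs once after the last line has been read.
report=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "start" }
      { count++; print "line " count ": " $0 }
END   { print "end, " count " lines read" }')
echo "$report"
```

The middle rule has no pattern, so it runs for every input line.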
Empty awk statement.
$ ls -l | awk ' '
Let us modify our awk command to include more stuff.
$ ls -l
total 10
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:23 1.txt
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:24 2.txt
-rwxrwx---+ 1 Deller None 1465 Mar 23 11:38 awk.html
-rwxrwx---+ 1 Deller None 3723 Mar 24 09:23 awk11.html
$ ls -l | awk 'BEGIN { print "Sum of the files modified in March." }
$6 == "Mar" { sum += $5 }
END { print sum }'
Sum of the files modified in March.
5248
The "BEGIN" section has a command that prints a
line. Next comes a pattern that checks whether
the 6th field is "Mar", so only files modified in
March are added to the sum. The rest is the same
as before.
We can have multiple pattern/action statements.
The command below is a single line, shown first broken up for readability.
ls -l | awk 'BEGIN { print "Sum of the files modified in Mar Apr." }
$6 == "Mar" { sum += $5 }
$6 == "Apr" { sum += $5 }
END { print sum }'
$ ls -l | awk 'BEGIN { print "Sum of the files modified in Mar Apr." } $6 == "Mar" { sum += $5 } $6 == "Apr" { sum += $5 } END { print sum }'
Sum of the files modified in Mar Apr.
6048
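Two rules that share the same action can also be written as one rule whose pattern joins the tests with "||". The sketch below uses a made-up listing in the "ls -l" layout, so that the result can be checked by hand; the names "listing" and "total" are assumptions for illustration.

```shell
# Hypothetical listing: field 5 is the size, field 6 is the month.
listing='-rw-r--r-- 1 user group 30 Mar 24 09:23 1.txt
-rw-r--r-- 1 user group 800 Apr 02 10:00 2.txt
-rw-r--r-- 1 user group 500 Feb 11 08:15 3.txt'

# One rule with "||" does the work of the two separate rules.
total=$(printf '%s\n' "$listing" |
    awk '$6 == "Mar" || $6 == "Apr" { sum += $5 } END { print sum }')
echo "$total"
```

Only the March and April sizes (30 and 800) are added; the February file is skipped.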
Another awk example. Let's say we want to kill
a Unix process whose name matches a pattern and
whose elapsed time is greater than a given value.
kill -9 $( ps -eo comm,pid,etimes | awk '/main/ {if( $3 > 20) { print $2 }}')
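In the above command awk checks whether the line contains the word "main"
and, if so, whether the process has been running for more than 20 seconds
("etimes" reports the elapsed time in seconds). The ids of the matching
processes are printed and passed to "kill".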
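Before running a command that kills processes, it can help to print the selected process ids first. The following is a dry-run sketch on made-up "ps" output, not on live processes; it also restricts the match to the command field with "$1 ~ /main/", which is a variation on the command above rather than the same test.

```shell
# Made-up "ps -eo comm,pid,etimes" output: command, process id, elapsed seconds.
sample='COMMAND  PID ELAPSED
main_app 101 35
main_app 102 5
bash     103 900'

# Print the matching process ids instead of killing them.
pids=$(printf '%s\n' "$sample" | awk '$1 ~ /main/ && $3 > 20 { print $2 }')
echo "would kill: $pids"
```

Once the printed ids look right, the same awk program can be placed inside the "kill" command.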
Running awk
There are different ways to run awk commands.
Command Line
We can run awk from the command line.
ls -l | awk '{ print $0 }'
Remember the awk structure is:
Do the BEGIN section
For every line read from a file
    If the pattern matches, execute the command
Do the END section
Our command:
ls -l | awk '{ print $0 }'
does not have the optional BEGIN or END sections.
It also does not have the optional pattern, so the
command in the curly braces is executed for every line.
"$0" means the whole of the input line.
$ ls -l | awk '{ print $0 }'
total 14
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:23 1.txt
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:24 2.txt
-rwxrwx---+ 1 Deller None 1465 Mar 23 11:38 awk.html
-rwxrwx---+ 1 Deller None 5690 Mar 24 19:55 awk11.html
We can also have awk work with an input file
instead of piping in data.
$ awk '{ print $0 }' ls.txt
total 16
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:23 1.txt
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:24 2.txt
-rwxrwx---+ 1 Deller None 1465 Mar 23 11:38 awk.html
-rwxrwx---+ 1 Deller None 132 Mar 24 20:08 awk1_commands
-rwxrwx---+ 1 Deller None 6598 Mar 24 20:04 awk11.html
-rwxrwx---+ 1 Deller None 80 Mar 24 20:13 awk2_commands
-rw-rw-r--+ 1 Deller None 0 Mar 24 22:31 ls.txt
If we don't provide an input to "awk" then it takes
its input from standard input, that is, whatever is
typed at the terminal. This is similar to the way
"grep" and "sed" behave.
$ awk '{ print $0 }'
Test
Test
First
First
Press Ctrl-D to end the input and exit awk.
Commands in a file
We can place the commands in a file. In this example we place the commands in a file called "awk1_commands" .
File: awk1_commands
BEGIN { print "Sum of the files modified in March April." }
$6 == "Mar" { sum += $5 }
$6 == "Apr" { sum += $5 }
END { print sum }
$ ls -l | awk -f awk1_commands
Sum of the files modified in March April.
9394
To run the commands we use the "-f" option. This is actually the easiest way to run awk: we do not need to put single quotes around the commands and we can place statements on different lines. However, we do have to be careful about where we break the lines.
BEGIN {print "Printing the sum of file sizes"}
{ sum += $5 }
END { print sum }
The above works fine as each line is terminated by the closing curly brace. But the below does not work.
File: awk2_commands
BEGIN {print "Printing the sum of file sizes"}
{ sum += $5 }
END
{ print sum }
$ ls -l | awk -f awk2_commands
awk: awk2_commands:4: END blocks must have an action part
We can place backslashes at the end of each line except the last to take care of this problem.
File: awk2_commands
BEGIN {print "Printing the sum of file sizes"}\
{ sum += $5 }\
END\
{ print sum }
$ ls -l | awk -f awk2_commands
Printing the sum of file sizes
10288
Let's look at the following command:
File: awk3_commands
BEGIN \
{ print "Sum of the files modified in October November."; sum = 0 }
$6 == "Nov"
{ sum += $5 }
$6 == "Oct" { sum += $5 }
END\
{ print sum }
$ ls -l | awk -f awk3_commands
awk: awk3_commands:6: END blocks must have an action part
We have a condition $6 == "Nov", and the action meant to go with this condition is { sum += $5 }. However, that is not how awk reads it. The pattern and the action are each optional. $6 == "Nov" on a line of its own is read as a pattern without an action, so awk simply prints the matching lines. The next line is an action without a pattern and is executed for every line, regardless of the pattern. What we wanted was for the action to be executed only if the pattern matched.
File: awk4_commands
BEGIN \
{ print "Sum of the files modified in October November."; sum = 0 }
$6 == "Nov" \
{ sum += $5 }
$6 == "Oct" { sum += $5 }
END \
{ print sum }
$ ls -l | awk -f awk4_commands
Sum of the files modified in October November.
0
The "\" forces awk to treat $6 == "Nov" and { sum += $5 } as one line. Now the action is executed only if the pattern matches.
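An alternative to the backslash is to keep the opening brace on the same line as the pattern (or as BEGIN or END); awk then lets the body of the action run over several lines. The sketch below writes such a command file to a temporary file and runs it on a made-up listing. The use of "mktemp" and the names "prog" and "result" are assumptions for illustration.

```shell
# Write a hypothetical command file; each opening brace stays on the pattern's line.
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN {
    sum = 0
}
$6 == "Nov" {
    sum += $5
}
END {
    print sum
}
EOF

# Made-up listing lines: only the November file should be counted.
result=$(printf '%s\n' \
    '-rw-r--r-- 1 user group 30 Nov 24 09:23 1.txt' \
    '-rw-r--r-- 1 user group 70 Mar 24 09:24 2.txt' | awk -f "$prog")
echo "$result"
rm -f "$prog"
```

Written this way, the pattern and its action cannot be separated by accident.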
Shell Script
We can place all the commands in a shell script, make the shell script executable and then run it.
File: awk5.sh
ls -l | awk 'BEGIN { print "Sum of the files modified in March April." }
$6 == "Mar" { sum += $5 }
$6 == "Apr" { sum += $5 }
END { print sum }'
$ ./awk5.sh
Sum of the files modified in March April.
12572
The same rules apply for breaking lines. If the line does not terminate with "}" then we need to place a backslash.
ls -l | awk 'BEGIN { print "Sum of the files modified in October November." }
$6 == "Nov" { sum += $5 }
$6 == "Oct" { sum += $5 }
END \
{ print sum }'
We can run awk with multiple files as input.
File: awk6_commands
BEGIN \
{ print "Sum of the files modified in March April."; sum = 0 }
$6 == "Mar" \
{ sum += $5 }
$6 == "Apr" { sum += $5 }
END \
{ print sum }
$ awk -f awk6_commands ls.txt
Sum of the files modified in March April.
8335
$ awk -f awk6_commands ls.txt ls1.txt
Sum of the files modified in March April.
16670
The file information in "ls.txt" and "ls1.txt" is treated as one data file. If we do not pipe anything to the awk command and do not specify any files either, then awk will take the input from the console.
$ awk /test/
testing
testing
rose
home
this is a test
this is a test
The above does not have a BEGIN, END or action part but does have the pattern part.
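That several input files are read as one stream can be checked with a small sketch. The two data files here are hypothetical and are created in a temporary directory; "NR" is awk's built-in count of the lines read so far.

```shell
# Two hypothetical data files holding one number per line.
dir=$(mktemp -d)
printf '10\n20\n' > "$dir/a.txt"
printf '30\n' > "$dir/b.txt"

# awk reads a.txt and then b.txt as if they were one file.
combined=$(awk '{ sum += $1 } END { print NR " lines, sum " sum }' \
    "$dir/a.txt" "$dir/b.txt")
echo "$combined"
rm -rf "$dir"
```

The END section runs only once, after the last file, so the line count and the sum cover both files.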
Pattern Action
We have studied that in addition to the normal Regular Expressions we also have Extended Regular Expressions. Awk works with Extended Regular Expressions by default.
We can omit either the pattern or the action.
In the command below, the action "print $5" prints the 5th field.
ls -l | awk '/Mar/ { print $5 }'
The above contains both a pattern and an action. The
command looks for lines that contain "Mar" and,
for each such line, prints the 5th field.
Output:
0
19
45
49
Using only the pattern.
$ ls -l | awk '/Mar/'
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:23 1.txt
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:24 2.txt
-rwxrwx---+ 1 Deller None 1465 Mar 23 11:38 awk.html
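A pattern such as /Mar/ is tested against the whole line, so it also matches a file whose name happens to contain "Mar". The sketch below, on a made-up listing, contrasts it with a test on the month field alone; the variable names are assumptions for illustration.

```shell
# Made-up listing: the second file was modified in April but has "Mar" in its name.
listing='-rw-r--r-- 1 user group 30 Mar 24 09:23 1.txt
-rw-r--r-- 1 user group 45 Apr 02 10:00 Mar_report.txt'

# /Mar/ is tested against the whole line, so both lines match.
whole=$(printf '%s\n' "$listing" | awk '/Mar/ { n++ } END { print n }')
# $6 == "Mar" looks at the month field only, so one line matches.
field=$(printf '%s\n' "$listing" | awk '$6 == "Mar" { n++ } END { print n }')
echo "whole line: $whole, month field: $field"
```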
Exercise
Ex1:
Create a folder called "awk" and in this folder
create 2 files: "notes1.txt" and "notes2.txt".
Remember "$" means the end of line anchor. Use
the regular expression "txt$" in awk to print
out the listing of the text files.
ls -l | awk TODO
Ex2:
In the same folder "awk" create another
file "notestxt" (a name that ends in txt but
does not have a dot). Write the regular
expression in the awk command that will
list only the files with ".txt" at the end.
Ex3:
Print only the first field of the "ls -l" output.
This is the permissions field.
Solutions
Soln 1:
ls -l | awk '/txt$/'
Soln 2:
ls -l | awk '/\.txt$/'
Soln 3:
$ ls -l | awk '{ print $1 }'
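The difference between the first two solutions can be checked without creating any files. In this sketch three file names stand in for the "ls -l" lines of the exercise, and the names "loose" and "strict" are chosen only for illustration.

```shell
# Three file names standing in for the "ls -l" lines of the exercise.
names='notes1.txt
notes2.txt
notestxt'

# "txt$" accepts any line ending in txt; "\.txt$" also requires the dot.
loose=$(printf '%s\n' "$names" | awk '/txt$/ { n++ } END { print n }')
strict=$(printf '%s\n' "$names" | awk '/\.txt$/ { n++ } END { print n }')
echo "loose: $loose, strict: $strict"
```

The backslash is needed because an unescaped "." in a regular expression matches any character.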
Using only the action
ls -l | awk '{ print $0 }'
Output:
$ ls -l | awk '{ print $0 }'
total 31
-rw-r--r--+ 1 Deller None 30 Mar 25 18:04 1.txt
-rw-rw-r--+ 1 Deller None 30 Mar 24 09:24 2.txt
-rwxrwx---+ 1 Deller None 1465 Mar 23 11:38 awk.html
...
The above prints every line in the input.
Patterns
The awk command's structure is:
awk Begin Pattern Command End
For the examples below we will be using a data file called "marks.txt".
1) John 80
2) Peter 90
3) David 47
4) James 25
5) Lisa 89
6) Kenny 56
7) Sam 95
8) Julia 74
9) Cassie 66
10) Marelena 45
Let's look at the different ways we can write the pattern. We shall place the commands in the file "awk1" and then run the awk command from the command line as:
awk -f awk1 marks.txt
We can write a pattern or regular expression between forward slashes "/ /". Here we place the letter "l" inside them.
File: awk1
/l/ {print $0}
$ awk -f awk1 marks.txt
8) Julia 74
10) Marelena 45
All the lines with the letter "l" are printed out. We could also have done this from the command line.
awk '/l/ {print $0}' marks.txt
Since the command is short it does not matter in this case, but an awk program can run to many lines.
Using a regular expression:
/a?e/ {print $0}
Let's put the above in a file called "awk2".
awk -f awk2 marks.txt
File: awk2
/a?e/ {print $0}
$ awk -f awk2 marks.txt
2) Peter 90
4) James 25
6) Kenny 56
9) Cassie 66
10) Marelena 45
Essentially all the lines with 0 or 1 occurrence of "a" followed by an "e" are printed out. Because the "a" is optional, this is every line that contains an "e".
Two patterns separated by a comma signify a range:
/is/,/am/ {print $0}
Let's put these in a file called "awk3".
awk -f awk3 marks.txt
File: awk3
/is/,/am/ {print $0}
$ awk -f awk3 marks.txt
5) Lisa 89
6) Kenny 56
7) Sam 95
The above starts printing at the line that matches the first pattern and continues through the first line from there on that matches the second pattern.
We can use an expression as a pattern also. In the example below we use "&&" to "and" two expressions.
($3 > 50 && $3 < 60 ) {print $0}
Let's put these in the file "awk4".
File: awk4
($3 > 50 && $3 < 60 ) {print $0}
$ awk -f awk4 marks.txt
6) Kenny 56
Exercises
Ex1:
Write an awk command in a file called "ex1.cmd" that will print the id, name and a letter grade of "A" for each student whose score is above 50. It should also print the title Id, Name and Grade.
Solution 1)
File: ex1.cmd
($3 > 50 ) { print $1" "$2" ""A" }
$ awk -f ex1.cmd marks.txt
1) John A
2) Peter A
5) Lisa A
6) Kenny A
7) Sam A
8) Julia A
9) Cassie A
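The solution above leaves out the title row that the exercise asks for. One way to add it, sketched here under the assumption that a plain space-separated title is acceptable, is a BEGIN rule. Only the first three records of "marks.txt" are inlined, to keep the sketch self-contained.

```shell
# The first three records of marks.txt, inlined so the sketch is self-contained.
marks='1) John 80
2) Peter 90
3) David 47'

# BEGIN prints the title once; the rule prints a row for each score above 50.
graded=$(printf '%s\n' "$marks" | awk '
BEGIN { print "Id Name Grade" }
$3 > 50 { print $1, $2, "A" }')
echo "$graded"
```

The commas in the "print" statement separate the fields with a single space, which replaces the quoted " " strings of the original solution.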