
Using BEGIN and END

We shall use the same technique of placing the awk commands in a file.



File: awk7
BEGIN { print "BEGIN" }
{ }
END { print "END" }

In the above, before awk processes the input it prints the word "BEGIN". The data is then processed one line at a time, but the action that we have is blank so nothing is done. After the data has been processed, the END block is executed and prints "END" to the console.

$ awk -f awk7 marks.txt
BEGIN
END
File: awk8
BEGIN { sum=0 ; average=0; noPeople=0 }
{ sum += $3 ; noPeople++ }
END { print "Average marks:", sum/noPeople }

$ awk -f awk8 marks.txt
Average marks: 66.7

We can use user-defined variables in our awk script. The variables are similar to the variables in the shell: we do not have to declare the type. The increment operator "++" increases the value by 1. In the above we initialize the variables "sum", "average" and "noPeople" to 0. Then our action runs for each line. It sums up the marks and counts the people.

Exercise:

1) Modify the command file to print:

Number of people: (Actual number of people)
Total: (Actual total marks)
Average marks: 66.7

We can also store the result in the "average" variable before printing it:

BEGIN { sum=0 ; average=0; noPeople=0 }
{ sum += $3 ; noPeople++ }
END { average = sum/noPeople; print "Average marks:", average }

We are going to change the above command slightly by moving the "{" that follows "END" onto the next line.

File: "awk9"
BEGIN { sum=0 ; average=0; noPeople=0 }
{ sum += $3 ; noPeople++ }
END
{
  average = sum/noPeople
  print "Average marks:", average
}
$ awk -f awk9 marks.txt
awk: awk9:4: END blocks must have an action part

The opening "{" must be on the same line as BEGIN or END. If we want it on the next line, we can end the line with a backslash "\" so that awk treats the next line as a continuation.
File: awk10
BEGIN { sum=0 ; average=0; noPeople=0 }
{ sum += $3 ; noPeople++ }
END \
{
  average = sum/noPeople
  print "Average marks:", average
}

$ awk -f awk10 marks.txt
Average marks: 66.7
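
One case the END blocks above do not cover is empty input: with no records the counter stays at 0 and the division fails. The following is only a sketch of one possible guard. It uses three made-up rows in place of marks.txt, and the short variable name "n" and the message text are assumptions made for illustration.

```shell
# Average of the third column, guarded against empty input.
# The sample rows are illustrative and are not the tutorial's marks.txt.
printf '1) John 80\n2) Peter 90\n3) David 40\n' |
awk '{ sum += $3; n++ }
     END {
         if (n > 0)
             print "Average marks:", sum / n
         else
             print "No records read"
     }'
# Average marks: 70

# With empty input the else branch runs instead of dividing by zero.
awk '{ sum += $3; n++ } END { if (n > 0) print sum / n; else print "No records read" }' < /dev/null
# No records read
```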

Fields

Assume we have a data file called "marks1.txt"
File: "marks1.txt"

1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89

The fields are labelled $1, $2
and so on to represent the first, second
fields and so on. "$0" represents
the whole line. The "print" command
without any arguments prints the whole line.

awk '{print}' marks1.txt

or the equivalent statement:

awk '{print $0}' marks1.txt

The field number does not have to be a constant.
It can come from a variable:

$ awk 'BEGIN {var1=3}  {print $var1}' marks1.txt
Physics
Maths
Biology
English
History


We can separate the items in a print statement
with a comma. The string literals must be quoted.

awk 'BEGIN {var1=3}  {print $var1, "  " , $(var1-1)}' marks1.txt
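
As a small sketch of computed field numbers, the command below feeds the first two rows of marks1.txt inline (spacing collapsed) and prints a field together with its neighbours. The variable name "col" is an assumption made for this example.

```shell
# $col, $(col - 1) and $(col + 1) are all evaluated per record.
printf '1) Amit Physics 80\n2) Rahul Maths 90\n' |
awk 'BEGIN { col = 3 } { print $col, $(col - 1), $(col + 1) }'
# Physics Amit 80
# Maths Rahul 90
```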



Built In Variables

ENVIRON


ENVIRON is an associative array holding
info about environment variables.

$ awk 'BEGIN { print ENVIRON["USER"] }'
amittal

$ awk 'BEGIN { print ENVIRON["PATH"] }'

/usr/local/bin:/usr/bin:/usr/local/sbin:
/usr/sbin:/sbin:/users/amittal/.local/bin:
/users/amittal/bin

Notice that since the awk program has only
a BEGIN block, awk does not read any input,
so we do not need to supply a data file.
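
ENVIRON also gives a simple way to pass a value from the shell into an awk program. A minimal sketch, where GREETING is a made-up variable name set only for this one command:

```shell
# GREETING is a hypothetical variable used only for this illustration.
GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
# hello
```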

FS

This is the field separator. By default
its value is a single space, which makes awk
split fields on runs of spaces and tabs, but we can change that.

$ echo "first:second:third" | awk 'BEGIN { FS=":" } { print $1,$2,$3 }'
first second third
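
The field separator can also be set from the command line with the -F option, and the OFS variable controls what "print" puts between comma-separated items. A short sketch; the "-" chosen for OFS is only an example value:

```shell
# -F: sets FS to ":" and OFS replaces the default single space on output.
echo "first:second:third" | awk -F: 'BEGIN { OFS = "-" } { print $1, $2, $3 }'
# first-second-third
```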

RS

RS is the record separator. By default this is the newline character but we can change that.

$ echo "first:second:third" | awk 'BEGIN { RS=":" } { print $1 }'
first
second
third

$ echo "1) Amit     Physics    80:2) Rahul    Maths      90" | awk 'BEGIN { RS=":" } { print $2,$3 }'
Amit Physics
Rahul Maths
In the above the record separator is the ":"
instead of the new line separator.

Exercise:

echo "line1a:line1b:line1c&line2a:line2b:line2c&" | awk -f f1.cmd

Write "f1.cmd" to have the RS
as "&" and FS as ":" to print the output as:

line1a:line1b:line1c
line2a:line2b:line2c


NR

The "NR" variable holds the number of the current record.

$ echo "first:second:third" | awk 'BEGIN { RS=":" } { print $1, NR }'
first 1
second 2
third 3

$ cat marks1.txt | awk '{ print $2, NR }'
Amit 1
Rahul 2
Shyam 3
Kedar 4
Hari 5
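
Inside the END block NR still holds the number of the last record read, so it doubles as a record count. A small sketch with made-up input lines:

```shell
# Prefix each line with its record number, then report the total.
printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 } END { print "records:", NR }'
# 1: alpha
# 2: beta
# 3: gamma
# records: 3
```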

Exercise

1) Modify the original example with:

BEGIN { sum=0 ; average=0; noPeople=0 }  { sum += $3 ; noPeople++ }
END { print "Average marks:", sum/noPeople  }

Take out "noPeople" and use NR instead.
awk -f nr.cmd marks.txt

File: "marks2.txt"

Id Name Grade
---------------------
1)    John        80
2)    Peter       90
3)    David       47
4)    James       25
5)    Lisa        89
6)      Kenny       56
7)      Sam         95
8)      Julia       74
9)      Cassie      66
10)     Marelena    45

BEGIN { sum=0 ; average=0; noPeople=0 }
{ sum += $3 ; noPeople++ }
END { print "Average marks:", sum/noPeople  }

Add the condition ( NR > 2) to the
above command so that the first 2 lines
are skipped when doing the calculations.


Solution
1)
BEGIN { sum=0 ; average=0 }  { sum += $3  }
END { print "Average marks:", sum/NR  }

NF

The "NF" represents the number of fields
in a record. We can use this to grab the
last field from a record.

$ echo "first second third" | awk '{ print $NF }'
third
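
Because NF is an ordinary number, it can be used in arithmetic and in a condition placed before the action. The sketch below uses made-up input: the first command prints the field count and the second-to-last field, and the second command skips blank lines, for which NF is 0.

```shell
# NF is 3 here, so $(NF - 1) is the second field.
echo "first second third" | awk '{ print NF, $(NF - 1) }'
# 3 second

# The condition NF > 0 is false for the empty middle line, so it is skipped.
printf 'a b\n\nc\n' | awk 'NF > 0 { print NF }'
# 2
# 1
```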

Exercise

1) Use NF and the condition ( NR > 2 )
to print just the grade from the previous example.



printf

The "printf" function allows us to specify format specifiers. The "printf" function is very powerful and has extra features that are not there in the "print" function.


File: p1.cmd
{ printf( "%10s%10s%10s\n", $1 , $2 , $3 ) }

$ cat marks.txt | awk -f p1.cmd
        1)      John        80
        2)     Peter        90
        3)     David        47
        4)     James        25
        5)      Lisa        89
        6)     Kenny        56
        7)       Sam        95
        8)     Julia        74
        9)    Cassie        66
       10)  Marelena        45

We specify a placeholder in the first argument (the format string) by using the percent symbol. Then we supply the values after the first argument. In the above, the first "%10s" is filled in with $1, the second with $2 and the third with $3. The "s" means the value is a string. We must have the same number of values as placeholders. The "10" means reserve 10 characters for the string. If the string is shorter then it is padded with spaces. This can help in aligning the values. The format string does not have to contain any placeholders.
File: p2.cmd
{ printf( "Id Name Marks\n" ) }

$ echo "" | awk -f p2.cmd
Id Name Marks

The function "print" prints a new line by default but "printf" does not. We can use the usual backslash escape sequences: "\n" to represent a new line and "\t" to represent a tab.

Format Specifiers

%c    ASCII character
%d    Decimal integer
%e    Floating point number (exponent notation)
%f    Floating point number
%g    The shorter of e or f
%o    Octal
%s    String
%x    Hexadecimal
%%    Literal %

Variables in awk have no declared type, but the value we pass to "printf" should match the specifier we use. If we state "%s" then we need to supply a string.

We saw how the statement:

{ printf( "%10s%10s%10s\n", $1 , $2 , $3 ) }

allocated a width of 10 for each string. The spaces are padded on the left. If we want the string to be on the left-hand side with the spaces padded on the right then we use the "-10" notation.

File: "p2.cmd"
{ printf( "%-10s%-10s\n", $1 , $2 ) }

$ awk -f p2.cmd marks.txt
1)        John
2)        Peter
3)        David
4)        James
5)        Lisa
6)        Kenny
7)        Sam
8)        Julia
9)        Cassie
10)       Marelena

We can also restrict the number of decimal places with a specifier such as "%.2f".

$ echo "" | awk '{ printf("%.2f" , 3.41256) }'
3.41

In the above we are stating that the floating point value should be printed with 2 digits after the decimal point.

Exercise:

1) Write an awk command in file "pr4.cmd". Create a file "pr4.sh" that contains the following line.

File: "pr4.sh"
cat marks.txt | awk -f pr4.cmd

Run the file "./pr4.sh" to produce the output:

$ ./pr4.sh
Id Name Marks
1)--John--80
2)--Peter--90
3)--David--47
4)--James--25
5)--Lisa--89
6)--Kenny--56
7)--Sam--95
8)--Julia--74
9)--Cassie--66
10)--Marelena--45

2) The int function can be used to retain the integer part of a number and throw away the fractional part. It can be used as int( 3.142 ). Use printf to change the following file.

File: "data1.txt"
1.5 3.1425
14.23 7.5678
3.7 8.6523
4.9 9.4567

to

1 3.14
14 7.57
3 8.65
4 9.46
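
To see several of the specifiers side by side, here is a small sketch that needs no input file. The values and the "|" and "[ ]" markers are arbitrary and are only there to make the padding visible.

```shell
awk 'BEGIN {
    # The same kind of number shown as decimal, octal and hexadecimal.
    printf("%d|%o|%x\n", 255, 8, 255)
    # Left-aligned, right-aligned and zero-padded fixed-width fields.
    printf("[%-8s][%8s][%08.3f]\n", "left", "right", 3.14159)
}'
# 255|10|ff
# [left    ][   right][0003.142]
```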

Strings

Concatenation of strings.
There is no explicit operation to join strings. All we have to do is write the strings next to each other.


File: s1.cmd
{
  str1="Ajay" "Mittal"
  print str1
  str1="Ajay"
  str1 = str1 " " "Ajay"
  print str1
}

$ echo "" | awk -f s1.cmd
AjayMittal
Ajay Ajay

Even though "s1.cmd" does not really need an input, we need to give something to the awk command, so we give it a blank string. The expression

str1 " " "Ajay"

joins 3 strings: the contents of the string str1, a blank space and the string "Ajay".
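
Numbers are converted to strings when they are concatenated, and parentheses keep the arithmetic separate from the joining. A small sketch; the variable names "n" and "x" are assumptions made for the example:

```shell
awk 'BEGIN {
    n = 3
    # (n + 1) is computed first and then joined to the strings around it.
    print "count=" (n + 1) " items"
    # Two numbers written next to each other are joined as text: "12".
    x = 1 2
    # Used in arithmetic, the string "12" is converted back to a number.
    print x + 1
}'
# count=4 items
# 13
```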
File: s2.cmd
{
  str1="table"
  str2 = ""
  l1 = length( str1 )
  for( i1=l1; i1 > 0 ; i1-- )
  {
    #print i1
    str2 = str2 substr( str1, i1, 1 )
    #print str2
  }
  print str2
}

$ echo "" | awk -f s2.cmd
elbat

The above code reverses the word in the variable "str1". The function substr has 3 arguments. The first argument is the string. The second argument is the position that we need to grab the substring from and the third argument is the number of characters we need to grab. If "str1" contains the string "table" then some possible examples are:

substr( str1, 1, 1 )
Result is "t". Position is 1 and we grab 1 character.

substr( str1, 1, 3 )
Result is "tab". Position is 1 and we grab 3 characters.

substr( str1, 2 )
Result is "able". Position is 2 and we grab the rest of the characters in the string.

We are also using a "for" loop in this example:

for( i1=l1; i1 > 0 ; i1-- )

A loop repeats the statements inside its block. We have the initialization statement "i1=l1". Then we have the check "i1 > 0". The loop executes the statements as long as the condition is true. Finally we have the update statement "i1--", which runs each time after the block of the loop has been executed.
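
The substr function combines well with other standard string functions such as index and toupper. A brief sketch, again using the word "table":

```shell
awk 'BEGIN {
    s = "table"
    print substr(s, 2, 3)                        # 3 characters from position 2
    print toupper(substr(s, 1, 1)) substr(s, 2)  # capitalize the first letter
    print index(s, "bl")                         # position where "bl" starts
}'
# abl
# Table
# 3
```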
File: s3.cmd
{
  str1="wood table"
  split ( str1 , arr1, " " )
  print arr1[1]
  print arr1[2]
}

$ echo "" | awk -f s3.cmd
wood
table

The "split" function splits the input string and places the pieces into an array that can be indexed by numbers.
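
The split function also returns the number of pieces it produced, which is handy as a loop bound. A minimal sketch; the date string and the array name "parts" are made up for the example:

```shell
awk 'BEGIN {
    # n receives the number of elements stored in parts.
    n = split("2024-01-15", parts, "-")
    for (i = 1; i <= n; i++)
        print i, parts[i]
}'
# 1 2024
# 2 01
# 3 15
```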
Comments
A comment in awk starts with the hash symbol "#" and runs to the end of the line.
Exercise
1)

Add some comments after the BEGIN part
but before the action part in any of
the previous exercises.

Control Flow

If condition

Let us modify our "marks2.txt" slightly.

File: "marks2.txt"

1)    John      M  80
2)    Peter     M  90
3)    David     M  47
4)    James     M  25
5)    Lisa      F  89
6)    Kenny     M  56
7)    Sam       M  95
8)    Julia     F  74
9)    Cassie    F  66
10)   Marelena  F  45

and our awk command:


File: awk11
BEGIN { sum=0 ; average=0; noPeople=0 }
{
  if ( $3 == "F" )
  {
    print $0
    noPeople++
    sum += $4 ;
  }
}
END { print "Average marks:", sum/noPeople }

$ awk -f awk11 marks2.txt
5)    Lisa      F  89
8)    Julia     F  74
9)    Cassie    F  66
10)   Marelena  F  45
Average marks: 68.5

We can use the semicolon to separate statements. If a statement is on one line and the next statement is on another line then the semicolon is not necessary.

Using the "if" with "else if"
File: awk12
BEGIN {
  sum1=0 ; average1=0; noPeople1=0
  sum2=0 ; average2=0; noPeople2=0
}
{
  if ( $3 == "F" )
  {
    print $0
    noPeople1++
    sum1 += $4 ;
  }
  else if ( $3 == "M" )
  {
    print $0
    noPeople2++
    sum2 += $4 ;
  }
}
END {
  print "Average marks for F:", sum1/noPeople1
  print "Average marks for M:", sum2/noPeople2
}

$ awk -f awk12 marks2.txt
1)    John      M  80
2)    Peter     M  90
3)    David     M  47
4)    James     M  25
5)    Lisa      F  89
6)    Kenny     M  56
7)    Sam       M  95
8)    Julia     F  74
9)    Cassie    F  66
10)   Marelena  F  45
Average marks for F: 68.5
Average marks for M: 65.5

Exercise

Using the above data file determine the person with the highest marks and the person with the lowest marks.

Sam has the highest mark of 95.
James has the lowest mark of 25.
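
An "if" with a plain "else" can also compare numbers. The sketch below labels three rows taken from marks2.txt (spacing collapsed); the pass mark of 50 is an assumption made purely for illustration.

```shell
# The threshold 50 is hypothetical; $4 is the marks column.
printf '1) John M 80\n4) James M 25\n5) Lisa F 89\n' |
awk '{
    if ($4 >= 50)
        print $2, "pass"
    else
        print $2, "fail"
}'
# John pass
# James fail
# Lisa pass
```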

loops

The for loop has the structure:

for(  Initial ; Condition ; Post )
    {
        # Body of the loop
    }

The "Initial" part is run once
and can be used to initialize variables.
The "Condition" part is tested and if
true the body of the loop is executed.
After the body has been executed
the "Post" statement is run, after
which the condition is tested
again and so on until the condition
becomes false.

Ex:


File: loop1.cmd
{
  print "For loop"
  for( i1=0 ; i1<3 ; i1++)
    print i1
}

$ echo "" | awk -f loop1.cmd
For loop
0
1
2

Ex:
File: loop2.cmd
{
  ind1 = 2 ; ind2 = $0 - 1
  #print ind2 ;
  isPrime = 1 ;
  for ( ; ind1 <= ind2 ; ind1++ )
  {
    if ( $0 % ind1 == 0 )
      isPrime = 0 ;
  }
  if ( isPrime == 1 && length( $0 ) > 0 )
  {
    printf "%d is a prime number\n", $1
  }
}

File: "data_prime.txt"
20
21
23
17
7
8
9

$ awk -f loop2.cmd data_prime.txt
23 is a prime number
17 is a prime number
7 is a prime number

There is another notation for looping through an array and that is:

for( i1 in array )
  do something

For each item in the array the variable "i1" takes on the value of the index of that element, and we can access the array value with the notation "array[i1]".
File: awk_states
BEGIN {
  state["Dublin"] = "California";
  state["Reno"] = "Nevada"
  state["San Jose"] = "California"
  state["Oakland"] = "California"
  state["Las Vegas"] = "Nevada"
  for( str1 in state )
    print str1 , state[str1]
}

$ awk -f awk_states
Reno Nevada
Dublin California
Las Vegas Nevada
San Jose California
Oakland California

The order in which "for ... in" visits the elements is not specified, so the lines may come out in a different order on another system.

While loops

The structure of the while loop is:

while ( condition )
{
  Body
}

As long as the "condition" is true the body of the loop is executed. It is similar to the for loop.
File: while1.cmd
{
  print "While loop"
  i1=0
  while ( i1 < 3 )
  {
    print i1
    i1++
  }
}

$ echo "" | awk -f while1.cmd
While loop
0
1
2

Exercise

1) Modify the prime number example to use a "while" loop instead of a "for" loop.
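
A while loop is a natural fit when the number of iterations is not known in advance. As a sketch, the following adds up the digits of a number; the value 12345 and the variable names are arbitrary.

```shell
awk 'BEGIN {
    n = 12345
    sum = 0
    # Peel off the last digit until nothing is left.
    while (n > 0) {
        sum += n % 10
        n = int(n / 10)
    }
    print sum
}'
# 15
```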

Arrays

Let's assume we have a file called "cities.txt".

File: "cities.txt"
1) "Dublin"
2) "Reno"
3) "San Jose"
4) "Oakland"
5) "Las Vegas"

File: "cities"
{ print $2 }

$ awk -f cities cities.txt
"Dublin"
"Reno"
"San
"Oakland"
"Las

We see that the output is not what we want: the names containing a space are cut off. With the default field separator there isn't any easy way to fix this in awk. What we can do is change the field separator in the data with our sed command.

File: "convert_cities.sh"
cat cities.txt | sed -r 's/[ ]+/|/' > cities1.txt
File: cities1.txt
1)|"Dublin"
2)|"Reno"
3)|"San Jose"
4)|"Oakland"
5)|"Las Vegas"

We have replaced the first series of spaces on each line with the pipe character "|". Another way of getting around this problem is to use the quotation mark as the field separator character.
File: cities1
BEGIN { FS="|" }
{ print $2 }

In the BEGIN section we set our field separator to "|" with the command:

FS="|"

$ awk -f cities1 cities1.txt
"Dublin"
"Reno"
"San Jose"
"Oakland"
"Las Vegas"

We do not have to declare an array or its size, and arrays are associative, which means a subscript value can be a number or a string.

Exercise:

Write a command file that prints the cities using the quotation mark as the separator.

We can of course use awk arrays in the traditional sense. The Fibonacci series is of the form 1, 1, 2, 3, 5, 8. We start out with 2 numbers (1 and 1) and each following number is the sum of the previous 2 numbers.
File: fib1
BEGIN {
  #holder 1 to 10 for fibonacci number
  holder[1] = 1
  holder[2] = 1
  for( i1=3 ; i1<=10 ; i1++ )
  {
    holder[ i1 ] = holder[i1-1] + holder[i1-2]
  }
  for( i1=1 ; i1<=10 ; i1++ )
  {
    print holder[i1]
  }
}

$ awk -f fib1
1
1
2
3
5
8
13
21
34
55
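
Since subscripts can be strings, an array is a convenient way to count how often each value occurs. The sketch below counts cities per state from three made-up input lines; the array name "count" is an assumption, and the output is piped through sort only because the "for ... in" order is not specified.

```shell
# count[$2]++ creates the element on first use and increments it afterwards.
printf 'Dublin California\nReno Nevada\nOakland California\n' |
awk '{ count[$2]++ }
     END {
         for (st in count)
             print st, count[st]
     }' | sort
# California 2
# Nevada 1
```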

Awk functions

We have awk built in functions that are provided to us and we can also define our own functions if we wish.


File: awk13
BEGIN {
  arr[0] = "Three"
  arr[1] = "One"
  arr[2] = "Two"
  print "Array elements before sorting:"
  for (i1 in arr) {
    print arr[i1]
  }
  asort(arr)
  print "Array elements after sorting:"
  for (i1 in arr) {
    print arr[i1] , length( arr[i1] )
  }
}

$ awk -f awk13
Array elements before sorting:
Three
One
Two
Array elements after sorting:
One 3
Three 5
Two 3

We are using the "asort" function to sort and the "length" function to obtain the length of the string. Note that "asort" is a GNU awk (gawk) extension and may not be available in other versions of awk.

In the example below we have written a function that returns 1 if the number passed to it in the argument is a prime number.
File: awk14
function isPrimeNo( num1 )
{
  # Test the argument num1 rather than reading $0 directly.
  ind1 = 2 ; ind2 = num1 - 1
  #print ind2 ;
  isPrime = 1 ;
  for ( ; ind1 <= ind2 ; ind1++ )
  {
    if ( num1 % ind1 == 0 )
      isPrime = 0 ;
  }
  if ( isPrime == 1 && length( num1 ) > 0 )
  {
    #print num1, " is a prime number."
    return 1
  }
  return 0
}

{
  # print adds the new line that printf would leave out
  if ( isPrimeNo( $0 ) == 1 )
    print $0 " is a prime number."
}
File: data14.txt
20
21
23
17
7
8
9

$ awk -f awk14 data14.txt
23 is a prime number.
17 is a prime number.
7 is a prime number.

Exercise

1)
File: "data15.txt"
2 3
4 2
3 3
10 3

Use the file "power1.cmd" to fill in the function for power.

File: "power1.cmd"
function power( num1 , num2 )
{
  # TO DO
}

{
  value=power($1, $2)
  print $1, $2 , value
}

$ cat data15.txt | awk -f power1.cmd
2 3 8
4 2 16
3 3 27
10 3 1000
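
Variables assigned inside an awk function are global unless they appear in the parameter list. A common idiom is to list extra parameters that callers never pass and use them as local variables. The sketch below shows this with a hypothetical function "sum_to", which is not part of the tutorial's files.

```shell
awk '
# The extra parameters i and total are never passed by the caller,
# so they act as local variables inside the function.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99               # global i, left untouched by the function
    print sum_to(4), i
}'
# 10 99
```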