Introduction

Sed stands for Stream Editor. It is a powerful utility for manipulating text and files. As an example of what "sed" can do, consider the file "hello.cpp". There are line numbers at the beginning of the lines in this file. We can use the following command to remove the line numbers.
File: hello.cpp
 // Your First C++ Program

3 #include <iostream>

5 int main()
6  {
7     std::cout << "Hello World!";
8     return 0;
9 }
cat hello.cpp | sed -r 's/^ *[0-9]+//g'


$ cat hello.cpp | sed -r 's/^ *[0-9]+//g'
 // Your First C++ Program

 #include <iostream>

 int main()
  {
     std::cout << "Hello World!";
     return 0;
 }
The regular expression "^ *[0-9]+" says that a line may start with any number of spaces followed by at least one digit; if it does, that match is removed. The "-r" option means use extended regular expressions.
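As a minimal sketch of the same idea, the snippet below runs a few numbered lines through the substitution. The sample text and the variable names are made up for illustration and are not the hello.cpp file. It uses "-E", which GNU sed accepts as a synonym for "-r", and leaves out the "g" flag because the "^" anchor can only match once per line.

```shell
# Sample input with leading line numbers (illustrative text only).
numbered='1 // comment
 2 int x = 42;
10 return x;'

# Remove optional leading spaces followed by one or more digits.
# The 42 in the middle of the second line is untouched because of "^".
stripped=$(printf '%s\n' "$numbered" | sed -E 's/^ *[0-9]+//')
printf '%s\n' "$stripped"
```

Each output line keeps the single space that followed the line number, since the pattern only consumes the spaces in front of the digits.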
File: hello1.cpp
 // Your First C++ Program
 /* A
  Multi line comment
 */

3 #include <iostream>

5 int main()
6  {
7     std::cout << "Hello World!";
8     return 0;
9 }
$ cat hello1.cpp | sed -r '/\/\*/,/\*\//d'
 // Your First C++ Program

3 #include <iostream>

5 int main()
6  {
7     std::cout << "Hello World!";
8     return 0;
9 }


The command "sed -r '/\/\*/,/\*\//d'" deletes every line from the first line that matches the beginning pattern "/*" up to and including the next line that matches the end pattern "*/".
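The range form "/start/,/end/d" can be tried without a file. The following is a small sketch on made-up input; the variable names are only for illustration. The end address here is written as "\*\/", which spells out the literal characters "*/".

```shell
# Illustrative input: a block comment spanning three lines.
src='int a;
/* start
   middle
end */
int b;'

# Delete from the first line containing "/*" through the next line containing "*/".
kept=$(printf '%s\n' "$src" | sed -E '/\/\*/,/\*\//d')
printf '%s\n' "$kept"
```

Note that whole lines are deleted, so any code sharing a line with the comment is lost as well. Also, sed starts looking for the end pattern on the line after the one that matched the start pattern, so a comment that opens and closes on the same line would not end the range on that line.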

Syntax


The sed substitution command takes a string of the form

s/../../

The "s" states that we are using the substitution command. We specify the pattern and what to replace the pattern with. The part between the first and second slash is the pattern and the part between the second and third slash is the replacement string. This is one use of sed and we shall see other ways that sed can manipulate text.
[amittal@hills sed]$ echo "Lemon tree" | sed 's/tree/juice/'
Lemon juice


We do not have to use the forward slash as a separator; almost any single character will do (but not a backslash or a newline). Using the question mark:
[amittal@hills sed]$ echo "Lemon tree" | sed 's?tree?juice?'
Lemon juice


$ echo "Lemon tree" | sed 's_tree_juice_'
Lemon juice
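
A different separator is most useful when the pattern itself contains slashes, such as a file path. The sketch below, on a made-up path, shows the same substitution written both ways.

```shell
# With "/" as the separator every slash in the path must be escaped.
escaped=$(echo "/usr/local/bin" | sed 's/\/usr\/local/\/opt/')

# With "|" as the separator the slashes can be written as they are.
plain=$(echo "/usr/local/bin" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$plain"
```

Both commands print the same result; the second form is simply easier to read.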

The expression below replaces every "t" together with the letters that
follow it, so it changes words starting with t and also the tail of a word that has a t inside it.

$ echo "Lemon tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon juice  juice  juice

$ echo "Lemon atree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon ajuice  juice  juice
The "g" at the end signifies global substitution. Otherwise only the first match on each line is replaced.
$ echo "Lemon atree tank top" | sed -r 's/t[a-zA-Z]+/juice /'
Lemon ajuice  tank top
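
Note that "first match" is counted per line, since sed processes its input one line at a time. A short sketch with two made-up input lines shows the difference:

```shell
lines='a-b-c
d-e-f'

# Without "g": only the first "-" on each line is replaced.
first_only=$(printf '%s\n' "$lines" | sed 's/-/+/')

# With "g": every "-" on each line is replaced.
all=$(printf '%s\n' "$lines" | sed 's/-/+/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all"
```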
Exercises

1) What does the following do ?
$ echo "this is something for tom." | sed -r 's/^t/T/' | sed -r 's/ t/ T/'

2)
The problem with the below command is that it
changes the words beginning with "t" but also
changes a word if t is in the middle of the word.
Change it so that only words that begin with the
letter "t"  are modified. Spaces should be preserved
as in the original string.

echo "temon its tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
juice  ijuice  juice  juice  juice

Solutions
1)
sed -r 's/^t/T/'
This will replace the small "t" at the beginning of the string
with a capital "T".

sed -r 's/ t/ T/'
This will replace the first small "t" that has a space in front of it
with a space and a capital "T".

2)
$ echo "temon its tree tank top" | sed -r 's/^t[a-zA-Z]+/juice/g' | sed -r 's/ t[a-zA-Z]+/ juice/g'
juice its juice juice juice

$ echo "temon its tree tank top" | sed -r 's/(^t[a-zA-Z]+| t[a-zA-Z]+)/ juice/g'
 juice its juice juice juice

We can also use the pipe symbol as an "or" in the match expression.
However, the first word now has an extra space in front of it, because the replacement always begins with a space.
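
One possible way to avoid the extra space, sketched here as an alternative rather than as the intended solution, is to put only the "start of line or space" part in round brackets and write it back with "\1" (the group reference explained in the section "Using () and \1").

```shell
# (^| ) matches either the start of the line (empty text) or a single space.
# \1 puts back whatever was matched there, so spacing is preserved.
fixed=$(echo "temon its tree tank top" | sed -E 's/(^| )t[a-zA-Z]+/\1juice/g')
echo "$fixed"
```

At the start of the line "\1" is empty, so no space is added in front of the first word.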

& Symbol

$ echo "Lemon tree" | sed -r 's/tree/& &/'
Lemon tree tree

The "&" in the replacement stands for the text that the pattern matched. The rest of the string that is not matched stays the same.
$ echo "Lemon 5-6" | sed -r 's/[+,-]/ & /'
Lemon 5 - 6

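In the above whenever we see a "+", "," or "-"
symbol in the input string we place spaces
around it. The comma inside the square brackets is matched
literally as well; "[+-]" would match only the two operators.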
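
Two more small sketches of "&" on made-up input: combined with the "g" flag it decorates every match on the line, and written as "\&" it loses its special meaning and inserts a literal ampersand.

```shell
# Wrap every number on the line in square brackets.
wrapped=$(echo "10 apples and 20 pears" | sed -E 's/[0-9]+/[&]/g')

# An escaped "\&" in the replacement is a plain ampersand character.
literal=$(echo "salt and pepper" | sed 's/and/\&/')

echo "$wrapped"
echo "$literal"
```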


[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/& &/'

123 123 abc

The pattern that was matched was "123" and that
got repeated with "& &" .

[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/(&)/'
(123) abc

The above line puts brackets around the number
"123".  What if we wanted to get rid of the word
"abc" and only have "(123)" as the output?
We could do something like:

$ echo "123 abc" | sed -r 's/ [a-zA-Z]+//'  |   sed -r 's/[0-9][0-9]*/(&)/'
(123)

We can do this in a better way because sed
allows us to capture a particular part of the match
in our regular expression and reuse it in the replacement.

Exercises:

1)  Place the command
echo "123 abc" | sed -r 's/[0-9]+/& &/'

in a shell script and then run the shell
script. This method has the advantage of
being able to edit the text file and the
command is saved for future reference.

Using () and \1

We can use "() \number" syntax to further
isolate patterns and select particular patterns.

[amittal@hills sed]$ echo "123 abc" | sed -r 's/(^[0-9]+) .*/\1/'
123

In the above example the brackets capture the number and
the rest of the line is matched by the pattern " .*".
The replacement section only has "\1", so the text captured
by the brackets is kept while the rest of the line is dropped.

The first pair of brackets "()" is referred to as "\1" and the
next pair as "\2". We will get an
error if the replacement refers to a group number that has
no corresponding round brackets in the pattern.

$ echo "123 abc" | sed -r 's/^[0-9]+ .*/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS

We are missing the round brackets in the pattern.
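
As a further sketch of group numbering, the command below uses three pairs of brackets to rearrange a date. The date string is made up for illustration, and "#" is used as the separator so that the slashes in the replacement need no escaping.

```shell
# \1 = year, \2 = month, \3 = day, numbered by the order of the opening brackets.
swapped=$(echo "2024-01-15" | sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#')
echo "$swapped"
```

The groups can be used in any order in the replacement, and each one can be used more than once or not at all.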

$  echo "This is a lemon tree" | sed -r 's/(is) (a)/\2 \1/'
This a is lemon tree

In the above the patterns are "is" and "a" .

$  echo "This is a lemon tree" | sed -r 's/(is)/(\1)/'
Th(is) is a lemon tree


The below line shows how we can switch the first
and the second word.

$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2 \1/'
are We in a unix scripting class.

What if we wanted to grab the second word only from
the above example:

$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+).*/\2/'
are

There is usually more than one way to write something.

$ echo "We are in a unix scripting class." | sed -r 's/^[A-Za-z]+ ([A-Za-z]+).*/\1/'
are

$  echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2/'
are in a unix scripting class.

We are replacing the first 2 words by just the second word.

We can also use "\1" on the left-hand side, inside the pattern itself, where it matches the same text that the first pair of brackets matched.

$ echo "This This contains a mistake." | sed -r 's/([A-Za-z]+) \1/\1/'
This contains a mistake.

Removing duplicated words at the beginning and end of the line:
$ echo "This contains a mistake. This" | sed -r 's/(^[A-Za-z]+)(.*)\1$/\1\2/'
This contains a mistake.


Removing duplicated words.
$ echo "This contains This a mistake." | sed -r 's/(^[A-Za-z]+)(.*)\1/\1\2/'
This contains  a mistake.

In the above our pattern matches "This contains This".
The first group matches "This" and the second group
matches " contains ". So our replacement string is
"\1\2", which takes out the second, duplicate "This".
Since we didn't match " a mistake." that is printed as is.
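
The back-reference in the pattern can be combined with the "g" flag to remove every adjacent repeat on a line. The sketch below uses made-up sentences, and its second command shows a limitation: the pattern does not know where words begin, so it can also match the end of one word followed by the next word.

```shell
# Collapse each adjacent repeated word into one.
deduped=$(echo "the the cat sat sat down" | sed -E 's/([A-Za-z]+) \1/\1/g')

# Caveat: "is is" is found inside "This is", so the word "is" disappears.
surprise=$(echo "This is fine" | sed -E 's/([A-Za-z]+) \1/\1/')

echo "$deduped"
echo "$surprise"
```

To avoid the second effect the pattern would have to be anchored to word boundaries, for example with the "(^| )" idea, at the cost of a longer expression.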


Exercises

1)
Assume we have the string "We are in a unix scripting class."
 Switch the first and last word.
 Switch the first and third word.
 Switch the first and third word and remove the second word.
 echo "We are in a unix scripting class." | sed -r 'TODO'

Output should be as:
class. are in a unix scripting We
in are We a unix scripting class.
in We a unix scripting class.

Flags -n and p

The "-n" option means that sed does not automatically print each line to the console.
Example:

data.txt
This is a test.
The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ?

The "-n" option suppresses the output so we don't
get any output printed to the console at all. If
we use the "p" flag then the lines that match will
get printed out.

$ sed -n 's/test/Test/p' data.txt

This is a Test.
A Test is coming up.

The "-n" option suppressed the lines that
would normally get printed out and the "p"
flag prints out the lines that match. What
if we have only the "p" flag and not the
"-n" option?

$ sed 's/test/Test/p' data.txt
This is a Test.
This is a Test.
The dog is chasing the cat.
A Test is coming up.
A Test is coming up.
Are we having fun in this class ?

All the lines in the file "data.txt" get
printed out and the lines matching the
pattern get printed out a second time.

$ sed 's/test/Test/' data.txt
This is a Test.
The dog is chasing the cat.
A Test is coming up.
Are we having fun in this class ?

In the above we print all the lines of the file
and the ones that have "test" in the line
will have it replaced with "Test" and the
lines that don't have the "test" will be printed
as they are.




We can use the "-n" option together with the "p" command to simply
print the lines that match and not replace anything.

$ sed -rn '/([a-z]+) \1/p' data.txt
This is a test.

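The above will print the lines in which a run of
lowercase letters is repeated after a space. Here the
match is "is is", made up of the end of "This" followed by "is".
In this way the sed command
is working like a grep.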

$ cat data.txt | sed -rn '/fun/p'
Are we having fun in this class ?

The above command prints the lines that
have the word "fun" in them.
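
The comparison with grep can be checked directly. The sketch below feeds the contents of data.txt from a shell variable (so no file is needed; the variable names are only for illustration) and runs both tools with the same pattern.

```shell
data='This is a test.
The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ?'

# "-n" turns off automatic printing; "p" prints only the matching lines.
with_sed=$(printf '%s\n' "$data" | sed -n '/test/p')

# grep prints the matching lines by default.
with_grep=$(printf '%s\n' "$data" | grep 'test')

printf '%s\n' "$with_sed"
```

For a simple pattern like this the two commands select the same lines.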

Exercises

1) What does the below print ?
 cat hello2.cpp | sed -nr 's/([0-9]+)/\1/p'