Introduction
Sed stands for Stream Editor. It is a powerful utility for manipulating text and files. Here is an example of what "sed" can do. Consider the file "hello.cpp". There are line numbers at the beginning of the lines in this file. We can use the following command to remove the line numbers.
File: hello.cpp
// Your First C++ Program
3 #include <iostream>
5 int main()
6 {
7 std::cout << "Hello World!";
8 return 0;
9 }
$ cat hello.cpp | sed -r 's/^ *[0-9]+//g'
// Your First C++ Program
 #include <iostream>
 int main()
 {
 std::cout << "Hello World!";
 return 0;
 }
The regular expression "^ *[0-9]+" states that the line may begin with any number of spaces followed by at least one digit; if so, we remove that match. The space after each number is not part of the match, so it stays at the start of the line. The "-r" option means use extended regular expressions.
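The command above can also be tried without an existing file. The following is a minimal sketch, assuming GNU sed (for the "-r" option). The function name strip_line_numbers and the sample text are illustrative and not part of the original notes. Unlike the command above, this variant also removes one space after the number.

```shell
#!/bin/sh
# Hypothetical helper: strip a leading line number and one following space.
# Reads the named files, or standard input when no file is given.
strip_line_numbers() {
    sed -r 's/^ *[0-9]+ ?//' "$@"
}

# Feed a small numbered sample (assumed contents) through the helper.
strip_line_numbers <<'EOF'
1 #include <iostream>
2 int main()
3 {
4     return 0;
5 }
EOF
```

Lines that do not start with a number pass through unchanged, because the pattern is anchored with "^".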
File: hello1.cpp
// Your First C++ Program
/*
A Multi line comment
*/
3 #include <iostream>
5 int main()
6 {
7 std::cout << "Hello World!";
8 return 0;
9 }
$ cat hello1.cpp | sed -r '/\/\*/,/\*\//d'
// Your First C++ Program
3 #include <iostream>
5 int main()
6 {
7 std::cout << "Hello World!";
8 return 0;
9 }
The command "sed -r '/\/\*/,/\*\//d'" deletes every line from the one containing the beginning pattern "/*" through the one containing the end pattern "*/".
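The escaped slashes make the range hard to read. As a sketch of an alternative, sed lets us pick another delimiter for an address by starting it with a backslash. The function name strip_block_comments and the sample input below are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: delete lines from one containing "/*" through one containing "*/".
# \#...# uses "#" as the address delimiter, so the slashes need no escaping.
strip_block_comments() {
    sed '\#/\*#,\#\*/#d' "$@"
}

printf '%s\n' '// keep' '/*' ' a comment' '*/' 'int main() { return 0; }' | strip_block_comments
```

Note that sed looks for the end pattern only on the lines after the one where the range started. A comment that opens and closes on the same line would therefore delete everything up to the next "*/" or to the end of the file.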
Syntax
The sed substitution command takes a string of the form:
s/../../
The "s" states that we are using the substitution command. We specify the pattern and what to replace the pattern with. The part between the first 2 slashes is the pattern and the part between the second and third slash is the replacement string. This is one use of sed and we shall see other ways that sed can manipulate text.
[amittal@hills sed]$ echo "Lemon tree" | sed 's/tree/juice/'
Lemon juice
We do not have to use the forward slash as a separator and can use essentially any character. Using the question mark:
[amittal@hills sed]$ echo "Lemon tree" | sed 's?tree?juice?'
Lemon juice
Using the underscore:
$ echo "Lemon tree" | sed 's_tree_juice_'
Lemon juice
The expression below replaces every run of letters that starts at a "t", whether the "t" begins a word or sits inside one.
$ echo "Lemon tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon juice  juice  juice
$ echo "Lemon atree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon ajuice  juice  juice
The replacement "juice " ends with a space, so two spaces appear between the words in the output.
The "g" at the end signifies global substitution. Otherwise only the first match is replaced.
$ echo "Lemon atree tank top" | sed -r 's/t[a-zA-Z]+/juice /'
Lemon ajuice  tank top
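A different separator is most useful when the pattern itself contains slashes. The sketch below uses a made-up path to show this, and then repeats the effect of the "g" flag on a short string. The variable name and the sample values are illustrative.

```shell
#!/bin/sh
# Illustrative path; not from the original notes.
path="/usr/local/bin/tool"

# With "/" as the separator every slash in the pattern must be escaped.
echo "$path" | sed 's/\/usr\/local/\/opt/'

# With "|" as the separator the same substitution needs no escaping.
echo "$path" | sed 's|/usr/local|/opt|'

# Without "g" only the first match on the line changes; with "g" all of them do.
echo "a-b-c" | sed 's/-/+/'
echo "a-b-c" | sed 's/-/+/g'
```

The first two commands print the same result, and the last two differ only in how many "-" characters are replaced.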
Exercises
1) What does the following do ?
$ echo "this is something for tom." | sed -r 's/^t/T/' | sed -r 's/ t/ T/'
2) The problem with the command below is that it changes the words beginning with "t" but also changes a word if "t" is in the middle of the word. Change it so that only words that begin with the letter "t" are modified. Spaces should be preserved as in the original string.
$ echo "temon its tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
juice  ijuice  juice  juice  juice
Solutions
1) sed -r 's/^t/T/'
This replaces a small "t" at the beginning of the string with a capital "T".
sed -r 's/ t/ T/'
This replaces the first small "t" that has a space in front of it with a space and a capital "T".
2)
$ echo "temon its tree tank top" | sed -r 's/^t[a-zA-Z]+/juice/g' | sed -r 's/ t[a-zA-Z]+/ juice/g'
juice its juice juice juice
$ echo "temon its tree tank top" | sed -r 's/(^t[a-zA-Z]+| t[a-zA-Z]+)/ juice/g'
 juice its juice juice juice
We can also use the pipe symbol as an "or" in the match expression. However, we see that the first word then has an extra space in front of it, because the replacement " juice" always begins with a space.
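Two further variants of solution 2 avoid the extra space in front of the first word. This is only a sketch, assuming GNU sed: "\b" is a GNU extension that matches at a word boundary, and the second form relies on the "\1" syntax that is explained in the section "Using () and \1".

```shell
#!/bin/sh
input="temon its tree tank top"

# \b (a GNU sed extension) matches at a word boundary, so the "t" in "its" is skipped.
echo "$input" | sed -r 's/\bt[a-zA-Z]+/juice/g'

# Variant without \b: capture the line start or the space and put it back with \1.
echo "$input" | sed -r 's/(^| )t[a-zA-Z]+/\1juice/g'
```

Both commands leave "its" alone and keep the original spacing.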
& Symbol
The "&" symbol in the replacement string stands for the text that the pattern matched.
$ echo "Lemon tree" | sed -r 's/tree/& &/'
Lemon tree tree
The rest of the string that is not matched stays the same.
$ echo "Lemon 5-6" | sed -r 's/[+,-]/ & /'
Lemon 5 - 6
In the above, the bracket expression "[+,-]" matches a "+", a "," or a "-", and we place spaces around the first such symbol in the input string.
[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/& &/'
123 123 abc
The pattern that was matched was "123" and it got repeated by "& &".
[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/(&)/'
(123) abc
The above line puts brackets around the number "123". What if we wanted to get rid of the word "abc" and only have the number part in the output? We could do something like:
$ echo "123 abc" | sed -r 's/ [a-zA-Z]+//' | sed -r 's/[0-9][0-9]*/& &/'
123 123
We can do this in a better way because sed allows us to refer to a particular part of the pattern in our regular expression string.
Exercises:
1) Place the command
echo "123 abc" | sed -r 's/[0-9]+/& &/'
in a shell script and then run the shell script. This method has the advantage that the script is a text file we can edit, and the command is saved for future reference.
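The "&" examples above can be combined with the "g" flag. The following is a small sketch, assuming GNU sed for "-r". The function name bracket_numbers and the sample strings are illustrative. It also shows that a literal ampersand in the replacement has to be written as "\&".

```shell
#!/bin/sh
# Hypothetical helper: wrap every run of digits in round brackets using "&".
bracket_numbers() {
    sed -r 's/[0-9]+/(&)/g' "$@"
}

echo "123 abc 45" | bracket_numbers

# A literal ampersand in the replacement must be escaped as "\&".
echo "salt pepper" | sed 's/ / \& /'
```

Without the backslash, the second command would insert the matched space again instead of the "&" character.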
Using () and \1
We can use the "() \number" syntax to isolate parts of a pattern and select particular ones.
[amittal@hills sed]$ echo "123 abc" | sed -r 's/(^[0-9]+) .*/\1/'
123
In the above example the brackets match the number and the rest of the line is matched by the pattern " .*". The substitute section only has "\1", so the part matched inside the brackets is kept while the rest of the line is dropped. The first pair of brackets "()" corresponds to "\1" and the next pair corresponds to "\2". We will get an error if the replacement refers to a number that has no matching pair of round brackets.
$ echo "123 abc" | sed -r 's/^[0-9]+ .*/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
We are missing the round brackets in the pattern.
$ echo "This is a lemon tree" | sed -r 's/(is) (a)/\2 \1/'
This a is lemon tree
In the above the patterns are "is" and "a".
$ echo "This is a lemon tree" | sed -r 's/(is)/(\1)/'
Th(is) is a lemon tree
The line below shows how we can switch the first and the second word.
$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2 \1/'
are We in a unix scripting class.
What if we wanted to grab only the second word from the above example?
$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+).*/\2/'
are
There is usually more than one way to write something.
$ echo "We are in a unix scripting class." | sed -r 's/^[A-Za-z]+ ([A-Za-z]+).*/\1/'
are
$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2/'
are in a unix scripting class.
Here we are replacing the first 2 words by just the second word.
We can also place "\1" on the left hand side.
$ echo "This This contains a mistake." | sed -r 's/([A-Za-z]+) \1/\1/'
This contains a mistake.
Removing duplicated words at the beginning and end of the line:
$ echo "This contains a mistake. This" | sed -r 's/(^[A-Za-z]+)(.*)\1$/\1\2/'
This contains a mistake.
Removing duplicated words:
$ echo "This contains This a mistake." | sed -r 's/(^[A-Za-z]+)(.*)\1/\1\2/'
This contains  a mistake.
In the above our pattern matches "This contains This". The first group matches "This" and the second group matches " contains ". So our replacement string is "\1\2", which takes out the second, duplicate "This". Since we didn't match " a mistake.", that is printed as is.
Exercises
1) Assume we have the string "We are in a unix scripting class."
Switch the first and last word.
Switch the first and third word.
Switch the first and third word and remove the second word.
echo "We are in a unix scripting class." | sed -r 'TODO'
Output should be as:
class. are in a unix scripting We
in are We a unix scripting class.
in We a unix scripting class.
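As one more illustration of numbered groups, the sketch below reorders the parts of a date. It assumes GNU sed for "-r". The function name reorder_date and the sample sentence are made up for this example. "#" is used as the separator so that the slashes in the replacement need no escaping.

```shell
#!/bin/sh
# Hypothetical helper: rewrite YYYY-MM-DD as DD/MM/YYYY using three groups.
# \1 is the year, \2 the month and \3 the day.
reorder_date() {
    sed -r 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#g' "$@"
}

echo "Released on 2024-01-31." | reorder_date
```

The groups are numbered by the order of their opening brackets, so the replacement can use them in any order.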
Flags -n and p
The "-n" option means that lines will not be automatically output to the console.
Ex: data.txt
This is a test.
The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ?
On its own, the "-n" option suppresses the output, so nothing is printed to the console at all. If we add the "p" flag, the lines that match get printed out.
$ sed -n 's/test/Test/p' data.txt
This is a Test.
A Test is coming up.
The "-n" option suppressed the lines that would normally get printed out and the "p" flag prints out the lines on which a substitution was made.
What if we have only the "p" flag and not the "-n" option?
$ sed 's/test/Test/p' data.txt
This is a Test.
This is a Test.
The dog is chasing the cat.
A Test is coming up.
A Test is coming up.
Are we having fun in this class ?
All the lines in the file "data.txt" get printed out, and the lines matching the pattern get printed a second time.
$ sed 's/test/Test/' data.txt
This is a Test.
The dog is chasing the cat.
A Test is coming up.
Are we having fun in this class ?
In the above we print all the lines of the file. The ones that contain "test" have it replaced with "Test" and the ones that don't are printed as they are.
We can use the "-n" option together with the "p" command to simply print the lines that match, without replacing anything.
$ sed -rn '/([a-z]+) \1/p' data.txt
This is a test.
The above prints the lines in which a run of lowercase letters is repeated after a space. Here the match is "is is" inside "This is", so the pattern catches more than true duplicate words. In this way the sed command works like grep.
$ cat data.txt | sed -rn '/fun/p'
Are we having fun in this class ?
The above command prints the lines that have the word "fun" in them.
Exercises
1) What does the below print ?
cat hello2.cpp | sed -nr 's/([0-9]+)/\1/p'
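The grep-like behaviour can be tried without creating "data.txt". In the sketch below the function name sample is illustrative, and it simply prints the four lines of the file shown above. The "d" command, used earlier to delete comment lines, gives the opposite selection.

```shell
#!/bin/sh
# Hypothetical stand-in for data.txt: prints the four sample lines.
sample() {
    printf '%s\n' \
        'This is a test.' \
        'The dog is chasing the cat.' \
        'A test is coming up.' \
        'Are we having fun in this class ?'
}

# -n with the "p" command prints only the matching lines, like grep.
sample | sed -n '/test/p'

# The "d" command without -n prints every line except the matching ones, like grep -v.
sample | sed '/test/d'
```

The first command prints the two lines that contain "test" and the second prints the other two.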