City College of San Francisco - CS160B
Unix/Linux Shell Scripting
Module: Unix Review

sed

sed is the stream editor. It is a non-interactive descendant of the older Unix line editor ed: it accepts one or more editing commands on the command line and applies them to input file(s) or standard input.

sed is complex and arcane. We will not cover it completely. However, it is extremely useful for performing simple edits on data from the command line, so we will cover this most useful feature and enough basic sed so that you can learn more if you wish. We will concentrate on using a single editing command with sed, which is its most common use. You can, however, give sed a file of editing commands or multiple editing commands on the command line.
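
As a brief sketch of those two alternatives: the standard -e option supplies each editing command separately, and the -f option reads the commands from a file. The file names input and edits.sed are only illustrative, and the s commands used here are explained in the sections that follow.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"
printf 'hello my hero\ngoodbye\n' > input

# Two editing commands on the command line, one per -e option.
# Each line is passed through the commands in order.
multi=$(sed -e 's/h/H/' -e 's/o/0/' input)
echo "$multi"

# The same two commands stored in a file, one per line
# (edits.sed is a made-up name for this sketch).
printf 's/h/H/\ns/o/0/\n' > edits.sed
fromfile=$(sed -f edits.sed input)
echo "$fromfile"
```

Both invocations should print the same two lines, Hell0 my hero and g0odbye.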

Basic Operation

sed simply applies an editing command to each line of one or more input files (or of standard input) and sends the resulting text to standard output. The most common edit is a string substitution. If you don't learn anything else about sed, learn how to do a simple string substitution. Here is an example:

$ cat input
hello my hero
goodbye
$ sed 's/h/H/' input
Hello my hero
goodbye
$

In this example, sed changes the first occurrence of a lower-case h on each line to an upper-case H. Notice that it applies the edit to each line and sends the result to standard output. Simple. Note that the input file was not changed! To change the original file, you must save the standard output and then copy it back over the original.
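
One way to save the output and copy it back is sketched here. The temporary name input.new is an arbitrary choice for this sketch.

```shell
# Work in a scratch directory with the same two-line sample file.
cd "$(mktemp -d)"
printf 'hello my hero\ngoodbye\n' > input

# Write the edited text to a second file first. Redirecting straight back
# to the original (sed ... input > input) would let the shell empty the
# file before sed ever reads it.
sed 's/h/H/' input > input.new && mv input.new input

# The original file now holds the edited text.
result=$(cat input)
echo "$result"
```

The && ensures the original is replaced only if sed succeeded.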

We will first learn how to control this basic operation of sending all the resulting lines to standard output, whether or not an edit was applied. This is controlled by use of the -n option and the p function. Look at the result of each of these controls applied individually on our simple input file:

$ sed -n 's/h/H/' input
$ sed 's/h/H/p' input   
Hello my hero
Hello my hero
goodbye

The -n option says don't send lines to standard output by default. The p flag says output the line if an edit is applied. In the first example above, no lines were output to standard output. In the second example, one copy of every line was output by default, and another copy of each line that was edited was output by the p flag.

(Using the -n option or p flag alone is not generally useful.) When you use -n and p together, the result is to only output the changed lines:

$ sed -n 's/h/H/p' input
Hello my hero

Now that we understand how to control its default behavior, we can learn a bit about sed commands. (Note: p can be either a function or a flag, but in both cases it means output the line if it is selected.)
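
To make the two roles of p concrete, here is a small sketch using the same two-line input file. With no address in front of it, the p function selects every line.

```shell
cd "$(mktemp -d)"
printf 'hello my hero\ngoodbye\n' > input

# p as a function: with -n and no address, every line is selected and
# printed exactly once.
all=$(sed -n 'p' input)
echo "$all"

# p as a flag on s: only lines where the substitution succeeded are printed.
changed=$(sed -n 's/h/H/p' input)
echo "$changed"
```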

sed command format

[addresses]function[flags]

Here, addresses indicates the lines to apply the editing command to. We will learn two forms of addresses:

n[,m]  a line number or line number range

/pattern/ all lines that match a basic regular expression pattern

function is the command to apply. We will only learn three functions:

p output the line

d delete (don't output the line)

s/pattern/replacement/[pg]  perform the substitution on the line. The flags p and g are:

p output the line if the substitution succeeds

g perform the substitution for each instance of pattern found on the line. (The default is to perform the substitution for only the first instance on the line.)
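
The pieces of the command format can be tried out on the two-line input file from the earlier examples. This is only a sketch. Each command below combines one address form with one function.

```shell
cd "$(mktemp -d)"
printf 'hello my hero\ngoodbye\n' > input

# Line-number address + d function: drop line 2.
no_second=$(sed '2d' input)
echo "$no_second"        # hello my hero

# Pattern address + p function (with -n): only lines containing "hero".
heroes=$(sed -n '/hero/p' input)
echo "$heroes"           # hello my hero

# Line-number address + s function with the g flag: every h on line 1.
every_h=$(sed '1s/h/H/g' input)
echo "$every_h"          # Hello my Hero, then goodbye
```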

Using p and d functions

You can use the p and d functions in conjunction with -n to mimic a combination of the commands head and tail. We have a file of 6 lines, each line's contents are simply lineN:

$ sed -n '1,3p' lines
line1
line2
line3

Here we are using the p function to output only the lines selected by our address range. Let's try the opposite: delete lines that would normally be output:

$ sed '1,3d' lines
line4
line5
line6

Do you see how we must control the default behavior with the -n option? 
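
For comparison, this sketch runs the two sed commands next to the head and tail commands they mimic, then selects a range from the middle of the file, which would otherwise take head and tail together. The six-line file is recreated here so that the sketch stands on its own.

```shell
cd "$(mktemp -d)"
printf 'line1\nline2\nline3\nline4\nline5\nline6\n' > lines

# First three lines: sed versus head.
first_sed=$(sed -n '1,3p' lines)
first_head=$(head -n 3 lines)

# Everything after the first three lines: sed versus tail.
rest_sed=$(sed '1,3d' lines)
rest_tail=$(tail -n +4 lines)

# A range from the middle of the file in a single command.
middle=$(sed -n '3,4p' lines)
echo "$middle"
```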

Selecting lines using a pattern

You can use a regular expression to select lines. (Note: you may want to review the section on regular expressions in this module before continuing.) For example, what do you suppose these sed commands do?

sed -n '/^[A-Z]/p' 

sed '/^[A-Z]/d'

The first outputs only the lines that start with a capital letter. The second outputs every line that does not start with a capital letter.
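
A quick way to check this is to run both commands on a small file. The file name names and its contents are made up for this sketch. Setting LC_ALL=C is a precaution, since in some locales a range such as [A-Z] may also match lower-case letters.

```shell
# Use the C locale so that [A-Z] means exactly the 26 upper-case letters.
LC_ALL=C
export LC_ALL

cd "$(mktemp -d)"
# The last line starts with two spaces, so it does not start with a capital.
printf 'Alpha\nbeta\nGamma\n  Delta\n' > names

# Print only the lines that start with a capital letter.
capital=$(sed -n '/^[A-Z]/p' names)
echo "$capital"

# Delete those same lines, leaving all the others.
other=$(sed '/^[A-Z]/d' names)
echo "$other"
```

Together the two outputs account for every line of the file exactly once.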

The substitute command

By far the most used sed command is substitute, as the most-needed feature of sed is to apply a simple edit on an input file. Here are some common types of substitute commands:

sed 's/Greg/greg/g'

substitutes greg for Greg on each occurrence in the line. Every line is output to standard output, whether it matched or not.

sed 's%/var/log/messages%/var/log/Messages%'

edits the path /var/log/messages, changing it to /var/log/Messages. Since the patterns contain the / character, a different character (here %) is used for the delimiter for the substitute command. This is much simpler than escaping the slashes in the patterns:

sed 's/\/var\/log\/messages/\/var\/log\/Messages/'
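
The two forms are equivalent, as this small sketch shows. The sample sentence is invented for the test.

```shell
line='see /var/log/messages for details'

# Using % as the delimiter, so the slashes need no escaping.
with_percent=$(printf '%s\n' "$line" | sed 's%/var/log/messages%/var/log/Messages%')

# Using / as the delimiter, so every slash in the patterns must be escaped.
with_escapes=$(printf '%s\n' "$line" | sed 's/\/var\/log\/messages/\/var\/log\/Messages/')

echo "$with_percent"
echo "$with_escapes"
```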

We can combine our address selection and our substitution command. For example, this changes the shell of the user gboyd from ksh to bash:

sed '/^gboyd:/s%/usr/bin/ksh$%/usr/bin/bash%'

Note the different delimiter on the 'select lines by address' part and the 'substitution' part.
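
Here is a sketch of that command run against a made-up, passwd-style sample file. The file name passwd.sample, the user alice, and all field values are invented for illustration. Only the line that starts with gboyd: is edited, even though both lines end in /usr/bin/ksh.

```shell
cd "$(mktemp -d)"
# Two colon-separated records in the style of /etc/passwd (invented data).
printf '%s\n' \
  'gboyd:x:1001:1001:sample user:/home/gboyd:/usr/bin/ksh' \
  'alice:x:1002:1002:sample user:/home/alice:/usr/bin/ksh' > passwd.sample

# The address /^gboyd:/ selects the line; the s command edits only that line.
edited=$(sed '/^gboyd:/s%/usr/bin/ksh$%/usr/bin/bash%' passwd.sample)
echo "$edited"
```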

Capturing and replaying subexpressions

You can use sed to mimic the behavior of cut and paste, although its syntax is cryptic. Suppose you have a file with three fields separated by colons and you want to rearrange the fields. Instead of using cut and paste, you could just use a single sed command with 'capture and replay'.

Capture involves denoting part of the text matched by a RE as a subexpression by enclosing it in escaped parentheses. The 'captured subexpression(s)' can then be 'replayed' in the substitute text by referring to it as an escaped integer, where \1 denotes 'replay subexpression #1', etc. Let's look at a simple example:

$ cat input
hello my hero
goodbye
$ sed 's/\(.\)/\1\1/' input
hhello my hero
ggoodbye

Here we have denoted the first character on each line as a subexpression, and doubled the character (by replaying the subexpression twice) in the output. The unmatched part of the input line is not modified.
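
As a second sketch, two captures can be replayed in the opposite order to swap the first two words on a line. The pattern [a-z]* (a run of lower-case letters) is just one convenient choice for this sample data.

```shell
cd "$(mktemp -d)"
printf 'hello my hero\ngoodbye\n' > input

# \1 captures the first word and \2 the second; the replacement replays
# them in reverse order. The second line has no space, so the pattern
# does not match it and the line passes through unchanged.
swapped=$(sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/' input)
echo "$swapped"
```

This should print my hello hero followed by goodbye.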

Suppose we have a file named Data that has three fields separated by a :, and we want to exchange the first two fields. This is fairly easy, although the syntax looks very ugly:

sed 's/\(.*\):\(.*\):/\2:\1:/' Data

Note that we have taken a liberty here. Since there are only three fields, the line contains only two colons, and these limit the greediness of the .* operator. If there are more than three fields and we want to exchange the first two, we must be more careful, because the first .* would then match as many fields as it can. In this case, the 'any one character' operator must be replaced by 'one character that is not a colon':

sed 's/\([^:]*\):\([^:]*\):/\2:\1:/' Data

If you want to exchange other fields, you just capture each section of the line using captures and replay them however you wish.
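
The difference between the two patterns, and one way to exchange other fields, can be seen in this sketch. The sample records a:b:c and a:b:c:d are invented for illustration.

```shell
three='a:b:c'
four='a:b:c:d'

# With three fields, the greedy .* pattern happens to work.
greedy3=$(printf '%s\n' "$three" | sed 's/\(.*\):\(.*\):/\2:\1:/')
echo "$greedy3"   # b:a:c

# With four fields, the first .* matches "a:b", so the wrong text is swapped.
greedy4=$(printf '%s\n' "$four" | sed 's/\(.*\):\(.*\):/\2:\1:/')
echo "$greedy4"   # c:a:b:d

# [^:]* cannot cross a colon, so exactly the first two fields are swapped.
safe4=$(printf '%s\n' "$four" | sed 's/\([^:]*\):\([^:]*\):/\2:\1:/')
echo "$safe4"     # b:a:c:d

# Exchanging the first and third fields: capture three fields and replay
# them in the order 3, 2, 1. The rest of the line is left alone.
swap13=$(printf '%s\n' "$four" | sed 's/\([^:]*\):\([^:]*\):\([^:]*\)/\3:\2:\1/')
echo "$swap13"    # c:b:a:d
```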

Preview question: Review the syntax used to specify sort keys that you learned for sort. Do you know the difference between its traditional syntax to specify sort keys and the POSIX standard syntax?


Copyright 2011 Greg Boyd - All Rights Reserved.
