City College of San Francisco - CS260A Unix/Linux System Administration
Module: Administration Basics I
Binary Files
Dealing with text files is easy, and Linux is awash with tools to handle them. Because text files are human-readable, you can also simply look at them when you are analyzing a problem. Binary files are another matter, and most files on a Linux system are, indeed, binary files, for which the cat command just produces a mess.
In this section we will discuss two tools for dealing with binary files - the od command, which can be used to display the contents of binary files as text for examination, and the dd command, which can be used to take apart binary files. First, however, we will visit the minimally useful program cmp.
cmp can be used to compare two files, whether text or binary. It compares them byte by byte and stops when it finds the first difference:
$ cmp /bin/cat /bin/ln
/bin/cat /bin/ln differ: byte 25, line 1
$ echo $?
1
$
In this example, two obviously different binary files were compared. Surprisingly, the first 24 bytes were identical! (Both files are ELF executables built for the same machine, so they begin with the same header fields.) cmp has an option, -l, to output every differing byte, but it is most useful for verifying that two binary files are identical:
$ cp /bin/cat /tmp
$ cmp /bin/cat /tmp/cat
$ echo $?
0
$
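In a script, the exit status is usually what you want from cmp, not the message. The following is a minimal sketch of that idea, assuming a cmp that supports the standard -s and -l options. The scratch directory and the file names original, copy and changed are made up for illustration.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
printf 'abcdef\n' > "$workdir/original"
cp "$workdir/original" "$workdir/copy"
printf 'abcxef\n' > "$workdir/changed"

# -s suppresses all output; only the exit status reports the result.
if cmp -s "$workdir/original" "$workdir/copy"; then
    copy_status=identical
else
    copy_status=different
fi
echo "copy is $copy_status"

# -l lists every differing byte: its offset (counted from 1),
# then the two byte values in octal.
# cmp exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
diff_list=$(cmp -l "$workdir/original" "$workdir/changed" || true)
echo "$diff_list"

rm -rf "$workdir"
```

Here the only difference is the fourth byte, d (octal 144) in one file and x (octal 170) in the other.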
Examining binary data
The od command is used to examine binary data, displaying it in text. Unfortunately, its basic output format is somewhat scary. Let's take a look:
$ od /bin/cat | more
0000000 042577 043114 000402 000001 000000 000000 000000 000000
0000020 000002 000076 000001 000000 014120 000100 000000 000000
0000040 000100 000000 000000 000000 132670 000000 000000 000000
0000060 000000 000000 000100 000070 000011 000100 000040 000037
0000100 000006 000000 000005 000000 000100 000000 000000 000000
0000120 000100 000100 000000 000000 000100 000100 000000 000000
0000140 000770 000000 000000 000000 000770 000000 000000 000000
0000160 000010 000000 000000 000000 000003 000000 000004 000000
0000200 001070 000000 000000 000000 001070 000100 000000 000000
0000220 001070 000100 000000 000000 000034 000000 000000 000000
This default output format shows od's heritage from the earliest
days of Unix. In fact, od (and dd) are some of the oldest programs on
Unix.
The first column in od's output indicates the address (or byte
offset) of the following line of data. The remainder of that line
consists of data converted to numeric text. The annoying thing about
this output is that it is in octal (base 8) - hence od's name - octal dump.
The format of the data dump is equally annoying - it is output in two-byte quantities as octal numbers. Thus, bytes 0 and 1 read from the file /bin/cat, interpreted as a two-byte (16-bit) integer, are 042577 in base 8, which is 0100010101111111 in binary or 457F in hexadecimal.
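If you would rather not do the base conversion by hand, the shell can check it for you. This is a small sketch, relying on the fact that printf reads a numeric argument with a leading 0 as octal. The variable names are only for illustration.

```shell
# printf reads a numeric argument with a leading 0 as octal,
# so this converts od's first word to hexadecimal.
first_word_hex=$(printf '%X' 042577)
echo "$first_word_hex"

# The same two bytes viewed one at a time: 7f, then 45 (the letter E).
# \177 and \105 are the octal escapes for those two byte values.
first_bytes=$(printf '\177\105' | od -An -tx1 | tr -d ' \n')
echo "$first_bytes"
```

On a little-endian machine such as x86, the byte at the lower address (7F) is the low-order half of the 16-bit value, which is why od shows 457F rather than 7F45.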
At this point, a few of my readers may be seriously considering dropping the class. Don't worry - we will not spend too long on this topic. But you probably think that the output format above will only be worth interpreting in the most dire of circumstances. Fortunately, the modern version of od allows you to change its output format (and how it interprets the data) with the use of a few options. These options govern how the data is output (the data type) and, thankfully, with recent versions of od, what radix is used to output the addresses:
radixes: d for decimal, o for octal (default), x for hexadecimal. The -Ar option is used to indicate the radix for the address column, where r is the radix.
types: c for character, d for integers output as decimal, x for integers output as hexadecimal, etc. The type is normally followed by a size (how many bytes to output at a time); if you leave it out, od picks a default size, and for the character type a size of 1 is implied. The -tT[S] option is used to indicate a type T and size S. Thus, in our example above, we could output 2-byte integers in hexadecimal with a hexadecimal address using od -Ax -tx2 /bin/cat.
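To see the -A and -t options at work without wading through a whole executable, you can feed od a few known bytes. The sketch below uses a made-up helper function, sample, that prints the four bytes found at the start of /bin/cat. The -An form (n for no address column) is an addition not covered above.

```shell
# Four sample bytes: 0x7F followed by the letters E, L, F.
sample() { printf '\177ELF'; }

# Hexadecimal addresses (-Ax), one-byte hexadecimal values (-tx1).
sample | od -Ax -tx1

# Two-byte hexadecimal values (-tx2), as in the text. The digit order
# within each value depends on the machine's byte order.
sample | od -Ax -tx2

# -An suppresses the address column entirely, which is handy in scripts.
hex_bytes=$(sample | od -An -tx1 | tr -d ' \n')
echo "$hex_bytes"
```

Using a size of 1 is the safest choice when you want to see the bytes in file order, because no byte swapping is involved.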
If you are not convinced yet, let's consider a problem: you are supposed to extract the number from the output of the command du -sk ~
Seems pretty straightforward, so you write a simple sed command to delete everything on the line beginning with the first space character. Unfortunately, it doesn't work:
$ du -sk ~ | sed 's/ .*//'
7142980 /home/gboyd
What the heck? After re-issuing the same command to ensure it doesn't work (as most of us do - admit it!), you examine the output of the du command using od, telling it to output the data as individual characters (du -sk ~ | od -tc).
This immediately reveals the problem: du uses a tab, which od shows as \t, as the separator after its size field. Then you adjust your sed command appropriately!
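Here is one way the diagnosis and the fix might look. This is a sketch only: so that it runs the same everywhere, the made-up function du_line prints a hard-coded line in du's format (size, tab, path) instead of running du itself.

```shell
# Stand-in for "du -sk ~": a size, a tab, then a path (sample values).
du_line() { printf '7142980\t/home/gboyd\n'; }

# od -tc makes the separator visible as \t.
du_line | od -tc

# Delete from the first whitespace character (space or tab) onward.
size_kb=$(du_line | sed 's/[[:space:]].*//')
echo "$size_kb"

# cut splits on tabs by default, so it needs no pattern at all here.
size_kb_cut=$(du_line | cut -f1)
echo "$size_kb_cut"
```

The [[:space:]] character class matches both spaces and tabs, so the sed command no longer depends on which separator du happens to use.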
Here's another problem:
$ ./test.sh
bash: ./test.sh: cannot execute binary file
$
$ od -tc test.sh
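0000000   #   !  \0   /   b   i   n   /   b   a   s   h  \n   #       t
0000020   h   i   s       s   h   e   l   l       s   c   r   i   p   t
0000040      \n   #       b   l   a   h       b   l   a   h       b   l
0000060   a   h  \n  \n
The culprit is the \0 (a NUL byte) right after the #! on the first line. It is invisible in cat output, but it breaks the interpreter line, so the file is not recognized as a script.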
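A stray NUL byte like the \0 after #! in this dump can be counted and stripped with tr. The sketch below recreates a file with the same bytes in a scratch directory. The file names and the repair step are illustrative additions, not part of the original problem.

```shell
workdir=$(mktemp -d)

# Recreate the broken script: a NUL byte (\0) sits right after "#!".
printf '#!\0/bin/bash\n# this shell script \n# blah blah blah\n\n' > "$workdir/test.sh"

# Count NUL bytes: tr -cd deletes everything except NULs, wc counts them.
nul_count=$(tr -cd '\000' < "$workdir/test.sh" | wc -c | tr -d ' ')
echo "NUL bytes found: $nul_count"

# Write a repaired copy with every NUL byte removed.
tr -d '\000' < "$workdir/test.sh" > "$workdir/fixed.sh"
first_line=$(head -n 1 "$workdir/fixed.sh")
echo "$first_line"

rm -rf "$workdir"
```

After the repair, the first line reads #!/bin/bash again.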
The bottom line here is that od is not needed very often - but when it is, nothing else will work. Get comfortable with simple od commands such as those above to examine the bytes in some data output or, possibly, to examine a binary data value at a particular location in a file. A little bit of od in your arsenal will serve you well.
As a review, try out the following od commands. Explain what each does:
od -Ax -tx4 /bin/cat
od -Ad -tx4 /bin/cat
od -Ax -tc /etc/fstab
Taking apart binary files
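Also seldom used, but indispensable when it is needed, is the dd
command. The name dd is often explained as disk-to-disk copy, in
reference to its most famous use - to make an exact copy of the data
on a device. (The name is more commonly traced to the DD, or data
definition, statement of IBM's JCL, which would also explain dd's odd
option syntax.) You can actually copy an entire [unmounted] device to
a disk file for a fairly quick carbon-copy backup.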
One of dd's more common uses is to create a placeholder file of a certain size, filling it with constant or random data. We will see it used like this to create a swap file later. It can also be used to extract part of a file or device, such as the boot sector on a disk, for backup or for offline examination. One last, probably dated, use of dd is to divide one large file into several smaller files for transport on a limited capacity device or across the internet, then put the pieces together afterwards.
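dd's options are unusual. They consist of an option word followed by an equal sign, then the option parameters. The option words are not prefixed with a dash. Here are the most commonly used options, all of which appear in the examples that follow:
if=FILE: read from FILE instead of standard input
of=FILE: write to FILE instead of standard output
bs=N: read and write N bytes at a time (the block size)
count=N: copy only N blocks
Example: to back up the [512 byte] boot sector of a hard disk at /dev/sda and place the sector in a file /xyz/bootsector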
dd if=/dev/sda of=/xyz/bootsector bs=512 count=1
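Reading a real device such as /dev/sda normally requires root, so here is a sketch of the same extraction on an ordinary file that stands in for a disk. The file names are made up. The skip= option (skip that many input blocks before copying) is a standard dd option that is not described above.

```shell
workdir=$(mktemp -d)

# Stand-in for a disk: three 512-byte "sectors" filled with A, B and C.
{
    head -c 512 /dev/zero | tr '\0' 'A'
    head -c 512 /dev/zero | tr '\0' 'B'
    head -c 512 /dev/zero | tr '\0' 'C'
} > "$workdir/disk.img"

# Copy the first sector, as in the boot sector example.
dd if="$workdir/disk.img" of="$workdir/sector0" bs=512 count=1 2>/dev/null

# skip=1 skips one input block first, so this copies the second sector.
dd if="$workdir/disk.img" of="$workdir/sector1" bs=512 count=1 skip=1 2>/dev/null

sector0_size=$(wc -c < "$workdir/sector0" | tr -d ' ')
sector1_first=$(head -c 1 "$workdir/sector1")
echo "sector0 is $sector0_size bytes; sector1 starts with $sector1_first"

rm -rf "$workdir"
```

dd reports its transfer statistics on standard error, which is why the examples redirect it with 2>/dev/null.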
Example: to create a file of size 1MiB that contains zeroes
dd if=/dev/zero of=/xyz/zerofile bs=1M count=1
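The placeholder-file and split-and-rejoin uses of dd can be tried safely in a scratch directory. This is a sketch under a few assumptions: the file names are made up, bs=1024 count=1024 is used instead of bs=1M count=1 only because the M suffix is not accepted by every dd, and skip= is a standard dd option not described above.

```shell
workdir=$(mktemp -d)

# Create a 1 MiB file of zeroes (1024 blocks of 1024 bytes).
dd if=/dev/zero of="$workdir/zerofile" bs=1024 count=1024 2>/dev/null
zero_size=$(wc -c < "$workdir/zerofile" | tr -d ' ')
echo "zerofile is $zero_size bytes"

# Same idea with random data, then split the result into two pieces:
# the first 512 blocks, and everything after them.
dd if=/dev/urandom of="$workdir/big" bs=1024 count=1024 2>/dev/null
dd if="$workdir/big" of="$workdir/part1" bs=1024 count=512 2>/dev/null
dd if="$workdir/big" of="$workdir/part2" bs=1024 skip=512 2>/dev/null

# Reassemble the pieces and confirm the result matches the original.
cat "$workdir/part1" "$workdir/part2" > "$workdir/rejoined"
if cmp -s "$workdir/big" "$workdir/rejoined"; then
    rejoin_status=identical
else
    rejoin_status=different
fi
echo "rejoined file is $rejoin_status"

rm -rf "$workdir"
```

Note how cmp earns its keep here: it is the quickest way to confirm that the reassembled file is byte-for-byte the same as the original.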
This page was made entirely with free software on Linux: the Mozilla Project and OpenOffice.org