City College of San Francisco - CS260A Unix/Linux System Administration
Module: Administration Basics I
Preview question: Suppose you wanted to make a copy of each file on the system that had changed in the last week. How would you do this? If you know how to use find, the problem would be simple.
find outputs paths to files on the system that have a particular set of characteristics. It is used to gather statistics on the filesystem and is one of the system administrator's best friends.
Notes:
Two qualities of find make it very useful.
Before we investigate further, consider the following fairly simple snippet of shell programming:
This snippet locates all the regular files that are larger than one megabyte and have been neither modified nor accessed for a year. It omits files owned by root. Each file found is checked to ensure it is not already compressed. If it is not, it is compressed, and a record of its original attributes is kept in a log file.
The snippet saves space on the filesystem and also produces a log from which a report can be generated later, so that the owners of this clutter can be notified.
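A minimal sketch of a snippet matching this description is given below. It is not the original code: the function name compress_stale, its two arguments, the use of gzip, and the log format (one ls -l line per file) are all assumptions made for illustration, and the M size suffix assumes a Linux find.

```shell
# Sketch only: compress_stale, its arguments and the log format are
# illustrative assumptions.
# Usage: compress_stale DIR LOGFILE
compress_stale() {
    dir=$1
    log=$2
    list=$(mktemp)

    # Regular files over one megabyte, neither modified nor accessed
    # for more than 365 days, and not owned by root.
    find "$dir" -type f -size +1M -mtime +365 -atime +365 \
        ! -user root > "$list"

    while IFS= read -r path; do
        # gzip -t succeeds only on valid gzip data, so success means
        # the file is already compressed and is skipped.
        if gzip -t < "$path" 2>/dev/null; then
            continue
        fi
        ls -l "$path" >> "$log"    # record the original attributes
        gzip "$path"               # replaces the file with path.gz
    done < "$list"

    rm -f "$list"
}
```

Try it on a scratch directory first, for example compress_stale ~/scratch /tmp/compressed.log, before pointing it at anything you care about.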
For the regular user, find can be very useful as well. One such use is to locate a lost file. ("I know I put that report somewhere")
The general format of a find command is
find [-H|-L|-P] <list of paths> [ options ]
list of paths is a list of places to begin looking. (A list is a whitespace-separated set of paths, usually to directories.) This is usually a single directory such as / or ~. find is always recursive. The path to each file found is output. The default is "find everything" and the default path (on linux) is . (the current directory).
The new form of the find command has one option appearing before the path list. This option tells find what to do when it encounters a symbolic link. There are three possibilities:
-P never follow symbolic links. With this option, a symlink is a symlink and the thing it points to is not examined. This option is the default.
-L always follow symbolic links. With this option, all symlinks are dereferenced and the things they point to examined.
-H only follow symlinks if they are in the list of paths.
Hopefully, the difference between these options can be seen by a simple example. In this example we will simply use find and wc -l to tell how many things find finds. By default find finds everything, and wc -l counts each item found.
We create a symlink to /bin in the current directory and a symlink to /mnt in a subdirectory. Then we see how many things find finds in each case.
Using -P (never follow symlinks) only finds the things themselves:
Using -L goes through each link, adding 127 (/bin) and 7 (/mnt) to the total:
Using -H only goes through the symlink on the command-line itself. The result is all the items in /bin (125) plus the two items on the command-line (2):
The -L, -P and -H options must appear before the first path on the command-line. If more than one of them is given, the last one takes effect. Remember, the default is -P: don't follow symlinks.
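The three behaviours can also be compared on a small scratch tree, where the counts are easy to check by hand. This is a sketch only: the directory and link names (target, other, work, t, o) are made up for illustration and are not the /bin and /mnt links used above.

```shell
# Scratch layout (all names are illustrative assumptions):
#   target/a  target/b  other/c
#   work/t      -> target   (a symlink named on the command line)
#   work/sub/o  -> other    (a symlink met while descending)
demo=$(mktemp -d)
mkdir -p "$demo/target" "$demo/other" "$demo/work/sub"
touch "$demo/target/a" "$demo/target/b" "$demo/other/c"
ln -s "$demo/target" "$demo/work/t"
ln -s "$demo/other" "$demo/work/sub/o"

count() { wc -l | tr -d ' '; }

# -P: t, sub, sub/o (links are reported but never followed)
count_P=$(find -P "$demo/work/t" "$demo/work/sub" | count)
# -L: additionally t/a, t/b and sub/o/c
count_L=$(find -L "$demo/work/t" "$demo/work/sub" | count)
# -H: follows t (named on the command line) but not sub/o
count_H=$(find -H "$demo/work/t" "$demo/work/sub" | count)

echo "-P: $count_P  -L: $count_L  -H: $count_H"
```

This prints -P: 3  -L: 6  -H: 5. The difference between -L and -H is the single file reached through the link that was not named on the command line.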
find options
find options are used to restrict the items output. They are a bit strange: each is a word rather than a single letter, and most take an argument. For example, the option meaning "owned by a particular user" is followed by the user (as in -user gboyd), and the option meaning "things this large" is followed by the size (as in -size 40k).
Many find options (such as -size) take an argument that is an integer indicating this many. For these, the option compares some attribute to that number. The default behavior is to match the number exactly. A prefix can be used to perform other comparisons:
N | exactly N |
+N | more than N |
-N | less than N |
You can see an example of this in the syntax +1M in the first example in this section to indicate more than one megabyte. In the options below, instances where these prefixes may be used are indicated by N
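A small sketch of the three comparisons, using -size on scratch files of known length. The file names are made up for illustration, and the c suffix (bytes) is used so that no rounding to blocks gets in the way.

```shell
d=$(mktemp -d)
# Three files of exactly 50, 100 and 200 bytes.
for n in 50 100 200; do
    dd if=/dev/zero of="$d/f$n" bs=1 count="$n" 2>/dev/null
done

exact=$(find "$d" -type f -size 100c)    # N:  exactly 100 bytes
more=$(find "$d" -type f -size +100c)    # +N: more than 100 bytes
less=$(find "$d" -type f -size -100c)    # -N: less than 100 bytes

echo "exact: $exact"
echo "more:  $more"
echo "less:  $less"
```

Each search reports exactly one file: f100, f200 and f50 respectively.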
find has a huge number of options. The most useful ones are indicated below:
option | meaning |
-name "pattern" | only output files whose name matches the wildcard pattern. Be sure to use quotes around the pattern, so that the shell does not expand it. |
-iname "pattern" | same as -name, except the pattern is case-insensitive (linux only) |
-type t | limit search to files whose file type is t. t may be f (regular files), d (directories), l (symlinks), c or b (device files), etc |
-print | output the path of each file found. On older versions this is required for find to output any results. Most modern versions of find implicitly add -print when no other action (such as -exec) is given. |
-user u | find files that are owned by the username u. -uid u can be used for a user id u. |
-group g | find files whose group name is g. -gid g can be used for group id g |
-inum num | find files whose inode number is num |
-links N | find files that have N links |
-perm onum | find files whose permissions match the octal value onum exactly. More often, you are only interested in particular permissions. In this case, use -onum or /onum. (-perm -044 searches for files that have the read permission for both group and other set, irrespective of the rest of the permission bits. -perm /044 searches for files that have either the read permission for group or the read permission for other set, irrespective of other permissions.) |
-size N | whose size is N. By default, N is measured in 512-byte blocks. Use the suffix c for characters, and, on linux, k for kilobytes, M for megabytes and G for gigabytes. |
-follow | follow symbolic links and examine what they point to. Thus if ./x is a symlink to a file, find . -follow -type f finds ./x whereas find . -type f does not. This option is probably only found in older code. It used to be synonymous with the -L option applied to all paths on the command-line, but in current GNU find it is deprecated in favor of -L and only affects the tests that appear after it on the command-line. |
-{c,a,m}time N | whose change (ctime), access (atime) or modification (mtime) date is N days ago. |
-newer file | whose modification date is more recent than the modification date of file. Usually file is a date-stamp file created by touch -t [[CC]YY]MMDDHHMM[.SS] |
-anewer file | whose access date is more recent than the modification date of file |
-exec cmd | execute cmd on each file found. Here cmd is a Unix command with the special syntax {} to indicate where in cmd the current file's path should be inserted. The end of cmd is indicated by a semicolon, which must be escaped (\;) (see the section on running commands on the files found below) |
-ok cmd | just like -exec, but verification is requested for each file before cmd is executed |
-noleaf | suppress an optimization that find makes by assuming that directory link counts follow Unix conventions. Use this option when searching non-Unix filesystems such as CDROMs or MSDOS. (I haven't tested this - see the man page) |
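A few of these options are easier to grasp when tried on files whose attributes you control. The sketch below works in a scratch directory; the file names and the date are made up for illustration, and -iname and -perm /onum assume a Linux (GNU) find.

```shell
d=$(mktemp -d)
touch "$d/Report.TXT" "$d/notes.txt" "$d/secret"
chmod 644 "$d/Report.TXT"   # readable by group and by other
chmod 640 "$d/notes.txt"    # readable by group only
chmod 600 "$d/secret"       # readable by neither

count() { wc -l | tr -d ' '; }

n_name=$(find "$d" -type f -name "*.txt" | count)    # notes.txt only
n_iname=$(find "$d" -type f -iname "*.txt" | count)  # Report.TXT as well
n_all=$(find "$d" -type f -perm -044 | count)        # both read bits: Report.TXT
n_any=$(find "$d" -type f -perm /044 | count)        # either read bit: adds notes.txt
n_exact=$(find "$d" -type f -perm 600 | count)       # exactly rw-------: secret

# -newer with a date-stamp file: files modified after 24 June 2012.
stamp=$(mktemp)
touch -t 201206240000 "$stamp"
touch -t 201101010000 "$d/secret"                    # make one file older
n_newer=$(find "$d" -type f -newer "$stamp" | count) # Report.TXT and notes.txt

echo "name=$n_name iname=$n_iname all=$n_all any=$n_any exact=$n_exact newer=$n_newer"
```

This prints name=1 iname=2 all=1 any=2 exact=1 newer=2.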
Combining find options with or and ! (negation)
If two find options are used, a logical "and" is implied by default. Thus
find ~ -type f -name "*.txt"
outputs files that are regular files and whose names end in .txt. You can change this default by using the -o ("or") conjunction:
find ~ -type f -o -name "*.txt"
outputs files that are regular files or whose names end in .txt (or both, of course). Note that and binds more tightly than or, so
find ~ -type f -user gboyd -o -user root
outputs regular files that are owned by gboyd as well as everything that is owned by root. If you want the or to be limited to the two -user options, you must add parentheses for grouping. However, parentheses are shell metacharacters, so they must be escaped:
find ~ -type f \( -user gboyd -o -user root \)
outputs regular files that are owned by either root or gboyd
You can also negate an option by preceding it with an exclamation point. Just make sure that the exclamation point has spaces around it. (! is a bash special character, but is ignored if it is not a prefix)
find ~ -type f ! -user gboyd
outputs regular files that are not owned by gboyd. Of course you can apply this to parenthesized expressions as well. For example,
find ~ -type f ! \( -user gboyd -o -user root \)
finds all regular files owned by anyone other than gboyd or root
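The precedence rules can be checked on a scratch directory. This sketch uses -name instead of -user so that it does not depend on the accounts present on your system; the file names are made up for illustration.

```shell
d=$(mktemp -d)
touch "$d/a.txt" "$d/b.log"
mkdir "$d/c.log"                 # a directory whose name ends in .log

count() { wc -l | tr -d ' '; }

# Parsed as (-type f AND -name "*.txt") OR -name "*.log":
# matches a.txt, b.log and the directory c.log.
loose=$(find "$d" -type f -name "*.txt" -o -name "*.log" | count)

# Parentheses limit the "or" to the two -name tests: a.txt and b.log.
grouped=$(find "$d" -type f \( -name "*.txt" -o -name "*.log" \) | count)

# Negation: regular files whose names do not end in .txt, i.e. b.log.
negated=$(find "$d" -type f ! -name "*.txt" | count)

echo "loose=$loose grouped=$grouped negated=$negated"
```

This prints loose=3 grouped=2 negated=1. The extra match in the first command is the directory, which slipped past -type f because of the way and and or bind.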
Applying commands
Once you generate a list of files that match a set of criteria, you often want to do something to them. There are several ways to accomplish this.
The most general method for applying a set of commands to your file list is to save the list in a file (one-per-line), then read it into a while loop, processing one file (line) at a time. We saw an example of this at the beginning of this section. In this case, you can do any kind of processing you like to the data: your limit is simply the limit of shell scripting.
If you review that while loop, you may ask the question, Why save the list in a file at all? Can't you just use a pipe to send the output of find to the input of the loop? The answer is yes; however, there is an issue: a pipe separates two processes, so any state that you save in the while loop is gone when the loop exits. For example, the following loop (which is silly, but provides a simple illustration) doesn't do what you want.
This code outputs 0, since the loop's nfiles variable is a copy of the nfiles variable in the surrounding shell. (The surrounding shell's nfiles variable was initialized to 0 and never modified.)
If this is confusing, don't worry. The first time I encountered this it took me a long time to figure out, and I have forgotten it and been tricked again many times. Just avoid piping to a while loop (or to any snippet of shell code). Instead, save the output of find in a file and feed the file's contents into the loop like we did at the beginning of this section.
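The effect is easy to reproduce. The sketch below counts three scratch files twice, once with a pipe and once through a saved list. The names piped_count, file_count and list are assumptions made for illustration, and the behaviour shown is that of bash.

```shell
d=$(mktemp -d)
touch "$d/a" "$d/b" "$d/c"

# Piped version: in bash the loop runs in a subshell, so its
# increments are lost when the loop exits.
nfiles=0
find "$d" -type f | while IFS= read -r path; do
    nfiles=$((nfiles + 1))
done
piped_count=$nfiles

# File version: the loop runs in the current shell, so the count survives.
list=$(mktemp)
find "$d" -type f > "$list"
nfiles=0
while IFS= read -r path; do
    nfiles=$((nfiles + 1))
done < "$list"
file_count=$nfiles

echo "piped: $piped_count  from file: $file_count"
```

Run under bash, this prints piped: 0  from file: 3.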
If you only need to apply a single command to each item in your list, you can choose from two simpler techniques: the -exec option of find, or xargs
find -exec
The -exec option of find embeds a single command on find's command-line and runs the command on each path that find finds. The syntax can be a bit difficult at first, so a simple example will help:
find . -type f -exec wc -l {} \;
Here we have emphasized the command to run. The {} cookie instructs find where on the command-line of wc to insert the current path output by find. The escaped semi-colon tells find where the end of the embedded command is, in case more find options appear after -exec. The use of the {} cookie allows the placement of the file path anywhere on the command-line of the embedded command:
find . -type f -name "*.txt" -exec cp {} ~/textfiles \; -print
This command copies each of the files found to the given directory. Normally, the files copied would not be reported, but we have added the -print option after -exec so that we can see which files are copied and what their paths were.
Note that in older versions of find, the {} cookie must appear all by itself on the command-line of the embedded command. In these versions a command such as
find . -type f -exec mv {} {}.sv \;
is disastrous, as the second {} does not get substituted! This results in moving every regular file to a file of the same name (literally, {}.sv) one after the other, overwriting every file but the last one! At the time of this writing, the linux version of find seems to work as you would hope, but you should test your version of find before running this command on a lot of data. Or you could just use xargs, which seems to work in any case (see below).
A very simple example shows just how useful -exec is. Suppose you have just uploaded your entire website and discovered that the permissions are wrong! You need to add x permission for other to each directory in the website and r permission for other to each regular file in the website. By hand, this would be a mess, but using find -exec, it is simple. Just go to the root directory of the website and do this
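A sketch of the two commands this calls for, run here against a scratch directory so that it is safe to try. The variable site and the file names stand in for a real website and are assumptions made for illustration.

```shell
site=$(mktemp -d)               # stands in for the website's root directory
mkdir -p "$site/images"
touch "$site/index.html" "$site/images/logo.png"
chmod 700 "$site" "$site/images"
chmod 600 "$site/index.html" "$site/images/logo.png"

# Directories: add search (x) permission for other.
find "$site" -type d -exec chmod o+x {} \;

# Regular files: add read (r) permission for other.
find "$site" -type f -exec chmod o+r {} \;

ls -lR "$site"
```

From the root directory of a real website, the same two find commands would be run with . in place of "$site".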
xargs
xargs ("extract arguments") is used to apply a Unix command to a list of paths read from standard input. You give it a command, and it runs that command with the paths from the list appended as arguments:
$ echo a.txt b | xargs ls -l
-rw--w---- 1 gboyd gboyd 0 Oct 10 16:59 a.txt
-rw-----w- 1 gboyd gboyd 0 Oct 10 16:59 b
$
xargs has options to modify this basic behavior. Although its wide choice of options allows a lot of customization, -i makes it function similarly to find -exec. Just like find, xargs -i uses the {} cookie to indicate where on the command-line the path should be inserted. If the input is one path per line, most versions of xargs now deal correctly with spaces in filenames, and it is not a problem to embed the cookie in other text. Thus, although our earlier example of renaming files using -exec mv {} {}.sv \; was a disaster, the following works just fine on linux:
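A sketch of such a rename, run in a scratch directory. It uses -I {}, which current GNU xargs documents as the replacement for the deprecated -i. The file names are made up, and the -name "*.txt" test is an added precaution so that files already renamed cannot be picked up a second time.

```shell
d=$(mktemp -d)
touch "$d/a.txt" "$d/b c.txt"        # one name contains a space

# Each input line replaces every {} in the command, so the second {}
# is substituted even though it is embedded in other text.
find "$d" -type f -name "*.txt" | xargs -I {} mv {} {}.sv

ls "$d"
```

Afterwards the directory holds a.txt.sv and "b c.txt.sv", and neither original name remains.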
Example:
We will do a single complex example that should illustrate how you can tackle a complex question with find.
You want to find all configuration files that may have been modified since the system was installed. Suppose you are going to limit your search to files owned by either root or bin beneath /etc. You would also like to know if there appears to be another version of the file (hopefully the original version), which would be named the same, with an added suffix. You should omit XML files (.xml), as they are auto-generated. The system was last installed on June 24, 2012. Ignore files that are not text files. The resulting list should be in ls -l format.
First, let's analyze the issues here:
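One way the requirements might map onto find is sketched below. The ownership test needs a parenthesized or of two -user options, the installation date becomes a date-stamp file for -newer, the XML files are removed with a negated -name, and the remaining work (the text-file check, the ls -l listing, the search for another version) goes in a loop over a saved list. The function name modified_configs and its arguments are assumptions made for illustration. So is the use of GNU grep -I to skip binary files, which stands in for whatever text-file test you prefer.

```shell
# Sketch only: modified_configs and its arguments are illustrative.
# Usage: modified_configs DIR STAMPFILE OWNER1 OWNER2
modified_configs() {
    dir=$1
    stamp=$2
    owner1=$3
    owner2=$4
    list=$(mktemp)

    # Regular files owned by either owner, modified more recently
    # than the stamp file, omitting *.xml.
    find "$dir" -type f \( -user "$owner1" -o -user "$owner2" \) \
        -newer "$stamp" ! -name "*.xml" > "$list"

    while IFS= read -r path; do
        # GNU grep -I treats binary files as non-matching, so this
        # skips them (and empty files).
        grep -Iq . "$path" || continue
        ls -l "$path"
        # Report any file named the same with an added suffix.
        for other in "$path".*; do
            if [ -e "$other" ]; then
                echo "    possible original: $other"
            fi
        done
    done < "$list"

    rm -f "$list"
}

# Intended use for the question above (reads /etc, so run it with
# suitable privileges):
#   touch -t 201206240000 /tmp/installed
#   modified_configs /etc /tmp/installed root bin
```

Saving the list in a file rather than piping it into the loop follows the advice given earlier in this section.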
locate xxx
Often, you just need to locate static files with a specific name on the system, and you are not interested in file properties such as size. This may be for research, or just because you forgot where something is. Using the find command to do this is silly, as it analyzes the filesystem directly, and can be quite time-consuming. Instead, most linux systems keep a database of the current filesystem list, updated nightly, and you can search this list for a file pattern. We will call this database the locate database. Remember that this database is only as accurate as its last update, which is configurable on your system.
The locate command searches the locate database (created by updatedb) for files on the system that match a given pattern. In essence, it is a quick version of a simple find command with the caveat that its [cached] database of file paths is only updated periodically (usually once a day). Thus, although it is faster than find, it does not allow find options (except the implied -name, of course), and it gives information that is slightly out-of-date. It is very useful, however, for locating a misplaced file, or for finding files on the system that go with a particular program, assuming they are named for the program or are in a directory that is named for it. Note that your system must be configured to periodically run updatedb(8) for the locate database to be accurate. (This is usually configured as a standard cron(8) job.)
locate xxx outputs the path to each object on the system whose path contains xxx. Thus, patterns should always be used with locate. By default, locate uses wildcards as patterns.
locate xxx
outputs the path to each object on the system whose path contains xxx, i.e., it implies *xxx*. For example, locate httpd outputs 171 file paths - all files whose name or whose directory's name is any variation of httpd. Probably these files are associated with the webserver, so the output may very well be of interest.
locate *xxx
outputs the path to each object on the system whose path ends in xxx. In the case of httpd, this command outputs 21 paths.
locate */xxx
outputs the path to each object on the system whose path ends in /xxx. Again, in the case of httpd, this command outputs 11 paths, each to a file or directory named httpd.
locate */old/*
outputs the path to each object on the system whose path contains a directory named old
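Since the locate database differs from system to system, the matching rules are easier to see on a fixed list of paths. The sketch below does not call locate at all. It emulates the rules described above with shell case patterns, and the function name emulate_locate and the sample paths are assumptions made for illustration.

```shell
# A made-up stand-in for the contents of the locate database.
paths='/etc/httpd/conf/httpd.conf
/usr/sbin/httpd
/usr/lib/mod_httpd
/home/old/report.txt
/home/golden/old.txt'

# emulate_locate PATTERN: print the sample paths PATTERN would match.
emulate_locate() {
    pattern=$1
    case $pattern in
        *[\*\?\[]*) ;;                  # has globbing characters: use as given
        *) pattern="*$pattern*" ;;      # none: imply a leading and trailing *
    esac
    printf '%s\n' "$paths" | while IFS= read -r p; do
        case $p in
            $pattern) printf '%s\n' "$p" ;;
        esac
    done
}

emulate_locate httpd        # the three paths that contain httpd
emulate_locate '*httpd'     # /usr/sbin/httpd and /usr/lib/mod_httpd
emulate_locate '*/httpd'    # /usr/sbin/httpd only
emulate_locate '*/old/*'    # /home/old/report.txt only
```

Note that the patterns are quoted here so that the shell passes them along unexpanded.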
The patterns allowed with locate are, by default, globbing (wildcard) patterns, as used on shell command-lines. The caveat, as you see, is that a lack of globbing characters implies a leading and trailing *. As with find -name, it is safest to quote a pattern that contains globbing characters, so that the shell does not expand it before locate sees it. Supposedly you can use extended regular expressions with locate as well (if you use an option), but cursory tests of this were mixed. Here is an example that does work, though:
locate --regexp '\.pdf$'
outputs all the paths in the locate database whose name ends in .pdf
The locate database is created by updatedb with the exceptions indicated in /etc/updatedb.conf. If you examine that file on our systems, you will notice that filesystems of type nfs are excluded ("pruned"). Thus on our systems you cannot use locate to locate files beneath your home directory.
Besides this, the locate database is created (and updated) as a cron job overnight. On well-configured systems, this job will be picked up by anacron after a day or so, but locate will fail to find anything immediately after installation!