City College of San Francisco - CS260A Linux System Administration Module: Processes
The ps command is the standard way to monitor processes on a Unix or Linux system. ps has more options than any other Unix command, at least that I know of. The reason is that the options of ps vary significantly between System 5 and BSD Unix variants, and the ps version on Linux tries to be all things to all users. The result is a nightmare of seemingly conflicting options that modify the output of ps to give you just what you want, in the syntax of whichever variant you choose.
We will concentrate on learning a few standard ps options and the type of data output by ps. You can learn how to be more specific on your own or just use standard tools to filter the output and extract the bits you want.
By default, ps gives you abbreviated information on the processes owned by the current user and associated with the current terminal. Normally this group includes all processes subordinate to the login shell (or the shell that was originally started in the window). The abbreviated information includes the command name, the process id, the tty it is attached to and the CPU time it has used.
Common ps options
System 5 option | meaning |
-e | all processes on the system |
-l, -f, -fl | extended information |
-u user | all processes with the euid of user. user may be a comma- or space-separated list of users |
-p pid | only the process with process id pid. pid may be a comma- or space-separated list of pids |
-o fields | only output these fields in this order. Here fields is a comma-separated list of fields. |
The -l, -f options give a different mix of output fields of interest. In particular, -l gives the priorities and CPU time, while -f gives the command arguments and start time (wall clock time that the process started). -fl gives a mixture of this data.
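As a rough illustration of the options in the table, the sketch below wraps a few common invocations in shell functions. The function names are made up for this example, and the exact columns printed depend on your version of ps.

```shell
# Hypothetical helper names; the ps options are the ones from the table above.

# Full listing (-f) of every process on the system (-e).
all_procs() {
    ps -ef
}

# Long listing (-l) of the processes whose effective user is $1.
procs_of_user() {
    ps -l -u "$1"
}

# Full listing (-f) of specific processes; $1 is a comma-separated list of pids.
procs_by_pid() {
    ps -f -p "$1"
}
```

For instance, procs_by_pid $$ should show the shell you are typing in, since $$ expands to the shell's own process id.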
Each field output has a name (abbreviation) which appears in upper-case in the header of the output. The field names (in lower-case) can also be used in the field specification of the -o option above.
field | meaning |
uid | user id (or user name). In -o this means uid. Use uname for user name |
pid | process id |
ppid | parent process id |
ni | nice number |
pri | scheduling priority |
sz | total memory size (in memory pages on Linux, currently 4 kB each) |
rsz | resident set size (in kB on Linux) |
time | cpu time consumed (system time + user time) |
stime | wall-clock time the process was started |
s | state (S=sleep,R=runnable,T=stop...) |
cmd | command. with -o this means 'command + arguments' |
The -o option to ps is very useful for specifying exactly the output you want, although its format is very system-specific. An example of the Linux version is below. The field names are the standard abbreviations from the table above. Note that the number output for sz can be smaller than the one for rsz, as in the first process listed. Since the resident set is a subset of the total size, this looks impossible. The explanation is that the units of sz are pages, while the units of rsz are kB.
$ ps -o pid,uid,pri,rsz,sz
PID UID PRI RSZ SZ
31599 500 24 1492 1169
31626 500 22 760 1048
$
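To compare the two columns, convert sz from pages to kB. A minimal sketch, assuming 4096-byte pages as stated in the field table; pages_to_kb is a made-up helper name, and without a second argument it falls back to the page size reported by getconf.

```shell
# pages_to_kb PAGES [PAGE_BYTES]
# Convert a count of memory pages to kB. PAGE_BYTES defaults to the
# system page size as reported by getconf.
pages_to_kb() {
    page_bytes=${2:-$(getconf PAGESIZE)}
    echo $(( $1 * page_bytes / 1024 ))
}

pages_to_kb 1169 4096    # sz of the first process above
pages_to_kb 1048 4096    # sz of the second process above
```

With 4 kB pages the first process has a total size of 4676 kB, comfortably larger than its resident set of 1492 kB.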
top
top is a very useful command to help the system administrator keep track of processes executing on the system and of the use of resources. It displays a page of data containing summary system statistics and the ps-type output of the processes that are consuming the most CPU time.
top [ -d delay ] [ -n iterations ] [ -p pid,pid,pid... ]
top, by default, continually updates the screen every few seconds (the delay) and runs forever (infinite iterations), selecting the biggest CPU users as the processes to display. The options allow for the monitoring of specific processes in addition to changing the delay and number of iterations. Other options include 'batch mode' operation, where top writes its output into a file for later examination or analysis.
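Batch mode is handy for capturing a snapshot from a script. The sketch below assumes the procps version of top found on most Linux systems, where -b selects batch mode; top_snapshot and the file name in the usage note are made up for the example.

```shell
# top_snapshot FILE
# Write one iteration (-n 1) of top's display to FILE in batch mode (-b),
# so that it can be examined later.
top_snapshot() {
    top -b -n 1 > "$1"
}
```

After top_snapshot /tmp/top.out, the first few lines of /tmp/top.out hold the summary statistics and the rest holds the process list.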
If the system performance is significantly degraded, top can help identify the issue. However, in times of system bottlenecks, top is just another process, and if it is difficult to run any processes, it can be difficult to get information from top. To remedy this problem, it is useful to nice top so that it runs with increased priority (that is, with a negative nice value, which only root may set). This is needed so often that some versions of top have an option to run with a decreased nice value, removing the need for nice.
top is interactive, and responds to command keystrokes when it is running. The most important of these are h for help and q for quit.
uptime
A simple program that provides a quick thumbnail of system response time is uptime:
bash$ uptime
uptime displays the time the system has been up as well as the one, five, and fifteen minute load averages, in that order. The load average is defined as the average number of processes in the ready-to-run state during the period (i.e., the number of processes running or waiting to run). The reference load average on a single-CPU system, of course, is 1. A value of 1 implies that, on average, the system always has a process to run. As the load average increases, the system response time suffers. Load averages are output by top and other process-monitoring commands.
The display of the three load averages is useful to provide a quick 'history' of how system load is changing. In the example above, the system load has decreased dramatically in recent minutes, since the one-minute average is much lower than the 15-minute average. This is important and reassuring information to a system administrator investigating why the system has been slow. If the numbers were reversed, like 1.04, 0.18, 0.06, it would mean the system is getting much busier. In this case it might be appropriate to run the top command and examine which processes are using the resources.
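The comparison described above can be sketched as a small shell function. load_trend is a made-up name; it takes the three load averages in the order uptime prints them and compares the one-minute value with the fifteen-minute value. awk is used because the shell itself cannot compare decimal numbers.

```shell
# load_trend ONE FIVE FIFTEEN
# Report whether load is rising, falling or steady by comparing the
# one-minute average with the fifteen-minute average.
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        if (one + 0 > fifteen + 0)      print "rising"
        else if (one + 0 < fifteen + 0) print "falling"
        else                            print "steady"
    }'
}

load_trend 1.04 0.18 0.06    # the "getting busier" numbers from the text
```

On Linux the current values can be taken from /proc/loadavg, whose first three fields are the same three averages: load_trend $(cut -d' ' -f1-3 /proc/loadavg).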
vmstat
vmstat [ delay [count] ]
vmstat gives virtual memory statistics. It gives a one-line summary of memory, paging, swapping and I/O statistics. The output of vmstat (without arguments) is easily misunderstood, because the single line of statistics it outputs holds averages since the system was started. To get a snapshot of activity, you must run vmstat with a count greater than one. The first line is always averages, so if you are interested in current statistics it should be ignored. delay is the number of seconds over which each measurement is made. If delay is given with no count, the count is infinity, or 'measure each delay seconds forever'.
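Since the first line of statistics holds averages since boot, a script that wants only current activity can drop it. A minimal sketch, assuming the usual Linux vmstat layout of two header lines followed by one line per measurement; drop_boot_line is a made-up name.

```shell
# Remove the third line of vmstat output (the since-boot averages),
# keeping the two header lines and the later measurements.
drop_boot_line() {
    awk 'NR != 3'
}
```

Used as vmstat 1 5 | drop_boot_line, this should leave the headers and the four current measurements.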
Example:
[gboyd@nelson ~]$ vmstat 1 7
The above run of vmstat shows a brief flurry of system activity. During the middle measurements, a large I/O operation occurred that caused the following effects:
iostat
iostat provides an alternate view of CPU and hard disk utilization from vmstat. The interface is similar to vmstat: an interval and count follow, and the first measurement reports averages since the system was started.
I would give sample output here, but I/O is so fast on our systems that simulating interesting data takes too much time. Try the command
iostat -k -d -x 1 10
sar
The kernel keeps a record of many system events in /proc: I/O movements, process activity, paging behavior, CPU utilization, even interrupts processed. The data is saved to a daily file in /var/log/sa/saNN, where NN is the day of the month. sar analyzes that data and dumps it in a human-readable form for analysis. By default, the current day's data is examined, which includes the activity since midnight. You can use sar to do two things with these records:
sar [options] [-s starttime ] [-e stoptime] [-f filename]
display all or part of the data recorded since midnight. The start and stop time are in hh:mm:ss format. The options limit the types of measurements shown. The default is "CPU measurements only". You can use -A for "all measurements".
If you add the -f filename option, filename should point to the sa file in /var/log/sa corresponding to the day you want to analyze.
sar [ options ] interval [ count ]
start displaying certain current measurements beginning now as the records are written. The values of interval and count determine what is displayed:
Examples:
sar -f /var/log/sa/sa05
outputs the CPU usage information from the file for the 5th of this month
sar
outputs the CPU usage information from today's file
sar -A
outputs all the information from today's file
sar -A -s 12:00:00 -e 13:00:00
outputs all data collected between noon and 1pm today
sar 0
outputs a CPU usage information summary since the system was started.
The statistics output by sar are detailed. See sar(8) for a description of the fields.
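The file for a given day can be derived from the saNN naming scheme described above. A small sketch; sa_file is a made-up helper, and the /var/log/sa location is the one used in this module (other distributions may keep the files elsewhere).

```shell
# sa_file [DAY]
# Print the path of the sar data file for day-of-month DAY (default: today).
sa_file() {
    day=${1:-$(date +%d)}
    day=${day#0}          # strip a leading zero so printf does not read "08" as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

With this helper, sar -f "$(sa_file 5)" is equivalent to the first example above.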
Other process tools
sleep N
is a command that simply sleeps for N seconds. It can be used in a shell script to force a delay. Most daemons, for example, run in a loop that sleeps for a while, then checks for work to do.
wait [pid]
is a shell built-in command used to suspend the current shell until its child process pid (or, by default, all of the current shell's children) has exited.
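The two commands are often used together in scripts. The following is only an illustrative sketch: run_jobs is a made-up name, and the sleep commands stand in for real background work.

```shell
# Start two background jobs, then block until both have finished.
run_jobs() {
    sleep 2 &
    first=$!              # $! holds the pid of the most recent background job
    sleep 1 &
    wait "$first"         # wait for one specific child
    wait                  # wait for any children still running
    echo "all children have exited"
}

run_jobs
```

The function returns after about two seconds, the running time of the longest child, rather than the three seconds the two sleeps would take one after the other.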