City College of San Francisco - CS260A
Linux System Administration

Module: Processes

Monitoring Processes

The ps command is the standard way to monitor processes on a Unix or Linux system. ps has more options than any other Unix command, at least that I know of. The reason is that ps options vary significantly between the System V and BSD Unix variants, and the ps version on Linux tries to be all things to all users. The result is a nightmare of seemingly conflicting options that can be used to modify the output of ps to give you just what you want, in the syntax of whichever variant you choose.

We will concentrate on learning a few standard ps options and the type of data output by ps. You can learn how to be more specific on your own or just use standard tools to filter the output and extract the bits you want.

By default, ps gives you abbreviated information on the processes owned by the current user and associated with the current terminal. Normally this group includes all processes subordinate to the login shell (or the shell that was originally started in the window). The abbreviated information includes the command name, the process id, the tty it is attached to, and the CPU time it has used.

Common ps options

System V option   meaning
-e                all processes on the system
-l, -f, -fl       extended information
-u user           all processes with the euid of user. user may be a comma- or space-separated list of users
-p pid            only the process with process id pid. pid may be a comma- or space-separated list of pids
-o fields         output only these fields, in this order. fields is a comma-separated list of fields

The -l and -f options each give a different mix of output fields. In particular, -l gives the priorities and CPU time, while -f gives the command arguments and start time (the wall-clock time at which the process started). -fl gives a combination of the two.

Each field output has a name (abbreviation) which appears in upper-case in the header of the output. The field names (in lower-case) can also be used in the field specification of the -o option above.

field   meaning
uid     user id (or user name). With -o this means the numeric uid; use uname for the user name
pid     process id
ppid    parent process id
ni      nice number
pri     scheduling priority
sz      total memory size (in memory pages on Linux, currently 4 kB each)
rsz     resident set size (in kB on Linux)
time    CPU time consumed (system time + user time)
stime   wall-clock time at which the process was started
s       state (S=sleep, R=runnable, T=stop...)
cmd     command. With -o this means 'command + arguments'
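
As a rough sketch of how the selection options combine with -o, the commands below print a handful of the fields from the table for the current shell and for the current user. The particular field list is only an illustration, and the trailing = used to suppress a column header is assumed to be available (it is in the Linux procps version of ps).

```shell
#!/bin/sh
# Selected fields for the current shell only ($$ is the shell's own pid).
ps -o pid,ppid,ni,pri,stime,cmd -p $$

# The same fields for every process whose euid is the current user.
# -u accepts a numeric id as well as a user name.
ps -o pid,ppid,ni,pri,stime,cmd -u "$(id -u)"

# A trailing "=" after a field name drops its header, which is handy in
# scripts. tr removes the padding that ps puts around the number.
parent=$(ps -o ppid= -p $$ | tr -d ' ')
echo "parent of $$ is $parent"
```

Putting cmd last keeps the long command line from pushing the other columns out of alignment.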

The -o option to ps is very useful for specifying exactly the output you want, although its format is very system-specific. An example of the Linux version is below. The field names are the standard abbreviations from the table above. Note that in the first row the number output for sz is smaller than the one for rsz. Since rsz measures a subset of sz, this looks impossible. The explanation is that the units of sz are pages, while the units of rsz are kB: at 4 kB per page, 1169 pages is 4676 kB.

$ ps -o pid,uid,pri,rsz,sz
  PID   UID PRI   RSZ    SZ
31599   500  24  1492  1169
31626   500  22   760  1048
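
To compare the two columns on the same scale, sz can be converted to kB. The following is a minimal sketch using awk; the function name sz_to_kb is made up for this example, and the page size defaults to the 4 kB assumed in these notes rather than being read from the system.

```shell
#!/bin/sh
# sz_to_kb: read "ps -o pid,uid,pri,rsz,sz" output on stdin and print the
# pid, rsz (already in kB) and sz converted from pages to kB.
# The page size in kB is the first argument (assumed to be 4 if omitted).
sz_to_kb() {
    awk -v page_kb="${1:-4}" '
        NR == 1 { print "PID", "RSZ_KB", "SZ_KB"; next }  # replace the header
        { print $1, $4, $5 * page_kb }                     # sz is column 5
    '
}

# Sample lines copied from the ps output above.
sz_to_kb 4 <<'EOF'
  PID   UID PRI   RSZ    SZ
31599   500  24  1492  1169
31626   500  22   760  1048
EOF
```

On a live system the page size can be taken from getconf, for example: ps -o pid,uid,pri,rsz,sz | sz_to_kb $(( $(getconf PAGESIZE) / 1024 ))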

top

top is a very useful command to help the system administrator keep track of processes executing on the system and of the use of resources. It displays a page of data containing summary system statistics and the ps-type output of the processes that are consuming the most CPU time. 

top [ -d delay ] [ -n iterations ] [ -p pid,pid,pid... ]

top, by default, continually updates the screen every few seconds (the delay) and runs forever (infinite iterations), selecting the biggest CPU users as the processes to display. The options allow for the monitoring of specific processes in addition to changing the delay and number of iterations. Other options include 'batch mode' operation, where top writes its output into a file for later examination or analysis.

If the system performance is significantly degraded, top can help identify the issue. However, in times of system bottlenecks, top is just another process, and if it is difficult to run any processes, it can be difficult to get information from top. To remedy this problem, it is useful to nice top so that it runs with increased priority. This is needed so often that some versions of top have an option to run with a decreased nice value, which removes the need for nice.
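
Batch mode is the easiest way to capture what top shows for later study. The sketch below assumes the Linux (procps) top, where -b selects batch mode; top_snapshot is a hypothetical helper name, not part of top.

```shell
#!/bin/sh
# top_snapshot: print the first N lines (default 15) of a single batch-mode
# top report. -b selects batch mode and -n 1 stops after one iteration.
top_snapshot() {
    top -b -n 1 | head -n "${1:-15}"
}

# The summary area plus the first few processes.
top_snapshot 12
```

Raising top's own priority would look something like nice -n -5 top, which has to be run as root because only root may use negative nice values.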

top is interactive, and responds to command keystrokes when it is running. The most important of these are h for help and q for quit.

uptime

A simple program that provides a quick thumbnail of system response time is uptime:

bash$ uptime
 12:53pm  up 3 days, 23:44,  17 users,  load average: 0.06, 0.18, 1.04
bash$

uptime displays the time the system has been up as well as the one, five, and fifteen minute load averages, in that order. The load average is defined as the average number of processes in the ready-to-run state during the period (i.e., the number of processes waiting to run). The reference load average, of course, is 1. A value of 1 implies that, on average, the system always has a process to run. As the load average increases, the system response time suffers. Load averages are also output by top and other process-monitoring commands.

The display of the three load averages is useful to provide a quick 'history' of how system load is changing. In the example above, the system load has decreased dramatically over the last fifteen minutes, since the one-minute average is much lower than the 15-minute average. This is important and reassuring information to a system administrator investigating why the system has been slow. If the numbers were reversed, like 1.04, 0.18, 0.06, it would mean the system is getting much busier. In that case it might be appropriate to run the top command and examine which processes are using the resources.
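
The comparison described above is easy to script. This is only a sketch: load_trend is a made-up name, and it assumes the "load average: a, b, c" layout shown in the sample, with a period as the decimal separator.

```shell
#!/bin/sh
# load_trend: read one line of uptime output on stdin and report whether the
# 1-minute load average is above or below the 15-minute average.
load_trend() {
    # Drop everything up to "load average:" and turn the commas into spaces,
    # leaving three numbers for awk to compare.
    sed -e 's/.*load average: *//' -e 's/,/ /g' |
    awk '{
        if      ($1 > $3) print "rising"
        else if ($1 < $3) print "falling"
        else              print "steady"
    }'
}

# The sample line from above: 0.06 is less than 1.04, so this prints "falling".
echo ' 12:53pm  up 3 days, 23:44,  17 users,  load average: 0.06, 0.18, 1.04' | load_trend
```

On a running system the same check is uptime | load_trend.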

vmstat

vmstat [ delay  [count] ]

vmstat gives virtual memory statistics. It gives a one-line summary of memory, paging, swapping and I/O statistics. The output of vmstat (without arguments) can be misunderstood, as the single line of statistics it outputs is an average since the system was started. To get a snapshot of current activity, you must run vmstat with a count greater than one. The first line is always averages, so if you are interested in current statistics it should be ignored. delay is the number of seconds over which each measurement is done. If delay is given with no count, the count is infinity, or 'measure each delay seconds forever'.
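
Since the first data line has to be ignored, it can simply be filtered out. A minimal sketch, assuming the usual two header lines at the top of the output; vmstat_current is a hypothetical helper name, and the sample lines are abbreviated from a run of vmstat 1 7.

```shell
#!/bin/sh
# vmstat_current: copy vmstat output from stdin to stdout, keeping the two
# header lines but dropping line 3, which holds the averages since the
# system was started rather than a current measurement.
vmstat_current() {
    awk 'NR != 3'
}

# The third line below (the averages) is removed from the output.
vmstat_current <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    144  20848  21004 668868    0    0     4     3   62   22  3  0 97  0  0
 0  0    144  20784  21004 668868    0    0     0     0 1006  880  2  0 98  0  0
EOF
```

With a real run this becomes vmstat 1 7 | vmstat_current.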

Example:

[gboyd@nelson ~]$ vmstat 1 7
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    144  20848  21004 668868    0    0     4     3   62   22  3  0 97  0  0
 0  0    144  20784  21004 668868    0    0     0     0 1006  880  2  0 98  0  0
 3  0    144 129932  21168 558320    0    0 32104 44300 1360 1190  2 23 52 24  0
 1  4    144  42636  21256 643228    0    0     4 55584 1181  577  1 14  0 85  0
 0  4    144  43132  21264 643220    0    0     0 12068 1145  449  0  1  0 99  0
 0  1    144  12296  17584 681408    0    4  4820   212 1184  707  1 17  4 79  0
 0  0    144  12296  17584 681408    0    0     0     0 1003  266  0  0 98  2  0
[gboyd@nelson ~]$

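The above run of vmstat shows a brief flurry of system activity. During the middle measurements, a large I/O operation occurred, and its effects can be read directly from the columns: blocks in (bi) and blocks out (bo) jump from almost nothing to tens of thousands, up to four processes are blocked (b), the CPU spends up to 99% of its time waiting for I/O (wa) while idle time (id) drops to 0, and system time (sy) rises as high as 23%. By the last line the system is idle again.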

iostat

iostat provides a different view of CPU and hard disk utilization than vmstat.

The interface is similar to vmstat: an interval and count follow the options, and the first report is averages since the system was started.

I would give sample output here, but I/O is so fast on our systems that simulating interesting data takes too much time. Try the command

iostat -k -d -x 1 10

sar

The kernel keeps a record of many system events in /proc: I/O movements, process activity, paging behavior, CPU utilization, even interrupts processed. This data is saved to a daily file, /var/log/sa/saNN, where NN is the day of the month. sar analyzes that data and prints it in human-readable form. By default, the current day's data is examined, which covers the activity since midnight. You can use sar to do two things with these records:

sar [options] [-s starttime ] [-e stoptime] [-f filename]

Displays all or part of the data recorded since midnight. The start and stop times are in hh:mm:ss format. The options limit the types of measurements shown. The default is "CPU measurements only". You can use -A for "all measurements".

If you add the -f filename option, filename should point to the sa file in /var/log/sa corresponding to the day you want to analyze.

sar [ options ] interval [ count ]

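Starts displaying certain current measurements, beginning now, as they are taken. interval is the number of seconds between reports and count is the number of reports. If count is omitted, sar keeps reporting until it is interrupted. An interval of 0 gives a summary since the system was started.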

Examples:

sar -f /var/log/sa/sa05

outputs the CPU usage information from the file for the 5th of this month

sar

outputs the CPU usage information from today's file

sar -A

outputs all the information from today's file

sar -A -s 12:00:00 -e 13:00:00

outputs all data collected between Noon and 1pm today

sar 0

outputs CPU usage information summary since the system was started.

The statistics output by sar are detailed. See sar(1) for a description of the fields.

Other process tools

sleep N 

is a command that simply sleeps for N seconds. It can be used in a shell script to force a delay. Most daemons, for example, run in a loop that sleeps for a while, then checks for work to do.

wait [pid]

is a shell command used to suspend the current shell until process pid has exited or, by default, until all of the shell's children have exited.
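
A short sketch of the two commands working together. The background jobs here are just sleep commands standing in for real work, and the variable names are arbitrary.

```shell
#!/bin/sh
# Start two background jobs that simply sleep, remembering their pids.
# $! holds the pid of the most recently started background job.
sleep 2 &
first=$!
sleep 1 &
second=$!
echo "started jobs $first and $second"

# Wait for one specific child. wait returns that child's exit status.
wait "$first"
echo "job $first exited with status $?"

# With no argument, wait blocks until all remaining children have exited.
wait
echo "all background jobs have finished"
```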



Copyright 2014 Greg Boyd - All Rights Reserved.