Paging and Scheduling

City College of San Francisco - CS260A
Linux System Administration
Module: Processes

Paging and Process Scheduling

During our discussion of process creation, the astute reader may have been concerned about the efficiency of the fork and exec process. Yes, indeed, it does carry some overhead. However, this overhead is reduced significantly by paging.

[ The background discussion on paging and process scheduling has been removed due to time constraints. We will concentrate just on what you need to know about these topics in this section. By the way, the term 'memory' in this discussion (and always in this course) means RAM. ]

Paging involves breaking the memory image of any program in execution (a process) into fixed-size pages, 4096 bytes on most Linux systems. Only those portions of the process which are needed at any point in time are kept in memory. The set of a process's memory pages that is currently in memory is called the run-set or resident-set, and its size is termed the resident-set-size (RSS or RSZ). The RSZ can be compared to the process's total size, called its memory size or just size.

When a process references a page that is not currently in memory, a page fault occurs, which causes the page to be read into memory. When no free memory is available, pages that have not been used for some time are stolen from existing processes. The aggressiveness of this page 'stealing' increases as memory becomes more scarce. If pages that are still needed must be stolen, the frequency of page faults can increase dramatically. This situation is called thrashing, and it causes system performance to drop sharply.

You can check for thrashing using the vmstat or top process monitoring programs. We will show an example of vmstat below and discuss both of these at length in the process monitoring section.
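To make the difference between resident size and total size concrete, here is a minimal sketch that reads both for the current shell. It assumes a Linux /proc filesystem; the variable names are mine and are only for illustration.

```shell
# Page size in bytes (4096 on most Linux systems).
page_size=$(getconf PAGESIZE)

# In /proc/PID/status, VmRSS is the resident-set size and VmSize is the
# total size of the process, both in K. $$ is the process-id of this shell.
rss_k=$(awk '/^VmRSS:/ { print $2 }' /proc/$$/status)
size_k=$(awk '/^VmSize:/ { print $2 }' /proc/$$/status)

echo "page size: $page_size bytes"
echo "shell $$: $rss_k K resident out of $size_k K total"
```

The resident figure should normally be much smaller than the total, since only the pages the shell has actually touched are in memory.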

I must interject one note of caution here, however: when examining output that gives memory usage you must be aware of the units involved. Memory usage is commonly expressed as either K (remember I always use the power-of-two version of units) OR as memory pages, which are 4K.
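Converting between the two units is simple arithmetic, sketched below. The function name pages_to_k is hypothetical (my own), and the example assumes a Linux /proc filesystem, where /proc/PID/statm reports sizes in pages.

```shell
# pages_to_k: convert a count of memory pages to K using the system page size
pages_to_k() {
    echo $(( $1 * $(getconf PAGESIZE) / 1024 ))
}

# /proc/PID/statm reports sizes in pages: total size first, then resident size
read -r size_pages rss_pages rest < /proc/$$/statm

echo "total:    $size_pages pages = $(pages_to_k "$size_pages") K"
echo "resident: $rss_pages pages = $(pages_to_k "$rss_pages") K"
```

On a system with 4096-byte pages, pages_to_k 256 prints 1024, i.e. 256 pages is 1024K.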

If a page of a program is selected for removal from memory due to disuse, what the system must do with it depends on whether it is code or data. Code (instruction) pages do not have to be saved (swapped out) since they do not change and the page can be re-read from the program file. Data pages do have to be swapped out so that the current state of the data can be preserved. Data that is swapped out is saved on the swap device, if one is configured. Use of a swap device (a swap partition or swap file) for this purpose has been magnanimously termed 'virtual memory' on other systems, and that term has been adopted on Linux, but it is traditionally just called paging (or 'swapping').
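To see whether a swap device is configured at all, you can look at /proc/swaps. This is a small sketch assuming a Linux /proc filesystem; the variable name is mine.

```shell
# /proc/swaps lists each active swap partition or swap file.
# The first line is a header, so count only the lines after it.
swap_count=$(awk 'NR > 1' /proc/swaps | wc -l)

if [ "$swap_count" -eq 0 ]; then
    echo "no swap device configured"
else
    echo "$swap_count swap device(s) configured:"
    cat /proc/swaps
fi
```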

Let's look at the paging behavior of a Linux system configured with adequate swap space and adequate RAM:

$ vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1302304 116412 524856    0    0   122    13  910 2975 20  3 76  1  0
 1  0      0 1302360 116412 524884    0    0     0     0 1680 4760 17  1 81  0  0

We will discuss only the swap fields for now. You can see that all swap fields are 0 (si = swap in, so = swap out). In fact, this system has never swapped. This is not too surprising since it is a personal system, has 4GB of RAM, and we are not running any programs that use a lot of data memory, such as a video editor or image manipulation program, nor is it running as a webserver or database server. The surprising thing is that hills has never swapped either, and it currently has 46 users, 846 processes and is running a webserver, an Oracle database, and a MySQL server!
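If you only want to watch the two swap columns, a small awk filter will pick them out. This is just a sketch: the function name swap_columns is my own, and it assumes the column order shown above, where si and so are the 7th and 8th fields. The sample data is the vmstat output from this page.

```shell
# swap_columns: print the si and so fields from vmstat output read on stdin.
# The first two lines of vmstat output are headers, so they are skipped.
swap_columns() {
    awk 'NR > 2 { print "si=" $7, "so=" $8 }'
}

# Prints "si=0 so=0" once for each of the two data lines.
swap_columns <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1302304 116412 524856    0    0   122    13  910 2975 20  3 76  1  0
 1  0      0 1302360 116412 524884    0    0     0     0 1680 4760 17  1 81  0  0
EOF
```

On a live system you would feed it real output, for example vmstat 1 5 | swap_columns. Keep in mind that the first data line vmstat prints is an average since boot; persistently non-zero values on the later lines are the ones that suggest memory pressure.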

Affecting Process Scheduling

Again, we will not discuss process scheduling due to time constraints. The interested reader is referred to discussions of the Completely Fair Scheduler (CFS), such as that on Wikipedia. We will limit our discussion to the single metric that a user has available to alter process scheduling priority. That metric is called the nice number.

Traditionally, the nice number was created so that a user could indicate when she started a program that she didn't need the results right away, so she could make the program more 'nice' - or make it run with less priority (i.e., slower). This freed the system resources up for other users, making her task run in whatever extra resources were available, perhaps finishing overnight. However, this concept of a 'nice number' makes discussions of process priority and altering it very confusing, since the niceness of a process is inversely related to its priority: the higher the nice number, the lower the priority. (So hang onto your seats and read this a couple of times and practice.)

Every process starts with a nice number (NI) whose default value is its parent's nice number. On Linux, nice numbers are integers that range from 19 (very nice - a request for low priority) to -20 (not nice - a request for the highest priority possible). Normally, processes are started with a nice number of 0. Normal users can increase the nice number of their processes, making them more nice (thus lowering their priority), but by default only root can decrease the nice number of a process, or raise its priority.
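You can watch the nice number being inherited with a short experiment. This sketch assumes the GNU version of nice, which prints its own nice number when run with no command, and uses nice -n adj to request an adjustment; the variable names are mine.

```shell
# Nice number of the current shell (0 unless it was started with a different one)
base=$(nice)

# Start a child shell with an adjustment of +5. Inside it, the first nice
# reports the inherited value, and the second starts a grandchild that adds
# 3 more to whatever it inherited.
result=$(nice -n 5 sh -c 'echo "$(nice) $(nice -n 3 nice)"')

echo "shell: $base   child and grandchild: $result"
```

If the shell's own nice number is 0, the child reports 5 and the grandchild 8: each adjustment is added to the inherited value, not to 0.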

The nice number does not directly determine scheduling priority - that is up to the scheduler (the part of the kernel that decides which process runs next). The nice number is simply a suggestion to the scheduler. The scheduler is free to honor or ignore nice numbers. Some schedulers only pay attention to nice numbers which have been decreased, since those are obviously requests by root to increase a process's priority. You can see the true scheduling priority (PRI) in the output of the ps command but you cannot affect it directly. You can only suggest a change in the priority by changing the nice number.

Setting the nice number

There are two ways to set a nice number: when you start a process or after it is running.

When you start a process at the command line you can request that its nice number be adjusted from its default, which is your current process's nice number. This is done by adding the prefix nice -adj to the command when you issue it. Here the - is just a dash, just like you'd put before any Unix option, and adj is the numeric adjustment you would like to add to your current nice number. Thus, the new process would have a nice number of your nice number + adj. For example, if your current nice number is 0 (the system default), then

$ nice -10 gimp &

would start the gimp process with a nice number of 10.
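The same adjustment can also be written nice -n adj, which is the spelling POSIX specifies. The sketch below checks that the two forms agree. It assumes GNU nice, which still accepts the older -adj form and prints its own nice number when given no command; the variable names are mine.

```shell
# Both commands ask for an adjustment of +10 relative to the current shell.
# The inner nice (run with no arguments) prints the nice number it was started with.
old_form=$(nice -10 nice)
new_form=$(nice -n 10 nice)

echo "nice -10: $old_form   nice -n 10: $new_form"
```

From a shell whose nice number is 0, both report 10.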

This gets ugly if you are root and want to make a process less nice. Here, if root's nice number is 10 and she wants to start the program top to look at the current processes running on the system, she could give it increased priority by decreasing its nice number (altering it by -10):

# nice --10 top

Notice the first dash for the option is followed by a -10, giving what looks like a Linux double-dash (but it's not!)

Ugly, you say? Don't worry, it gets worse.

If a process is already running you alter its nice number using renice. The standard forms are

renice N -p pid1 [pid2 pid3 ... ] to alter the nice number of a list of processes

renice N -u user1 [user2 user3 ... ] to alter the nice number of all processes owned by the listed users

Here N is the nice number you want to assign to the process (note: no dash!), pid is a process-id and user is a user name. As an example, here is a line from the output of the ps command on my system that shows firefox has a nice number of 0:

F S UID  PID PPID C PRI NI ADDR     SZ WCHAN  TTY     TIME CMD
0 S 501 3297 3244 4  80  0 -    334429 poll_s ?   00:06:32 firefox

I then issue the command

$ renice 10 -p 3297
3297: old priority 0, new priority 10

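Here is the changed ps output. NI is now 10 and PRI has gone from 80 to 90; in this listing a larger PRI value means a lower scheduling priority: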

F S UID  PID PPID C PRI NI ADDR     SZ WCHAN  TTY     TIME CMD
0 S 501 3297 3244 3  90 10 -    334515 poll_s ?   00:08:21 firefox
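If you would like to practice renice without disturbing a real program, here is a sketch that does the same thing to a throwaway sleep process. It assumes a Linux /proc filesystem and that your shell's nice number is 10 or less; the helper name nice_of is my own.

```shell
# Start a throwaway background process to experiment on
sleep 30 &
pid=$!

# nice_of: print the nice number of a process.
# Field 19 of /proc/PID/stat is the nice number (see the proc man page).
nice_of() {
    awk '{ print $19 }' "/proc/$1/stat"
}

before=$(nice_of "$pid")
renice 10 -p "$pid" > /dev/null
after=$(nice_of "$pid")

echo "pid $pid: nice number $before -> $after"

# Clean up the experiment
kill "$pid"
```

A normal user can run this as-is, since it only raises the nice number. Trying to renice the same process back to 0 afterwards would normally fail unless you are root.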
