Memory Management

Contents

Virtual Memory
Creating Swap Space
Allocating Swap Space
Buffer Cache

This chapter discusses memory management in Linux.

Virtual Memory

A landlord has a one-room apartment that is available to rent to one person. Two people come to the apartment and want to stay, but are told that it is only for one person. They explain that one of them works from 8am to 8pm and the other works from 8pm to 8am. So only one person uses the room at a time, even though two people are staying there. An outsider sees two people using a room meant for one. That is essentially how virtual memory works.

In general a program is run on a computer as follows: the operating system loads it into RAM, and the CPU fetches instructions from RAM and executes them. RAM is faster but more expensive than the hard drive. The operating system can move a program (or parts of it) that is not currently being used from RAM to the hard drive, and move it back to RAM later when it is needed. This gives the illusion of having more memory than there really is, so we can load and run more programs than we could without virtual memory. The area of the disk used for this purpose is called the swap space. Data is written out to the swap space in fixed-size chunks (usually a few kilobytes). These chunks are called pages.
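The page size can be checked from a program. The following is a minimal sketch that asks the operating system for its page size and works out how many pages a given amount of data occupies. The helper name "pages_needed" is ours, for illustration only.

```python
import os

# Page size as reported by the operating system (commonly 4096 bytes on Linux).
page_size = os.sysconf("SC_PAGE_SIZE")

def pages_needed(num_bytes: int, page: int = page_size) -> int:
    """Number of whole pages required to hold num_bytes (rounded up)."""
    return (num_bytes + page - 1) // page

print("page size:", page_size, "bytes")
print("pages for 1 MiB:", pages_needed(1024 * 1024))
```

Even a single byte occupies a whole page, which is why the helper rounds up.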

On a Windows computer we can see this in the system information (for example with the "systeminfo" command or the "msinfo32" tool).

[Image: Windows system information showing physical memory, virtual memory and the page file]

We see in the above image that the actual physical memory is 8 GB and the space reserved on disk (called the swap space) is around 21 GB, giving a total virtual memory of around 29 GB. If hard drive space is cheap, why don't we just increase the swap space to 100 GB and get a lot more virtual memory? The problem is that once programs actually use that much memory, the system has to do a lot of swapping (moving pages between the RAM and the hard drive), and that slows everything down because reading from RAM is much faster than reading from the hard drive. So we will usually see that the swap space is not all that large compared to the RAM size. In the image above we see that the swap space is stored in a file called "C:\pagefile.sys".

We can use the "cat /proc/meminfo" command to view memory information on a Linux system.

[amittal@hills process]$ cat /proc/meminfo
MemTotal:       12056544 kB
MemFree:         4466564 kB
MemAvailable:    6664760 kB
Buffers:               0 kB
Cached:          2355584 kB
SwapCached:           36 kB
Active:          1497052 kB
Inactive:        3610212 kB
Active(anon):     368460 kB
Inactive(anon):  3000008 kB
Active(file):    1128592 kB
Inactive(file):   610204 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388604 kB
SwapFree:        8387240 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       2751720 kB
Mapped:           216840 kB
Shmem:            616788 kB
KReclaimable:     780724 kB
Slab:            1284884 kB
SReclaimable:     780724 kB
SUnreclaim:       504160 kB
KernelStack:       12688 kB
PageTables:        25560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    14416876 kB
Committed_AS:    3787844 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           588800 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2510848 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1642304 kB
DirectMap2M:    10940416 kB
DirectMap1G:     2097152 kB
[amittal@hills process]$

From the above, the physical RAM is about 12 GB (the MemTotal field) and the swap size is around 8 GB (SwapTotal). The output of "meminfo" shows statistics only; it is not something that we can tune or adjust. Let's study the output of "meminfo" in a little more detail.

MemTotal:       12056544 kB
MemFree:         4466564 kB
MemAvailable:    6664760 kB

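We know that "MemTotal" is the total physical memory. "MemFree" is memory that is currently not used by the system at all. "MemAvailable" is an estimate of how much memory is available for starting new programs without swapping. It is larger than "MemFree" because it also counts memory that is in use (for example as cache) but can be freed up when needed.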
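The fields are easy to read from a program as well. The sketch below parses text in the "meminfo" format into a dictionary; it works on three lines copied from the listing above so that it runs anywhere. The function name "parse_meminfo" is our own choice.

```python
def parse_meminfo(text: str) -> dict:
    """Map each field name to its integer value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip prompts, blank lines and anything unexpected
        info[name.strip()] = int(parts[0])
    return info

# Three lines copied from the listing above.
SAMPLE = """\
MemTotal:       12056544 kB
MemFree:         4466564 kB
MemAvailable:    6664760 kB
"""

info = parse_meminfo(SAMPLE)
reclaimable_kb = info["MemAvailable"] - info["MemFree"]
print(f"MemTotal: {info['MemTotal'] / 1024**2:.1f} GiB")
print(f"Available beyond MemFree: {reclaimable_kb} kB")
```

On a Linux system the same function can be applied to the live data with parse_meminfo(open("/proc/meminfo").read()).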

Buffers:               0 kB
Cached:          2355584 kB

Buffers is memory used as temporary storage for raw disk blocks during input/output. In general a buffer smooths out a difference in speed: suppose our program sends information very fast to a printer that is slow. We can place the data in a buffer and the printer is then fed from the buffer. Buffers are short lived.

Cached is the memory used for the disk cache, which speeds up access to the hard drive. Suppose we read from a file. It's possible that another program requests the same data soon after. We can then serve the data from the cache instead of going to the hard drive. Another scenario is that there are multiple writes to the disk. We can hold those writes in the cache and then write all the small writes to the disk in one big write. "Cached" does not include "SwapCached", so the total page cache size is the sum of the two.

There are quite a few entries in the output. Many of them are hard to understand without going into a lot of detail about architecture and memory. Brief descriptions of some of them are given below.

Memory that was swapped out and later read back into RAM,
but is still present in the swap space as well. If such a page
has to be swapped out again it does not need to be rewritten.

SwapCached:           36 kB

Active:          1497052 kB
Inactive:        3610212 kB

Active(anon):     368460 kB
Inactive(anon):  3000008 kB
Active(file):    1128592 kB
Inactive(file):   610204 kB
Unevictable:           0 kB
Mlocked:               0 kB

Roughly 8 GB of swap space,
almost all of which is free.
SwapTotal:       8388604 kB
SwapFree:        8387240 kB

Amount of memory waiting to be written back to the disk.
Dirty:                 0 kB
Total amount of memory that is actively being written to the
disk.
Writeback:             0 kB

AnonPages:       2751720 kB

Total amount of memory used for files that have been mapped
into memory, such as libraries.
Mapped:           216840 kB
Shmem:            616788 kB
KReclaimable:     780724 kB

Memory used by the kernel for its own
data structures.
Slab:            1284884 kB

SReclaimable:     780724 kB
SUnreclaim:       504160 kB
KernelStack:       12688 kB
PageTables:        25560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    14416876 kB
Committed_AS:    3787844 kB

Total size of the virtual address range that the kernel can
use for its "vmalloc" allocations. This is based on the number
of bits we have for an address. This is not the actual virtual
memory. In fact the number below is about 32 terabytes. It is a
theoretical address range, not memory that has been allocated.

VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           588800 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2510848 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
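The VmallocTotal figure above is easier to judge once it is converted from kB. The short calculation below uses only the value from the listing and shows that it is just under a power of two.

```python
vmalloc_total_kb = 34359738367  # VmallocTotal value from the listing above

vmalloc_total_bytes = vmalloc_total_kb * 1024
tib = vmalloc_total_bytes / 2**40

# The value is exactly 1 kB short of 2**45 bytes.
assert (vmalloc_total_kb + 1) * 1024 == 2**45
print(f"VmallocTotal is about {tib:.0f} TiB")
```

A size this far beyond the 12 GB of RAM on the machine can only describe address space, not memory that is actually in use.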

Huge pages are pages that are much larger than the normal
4 kB page (2048 kB here). They reduce the number of page table
entries needed to map a large amount of memory.
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB

A translation lookaside buffer (TLB) is a cache in the CPU that
stores recent translations of virtual addresses to physical RAM addresses.
The entries below show how much memory the kernel has mapped directly
using pages of size 4 kB, 2 MB and 1 GB. Larger pages need fewer
TLB entries to cover the same amount of memory.
DirectMap4k:     1642304 kB
DirectMap2M:    10940416 kB
DirectMap1G:     2097152 kB

Another command to print memory information (with much simpler output) is the "free" command. Running it on the hills server produces the following output.

[amittal@hills dev]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           11Gi       2.2Gi       2.0Gi       594Mi       7.4Gi       8.5Gi
Swap:         8.0Gi       1.0Mi       8.0Gi
[amittal@hills dev]$

The "total" column shows the physical memory and swap sizes. Running "meminfo" at the same time shows:
Cached: 6890100 kB

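We have discussed the cache above. Linux can free this memory and make it available to programs when it is needed, which is why the "available" column is much larger than the "free" column. Another command is the "swapon" command, which lists the swap spaces in use.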

[amittal@hills dev]$ swapon -s
Filename                                Type            Size    Used    Priority
/dev/dm-1                               partition       8388604 1584    -2
[amittal@hills dev]$

We can use either a disk file or a hard drive partition for the swap space. It is easier to change the size of a disk file than that of a partition. However, if we know the size beforehand then we can use a partition. Linux also allows multiple swap spaces consisting of files or partitions. If we temporarily need a large amount of swap space we can add it without bringing down the system and then remove it once the special need is over. In the output above the device being used for swap space is "/dev/dm-1"; it is a partition of size 8 GB (8388604 kB).

Creating Swap Space

We can use a file for swap space, but it must not have any "holes". A hole is produced when data is written at an offset beyond the current end of the file: the gap between the old end of the file and the new data reads back as all zeroes. Many modern file systems do not actually write these zeroes to the disk but rather keep this information in the inode structure. The result is that the actual space the file takes on the hard disk is smaller than its size. The concept of a hole in a file may be better understood with an example. The "dd" command (often expanded as "disk/data duplicator") is used to copy raw data from one source to another.

[amittal@hills temp]$ dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
2+0 records in
2+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000273703 s, 29.9 MB/s
[amittal@hills temp]$ ls -l file_with_holes
-rw------- 1 amittal csdept 8192 Mar 11 18:13 file_with_holes
[amittal@hills temp]$ dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
2+0 records in
2+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000235385 s, 34.8 MB/s
[amittal@hills temp]$ ls -l file_with_holes
-rw------- 1 amittal csdept 36864 Mar 11 18:13 file_with_holes
[amittal@hills temp]$

In the first "dd" command we create a file with 2 blocks of size 4096. The "if" option reads from "/dev/urandom", so we get random data, and the output file is "file_with_holes". We then run another "dd" command, but this time "seek=7" makes it skip the first 7 blocks of the output file before writing 2 blocks. The file now has data in blocks 0-1 and 7-8, so we have written 4 blocks (4096*4 = 16384 bytes) and left a hole of 5 blocks (7-2 = 5 blocks, or 20480 bytes) in between. The file size is therefore 9*4096 = 36864 bytes, or 36 KB. We can transfer this file to our Windows system and open it with TextPad in binary format.

[Image: binary view of file_with_holes in TextPad, with zeroes starting at offset 2000 (hexadecimal)]

We can see how the zeroes start after 8192 bytes (the 2000 in the leftmost column is in hexadecimal: 2 times 16 to the power 3 is 8192). Even though we see the zeroes in the image above, it's quite possible that the zeroes are not stored on the hard drive at all. Files with holes are also called sparse files and are useful when we need a large offset in the file without the file having the actual space allocated on the hard drive. We can use the "du" command to see the actual size on disk, and we can also use the "truncate" command to create a file of a given size without writing any data. Example:

[amittal@hills temp]$ du -h file_with_holes
16K     file_with_holes
[amittal@hills temp]$ du -h --apparent-size file_with_holes
36K     file_with_holes
[amittal@hills temp]$ truncate -s 1M another_file
[amittal@hills temp]$ ls -l another_file
-rw------- 1 amittal csdept 1048576 Mar 11 18:53 another_file
[amittal@hills temp]$
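The same experiment can be repeated from a program. The sketch below mirrors the two "dd" commands using ordinary file operations and then compares the apparent size with the space allocated on disk. It assumes a Linux system, where "st_blocks" counts 512-byte units; the number of blocks actually allocated depends on the file system, so it is printed but not relied on.

```python
import os
import tempfile

BLOCK = 4096

def make_file_with_hole(path: str) -> None:
    """Write blocks 0-1 and 7-8, leaving blocks 2-6 unwritten (a hole)."""
    with open(path, "wb") as f:
        f.write(os.urandom(2 * BLOCK))   # like: dd bs=4096 count=2
        f.seek(7 * BLOCK)                # like: seek=7
        f.write(os.urandom(2 * BLOCK))   # like: count=2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_with_holes")
    make_file_with_hole(path)
    st = os.stat(path)
    apparent_size = st.st_size           # what "ls -l" shows
    on_disk = st.st_blocks * 512         # what "du" measures
    with open(path, "rb") as f:
        f.seek(2 * BLOCK)
        hole = f.read(5 * BLOCK)         # the skipped region

print("apparent size:", apparent_size)
print("allocated on disk:", on_disk)
print("hole reads as zeroes:", hole == bytes(5 * BLOCK))
```

Reading the hole returns zeroes even though nothing was ever written there, which is exactly what the binary view of the file shows.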

A hole in the file makes things complicated for the kernel, which just wants to write pages straight to disk sectors; if a disk sector has not been allocated because of a hole, that is a problem. To avoid holes we can use the "/dev/zero" file to create our swap file. "/dev/zero" is a special file that gives us null (zero) bytes when we read from it, which makes it useful for initializing data. Because "dd" actually writes these zeroes, every block of the file gets allocated. The memory page size is 4 KB, so if our block size is 1 KB we should make the count value a multiple of 4.

ajay@ajkumar08-PC:~$ cd /home/ajay/temp/swap
ajay@ajkumar08-PC:~/temp/swap$ dd if=/dev/zero of=extra-swap bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.016882 s, 62.1 MB/s
ajay@ajkumar08-PC:~/temp/swap$

ajay@ajkumar08-PC:~/temp/swap$ swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       999420  0       -2
ajay@ajkumar08-PC:~/temp/swap$ swapon --show
NAME      TYPE      SIZE USED PRIO
/dev/sda5 partition 976M   0B   -2

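We set up the file as a swap area with "mkswap", but it's not enabled yet. The permissions warning can be avoided by running "chmod 600 extra-swap" first.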
ajay@ajkumar08-PC:~/temp/swap$ mkswap extra-swap
mkswap: extra-swap: insecure permissions 0644, 0600 suggested.
Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
no label, UUID=e14948d2-45d7-47f7-9638-a58f5521795c
ajay@ajkumar08-PC:~/temp/swap$
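The size reported by "mkswap" (1020 KiB) is one page smaller than the 1024 KiB file. This is consistent with the first page of the file being set aside for the swap header. The sketch below reproduces the number under that assumption, together with the 4 KB page size used in the text. The function name is ours.

```python
PAGE_SIZE = 4096          # assumed page size, as stated in the text
file_size = 1024 * 1024   # dd bs=1024 count=1024

def usable_swap_bytes(size: int, page: int = PAGE_SIZE) -> int:
    """Whole pages in the file, minus one page assumed to hold the swap header."""
    whole_pages = size // page
    return max(whole_pages - 1, 0) * page

usable = usable_swap_bytes(file_size)
print(usable, "bytes =", usable // 1024, "KiB")
```

Under the same assumption a file of 1026 KB gives the same usable size as one of 1024 KB, because the last 2 KB do not fill a page. That is the reason for choosing a count that is a multiple of 4 when the block size is 1 KB.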


Log in as root.
ajay@ajkumar08-PC:~/temp/swap$ su root
Password:

Now enable it.
root@ajkumar08-PC:/home/ajay/temp/swap# swapon extra-swap
swapon: /home/ajay/temp/swap/extra-swap: insecure file owner 1000, 0 (root) suggested.

Check that the entry got added.
root@ajkumar08-PC:/home/ajay/temp/swap# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       999420  0       -2
/home/ajay/temp/swap/extra-swap         file            1020    0       -3

A higher number means a higher priority, so the partition (-2) is used before our file (-3).
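To make the ordering concrete, the sketch below parses the "swapon -s" output shown above and sorts the swap areas by priority. It assumes file names without spaces, and the function name "parse_swaps" is our own.

```python
# Output of "swapon -s" copied from the listing above.
SWAPS = """\
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       999420  0       -2
/home/ajay/temp/swap/extra-swap         file            1020    0       -3
"""

def parse_swaps(text: str) -> list:
    """Return one dict per swap area; Size and Used are in kB."""
    rows = []
    for line in text.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) != 5:
            continue                            # not a data line
        name, kind, size, used, prio = fields
        rows.append({"name": name, "type": kind, "size_kb": int(size),
                     "used_kb": int(used), "priority": int(prio)})
    return rows

# Areas with a higher priority value are used first.
by_priority = sorted(parse_swaps(SWAPS), key=lambda r: r["priority"], reverse=True)
for row in by_priority:
    print(row["priority"], row["name"], row["size_kb"], "kB")
```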

Removing the entry
root@ajkumar08-PC:/home/ajay/temp/swap# swapoff -v /home/ajay/temp/swap/extra-swap
swapoff /home/ajay/temp/swap/extra-swap
root@ajkumar08-PC:/home/ajay/temp/swap# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       999420  0       -2
root@ajkumar08-PC:/home/ajay/temp/swap#

We can also delete the file at this stage.

If we want to keep the swap entry, we can add it to the "/etc/fstab" file so that the system uses our swap file again after a restart, for example with a line such as "/home/ajay/temp/swap/extra-swap none swap sw 0 0". Instead of a file we could also have used a partition.

Allocating Swap Space

How much space should be allocated for the swap space? The Windows snapshot shows the RAM as 16 GB and the swap space as 5.25 GB. On the hills server the RAM is 12 GB and the swap space is 8 GB. On another Linux system the RAM is 2 GB and the swap space is 1 GB. There is no formula to figure out the swap space. We need to figure out which applications will run at the same time and what their maximum memory usage might be. We should also take into account how many people will be using the system at the same time. As discussed previously, it does not help to make the swap space very large, because swapping itself (moving pages from the RAM to the hard drive and vice versa) is slow. Even if the RAM is large enough we should still have some swap space. Linux takes an aggressive approach to swapping: if a program is not being used then it might be swapped out even if there is RAM available. That way Linux does not have to spend time swapping when the RAM is needed later. Swap space can be spread over multiple disks, partitions and files.

Buffer Cache

We have discussed this topic briefly above. If a file is read, it's possible that there will soon be another request for the same data. It's very expensive to read from the disk every time, so instead we can keep this data in RAM. Another case is the command "ls": the directory data it reads from the disk can be kept in RAM, and if the same request comes again soon after, it can be served from the RAM cache instead of reading the disk again. We need to take care that the cached data does not become stale when the data changes. The RAM used for this activity is called the buffer cache. We can see its size in the output of the "free -h" command under the "buff/cache" column.

We can use a portion of the RAM for this cache and if it gets filled up then the unused parts are discarded to make room for new data. Disk writes can also be cached and written to the disk later on.
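The idea of a fixed-size cache that discards old data can be sketched in a few lines. This is only a toy model, not how the kernel does it: the "disk" is a dictionary, and we assume a simple least-recently-used rule for deciding what to discard.

```python
from collections import OrderedDict

class ReadCache:
    """Toy cache: keeps at most `capacity` blocks, evicting the least recently used."""

    def __init__(self, disk: dict, capacity: int):
        self.disk = disk              # stands in for the slow hard drive
        self.capacity = capacity
        self.blocks = OrderedDict()   # block number -> data, oldest first
        self.disk_reads = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)   # mark as recently used
            return self.blocks[block_no]
        self.disk_reads += 1                    # cache miss: go to the disk
        data = self.disk[block_no]
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # discard the oldest block
        return data

disk = {n: bytes([n]) * 4 for n in range(5)}
cache = ReadCache(disk, capacity=2)
for n in (0, 0, 1, 2, 0):
    cache.read(n)
print("disk reads:", cache.disk_reads)
```

The second read of block 0 is served from the cache. The last read of block 0 has to go to the disk again, because the block was discarded to make room for block 2.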

Buffer caches can work in different ways. The write-through method writes to the cache and to the disk at the same time. The write-back method writes to the cache immediately but writes to the disk later on. Write-back is more efficient but introduces another problem: what if some writes have gone to the cache and the power goes off? The changes in the cache are lost, and it's possible that the file system is not left in a clean state. The "sync" command causes all the writes in the cache to be written out to the disk.
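The difference between the two methods can be shown with a toy model. The class below is purely illustrative: a dictionary stands in for the disk, and its "sync" method plays the role of the "sync" command for this one cache.

```python
class WriteCache:
    """Toy write cache over a dict that stands in for the disk."""

    def __init__(self, disk: dict, write_back: bool):
        self.disk = disk
        self.write_back = write_back
        self.cache = {}
        self.dirty = set()              # blocks changed in cache but not yet on disk

    def write(self, block_no: int, data: bytes) -> None:
        self.cache[block_no] = data
        if self.write_back:
            self.dirty.add(block_no)    # write-back: defer the disk write
        else:
            self.disk[block_no] = data  # write-through: disk updated immediately

    def sync(self) -> None:
        """Flush all deferred writes to the disk."""
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.cache[block_no]
        self.dirty.clear()

through_disk, back_disk = {}, {}
through = WriteCache(through_disk, write_back=False)
back = WriteCache(back_disk, write_back=True)
through.write(1, b"new")
back.write(1, b"new")
lost_on_power_cut = 1 not in back_disk   # the data exists only in the cache
back.sync()
print(through_disk, back_disk, lost_on_power_cut)
```

Before "sync" is called, the write-back disk does not contain the new data, so a power failure at that point would lose it.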

The size of the cache will vary. If the cache were fixed at a very large size then normal programs would not fit in RAM and there would be swapping, which is itself slow. Linux instead uses as much of the free RAM as it can for the cache, and reduces the size of the cache when programs need more memory.