File Systems: Part 2

Contents

Filesystem Types
Filesystem Parameters
Filesystem Creation

This chapter discusses file systems in Linux.

Filesystem Types

We have seen examples of filesystems such as Windows "NTFS" and "FAT32" for flash drives. Linux supports many different filesystems such as ext, ext2, ext3, ext4 and XFS. Why do we have so many filesystems? A filesystem is a component of the operating system, and as technology advances new filesystems can incorporate new features. As an example we have journaling filesystems that keep track of changes and filesystems that can encrypt files. A Linux filesystem has 3 major components:

1) Superblock. This contains some basic information about the filesystem
such as its size and status. It is essentially the metadata about the filesystem.

2) Inode. This is a structure that contains information about a data object such
as a file or a folder. This information includes things like: permissions, owner of
the file, timestamps (such as when the file was last modified), address of the
data blocks where the contents are stored.

3) Data Blocks. These are the actual blocks on the hard drive that contain the data. If
we have a file "1.txt" the data blocks will contain the text of the file.

Unix file systems can vary in different ways:

1) Where the superblock is located.
2) Location of inodes.
3) Block size. This is the file system block size as opposed to the hard drive
   block size.
4) Keeping track of free data space.
5) Organization of the parts of the file system.

Journaled Filesystems

If we create a new file with some data then the steps on a general filesystem might be:

1) The data blocks on the hard drive are filled with data.
2) An inode is allocated and filled with metadata about the file (owner, permissions, block addresses, etc.).
3) The file's directory is updated to show the additional entry for the file name.

Now assume the system goes down (a power failure, for example) between steps 1) and 2). The data might exist on the hard drive but there is no reference to it since there is no inode, and we have essentially lost the data. If the system goes down between 2) and 3) we cannot "see" the file because the folder's information has not been updated. When the system comes back up a consistency check (what the "fsck" utility does) must be performed, and this can take some time. In a journaled filesystem the intended steps are first recorded somewhere, let's say in a journal entry. When the steps complete the entry can be removed. If the system goes down we can replay the entries in the journal instead of checking the whole filesystem.
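The idea can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the class "ToyFS", its dictionaries and the "crash_after" parameter are invented for this example, and real journaling filesystems are far more involved (for instance, they often journal only the metadata).

```python
class ToyFS:
    """Toy in-memory model of the three-step file creation (hypothetical)."""

    def __init__(self):
        self.blocks = {}     # block number -> data
        self.inodes = {}     # inode number -> list of block numbers
        self.directory = {}  # file name -> inode number
        self.journal = []    # entries for operations not yet completed

    def create(self, name, data, crash_after=None):
        # Record the whole operation in the journal before touching anything.
        entry = {"name": name, "data": data,
                 "block": len(self.blocks), "inode": len(self.inodes)}
        self.journal.append(entry)
        if self._apply(entry, crash_after):
            self.journal.remove(entry)  # all steps done, entry no longer needed

    def _apply(self, entry, crash_after=None):
        self.blocks[entry["block"]] = entry["data"]      # step 1: data blocks
        if crash_after == 1:
            return False                                 # simulated power failure
        self.inodes[entry["inode"]] = [entry["block"]]   # step 2: inode
        if crash_after == 2:
            return False                                 # simulated power failure
        self.directory[entry["name"]] = entry["inode"]   # step 3: directory entry
        return True

    def replay(self):
        # After a restart, redo every operation still listed in the journal.
        for entry in list(self.journal):
            self._apply(entry)
            self.journal.remove(entry)


fs = ToyFS()
fs.create("1.txt", "hello", crash_after=1)   # crash between steps 1) and 2)
print("1.txt" in fs.directory)               # False: the file cannot be seen
fs.replay()
print("1.txt" in fs.directory)               # True: the journal entry was replayed
```

Because the journal entry records which block and inode were chosen, replaying it repeats the same writes, so redoing a step that had already completed does no harm.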

ext2

The file system "ext2" (second extended file system) is a continuation of "ext". It was released in 1993 and was the default file system in many Linux distributions. It is not a journal-based system and is thus still useful for flash drives and SD cards. The increase in performance comes because we do not have to write things to a journal. For a block size of 1 KB we have the following limits:

Block Size: 1 KB
Max File Size: 16 GB
Max Filesystem Size: 4 TB

ext3

The file system "ext3" supplanted "ext2" and came out in 2001. It adds a journaling feature.
Limits:

Block Size: 1 KB
Max File Size: 16 GB
Max Filesystem Size: 2 TB

ext4

The file system "ext3" was supplanted by "ext4", which came out in 2008. It also has a journal-less mode. It can use extents to store the location of data blocks. An extent describes a contiguous run of blocks by its starting block address and the number of blocks in the run.
Limits (with a 4 KB block size):

Max File Size: 16 TB
Max Filesystem Size: 1 EB

XFS

XFS was created in 1993 by Silicon Graphics. It can perform I/O operations in parallel. XFS supports journaling. It uses B+ trees for space allocation and directories.

ReiserFS

ReiserFS is a journaling file system created in 2001. It stores metadata such as directory entries and inode block lists in a single B+ tree data structure.

Filesystem Parameters

When we create a filesystem there are some parameters that can be specified and that need to be considered.

1) Partition Size
2) Filesystem block size. This is the smallest unit that the filesystem
will read and write
3) Number of inodes.
4) A label can be attached to a volume at the time of filesystem creation.

The following is relevant to the ext2/ext3 filesystems.

Block Size

This is important in terms of wasted space and the amount of fragmentation. The block size is usually 1 KB, 2 KB or 4 KB.
Let's consider a block size of 1 KB and an address size of 4 bytes.
The inode contains the block addresses for the file's data. There is space reserved for 15 block addresses. The first 12 entries each point directly to a data block, so that accounts for 12 KB. The 13th address is an indirect address: it points to a block that itself contains addresses. Since a block is 1 KB and each address is 4 bytes, that block holds 256 addresses.

Single indirect block.
13th Address -->  Block
                    1st Address         --> Block of 1Kb
                    .
                    .
                    .
                    256th Address

256 addresses, each pointing to a block of 1 KB, so the single indirect block covers 256 KB.

Doubly indirect block.
14th Address -->  Block
                1st Address         --> Block of Addresses
                 .                         256 Addresses in a single block    --> 1Kb
                 .
                 .
                256th Address

256 addresses, each pointing to a block of addresses that covers 256 KB, so the total space that can be accessed is
256 * 256 KB = 65,536 KB = 64 MB

Triple indirect block.
15th Address -->  Block
              1st Address     -> Block of 256 Address        Block of 256 Address
              .                       1st Address  ------>      1st Address    -->  1Kb
              .                       2nd Address
              .
              256th Address

256 addresses, each pointing to a block of 256 addresses, each of which again points to a block of 256 addresses
with each address pointing to a data block. So the total possible size is:
256 * 256 * 256 * 1 KB = 16 GB
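The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the calculation only; the function name "max_file_size" and its parameters are made up for this example, and it ignores any other limits a real file system may impose.

```python
def max_file_size(block_size, address_size=4, direct_entries=12):
    """Bytes reachable through the direct, single, double and triple indirect entries."""
    per_block = block_size // address_size    # addresses that fit in one block
    direct = direct_entries * block_size      # first 12 entries
    single = per_block * block_size           # 13th entry
    double = per_block ** 2 * block_size      # 14th entry
    triple = per_block ** 3 * block_size      # 15th entry
    return direct + single + double + triple


for size in (1024, 4096):
    total = max_file_size(size)
    print(f"block size {size}: {total} bytes, about {total / 2**30:.1f} GiB")
```

The triple indirect term dominates the sum, which is why the text quotes it alone as the maximum file size.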

TO DO
Calculate the maximum file size when the block size is 2 KB with an address of 32 bits.

For a block size of 1 KB the maximum file size is about 16 GB. For a block size of 4 KB and an address size of 4 bytes the same calculation gives a maximum file size of about 4 TB. Note that the address size also imposes a limit of its own. If, say, the address were 2 bits instead of 4 bytes then we could only have 4 possible block addresses. For a block size of 4 KB this gives 16 KB as the maximum file size. The levels of indirection also affect performance, since each level costs an extra block read. With 4 KB blocks and one level of indirect addressing we can reach a file size of about 4 MB, so we do not need more levels for files smaller than that. When we format a partition we can specify the block size for the file system at that point. We can also find the block size of a file system as follows:

[amittal@hills cs260a]$ date > 1.txt
[amittal@hills cs260a]$ ls -l 1.txt
-rw-r--r-- 1 amittal csdept 29 Jan 31 09:52 1.txt
[amittal@hills cs260a]$ du 1.txt
4       1.txt
[amittal@hills cs260a]$

We create a new file and see that it has 29 bytes.
We then run "du" on the file.

If none of the following environment variables is set then "du" reports sizes in units of 1 KB.
DU_BLOCK_SIZE
BLOCK_SIZE
BLOCKSIZE

From the "du" output we can conclude that the block size is 4 KB.
Remember that we cannot just allocate 29 bytes on the hard
drive. There is the hard drive block size and then the file
system block size. In this case at least
4 KB must be allocated. Only 29 bytes are used.
If we have permissions then we can also use the "tune2fs" command.

root@ajkumar08-PC:/home/ajay/temp# /sbin/tune2fs -l /dev/sda1 | grep "Block"
Block count:              38822144
Block size:               4096
Blocks per group:         32768
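The same information can usually be read without root privileges from a program. The following is a small sketch using Python's "os.stat" and "os.statvfs" on Linux; the helper name "allocation_info" is hypothetical, and the allocated size it reports depends on the file system in use (some file systems report 0 for very small files).

```python
import os
import tempfile


def allocation_info(path):
    """Return the file's length, its allocated bytes and the filesystem block size."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        "size": st.st_size,               # bytes of content, as shown by "ls -l"
        "allocated": st.st_blocks * 512,  # st_blocks counts 512-byte units on Linux
        "fs_block_size": vfs.f_bsize,     # block size reported by the file system
    }


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 29)                    # a 29-byte file, like "1.txt" above
    demo_path = f.name

print(allocation_info(demo_path))
os.remove(demo_path)
```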

Block sizes matter less for the maximum file size in ext4 and XFS as these file systems use extents. An extent describes a contiguous set of blocks: it stores the starting block address and the number of blocks it covers.
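To see why extents are compact, the sketch below collapses a list of block numbers into (start, length) pairs. It is an illustration only; the function "to_extents" is invented here and is not how any particular file system stores its extents on disk.

```python
def to_extents(block_numbers):
    """Collapse a sorted list of block numbers into (start, length) extents."""
    extents = []
    for block in block_numbers:
        if extents and block == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))         # block starts a new run
    return extents


print(to_extents([100, 101, 102, 103, 200, 201]))  # [(100, 4), (200, 2)]
```

Six block addresses are described by two extents, and a file stored in one contiguous run needs only a single extent however large it is.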

Inodes

An inode stores information about the file. The number of inodes is fixed when the file system is created. If we run out of inodes then we cannot create any more files, even if free data blocks remain. There is a default algorithm that creates an inode for every few KB of data. The inodes usually will occupy 1/16 of the space used by the file system. We can specify the number of data bytes per inode at the creation of the file system. We can see the status of inodes using the command "df -i". On the hills server we have:

[amittal@hills cs260a]$ df -i
Filesystem                   Inodes   IUsed     IFree IUse% Mounted on
devtmpfs                    1502724     446   1502278    1% /dev
tmpfs                       1507067       1   1507066    1% /dev/shm
tmpfs                       1507067     897   1506170    1% /run
tmpfs                       1507067      17   1507050    1% /sys/fs/cgroup
/dev/mapper/vg00-root      26214400 2847914  23366486   11% /
/dev/mapper/vg00-tmp        2621440      24   2621416    1% /tmp
/dev/mapper/vg00-var       13107200   14080  13093120    1% /var
/dev/mapper/vg01-users     26214400  524420  25689980    3% /users
/dev/mapper/vg02-students 262141952  841170 261300782    1% /students
/dev/sda1                   1048576     315   1048261    1% /boot
/dev/mapper/vg00-logs       2621440       4   2621436    1% /logs
tmpfs                       1507067       5   1507062    1% /run/user/2216
tmpfs                       1507067       5   1507062    1% /run/user/2548
tmpfs                       1507067       5   1507062    1% /run/user/14371
tmpfs                       1507067       5   1507062    1% /run/user/1398
tmpfs                       1507067       5   1507062    1% /run/user/12872
tmpfs                       1507067       5   1507062    1% /run/user/10957
tmpfs                       1507067       5   1507062    1% /run/user/6658
tmpfs                       1507067       5   1507062    1% /run/user/15534

We have a lot more inodes on the "/students" file system as it contains all
the folders and files used by the students with a hills account.
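The inode counts shown by "df -i" are also available to programs. Here is a minimal sketch using Python's "os.statvfs"; the helper name "inode_usage" is made up for this example, and some file systems report 0 for these fields.

```python
import os


def inode_usage(path):
    """Return (total, used, free) inode counts for the file system holding path."""
    vfs = os.statvfs(path)
    total = vfs.f_files   # total number of inodes
    free = vfs.f_ffree    # number of free inodes
    return total, total - free, free


total, used, free = inode_usage("/")
print(f"inodes: {total}  used: {used}  free: {free}")
```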

Filesystem labels

On Windows we have a letter that is assigned to a partition with a filesystem. However if we add or remove a device then that letter can change. As an example I plug in a flash drive.

The flash drive has a letter assigned to it which is "D:". I am also using the fdisk utility to assign a label of "LABEL 1" to it. A label is something we can assign to a partition but it must have a file system on it. Next I take out this flash drive, insert a second flash drive and then put the first flash drive back in the computer.

We can see that the second flash drive got assigned "D:" and the first flash drive got assigned "F:". However the label is still "LABEL 1". In Linux the analog of the drive letter is the device file under "/dev". We do not want to rely on the device file name as that can change when devices are reconfigured, just as we saw for the Windows case above. Rather we can assign a label to the filesystem on the partition and then do our manipulation (such as mounting) with the label instead. The label is stored with the filesystem, so it stays associated with the partition. We can list the labels on the hills server as follows:

[amittal@hills dev]$ lsblk -o name,mountpoint,label,size,uuid
NAME             MOUNTPOINT LABEL  SIZE UUID
sda                                150G
+-sda1           /boot      boot     2G 74500b92-6378-48c7-9d04-89e72408209f
+-sda2                             148G ZXM8wl-FrKO-HiQw-EIH1-4ZeI-nNPY-v7wLnS
  +-vg00-root    /          root    50G 47467a2e-9fce-4427-b1cf-d57e7949c2c3
  +-vg00-swap    [SWAP]     swap     8G 2f0e837f-fc50-4f4e-9bd6-611bf940c870
  +-vg00-var     /var       var     25G 31fa1735-fe4d-47d3-83b7-442342be1f60
  +-vg00-tmp     /tmp       tmp      5G 8cee070a-905d-4927-974e-04440b1c1005
  +-vg00-logs    /logs               5G 5abdb06e-9296-4732-9c17-7242b0dc2b49
sdb                                100G aBJtwu-zl4B-FoSj-eHY7-L5Kk-bseU-6hlFjF
+-vg01-users     /users             50G c22308e8-2ac2-4507-b161-201acdf2ce61
sdc                                500G 1EUIFH-RpAv-VC3z-jYp6-NuMG-cdtF-sAbA2r
+-vg02-students  /students         500G 1cead7ec-3c37-483c-8ed4-4e7fb6b4d1ac
sdd                                300G
+-sdd1                             300G Hp0CHq-TDGo-zYHd-mUQw-HreX-nvDH-3idC7h
  +-vg03-restore                   300G 9b35c4fe-2f9d-4d1a-838f-f8c6d1584f79
sr0                               1024M

On Linux we can assign a label at the time of creating a filesystem. For the flash drive:

root@ajkumar08-PC:/home/ajay# /sbin/mkfs.vfat -n "LABEL 1" /dev/sdc1
mkfs.fat 4.2 (2021-01-31)

Check if the label got created properly.

root@ajkumar08-PC:/home/ajay# lsblk -o name,mountpoint,label,size,uuid
NAME   MOUNTPOINT LABEL     SIZE UUID
sda                       149.1G
+-sda1 /                  148.1G 4249eea2-2808-4beb-b608-a8ef8e6c8f33
+-sda2                        1K
+-sda5 [SWAP]               976M bb13af1a-3e6f-4301-8f3d-16b835d21e10
sdc                        14.9G
+-sdc1            LABEL 1  14.9G 1ADB-34E9
sr0                        1024M

We can also change the label after the filesystem has been created.

root@ajkumar08-PC:/home/ajay# /sbin/fatlabel /dev/sdc1 "LABEL 2"

We used the command "fatlabel" because we have a "vfat" filesystem. For the "ext" family systems we can use the commands "e2label" and "tune2fs" .

tune2fs -L volume_name device

We notice in the output of "lsblk" that a UUID is generated for a file system. This is a unique identifier that we can also use to identify the file system. It is not as easy to remember as the label but it is unique. We can also change it with the "tune2fs" program.

tune2fs -U random /dev/sdc1
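On many Linux systems the labels and UUIDs also appear as symbolic links under "/dev/disk/by-label" and "/dev/disk/by-uuid", each pointing at the current device file. The sketch below lists them; it assumes those directories exist on the system (they are maintained by udev), and the function name "list_device_links" is hypothetical.

```python
import os


def list_device_links(directory):
    """Map each name in directory (a label or UUID) to the device it points to."""
    if not os.path.isdir(directory):
        return {}                      # directory absent on this system
    links = {}
    for name in sorted(os.listdir(directory)):
        links[name] = os.path.realpath(os.path.join(directory, name))
    return links


for kind in ("by-label", "by-uuid"):
    print(kind, list_device_links(os.path.join("/dev/disk", kind)))
```

This shows why a label or UUID is a safer handle than the device name: the link is resolved again each time, so it follows the file system even if the device file name changes.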

Superblock

A superblock contains information related to a filesystem such as its parameters. This can be information such as size, number of inodes, block size and so on. There can be multiple copies of the superblock for redundancy.

Filesystem Creation

We used "mkfs" in the formatting section in Part 1 of filesystems, where we ran the command "mkfs.vfat". We can also execute the "mkfs" command without the ".vfat" extension and instead use the "-t" option with the filesystem type after it.

root@ajkumar08-PC:/dev# /sbin/mkfs -t vfat /dev/sdc1
mkfs.fat 4.2 (2021-01-31)
root@ajkumar08-PC:/dev#

We can find out the options for the "mkfs" command by doing a man on "mkfs.fat" as an example. The following are options for the "ext2", "ext3" and "ext4" systems. The command "mke2fs" can also be used for the "ext" file systems with the "-t" option specifying the file system type.


-c  check for bad blocks before creating the filesystem.
A single -c performs a read-only test on each block of the partition.
Two -c options perform a destructive read/write test on each block.
You may want to use this option for a brand new or very old disk.
You may also want to use it as an added safeguard if the filesystem
will contain data that is critical and is changed so often it is
difficult to keep backed-up. Using this option significantly slows
filesystem creation. -c uses the badblocks(8) program to generate a list
of bad blocks. (I have never used this option.)

-i N  create one inode per N bytes of data. Currently the default
here is 4kB (4096). You can also specify this in a controlled
fashion using -T

-b blocksize  currently only 1024, 2048 and 4096 are allowed.

-L label  set the filesystem label to label. It is your responsibility
to ensure the label is unique! If you assign a duplicate label,
the filesystem will not be mounted or may be mounted incorrectly.
This is very important to realize if you have a dual-boot system
with a second version of linux on it. During installation, linux
will 'suggest' labels for each filesystem you create. The label
will be the same as the mount point. Thus, when you install the
second version of linux, make sure it does not try to assign a
label that is in use by the other existing version! This problem
can be avoided using UUIDs instead of labels.

-U uuid  set the UUID to uuid. Since mke2fs automatically generates
a UUID for the filesystem, this is not really necessary. However,
in an automated application that wants to create the filesystem
then mount it using its UUID, this could be useful. UUIDs can be
created using uuidgen(1)

-T use  change the heuristic for the proportion of inodes
appropriately for the expected use of the filesystem. There
are many supported values for use. The specifics are in
/etc/mke2fs.conf (see mke2fs.conf(5)). The most obvious ones
are small (the default - one inode per small file), largefile
(one inode per big file (currently 1MB)), largefile4
(one inode per very big file (currently 4MB))

-n  do not actually create the filesystem, just show the
parameters you would use. This can be used on an existing
filesystem to show the blocksize and the blocks that contain
the spare superblocks. In order for this to output the
correct parameters you must give the same options to
mke2fs as were used when the filesystem was created.
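The options above can be combined on one command line. The following sketch only builds and prints such a command and does not run it; the helper "mke2fs_dry_run" and its parameter names are invented for this example, and it always includes "-n" so that the printed command would not create a filesystem even if it were executed.

```python
import shlex


def mke2fs_dry_run(device, fs_type="ext4", block_size=None, label=None, usage=None):
    """Build an mke2fs command line that only reports what would be done (-n)."""
    argv = ["mke2fs", "-n", "-t", fs_type]
    if block_size is not None:
        argv += ["-b", str(block_size)]   # filesystem block size in bytes
    if label is not None:
        argv += ["-L", label]             # filesystem label
    if usage is not None:
        argv += ["-T", usage]             # usage type, e.g. largefile4
    argv.append(device)
    return argv


cmd = mke2fs_dry_run("/dev/sdc1", label="FLASH_1", usage="largefile4")
print(shlex.join(cmd))  # mke2fs -n -t ext4 -L FLASH_1 -T largefile4 /dev/sdc1
```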

We need to have a partition before we can place a file system on the device. Placing a file system will "destroy" the existing data. As a user we can no longer see the previous data but it may still be possible to retrieve it. Once we have created the file system we can identify it by the following methods:

1) The device name in the "/dev" folder. Remember this can change at
the next reboot if the devices are reconfigured.
2) By the assigned label. This can later be changed (e2label or tune2fs).
3) By the UUID assigned automatically at filesystem creation.

We can use the "df" command to list the mounted filesystems.


root@ajkumar08-PC:/home/ajay# df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev              946364       0    946364   0% /dev
tmpfs             192844    1424    191420   1% /run
/dev/sda1      151741000 5542108 138418080   4% /
tmpfs             964212       0    964212   0% /dev/shm
tmpfs               5120       4      5116   1% /run/lock
tmpfs             192840     940    191900   1% /run/user/1000
/dev/sdc1       15613024       8  15613016   1% /home/ajay/flash

We had placed the filesystem "vfat" on the flash drive. Now let's place the "ext4" system on it. We need to first unmount the device.

root@ajkumar08-PC:/home/ajay# umount /dev/sdc1
root@ajkumar08-PC:/home/ajay# mkfs.ext4 -T  largefile4 -L FLASH_1 /dev/sdc1
mke2fs 1.46.2 (28-Feb-2021)
/dev/sdc1 contains a vfat file system
Proceed anyway? (y,N) y
Creating filesystem with 3907072 4k blocks and 3840 inodes
Filesystem UUID: 6f5ed5bf-b397-47fe-951d-7a931b475a0b
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

root@ajkumar08-PC:/home/ajay#

root@ajkumar08-PC:/home/ajay# e2label /dev/sdc1
FLASH_1

We specified the "-T" option to indicate that our filesystem is going to store large files and thus does not need too many inodes. The settings for "largefile4" are in "/etc/mke2fs.conf":

root@ajkumar08-PC:/home/ajay# cat /etc/mke2fs.conf

...
        largefile4 = {
                inode_ratio = 4194304
                blocksize = -1
        }
...


root@ajkumar08-PC:/home/ajay# tune2fs -l /dev/sdc1 | grep -i 'block size'
Block size:               4096

From the above we can deduce that the block size is the default size. The ratio indicates one inode for every 4 MB. We also label the device at the same time with the "-L" option. The label can be shown using the "e2label" command without specifying any options. We can also use the "tune2fs" utility to check the block size. The default block size for ext4 is 4 KB. How many inodes got created?


root@ajkumar08-PC:/home/ajay# df -i
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
udev            207866    427  207439    1% /dev
tmpfs           216790    772  216018    1% /run
/dev/sda1      9707520 223326 9484194    3% /
tmpfs           216790      1  216789    1% /dev/shm
tmpfs           216790      3  216787    1% /run/lock
tmpfs            48210    150   48060    1% /run/user/1000
/dev/sdc1         3840     11    3829    1% /home/ajay/flash

From the inode_ratio entry roughly 4 MB is allocated for each inode and there are 3840 inodes, so we get roughly 15 GB, which matches the 14.9G size that "lsblk" reported for the flash drive. We know that some space will be reserved for file system structures such as inodes.
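The estimate can be reproduced from the numbers in the "mkfs.ext4" output. This is a rough sketch: the function "expected_inodes" is made up for this example and simply divides the file system size by the bytes-per-inode ratio, which is not exactly how "mke2fs" arrives at its figure.

```python
def expected_inodes(block_count, block_size, inode_ratio):
    """Estimate the inode count as file system bytes divided by bytes per inode."""
    return (block_count * block_size) // inode_ratio


# Values taken from the mkfs.ext4 output and the largefile4 entry above.
estimate = expected_inodes(3907072, 4096, 4194304)
print(estimate)                  # 3815
print(3840 * 4194304 / 2**30)    # 15.0 (GiB covered by 3840 inodes)
```

The simple division gives 3815, slightly below the 3840 that was actually created; presumably "mke2fs" rounds the count up when it spreads the inodes across its block groups.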

root@ajkumar08-PC:/home/ajay# cd flash
root@ajkumar08-PC:/home/ajay/flash# ls
lost+found

We notice that there is a folder called "lost+found" on the empty file system. During file system checking any lost data will be placed in this folder. In certain scenarios it's possible that the inode and data for a file exist but no directory entry references them. The "fsck" utility will then create an entry for this file in the "lost+found" folder.

Resizing a filesystem

It's possible to resize a filesystem after creation but it is not recommended. We shall discuss briefly how this can be done. We can use the tool "resize2fs"; however, this tool only works with the "ext" filesystems. If we want to expand the filesystem we need to expand the partition first, because the tool does not manipulate partitions. We need to make sure that there is empty space after the partition, and we cannot change the starting location of the partition. We can use the fdisk utility to do this. When shrinking the filesystem we need to use "resize2fs" first and then resize the partition. The size of the partition cannot be smaller than the size of the file system. If we are not sure about the filesystem sizes at the time of creation then it might be better to use the logical volume manager. We will study that in another section.