File Systems: Part 2
This chapter discusses file systems in Linux.
Filesystem Types
We have seen examples of filesystems such as "NTFS" on Windows and "FAT32" on flash drives. Linux has many different filesystems, such as ext, ext2, ext3, ext4 and XFS. Why do we have so many filesystems? A filesystem is a piece of software that belongs to the operating system. As technology advances, new filesystems can incorporate new features. As an example, we have journaling filesystems that keep track of changes and filesystems that can encrypt files.

A Linux filesystem has 3 major components:
1) Superblock. This contains basic information about the filesystem, such as its size and status. It is essentially the metadata about the filesystem.
2) Inode. This is a structure that contains information about a data object such as a file or a folder. The information includes the permissions, the owner of the file, the date the file was created and the addresses of the data blocks where the contents are stored.
3) Data blocks. These are the actual blocks on the hard drive that contain the data. If we have a file "1.txt", the data blocks will contain the text of the file.

Unix file systems can vary in different ways:
1) Where the superblock is located.
2) Location of the inodes.
3) Block size. This is the file system block size as opposed to the hard drive block size.
4) How free data space is tracked.
5) Organization of the parts of the file system.
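To see some of the inode fields listed above for a real file, we can ask the operating system for them. The following is a minimal Python sketch built on the standard os.stat call. The helper name describe_inode and the scratch file are made up for this illustration, and the timestamp shown is the last modification time, since that is what os.stat reliably reports on Linux.

```python
import os
import stat
import tempfile
import time


def describe_inode(path):
    """Return a few of the inode fields that the kernel reports for `path`."""
    info = os.stat(path)
    return {
        "inode_number": info.st_ino,
        "permissions": stat.filemode(info.st_mode),  # for example -rw-r--r--
        "owner_uid": info.st_uid,
        "size_bytes": info.st_size,
        "modified": time.ctime(info.st_mtime),
        # st_blocks counts 512-byte units that are actually allocated on disk
        "allocated_bytes": info.st_blocks * 512,
    }


if __name__ == "__main__":
    # Create a small scratch file so the example does not depend on an existing path.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("some text\n")
        scratch = handle.name
    try:
        for field, value in describe_inode(scratch).items():
            print(f"{field:16} {value}")
    finally:
        os.remove(scratch)
```

Note that the file name is not among these fields; the name lives in the directory that refers to the inode.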
Journaled Filesystems
If we create a new file with some data, the steps on a general filesystem might be:
1) The data blocks on the hard drive are filled with the data.
2) An inode is allocated and filled with metadata about the file.
3) The file's directory is updated to show the additional entry for the file.

Now assume the system goes down (a power failure, for example) between steps 1) and 2). The data might exist on the hard drive, but nothing references it because there is no inode, so we have essentially lost the data. If the system goes down between steps 2) and 3), we cannot "see" the file because the folder's information has not been updated. When the system comes back up, a consistency check (what the "fsck" utility does) is performed, and this can take some time.

In a journaled filesystem the steps are first recorded somewhere, say in a journal entry. When the steps complete, the entry can be removed. If the system goes down, we can replay the entries that are still in the journal.
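The idea of recording the steps and replaying them after a crash can be sketched with a toy model. The Python class below is only an illustration: the names ToyFS, create_file and recover are invented here, everything lives in memory, and it is not how ext3 or any other real journal is implemented. It assumes that each step is a plain overwrite, so replaying a step that already happened does no harm.

```python
class ToyFS:
    """In-memory stand-in for a file system with a very simple journal."""

    def __init__(self):
        self.data_blocks = {}  # block number -> contents
        self.inodes = {}       # inode number -> list of block numbers
        self.directory = {}    # file name -> inode number
        self.journal = []      # entries that have not finished yet

    def _apply(self, step):
        table, key, value = step
        getattr(self, table)[key] = value  # plain overwrite, safe to repeat

    def create_file(self, name, inode_no, block_no, payload, crash_after=None):
        steps = [
            ("data_blocks", block_no, payload),  # step 1: write the data
            ("inodes", inode_no, [block_no]),    # step 2: fill in the inode
            ("directory", name, inode_no),       # step 3: add the directory entry
        ]
        self.journal.append(steps)               # record the intent first
        for done, step in enumerate(steps, start=1):
            self._apply(step)
            if crash_after == done:
                return False                     # simulated power failure
        self.journal.remove(steps)               # finished: drop the entry
        return True

    def recover(self):
        """Replay every entry left in the journal, then empty the journal."""
        for steps in self.journal:
            for step in steps:
                self._apply(step)
        self.journal.clear()


if __name__ == "__main__":
    fs = ToyFS()
    fs.create_file("1.txt", inode_no=12, block_no=100, payload=b"hello", crash_after=1)
    print("visible before recovery:", "1.txt" in fs.directory)
    fs.recover()
    print("visible after recovery: ", "1.txt" in fs.directory)
```

Without the journal, the crash after step 1 would leave block 100 filled but unreachable. With it, recovery only has to look at the pending entries instead of checking the whole file system.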
ext2
The file system "ext2" (second extended file system) is a continuation of "ext". It was released in 1993. It was the default file system in many Linux distributions. It is not a journaling file system, which makes it useful for flash drives and SD cards. The increase in performance comes because nothing has to be written to a journal. For a block size of 1 KB we have the following limits:
    Block Size: 1 KB
    Max File Size: 16 GB
    Max Filesystem Size: 4 TB
ext3
The file system "ext2" was supplanted by "ext3", which came out in 2001. It adds journaling. Limits:
    Block Size: 1 KB
    Max File Size: 16 GB
    Max Filesystem Size: 2 TB
ext4
The file system "ext3" was supplanted by "ext4", which came out in 2008. It has a journal-less mode. It can use extents to store the location of data blocks. An extent describes a contiguous run of blocks by its starting block address and the number of blocks in the run. Limits:
    Max File Size: 16 TB
    Max Filesystem Size: 1 EB
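To see why extents save space compared with listing every block address, we can collapse a list of block numbers into runs. This is a small Python sketch, assuming an extent is stored as a (start, count) pair; the function name blocks_to_extents and the sample block numbers are made up for the illustration.

```python
def blocks_to_extents(block_numbers):
    """Collapse an ordered list of block numbers into (start, count) extents."""
    extents = []
    for block in block_numbers:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the current run, so extend that run.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # There is a gap, so a new extent starts here.
            extents.append((block, 1))
    return extents


if __name__ == "__main__":
    blocks = [100, 101, 102, 103, 200, 201, 500]
    print(blocks_to_extents(blocks))  # [(100, 4), (200, 2), (500, 1)]
```

Seven block addresses shrink to three extents here, and a large file stored in one contiguous run needs only a single extent however long it is.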
XFS
It was created in 1993 by Silicon Graphics. It can perform I/O operations in parallel. XFS supports journaling. It uses B+ trees for space allocation and directories.
ReiserFS
It is a journaling file system created in 2001. It stores metadata such as directory entries and inode block lists in a single B+ tree data structure.
Filesystem Parameters
When we create a filesystem, there are some parameters that can be specified and that need to be considered:
1) Partition size.
2) Filesystem block size. This is the smallest unit that the filesystem will read and write.
3) Number of inodes.
4) A label, which can be attached to a volume at the time of filesystem creation.

The following is relevant to the ext2/ext3 filesystems.
Block Size
The block size is important in terms of wasted space and the amount of fragmentation. It is usually 1 KB, 2 KB or 4 KB. Let's consider a block size of 1 KB and a block address of 4 bytes.
The inode contains the block addresses for the file's data. There is space reserved for 15 block addresses. The first 12 entries each point directly to a data block, which accounts for 12 KB. The 13th address is an indirect address: it points to a block that itself contains addresses. Since a block is 1 KB and each address is 4 bytes, that block holds 256 addresses.
Single indirect block.

    13th Address --> Block of addresses
                       1st Address   --> Block of 1 KB
                       .
                       .
                       256th Address --> Block of 1 KB

There are 256 addresses, each pointing to a block of 1 KB, so the single indirect block covers 256 KB.
Doubly indirect block.

    14th Address --> Block of addresses
                       1st Address   --> Block of 256 addresses --> each points to a block of 1 KB
                       .
                       .
                       256th Address --> Block of 256 addresses --> each points to a block of 1 KB

There are 256 addresses, each pointing to a block of addresses that covers 256 KB, so the total space that can be accessed is 256 * 256 KB = 65,536 KB = 64 MB.
Triple indirect block.

    15th Address --> Block of addresses
                       1st Address   --> Block of 256 addresses
                                           1st Address   --> Block of 256 addresses --> each points to a block of 1 KB
                                           2nd Address
                                           .
                                           256th Address
                       .
                       256th Address --> Block of 256 addresses

There are 256 addresses, each pointing to a block of 256 addresses, each of which points to another block of 256 addresses, each of which finally points to a data block. So the total possible size is 256 * 256 * 256 * 1 KB = 16 GB.

Exercise: calculate the maximum file size when the block size is 2 KB with an address of 32 bits.

For a block size of 1 KB the maximum file size is therefore about 16 GB. For a block size of 4 KB and an address size of 4 bytes we can have a maximum file size of about 4 TB. Note that the address size also poses a limit of its own. If, say, the address were 2 bits instead of 4 bytes, we could only have 4 possible block addresses. For a block size of 4 KB this gives us 16 KB as the maximum file size. The levels of indirection also affect performance. With 4 KB blocks and one level of indirect addressing we can reach a file size of 4 MB, so files smaller than 4 MB do not need more levels. When we format a partition we can specify the block size for the file system at that point. We can also find the block size of a file system as follows:
    [amittal@hills cs260a]$ date > 1.txt
    [amittal@hills cs260a]$ ls -l 1.txt
    -rw-r--r-- 1 amittal csdept 29 Jan 31 09:52 1.txt
    [amittal@hills cs260a]$ du 1.txt
    4       1.txt
    [amittal@hills cs260a]$

We create a new file and see that it has 29 bytes. We then run "du" on the file. If none of the following environment variables is set, the unit is 1 KB:
    DU_BLOCK_SIZE
    BLOCK_SIZE
    BLOCKSIZE
From the "du" output we can conclude that the block size is 4 KB. Remember that we cannot just allocate 29 bytes on the hard drive. There is the hard drive block size and then the file system block size. In this case at least 4 KB must be allocated, of which only 29 bytes are used. If we have permissions then we can also use the "tune2fs" command.

    root@ajkumar08-PC:/home/ajay/temp# /sbin/tune2fs -l /dev/sda1 | grep "Block"
    Block count:              38822144
    Block size:               4096
    Blocks per group:         32768

Block sizes are not very important in ext4 and xfs, as these file systems use extents. An extent describes a contiguous set of blocks: it stores the starting block address and the number of blocks it uses.
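The direct and indirect block arithmetic worked through earlier in this section can be checked with a short script. This is a sketch in Python that assumes the 15-slot layout described above (12 direct slots plus one single, one double and one triple indirect slot). The name max_file_size is made up for the illustration, and real file systems apply further limits of their own, so published maximums can be lower than what this formula gives.

```python
DIRECT_SLOTS = 12  # the first 12 of the 15 inode slots point straight at data blocks


def max_file_size(block_size, address_size=4):
    """Bytes reachable through the direct, single, double and triple indirect slots."""
    per_block = block_size // address_size  # addresses that fit in one block
    reachable_blocks = (
        DIRECT_SLOTS
        + per_block        # single indirect
        + per_block ** 2   # double indirect
        + per_block ** 3   # triple indirect
    )
    return reachable_blocks * block_size


if __name__ == "__main__":
    for size in (1024, 4096):
        total = max_file_size(size)
        print(f"block size {size}: {total} bytes, about {total / 2**30:.2f} GB")
```

The results come out slightly above 16 GB and 4 TB, because the direct blocks and the lower indirect levels add a little to the triple indirect total that dominates the sum.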
Inodes
An inode stores information about a file. The number of inodes is fixed when the file system is created. If we run out of inodes then we cannot create any more files. There is a default algorithm that creates an inode for every few KB of data. The inodes will usually occupy about 1/16 of the space used by the file system. We can specify the ratio of inodes to data bytes when the file system is created. We can see the status of the inodes using the command "df -i". On the hills server we have:

    [amittal@hills cs260a]$ df -i
    Filesystem                   Inodes   IUsed     IFree IUse% Mounted on
    devtmpfs                    1502724     446   1502278    1% /dev
    tmpfs                       1507067       1   1507066    1% /dev/shm
    tmpfs                       1507067     897   1506170    1% /run
    tmpfs                       1507067      17   1507050    1% /sys/fs/cgroup
    /dev/mapper/vg00-root      26214400 2847914  23366486   11% /
    /dev/mapper/vg00-tmp        2621440      24   2621416    1% /tmp
    /dev/mapper/vg00-var       13107200   14080  13093120    1% /var
    /dev/mapper/vg01-users     26214400  524420  25689980    3% /users
    /dev/mapper/vg02-students 262141952  841170 261300782    1% /students
    /dev/sda1                   1048576     315   1048261    1% /boot
    /dev/mapper/vg00-logs       2621440       4   2621436    1% /logs
    tmpfs                       1507067       5   1507062    1% /run/user/2216
    tmpfs                       1507067       5   1507062    1% /run/user/2548
    tmpfs                       1507067       5   1507062    1% /run/user/14371
    tmpfs                       1507067       5   1507062    1% /run/user/1398
    tmpfs                       1507067       5   1507062    1% /run/user/12872
    tmpfs                       1507067       5   1507062    1% /run/user/10957
    tmpfs                       1507067       5   1507062    1% /run/user/6658
    tmpfs                       1507067       5   1507062    1% /run/user/15534

We have a lot more inodes on the "/students" file system, as it holds all the folders and files used by the students with a hills account.
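The same counters that "df -i" prints are available to programs. The following Python sketch uses the standard os.statvfs call; the function name filesystem_summary and the choice of "/" as the default path are assumptions made for this example. Some file systems report zero for the inode totals, so the percentage is only computed when a total is present.

```python
import os


def filesystem_summary(mount_point="/"):
    """Collect the block size and inode counters for the file system holding mount_point."""
    vfs = os.statvfs(mount_point)
    return {
        "block_size": vfs.f_bsize,
        "inodes_total": vfs.f_files,
        "inodes_used": vfs.f_files - vfs.f_ffree,
        "inodes_free": vfs.f_ffree,
    }


if __name__ == "__main__":
    summary = filesystem_summary("/")
    for field, value in summary.items():
        print(f"{field:13} {value}")
    if summary["inodes_total"]:
        share = 100 * summary["inodes_used"] / summary["inodes_total"]
        print(f"inodes in use: {share:.0f}%")
```

A monitoring script could call this for each mounted file system and warn when the free inode count gets low, since a file system can run out of inodes while it still has free data blocks.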
Filesystem labels
On Windows a letter is assigned to each partition that has a filesystem. However, if we add or remove a device, that letter can change. As an example, I plug in a flash drive. The flash drive has the letter "D:" assigned to it. I am also using the fdisk utility to assign a label of "LABEL 1" to it. A label is something we can assign to a partition, but the partition must have a file system on it. Next I take out this flash drive, plug in a second flash drive, and then plug the first flash drive back into the computer.
We can see that the second flash drive got assigned "D:" and the first flash drive got assigned "F:". However, the label is still "LABEL 1". In Linux the analogue of the drive letter is the device file under "/dev". We do not want to rely on the device file name, as it can change when the devices are reconfigured, just as we saw for the Windows case above. Instead we can assign a label to the partition and then do our manipulation (such as mounting) with the label. The label is guaranteed to stay associated with the partition. We can list the labels on the hills server as follows:
    [amittal@hills dev]$ lsblk -o name,mountpoint,label,size,uuid
    NAME              MOUNTPOINT LABEL  SIZE UUID
    sda                                 150G
    +-sda1            /boot      boot     2G 74500b92-6378-48c7-9d04-89e72408209f
    +-sda2                              148G ZXM8wl-FrKO-HiQw-EIH1-4ZeI-nNPY-v7wLnS
      +-vg00-root     /          root    50G 47467a2e-9fce-4427-b1cf-d57e7949c2c3
      +-vg00-swap     [SWAP]     swap     8G 2f0e837f-fc50-4f4e-9bd6-611bf940c870
      +-vg00-var      /var       var     25G 31fa1735-fe4d-47d3-83b7-442342be1f60
      +-vg00-tmp      /tmp       tmp      5G 8cee070a-905d-4927-974e-04440b1c1005
      +-vg00-logs     /logs               5G 5abdb06e-9296-4732-9c17-7242b0dc2b49
    sdb                                 100G aBJtwu-zl4B-FoSj-eHY7-L5Kk-bseU-6hlFjF
    +-vg01-users      /users             50G c22308e8-2ac2-4507-b161-201acdf2ce61
    sdc                                 500G 1EUIFH-RpAv-VC3z-jYp6-NuMG-cdtF-sAbA2r
    +-vg02-students   /students         500G 1cead7ec-3c37-483c-8ed4-4e7fb6b4d1ac
    sdd                                 300G
    +-sdd1                              300G Hp0CHq-TDGo-zYHd-mUQw-HreX-nvDH-3idC7h
      +-vg03-restore                    300G 9b35c4fe-2f9d-4d1a-838f-f8c6d1584f79
    sr0                                1024M

On Linux we can assign a label at the time of creating a filesystem. For the flash drive:
    root@ajkumar08-PC:/home/ajay# /sbin/mkfs.vfat -n "LABEL 1" /dev/sdc1
    mkfs.fat 4.2 (2021-01-31)

Check that the label got created properly.

    root@ajkumar08-PC:/home/ajay# lsblk -o name,mountpoint,label,size,uuid
    NAME   MOUNTPOINT LABEL     SIZE UUID
    sda                       149.1G
    +-sda1 /                  148.1G 4249eea2-2808-4beb-b608-a8ef8e6c8f33
    +-sda2                        1K
    +-sda5 [SWAP]               976M bb13af1a-3e6f-4301-8f3d-16b835d21e10
    sdc                        14.9G
    +-sdc1            LABEL 1  14.9G 1ADB-34E9
    sr0                        1024M

We can also change the label after the filesystem has been created.
    root@ajkumar08-PC:/home/ajay# /sbin/fatlabel /dev/sdc1 "LABEL 2"

We used the command "fatlabel" because we have a "vfat" filesystem. For the "ext" family of file systems we can use the commands "e2label" and "tune2fs".

    tune2fs -L volume_name device

We notice in the output of "lsblk" that a UUID is generated for each file system. This is a unique identifier that we can also use to identify the file system. It is not as easy to remember as a label, but it is unique. We can also change it with the "tune2fs" program.

    tune2fs -U random /dev/sdc1
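On systems that run udev, the labels and UUIDs are also exposed as symbolic links under "/dev/disk/by-label" and "/dev/disk/by-uuid", each pointing at the current device file. The following Python sketch reads those links. The function name list_identifiers is made up for this example, and the directories may be missing on some machines (inside a container, for instance), in which case an empty result is returned.

```python
import os


def list_identifiers(kind="label"):
    """Map each label (or UUID) to the device file it currently points at."""
    directory = f"/dev/disk/by-{kind}"
    if not os.path.isdir(directory):
        return {}  # no links available on this machine
    return {
        name: os.path.realpath(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
    }


if __name__ == "__main__":
    for kind in ("label", "uuid"):
        print(f"by-{kind}:")
        for name, device in list_identifiers(kind).items():
            print(f"  {name} -> {device}")
```

This mapping is what makes labels and UUIDs stable handles: the link name stays the same even when the device file it points to changes after the devices are reconfigured.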
Superblock
A superblock contains information about a filesystem, such as its parameters: size, number of inodes, block size and so on. There can be multiple copies of the superblock for redundancy.
Filesystem Creation
We used "mkfs" in the formatting section in Part 1 of filesystems, in the form of the command "mkfs.vfat". We can also run the "mkfs" command without the ".vfat" extension and instead use the "-t" option followed by the filesystem type.

    root@ajkumar08-PC:/dev# /sbin/mkfs -t vfat /dev/sdc1
    mkfs.fat 4.2 (2021-01-31)
    root@ajkumar08-PC:/dev#

We can find the options for a "mkfs" command by reading its man page, "man mkfs.fat" for example. The following are options for the "ext2", "ext3" and "ext4" systems. The command "mke2fs" can also be used for the "ext" file systems, with the "-t" option specifying the file system type.
-c
    Check for bad blocks. A single -c performs a read-only test on each block of the partition. Two -c options perform a destructive read/write test on each block. You may want to use this option for a brand new or very old disk. You may also want to use it as an added safeguard if the filesystem will contain data that is critical and changes so often that it is difficult to keep backed up. Using this option significantly slows filesystem creation. -c uses the badblocks(8) program to generate a list of bad blocks. (I have never used this option.)
-i N
    Create one inode per N bytes of data. The default is taken from /etc/mke2fs.conf (16384 bytes in recent e2fsprogs releases). You can also specify this in a controlled fashion using -T.
-b blocksize
    Currently only 1024, 2048 and 4096 are allowed.
-L label
    Set the filesystem label to label. It is your responsibility to ensure the label is unique! If you assign a duplicate label, the filesystem will not be mounted or may be mounted incorrectly. This is very important to realize if you have a dual-boot system with a second version of Linux on it. During installation, Linux will 'suggest' labels for each filesystem you create. The label will be the same as the mount point. Thus, when you install the second version of Linux, make sure it does not try to assign a label that is in use by the other existing version! This problem can be avoided by using UUIDs instead of labels.
-U uuid
    Set the UUID to uuid. Since mke2fs automatically generates a UUID for the filesystem, this is not really necessary. However, in an automated application that wants to create the filesystem and then mount it using its UUID, this could be useful. UUIDs can be created using uuidgen(1).
-T use
    Change the heuristic for the proportion of inodes appropriately for the expected use of the filesystem. There are many supported values for use. The specifics are in /etc/mke2fs.conf (see mke2fs.conf(5)). The most obvious ones are small (one inode per small file), largefile (one inode per big file, currently 1 MB) and largefile4 (one inode per very big file, currently 4 MB).
-n
    Do not actually create the filesystem, just show the parameters that would be used. This can be used on an existing filesystem to show the block size and the blocks that contain the spare superblocks. For this to output the correct parameters you must give mke2fs the same options that were used when the filesystem was created.

We need to have a partition before we can place a file system on the device. Placing a file system will "destroy" the existing data. As a user we cannot see the previous data, but it may be possible to retrieve it. Once we have created the file system we can identify it by the following methods:
1) The device name in the "/dev" folder. Remember this can change at the next reboot if the devices are reconfigured.
2) By the assigned label. This can later be changed (with e2label or tune2fs).
3) By the UUID assigned automatically at filesystem creation.

We can use the "df" command to list the mounted filesystems.
    root@ajkumar08-PC:/home/ajay# df
    Filesystem     1K-blocks    Used Available Use% Mounted on
    udev              946364       0    946364   0% /dev
    tmpfs             192844    1424    191420   1% /run
    /dev/sda1      151741000 5542108 138418080   4% /
    tmpfs             964212       0    964212   0% /dev/shm
    tmpfs               5120       4      5116   1% /run/lock
    tmpfs             192840     940    191900   1% /run/user/1000
    /dev/sdc1       15613024       8  15613016   1% /home/ajay/flash

We had placed the "vfat" filesystem on the flash drive. Now let's place the "ext4" filesystem on it. We need to unmount the device first.
    root@ajkumar08-PC:/home/ajay# umount /dev/sdc1
    root@ajkumar08-PC:/home/ajay# mkfs.ext4 -T largefile4 -L FLASH_1 /dev/sdc1
    mke2fs 1.46.2 (28-Feb-2021)
    /dev/sdc1 contains a vfat file system
    Proceed anyway? (y,N) y
    Creating filesystem with 3907072 4k blocks and 3840 inodes
    Filesystem UUID: 6f5ed5bf-b397-47fe-951d-7a931b475a0b
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

    Allocating group tables: done
    Writing inode tables: done
    Creating journal (16384 blocks): done
    Writing superblocks and filesystem accounting information: done

    root@ajkumar08-PC:/home/ajay#
    root@ajkumar08-PC:/home/ajay# e2label /dev/sdc1
    FLASH_1

We specified the "-T" option to indicate that our filesystem is going to store large files and thus does not need too many inodes. The settings for "largefile4" are in:
    root@ajkumar08-PC:/home/ajay# cat /etc/mke2fs.conf
    ...
    largefile4 = {
        inode_ratio = 4194304
        blocksize = -1
    }
    ...
    root@ajkumar08-PC:/home/ajay# tune2fs -l /dev/sdc1 | grep -i 'block size'
    Block size:               4096

From the above we can deduce that the block size is the default size. The ratio indicates one inode for every 4 MB. We also label the device at the same time with the "-L" option. The label can be shown using the "e2label" command without specifying a new label. We can also use the "tune2fs" utility to check the block size. The default block size for ext4 is 4 KB. How many inodes got created?
    root@ajkumar08-PC:/home/ajay# df -i
    Filesystem      Inodes  IUsed   IFree IUse% Mounted on
    udev            207866    427  207439    1% /dev
    tmpfs           216790    772  216018    1% /run
    /dev/sda1      9707520 223326 9484194    3% /
    tmpfs           216790      1  216789    1% /dev/shm
    tmpfs           216790      3  216787    1% /run/lock
    tmpfs            48210    150   48060    1% /run/user/1000
    /dev/sdc1         3840     11    3829    1% /home/ajay/flash

From the inode_ratio entry, roughly 4 MB is allocated for each inode, and there are 3840 inodes, so we get roughly 15 GB; the flash drive is nominally 16 GB. We know that some space is reserved for file system structures such as the inodes.
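The figure of 3840 inodes can be reproduced from the numbers printed by mke2fs: 3907072 blocks of 4 KB and an inode_ratio of 4194304. The Python sketch below divides the size by the ratio and then rounds per block group. The function name estimate_inodes is made up, the 32768 blocks per group is the value that "tune2fs -l" showed earlier for a 4 KB block size, and the rounding rule (each group's share is rounded up so that its inode table fills whole blocks, with 16 inodes per table block) is an assumption that happens to match this output rather than a statement of how mke2fs works internally.

```python
import math


def estimate_inodes(block_count, block_size, inode_ratio,
                    blocks_per_group=32768, inodes_per_table_block=16):
    """Estimate the inode count for a file system with the given geometry."""
    groups = math.ceil(block_count / blocks_per_group)
    wanted = block_count * block_size / inode_ratio  # one inode per inode_ratio bytes
    per_group = math.ceil(wanted / groups)
    # Assumption: round each group's share up to a whole number of inode table blocks.
    per_group = math.ceil(per_group / inodes_per_table_block) * inodes_per_table_block
    return groups * per_group


if __name__ == "__main__":
    print(estimate_inodes(3907072, 4096, 4194304))  # 3840
```

The plain division gives about 3815 inodes; spreading them evenly over 120 block groups and rounding up to 32 per group yields the 3840 that "df -i" reports.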
    root@ajkumar08-PC:/home/ajay# cd flash
    root@ajkumar08-PC:/home/ajay/flash# ls
    lost+found

We notice that there is a folder called "lost+found" on the empty file system. During a file system check, any lost data is placed in this folder. In certain scenarios it is possible that the data for a file still exists but no directory entry references it. The file system check will create a new file name for this data and place it in the "lost+found" folder. The utility "fsck" performs these tasks.
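A file system check can also fall back on a backup copy of the superblock if the primary one is damaged. The backup locations that mke2fs printed earlier ("Superblock backups stored on blocks: ...") follow a pattern: with the sparse layout that the ext family normally uses, backups are kept in block group 1 and in the groups whose numbers are powers of 3, 5 and 7. The Python sketch below reproduces the printed list under those assumptions; the function name is made up, and it assumes 32768 blocks per group with the first group starting at block 0, which holds for a 4 KB block size.

```python
def backup_superblock_blocks(block_count, blocks_per_group=32768):
    """List the first block of every group that holds a backup superblock."""
    groups = -(-block_count // blocks_per_group)  # ceiling division
    backup_groups = set()
    if groups > 1:
        backup_groups.add(1)
    for base in (3, 5, 7):
        power = base
        while power < groups:
            backup_groups.add(power)
            power *= base
    return [group * blocks_per_group for group in sorted(backup_groups)]


if __name__ == "__main__":
    print(backup_superblock_blocks(3907072))
```

For the flash drive with 3907072 blocks this gives the same nine block numbers that mke2fs reported, from 32768 up to 2654208.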