What Is an Inode?
Understanding inodes is key to a better understanding of HPC filesystems.
If you are reading or learning about high-performance computing (HPC), where storage is a very important consideration, a basic understanding of inodes is important. In this article, I give a high-level definition of an inode along with some additional details.
For context, I’ll start with the recent evolution of filesystems. Over time, more features have been added to filesystems, creating a “spectrum” that runs from the simple to the really sophisticated. Moreover, some filesystems now target specific usage models, so they are less general-purpose and might not be POSIX compliant. In this article, I stay with the classic filesystems that use inodes, often referred to as block-oriented filesystems, which excludes object stores and pure key-value stores.
Filesystems
The classic POSIX or POSIX-like block-oriented filesystems, in general, have two parts: the data to be stored and the metadata (i.e., the “data” about the data). Everyone knows the first kind of data stored in filesystems. For some, that is a very large collection of cat photos or KC and the Sunshine Band recordings. The second part, the metadata, you don’t really see or have a visceral feel for. However, this metadata is a key component of many filesystems. Think of it as a database, of sorts, that contains information about your data: the file name; the dates the file was modified or accessed; the file owner, group owner, and permissions; the blocks in the filesystem where the data resides; and so on. Without this information, a filesystem is just a bunch of bits on storage media, with no record of what is in those blocks or how files are spread across them.
Inodes
In general, for most *nix filesystems, each file or directory has its own metadata. (Remember that in *nix operating systems, directories are just files.) The metadata is generally stored in a fixed-size data structure called an inode. Each inode is identified by an inode number that is unique within its filesystem. Inode numbers can repeat across different filesystems, so a file is uniquely identified on a system by its inode number combined with the ID of the device that holds the filesystem.
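Here is a minimal Python sketch of the identity rule, using only the standard os module. A file is identified by its (st_dev, st_ino) pair. A second hard link to the same file is just another name for the same inode, which you can see in the inode's link count. The file names are hypothetical.

```python
import os
import tempfile

def file_identity(path):
    """Return the (device ID, inode number) pair that identifies a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "data.txt")   # hypothetical file name
    alias = os.path.join(d, "alias.txt")     # hypothetical second name
    with open(original, "w") as f:
        f.write("hello\n")
    os.link(original, alias)                 # create a hard link

    # Both names resolve to the same inode on the same device ...
    same = file_identity(original) == file_identity(alias)
    # ... and the inode's link count now records two names.
    links = os.stat(original).st_nlink
    print(same, links)
```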
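For POSIX filesystems, or filesystems that are mostly POSIX compliant, the information an inode provides is defined in advance. POSIX specifies the attributes (the fields of struct stat) that a query such as stat() returns. Applications and libraries can therefore create, query, and delete files through the same standard calls on any such filesystem, and the information they get back always has the same form.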
The origin of the term “inode” is not known with any certainty. One of the original developers of Unix, Dennis Ritchie, said the following about the origin of the term:
In truth, … It was just a term that we started to use. “Index” is my best guess, because of the slightly unusual file system structure that stored the access information of files as a flat array on the disk, with all the hierarchical directory information living aside from this. Thus, the i-number is an index in this array, the i-node is the selected element of the array. (The “i-” notation was used in the 1st edition manual; its hyphen was gradually dropped.) [inodes Wikipedia page]
How the inodes in a filesystem are created depends on the specific filesystem. Some older filesystem designs, including ext3 and ext4, fix the number of inodes when the filesystem is created. That fixed number caps how many files and directories the filesystem can hold. On such a filesystem, it is possible to use all of the inodes while free storage capacity remains. Existing files can still grow, but you can’t create any new files or directories because no inodes are left. (It doesn’t happen often, but it can when a filesystem holds a very large number of small files.) If you need more inodes, you have to remake the filesystem, which destroys the data already on it.
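To see why a fixed inode count can run out before the space does, here is a small worked calculation in Python. It assumes one inode per 16KiB of capacity, which matches the ext4 filesystem in Listing 4 (8,192 inodes per group of 32,768 blocks of 4KiB). The 1TiB capacity and the file sizes are illustrative assumptions, and the sketch ignores metadata overhead and block rounding.

```python
def files_until_full(capacity_bytes, bytes_per_inode, avg_file_bytes):
    """Return (max number of files, limiting resource) for a filesystem
    whose inode count was fixed at creation as capacity / bytes_per_inode.
    Simplified: ignores metadata overhead and block rounding."""
    inode_limit = capacity_bytes // bytes_per_inode
    space_limit = capacity_bytes // avg_file_bytes
    if inode_limit < space_limit:
        return inode_limit, "inodes"
    return space_limit, "space"

TIB = 1024 ** 4
RATIO = 16 * 1024          # one inode per 16KiB, as in Listing 4

# Large files: the space runs out first.
print(files_until_full(TIB, RATIO, 1024 * 1024))   # → (1048576, 'space')
# Tiny files: the inodes run out first.
print(files_until_full(TIB, RATIO, 4 * 1024))      # → (67108864, 'inodes')
```

With 4KiB files, the 67,108,864 inodes are exhausted when only 256GiB (a quarter of the 1TiB) holds data. On ext4, the ratio is chosen when the filesystem is made, with mke2fs’s -i bytes-per-inode option or an explicit -N inode count. That is why more inodes means remaking the filesystem.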
Many more recent filesystems use dynamic inode allocation; that is, they create inodes as they are needed. They typically start with some number of inodes and, as those are used, create more according to the filesystem’s own heuristics. The space for these inodes comes out of data storage capacity, but only a small percentage of the total, so it is a reasonable trade-off. To avoid a performance penalty, these filesystems don’t create additional inodes one at a time. Instead, they create them in batches, according to an algorithm built into the filesystem. A good example is XFS.
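The idea of creating inodes in batches rather than one at a time can be sketched with a toy Python allocator. This is a hypothetical model, not XFS’s actual code: the class and method names are invented for illustration. The 64-inode chunk size mirrors the chunk size XFS uses, but everything else is simplified.

```python
class ChunkedInodeTable:
    """Toy model of dynamic inode allocation: inodes are created in
    fixed-size chunks on demand, and released numbers are reused."""

    def __init__(self, chunk_size=64):
        self.chunk_size = chunk_size
        self.created = 0            # total inodes created so far
        self.free = []              # free inode numbers, used as a stack
        self.chunks_allocated = 0

    def _grow(self):
        # Create a whole chunk at once instead of a single inode.
        start = self.created + 1
        self.created += self.chunk_size
        self.chunks_allocated += 1
        # Push in reverse so the lowest new number is handed out first.
        self.free.extend(range(self.created, start - 1, -1))

    def allocate(self):
        if not self.free:
            self._grow()
        return self.free.pop()

    def release(self, ino):
        # A deleted file's inode number goes back on the free list.
        self.free.append(ino)


table = ChunkedInodeTable()
inodes = [table.allocate() for _ in range(100)]
print(inodes[0], inodes[-1], table.created, table.chunks_allocated)  # → 1 100 128 2
```

Allocating 100 inodes triggers only two chunk creations. Batching like this keeps allocation bookkeeping off the path of most file creations.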
Going even further, filesystems such as ZFS (OpenZFS), ReiserFS, and Btrfs don’t have a fixed-size inode table at all. To be POSIX compliant or, at worst, mostly POSIX compliant, they keep equivalent per-file metadata so that any stat-like command (see below) can be satisfied.
Inode Information
You have several easy ways of getting inode information. For example, you can see the inode numbers of files and directories simply by adding the -i option to the ls command (Listing 1). The integer on the far left is the inode number associated with each file or directory.
Listing 1: Listing the Inodes
$ ls -il
total 120920
42729565 drwxr-xr-x 7 laytonjb laytonjb 4096 May 15 2020 darshan-3.2.1
31872599 -rw-rw-r-- 1 laytonjb laytonjb 3066907 Nov 20 2020 darshan-3.2.1.tar.gz
31992289 drwxrwxr-x 8 laytonjb laytonjb 4096 Jul 13 2021 darshan-darshan-3.3.1
31865359 -rw-rw-r-- 1 laytonjb laytonjb 4053028 Jul 13 2021 darshan-darshan-3.3.1.tar.gz
32249740 drwxrwxr-x 17 laytonjb laytonjb 12288 Dec 4 2020 fio-fio-3.24
31863782 -rw-rw-r-- 1 laytonjb laytonjb 1027274 Dec 1 2020 fio-fio-3.24.tar.gz
31872514 -rw-rw-r-- 1 laytonjb laytonjb 4363294 Nov 20 2020 hydra-3.3.2.tar.gz
39588527 drwxrwxr-x. 5 laytonjb laytonjb 4096 Jul 13 2020 iozone3_490
31888630 -rw-rw-r--. 1 laytonjb laytonjb 4136960 Dec 9 2020 iozone3_490.tar
31984747 drwxrwxr-x 21 laytonjb laytonjb 4096 Nov 20 2020 Lmod-8.4.15
31863444 -rw-rw-r-- 1 laytonjb laytonjb 19946519 Nov 20 2020 Lmod-8.4.15.tar.gz
31988342 drwxrwxr-x 2 laytonjb laytonjb 4096 Oct 27 14:22 mpibzip2-0.6
31988329 -rw-rw-r-- 1 laytonjb laytonjb 92160 Oct 27 14:18 mpibzip2-0.6.tar
31872452 -rw-rw-r-- 1 laytonjb laytonjb 27311775 Nov 20 2020 mpich-3.3.2.tar.gz
31870282 -rw-rw-r-- 1 laytonjb laytonjb 18473572 Nov 20 2020 mvapich2-2.3.4.tar.gz
31984799 drwxrwxr-x 17 laytonjb laytonjb 4096 Nov 20 2020 OpenBLAS-0.3.10
31872561 -rw-rw-r-- 1 laytonjb laytonjb 12246979 Nov 20 2020 OpenBLAS-0.3.10.tar.gz
31872449 -rw-rw-r-- 1 laytonjb laytonjb 17163544 Nov 20 2020 openmpi-4.0.5.tar.gz
39322319 drwxrwxr-x 7 laytonjb laytonjb 4096 Oct 24 2020 psutil-release-5.7.3
32639755 drwxrwxr-x 2 laytonjb laytonjb 4096 Nov 6 10:25 pxz-master
31866302 -rw-rw-r-- 1 laytonjb laytonjb 13228 Nov 6 10:25 pxz-master.zip
31865740 drwxrwxr-x 6 laytonjb laytonjb 4096 Jun 25 2021 pymp-master
31865276 -rw-rw-r-- 1 laytonjb laytonjb 21738 Jun 25 2021 pymp-master.zip
31860913 -rw-rw-r-- 1 laytonjb laytonjb 2831628 Nov 16 2020 remora-1.8.3.tar.gz
31850774 drwxrwxr-x 5 laytonjb laytonjb 4096 Nov 16 2020 remora-1.8.4
31861221 -rw-rw-r-- 1 laytonjb laytonjb 2833018 Nov 16 2020 remora-1.8.4.tar.gz
31985247 drwxrwxr-x 20 laytonjb laytonjb 4096 Nov 23 2020 singularity-3.6.4
31872719 -rw-rw-r-- 1 laytonjb laytonjb 6154050 Nov 23 2020 singularity-3.6.4.tar.gz
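The inode numbers that ls -i prints are also available to programs. Below is a minimal Python sketch using os.scandir from the standard library; the function name list_inodes is hypothetical. On Unix-like systems, DirEntry.inode() returns the inode number stored in the directory entry itself, so it needs no extra stat call per file.

```python
import os

def list_inodes(path="."):
    """Return (inode number, name, is_dir) for each entry in a directory,
    sorted by name: roughly the information `ls -i` shows."""
    rows = []
    with os.scandir(path) as it:
        for entry in it:
            rows.append((entry.inode(), entry.name,
                         entry.is_dir(follow_symlinks=False)))
    return sorted(rows, key=lambda r: r[1])

for ino, name, is_dir in list_inodes("."):
    print(f"{ino:>10} {name}{'/' if is_dir else ''}")
```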
Each time a file or directory is created, an inode number is allocated, and the various entries of the inode are initialized or populated. Conversely, if a file or directory is deleted, the inode number is put back for reuse for a new file or directory.
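A refinement of the deletion case just described: removing a file deletes a directory entry and decrements the inode’s link count. The inode itself is released only when that count reaches zero and no process still has the file open. Here is a minimal Python sketch of this behavior on a POSIX system, using a temporary scratch file.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()         # create a scratch file and keep it open
os.write(fd, b"still here\n")
before = os.fstat(fd).st_nlink         # one name points at the inode

os.unlink(path)                        # remove the only name
after = os.fstat(fd)                   # the inode is still reachable via fd
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)                # and its data can still be read
os.close(fd)                           # last reference gone: inode can be freed

print(before, after.st_nlink, data, os.path.exists(path))
```

This is why a deleted file that some process still holds open keeps consuming an inode (and its blocks) until the process closes it.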
To see how many inodes exist, how many are used, and how many are free, run df -i, either for a single filesystem or for every mounted filesystem (Listing 2). The second column (Inodes) is the total number of inodes at the time of the query. The third column (IUsed) is the number in use, and the fourth (IFree) is the number still free. The loop devices are read-only SquashFS images mounted by Snap, so zero free inodes (100% used) is expected: nothing new can be written to them. The EFI system partition (/boot/efi) is a FAT filesystem, which has no inodes, so df reports zeros and a dash.
Listing 2: Inode Info for Filesystem
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 32968052 916 32967136 1% /dev
tmpfs 32983590 1409 32982181 1% /run
/dev/nvme0n1p2 31227904 814030 30413874 3% /
tmpfs 32983590 6 32983584 1% /dev/shm
tmpfs 32983590 7 32983583 1% /run/lock
tmpfs 32983590 18 32983572 1% /sys/fs/cgroup
/dev/loop0 29 29 0 100% /snap/bare/5
/dev/loop3 10847 10847 0 100% /snap/core18/2284
/dev/loop2 10836 10836 0 100% /snap/core18/2253
/dev/loop1 12847 12847 0 100% /snap/core/12725
/dev/loop4 11777 11777 0 100% /snap/core20/1328
/dev/loop5 18500 18500 0 100% /snap/gnome-3-34-1804/77
/dev/loop6 18500 18500 0 100% /snap/gnome-3-34-1804/72
/dev/loop7 17441 17441 0 100% /snap/gnome-3-38-2004/87
/dev/nvme1n1p1 62513152 7087560 55425592 12% /home
/dev/nvme0n1p1 0 0 0 - /boot/efi
/dev/loop8 17495 17495 0 100% /snap/gnome-3-38-2004/99
/dev/sda1 183144448 38466772 144677676 22% /home2
/dev/loop9 64986 64986 0 100% /snap/gtk-common-themes/1515
/dev/loop11 14 14 0 100% /snap/gtk2-common-themes/13
/dev/loop10 24054 24054 0 100% /snap/p7zip-desktop/220
/dev/loop12 65095 65095 0 100% /snap/gtk-common-themes/1519
/dev/loop13 11777 11777 0 100% /snap/core20/1361
/dev/loop14 480 480 0 100% /snap/snapd/14978
/dev/loop15 17311 17311 0 100% /snap/snap-store/558
/dev/loop16 15841 15841 0 100% /snap/snap-store/547
tmpfs 32983590 49 32983541 1% /run/user/1000
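The df -i columns come from filesystem statistics that the kernel reports through the statvfs() interface, which Python exposes as os.statvfs. Here is a minimal sketch that reproduces one row for a mount point; the function names are hypothetical. Rounding IUse% up is an assumption chosen because it matches Listing 2: 814,030 of 31,227,904 inodes is 2.6%, shown as 3%.

```python
import os

def inode_use_percent(used, total):
    """IUse% as df prints it: rounded up to a whole percent, or None
    (df shows '-') when the filesystem reports no inodes, e.g., FAT."""
    if total == 0:
        return None
    return (used * 100 + total - 1) // total   # integer ceiling division

def inode_usage(mount_point="/"):
    st = os.statvfs(mount_point)
    total = st.f_files        # total inodes
    free = st.f_ffree         # free inodes
    used = total - free
    return {"Inodes": total, "IUsed": used, "IFree": free,
            "IUse%": inode_use_percent(used, total)}

print(inode_usage("/"))
```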
If you are having trouble saving files to a filesystem (e.g., creating a file fails with “No space left on device” even though free space remains), it’s a good idea to run df -i to see whether any free inodes are left. If not, depending on the filesystem type, you might have to copy all of the data off the existing filesystem, remake it with a larger number of inodes, and copy the data back.
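A small Python sketch of that check follows. The kernel returns the same ENOSPC error (“No space left on device”) whether the inodes or the blocks are exhausted. Comparing free inodes with free blocks tells you which resource ran out. The function names are hypothetical. f_bavail counts blocks available to unprivileged users, so root may still find some space when it is zero. Filesystems that allocate inodes dynamically, such as Btrfs, may report a total of zero inodes, which the sketch does not treat as exhaustion.

```python
import os

def exhausted_resource(free_inodes, total_inodes, free_blocks):
    """Classify why new files cannot be created."""
    if total_inodes > 0 and free_inodes == 0:
        return "inodes"   # space may remain, but there is no inode for a new file
    if free_blocks == 0:
        return "space"
    return None           # neither resource is exhausted

def diagnose(path):
    st = os.statvfs(path)
    return exhausted_resource(st.f_ffree, st.f_files, st.f_bavail)

print(diagnose("/"))
```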
Another command you can use for examining inode information is stat, which queries the inode information for a particular file or directory and returns some of this information to you (Listing 3).
Listing 3: Inode Info for File
$ stat OpenBLAS-0.3.10.tar.gz
  File: OpenBLAS-0.3.10.tar.gz
  Size: 12246979 Blocks: 23920 IO Block: 4096 regular file
Device: 10302h/66306d Inode: 31872561 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/laytonjb) Gid: ( 1000/laytonjb)
Access: 2021-06-14 02:53:51.268540668 -0400
Modify: 2020-11-20 09:08:56.928744633 -0500
Change: 2020-11-20 09:08:56.936745104 -0500
 Birth: -
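The command does not output every element of the inode corresponding to the file. For details on the fields it does report, see the stat(2) man page (man 2 stat). It documents the underlying system call and the struct stat that it fills in. The command itself is described in man 1 stat.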
The output from stat gives you a fair amount of information, such as:
- file name
- size (in bytes)
- number of 512-byte blocks allocated (23920 in this case)
- preferred I/O block size (4KiB in this case)
- file type (in this case a regular file)
- device (in hex and decimal)
- number of hard links (1 in this case)
- file permissions in numeric (octal) and symbolic form
- UID (user ID) of the file
- GID (group ID) of the file
- last time the file was accessed (the line below the permissions, UID, GID)
- last time the file was modified
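- last time the file’s status (its inode) was changed, which includes permission or ownership changes as well as writes
- date of “Birth” (creation time), shown as “-” here because this version of stat does not report it; on Linux 4.11 and later, the statx() system call can return it on filesystems that record it, such as ext4, and newer versions of stat display it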
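The same fields can be read programmatically. Here is a minimal Python sketch using os.lstat and the standard stat module to build a summary similar to Listing 3; the function name inode_summary and the example file are hypothetical. It uses lstat because stat reports on a symbolic link itself unless given -L. Note that st_blocks is counted in 512-byte units. In Listing 3, 23,920 × 512 = 12,247,040 bytes, which is the 12,246,979-byte file rounded up to 2,990 whole 4KiB filesystem blocks.

```python
import os
import stat
import tempfile
from datetime import datetime

def inode_summary(path):
    """Collect the inode fields that stat(1) prints for a path."""
    st = os.lstat(path)   # like stat(1): describe a symlink itself
    return {
        "File": path,
        "Size": st.st_size,                   # bytes
        "Blocks": st.st_blocks,               # 512-byte units
        "IO Block": st.st_blksize,            # preferred I/O size
        "Inode": st.st_ino,
        "Links": st.st_nlink,
        "Access mode": f"{stat.S_IMODE(st.st_mode):04o}/{stat.filemode(st.st_mode)}",
        "Uid": st.st_uid,
        "Gid": st.st_gid,
        "Access": datetime.fromtimestamp(st.st_atime),
        "Modify": datetime.fromtimestamp(st.st_mtime),
        "Change": datetime.fromtimestamp(st.st_ctime),  # inode change, not creation
    }

with tempfile.TemporaryDirectory() as d:
    name = os.path.join(d, "example.dat")   # hypothetical file
    with open(name, "wb") as f:
        f.write(b"x" * 5000)
    os.chmod(name, 0o664)
    summary = inode_summary(name)
    for key, value in summary.items():
        print(f"{key}: {value}")
```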
Although I don’t discuss how filesystems organize their inodes in this article, you can get more detail about an ext2, ext3, or ext4 filesystem, including more inode information, with tune2fs -l, which lists the contents of the filesystem’s superblock. An example of running the command on an ext4 filesystem is shown in Listing 4. You can scan through the output and pick out inode information (e.g., Inode count, Free inodes, Inodes per group, and Inode size), as well as other useful information.
Listing 4: tune2fs on ext4 Filesystem
$ sudo tune2fs -l /dev/nvme0n1p2
tune2fs 1.45.5 (07-Jan-2020)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: db7bca35-5c8d-4587-a2f8-ae0d7108d53d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 31227904
Block count: 124895488
Reserved block count: 6244774
Free blocks: 100624000
Free inodes: 30413904
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Sun Jan 31 09:38:44 2021
Last mount time: Sun Feb 27 07:47:43 2022
Last write time: Sun Feb 27 07:47:43 2022
Mount count: 296
Maximum mount count: -1
Last checked: Sun Jan 31 09:38:44 2021
Check interval: 0 ()
Lifetime writes: 811 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 23db69c1-dd63-4e11-9771-23b5f65ac46c
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0xa90645bd
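The numbers in Listing 4 show how ext4 fixes its inode count at creation time: every block group gets the same number of inode slots, including the last, partial group. Here is a short Python check that uses only values copied from the listing. It also shows what the inode tables cost in capacity.

```python
# Values copied from Listing 4 (tune2fs -l /dev/nvme0n1p2)
block_count = 124895488
first_block = 0
block_size = 4096
blocks_per_group = 32768
inodes_per_group = 8192
inode_size = 256
inode_count_reported = 31227904

# Number of block groups, rounding up for the partial last group.
groups = -(-(block_count - first_block) // blocks_per_group)
inode_count = groups * inodes_per_group

# Derived values to compare with the listing.
inode_table_blocks_per_group = inodes_per_group * inode_size // block_size
bytes_per_inode = blocks_per_group * block_size // inodes_per_group
table_fraction = inode_count * inode_size / (block_count * block_size)

print(groups, inode_count, inode_count == inode_count_reported)  # → 3812 31227904 True
print(inode_table_blocks_per_group, bytes_per_inode, f"{table_fraction:.2%}")  # → 512 16384 1.56%
```

The computed 512 inode-table blocks per group matches the Inode blocks per group line. About 1.56% of this filesystem’s capacity is set aside for inodes whether or not they are ever used.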
Summary
The concept of an inode is pretty fundamental to traditional filesystems within Linux and other *nix operating systems. Even for modern filesystems, the idea of an inode is important for POSIX compatibility. Conceptually, an inode is fairly easy to understand: It’s just the data about the data (i.e., metadata), such as the file and group owner, permissions, and several file timestamps.
Some filesystems (e.g., ext3 and ext4) fix the number of inodes when the filesystem is created. Thus, you could “run out” of inodes, and be unable to create new files, even though space is still available for more data. To help get around this problem, other filesystems (e.g., XFS) create inodes as needed.
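Armed with the basic concepts of inodes, you can examine various filesystems and determine which is right for you. Also, grasping the concept of an inode helps you understand why something as simple as ls -l might take so long to respond. To print permissions, owners, sizes, and timestamps, it has to look up the inode of every entry in the directory. Those lookups add up quickly in a directory with many files, or on a networked or parallel filesystem where each lookup may go to a metadata server.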