Read-only file compression with SquashFS
Data Crush
Using SquashFS
Using SquashFS is not difficult; the process comprises only two steps. The first step is to create a filesystem image using the SquashFS tools. You can create an image of an entire filesystem, a directory, or even a single file. This image can then be mounted directly (if it is a device) or mounted using a loopback device (if it is a file).
The tool that creates the image is called mksquashfs, which has a number of options that allow control over virtually every aspect of the image. The man page is not very long, and it's definitely worth a look at the various options. Any user can create an image of any of their data; however, mounting it requires root access (or at least sudo access).
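To give a flavor of the options, here's a sketch of a mksquashfs call using a few of the more common switches (the paths are made up, and the set of compressors available depends on how your kernel and squashfs-tools were built):

$ mksquashfs /data/project /tmp/project.sqsh -comp xz -b 262144 -e scratch -noappend

Here, -comp selects the compression algorithm, -b sets the block size in bytes, -e excludes the scratch subdirectory from the image, and -noappend overwrites an existing image instead of appending to it.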
As an example, I'll take a directory (/home/laytonjb/20170502) on my desktop where I have stored PDFs, ZIP files, and other bits of information and articles that I collect throughout the month (I'm a digital hoarder). I want to compress this directory and all its subdirectories and files. Then, I want to mount it read-only so I can access the information but still save some space.
Before compression the directory was about 358MB:
$ du -sh
358M    .
The first step is to create the image file, which can be done by the user as long as the resulting image is stored somewhere the user has permission (Listing 2). Notice that the command gives a reasonable amount of output without being too verbose.
Listing 2
Creating a SquashFS Image File
$ time mksquashfs /home/laytonjb/20170502 /home/laytonjb/squashfs/20170502.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /home/laytonjb/squashfs/20170502.sqsh, block size 131072.
[================================================-] 2904/2904 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments, compressed xattrs
        duplicates are removed
Filesystem size 335196.73 Kbytes (327.34 Mbytes)
        91.53% of uncompressed filesystem size (366234.01 Kbytes)
Inode table size 8424 bytes (8.23 Kbytes)
        50.01% of uncompressed inode table size (16846 bytes)
Directory table size 2199 bytes (2.15 Kbytes)
        63.72% of uncompressed directory table size (3451 bytes)
Xattr table size 54 bytes (0.05 Kbytes)
        100.00% of uncompressed xattr table size (54 bytes)
Number of duplicate files found 1
Number of inodes 94
Number of files 93
Number of fragments 5
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 1
Number of ids (unique uids + gids) 1
Number of uids 1
        laytonjb (1000)
Number of gids 1
        laytonjb (1000)
I used the command defaults, which means a block size of 128KiB (131,072 bytes) and gzip compression. In the output, SquashFS states that it was able to compress the data to 91.53% of its uncompressed size, or about 327MB (327.34MB).
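If you want to sanity-check the numbers that mksquashfs reports, you can look at the image file itself; the sizes you see should line up with the listing above (output omitted here):

$ ls -lh /home/laytonjb/squashfs/20170502.sqsh
$ du -sh /home/laytonjb/squashfs/20170502.sqsh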
Notice that I used the time command to measure how long mksquashfs took to run. The results were:

real    0m7.675s
user    0m29.074s
sys     0m1.002s
This looks to be pretty fast for compressing 358MB of data (on an SSD).
The next step is to mount the SquashFS image as you would any other filesystem. Out of the box, root needs to do this, because an unprivileged user is not allowed to mount arbitrary filesystems.
$ mount -t squashfs /home/laytonjb/squashfs/20170502.sqsh /home/laytonjb/20170502_new -o loop
$ mount
...
/home/laytonjb/squashfs/20170502.sqsh on /home/laytonjb/20170502_new type squashfs (ro,relatime,seclabel)
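If you want the image mounted automatically at boot, an /etc/fstab entry along these lines should do the trick; this is just a sketch using the paths from the example (the loop option tells mount to set up the loopback device for you):

/home/laytonjb/squashfs/20170502.sqsh  /home/laytonjb/20170502_new  squashfs  ro,loop  0 0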
Now look at /home/laytonjb/20170502_new to make sure everything is there and permissions are as expected (Listing 3). I can look at the files, and they are owned by me.
Listing 3
Viewing Mounted SquashFS Image
$ ls -lsat
...
 830 -rw-r--r--. 1 laytonjb laytonjb  848854 Jun 10 13:58 mesos.pdf
 535 -rw-r--r--. 1 laytonjb laytonjb  546505 Jun 10 13:58 Martins2003CSD.pdf
8803 -rw-r--r--. 1 laytonjb laytonjb 9013307 Jun 10 13:58 Hwang2012c.pdf
...
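When you no longer need the data online, unmount the image as usual. If you ever need a writable copy of some files back on disk, unsquashfs (part of the same tools package) will extract them; the restore directory below is just an example:

$ sudo umount /home/laytonjb/20170502_new
$ unsquashfs -d /home/laytonjb/restore /home/laytonjb/squashfs/20170502.sqsh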
Optimization Study
The two major options you are likely to use are -comp [comp] and -b [bsize]. The first option allows you to specify the compression algorithm used (from the options listed earlier). The second option allows you to control the block size (from the default of 128KiB to the maximum of 1MiB). Larger block sizes can help improve the amount of compression.
A simple command that uses LZMA compression and a 1MiB block size would be:
$ mksquashfs /home/laytonjb/20170502 /home/laytonjb/squashfs/20170502.sqsh -comp lzma -b 1048576
The directory I've used in the examples is full of PDF and ZIP files. I didn't expect it to compress too much, but I did get some compression. As an experiment, I tried all four compression techniques with the default block size, 128KiB, and the maximum block size, 1MiB (Table 1).
Table 1
Compression and Block Size
Compression Technique | Block Size | User Time | Compressed Size (% of original)
---|---|---|---
gzip | 128KiB | 00:29.074 | 91.53%
gzip | 1MiB | 00:31.050 | 91.35%
lzo | 128KiB | 01:36.262 | 92.31%
lzo | 1MiB | 01:47.967 | 92.08%
xz | 128KiB | 03:14.064 | 90.49%
xz | 1MiB | 03:47.730 | 88.71%
lzma | 128KiB | 03:10.494 | 90.48%
lzma | 1MiB | 03:44.004 | 88.78%
Pretty obviously, the fastest compression technique is gzip, with little difference in the user time for either block size (a two-second difference, or about 7%). The larger block size did give a very tiny bit of extra compression.
The xz and lzma algorithms result in the most compression and take the longest, much longer than gzip, but even with the default block size, they reduce the data by about 10%. With the largest block size, they squeeze out a little more: just over 11%.
You might scoff at 10%, but remember that the files are binary. If you have 100TB of data, 10% is 10TB. Not too bad. If you have 1PB, then 10% is 100TB, which is quite a bit of space.
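If you want to run the same kind of comparison on your own data, a small loop over the compressors and block sizes is all it takes. The sketch below assumes all four compressors are enabled in your squashfs-tools build and uses hypothetical output file names:

#!/bin/bash
# Sketch: build one image per compressor/block size combination and time each run.
SRC=/home/laytonjb/20170502
OUT=/home/laytonjb/squashfs

for comp in gzip lzo xz lzma; do
    for bsize in 131072 1048576; do
        img=$OUT/20170502-${comp}-${bsize}.sqsh
        rm -f "$img"                      # start fresh each time
        echo "== $comp, block size $bsize =="
        time mksquashfs "$SRC" "$img" -comp "$comp" -b "$bsize" -no-progress
    done
done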
Summary
Even though data storage has gotten inexpensive, data consumption grows at a faster rate than storage. I don't think I've ever heard anyone ask for less storage space. Finding ways to reduce the amount of data is a key function in the life of an HPC administrator.
One way to conserve space is to compress data that is not used very often. Although you can do this on a file-by-file basis, a better way is to collect all of the data into a single directory and create a compressed filesystem image. SquashFS is probably the best tool for the job, because it's very easy to use and comes with virtually every Linux distribution out there. Give it a try; you won't be disappointed.
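To get you started, the whole workflow boils down to a handful of commands. This is just a sketch with made-up paths; only remove the original directory after you have convinced yourself that the mounted copy is complete:

$ mksquashfs /data/old_project /data/images/old_project.sqsh -comp xz
$ sudo mkdir -p /mnt/old_project
$ sudo mount -t squashfs /data/images/old_project.sqsh /mnt/old_project -o loop
$ diff -r /data/old_project /mnt/old_project && rm -rf /data/old_project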
Infos
- SquashFS: http://squashfs.sourceforge.net/
- Gzip: http://www.gzip.org/
- LZMA: https://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm
- LZO: https://en.wikipedia.org/wiki/Lempel-Ziv-Oberhumer
- xz: https://en.wikipedia.org/wiki/Xz
- SquashFS blocks: https://lwn.net/Articles/305083/
- SquashFS caches: https://www.mjmwired.net/kernel/Documentation/filesystems/squashfs.txt