Encrypting Files
The revelations of Edward Snowden caused a big upsurge in the use of encryption for protecting data from inappropriate access. People are now using encrypted filesystems as well as self-encrypting drives (SEDs). However, not everyone is using encryption.
Recent revelations about access to individuals’ data include the report that the NSA and Britain’s Government Communications Headquarters (GCHQ) allegedly obtained the encryption keys for SIM cards made by Gemalto, which would let them decrypt cell phone communications that use these cards. There is also the story of Lenovo preinstalling malware (Superfish) on some of its laptops that could intercept web traffic, including HTTPS traffic, through man-in-the-middle attacks.
When you use an encrypted filesystem or an SED, all of the data is encrypted. However, if you forget the password, you lose all of the data on the filesystem or drive. It may be easier to encrypt files individually so that if you forget a password, you lose only a single file rather than the entire filesystem or drive. Moreover, you might be casually copying files to the cloud or to other backup systems from your desktop, laptop, or cell phone. Unless you encrypt these files yourself, chances are they are stored unencrypted.
Encrypting files individually with simple tools and then copying them to your backup is easy. As mentioned, if you forget the password, you will in theory lose only a single file (unless you use the same passphrase for all files, in which case forgetting it could cost you access to all of your data).
Before you proceed with the rest of this article, realize that I’m not a security or cryptography expert, nor do I play one on TV. Please do your own research. Given that, in the sections below, I review a few file encryption/decryption tools and finish with some personal recommendations on using them.
GPG
To start, I’ll look at probably the most popular encryption tool, GNU Privacy Guard (GPG). GPG is popular because it’s fast, its encryption is very strong when used correctly, its code is open source, and it follows the OpenPGP specification, an IETF standard. GPG was designed as a command-line tool for encrypting files, but it has been incorporated into email clients for encrypting email.
GPG uses a hybrid approach that combines symmetric-key encryption and public-key cryptography. In symmetric-key encryption, the sender and the receiver share the same key. Symmetric-key encryption is fast, so it is used to encrypt the data itself; public-key cryptography makes secure key exchange easy, so it is used to protect the random session key that encrypts the data.
As mentioned, GPG can be used to encrypt messages such as email. To do this, each user generates an asymmetric key pair (a public key and a private key). You share your public key with other users through Internet key servers or something similar; they use it to encrypt messages that only your private key can decrypt and to verify messages you have signed.
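To make the division of labor in a hybrid scheme concrete, here is a deliberately insecure toy sketch in Python. It assumes textbook RSA with tiny primes for the public-key step and a SHA-256-based XOR keystream standing in for a real symmetric cipher; the names `toy_stream_cipher`, `hybrid_encrypt`, and `hybrid_decrypt` are hypothetical, and none of this reflects GPG’s actual algorithms or packet format.

```python
import hashlib
import secrets

# Toy RSA key pair from tiny textbook primes: NOT secure, illustration only.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def toy_stream_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric step: XOR data with a SHA-256(key || counter) keystream.

    The same call encrypts and decrypts. It stands in for a real cipher
    such as AES or CAST5 and is not what GPG actually uses.
    """
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], block))
    return bytes(out)


def hybrid_encrypt(message: bytes) -> tuple[list[int], bytes]:
    session_key = secrets.token_bytes(16)                 # fresh random key per message
    ciphertext = toy_stream_cipher(session_key, message)  # fast bulk encryption
    # Public-key step: wrap the session key with the public key (N, E).
    # Bytes are wrapped one at a time only because the toy modulus is tiny.
    wrapped_key = [pow(byte, E, N) for byte in session_key]
    return wrapped_key, ciphertext


def hybrid_decrypt(wrapped_key: list[int], ciphertext: bytes) -> bytes:
    # Only the holder of the private exponent D can unwrap the session key.
    session_key = bytes(pow(c, D, N) for c in wrapped_key)
    return toy_stream_cipher(session_key, ciphertext)


wrapped, ct = hybrid_encrypt(b"meet at noon")
print(hybrid_decrypt(wrapped, ct))  # b'meet at noon'
```

The expensive public-key operation touches only the 16-byte session key, no matter how large the message is; the bulk data goes through the fast symmetric step.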
GPG offers a variety of algorithms. By default, the version used here encrypts with the symmetric cipher CAST5, a block cipher with a 128-bit key; newer GPG releases default to AES. The other available algorithms are listed below, along with the public-key, hash, and compression algorithms. The exact list depends on how your copy of GPG was built, and gpg --version prints it.
- Public key:
- Cipher:
- Hash: MD5, SHA-1, RIPEMD-160, SHA-256, SHA-384, SHA-512, SHA-224
- Compression: ZIP, ZLIB, BZIP2
Note: AES always uses a 128-bit block size; GPG supports AES key lengths of 128, 192, and 256 bits (the AES, AES192, and AES256 ciphers).
For ciphers such as AES-256, the number is the key length in bits. As a general rule of thumb, the longer the key, the better “protected” your data is (provided your passphrase is sufficiently long and random, because with gpg -c the key is derived from the passphrase). A longer key also costs more CPU time to encrypt and decrypt; AES-256, for example, runs 14 rounds per block versus 10 for AES-128. If you encrypt a file and very rarely decrypt it, you might want to use the longest key available. If you decrypt the file fairly often, you might try a shorter key to speed up encryption and decryption at the expense of somewhat “weaker” encryption. Ultimately the choice is yours, but personally I like to encrypt my data with a very long key (almost as long as I can get).
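The caveat about passphrase length can be made concrete with a little arithmetic. Because gpg -c derives the key from your passphrase, an attacker only has to search the space of plausible passphrases, not all 2^256 AES-256 keys. The sketch below assumes a passphrase chosen uniformly at random, for which the entropy is length × log2(alphabet size); the function name is hypothetical.

```python
import math


def passphrase_bits(alphabet_size: int, length: int) -> float:
    """Upper bound on entropy (bits) for a passphrase of `length` symbols.

    Assumes every symbol is drawn independently and uniformly at random from
    an alphabet of `alphabet_size` symbols; human-chosen passphrases carry
    much less entropy than this.
    """
    return length * math.log2(alphabet_size)


print(round(passphrase_bits(26, 8), 1))   # 8 random lowercase letters: 37.6
print(round(passphrase_bits(94, 20), 1))  # 20 random printable ASCII characters: 131.1
```

By this estimate, eight random lowercase letters give under 38 bits, far below even a 128-bit key, whereas 20 random printable characters exceed 128 bits.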
According to evil32.com, modern GPUs make it practical to generate collisions for 32-bit (short) key IDs: the site says it takes only about four seconds on a GPU to generate a key whose 32-bit key ID matches a given one. In fact, the authors claim they found collisions for every 32-bit key ID in the Web of Trust (WOT) strong set. According to the site, colliding 32-bit key IDs doesn’t compromise GPG’s encryption, but “… it further erodes the usability of GPG and increases the chance of human error.”
Key IDs are not used to encrypt data, but they are how you refer to keys (for example, when fetching a key from a key server), so you should definitely be aware of them, particularly if you use GPG every day. The researchers therefore strongly recommend using 64-bit (long) key IDs.
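The relationship between a key’s fingerprint and its short and long key IDs is simple enough to sketch. Following RFC 4880, a version 4 key’s fingerprint is a SHA-1 hash over the public-key packet, and its key ID is the low-order 64 bits of that fingerprint; the 32-bit short ID is just the last 8 hex digits. The helper names and the placeholder packet bytes below are hypothetical (they are not a real key). To have GPG itself display long IDs, you can pass --keyid-format long.

```python
import hashlib


def v4_fingerprint(packet_body: bytes) -> str:
    """OpenPGP v4 fingerprint as described in RFC 4880, section 12.2.

    SHA-1 over the octet 0x99, a two-octet big-endian length, and the
    public-key packet body (starting with its version byte).
    """
    header = b"\x99" + len(packet_body).to_bytes(2, "big")
    return hashlib.sha1(header + packet_body).hexdigest().upper()


def key_ids(fingerprint: str) -> tuple[str, str]:
    """Return (long 64-bit ID, short 32-bit ID): the fingerprint's low-order bits."""
    return fingerprint[-16:], fingerprint[-8:]


# Hypothetical placeholder bytes, NOT a real public-key packet.
fpr = v4_fingerprint(b"\x04" + b"placeholder key material")
long_id, short_id = key_ids(fpr)
print(fpr, long_id, short_id)

# Roughly how many random keys must be generated to hit one specific ID
print(f"{2**32:,} vs. {2**64:,}")  # 4,294,967,296 vs. 18,446,744,073,709,551,616
```

Matching a given short ID takes on the order of 4.3 billion attempts, which is why GPUs can do it quickly, while a long ID raises that to about 1.8 × 10^19.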
Using GPG is very easy. Start with a file and encrypt it with gpg’s -c (--symmetric) option, which encrypts with a symmetric key derived from a passphrase, using the default cipher. The example below encrypts the text file hpc_001.html:
[laytonjb@home4 TEMP]$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
[laytonjb@home4 TEMP]$ gpg -c hpc_001.html
[laytonjb@home4 TEMP]$ ls -s
total 11256
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
28 hpc_001.html.gpg
Notice that the gpg command leaves the original file in place and creates a new file with a .gpg extension. Also notice that the encrypted file is much smaller than the plain-text original (28 vs. 196 in the ls -s listing), because GPG compresses data by default before encrypting it.
During encryption I had to enter my passphrase twice. You must remember this passphrase: without it, you cannot decrypt the file, and recovering the data would take a massive amount of CPU time to crack the encryption. This is no joke; with a good passphrase, cracking the file could take years (many years). So do not forget the passphrase, but don’t write it down and leave it lying around either.
You can also compress the text file before you encrypt it as shown here:
[laytonjb@home4 TEMP]$ gzip -9 hpc_001.html
[laytonjb@home4 TEMP]$ ls -s
total 11084
11032 Flying_Beyond_the_Stall.pdf
28 hpc_001.html.gpg
24 hpc_001.html.gz
[laytonjb@home4 TEMP]$ gpg -c hpc_001.html.gz
[laytonjb@home4 TEMP]$ ls -s
total 11108
11032 Flying_Beyond_the_Stall.pdf
28 hpc_001.html.gpg
24 hpc_001.html.gz
24 hpc_001.html.gz.gpg
Notice that this time the compressed file hpc_001.html.gz is encrypted. [Note: GPG compresses data by default as part of encryption (the -z 0 option turns this off), but I like to keep the steps separate.]
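Why compress before encrypting rather than after? Good ciphertext is statistically indistinguishable from random bytes, and random bytes do not compress. The Python sketch below uses os.urandom output as a stand-in for ciphertext (an assumption, not real GPG output) and repetitive sample text as a hypothetical stand-in for an HTML file:

```python
import gzip
import os

# Hypothetical, repetitive text standing in for an HTML file such as hpc_001.html.
text = b"<p>HPC Storage - Getting Started with IO profiling</p>\n" * 2000
# Random bytes standing in for ciphertext, which should look random.
random_like = os.urandom(len(text))

print(len(text), len(gzip.compress(text, compresslevel=9)))
print(len(random_like), len(gzip.compress(random_like, compresslevel=9)))
```

The text shrinks dramatically, while the random bytes come out slightly larger than they went in, which is why compressing a .gpg file after the fact gains nothing.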
To decrypt an encrypted file to another file, use the -o and -d options: -o names the output file, and -d tells GPG to decrypt. In the example below, I decrypt hpc_001.html.gz.gpg back to the compressed file hpc_001.html.gz:
[laytonjb@home4 TEMP]$ gpg -o hpc_001.html.gz -d hpc_001.html.gz.gpg
gpg: 3DES encrypted data
gpg: encrypted with 1 passphrase
gpg: WARNING: message was not integrity protected
[laytonjb@home4 TEMP]$ ls -s
total 11108
11032 Flying_Beyond_the_Stall.pdf
28 hpc_001.html.gpg
24 hpc_001.html.gz
24 hpc_001.html.gz.gpg
During decryption I had to enter the passphrase I used to encrypt the file. The warning means that no integrity check was stored with the encrypted file, so GPG cannot detect whether it was modified. Notice that the decrypted file is called hpc_001.html.gz. (I erased the original hpc_001.html.gz before decrypting.) You can check that the file is correct by uncompressing it and looking at the first few lines, which should be text:
[laytonjb@home4 TEMP]$ gunzip hpc_001.html.gz
[laytonjb@home4 TEMP]$ ls -s
total 11280
11032 Flying_Beyond_the_Stall.pdf
28 hpc_001.html.gpg
196 hpc_001.html
24 hpc_001.html.gz.gpg
[laytonjb@home4 TEMP]$ head -n 5 hpc_001.html
HPC Storage - Getting Started with IO profiling applications
Looks like plain text to me and it matches the original file.
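Eyeballing the first few lines works for a text file, but a checksum catches any difference, including in binary files such as the PDF. A minimal sketch, assuming you recorded the SHA-256 of the original before deleting it (the sha256sum command does the same job from the shell); the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(original: Path, restored: Path) -> bool:
    """True if the decrypted copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

In practice you would store the hex digest of hpc_001.html alongside the .gpg file and compare it with sha256_of() on the decrypted copy.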
You can also choose a cipher other than CAST5. In the example below, the AES-256 cipher is used to encrypt the PDF file in the directory.
[laytonjb@home4 TEMP]$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
[laytonjb@home4 TEMP]$ gpg -c --cipher-algo AES256 Flying_Beyond_the_Stall.pdf
gpg: WARNING: recipients (-r) given without using public key encryption
[laytonjb@home4 TEMP]$ ls -s
total 20940
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
9712 Flying_Beyond_the_Stall.pdf.gpg
The --cipher-algo AES256 option tells GPG to encrypt the file with the AES-256 cipher. Again, I had to enter my passphrase twice.
GPG is very flexible and powerful. For example, it can generate keys unattended (in batch mode) and can encrypt and decrypt without prompting for a passphrase. Whenever you refer to keys, keep in mind that you should use 64-bit (long) key IDs rather than the typical 32-bit short ones. Articles on the Internet can walk you through these options and how to use them.
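As one hedged example of running GPG without an interactive prompt, the sketch below builds a gpg command line for symmetric encryption in batch mode, reading the passphrase from a file. It assumes GnuPG 2.1 or later, where --pinentry-mode loopback is generally needed for --passphrase-file to take effect, and uses only standard gpg options (--batch, --yes, --symmetric, --cipher-algo, --output). The function names are hypothetical; keep the passphrase file readable only by you (e.g., chmod 600).

```python
import subprocess
from pathlib import Path


def gpg_symmetric_cmd(src: Path, passphrase_file: Path, cipher: str = "AES256") -> list[str]:
    """Build a gpg command line for unattended symmetric encryption of src.

    The passphrase is read from passphrase_file instead of a prompt, so it
    never appears on the command line or in the shell history.
    """
    return [
        "gpg", "--batch", "--yes",        # no prompts; overwrite existing output
        "--pinentry-mode", "loopback",    # let gpg take the passphrase itself (GnuPG 2.1+)
        "--passphrase-file", str(passphrase_file),
        "--symmetric", "--cipher-algo", cipher,
        "--output", str(src) + ".gpg",
        str(src),
    ]


def encrypt(src: Path, passphrase_file: Path) -> None:
    # check=True raises CalledProcessError if gpg exits with an error status.
    subprocess.run(gpg_symmetric_cmd(src, passphrase_file), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect, and passing a list (rather than a shell string) avoids quoting problems with file names that contain spaces.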
ZIP
ZIP is an archive file format, similar to tar. In addition to collecting files into a single archive file, as tar does, zip can compress each file it stores in the archive. The format supports several compression methods, including the following:
- Shrink
- Reduce (levels 1-4)
- Implode
- Deflate
- Deflate64
- bzip2
- LZMA (EFS)
- WavPack
- PPMd
According to Wikipedia, Deflate is the most commonly used compression method.
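Python’s standard zipfile module can write only a subset of these methods (stored, Deflate, bzip2, and LZMA), which is enough for a quick size comparison. A sketch; the sample data is hypothetical, and ZIP_BZIP2 and ZIP_LZMA need Python’s bz2 and lzma modules to be available. ZIP_STORED corresponds to what zip -0 produces.

```python
import io
import zipfile

# Hypothetical, highly repetitive sample text.
data = b"HPC Storage - Getting Started with IO profiling\n" * 5000


def archive_size(method: int) -> int:
    """Size in bytes of an in-memory archive holding `data` with one method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("hpc_001.html", data)
    return len(buf.getvalue())


for name in ("ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA"):
    print(name, archive_size(getattr(zipfile, name)))
```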
Beyond archiving and compression, zip can also encrypt the archive. The .zip file format specification documents AES-based encryption in addition to the original PKWARE encryption, which is known to be weak; note that common Info-ZIP releases of the zip command (such as 3.0) implement only that traditional encryption. Starting with version 6.2 of the ZIP format specification, file names and other metadata can also be encrypted in the portion of the archive called the “Central Directory.” However, there are portions of the archive where the file names are not encrypted.
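Python’s zipfile module cannot create encrypted archives, but it can illustrate the point about exposed file names: listing the members needs no password, because with the traditional encryption the names are stored in plaintext headers, and only the file contents require the password. The same holds for unzip -l. The function name is hypothetical:

```python
import zipfile


def visible_names(zip_path: str) -> list[str]:
    """Return member names from the archive's central directory.

    No password is needed: zipfile only asks for one when reading
    the contents of an encrypted member.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

For example, calling visible_names("file.zip") on a password-protected archive created with zip -e or zip --password would still list hpc_001.html.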
Using zip to encrypt files is very similar to using gpg, as shown in the example below:
[laytonjb@home4 TEMP]$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
[laytonjb@home4 TEMP]$ zip --password MY_SECRET file.zip hpc_001.html
adding: hpc_001.html (deflated 88%)
[laytonjb@home4 TEMP]$ ls -s
total 11252
24 file.zip
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
In the command line, the option --password MY_SECRET specifies the passphrase as MY_SECRET. (You can use the -P option instead of --password.) If you want to use a longer passphrase with blanks, enclose it in quotes:
[laytonjb@home4 TEMP]$ zip --password 'Help me Watson' file.zip hpc_001.html
adding: hpc_001.html (deflated 88%)
[laytonjb@home4 TEMP]$ ls -s
total 11252
24 file.zip
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
However, a passphrase given on the command line ends up in the shell’s “history,” and while the command runs, other users on the system may be able to see it in the process list (e.g., with ps). This is probably not the most secure way to encrypt files with zip. A better way is to use the encrypt option (-e), which prompts you for the passphrase; you have to enter it twice.
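The -e behavior (prompt twice, never echo, never touch the shell history) is easy to mirror in your own scripts. A sketch using Python’s getpass module; the function name and the injectable `read` parameter, which exists only so the logic can be tested without a terminal, are hypothetical:

```python
import getpass
from collections.abc import Callable


def prompt_passphrase(read: Callable[[str], str] = getpass.getpass) -> str:
    """Ask for a passphrase twice without echoing it to the terminal."""
    first = read("Enter password: ")
    second = read("Verify password: ")
    if first != second:
        raise ValueError("passphrases do not match")
    if not first:
        raise ValueError("empty passphrase")
    return first
```

The passphrase returned this way never appears in the shell history or the process list, although any program you then hand it to must also avoid putting it on a command line.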
[laytonjb@home4 TEMP]$ zip -r -0 -e files.zip ./
Enter password:
Verify password:
adding: Flying_Beyond_the_Stall.pdf (stored 0%)
adding: hpc_001.html (stored 0%)
[laytonjb@home4 TEMP]$ ls -s
total 22456
11228 files.zip
11032 Flying_Beyond_the_Stall.pdf
196 hpc_001.html
The options used are:
- -r: recursively zip
- -0: no compression (for faster execution)
- -e: encrypt (prompts the user for a passphrase)
The command takes all of the files in the current directory and its subdirectories and creates a single archive without compression. Keep in mind, though, that zip prints the name of each file as it adds it and, more importantly, stores the file names unencrypted in the archive, so anyone can list them (for example, with unzip -l) without knowing the passphrase. Depending on your level of paranoia, you might not want this to happen. In that case, it might be better to use tar to create the archive first and then compress and encrypt the tar file with zip (i.e., zip -e).
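The tar-first approach can be sketched with Python’s tarfile module, which here creates an uncompressed archive much as tar -cf would. The function name and paths are hypothetical:

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: Path, out_tar: Path) -> list[str]:
    """Pack src_dir recursively into one uncompressed tar file.

    Returns the member names stored inside the tar. Once the tar file is
    encrypted, those names are hidden inside the encrypted contents.
    """
    with tarfile.open(out_tar, "w") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    with tarfile.open(out_tar) as tar:
        return tar.getnames()
```

You could then run zip -e files.zip files.tar (or gpg -c files.tar), so that the only name visible outside the encryption is files.tar itself.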