Encrypted backup with Duplicity
Packed and Sent
Duplicity [1] packages one or more directories into a tar archive, encrypts the results with GnuPG, and automatically uploads the backup created in this way to a backup server. Signatures help reveal tampering or disk failure, which means backups can be stored on insecure servers or in the cloud. Duplicity even offers native functions for talking to some well-known cloud services.
Additionally, Duplicity can create incremental backups, in which the transferred archive contains only the delta to the previously created backup. This not only saves disk space on the server but also means individual backups are created faster. Duplicity is licensed under the GNU GPL and thus can be used free of charge.
Duplicity is tailored for Linux and other Unix operating systems, such as BSD or OS X. Most major Linux distributions have it in their repositories. Users of OS X can install it via Fink, for example. For Ubuntu-based distributions, there is also a PPA [2] with the current Duplicity version. Alternatively, Duplicity can be built quickly from the source code (see the box "Self-Build"). Duplicity works on Windows in the Cygwin environment but is unable to handle the specific features of the Windows filesystem. Administrators should back up Windows systems with some other software if possible.
Self-Build
If you want to build Duplicity 0.7 yourself, you need the following programs and libraries:
- Python 2 version 2.6, along with the developer packages (for Ubuntu in the python-dev package)
- librsync version 0.9.6 along with developer files (for Ubuntu in the packages librsync1 and librsync-dev)
- GnuPG version 1.x
- lftp as of version 3.7.15
- A development environment with GCC
- The Python modules python-lockfile, python-paramiko, and python-pycryptopp (the latter is frequently found in the python-cryptopp package)
Also, under certain circumstances, you might need other libraries for accessing cloud services. If you want Duplicity to store your backups on Amazon S3, you will need Boto version 2.0 or newer (for Ubuntu in the python-boto package).
Once the conditions are met, you can download the source code package from the Duplicity website, extract it, and then – working as root or with administrative rights – call
python setup.py install
in the source code directory. This step installs Duplicity for all users in /usr.
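The build steps in this box could be scripted roughly as follows. This is only a sketch: It assumes that the tarball has already been downloaded, that it unpacks into a directory named like the archive without its .tar.gz suffix, and that sudo is available for the installation step. The function names source_dir_for and build_duplicity are made up for this example.

```shell
#!/bin/sh
# Sketch: build and install Duplicity from an already downloaded tarball.
# Assumption: the archive unpacks into a directory named like the tarball
# minus its .tar.gz suffix.

# Print the directory name a tarball is expected to unpack into.
source_dir_for() {
    basename "$1" .tar.gz
}

# Unpack the tarball given as $1 and run the installer with root rights.
build_duplicity() {
    tarball=$1
    tar -xzf "$tarball" || return 1
    cd "$(source_dir_for "$tarball")" || return 1
    # Assumption: sudo is configured; alternatively, run this as root.
    sudo python setup.py install
}
```

You would then call build_duplicity with the path to the downloaded archive as its only argument.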
When this issue went to press, the latest stable version was 0.6.26. This will be replaced shortly by the new version 0.7, of which Ubuntu 15.04 already includes a developer version. The old series 0.6 can thus be considered outdated, but the developers say they will continue supplying bug fixes. Do not be put off by the low version number: Duplicity has been around since 2002 and has proved itself tough in everyday use.
Creating Backups
Duplicity is very simple to operate. At the command line, you pass in the directory to be backed up and the storage directory to the tool. The following example packages the complete /etc directory in a tar archive, encrypts it, and uses secure copy (scp) to store the results on the server at example.com below a directory named /var/backup (Figure 1). Note the double slashes after the domain name:
duplicity --progress /etc scp://dd@example.com//var/backup
During the backup, Duplicity considers deleted files, all file permissions, subdirectories, FIFOs, device files, and symbolic links, but not hard links. Specifying the --progress parameter tells Duplicity to report its progress continuously. Note that the tool always expects options before the source directory and the target. Furthermore, you must ensure that Duplicity has the correct permissions. In the above case, it must therefore be allowed to read /etc and all its contents.
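The incremental backups mentioned at the outset need no special setup. According to the Duplicity man page, a call without an action creates a full backup on the first run and an incremental one as soon as an earlier backup exists at the target, whereas the actions full and incremental force one or the other. The following sketch wraps the three variants in shell functions; the function names and the two variables are made up for this example.

```shell
#!/bin/sh
# Sketch: let Duplicity choose the backup type, or force one explicitly.
SOURCE=/etc
TARGET=scp://dd@example.com//var/backup    # target URL from the example above

# No action: full backup on the first run, incremental afterward.
backup_auto() {
    duplicity --progress "$SOURCE" "$TARGET"
}

# "full" forces a complete new backup even if earlier ones exist.
backup_full() {
    duplicity full --progress "$SOURCE" "$TARGET"
}

# "incremental" aborts if no earlier backup can be found.
backup_incremental() {
    duplicity incremental --progress "$SOURCE" "$TARGET"
}
```

A common pattern is to call backup_full at longer intervals and backup_auto in between, so that the chain of deltas does not grow indefinitely.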
Duplicity automatically compresses the archive with gzip, which can be switched off with the --no-compression option. Additionally, Duplicity creates some temporary files in the appropriate directory; for Linux, this is usually /tmp. If you have insufficient free space, you can use --tempdir /<path/to>/tmp to define another directory. In previous Duplicity versions, users had to define the temporary directory in the environment variable TMPDIR. The developers have made this method obsolete, however.
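If you are unsure whether /tmp is large enough, a wrapper script can check the free space first and add --tempdir only when needed. The following is a minimal sketch; the threshold MIN_FREE_KB, the fallback directory FALLBACK_TMP, and the function names are assumptions chosen for illustration, not Duplicity settings.

```shell
#!/bin/sh
# Sketch: switch to another temporary directory when /tmp is short on space.
# MIN_FREE_KB and FALLBACK_TMP are hypothetical values for illustration.
MIN_FREE_KB=1048576            # 1 GiB, an arbitrary threshold
FALLBACK_TMP=/var/tmp

# Print the free space of the filesystem holding directory $1, in KiB.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the extra Duplicity option, or nothing if /tmp is large enough.
tempdir_option() {
    if [ "$(free_kb /tmp)" -lt "$MIN_FREE_KB" ]; then
        printf '%s\n' "--tempdir $FALLBACK_TMP"
    fi
}

backup_etc() {
    # $(tempdir_option) is left unquoted on purpose, so that it splits
    # into two words or disappears entirely.
    duplicity $(tempdir_option) /etc scp://dd@example.com//var/backup
}
```

Because the option is printed as a single string and split by the shell, the fallback path must not contain spaces in this simple form.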
Duplicity encrypts the resulting archive with GnuPG. For this reason, you need to create and type a password (the GnuPG key) after calling Duplicity; you will need the password to restore the backup later. Accordingly, you will want to make the password as long and cryptic as possible – but not so long that you forget it; otherwise, you can say goodbye to your data.
Transferring Passwords for SSH and FTP
The previous command assumes that you log in to the SSH server using private and public keys. If you want to authenticate via password, specify the Duplicity --ssh-askpass parameter. The tool then prompts you for the required SSH password when connecting. If the SSH server is not listening on the default port, you also need to specify the port in the usual way, separated by a colon after the domain name:
duplicity /etc scp://dd@example.com:2222//var/backup
If you want to store the backup on an FTP server, you need to enter the password for the server in the FTP_PASSWORD environment variable. In the following example, it is 123. For the FTP transmission method, the domain name is followed by a single slash:
FTP_PASSWORD=123 duplicity /home/tim ftp://dd@example.com/var/backup
Incidentally, Duplicity also evaluates the FTP_PASSWORD environment variable for an SSH connection. You can thus omit the --ssh-askpass parameter and define the SSH password in the FTP_PASSWORD environment variable. This is especially useful if you want to include Duplicity in a script. If you want Duplicity to create the backup archive on a local storage medium, use the file:// protocol:
duplicity /etc file:///mnt/backup
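The slash rules for the three URL forms shown so far are easy to mix up in scripts. The following sketch assembles target URLs exactly as they appear in the examples above. The helper target_url is hypothetical and not part of Duplicity; it covers only the scp, ftp, and file schemes.

```shell
#!/bin/sh
# Sketch: assemble target URLs in the forms used in this article.
# target_url is a hypothetical helper, not part of Duplicity.
# Usage: target_url SCHEME USER HOST PATH [PORT]
target_url() {
    scheme=$1 user=$2 host=$3 path=$4 port=$5
    case $scheme in
        file)
            # Local storage: no user or host, just the absolute path.
            printf 'file://%s\n' "$path"
            ;;
        scp)
            # SCP: an extra slash in front of the absolute path.
            printf 'scp://%s@%s%s/%s\n' "$user" "$host" "${port:+:$port}" "$path"
            ;;
        ftp)
            # FTP: the path follows the host (and optional port) directly.
            printf 'ftp://%s@%s%s%s\n' "$user" "$host" "${port:+:$port}" "$path"
            ;;
        *)
            echo "unsupported scheme: $scheme" >&2
            return 1
            ;;
    esac
}

target_url scp dd example.com /var/backup 2222
# prints scp://dd@example.com:2222//var/backup
```

The result can be passed straight to Duplicity as the last argument, for example with "$(target_url file "" "" /mnt/backup)" for a local target.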
Duplicity can transmit the backup archive with many other protocols, such as Rsync and WebDAV. Additionally, Duplicity can store the backups in various cloud services, including Dropbox, Azure, OpenStack Swift, and Amazon S3, along with a couple of more unusual storage options, such as sending email. Almost every new release of Duplicity adds new protocols. For a complete and quite long list, check the Duplicity man page. To access this, type man duplicity and look for the "URL format" section. The man page for the current Duplicity version is also available online [1].
For some protocols and services, Duplicity requires additional libraries and tools (see the box "Modules Used"). The backup program reports any missing helpers when called. On Linux, the standard protocols need no manual attention, but this is not true for many cloud services. For example, to access Amazon S3, you need Boto [3] version 2.0 or newer. For a complete list of the dependencies for all supported services, see the "Requirements" section of the Duplicity man page.
Modules Used
Duplicity is programmed in Python and is modular. The actual transfer of backups is handled by the back end. For example, the LFTP back end takes care of communication with FTP servers. Most back ends involve other command-line programs for data transmission. The LFTP back end requires the lftp client, for example. Sometimes you even have the choice between different back ends. For example, when transferring via SCP, Duplicity uses the Paramiko back end, based on the Python library of the same name.
If you specify pexpect+scp:// instead of the scp:// URL prefix, Duplicity uses the Pexpect Python library. In general, the predefined back ends are selected carefully. A change is only worthwhile if, for example, a library is not available.
Storing Encrypted Backups
Duplicity uses symmetric encryption by default; that is, the same password is used to encrypt and decrypt the backup. Alternatively, the tool can use GnuPG public key encryption. Here, each user has two keys: An archive locked with the public key can only be unlocked again with the private key.
If you want to use a new key pair for the backup, create the key before the first backup using gpg --gen-key. To do this, answer the questions posed; if in doubt, leave the fields blank or accept the default settings by pressing Enter (Figure 2). You will need to type the passphrase each time for encryption and decryption. At the end, GPG outputs a key ID, which you will want to remember.
Because the key pair secures your backup, you will want to save it on an external medium. Use the following two commands to create a copy of the public and private keys in the files /mnt/key_pub.gpg and /mnt/key_sec.gpg:
gpg --output /mnt/key_pub.gpg --armor --export Key-ID
gpg --output /mnt/key_sec.gpg --armor --export-secret-keys Key-ID
On another system, or after system recovery, the key can then be reloaded using gpg --import. When creating a new backup, you need to tell Duplicity the key ID of the public key using the --encrypt-key parameter:
duplicity --ssh-askpass --encrypt-key 12345678 /etc scp://dd@example.com//var/backup
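On a freshly installed system, the exported key pair has to be reloaded before this command can work. The following sketch imports both files from the location used in the export commands and then lists the secret keys so you can check the key ID. The variable KEY_DIR and the function name restore_keys are made up for this example.

```shell
#!/bin/sh
# Sketch: reload the exported key pair on another machine.
# KEY_DIR matches the export location used in the commands above.
KEY_DIR=/mnt

restore_keys() {
    # Import the public and the private key from the external medium.
    gpg --import "$KEY_DIR/key_pub.gpg" "$KEY_DIR/key_sec.gpg" || return 1
    # List the secret keys to confirm the import and look up the key ID.
    gpg --list-secret-keys
}
```

After restore_keys has run without errors, the key ID shown in the listing is the value to pass to --encrypt-key.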
Normally, you need to type the passphrase after calling the command. If Duplicity starts right away without asking, the GPG agent has probably cached the passphrase in the background. Furthermore, Duplicity buffers some metadata in the ~/.cache/duplicity/ directory, which it retrieves whenever called.
When you call Duplicity from a script, you can enter the passphrase in the PASSPHRASE environment variable. This method, however, poses a security risk: Anyone who can read the script automatically discovers the passphrase. If you set the environment variable in a script, you should at least explicitly dump its contents from memory afterward using unset. You can completely disable encryption with --no-encryption.
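One way to soften the risk described above is to keep the passphrase out of the script itself. The following sketch reads it from a separate file that only root can read, hands it to Duplicity through PASSPHRASE, and removes the variable again afterward. The path in PASSFILE and the function name run_backup are assumptions for illustration, and 12345678 is the placeholder key ID from the earlier example.

```shell
#!/bin/sh
# Sketch: unattended backup with the passphrase kept in a separate file.
# PASSFILE is a hypothetical path; protect it with restrictive
# permissions (for example, mode 600 and owned by root).
PASSFILE=/root/.duplicity-passphrase

run_backup() {
    # Read the passphrase from the file instead of embedding it here.
    PASSPHRASE=$(cat "$PASSFILE") || return 1
    export PASSPHRASE
    duplicity --encrypt-key 12345678 /etc scp://dd@example.com//var/backup
    status=$?
    # Remove the passphrase from the environment, whatever the outcome.
    unset PASSPHRASE
    return "$status"
}
```

This does not remove the risk entirely: Anyone who can read the passphrase file can still decrypt the backup, so its permissions matter as much as those of the script.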