Save and Restore Linux Processes with CRIU

On Ice

Pitfalls

"Freezing and thawing" does not work for every process. The CRIU developers maintain a short list of officially supported software [6], which includes programs such as Make, GCC, Tar, Git, Apache, MySQL, SSH, and MongoDB. You can never freeze processes that are accessing a hardware device, whether block or character, for two reasons: The precise functions of the device are hidden from CRIU, and the device could be missing when the process is restored.

Because CRIU uses the same interfaces as a debugger, the tool cannot freeze any processes that are already being monitored by Strace or GDB. Also, CRIU does not back up processes that hold file locks, because it cannot determine whether another process is allowed access to the file in question. However, the optional --file-locks parameter forces CRIU to back up the lock, too. Furthermore, version 1.1 of CRIU cannot cope with the Btrfs filesystem, although the developers are working on a solution.
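Whether a debugger is already attached can be checked before calling CRIU. The following is a minimal sketch, not part of CRIU itself; it assumes a Linux /proc filesystem and relies on the TracerPid line of /proc/<pid>/status, which the kernel sets to 0 when no tracer is attached. The function names are made up for this illustration.

```python
from pathlib import Path


def parse_tracer_pid(status_text: str) -> int:
    """Return the TracerPid value found in the text of /proc/<pid>/status.

    A value of 0 means that no tracer (Strace, GDB, ...) is attached.
    """
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "TracerPid":
            return int(value.strip())
    raise ValueError("no TracerPid line found")


def tracer_pid(pid: int) -> int:
    """Read TracerPid for a live process (Linux only)."""
    return parse_tracer_pid(Path(f"/proc/{pid}/status").read_text())


# Hypothetical usage on a Linux box:
#   if tracer_pid(1234) != 0:
#       print("process 1234 is being traced; detach the debugger first")
```

A nonzero result only says that some tracer is attached; it does not tell you which tool it is.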

Additionally, some values can change after a process is restored, including the IDs of mountpoints and sockets, as well as the process start time. The start time is stored in field 22 of the process's stat file; for a process with an ID of 1234, for example, cat /proc/1234/stat reveals it.
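Counting to field 22 by eye is error-prone, because field 2 (the command name) can itself contain spaces. The following sketch shows one way to extract the value; it assumes the field layout documented in the proc(5) man page, where the start time is given in clock ticks since boot. The function names are chosen for this example only.

```python
import os
from pathlib import Path


def parse_start_ticks(stat_text: str) -> int:
    """Return field 22 (start time in clock ticks since boot)
    from the text of /proc/<pid>/stat."""
    # Field 2 is wrapped in parentheses and may contain spaces or
    # parentheses itself, so split only what follows the last ")".
    _, _, rest = stat_text.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (the state), so field 22 is at index 22 - 3 = 19.
    return int(fields[19])


def start_seconds_after_boot(pid: int) -> float:
    """Start time of a live process in seconds after boot (Linux only)."""
    ticks = parse_start_ticks(Path(f"/proc/{pid}/stat").read_text())
    return ticks / os.sysconf("SC_CLK_TCK")
```

Comparing the value before the dump and after the restore makes the change described above visible.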

On the Network

With the help of the Linux kernel, CRIU can also freeze programs that communicate over a TCP connection. To do so, it closes the socket and blocks the TCP connection with additional firewall rules. CRIU thus ensures that the connection remains in the state in which it was saved. These firewall rules must therefore still be in the Netfilter tables when you restore the process.

To have CRIU handle established TCP connections, you pass it the --tcp-established parameter for both the backup and the restore. Such a process can only be frozen and restored exactly once. Any further attempt will fail, because the TCP connection then has a different state. The CRIU wiki describes the technical background in detail [7].
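Because the parameter has to appear on both sides, it can help to build the two command lines from the same place. The sketch below only assembles argument lists and does not run anything; the helper name criu_cmd and the image directory are assumptions made for this example, whereas --tree, --images-dir, and --tcp-established are regular CRIU options.

```python
from typing import List, Optional


def criu_cmd(action: str, images_dir: str, pid: Optional[int] = None,
             tcp_established: bool = False) -> List[str]:
    """Build the argument list for 'criu dump' or 'criu restore'."""
    if action not in ("dump", "restore"):
        raise ValueError("action must be 'dump' or 'restore'")
    cmd = ["criu", action, "--images-dir", images_dir]
    if action == "dump":
        if pid is None:
            raise ValueError("dump needs the PID of the process to freeze")
        cmd += ["--tree", str(pid)]
    if tcp_established:
        # Must be passed for the backup AND for the restore.
        cmd.append("--tcp-established")
    return cmd


# Hypothetical usage (as root, with CRIU installed):
#   import subprocess
#   subprocess.run(criu_cmd("dump", "/tmp/images", pid=1234,
#                           tcp_established=True), check=True)
#   subprocess.run(criu_cmd("restore", "/tmp/images",
#                           tcp_established=True), check=True)
```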

CRIU freezes not only the process itself but, for safety's sake, its child processes and any dependent processes as well. If two programs talk through a pipe or a Unix socket, CRIU therefore needs to freeze both simultaneously.

In some situations, however, CRIU can only freeze one of the two processes; in this case, CRIU refuses to continue and outputs an "external socket is used" error message. Using the --ext-unix-sk parameter for the backup and restore, you can still persuade CRIU at least to back up a process. When restoring, however, you must then ensure that the remote end of the socket already exists.
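If the remote end is a server bound to a path in the filesystem, a quick check before the restore can catch the most obvious problem. This is only a sketch under that assumption: it tests whether a socket file exists at the given path, not whether a program is actually listening on it, and it does not help with unnamed or abstract sockets. The function name is made up for this example.

```python
import os
import stat


def unix_socket_exists(path: str) -> bool:
    """Return True if 'path' exists and is a Unix domain socket file."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, missing permissions, and so on.
        return False


# Hypothetical usage before 'criu restore --ext-unix-sk ...':
#   if not unix_socket_exists("/run/myserver.sock"):
#       raise SystemExit("start the server before restoring the client")
```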

Conclusions

CRIU provides extremely useful functionality, and many fixes have been made since the version used in this article [8], but caution is advised in production use. Extensive pretesting with the proposed setup is therefore an essential requirement. CRIU development is progressing at a rapid pace, and anyone who wants to freeze and migrate processes should keep CRIU in mind (see also the "Remote Control" box). Further information, including insight into CRIU techniques, is provided by the extensive wiki [5].

Remote Control

CRIU can also be started as a daemon and then remotely controlled by RPC calls:

criu service --daemon -o logdatei.txt

The --daemon option starts CRIU as precisely that; -o names the logfile to which CRIU writes its output. The tool then waits for RPC requests on a Unix socket of type SOCK_SEQPACKET at /var/run/criu-service.socket. Client programs send their messages there using the CRIU-RPC protocol. The CRIU wiki describes the detailed structure of these messages [9]; sample programs for C and Python are found in the CRIU source code directory below test/rpc. C programmers can alternatively use the libcriu library, which provides wrapper functions for the corresponding RPC calls [10].
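The transport side of such a client is short. The following sketch only opens the socket and swaps one message for one reply; it deliberately treats the payload as opaque bytes, because the actual requests and responses are Protocol Buffers messages whose definitions ship with the CRIU sources. The function names and the 1024-byte default are assumptions for this illustration.

```python
import socket

# Socket path as named in the text.
CRIU_SERVICE_SOCKET = "/var/run/criu-service.socket"


def connect_criu_service(path: str = CRIU_SERVICE_SOCKET) -> socket.socket:
    """Connect to a Unix socket of type SOCK_SEQPACKET (Linux)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock


def exchange(sock: socket.socket, request: bytes, max_size: int = 1024) -> bytes:
    """Send one already serialized request and return one reply.

    SOCK_SEQPACKET preserves message boundaries, so a single recv()
    returns a single message (truncated if it exceeds max_size).
    """
    sock.send(request)
    return sock.recv(max_size)
```

A real client would serialize a request message before calling exchange() and parse the returned bytes as a response; the samples below test/rpc show the message types involved.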
