Checkpoint and Restore in Userspace
|Developer(s)||OpenVZ Team at Virtuozzo|
|Initial release||23 July 2012|
|Stable release||2.11 (February 13, 2017)|
|Written in||C and Assembler|
|Platform||x86-64, ARM, PPC|
|License||GNU GPL v.2|
Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, /krɪʊ/), is a software tool for the Linux operating system. Using this tool, it is possible to freeze a running application (or part of it) and checkpoint it to persistent storage as a collection of files. One can then use the files to restore and run the application from the point at which it was frozen. The distinctive feature of the CRIU project is that it is mainly implemented in user space, rather than in the kernel.
The project is currently under active development, with a monthly release cycle for stable releases.
In September 2011, the project was presented at the Linux Plumbers Conference. Most of the attendees took a positive view of the project, as evidenced by the inclusion of a number of kernel patches required by the project in the mainline kernel. Andrew Morton, however, was somewhat skeptical:
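As a rough illustration of the freeze-and-restore cycle described above, the following Python sketch assembles `criu dump` and `criu restore` command lines without running them. The helper names and the `/tmp/ckpt` directory are hypothetical. The options shown (`--tree`, `--images-dir`, `--shell-job`, `--tcp-established`, `--leave-running`) belong to the CRIU command-line tool, but their availability may differ between versions, and running the commands normally requires root privileges.

```python
import shlex


def criu_dump_cmd(pid, images_dir, shell_job=False,
                  tcp_established=False, leave_running=False):
    """Build (but do not run) a `criu dump` command line.

    Hypothetical helper: it only assembles arguments for the criu CLI.
    """
    # --tree selects the root of the process tree to checkpoint;
    # --images-dir is where the image files are written.
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        # Needed when the process was started from an interactive shell.
        cmd.append("--shell-job")
    if tcp_established:
        # Ask CRIU to checkpoint TCP sockets in the ESTABLISHED state.
        cmd.append("--tcp-established")
    if leave_running:
        # Keep the application running after the dump instead of killing it.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir, shell_job=False, tcp_established=False):
    """Build (but do not run) the matching `criu restore` command line."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    if tcp_established:
        cmd.append("--tcp-established")
    return cmd


if __name__ == "__main__":
    # Print the commands in a form that could be pasted into a shell.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt", shell_job=True)))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt", shell_job=True)))
```

The lists returned by these helpers could be passed to `subprocess.run` on a system where CRIU is installed; the sketch stops short of that so that it stays runnable anywhere.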
"A note on this: this is a project by various mad Russians to perform c/r mainly from userspace, with various oddball helper code added into the kernel where the need is demonstrated... However I'm less confident than the developers that it will all eventually work! So what I'm asking them to do is to wrap each piece of new code inside CONFIG_CHECKPOINT_RESTORE. So if it all eventually comes to tears and the project as a whole fails, it should be a simple matter to go through and delete all trace of it." (Andrew Morton)
The CRIU tool is being developed as part of the OpenVZ project, with the aim of replacing the in-kernel checkpoint/restore implementation. Its main focus is to support the migration of containers, allowing users to checkpoint and restore the current state of running processes and process groups. The tool can currently be used on x86-64 and ARM systems and supports the following features:
- Processes: their hierarchy, PIDs, user, group and session identifiers (UID, GID, SID, etc.), system capabilities, threads, and running and stopped states
- Application memory: memory-mapped files and shared memory
- Open files
- Pipes and FIFOs
- Unix domain sockets
- Network sockets, including TCP sockets in ESTABLISHED state (see below)
- System IPC
- Linux kernel-specific system calls: inotify, signalfd, eventfd and epoll
As of September 2013, no kernel patching is required: all of the required functionality has been part of the mainline Linux kernel since version 3.11, which was released on September 2, 2013.
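The version requirement above can be checked with a short script. The following is a minimal sketch, assuming the kernel release string starts with `major.minor` (as in `5.15.0-91-generic`); the function names are hypothetical. It is only a heuristic: distribution kernels may backport features to older version numbers, and the kernel must also be built with the relevant options, such as CONFIG_CHECKPOINT_RESTORE.

```python
import platform
import re

# Minimum mainline kernel version stated in the text.
MIN_KERNEL = (3, 11)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release, minimum=MIN_KERNEL):
    """Return True if the release is at least the given (major, minor) version."""
    # Tuples compare element by element, so (4, 4) >= (3, 11) is True.
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        print(release, kernel_meets_minimum(release))
    except ValueError as exc:
        # Non-Linux systems may report a release string in another format.
        print(exc)
```

On a system with CRIU installed, the tool's own `criu check` command is likely to be a more reliable test, since it probes the running kernel rather than relying on the version number.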
TCP connection migration
One of the initial project goals was to support the migration of TCP connections, the biggest challenge being to suspend and then restore only one side of a connection. This was necessary for performing the live migration of containers (along with all their active network connections) between physical servers, the main use case for the checkpoint/restore feature in OpenVZ. To solve this problem, a new feature, "TCP repair mode", was implemented. The feature was included in version 3.5 of the Linux kernel mainline and provides users with additional means to disassemble and reconstruct TCP sockets without exchanging network packets with the opposite side of the connection.
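To give a feel for how repair mode is driven from user space, the sketch below puts a fresh TCP socket into repair mode and sets its send-queue sequence number through `setsockopt`. It is an illustration under stated assumptions, not CRIU's actual code: the function name is hypothetical, the numeric option values are taken from the Linux header `<linux/tcp.h>` and hard-coded because Python's `socket` module may not export them, and the calls only succeed on Linux for a process with the CAP_NET_ADMIN capability. A real restore would also have to refill the send and receive queues, restore negotiated TCP options and re-establish the connection, all of which are omitted here.

```python
import socket

# Option values from the Linux UAPI header <linux/tcp.h> (assumed here,
# since the socket module may not define them on every Python version).
TCP_REPAIR = 19        # switch repair mode on (1) or off (0)
TCP_REPAIR_QUEUE = 20  # choose which queue later repair options act on
TCP_QUEUE_SEQ = 21     # get or set the sequence number of the chosen queue
TCP_SEND_QUEUE = 2     # queue selector: the send queue


def set_send_seq_in_repair_mode(seq):
    """Put a fresh TCP socket into repair mode and set its send sequence number.

    Returns the value read back from the kernel, or None if repair mode is
    unavailable (non-Linux system, old kernel, or missing CAP_NET_ADMIN).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Enter repair mode; no packets are exchanged with any peer.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)
        # Select the send queue, then overwrite its sequence number.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)
        sock.setsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ, seq)
        return sock.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ)
    except OSError:
        # Covers permission errors and kernels that lack the options.
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    print(set_send_seq_in_repair_mode(12345))
```

Returning `None` on failure keeps the sketch usable on unprivileged or non-Linux systems, where the repair options are rejected.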
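The following projects provide functionality similar to CRIU:
- DMTCP (Distributed MultiThreaded CheckPointing)
- Berkeley Lab Checkpoint/Restart (BLCR)
- Linux Checkpoint/Restart
References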
- Pavel Emelyanov (2012-07-23). "Checkpoint-restore tool v0.1".
- "criu 2.11 released".
- Pavel Emelyanov (2011-07-15). "Checkpoint/restore mostly in the userspace".
- "Checkpoint/restart in the userspace". Linux Plumbers Conf 2011.
- "Merge branch 'akpm' (aka "Andrew's patch-bomb, take two")". Linux kernel source tree. 2012-01-13.
- "Installation: Linux Kernel". Quote: "Linux kernel v3.11 or newer is required, with some specific options set."
- "Linux kernel 3.11, Section 1.5. Detailed tracking of which pages a task writes". kernelnewbies.org. 2013-09-02. Retrieved 2016-05-03.
- Pavel Emelyanov (2012-02-29). "TCP connection repair". Linux Netdev Mailing List.
- "DMTCP: Distributed MultiThreaded CheckPointing". SourceForge.
- "Berkeley Lab Checkpoint/Restart (BLCR) for LINUX". Lawrence Berkeley National Laboratory.
- "Linux Checkpoint/Restart". kernel.org.
- Sanidhya Kashyap. "Rebootless Kernel Update and Validation".
- Rami Rosen. "Linux Containers and the Future Cloud" (PDF).