Checkpoint and Restore in Userspace
|Developer(s)||OpenVZ Team at Odin|
|Initial release||23 July 2012|
|Stable release||1.5 (March 2, 2015)|
|Written in||C and Assembler|
|Platform||x86-64, ARM|
|License||GNU GPL v.2|
Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, /krɪʊ/), is a software tool for the Linux operating system. It can freeze a running application (or part of it) and checkpoint it to persistent storage as a collection of files. The files can then be used to restore the application and resume it from the point at which it was frozen. The distinctive feature of the CRIU project is that it is mainly implemented in user space, rather than in the kernel.
The project is currently under active development.
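The freeze, checkpoint, and restore cycle described above is normally driven through the `criu` command-line tool. The following is a minimal sketch, not a definitive recipe: the helper names `dump_command`, `restore_command`, and `run` are hypothetical, and the use of `--shell-job` assumes the target process was started from an interactive shell. The helpers only build the argument lists, so they can be inspected without CRIU being installed.

```python
import subprocess


def dump_command(pid, images_dir, leave_running=False):
    """Build the argv for checkpointing the process tree rooted at `pid`.

    The image files are written into `images_dir`.
    """
    cmd = [
        "criu", "dump",
        "--tree", str(pid),          # root of the process tree to freeze
        "--images-dir", images_dir,  # where the checkpoint files are stored
        "--shell-job",               # assumption: process is attached to a terminal
    ]
    if leave_running:
        # Keep the original process alive after the dump completes.
        cmd.append("--leave-running")
    return cmd


def restore_command(images_dir):
    """Build the argv for restoring a process tree from `images_dir`."""
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


def run(cmd):
    """Execute a prepared command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode
```

A caller would typically run `run(dump_command(pid, "/tmp/ckpt"))` and later `run(restore_command("/tmp/ckpt"))`, usually with root privileges; the directory path here is only an illustration.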
In September 2011, the project was presented at the Linux Plumbers Conference. Most attendees took a positive view of the project, as reflected by the fact that a number of kernel patches required for implementing it were included in the mainline kernel. Andrew Morton, however, was somewhat skeptical:
A note on this: this is a project by various mad Russians to perform c/r mainly from userspace, with various oddball helper code added into the kernel where the need is demonstrated... However I'm less confident than the developers that it will all eventually work! So what I'm asking them to do is to wrap each piece of new code inside CONFIG_CHECKPOINT_RESTORE. So if it all eventually comes to tears and the project as a whole fails, it should be a simple matter to go through and delete all trace of it. (Andrew Morton)
The CRIU tool is being developed as part of the OpenVZ project, with the aim of replacing the in-kernel checkpoint/restore. Although its main focus is to support the migration of containers, it also allows users to checkpoint and restore the current state of running processes and process groups. The tool can currently be used on x86-64 and ARM systems and supports the following features:
- Processes: their hierarchy, PIDs, user, group, and session identifiers (UID, GID, SID, etc.), system capabilities, threads, and running and stopped states
- Application memory: memory-mapped files and shared memory
- Open files
- Pipes and FIFOs
- Unix domain sockets
- Network sockets, including TCP sockets in ESTABLISHED state (see below)
- System IPC
- Linux-specific kernel interfaces: inotify, signalfd, eventfd and epoll
TCP connection migration
One of the initial project goals was to support the migration of TCP connections, the biggest challenge being to suspend and then restore only one side of a connection. This was necessary for performing the live migration of containers (along with all their active network connections) between physical servers, the main use case for the checkpoint/restore feature in OpenVZ. To solve this problem, a new feature, "TCP repair mode", was implemented. The feature was included in the v3.5 mainline Linux kernel and makes it possible to disassemble and reconstruct TCP sockets without exchanging network packets with the opposite side of the connection.
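To illustrate the "disassemble" half of TCP repair mode, the sketch below switches a socket into repair mode and reads the sequence numbers of its send and receive queues. It is an illustration under stated assumptions rather than CRIU's own code: the function name `read_sequence_numbers` is hypothetical, the option numbers are taken from the Linux `<linux/tcp.h>` header and hard-coded because Python's `socket` module may not export them, and the call needs the `CAP_NET_ADMIN` capability, so the function returns `None` when repair mode is unavailable.

```python
import socket

# Socket option numbers from <linux/tcp.h> (Linux 3.5 and later).
TCP_REPAIR = 19        # enter or leave repair mode
TCP_REPAIR_QUEUE = 20  # select which queue the next calls refer to
TCP_QUEUE_SEQ = 21     # sequence number of the selected queue

# Queue selectors used with TCP_REPAIR_QUEUE.
TCP_RECV_QUEUE = 1
TCP_SEND_QUEUE = 2


def read_sequence_numbers(sock):
    """Return {'send': seq, 'recv': seq} for `sock`.

    Returns None if repair mode cannot be entered (for example, no
    CAP_NET_ADMIN capability, or a kernel without TCP repair support).
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)
    except OSError:
        return None
    try:
        seqs = {}
        for name, queue in (("send", TCP_SEND_QUEUE), ("recv", TCP_RECV_QUEUE)):
            sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, queue)
            raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ)
            # The kernel reports a 32-bit value; mask off the sign bit.
            seqs[name] = raw & 0xFFFFFFFF
        return seqs
    finally:
        # Always leave repair mode so the socket behaves normally again.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 0)
```

A full checkpoint would also need to save the queued data and the negotiated TCP options; on restore, a fresh socket placed in repair mode can have these values written back before it is bound and connected, so that no handshake packets are sent to the peer.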
The following projects provide functionality similar to CRIU:
- DMTCP (Distributed MultiThreaded Checkpointing)
- Berkeley Lab Checkpoint/Restart (BLCR)
- Linux Checkpoint/Restart (Linux-CR)