| Original author(s) | Paul Menage, Rohit Seth |
| Developer(s) | Tejun Heo et al. |
| Type | Resource management for process groups |
| License | GNU General Public License |
Work on cgroups was started by engineers at Google (primarily Paul Menage and Rohit Seth) in 2006 under the name "process containers". In late 2007 it was renamed "control groups" (because of the confusion caused by the multiple meanings of the term "container" in the Linux kernel context) and merged into kernel version 2.6.24. Since then, many new features and controllers have been added, such as support for kernfs and the introduction of a unified hierarchy.
One of the design goals of cgroups is to provide a unified interface to many different use cases, from controlling single processes (as nice does) to full operating-system-level virtualization (as in OpenVZ, Linux-VServer and LXC). Cgroups provides:
- Resource limiting: groups can be set to not exceed a configured memory limit, which also covers the file system cache. The original paper, "Containers: Challenges with the memory resource controller and its performance", was presented at the Ottawa Linux Symposium.
- Prioritization: some groups may get a larger share of CPU or disk I/O throughput.
- Accounting: measuring how much of each resource certain systems use, for example for billing purposes.
- Control: freezing groups or checkpointing and restarting.
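The prioritization item above can be illustrated with a little arithmetic. The sketch below assumes a weight-based scheme in the style of the cgroup v1 `cpu.shares` setting, where sibling groups that are all busy receive CPU time in proportion to their weights. The group names and weights are hypothetical, and the function models the idea only, not the kernel scheduler's actual behaviour.

```python
from fractions import Fraction


def cpu_fractions(shares):
    """Return each group's fraction of CPU time under full contention.

    `shares` maps a (hypothetical) group name to an integer weight.
    Assumption: every group is busy, so time is split in proportion to weight.
    """
    total = sum(shares.values())
    if total <= 0 or any(weight < 0 for weight in shares.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    return {name: Fraction(weight, total) for name, weight in shares.items()}


# Hypothetical example: "batch" is given half the weight of "web".
example = cpu_fractions({"web": 1024, "batch": 512})
for name, fraction in example.items():
    print(f"{name}: {fraction} of CPU time under contention")
```

When a group is idle, its unused time would be available to the others, so these fractions describe a floor under contention rather than a hard cap.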
A control group is a collection of processes that are bound by the same criteria. These groups can be hierarchical, where each group inherits limits from its parent group. The kernel provides access to multiple controllers (subsystems) through the cgroup interface. For instance, the "memory" controller limits memory use, "cpuacct" accounts CPU usage, etc.
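One way to see which groups a process belongs to, and which controllers are attached to each hierarchy, is to read `/proc/<pid>/cgroup` on a Linux system. The sketch below parses that file's `hierarchy-ID:controller-list:cgroup-path` line format. The sample text is invented for illustration, and the `Membership` type and function name are hypothetical helpers, not part of any kernel or library API.

```python
from typing import List, NamedTuple


class Membership(NamedTuple):
    hierarchy_id: int
    controllers: List[str]  # empty when no controller names are listed
    path: str               # group path relative to the hierarchy's root


def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into Membership records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice: the path itself may contain ':' characters.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [name for name in controllers.split(",") if name]
        entries.append(Membership(int(hierarchy_id), names, path))
    return entries


# Invented sample: a process in /batch/job1 for "memory" and in /batch
# for the co-mounted "cpu" and "cpuacct" controllers.
SAMPLE = "4:memory:/batch/job1\n3:cpu,cpuacct:/batch\n"

for entry in parse_proc_cgroup(SAMPLE):
    print(entry.hierarchy_id, entry.controllers, entry.path)
```

On a live system the same function could be fed the result of reading `/proc/self/cgroup`; the nesting of the paths reflects the hierarchy described above.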
Control groups can be used in multiple ways:
- By accessing the cgroup virtual file system manually.
- By creating and managing groups on the fly using tools like cgcreate, cgexec, and cgclassify (from libcgroup).
- Through the "rules engine daemon", which can automatically move processes of certain users, groups, or commands to cgroups as specified in its configuration.
- Indirectly through other software that uses cgroups, such as Linux Containers (LXC) virtualization, libvirt, systemd, Open Grid Scheduler/Grid Engine, and Google's lmctfy.
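The first option, manual access to the cgroup virtual file system, amounts to creating directories and writing to control files. The following is a minimal sketch under several assumptions: a cgroup v1 memory controller whose mount point is passed in by the caller (commonly `/sys/fs/cgroup/memory`, but this varies by system), the v1 control file names `memory.limit_in_bytes` and `cgroup.procs`, and sufficient privileges to write there. The group name `demo` and the helper functions are hypothetical.

```python
import os
import tempfile


def create_group(controller_root, name):
    """Create a group; on a real cgroup mount, mkdir makes the kernel populate it."""
    path = os.path.join(controller_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def write_control_file(group_path, filename, value):
    """Write a single value to one of the group's control files."""
    with open(os.path.join(group_path, filename), "w") as handle:
        handle.write(f"{value}\n")


def limit_memory(group_path, limit_bytes):
    # Assumed cgroup v1 memory controller file name.
    write_control_file(group_path, "memory.limit_in_bytes", limit_bytes)


def add_process(group_path, pid):
    # Writing a PID to cgroup.procs moves that process into the group.
    write_control_file(group_path, "cgroup.procs", pid)


def demo(controller_root):
    """Create a hypothetical 'demo' group, cap it at 256 MiB, and join it."""
    group = create_group(controller_root, "demo")
    limit_memory(group, 256 * 1024 * 1024)
    add_process(group, os.getpid())
    return group


if __name__ == "__main__":
    # Dry run against a scratch directory, so no real cgroup is touched.
    with tempfile.TemporaryDirectory() as scratch:
        print("wrote control files under", demo(scratch))
```

Passing the mount point as a parameter keeps the sketch harmless to try: pointed at a scratch directory it only creates ordinary files, while pointed at a real controller mount it would perform the same writes that a tool like cgcreate or cgclassify performs on the user's behalf.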
The Linux kernel documentation contains full technical details of the setup and use of control groups.
While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, UTS, network and SysV IPC namespaces.
- The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces—albeit with different PID numbers.
- The network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables, etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.
- The UTS namespace allows each namespace to have its own hostname.
- The mount namespace allows creating a different file system layout, or making certain mount points read-only.
- The IPC namespace isolates System V inter-process communication between namespaces.
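On Linux, the namespaces a process belongs to are visible as symbolic links under `/proc/<pid>/ns`, whose targets look like `pid:[4026531836]`; two processes share a namespace when the bracketed numbers match. The sketch below reads and compares those links. The helper names are hypothetical, and reading another process's links may fail without sufficient permissions.

```python
import os
import re

# Link targets have the form "<type>:[<inode number>]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self", proc_root="/proc"):
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


def shared_namespaces(first, second):
    """Return the entries for which both processes are in the same namespace."""
    return sorted(
        name for name in first
        if name in second and first[name] == second[name]
    )
```

For example, `shared_namespaces(namespaces_of(1), namespaces_of("self"))` would list the namespace types the current process shares with PID 1; inside a container, entries such as `pid` or `net` would typically be absent from that list.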
Early in cgroups development, the "ns" subsystem was added, to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy. This was an experiment that was later judged to be a poor fit for the cgroups API, and removed from the kernel.
When designing software, engineers pick the solutions that best balance the requirements of stability, security and performance, as well as maintainability, programmability (API) and usability (ABI). These requirements pull against each other: for example, a powerful user-space API that does not offer excessive functionality, but simply exposes some of the wrong internal workings, may seriously compromise kernel stability and security. This is especially true of software that is part of the Linux kernel. For these reasons, Tejun Heo decided to change cgroups so that there would be a single unified hierarchy and so that only one user-space entity would have exclusive access to the facilities offered by cgroups.
Kernfs was introduced into the Linux kernel in version 3.14, with Tejun Heo as its main author. One of the main motivations for a separate kernfs was the cgroups file system. Kernfs essentially splits some of the sysfs logic off into an independent entity, so that other kernel subsystems can more easily implement their own virtual file system with handling for device connect and disconnect, dynamic creation and removal as needed, and other attributes.
It is anticipated that debugfs will move to being Kernfs-based in the future as well.
Various projects use cgroups as their basis.
- Jonathan Corbet (29 May 2007). "Process containers". LWN.net.
- Jonathan Corbet (29 October 2007). "Notes from a container". LWN.net.
- "cgroup: convert to kernfs". 2014-01-28.
- "cgroup: prepare for the default unified hierarchy". 2014-03-13.
- Jonathan Corbet (31 July 2007). "Controlling memory use in containers". LWN.net.
- Balbir Singh, Vaidyanathan Srinivasan (July 2007). "Containers: Challenges with the memory resource controller and its performance". Ottawa Linux Symposium.
- Jonathan Corbet (23 October 2007). "Kernel space: Fair user scheduling for Linux". Network World. Retrieved 2012-08-22.
- Kamezawa Hiroyuki (19 November 2008). "Cgroup and Memory Resource Controller" (PDF presentation slides). Japan Linux Symposium.
- Dave Hansen. "Resource Management" (PDF presentation slides). Linux Foundation.
- Matt Helsley (3 February 2009). "LXC: Linux container tools". IBM developerWorks.
- "Grid Engine cgroups Integration". Scalable Logic. 2012-05-22.
- cgroups, kernel.org
- Pavel Emelyanov, Kir Kolyshkin (19 November 2007). "PID namespaces in the 2.6.24 kernel". LWN.net.
- Jonathan Corbet (30 January 2007). "Network namespaces". LWN.net.
- Serge E. Hallyn, Ram Pai (17 September 2007). "Applying mount namespaces". IBM developerWorks.
- Janak Desai (11 January 2006). "Linux kernel documentation on unshare".
- "kernfs, sysfs, driver-core: implement synchronous self-removal". LWN.net. 2014-02-03. Retrieved 2014-04-07.