Cgroups: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 09:54, 2 August 2017

cgroups
Original author(s): Paul Menage, Rohit Seth
Developer(s): kernel.org (Tejun Heo et al.) and freedesktop.org
Initial release: 2007
Written in: C
Operating system: Linux
Type: System software
License: GPL and LGPL
Website: www.kernel.org/doc/Documentation/cgroup-v1/ and www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

Engineers at Google (primarily Paul Menage and Rohit Seth) started the work on this feature in 2006 under the name "process containers".[1] In late 2007, the nomenclature changed to "control groups" to avoid confusion caused by multiple meanings of the term "container" in the Linux kernel context, and the control groups functionality was merged into the Linux kernel mainline in kernel version 2.6.24, which was released in January 2008.[2] Since then, developers have added many new features and controllers, such as support for kernfs,[3] firewalling,[4] and unified hierarchy.[5]

Versions

There are two versions of cgroups.

Cgroups was originally written by Paul Menage et al. and mainlined into the Linux kernel in 2007. This original implementation is now called cgroups version 1.[6]

Development and maintenance of cgroups was then taken over by Tejun Heo, who redesigned and rewrote it. This rewrite is now called version 2; the documentation of cgroups-v2 first appeared in Linux kernel 4.5, released on March 14, 2016.[7]

Unlike v1, cgroup v2 has only a single process hierarchy and discriminates between processes, not threads.
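
The difference between the two versions is visible in /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. The following Python sketch parses that format and flags the single v2 entry, which uses hierarchy ID 0 and an empty controller list. The sample text, the class name and the function names are illustrative assumptions, not part of any kernel interface.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of a /proc/<pid>/cgroup file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


def is_v2_entry(entry: CgroupEntry) -> bool:
    """The v2 hierarchy is listed with ID 0 and no controller names."""
    return entry.hierarchy_id == 0 and not entry.controllers


# Made-up sample resembling a system with v1 and v2 mounted side by side.
SAMPLE = """\
4:memory:/user.slice
2:cpu,cpuacct:/
0::/user.slice/session-1.scope
"""

if __name__ == "__main__":
    for entry in parse_proc_cgroup(SAMPLE):
        version = "v2" if is_v2_entry(entry) else "v1"
        print(version, entry)
```

On a v1 system a process appears once per mounted hierarchy, whereas under v2 it appears in exactly one place.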

Features

One of the design goals of cgroups is to provide a unified interface to many different use cases, from controlling single processes (by using nice, for example) to whole operating system-level virtualization (as provided by OpenVZ, Linux-VServer or LXC, for example). Cgroups provides:

  • Resource limiting – groups can be set to not exceed a configured memory limit, which also includes the file system cache[8][9]
  • Prioritization – some groups may get a larger share of CPU utilization[10] or disk I/O throughput[11]
  • Accounting – measures a group's resource usage, which may be used, for example, for billing purposes[12]
  • Control – freezing groups of processes, their checkpointing and restarting[12]
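
As a rough illustration of prioritization, the v1 "cpu" controller assigns each group a relative weight through its cpu.shares file (default 1024). The following Python sketch computes the fraction of CPU time each sibling group would receive under a simplified model in which all groups are fully busy and compete for the same CPUs. The group names and weights are hypothetical, and the real scheduler behaviour depends on many further factors.

```python
from typing import Dict


def cpu_share_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Return each group's fraction of CPU time under full contention.

    Simplified model: a group's fraction is its weight divided by the
    sum of the weights of all competing sibling groups.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive weight")
    return {name: weight / total for name, weight in shares.items()}


if __name__ == "__main__":
    # Hypothetical sibling groups; 1024 is the default cpu.shares value.
    weights = {"web": 2048, "batch": 1024, "logs": 1024}
    for name, fraction in cpu_share_fractions(weights).items():
        print(f"{name}: {fraction:.0%}")
```

Because the weights are relative, a group with twice the weight of a sibling receives twice the CPU time only when both are competing; an idle sibling's share is available to the others.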

Use

As an example of indirect usage, systemd assumes exclusive access to the cgroups facility.

A control group (abbreviated as cgroup) is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface;[2] for example, the "memory" controller limits memory use, "cpuacct" accounts for CPU usage, etc.
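
One way to picture hierarchical limits is that a group can never use more than the tightest limit set on itself or on any of its ancestors. The following Python sketch is a toy model of that idea, not kernel logic: the group paths, the limit values and the function name are all hypothetical.

```python
import posixpath
from typing import Dict, Optional


def effective_limit(limits: Dict[str, int], path: str) -> Optional[int]:
    """Return the tightest limit on `path` or any of its ancestors.

    `limits` maps absolute group paths to a configured limit; groups
    without an entry impose no limit of their own.
    """
    if not path.startswith("/"):
        raise ValueError("group path must be absolute")
    tightest = None
    current = posixpath.normpath(path)
    while True:
        limit = limits.get(current)
        if limit is not None and (tightest is None or limit < tightest):
            tightest = limit
        if current == "/":
            return tightest
        # Move one level up towards the root group.
        current = posixpath.dirname(current)


if __name__ == "__main__":
    # Hypothetical memory limits in MiB.
    configured = {"/services": 4096, "/services/db": 8192, "/services/web": 1024}
    for group in ("/services/db", "/services/web", "/other"):
        print(group, effective_limit(configured, group))
```

In this model the larger value configured on /services/db has no effect, because the parent group /services already caps the whole subtree.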

Control groups can be used in multiple ways:

  • By accessing the cgroup virtual file system manually.
  • By creating and managing groups on the fly using tools like cgcreate, cgexec, and cgclassify (from libcgroup).
  • Through the "rules engine daemon" that can automatically move processes of certain users, groups, or commands to cgroups as specified in its configuration.
  • Indirectly through other software that uses cgroups, such as Docker, Linux Containers (LXC) virtualization,[13] libvirt, systemd, Open Grid Scheduler/Grid Engine,[14] and Google's lmctfy.
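
As a minimal sketch of the first, manual method, the following Python code creates a group directory and writes to two control files of the v1 "memory" controller: memory.limit_in_bytes and cgroup.procs. It assumes the controller is mounted at /sys/fs/cgroup/memory (a common but not universal location) and that the caller has sufficient privileges; the group name "demo" and the helper names are hypothetical.

```python
import os
from pathlib import Path

# Assumed v1 mount point of the memory controller; this varies by system.
MEMORY_ROOT = Path("/sys/fs/cgroup/memory")


def create_cgroup(controller_root: Path, name: str) -> Path:
    """Create a group; on a real cgroup mount this is just a mkdir."""
    group = controller_root / name
    group.mkdir(parents=True, exist_ok=True)
    return group


def set_memory_limit(group: Path, limit_bytes: int) -> None:
    """Write the limit to the v1 memory controller's limit file."""
    (group / "memory.limit_in_bytes").write_text(str(limit_bytes))


def add_process(group: Path, pid: int) -> None:
    """Move a process into the group by writing its PID to cgroup.procs."""
    (group / "cgroup.procs").write_text(str(pid))


def confine_current_process(limit_bytes: int) -> None:
    """Example use: place the calling process in a limited group."""
    group = create_cgroup(MEMORY_ROOT, "demo")
    set_memory_limit(group, limit_bytes)
    add_process(group, os.getpid())
```

Calling confine_current_process(256 * 1024 * 1024) as root on a suitably configured system would limit the calling process to 256 MiB. Under cgroup v2 the file names differ, so the sketch applies to version 1 only.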

The Linux kernel documentation contains full technical details of the setup and use of control groups.[15]

Redesign

Redesign of cgroups started in 2013,[16] with additional changes brought by versions 3.15 and 3.16 of the Linux kernel.[17][18][19]

Namespace isolation

While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, UTS, network and SysV IPC namespaces.

  • The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces—albeit with different PID numbers.[20]
  • The network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables, etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.[21]
  • The UTS namespace isolates the hostname and domain name, so that they can be changed within one namespace without affecting others.
  • The mount namespace allows creating a different file system layout, or making certain mount points read-only.[22]
  • The IPC namespace isolates System V inter-process communication between namespaces.
  • The user namespace isolates the user IDs between namespaces.[23]

Namespaces are created with the "unshare" command or syscall, or by passing the corresponding flags to the "clone" syscall.[24]
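
Each namespace type corresponds to a CLONE_NEW* flag accepted by both syscalls. The following Python sketch combines such flags into a mask and calls the C library's unshare() through ctypes. The flag values are those defined in the Linux headers; the dictionary keys and helper names are illustrative assumptions, and the call normally requires appropriate privileges.

```python
import ctypes
import os
from typing import Iterable

# Flag values as defined in the Linux headers (linux/sched.h).
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,    # CLONE_NEWUTS
    "ipc": 0x08000000,    # CLONE_NEWIPC
    "user": 0x10000000,   # CLONE_NEWUSER
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def namespace_mask(kinds: Iterable[str]) -> int:
    """Combine the flags for the requested namespace types."""
    mask = 0
    for kind in kinds:
        mask |= CLONE_FLAGS[kind]
    return mask


def unshare(kinds: Iterable[str]) -> None:
    """Move the calling process into new namespaces of the given types."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_mask(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    print(hex(namespace_mask(["uts", "ipc"])))
```

For instance, calling unshare(["uts"]) with sufficient privileges lets the process change its hostname without affecting the rest of the system.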

The "ns" subsystem was added early in cgroups development to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy. This was an experiment that was later judged to be a poor fit for the cgroups API, and removed from the kernel.

Linux namespaces were inspired by the more general namespace functionality used heavily throughout Plan 9 from Bell Labs.[25]

Unified hierarchy

Whenever designing software, a software engineer seeks solutions that best address requirements regarding stability, security and performance, as well as maintainability, programmability (API) and usability (ABI). By their nature, these requirements must be balanced against each other; for example, a powerful API to user space that does not offer much functionality but carelessly exposes key inner workings might seriously compromise stability and security. This is especially true if the software is part of the Linux kernel.

Tejun Heo decided to alter cgroups to prevent these scenarios, designing and implementing a unified hierarchy with only one user space entity that has exclusive access to the facilities offered by cgroups.

Kernfs was introduced into the Linux kernel with version 3.14, the main author being Tejun Heo.[26] One of the main motivators for a separate kernfs is the cgroups file system. Kernfs is basically created by splitting off some of the sysfs logic into an independent entity so that other kernel subsystems can more easily implement their own virtual file system with handling for device connect and disconnect, dynamic creation and removal as needed or unneeded, and other attributes. Redesign continued into version 3.15 of the Linux kernel.[27]

Kernel memory control groups (kmemcg)

Kernel memory control groups (kmemcg) were merged into version 3.8 (18 February 2013) of the Linux kernel mainline.[28][29][30] The kmemcg controller can limit the amount of memory that the kernel can utilize to manage its own internal processes.

Adoption

Various projects use cgroups as their basis, including CoreOS, Docker, Hadoop, Jelastic, Kubernetes,[31] lmctfy (Let Me Contain That For You), LXC (LinuX Containers), systemd, Mesos and Mesosphere,[31] and HTCondor. Major Linux distributions have also adopted it, such as Red Hat Enterprise Linux 6 in November 2010, three years after its adoption in the mainline Linux kernel.[32]

References

  1. ^ Jonathan Corbet (29 May 2007). "Process containers". LWN.net.
  2. ^ a b Jonathan Corbet (29 October 2007). "Notes from a container". LWN.net. Retrieved 14 April 2015. The original 'containers' name was considered to be too generic – this code is an important part of a container solution, but it's far from the whole thing. So containers have now been renamed 'control groups' (or 'cgroups') and merged for 2.6.24.
  3. ^ "cgroup: convert to kernfs". 28 January 2014.
  4. ^ "netfilter: x_tables: lightweight process control group matching". 23 April 2014. Archived from the original on 24 April 2014.
  5. ^ "cgroup: prepare for the default unified hierarchy". 13 March 2014.
  6. ^ "diff between Linux kernel 4.4 and 4.5". 14 March 2016.
  7. ^ "Documentation/cgroup-v2.txt as appeared in Linux kernel 4.5". 14 March 2016.
  8. ^ Jonathan Corbet (31 July 2007). "Controlling memory use in containers". LWN.net.
  9. ^ Balbir Singh, Vaidynathan Srinivasan (July 2007). "Containers: Challenges with the memory resource controller and its performance" (PDF). Ottawa Linux Symposium.
  10. ^ Jonathan Corbet (23 October 2007). "Kernel space: Fair user scheduling for Linux". Network World. Retrieved 22 August 2012.
  11. ^ Kamkamezawa Hiroyu (19 November 2008). Cgroup and Memory Resource Controller (PDF). Japan Linux Symposium. Archived from the original (PDF presentation slides) on 22 July 2011.
  12. ^ a b Dave Hansen. Resource Management (PDF presentation slides). Linux Foundation.
  13. ^ Matt Helsley (3 February 2009). "LXC: Linux container tools". IBM developerWorks.
  14. ^ "Grid Engine cgroups Integration". Scalable Logic. 22 May 2012.
  15. ^ "cgroups". kernel.org.
  16. ^ "All About the Linux Kernel: Cgroup's Redesign". Linux.com. 15 August 2013. Retrieved 19 May 2014.
  17. ^ "The unified control group hierarchy in 3.16". LWN.net. 11 June 2014.
  18. ^ "Pull cgroup updates for 3.15 from Tejun Heo". kernel.org. 3 April 2014.
  19. ^ "Pull cgroup updates for 3.16 from Tejun Heo". kernel.org. 9 June 2014.
  20. ^ Pavel Emelyanov, Kir Kolyshkin (19 November 2007). "PID namespaces in the 2.6.24 kernel". LWN.net.
  21. ^ Jonathan Corbet (30 January 2007). "Network namespaces". LWN.net.
  22. ^ Serge E. Hallyn, Ram Pai (17 September 2007). "Applying mount namespaces". IBM developerWorks.
  23. ^ Michael Kerrisk (27 February 2013). "Namespaces in operation, part 5: User namespaces". LWN.net.
  24. ^ Janak Desai (11 January 2006). "Linux kernel documentation on unshare".
  25. ^ "The Use of Name Spaces in Plan 9". 1992.
  26. ^ "kernfs, sysfs, driver-core: implement synchronous self-removal". LWN.net. 3 February 2014. Retrieved 7 April 2014.
  27. ^ "Linux kernel source tree: kernel/git/torvalds/linux.git: cgroups: convert to kernfs". kernel.org. 11 February 2014. Retrieved 23 May 2014.
  28. ^ "memcg: kmem controller infrastructure". kernel.org. 18 December 2012.
  29. ^ "memcg: kmem accounting basic infrastructure". kernel.org. 18 December 2012.
  30. ^ "memcg: add documentation about the kmem controller". kernel.org. 18 December 2012.
  31. ^ a b "Mesosphere to Bring Google's Kubernetes to Mesos". Mesosphere.io. 10 July 2014. Retrieved 13 July 2014.
  32. ^ https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/pdf/6.0_Release_Notes/Red_Hat_Enterprise_Linux-6-6.0_Release_Notes-en-US.pdf