OS-level virtualization

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Georgewilliamherbert (talk | contribs) at 01:30, 18 November 2016 (+ category Linux Containerization). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Operating-system-level virtualization is a server virtualization method in which the kernel of an operating system allows the existence of multiple isolated user-space instances, instead of just one. Such instances, which are sometimes called containers, software containers,[1] virtualization engines (VEs) or jails (FreeBSD jail or chroot jail), may look and feel like a real server from the point of view of their owners and users.

On Unix-like operating systems, this technology can be seen as an advanced implementation of the standard chroot mechanism. In addition to isolation mechanisms, the kernel often provides resource-management features to limit the impact of one container's activities on other containers.
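As a rough illustration of the chroot mechanism mentioned above, the following Python sketch confines a child process to a directory and reports what that child sees as its root. The helper name `list_root_inside_chroot` is hypothetical, and the underlying call normally requires root privileges, so the sketch returns `None` when the chroot is refused. It is a minimal sketch only: it shows the file-system view changing and nothing more, with none of the resource-management features that container implementations add.

```python
import os
import tempfile


def list_root_inside_chroot(new_root):
    """Fork a child, chroot it into new_root, and return what it sees in "/".

    Returns None if the process is not allowed to call chroot().
    (Hypothetical helper for illustration; Unix-only.)
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: confine ourselves, then report the visible root directory.
        os.close(read_fd)
        status = 1
        try:
            os.chroot(new_root)  # change the apparent root directory
            os.chdir("/")        # make sure the working directory is inside it
            names = sorted(os.listdir("/"))
            os.write(write_fd, "\n".join(names).encode())
            status = 0
        except OSError:
            pass                 # typically PermissionError when not root
        finally:
            os._exit(status)     # never fall through into the parent's code
    # Parent: collect the child's report and its exit status.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        return None
    return data.decode().split("\n") if data else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "marker.txt"), "w").close()
        # As root this prints the directory contents as seen by the child;
        # without the privilege it prints None.
        print(list_root_inside_chroot(root))
```

Only the child process is confined; the parent keeps its original view of the file system, which is why the result is passed back through a pipe.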

Uses

Operating-system-level virtualization is commonly used in virtual hosting environments, where it is useful for securely allocating finite hardware resources amongst a large number of mutually distrusting users. System administrators may also use it, to a lesser extent, for consolidating server hardware by moving services from separate hosts into containers on a single server.

Other typical scenarios include separating several applications into separate containers for improved security, hardware independence, and added resource management features. The improved security provided by the use of a chroot mechanism, however, is far from ironclad.[2] Operating-system-level virtualization implementations capable of live migration can also be used for dynamic load balancing of containers between nodes in a cluster.

Overhead

Operating-system-level virtualization usually imposes little to no overhead, because programs in virtual partitions use the operating system's normal system call interface and need not be emulated or run in an intermediate virtual machine, as is the case with whole-system virtualizers (such as VMware ESXi, QEMU or Hyper-V) and paravirtualizers (such as Xen or UML). This form of virtualization also does not require hardware support to perform efficiently.

Flexibility

Operating-system-level virtualization is not as flexible as other virtualization approaches, since it cannot host a guest operating system that differs from the host's, or a different guest kernel. For example, with Linux, different distributions are fine, but other operating systems such as Windows cannot be hosted.
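One way to observe this on Linux is to compare the kernel release with the distribution name reported by the user space. The sketch below assumes the environment provides an `/etc/os-release` file (common on current distributions, but an assumption here); the helper names `parse_os_release` and `describe_environment` are hypothetical.

```python
import platform


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def describe_environment(os_release_path="/etc/os-release"):
    """Return (kernel_release, distribution_name) for the current environment."""
    try:
        with open(os_release_path, encoding="utf-8") as handle:
            distro = parse_os_release(handle.read()).get("PRETTY_NAME", "unknown")
    except OSError:
        distro = "unknown"  # file absent, for example on a non-Linux system
    return platform.release(), distro


if __name__ == "__main__":
    kernel, distro = describe_environment()
    print("kernel:", kernel)
    print("distribution:", distro)
```

Run on a host and inside containers of other distributions on that host, the kernel line would be expected to stay the same while the distribution line changes, since the containers share the host kernel.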

Solaris partially overcomes this limitation with its branded zones feature, which provides the ability to run an environment within a container that emulates an older Solaris 8 or 9 version on a Solaris 10 host. Linux branded zones (referred to as "lx" branded zones) are also available on x86-based Solaris systems, providing a complete Linux userspace and support for the execution of Linux applications; additionally, Solaris provides the utilities needed to install Red Hat Enterprise Linux 3.x or CentOS 3.x Linux distributions inside "lx" zones.[3][4] However, in 2010 Linux branded zones were removed from Solaris; in 2014 they were reintroduced in illumos, the open-source Solaris fork, supporting 32-bit Linux kernels.[5]

Storage

Some operating-system-level virtualization implementations provide file-level copy-on-write (CoW) mechanisms. (Most commonly, a standard file system is shared between partitions, and those partitions that change the files automatically create their own copies.) This is easier to back up, more space-efficient and simpler to cache than the block-level copy-on-write schemes common on whole-system virtualizers. Whole-system virtualizers, however, can work with non-native file systems and create and roll back snapshots of the entire system state.
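The file-level copy-on-write idea can be sketched with a toy model in Python: each partition reads from a shared base directory until it changes a file, at which point it gets a private copy. The class name `CowPartition` and its methods are hypothetical, the model handles only flat file names, and it is not how any particular implementation listed in this article works.

```python
import os
import shutil
import tempfile


class CowPartition:
    """Toy file-level copy-on-write view over a shared base directory."""

    def __init__(self, base_dir, private_dir):
        self.base_dir = base_dir        # shared between partitions, never written
        self.private_dir = private_dir  # this partition's own copies

    def _resolve(self, name):
        """Return the path a read should use: private copy first, else base."""
        private_path = os.path.join(self.private_dir, name)
        if os.path.exists(private_path):
            return private_path
        return os.path.join(self.base_dir, name)

    def read(self, name):
        with open(self._resolve(name), encoding="utf-8") as handle:
            return handle.read()

    def append(self, name, text):
        """Modify a file, copying it into the private area on first change."""
        private_path = os.path.join(self.private_dir, name)
        if not os.path.exists(private_path):
            base_path = os.path.join(self.base_dir, name)
            if os.path.exists(base_path):
                shutil.copy2(base_path, private_path)  # the "copy" in copy-on-write
        with open(private_path, "a", encoding="utf-8") as handle:
            handle.write(text)


if __name__ == "__main__":
    base, first, second = (tempfile.mkdtemp() for _ in range(3))
    with open(os.path.join(base, "motd"), "w", encoding="utf-8") as handle:
        handle.write("hello\n")
    a, b = CowPartition(base, first), CowPartition(base, second)
    a.append("motd", "changed by a\n")
    print(repr(a.read("motd")))  # a sees its private, modified copy
    print(repr(b.read("motd")))  # b still reads the shared base file
```

Because unchanged files exist only once, in the base directory, backing up a partition in this model means copying just its private directory, which is the space and backup advantage described above.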

Implementations

| Mechanism | Operating system | License | Available since or between | File system isolation | Copy on Write | Disk quotas | I/O rate limiting | Memory limits | CPU quotas | Network isolation | Nested virtualization | Partition checkpointing and live migration | Root privilege isolation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chroot | most UNIX-like operating systems | varies by operating system | 1982 | Partial[a] | No | No | No | No | No | No | Yes | No | No |
| Docker | Linux[7] | Apache License 2.0 | 2013 | Yes | Yes | Not directly | Yes (since 1.10) | Yes | Yes | Yes | Yes | No | Yes (since 1.10) |
| Linux-VServer (security context) | Linux | GNU GPLv2 | 2001 | Yes | Yes | Yes | Yes[b] | Yes | Yes | Partial[c] | ? | No | Partial[d] |
| lmctfy | Linux | Apache License 2.0 | 2013 | Yes | Yes | Yes | Yes[b] | Yes | Yes | Partial[c] | ? | No | Partial[d] |
| LXC | Linux | GNU GPLv2 | 2008 | Yes[9] | Yes | Partial[e] | Partial[f] | Yes | Yes | Yes | Yes | No | Yes[9] |
| LXD[10] | Linux | Apache License 2.0 | 2015 | Yes | Yes | Partial (see LXC) | Partial (see LXC) | Yes | Yes | Yes | Yes | Partial[g] | Yes |
| OpenVZ | Linux | GNU GPLv2 | 2005 | Yes | Yes (ZFS) | Yes | Yes[h] | Yes | Yes | Yes[i] | Partial[j] | Yes | Yes[k] |
| Virtuozzo | Linux, Windows | Proprietary | 2000[15] | Yes | Yes | Yes | Yes[l] | Yes | Yes | Yes[i] | Partial[m] | Yes | Yes |
| Solaris Containers (Zones) | illumos (OpenSolaris), Solaris | CDDL, Proprietary | 2004 | Yes | Yes (ZFS) | Yes | Partial[n] | Yes | Yes | Yes[o][18][19] | Partial[p] | Partial[q][r] | Yes[s] |
| FreeBSD jail | FreeBSD | BSD License | 2000[21] | Yes | Yes (ZFS) | Yes[t] | Yes | Yes[22] | Yes | Yes[23] | Yes | Partial[24][25] | Yes[26] |
| sysjail | OpenBSD, NetBSD | BSD License | 2006–2009 | Yes | No | No | No | No | No | Yes | No | No | ? |
| WPARs | AIX | Proprietary | 2007 | Yes | No | Yes | Yes | Yes | Yes | Yes[u] | No | Yes[28] | ? |
| HP-UX Containers (SRP) | HPUX | Proprietary | 2007 | Yes | No | Partial[v] | Yes | Yes | Yes | Yes | ? | Yes | ? |
| iCore Virtual Accounts | Windows XP | Proprietary: Freeware | 2008 | Yes | No | Yes | No | No | No | No | ? | No | ? |
| Sandboxie | Windows | Proprietary: Shareware | 2004 | Yes | Yes | Partial | No | No | No | Partial | No | No | Yes |
| Spoon | Windows | Proprietary | 2012 | Yes | Yes | No | No | No | No | Yes | No | No | Yes |
| VMware ThinApp | Windows | Proprietary | 2008 | Yes | Yes | No | No | No | No | Yes | No | No | Yes |

Notes

  a. The root user can easily escape from a chroot. Chroot was never supposed to be used as a security mechanism.[6]
  b. Utilizing the CFQ scheduler, there is a separate queue per guest.
  c. Networking is based on isolation, not virtualization.
  d. A total of 14 user capabilities are considered safe within a container. The rest cannot be granted to processes within that container without allowing those processes to potentially interfere with things outside that container.[8]
  e. Disk quotas per container are possible when using separate partitions for each container with the help of LVM, or when the underlying host filesystem is Btrfs, in which case Btrfs subvolumes are automatically used.
  f. I/O rate limiting is supported when using Btrfs.
  g. In progress: works on non-systemd operating systems.[11]
  h. Available since Linux kernel 2.6.18-028stable021. The implementation is based on the CFQ disk I/O scheduler, but it is a two-level scheme, so I/O priority is not per-process, but rather per-container.[12]
  i. Each container can have its own IP addresses, firewall rules, routing tables and so on. Three different networking schemes are possible: route-based, bridge-based, and assigning a real network device (NIC) to a container.
  j. Docker containers can run inside OpenVZ containers.[13]
  k. Each container may have root access without possibly affecting other containers.[14]
  l. Available since version 4.0, January 2008.
  m. Docker containers can run inside Virtuozzo containers.[16]
  n. Yes with illumos.[17]
  o. See OpenSolaris Network Virtualization and Resource Control for more details.
  p. Only when the top level is a KVM zone (illumos) or a kz zone (Oracle).
  q. Starting in Solaris 11.3 Beta, Solaris Kernel Zones may use live migration.
  r. Cold migration (shutdown-move-restart) is implemented.
  s. Non-global zones are restricted so they may not affect other zones via a capability-limiting approach. The global zone may administer the non-global zones.[20]
  t. Check the "allow.quotas" option and the "Jails and File Systems" section on the FreeBSD jail man page for details.
  u. Available since TL 02.[27]
  v. Yes with logical volumes.

References

  1. Hogg, Scott (2014-05-26). "Software Containers: Used More Frequently than Most Realize". Network World. Network World, Inc. Retrieved 2015-07-09. There are many other OS-level virtualization systems such as: Linux OpenVZ, Linux-VServer, FreeBSD Jails, AIX Workload Partitions (WPARs), HP-UX Containers (SRP), Solaris Containers, among others.
  2. "How to break out of a chroot() jail". 2002. Retrieved 7 May 2013.
  3. "System Administration Guide: Oracle Solaris Containers-Resource Management and Oracle Solaris Zones, Chapter 16: Introduction to Solaris Zones". Oracle Corporation. 2010. Retrieved 2014-09-02.
  4. "System Administration Guide: Oracle Solaris Containers-Resource Management and Oracle Solaris Zones, Chapter 31: About Branded Zones and the Linux Branded Zone". Oracle Corporation. 2010. Retrieved 2014-09-02.
  5. Bryan Cantrill (2014-09-28). "The dream is alive! Running Linux containers on an illumos kernel". slideshare.net. Retrieved 2014-10-10.
  6. "3.5. Limiting your program's environment". freebsd.org.
  7. "Docker drops LXC as default execution environment". InfoQ.
  8. Linux-VServer Paper, Secure Capabilities
  9. Graber, Stéphane (1 January 2014). "LXC 1.0: Security features [6/10]". Retrieved 12 February 2014. LXC now has support for user namespaces. [...] LXC is no longer running as root so even if an attacker manages to escape the container, he'd find himself having the privileges of a regular user on the host
  10. Kouka, Abdelmonam (2015). Ubuntu Server Essentials. Packt Publishing Ltd. p. 124. ISBN 9781785282768. Retrieved 2016-03-31. Also known as the Linux container hypervisor, LXD is the next-generation hypervisor provided by Canonical. It combines the density of containers with the manageability of virtual machines.
  11. "Live Migration in LXD". Ubuntu Insights Web site.
  12. "I/O priorities for containers". OpenVZ Virtuozzo Containers Wiki.
  13. "Docker inside CT".
  14. "Container". OpenVZ Virtuozzo Containers Wiki.
  15. "Initial public prerelease of Virtuozzo (named ASPcomplete at that time)".
  16. "Parallels Virtuozzo Now Provides Native Support for Docker".
  17. Pijewski, Bill. "Our ZFS I/O Throttle".
  18. Network Virtualization and Resource Control (Crossbow) FAQ
  19. "Managing Network Virtualization and Network Resources in Oracle® Solaris 11.2".
  20. Oracle Solaris 11.1 Administration, Oracle Solaris Zones, Oracle Solaris 10 Zones and Resource Management E29024.pdf, pp. 356–360. Available within an archive.
  21. "Contain your enthusiasm - Part Two: Jails, Zones, OpenVZ, and LXC". Jails were first introduced in FreeBSD 4.0 in 2000
  22. "Hierarchical_Resource_Limits - FreeBSD Wiki". Wiki.freebsd.org. 2012-10-27. Retrieved 2014-01-15.
  23. "Implementing a Clonable Network Stack in the FreeBSD Kernel" (PDF). usenix.org. 2003-06-13.
  24. "VPS for FreeBSD". Retrieved 2016-02-20.
  25. "[Announcement] VPS // OS Virtualization // alpha release". Retrieved 2016-02-20.
  26. "3.5. Limiting your program's environment". Freebsd.org. Retrieved 2014-01-15.
  27. "IBM Fix pack information for: WPAR Network Isolation - United States". ibm.com.
  28. Live Application Mobility in AIX 6.1