Garbage collection (computer science)

From Wikipedia, the free encyclopedia
This article is about garbage collection in memory management. For garbage collection in a solid-state drive, see Garbage collection (SSD).

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.[1][2]

Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of approaches, including other techniques such as stack allocation and region inference. Like other memory management techniques, garbage collection may take a significant proportion of total processing time in a program and can thus have significant influence on performance.

Resources other than memory, such as network sockets, database handles, user interaction windows, and file and device descriptors, are not typically handled by garbage collection. Methods used to manage such resources, particularly destructors, may suffice to manage memory as well, leaving no need for GC. Some GC systems allow such other resources to be associated with a region of memory that, when collected, causes the other resource to be reclaimed; this is called finalization. Finalization may introduce complications limiting its usability, such as intolerable latency between disuse and reclaim of especially limited resources, or a lack of control over which thread performs the work of reclaiming.
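
As a rough illustration of finalization, the Python sketch below attaches a cleanup callback to an object using the standard library's `weakref.finalize`. The `TempResource` class and the `released` list are hypothetical names invented for this example. The callback running right after `del` relies on CPython's reference counting; other implementations may delay it, which is the latency problem described above.

```python
import gc
import weakref

released = []  # hypothetical log of reclaimed resource names


class TempResource:
    """Hypothetical stand-in for an object that owns a non-memory resource."""

    def __init__(self, name):
        self.name = name
        # Register a finalizer. The callback must not reference `self`,
        # otherwise the object would be kept alive indefinitely.
        self._finalizer = weakref.finalize(self, released.append, name)


res = TempResource("socket-1")
assert released == []   # still reachable, so nothing has been finalized

del res                 # drop the last reference
gc.collect()            # request a collection (matters on non-refcounting runtimes)
print(released)
```

The program has no control over when, or on which thread, the callback runs, so scarce resources are usually released explicitly instead.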

Principles

The basic principles of garbage collection are:

  • Find data objects in a program that cannot be accessed in the future.
  • Reclaim the resources used by those objects.

Many programming languages require garbage collection, either as part of the language specification (for example, Java, C#, D,[3] Go and most scripting languages) or effectively for practical implementation (for example, formal languages like lambda calculus); these are said to be garbage collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations available (for example, C, C++). Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects; others, like D, are garbage collected but allow the user to manually delete objects and also entirely disable garbage collection when speed is required.

While integrating garbage collection into the language's compiler and runtime system enables a much wider choice of methods,[citation needed] post hoc GC systems exist, e.g., ARC, including some that do not require recompilation. (Post hoc GC is sometimes distinguished as litter collection.) The garbage collector will almost always be closely integrated with the memory allocator.

Advantages

Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:

  • Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.
  • Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
  • Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion. (Garbage collection typically does not deal with the unbounded accumulation of data that is reachable, but that will actually not be used by the program.)

Garbage collection also makes efficient implementations of persistent data structures practical.

Some of the bugs addressed by garbage collection can have security implications.

Disadvantages

Typically, garbage collection has certain disadvantages:

  • Garbage collection consumes computing resources in deciding which memory to free, even though the programmer may have already known this information. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can lead to decreased or uneven performance.[4] A peer-reviewed study concluded that a garbage collector needs about five times as much memory as explicit memory management to match its speed.[5] Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. Apple also cited the impact on performance as a reason for not adopting garbage collection in iOS, despite it being the most requested feature.[6]
  • The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
  • Non-deterministic GC is incompatible with RAII-based management of non-GCed resources. As a result, the need for explicit manual resource management (release/close) for non-GCed resources becomes transitive to composition. That is, in a non-deterministic GC system, if a resource or a resource-like object requires manual resource management (release/close), and this object is used as part of another object, then the composed object also becomes a resource-like object that itself requires manual resource management (release/close).
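
The transitivity described in the last point can be sketched in Python, whose `with` statement provides explicit, deterministic release. `Connection` and `Repository` are hypothetical classes made up for this illustration: because `Repository` contains a `Connection`, it has to expose and forward `close` itself.

```python
class Connection:
    """Hypothetical resource that must be closed explicitly."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


class Repository:
    """Owns a Connection, so it becomes resource-like itself."""

    def __init__(self):
        self.conn = Connection()

    def close(self):
        # The owner has to forward the release to the part it contains.
        self.conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


with Repository() as repo:
    inner = repo.conn
    assert inner.open      # usable inside the block

print(inner.open)          # False: released deterministically at block exit
```

Any further object that embeds a `Repository` would need the same treatment.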

Tracing garbage collectors

Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.
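
A minimal sketch of the tracing strategy, written here in a mark-and-sweep style chosen purely for illustration. It assumes a toy heap modeled as a dictionary from object ids to the ids they reference; the names `mark`, `sweep`, `heap` and `roots` are invented for this example and do not correspond to any real collector.

```python
def mark(roots, heap):
    """Return the set of object ids reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj_id = stack.pop()
        if obj_id in marked:
            continue
        marked.add(obj_id)
        stack.extend(heap[obj_id])  # follow outgoing references
    return marked


def sweep(heap, marked):
    """Remove every unmarked object and return the ids that were freed."""
    garbage = [obj_id for obj_id in heap if obj_id not in marked]
    for obj_id in garbage:
        del heap[obj_id]
    return sorted(garbage)


# Toy heap: object id -> ids of the objects it references.
heap = {
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "d": ["e"],   # d and e reference each other, but no root reaches them
    "e": ["d"],
}
roots = ["a"]

freed = sweep(heap, mark(roots, heap))
print(freed)          # ['d', 'e']
print(sorted(heap))   # ['a', 'b', 'c']
```

Because reachability is computed from the roots, the mutually referencing objects `d` and `e` are reclaimed even though each still has an incoming reference.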

Reference counting

Main article: Reference counting

Reference counting is a form of garbage collection whereby each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero.
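
The bookkeeping can be sketched with a toy model in Python. The `Counted` class and the `incref`, `decref` and `add_child` helpers are hypothetical and manage counts by hand; they only imitate what a reference-counting runtime does automatically.

```python
class Counted:
    """Toy object with an explicit reference count."""

    def __init__(self, name, on_free):
        self.name = name
        self.count = 0
        self.children = []      # objects this one holds references to
        self._on_free = on_free


def incref(obj):
    obj.count += 1


def decref(obj):
    obj.count -= 1
    if obj.count == 0:
        # Reclaim the object, then release the references it held.
        obj._on_free(obj.name)
        for child in obj.children:
            decref(child)
        obj.children.clear()


def add_child(parent, child):
    parent.children.append(child)
    incref(child)


freed = []
parent = Counted("parent", freed.append)
child = Counted("child", freed.append)

incref(parent)            # a root variable refers to parent
add_child(parent, child)  # parent refers to child
assert freed == []

decref(parent)            # the root variable goes away
print(freed)              # ['parent', 'child']
```

Dropping the single root reference frees `parent` and, recursively, everything only it kept alive.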

As with manual memory management, and unlike tracing garbage collection, reference counting guarantees that objects are destroyed as soon as their last reference is destroyed. It also usually accesses only memory that is either in CPU caches, in objects to be freed, or directly pointed to by those objects, and thus tends not to have significant negative side effects on CPU cache and virtual memory operation.

There are a number of disadvantages to reference counting; these can generally be solved or mitigated by more sophisticated algorithms:

Cycles
If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual references never let their reference counts become zero. Some garbage collection systems using reference counting (like the one in CPython) use specific cycle-detecting algorithms to deal with this issue.[7]
Another strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak reference is similar to a weak reference under a tracing garbage collector. It is a special reference object whose existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in that when the referent object becomes garbage, any weak reference to it lapses, rather than being permitted to remain dangling, meaning that it turns into a predictable value, such as a null reference.
Space overhead (reference count)
Reference counting requires space to be allocated for each object to store its reference count. The count may be stored adjacent to the object's memory or in a side table somewhere else, but in either case, every single reference-counted object requires additional storage for its reference count. An unsigned pointer–sized memory space is commonly used for this task, meaning that 32 or 64 bits of reference count storage must be allocated for each object.
On some systems, it may be possible to mitigate this overhead by using a tagged pointer to store the reference count in unused areas of the object's memory. Often, an architecture does not actually allow programs to access the full range of memory addresses that could be stored in its native pointer size; a certain number of high bits in the address are either ignored or required to be zero. If an object reliably has a pointer at a certain location, the reference count can be stored in the unused bits of the pointer. For example, each object in Objective-C has a pointer to its class at the beginning of its memory; on the ARM64 architecture using iOS 7, 19 unused bits of this class pointer are used to store the object's reference count.[8][9]
Speed overhead (increment/decrement)
In naive implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, in the common case, when a reference is copied from an outer scope variable into an inner scope variable, such that the lifetime of the inner variable is bounded by the lifetime of the outer one, the reference incrementing can be eliminated. The outer variable "owns" the reference. In the programming language C++, this technique is readily implemented and demonstrated with the use of const references.
Reference counting in C++ is usually implemented using "smart pointers" whose constructors, destructors and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new smart pointer (which would increase the reference count on entry into the function and decrease it on exit). Instead the function receives a reference to the smart pointer which is produced inexpensively.
Requires atomicity
When used in a multithreaded environment, these modifications (increment and decrement) may need to be atomic operations such as compare-and-swap, at least for any objects which are shared, or potentially shared among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they have to be emulated with software algorithms.
It is possible to avoid this issue by adding per-thread or per-CPU reference counts and only accessing the global reference count when the local reference counts become or are no longer zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction in exchange for not having a global reference count at all), but this adds significant memory overhead and thus tends to be only useful in special cases (it is used, for example, in the reference counting of Linux kernel modules).
Not real-time
Naive implementations of reference counting do not in general provide real-time behavior, because any pointer assignment can potentially cause a number of objects bounded only by total allocated memory size to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of objects whose reference count dropped to zero to other threads, at the cost of extra overhead.
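
The cycle problem and the weak-reference remedy described above can be observed in CPython, which combines reference counting with a cycle detector. The sketch below uses only the standard `gc` and `weakref` modules; the `Node` class and the variable names are hypothetical, and the exact timing shown is specific to CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


# 1. A strong back-pointer creates a cycle that reference counting alone
#    cannot reclaim; the cycle detector has to find it.
gc.disable()                         # keep automatic collections out of the demo
gc.collect()                         # start from a clean state
parent, child = Node("p"), Node("c")
parent.children.append(child)
child.parent = parent                # strong back-pointer creates a cycle
probe = weakref.ref(parent)          # lets us observe whether parent is alive
del parent, child
alive_before = probe() is not None   # the cycle keeps both objects alive
gc.collect()
alive_after = probe() is not None    # the cycle detector reclaimed them
gc.enable()

# 2. A weak back-pointer avoids the cycle, so the count can reach zero.
parent, child = Node("p"), Node("c")
parent.children.append(child)
child.parent = weakref.ref(parent)   # weak back-pointer
del parent
lapsed = child.parent() is None      # the weak reference lapsed to None

print(alive_before, alive_after, lapsed)   # on CPython: True False True
```

In the second half no collection pass is needed: the parent is destroyed as soon as its last strong reference disappears, and the weak back-pointer yields `None` instead of dangling.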

Escape analysis

Main article: Escape analysis

Escape analysis can be used to convert heap allocations to stack allocations, thus reducing the amount of work needed to be done by the garbage collector. This is done using a compile-time analysis to determine whether an object allocated within a function is accessible outside of it (that is, whether it escapes) to other functions or threads. If the object does not escape, it may be allocated directly on the thread stack and released when the function returns, reducing its potential garbage collection overhead.

Compile-time

Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation. This form of garbage collection has been studied in the Mercury programming language.[10]

This kind of automatic compile-time memory management saw wider use with the introduction of LLVM's automatic reference counter (ARC) into Apple's ecosystem (iOS and OS X) in 2011.[11][12][13]

Availability

Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In languages that do not have built-in garbage collection, it can often be added through a library, as with the Boehm garbage collector for C (for "nearly all programs") and C++. This approach is not without drawbacks, such as changing object creation and destruction mechanisms.

Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp is especially notable as both the first functional programming language and the first language to introduce garbage collection.

Other dynamic languages, such as Ruby (but not Perl 5 or PHP before version 5.3,[14] which both use reference counting), also tend to use GC. Object-oriented programming languages such as Smalltalk, Java and ECMAScript usually provide integrated garbage collection. Notable exceptions are C++ and Delphi, which have destructors.

BASIC

Historically, languages intended for beginners, such as BASIC and Logo, have often used garbage collection for heap-allocated variable-length data types, such as strings and lists, so as not to burden programmers with manual memory management. On early microcomputers, with their limited memory and slow processors, BASIC garbage collection could often cause apparently random, inexplicable pauses in the midst of program operation.

Some BASIC interpreters, such as Applesoft BASIC on the Apple II family, repeatedly scanned the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in O(N²) performance, which could introduce minutes-long pauses in the execution of string-intensive programs. A replacement garbage collector for Applesoft BASIC published in Call-A.P.P.L.E. (January 1981, pages 40–45, Randy Wigginton) identified a group of strings in every pass over the heap, which cut collection time dramatically. BASIC.System, released with ProDOS in 1983, provided a windowing garbage collector for BASIC that reduced most collections to a fraction of a second.

Apple ecosystem

While Objective-C has not traditionally had GC, Apple introduced garbage collection for Objective-C 2.0 with Mac OS X 10.5 in 2007, using a runtime collector developed in-house.[15] In 2012, with OS X 10.8, GC was deprecated in favor of LLVM's automatic reference counter (ARC), which had been introduced with OS X 10.7.[16] Since May 2015, Apple has forbidden the use of GC in new apps in the App Store for OS X.[13][17] For iOS, Apple never introduced GC, citing problems with app responsiveness and performance.[6][18] Instead, ARC is provided.[12][11]

Limited environments

Garbage collection is rarely used on embedded or real-time systems because of the perceived need for very tight control over the use of limited resources. However, garbage collectors compatible with such limited environments have been developed.[19] The Microsoft .NET Micro Framework and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.

References

  1. ^ "Recursive functions of symbolic expressions and their computation by machine, Part I". Portal.acm.org. Retrieved 29 March 2009. 
  2. ^ "Recursive functions of symbolic expressions and their computation by machine, Part I". Retrieved 29 May 2009. 
  3. ^ "Overview - D Programming Language". dlang.org. Digital Mars. Retrieved 2014-07-29. D memory allocation is fully garbage collected. 
  4. ^ Zorn, Benjamin (1993-01-22). "The Measured Cost of Conservative Garbage Collection". Department of Computer Science, University of Colorado Boulder. Retrieved 2012-11-18. Conservative garbage collection does not come without a cost. In the programs measured, the garbage collection algorithm used 30–150 per cent more address space than the most space efficient explicit management algorithm. In addition, the conservative garbage collection algorithm significantly reduced the reference locality of the programs, greatly increasing the page fault rate and cache miss rate of the applications for a large range of cache and memory sizes. This result suggests that not only does the conservative garbage collection algorithm increase the size of the address space, but also frequently references the entire space it requires. 
  5. ^ Matthew Hertz, Emery D. Berger (2005). "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management" (PDF). OOPSLA 2005. Retrieved 2015-03-15. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. 
  6. ^ a b "Developer Tools Kickoff - session 300" (PDF). WWDC 2011. Apple, inc. 2011-06-24. Retrieved 2015-03-27. At the top of your wishlist of things we could do for you is bringing garbage collection to iOS. And that is exactly what we are not going to do... Unfortunately garbage collection has a suboptimal impact on performance. Garbage can build up in your applications and increase the high water mark of your memory usage. And the collector tends to kick in at undeterministic times which can lead to very high CPU usage and stutters in the user experience. And that’s why GC has not been acceptable to us on our mobile platforms. In comparison, manual memory management with retain/release is harder to learn, and quite frankly it's a bit of a pain in the ass. But it produces better and more predictable performance, and that’s why we have chosen it as the basis of our memory management strategy. Because out there in the real world, high performance and stutter-free user experiences are what matters to our users. 
  7. ^ "Reference Counts". Extending and Embedding the Python Interpreter. 21 February 2008. Retrieved 22 May 2014. While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. 
  8. ^ Mike Ash. "Friday Q&A 2013-09-27: ARM64 and You". mikeash.com. Retrieved 2014-04-27. 
  9. ^ "Hamster Emporium: [objc explain]: Non-pointer isa". Sealiesoftware.com. 2013-09-24. Retrieved 2014-04-27. 
  10. ^ Mazur, Nancy (May 2004). Compile-time garbage collection for the declarative language Mercury (PDF) (Thesis). Katholieke Universiteit Leuven. 
  11. ^ a b Rob Napier, Mugunth Kumar (2012-11-20). "iOS 6 Programming Pushing the Limits". John Wiley & Sons. Retrieved 2015-03-30. "ARC is not garbage collection [...] this makes the code behave the way the programmer intended it to but without an extra garbage collection step. Memory is reclaimed faster than with garbage collection and decision are done at compile time rather than at run-time, which generally improves overall performance." 
  12. ^ a b Cruz, José R.C. (2012-05-22). "Automatic Reference Counting on iOS". Dr. Dobbs. Retrieved 2015-03-30. Finally, the [Garbage collection] service still incurs a performance hit despite being conservative. This is one reason why garbage collection is absent on iOS.[...] ARC is an innovative approach that has many of the benefits of garbage collection, but without the performance costs. Internally, ARC is not a runtime service. It is, in fact, a deterministic two-part phase provided by the new Clang front-end. 
  13. ^ a b "Apple says Mac app makers must transition to ARC memory management by May". AppleInsider. 2015-02-20. 
  14. ^ "PHP: Performance Considerations". php.net. Retrieved 14 January 2015. 
  15. ^ Objective-C 2.0 Overview
  16. ^ Siracusa, John (2011-07-20). "Mac OS X 10.7 Lion: the Ars Technica review". Ars Technica. 
  17. ^ Cichon, Waldemar (2015-02-21). "App Store: Apple entfernt Programme mit Garbage Collection". Heise.de. Retrieved 2015-03-30. By May 2015, OS X programs newly submitted to or updated in the App Store may no longer use garbage collection. After that, only Automatic Reference Counting is allowed.[...] in OS X 10.8, garbage collection was marked as "deprecated". 
  18. ^ Silva, Precious (2014-11-18). "iOS 8 vs Android 5.0 Lollipop: Apple Kills Google with Memory Efficiency". International Business Times. Retrieved 2015-04-07. 
  19. ^ "Wei Fu and Carl Hauser, "A Real-Time Garbage Collection Framework for Embedded Systems". ACM SCOPES '05, 2005". Portal.acm.org. Retrieved 9 July 2010. 
