In computer storage, fragmentation is a phenomenon in which storage space is used inefficiently, reducing capacity and often performance. Fragmentation leads to storage space being "wasted", and the term also refers to the wasted space itself.
There are three different but related forms of fragmentation: external fragmentation, internal fragmentation, and data fragmentation, which can be present in isolation or conjunction. Fragmentation is often accepted in return for improvements in speed or simplicity.
When a computer program requests blocks of memory from the computer system, the blocks are allocated in chunks. When the computer program is finished with a chunk, it can free the chunk back to the computer. The size and the amount of time a chunk is held by a program varies. During its lifespan, a computer program can request and free many chunks of memory.
When a program is started, the free memory areas are long and contiguous. Over time and with use, the long contiguous regions become fragmented into smaller and smaller contiguous areas. Eventually, it may become impossible for the program to request large chunks of memory.
Types of fragmentation
Due to the rules governing memory allocation, more computer memory is sometimes allocated than is needed. For example, memory can only be provided to programs in chunks divisible by 4, 8 or 16, and as a result if a program requests perhaps 23 bytes, it will actually get a chunk of 24. When this happens, the excess memory goes to waste. In this scenario, the unusable memory is contained within an allocated region. This arrangement, termed fixed partitions, suffers from inefficient memory use - any process, no matter how small, occupies an entire partition. This waste is called internal fragmentation.
Unlike other types of fragmentation, internal fragmentation is difficult to reclaim; usually the best way to remove it is with a design change. For example, in dynamic memory allocation, memory pools drastically cut internal fragmentation by spreading the space overhead over a larger number of objects.
External fragmentation arises when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application. The term "external" refers to the fact that the unusable storage is outside the allocated regions.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block.
External fragmentation also occurs in file systems as many files of different sizes are created, change size, and are deleted. The effect is even worse if a file which is divided into many small pieces is deleted, because this leaves similarly small regions of free spaces.
|Start with all memory available for allocation.|
|A||B||C||Allocated three blocks A, B, and C, of size 0x1000.|
|A||C||Freed block B. Notice that the memory that B used cannot be included for an allocation larger than B's size.|
Compared to external fragmentation, overhead and internal fragmentation account for little loss in terms of wasted memory and reduced performance. It is defined as:
Fragmentation of 0% means that all the free memory is in a single large block; fragmentation is 90% (for example) when 100 MB free memory is present but largest free block of memory for allocation is just 10 MB.
Data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. It is typically the result of attempting to insert a large object into storage that has already suffered external fragmentation.
For example, files in a file system are usually managed in units called blocks or clusters. When a file system is created, there is free space to store file blocks together contiguously. This allows for rapid sequential file reads and writes. However, as files are added, removed, and changed in size, the free space becomes externally fragmented, leaving only small holes in which to place new data. When a new file is written, or when an existing file is extended, the operating system puts the new data in new non-contiguous data blocks to fit into the available holes. The new data blocks are necessarily scattered, slowing access due to seek time and rotational latency of the read/write head, and incurring additional overhead to manage additional locations. This is called file system fragmentation.
When writing a new file of a known size, if there are any empty holes that are larger than that file, the operating system can avoid data fragmentation by putting the file into any one of those holes. There are a variety of algorithms for selecting which of those potential holes to put the file; each of them is a heuristic approximate solution to the bin packing problem. The "best fit" algorithm chooses the smallest hole that is big enough. The "worst fit" algorithm chooses the largest hole. The "first-fit algorithm" chooses the first hole that is big enough. The "next fit" algorithm keeps track of where each file was written. The "next fit" algorithm is faster than "first fit", which is in turn faster than "best fit", which is the same speed as "worst fit".
Just as compaction can eliminate external fragmentation, data fragmentation can be eliminated by rearranging data storage so that related pieces are close together. For example, the primary job of a defragmentation tool is to rearrange blocks on disk so that the blocks of each file are contiguous. Most defragmenting utilities also attempt to reduce or eliminate free space fragmentation. Some moving garbage collectors will also move related objects close together (this is called compacting) to improve cache performance.
There are 4 kinds of systems that never experience data fragmentation—they always store every file contiguously. Alas, all 4 kinds have significant disadvantages compared to systems that allow at least some temporary data fragmentation:
- Simply write each file contiguously, as with CD-R. If there isn't already enough contiguous free space to hold the file, the system immediately fails to store the file—even when there are lots of little bits of free space from deleted files that add up to more than enough to store the file.
- If there isn't already enough contiguous free space to hold the file, use a copying collector to convert many little bits of free space into one contiguous free region big enough to hold the file. This takes a lot more time than breaking the file up into fragments and putting those fragments into the available free space.
- fixed-size-blocks allocation: write the file into any free block. If a programmer picks a fixed block size too small, the system immediately fails to store some files—files larger than the block size—even when there are many free blocks that add up to more than enough to store the file. If a programmer picks a block size too big, we waste a lot of space on internal fragmentation.
- Some systems avoid dynamic allocation entirely, pre-allocating (contiguous) space for all possible files they will need—for example, MultiFinder pre-allocated a chunk of RAM to each application as it was started according to how much RAM that application's programmer claimed it would need.
Performance degradation due to fragmentation
Memory fragmentation is one of the most severe problems faced by system managers. Over time, it leads to degradation of system performance. Eventually, memory fragmentation may lead to complete loss of free memory.
Memory fragmentation is a kernel programming level problem. During real-time computing of applications, fragmentation levels can reach as high as 99%, and may lead to system crashes or other instabilities. This type of system crash can be difficult to avoid, as it is impossible to anticipate the critical rise in levels of memory fragmentation.
- "Partitioning, Partition Sizes and Drive Lettering". The PC Guide. April 17, 2001. Retrieved 2012-01-20.
- "Switches: Sector copy". Symantec. 2001-01-14. Retrieved 2012-01-20.
- D. Samanta. "Classic Data Structures" 2004. p. 76
- C++ Footprint and Performance Optimization, R. Alexander; G. Bensley, Sams Publisher, First edition, Page no:128, ISBN no:9780672319044
- Ibid, Page no:129
- Videos explaining the concept of slack space, Department of Computer Systems and Communications, UTM