In computer architecture, the term memory hierarchy is used when discussing performance issues in architectural design, algorithm performance predictions, and lower-level programming constructs that involve locality of reference. A "memory hierarchy" in computer storage distinguishes each level in the "hierarchy" by response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by the controlling technology.
The many trade-offs in designing for high performance include the structure of the memory hierarchy, i.e. the size and technology of each component. The various components can therefore be viewed as forming a hierarchy of memories (m_1, m_2, ..., m_n) in which each member m_i is in a sense subordinate to the next highest member m_{i-1} of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling to activate the transfer.
There are four major storage levels.
- Internal – Processor registers and cache.
- Main – the system RAM and controller cards.
- On-line mass storage – Secondary storage.
- Off-line bulk storage – Tertiary and Off-line storage.
This is a very general way of structuring the memory hierarchy, and many other structures are useful. For example, a paging algorithm may be considered a level for virtual memory when designing a computer architecture.
Example use of the term 
Here are some quotes.
- Adding complexity slows down the memory hierarchy.
- CMOx memory technology stretches the Flash space in the memory hierarchy
- One of the main ways to increase system performance is minimising how far down the memory hierarchy one has to go to manipulate data.
- Latency and bandwidth are two metrics associated with caches and memory. Neither of them is uniform, but is specific to a particular component of the memory hierarchy.
- Predicting where in the memory hierarchy the data resides is difficult.
- ...the location in the memory hierarchy dictates the time required for the prefetch to occur.
Application of the concept 
The memory hierarchy in most computers is:
- Processor registers – the fastest possible access (usually 1 CPU cycle), only hundreds of bytes in size
- Level 1 (L1) cache – often accessed in just a few cycles, usually tens of kilobytes
- Level 2 (L2) cache – higher latency than L1 by 2× to 10×, usually has 512 KiB or more
- Level 3 (L3) cache – higher latency than L2, usually has 2048 KiB or more
- Main memory – may take hundreds of cycles, but can be multiple gigabytes. On a NUMA machine, access times may not be uniform.
- Disk storage – millions of cycles latency if not cached, but can be multiple terabytes
- Tertiary storage – several seconds latency, can be huge
Note that a hobbyist who reads "L1 cache" on a computer specification sheet is reading about the "internal" level of the memory hierarchy.
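The per-level latencies in the list above combine into a single average access time once the miss rate of each cache level is known. The following is a minimal sketch of that arithmetic. The function name average_access_time and every latency and miss-rate figure in it are illustrative assumptions, not measurements of any real processor.

```python
def average_access_time(levels, memory_latency):
    """Return the average access time in cycles.

    levels: list of (hit_latency_cycles, miss_rate) pairs, ordered from
            the level closest to the CPU (L1) outward.
    memory_latency: cycles for an access that misses every cache level.
    """
    # Work from the outermost level inward: a miss at one level costs
    # the average access time of everything below it.
    total = float(memory_latency)
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


# Assumed, illustrative figures: (hit latency in cycles, miss rate).
levels = [
    (4, 0.10),   # L1
    (12, 0.40),  # L2
    (40, 0.50),  # L3
]
amat = average_access_time(levels, memory_latency=200)
print(f"average access time: {amat:.1f} cycles")
```

With these assumed numbers the average comes to roughly 10.8 cycles, far closer to the L1 latency than to the 200-cycle main memory latency, which is the reason the upper levels of the hierarchy pay off.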
Most modern CPUs are so fast that, for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of caching and memory transfer between the different levels of the hierarchy. As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. The resulting penalty is sometimes called the space cost, as a larger memory object is more likely to overflow a small, fast level and require use of a larger, slower level.
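The effect of locality of reference can be illustrated with a toy model. The sketch below simulates a small direct-mapped cache and counts misses for two traversal orders of the same row-major array. The cache geometry (64 lines of 64 bytes), the array shape, and the names count_misses, ROWS, COLS and ELEM are all assumptions chosen for illustration. Real caches are usually set-associative and add hardware prefetching, so actual miss counts will differ.

```python
def count_misses(addresses, num_lines=64, line_size=64):
    """Count misses in a toy direct-mapped cache.

    Each memory block of line_size bytes maps to exactly one of
    num_lines slots; loading a block evicts whatever was in its slot.
    """
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size
        slot = block % num_lines
        if tags[slot] != block:
            tags[slot] = block  # load the block, evicting the old one
            misses += 1
    return misses


ROWS, COLS, ELEM = 256, 256, 8  # 8-byte elements stored row by row

# Byte address of element (r, c) in a row-major layout.
row_order = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_order = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print("row-by-row misses:      ", row_misses)
print("column-by-column misses:", col_misses)
```

In this model the row-by-row traversal misses once per 64-byte line (8,192 misses for 65,536 accesses), while the column-by-column traversal misses on every access, because consecutive accesses are 2,048 bytes apart and keep evicting each other from the same two slots.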
Modern programming languages mainly assume two levels of memory, main memory and disk storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):
- Programmers are responsible for moving data between disk and memory through file I/O.
- Hardware is responsible for moving data between memory and caches.
- Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
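The programmer's share of this work, moving data between disk and memory through file I/O, can be sketched as follows. The example reads a file in fixed-size chunks so that only one chunk is held in main memory at a time. The function name checksum_file, the chunk sizes, and the simple additive checksum are assumptions made for illustration; the temporary file exists only so that the sketch is self-contained.

```python
import os
import tempfile


def checksum_file(path, chunk_size=64 * 1024):
    """Sum all bytes of a file modulo 65536, reading chunk by chunk."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # explicit disk-to-memory transfer
            if not chunk:               # empty bytes object means end of file
                break
            total = (total + sum(chunk)) % 65536
    return total


# Demonstration on a temporary file of 256,000 bytes.
data = bytes(range(256)) * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name
try:
    result = checksum_file(tmp_path, chunk_size=4096)
finally:
    os.remove(tmp_path)

print("checksum:", result)
```

Choosing the chunk size is itself a memory hierarchy trade-off: larger chunks mean fewer read calls, while smaller chunks keep the working set small enough to stay in the faster levels. The operating system's own buffering sits underneath either choice.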
Many programmers assume one level of memory. This works fine until the application hits a performance wall, at which point the memory hierarchy is assessed during code refactoring.
See also 
- Memory wall
- Use of spatial and temporal locality: hierarchical memory
- The difference between buffer and cache
- Random-access memory: Memory hierarchy
- Cache hierarchy in a modern processor
- Computer data storage
- Computer memory
References
- Ty, Wing; Zee, Benjamin (1986). Computer Hardware/Software Architecture. Bell Telephone Laboratories, Inc. p. 30. ISBN 0-13-163502-6.
- "Memory Hierarchy". Unity Semiconductor Corporation. Retrieved 16 September 2009.
- Pádraig Brady. "Multi-Core". Retrieved 16 September 2009.
- van der Pas, Ruud (2002). "Memory Hierarchy in Cache-Based Systems". Santa Clara, California: Sun Microsystems. p. 26. 817-0742-10 http://www.sun.com/