Memory hierarchy

[Diagram of the computer memory hierarchy]

The term memory hierarchy is used in the theory of computation when discussing performance issues in computer architectural design, algorithm predictions, and lower-level programming constructs involving locality of reference. A 'memory hierarchy' in computer storage distinguishes each level in the hierarchy by response time. Since response time, complexity, and capacity are related,[1] the levels may also be distinguished by their controlling technology.

The many trade-offs in designing for high performance include the structure of the memory hierarchy, i.e. the size and technology of each component. The various components can thus be viewed as forming a hierarchy of memories (m1, m2, ..., mn) in which each member mi is in a sense subordinate to the next-highest member mi-1 of the hierarchy. To limit waiting by the higher levels, a lower level responds by filling a buffer and then signaling to activate the transfer.
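As an illustration of this fill-and-signal relationship, the C sketch below simulates a small direct-mapped cache (m1) in front of a larger memory (m2); on a miss the word is fetched from the slower level into the faster one before the access completes. The sizes, names, and replacement scheme are hypothetical, chosen only to make the behavior visible, not taken from any real design.

    /* Hypothetical two-level simulation: a small direct-mapped "cache" (m1)
     * in front of a larger "main memory" array (m2). On a miss, the word is
     * fetched from the slower level into the faster one before the access
     * completes. Sizes and names are invented for illustration. */
    #include <stdio.h>

    #define MEM_WORDS   1024
    #define CACHE_WORDS 16

    static int  memory[MEM_WORDS];        /* slower level (m2) */
    static int  cache_data[CACHE_WORDS];  /* faster level (m1) */
    static int  cache_tag[CACHE_WORDS];   /* which address each slot holds */
    static int  cache_valid[CACHE_WORDS];
    static long hits, misses;

    static int access_word(int addr)
    {
        int slot = addr % CACHE_WORDS;    /* direct mapping */
        if (cache_valid[slot] && cache_tag[slot] == addr) {
            hits++;
            return cache_data[slot];      /* served by the faster level */
        }
        misses++;                         /* fall through to the slower level */
        cache_data[slot]  = memory[addr];
        cache_tag[slot]   = addr;
        cache_valid[slot] = 1;
        return cache_data[slot];
    }

    int main(void)
    {
        for (int i = 0; i < MEM_WORDS; i++)
            memory[i] = i;

        /* small working set: after the first pass every access hits in m1 */
        for (int pass = 0; pass < 4; pass++)
            for (int i = 0; i < CACHE_WORDS; i++)
                access_word(i);
        printf("small set: hits=%ld misses=%ld\n", hits, misses);

        hits = misses = 0;
        /* working set far larger than m1: each pass evicts everything,
           so every access misses and must wait on the slower level */
        for (int pass = 0; pass < 4; pass++)
            for (int i = 0; i < MEM_WORDS; i++)
                access_word(i);
        printf("large set: hits=%ld misses=%ld\n", hits, misses);
        return 0;
    }

A real hierarchy transfers whole blocks rather than single words and overlaps the transfer with other work, but the subordination of mi to mi-1 is the same.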

There are four major storage levels.[1]

  1. Internal – Processor registers and cache.
  2. Main – the system RAM and controller cards.
  3. On-line mass storage – Secondary storage.
  4. Off-line bulk storage – Tertiary and off-line storage.

This is a general memory hierarchy structuring; many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, as in the sketch below.
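To make that example concrete, the sketch below treats main memory as a cache of fixed-size pages whose backing copies live on a simulated disk; a page fault fetches the page from the level below. The names, sizes, and round-robin eviction policy are all invented for illustration.

    /* Paging as a hierarchy level: main memory holds a few page frames,
     * and a page fault pulls the page in from the "disk" level below.
     * All names, sizes, and the eviction policy are hypothetical. */
    #include <stdio.h>
    #include <stdbool.h>

    #define PAGES      8
    #define FRAMES     4
    #define PAGE_WORDS 4

    static int  disk[PAGES][PAGE_WORDS];  /* backing store (lower level) */
    static int  ram[FRAMES][PAGE_WORDS];  /* main memory (higher level)  */
    static int  frame_to_page[FRAMES];
    static bool present[PAGES];
    static int  frame_of[PAGES];
    static int  next_victim;              /* trivial round-robin eviction */
    static long faults;

    static int translate(int page)
    {
        if (!present[page]) {             /* page fault: go down a level */
            int f = next_victim;
            next_victim = (next_victim + 1) % FRAMES;
            if (frame_to_page[f] >= 0)
                present[frame_to_page[f]] = false;  /* evict occupant */
            for (int w = 0; w < PAGE_WORDS; w++)
                ram[f][w] = disk[page][w];          /* "I/O" from disk */
            frame_to_page[f] = page;
            present[page] = true;
            frame_of[page] = f;
            faults++;
        }
        return frame_of[page];
    }

    int main(void)
    {
        for (int f = 0; f < FRAMES; f++)
            frame_to_page[f] = -1;
        for (int p = 0; p < PAGES; p++)
            for (int w = 0; w < PAGE_WORDS; w++)
                disk[p][w] = p * PAGE_WORDS + w;

        long sum = 0;
        for (int pass = 0; pass < 2; pass++)
            for (int p = 0; p < PAGES; p++)
                sum += ram[translate(p)][0];
        printf("sum=%ld faults=%ld\n", sum, faults);
        return 0;
    }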

Example use of the term

Here are some quotes illustrating use of the term.

  • Adding complexity slows down the memory hierarchy.[2]
  • CMOx memory technology stretches the Flash space in the memory hierarchy.[3]
  • One of the main ways to increase system performance is minimising how far down the memory hierarchy one has to go to manipulate data.[4]
  • Latency and bandwidth are two metrics associated with caches and memory. Neither of them is uniform, but is specific to a particular component of the memory hierarchy.[5]
  • Predicting where in the memory hierarchy the data resides is difficult.[5]
  • ...the location in the memory hierarchy dictates the time required for the prefetch to occur.[5]

Application of the concept

The memory hierarchy in most computers is as follows (a measurement sketch appears after the list):

  • Processor registers – fastest possible access (usually 1 CPU cycle), only hundreds of bytes in size
  • Level 1 (L1) cache – often accessed in just a few cycles, usually tens of kilobytes
  • Level 2 (L2) cache – higher latency than L1 by 2× to 10×; usually 512 KiB or more
  • Level 3 (L3) cache – higher latency than L2; usually 2048 KiB or more
  • Main memory – may take hundreds of cycles, but can be multiple gigabytes. Access times may not be uniform in the case of a NUMA machine.
  • Disk storage – millions of cycles latency if not cached, but very large
  • Tertiary storage – several seconds latency, can be huge
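These levels can be observed empirically by timing dependent loads over working sets of increasing size; the average access time typically jumps each time the set outgrows a cache level. The program below is a rough sketch rather than a rigorous benchmark (it relies on clock(), ignores CPU pinning and timer resolution, and its numbers vary widely by machine):

    /* Rough latency sketch: chase pointers through a random cycle so each
     * load depends on the previous one, defeating hardware prefetch.
     * Not a rigorous benchmark; results vary by machine. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double ns_per_access(size_t words, size_t steps)
    {
        size_t *next = malloc(words * sizeof *next);
        if (next == NULL)
            return -1.0;
        for (size_t i = 0; i < words; i++)
            next[i] = i;
        /* Sattolo's algorithm: a single cycle covering every element */
        for (size_t i = words - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        size_t idx = 0;
        clock_t start = clock();
        for (size_t s = 0; s < steps; s++)
            idx = next[idx];
        clock_t stop = clock();
        volatile size_t sink = idx;  /* keep the loop from being optimized away */
        (void)sink;
        free(next);
        return (double)(stop - start) / CLOCKS_PER_SEC * 1e9 / (double)steps;
    }

    int main(void)
    {
        /* working sets from 4 KiB (fits in L1) up to 64 MiB (main memory) */
        for (size_t kib = 4; kib <= 64 * 1024; kib *= 4) {
            size_t words = kib * 1024 / sizeof(size_t);
            printf("%8zu KiB: %6.1f ns/access\n", kib, ns_per_access(words, 20000000));
        }
        return 0;
    }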

Note that the hobbyist who reads "L1 cache" in the computer specifications sheet is reading about the 'internal' memory hierarchy.

Most modern CPUs are so fast that for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of the caching and memory transfer between different levels of the hierarchy.[citation needed] As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. This is sometimes called the space cost, as a larger memory object is more likely to overflow a small/fast level and require use of a larger/slower level.
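A minimal illustration of the locality effect: the two functions below perform exactly the same additions, but one visits the array in the order it is laid out in memory while the other strides across whole rows, so the second typically runs several times slower once the array exceeds the caches. The array size is arbitrary.

    /* Same arithmetic, different access order. Row-wise traversal touches
     * consecutive addresses (good locality); column-wise traversal strides
     * by a full row (poor locality). Array size is arbitrary. */
    #include <stdio.h>
    #include <time.h>

    #define N 4096

    static long sum_rows(int (*a)[N])
    {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];         /* consecutive addresses */
        return s;
    }

    static long sum_cols(int (*a)[N])
    {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];         /* stride of N ints per step */
        return s;
    }

    int main(void)
    {
        static int a[N][N];           /* about 64 MiB, zero-initialized */
        clock_t t0 = clock();
        long r = sum_rows(a);
        clock_t t1 = clock();
        long c = sum_cols(a);
        clock_t t2 = clock();
        printf("rows: %ld in %.2f s, cols: %ld in %.2f s\n",
               r, (double)(t1 - t0) / CLOCKS_PER_SEC,
               c, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }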

Modern programming languages mainly assume two levels of memory, main memory and disk storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):

  • Programmers are responsible for moving data between disk and memory through file I/O (see the sketch after this list).
  • Hardware is responsible for moving data between memory and caches.
  • Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
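As a sketch of the first responsibility, the program below moves data from disk into main memory through explicit file I/O and then operates on the in-memory copy. The file name data.bin is hypothetical, and error handling is kept minimal.

    /* Programmer-managed movement between disk and memory: fread() copies
     * file data into an in-memory buffer, which the program then processes.
     * "data.bin" is a hypothetical input file. */
    #include <stdio.h>

    int main(void)
    {
        char buffer[4096];                    /* in main memory */
        FILE *f = fopen("data.bin", "rb");    /* on disk */
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        long total = 0;
        size_t n;
        while ((n = fread(buffer, 1, sizeof buffer, f)) > 0)
            for (size_t i = 0; i < n; i++)
                total += (unsigned char)buffer[i];   /* work on the copy */
        fclose(f);
        printf("byte sum: %ld\n", total);
        return 0;
    }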

Many programmers assume one level of memory. This works fine until the application hits a performance wall; then the memory hierarchy will be assessed during code refactoring.

What a memory hierarchy is not

Although a file server appears as a virtual device on the computer, it was not a design consideration of the computer architect. Network-attached storage (NAS) devices optimize their own memory hierarchy (to optimize their physical design). "On-line" (secondary storage) here refers to the hard drive, which from the CPU's point of view is "on" the computer's "line". "Off-line" (tertiary storage) here refers to effectively unbounded latency (waiting for manual intervention); it is the boundary of the realm of the 'memory hierarchy'.

Memory hierarchy refers to CPU-centric latency (delay), the primary criterion for deciding where a storage device is placed in a memory hierarchy, and it fits the storage device into the design considerations concerning the operators of the computer. The concept is used only incidentally in operations; its primary use is in thinking about abstract machines.

Network architects use the term latency directly instead of memory hierarchy, because they do not design what will become on-line storage. A NAS may hold this article as "computer data storage", but the memory hierarchies of the computers that read and edit it do not depend on how it does so.

References

  1. ^ a b Toy, Wing; Zee, Benjamin (1986). Computer Hardware/Software Architecture. Bell Telephone Laboratories, Inc. p. 30. ISBN 0-13-163502-6.
  2. ^ Write-combining
  3. ^ "Memory Hierarchy". Unitity Semiconductor Corporation. Retrieved 16 September 2009.
  4. ^ Pádraig Brady. "Multi-Core". Retrieved 16 September 2009.
  5. ^ a b c van der Pas, Ruud (2002). Memory Hierarchy in Cache-Based Systems. Santa Clara, California: Sun Microsystems. p. 26. 817-0742-10.