Symmetric multiprocessor system

From Wikipedia, the free encyclopedia
A diagram of a symmetric multiprocessor system

A symmetric multiprocessor system (SMP) is a multiprocessor system in which two or more homogeneous processors share a centralized memory, called main memory (MM), and operate under a single operating system; that is, it is not a heterogeneous computing system.

More precisely, an SMP is a tightly coupled multiprocessor system with a pool of homogeneous processors that run independently, each executing different programs on different data, while sharing common resources (memory, I/O devices, interrupt system, etc.) over a system bus or a crossbar. An SMP attempts to balance the workload across its processors in order to optimize performance. If performance scaled perfectly linearly with the number of processors, two processors would run twice as fast as one, and ten processors ten times as fast. In practice the speedup is usually less than linear, in part because critical sections in the operating system must be executed serially. This limit on speedup imposed by serial sections is described by Amdahl's law.[1][2][3]
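The effect of serial sections can be illustrated with a short sketch of Amdahl's law. The function name `amdahl_speedup` and the 10% serial fraction in the usage note are illustrative assumptions, not figures from the cited sources.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work must run serially."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time regardless of processor count;
    # only the parallel part is divided among the processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

Under these assumptions, `amdahl_speedup(0.1, 10)` evaluates to roughly 5.26 rather than 10, and no processor count can push the speedup past 1 / 0.1 = 10.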

Each processor usually has an associated private high-speed memory, known as cache memory (or cache), to speed up access to MM data and to reduce system bus traffic. A cache temporarily stores values that the processor may access multiple times, which reduces latency by avoiding repeated accesses to main memory. A processor may have a single cache, or multiple levels of cache memory may be employed to further increase efficiency.[3]
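How a private cache cuts down on main-memory accesses can be sketched with a toy direct-mapped cache that only counts hits and misses. The class name, the direct-mapped organization, and the sizes used are assumptions chosen for illustration; real caches are typically set-associative and considerably more elaborate.

```python
class DirectMappedCache:
    """Toy private cache: each memory block maps to exactly one cache line."""

    def __init__(self, num_lines: int, block_size: int) -> None:
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag held by each line, None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block (one MM access)."""
        block = address // self.block_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # replace whatever the line held before
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, block_size=16)
for _ in range(3):                   # read the same eight words three times
    for address in range(0, 32, 4):
        cache.access(address)
```

In this run the 24 accesses touch only two 16-byte blocks, so `cache.misses` ends at 2 and `cache.hits` at 22: only two requests would have reached main memory over the bus.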

According to Flynn's taxonomy, an SMP is a type of multiple instruction, multiple data (MIMD) architecture. It is further classified as a uniform memory access (UMA) architecture, since all processors have the same latency when accessing memory. The term SMP is sometimes loosely used for architectures in which memory access latency differs somewhat between processors, but not by a significant amount.[4]

Terminology

The term "symmetric multiprocessor" is sometimes confused with the term "symmetric multiprocessing".

While multiprocessing is a type of processing in which two or more processors work together to process more than one program simultaneously, the term "multiprocessor" refers to the hardware architecture that allows multiprocessing.

The term "multiprocessor" is the opposite of the term "uniprocessor".

The term "symmetric multiprocessor" is the one used in most technical papers.[5][6][7][8][9][10]

Interconnection

Some SMP architectures use a bus to connect the processors, memory, and I/O devices. At a high level, a bus is implemented as a set of wires that allows each processor to post a command on the bus as well as listen for commands from other processors. A bus may be either synchronous, where all the connected devices share a common clock, or asynchronous, where they do not. An asynchronous bus requires some form of handshake protocol to establish communication, which can result in slower performance. The choice between the two is based on the expected number of devices and the length of the bus. In a smaller system with a fixed number of devices, a synchronous bus is often used. However, in a system with an unknown number of devices and potentially long clock lines (and thus high clock skew), such as an I/O bus, an asynchronous bus can be advantageous. In a multiprocessor system, several ancillary hardware components are necessary in order to achieve cache coherence across the processors connected to the bus.[11]
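The broadcast nature of a bus, where one device posts a command and every other device can listen to it, can be sketched as follows. The class names `Device` and `SharedBus`, the device labels, and the command string are hypothetical; clocking, arbitration, and handshaking are deliberately left out.

```python
class Device:
    """Anything attached to the bus; it records the commands it observes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.observed = []  # (sender name, command) pairs seen on the bus

    def listen(self, sender: str, command: str) -> None:
        self.observed.append((sender, command))


class SharedBus:
    """Broadcast medium: one command at a time, visible to all other devices."""

    def __init__(self) -> None:
        self.devices = []

    def attach(self, device: Device) -> None:
        self.devices.append(device)

    def post(self, sender: Device, command: str) -> None:
        # Every attached device except the sender sees the command.
        for device in self.devices:
            if device is not sender:
                device.listen(sender.name, command)


bus = SharedBus()
p0, p1, memory = Device("P0"), Device("P1"), Device("MM")
for device in (p0, p1, memory):
    bus.attach(device)
bus.post(p0, "read 0x40")
```

After the single `post` call, both `p1` and `memory` have observed P0's command, while `p0` itself has observed nothing. Because `post` handles one command at a time, the model is inherently serial.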

Other architectures use a crossbar (xbar).[12] A crossbar switch is a matrix of switches that allows various nodes to be connected by closing a series of the switches. In the mid-1800s, mechanical switches were used to route telegram messages. Today, crossbar switches are usually implemented with semiconductor technology. An important feature of a crossbar switch is that multiple pairs of nodes can communicate simultaneously. In contrast to a bus, therefore, a crossbar switch is not inherently serial. This can be an advantage in terms of communication speed, but also a problem when it introduces race conditions. Some architectures use a bus for addresses and a crossbar for data (a data crossbar).[13][14][15]
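The idea that a crossbar lets several node pairs communicate at once, as long as they do not contend for the same port, can be sketched with a toy model. The `Crossbar` class and its port numbering are illustrative assumptions and do not describe any particular product.

```python
class Crossbar:
    """Toy crossbar: closing switch (src, dst) connects input src to output dst."""

    def __init__(self, inputs: int, outputs: int) -> None:
        self.inputs = inputs
        self.outputs = outputs
        self.output_owner = {}  # output port -> input port currently connected

    def connect(self, src: int, dst: int) -> bool:
        """Close switch (src, dst) if both ports are free; return success."""
        if not (0 <= src < self.inputs and 0 <= dst < self.outputs):
            raise ValueError("port out of range")
        if dst in self.output_owner or src in self.output_owner.values():
            return False  # one of the two ports is already in use
        self.output_owner[dst] = src
        return True

    def disconnect(self, dst: int) -> None:
        """Open whatever switch currently drives output dst."""
        self.output_owner.pop(dst, None)


xbar = Crossbar(inputs=4, outputs=4)
first = xbar.connect(0, 2)     # succeeds
second = xbar.connect(1, 3)    # succeeds at the same time: different ports
conflict = xbar.connect(3, 2)  # fails: output 2 is already connected
```

The first two connections coexist, which a single shared bus could not offer, while the third is refused until output 2 is released with `disconnect(2)`.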

In all these architectures, a coherence controller is added between each cache and the memory interconnection. The coherence controller's job is to maintain cache coherence. An outstanding transaction table within the coherence controller keeps track of bus transactions that have not yet finished. A bus snooper is connected directly to the bus to monitor bus traffic and listen for operations on data that is currently cached. If, after consulting the cache tag array, the block is found in a cached state, the coherence controller has to determine whether to respond by changing the state of the block, flushing its data to the bus, or both. A finite state machine implemented within the coherence controller determines the state changes for blocks. A write-back buffer (or queue) stores data that needs to be flushed, either in response to a snooped bus transaction or because of a normal eviction of a dirty block.[11]
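The finite state machine inside a coherence controller can be sketched as a pair of lookup tables. The text above does not name a specific protocol, so this sketch assumes the simple three-state MSI (Modified, Shared, Invalid) protocol. The event names `PrRd`, `PrWr`, `BusRd`, `BusRdX`, and `Flush` follow a common textbook convention and are likewise assumptions.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"  # only valid copy, dirty
    SHARED = "S"    # clean copy, other caches may also hold it
    INVALID = "I"   # block not usable in this cache


# (current state, local processor request) -> (next state, bus transaction issued)
PROCESSOR_TRANSITIONS = {
    (State.INVALID, "PrRd"): (State.SHARED, "BusRd"),
    (State.INVALID, "PrWr"): (State.MODIFIED, "BusRdX"),
    (State.SHARED, "PrRd"): (State.SHARED, None),
    (State.SHARED, "PrWr"): (State.MODIFIED, "BusRdX"),
    (State.MODIFIED, "PrRd"): (State.MODIFIED, None),
    (State.MODIFIED, "PrWr"): (State.MODIFIED, None),
}

# (current state, snooped bus transaction) -> (next state, response on the bus)
SNOOP_TRANSITIONS = {
    (State.MODIFIED, "BusRd"): (State.SHARED, "Flush"),
    (State.MODIFIED, "BusRdX"): (State.INVALID, "Flush"),
    (State.SHARED, "BusRd"): (State.SHARED, None),
    (State.SHARED, "BusRdX"): (State.INVALID, None),
    (State.INVALID, "BusRd"): (State.INVALID, None),
    (State.INVALID, "BusRdX"): (State.INVALID, None),
}


def on_processor_request(state: State, request: str):
    """State change and bus transaction for a request from the local processor."""
    return PROCESSOR_TRANSITIONS[(state, request)]


def on_snoop(state: State, transaction: str):
    """State change and response for a transaction snooped from another cache."""
    return SNOOP_TRANSITIONS[(state, transaction)]
```

For example, `on_snoop(State.MODIFIED, "BusRd")` returns `(State.SHARED, "Flush")`: the controller downgrades its dirty block and flushes the data to the bus, which is the kind of response a write-back buffer would serve.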

Bus transactions must be snooped not only by the caches but by the memory as well. Depending on the type of transaction that is snooped, the memory may need to respond by supplying data to the bus or accepting data from the bus. Unlike the cache controllers, however, the memory does not need to maintain block states, so no finite state machine is necessary.

Examples of SMPs

The SMP architecture is used in many systems, including:[4]
Pentium Pro Quad
Sun Enterprise

References

  1. ^ "An Introduction to the New IBM e-server pSeries High Performance Switch" - Glossary pg. 246 - http://www.redbooks.ibm.com/redbooks/pdfs/sg246978.pdf
  2. ^ Locking in OS Kernels for SMP Systems - http://irl.cs.ucla.edu/~yingdi/web/paperreading/smp_locking.pdf
  3. ^ a b "Patent US6349369 - Protocol for transferring modified-unsolicited state during data intervention". google.nl. 
  4. ^ a b Solihin, Yan (2008–2009). Fundamentals of Parallel Computer Architecture. Madison, WI: OmniPress. p. 13. ISBN 978-0-9841630-0-7. 
  5. ^ "Patent US8453122 - Symmetric multi-processor lock tracing". google.com. 
  6. ^ http://www8.cs.umu.se/kurser/5DV016/VT09/assignments/A2/mm.pdf
  7. ^ Intel MultiProcessor Specification - 2. System Overview - http://pdos.csail.mit.edu/6.828/2007/readings/ia32/MPspec.pdf
  8. ^ "Patent US7103631 - Symmetric multi-processor system". google.com. 
  9. ^ http://www.uspto.gov/web/patents/patog/week06/OG/html/1387-1/US08370595-20130205.html
  10. ^ http://www.uspto.gov/web/patents/patog/week49/OG/html/1385-1/US08327372-20121204.html
  11. ^ a b Solihin, Yan (2008–2009). Fundamentals of Parallel Computer Architecture. Madison, WI: OmniPress. pp. 199–201. ISBN 978-0-9841630-0-7. 
  12. ^ AMD Opteron Shared Memory MP Systems – http://www.cse.wustl.edu/~roger/569M.s09/28_AMD_Hammer_MP_HC_v8.pdf
  13. ^ Multi-processor system with shared memory – http://www.freepatentsonline.com/5701413.html
  14. ^ Method for transferring data in a multiprocessor computer system with crossbar interconnecting unit – http://www.google.com/patents/EP0923032A1?cl=en
  15. ^ Specification and Verification of the PowerScale Bus Arbitration Protocol: An Industrial Experiment with LOTOS, Chap. 2, Pag. 4 – ftp://ftp.inrialpes.fr/pub/vasy/publications/cadp/Chehaibar-Garavel-et-al-96.pdf