Talk:Symmetric multiprocessing

WikiProject Computing (Rated Start-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.

Interpretation of SMP

This article is incorrect in its interpretation of SMP. SMP (Symmetric MultiProcessing) refers to the capability of any part of the operating system to execute on any processor. Asymmetric MP is a system where key portions of the OS such as IO operations can only execute on the "master" CPU. Applications code can also execute on "slave" CPUs. Asymmetric MP is typically easier to implement but does not scale as well as SMP because the "master" cpu becomes a bottleneck. SMP avoids this by allowing all code to execute on any available CPU. This requires reentrant OS code.

NUMA and UMA refer to memory access in shared memory MP architectures (usually SMP). UMA (Uniform Memory Access) is generally implemented as a bus where each CPU has essentially the same path to shared memory. This is difficult to implement in systems with large numbers of CPUs, though examples have existed with 64 CPUs. In this design the memory bus eventually becomes a bottleneck. To avoid this, NUMA (NonUniform Memory Access) systems are typically composed of building blocks of small UMA SMP nodes with two to four CPUs and some local memory linked by high speed networks so that any CPU can access all addressable memory. Access to nonlocal memory is slower. There are usually several tiers of networking in very large NUMA systems with over a thousand CPUs. These systems scale better than UMA because with good locality of reference and intelligent scheduling much data required by a given CPU will be held in local memory avoiding bus contention. The term ccNUMA means cache coherent NUMA. Some provision such as bus snooping or a directory is used to maintain a coherent picture of shared memory in the cache of each processor. All major commercial NUMA machines are cache coherent, so the cc is often dropped.

Another popular multiprocessing model is the distributed memory cluster. In this case you have a dedicated network of independent computing nodes which do not have a shared address space. These systems employ message passing to communicate data between nodes. This requires a different approach to programming since data resides on specific nodes rather than in a single shared address space. Distributed clusters are generally far less costly than shared memory multiprocessors of similar size.

— Preceding unsigned comment added by 64.136.49.229 (talkcontribs) 11:14, 1 January 2005‎
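The contrast drawn above between a shared address space and message passing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "nodes" are ordinary local processes rather than networked machines, and the names `owner_of` and `node_main`, the modulo placement rule, and the use of `None` as a shutdown message are all hypothetical choices made for this sketch.

```python
from multiprocessing import Pipe, Process


def owner_of(key: int, n_nodes: int) -> int:
    """Hypothetical placement rule: integer key k lives on node k mod n_nodes."""
    return key % n_nodes


def node_main(conn, shard: dict) -> None:
    """One 'node': its shard is private and reachable only through messages."""
    while True:
        key = conn.recv()            # block until a request message arrives
        if key is None:              # None is this sketch's shutdown message
            break
        conn.send(shard.get(key))    # reply with the locally held value
    conn.close()


def main() -> None:
    n_nodes = 2
    data = {k: k * k for k in range(6)}
    links, procs = [], []
    for node in range(n_nodes):
        # Each node receives only the part of the data it owns.
        shard = {k: v for k, v in data.items() if owner_of(k, n_nodes) == node}
        parent_end, child_end = Pipe()
        proc = Process(target=node_main, args=(child_end, shard))
        proc.start()
        links.append(parent_end)
        procs.append(proc)

    # To read key 5 we must ask the node that owns it;
    # there is no shared address to load the value from.
    link = links[owner_of(5, n_nodes)]
    link.send(5)
    print(link.recv())

    for link, proc in zip(links, procs):
        link.send(None)              # tell each node to shut down
        proc.join()


if __name__ == "__main__":
    main()
```

The point of the sketch is the programming model: the requester has to know (or compute) which node holds the data and exchange explicit messages with it, whereas on a shared-memory machine it would simply read the value.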

This article is incorrect in its interpretation of SMP. SMP (Symmetric MultiProcessing) refers to the capability of any part of the operating system to execute on any processor.

Ah... no, that would be a multi-programmed OS, or a multiprocessor-aware application.

Asymmetric MP is a system where key portions of the OS such as IO operations can only execute on the "master" CPU.

An example is the Power Macintosh 9500 180/MP.

Applications code can also execute on "slave" CPUs.

Actually, applications can execute functions on the 'slave' CPU. When an application is executed, it's loaded into memory, and the OS passes control to it.

Asymmetric MP is typically easier to implement but does not scale as well as SMP because the "master" cpu becomes a bottleneck.

Only for certain types of applications.

SMP avoids this by allowing all code to execute on any available CPU. This requires reentrant OS code.

As most applications are. There are a few badly behaved applications, like games, that have to manage multi-programming themselves, but for the most part,

Another popular multiprocessing model is the distributed memory cluster. In this case you have a dedicated network of independent computing nodes which do not have a shared address space. These systems employ message passing to communicate data between nodes. This requires a different approach to programming since data resides on specific nodes rather than in a single shared address space. Distributed clusters are generally far less costly than shared memory multiprocessors of similar size.

Distributed memory clusters are, for the most part, NUMA machines. For efficiency, each memory segment has multiple processors. There are many examples of this. The message passing can occur on a dedicated processor bus, the system bus, an I/O bus, an I/O bus to Ethernet/Myrinet, or custom communication fabrics like the MasPar.
— Preceding unsigned comment added by Artoftransformation (talkcontribs) 13:04, 5 November 2007‎

SMP Optimisation

How do you design and optimise application software to run under SMP? Surely if the application is designed to run as one large (monolithic) process, then it will sit on one CPU and the other CPUs will be idle? Does or can a Java Virtual Machine result in multiple processes under SMP? — Preceding unsigned comment added by Robertbowerman (talkcontribs) 08:16, 2 June 2005‎

Make your program multi-threaded, using something like NPTL. Usually this results in threads running on different CPUs. Not sure if the JVM is multi-threaded. --68.235.128.173 17:04, 12 July 2006 (UTC)
However, many SMP servers make effective use of SMP by running multiple single-threaded instances of an application instead of a multi-threaded application. A related use is when compiling a large application: the "make" program can be configured to launch multiple instances of the compiler. Note that while each instance only runs on one processor at a time, the OS can (and often does) use different processors at different points in the application's life. It usually makes sense to tell "make" to dispatch one or two "extra" compilations, so if you have two real processors, tell make to dispatch three or four concurrent compiles. When an instance goes idle while waiting for I/O, the OS switches that processor to another instance. -Arch dude 19:03, 20 January 2007 (UTC)
This question has nothing to do with NUMA and only generically to do with SMP. Some JVMs are multi-threaded; some aren't. The Sun JVM is heavily multi-threaded. Note that even in multi-threaded VMs and frameworks there can be some serious lock contention. For example, while Python is multi-threaded, the CPython core has a "GIL" or "global interpreter lock" which is used to maintain the interpreter's state consistency during operations on almost all objects in the running environment. This, in practice, means that Python scripts using their native threading features don't scale even to two CPUs in computationally intensive tasks. For those environments threading is primarily useful as an abstraction model and for I/O multiplexing (mostly used for GUI handling frameworks and networking respectively). JimD 01:07, 8 September 2006 (UTC)
You use a compiler and language that supports multiple threads, and remove as much time-dependent code as possible. Symmetric implies that all the processors are uniform, and they are closely coupled. So much has been written about how to write multi-threaded code, it does not bear repeating. Consider this: Add a set of numbers. How to optimize this for SMP? Divide the number of numbers by the number of available processors, split up the problem, and converge at the end. You could further optimize this by looking at the number of processors that are idle. A historic reference for this is how Richard Feynman optimized computation at Los Alamos. Punched cards etc. Look it up. —Preceding unsigned comment added by 67.188.118.64 (talk) 06:37, 18 September 2007 (UTC)
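The "add a set of numbers" recipe above (split by processor count, sum the pieces independently, converge at the end) might look roughly like this in Python. It is a minimal sketch, not a tuned implementation: the helper names `split` and `parallel_sum` and the `executor_cls` parameter are assumptions introduced here, and worker processes are used by default because of the CPython GIL mentioned earlier in this thread.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split(numbers, parts):
    """Cut the list into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def parallel_sum(numbers, workers=None, executor_cls=ProcessPoolExecutor):
    """Sum `numbers` by summing one chunk per worker, then combining."""
    workers = workers or os.cpu_count() or 1
    chunks = split(numbers, workers)
    if not chunks:
        return 0
    with executor_cls(max_workers=workers) as pool:
        partial_sums = pool.map(sum, chunks)  # each chunk is summed independently
        return sum(partial_sums)              # converge at the end


if __name__ == "__main__":
    values = list(range(1_000_000))
    # Same algorithm, two execution models: processes, then threads.
    print(parallel_sum(values))
    print(parallel_sum(values, executor_cls=ThreadPoolExecutor))
```

Passing `ThreadPoolExecutor` runs the identical split-and-combine logic on threads; on CPython that variant is not expected to speed up CPU-bound work, which is why the process pool is the default in this sketch. For a task as cheap as addition, the cost of shipping chunks to workers can easily outweigh the gain.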

Add History of SMP Section

Could somebody add a 'history of' section? Fdgfds 03:00, 4 April 2006 (UTC)

I added some history, but we really need to merge the "entry" and "mid-level" sections and then rework the whole thing. As written, this is all about x86 with the rest barely present. -Arch dude 19:03, 20 January 2007 (UTC)
Along similar lines, the section on "Entry-level systems" should be rewritten with respect to multi-core systems. For example:
  • "... Core 2 Duo ... all have multi-core versions": In fact, Core 2 Duo is _only_ available with multiple cores
  • "In all cases, these systems are available in uniprocessor versions as well.": Not true, most Apple computers today are only available with multiple cores, as are most mid-level Windows systems and all workstations.
Gglockner 13:32, 4 April 2007 (UTC)

Isn't NUMA a Qualifier for SMP?

I think the article is wrong in claiming that NUMA is a non-symmetrical form of MP. However, I don't consider myself to be enough of an expert to back up that assertion with an authoritative reference. My understanding is that all NUMA systems still provide "symmetrical" access to all main system memory --- but that this access isn't "uniform" (that some memory is much "harder" to get to from some CPUs --- and therefore is much slower). The need for NUMA arises from scaling MP past a certain point (which depends on the speeds of the CPUs and the interconnects among them but is approached at about 8 CPUs and practically unavoidable past 16).

The key point to understand about NUMA vs. "UMA" is the effect on software design, particularly of the OS and scheduler. (Note: as far as I know "UMA" is a back-formation from NUMA to describe the default memory access design goal; giving us something to which to compare NUMA).

Because NUMA is (usually?) a form of SMP one can run any MP capable system on a NUMA system. However, if the OS/scheduler and memory management system is not NUMA aware then the coherency/locking that results from "remote" memory accesses (in the hardware) will incur far more overhead than would occur with a properly NUMA aware system. NUMA aware software has additional code for understanding the geometry or layout of the CPU/memory interconnections so that allocations of memory preferentially use "local" memory and scheduling preferentially constrains execution to CPUs which are "close" to the existing memory page set (that's already been allocated). (Of course the issue of Processor affinity affects scheduler design on UMA as well as NUMA machines).
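Processor affinity, mentioned above, can also be requested from user space. The following is a small sketch that assumes a Linux system, where Python exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the helper name `pin_to_cpus` is made up for this example. It only constrains which CPUs the process may run on. It does not control where memory is allocated, which on a NUMA machine is a separate policy handled by the kernel or by tools such as numactl.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to `cpus`; return the previous CPU set.

    Assumes Linux: the sched_* affinity calls are not available on every
    platform, so this sketch returns None where they are missing.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)   # 0 means "the calling process"
    os.sched_setaffinity(0, cpus)
    return previous


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        one_cpu = {min(allowed)}         # pick a CPU we are already allowed to use
        pin_to_cpus(one_cpu)
        print("now restricted to", os.sched_getaffinity(0))
        os.sched_setaffinity(0, allowed) # restore the original set
    else:
        print("CPU affinity calls are not available on this platform")
```

Choosing a CPU from the currently allowed set, rather than hard-coding CPU 0, avoids an error in containers or cpusets where CPU 0 is not available to the process.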

Linus Torvalds once pointed out (to me, User:JimD) that the code necessary to implement NUMA awareness was similar to some code that's necessary to handle different memory "zones" on the (32-bit) PC architecture that result from its history of extensions. Even on single CPU systems access to some sorts of memory (such as PAE) is much slower than access to others. Also on the PC architecture there are constraints on which memory is accessible to the DMA controllers (based on whether they are ISA or PCI controller chips, among other things) which necessitates the use of "bounce buffers" for some I/O. The point of this being that there are some non-uniform aspects to memory access that are inherent in the PC design, even for single CPU systems. (This was just a casual conversation in some restaurant or at some trade show ... so I can't offer any link to it).

It's unlikely that Linux is running on any significant NUMA installation. Most NUMA installations are using IRIX or custom coded programs. According to digests of the TOP 500 supercomputer lists, almost all of the NUMA machines are running -> "Silicon Graphics' NUMALink" [1] Artoftransformation 08:20, 19 September 2007 (UTC)

Anyway, I'd love for a more authoritative contributor to either fix up the article or comment here on whether my interpretation is correct (that NUMA is a form of SMP) ... or to provide a reference to a credible counter-example or refutation.

SMP is a processor architecture, NUMA is a memory architecture. MOST NUMA Machines are SMP. Clusters used to be non-SMP. More and more clusters are becoming SMP, but only at the level of commodity multi-core CPUs, not as a result of custom design processor/memory busses. Artoftransformation 08:20, 19 September 2007 (UTC)

JimD 00:59, 8 September 2006 (UTC)

I agree with your interpretation. --Bkkbrad 17:46, 30 November 2006 (UTC)
You cannot define SMP without looking at the alternatives, asymmetric MP and "coprocessor". The problem is, virtually nobody has made an AMP system for a while. We can agree that a coprocessor has inherently different instruction capabilities (floating point vs integer, DSP vs microprocessor). The last definitely asymmetric system was probably the Intergraph 486s. They have several characteristics: not application transparent, interrupts only on one processor, I/O and memory space of the second processor is limited or independent (regardless of speed) while the first is unlimited, and the processors are identical (both x86).
Interrupt handling has traditionally been part of the definition, but that's blurry. Some systems (Cray-like supercomputers) with very light I/O capabilities only have one processor handling interrupts and I/O. Likewise, most large distributed (NUMA) systems force interrupts to be handled local to the node. Pre-APIC systems and OSes (earlier Linux) have I/O only on one processor.
Consider a shared-memory SMP system. You make your program multithreaded or have multiple processes with IPC, and you don't care about general memory speed. (memory mapped IO is not memory, PAE is memory being exposed as MMIO, ignore caches). On an AMP, the first processor explicitly assigns tasks for the second processor to run and explicitly loads the second processor's memory, and you explicitly handle I/O from the second processor. Now look at NUMA: you don't write instructions to handle I/O or memory, but you write the program such that the system doesn't have to treat I/O or memory as shared. That is, you don't tell the CPU to get memory from another node, but you do things like avoid far accesses and group them together, and keep I/O on the same thread, because these things make a difference.
Basically what does this mean? On SMP, you don't have to care about significant asymmetries in the multiprocessing architecture (memory, I/O), because they don't exist. On AMP, you explicitly handle asymmetries with code you wrote. On NUMA, you don't write any code because the OS and hardware handle this for you, but if you don't care about the asymmetries and don't architect your code for them, your program will be slow(er).
Because NUMA requires different programming from SMP and AMP systems, the memory and I/O layout is significant in determining what is "symmetric" and "asymmetric".
169.231.18.68 04:22, 12 May 2007 (UTC)
Almost all TOP500 Supercomputers are both SMP at the level of computational processor, and AMP at the system level. Artoftransformation 08:20, 19 September 2007 (UTC)
SMP is a processor/processor subsystem design, NUMA is a system level design for managing large and very large memory pools between processors. Most SMP implementations are non-NUMA, only because NUMA is found on many non-clustered supercomputers. ccNUMA is a special case of NUMA, where the memory pools need cache coherence due to the nature of the problem, that the processors have to be in communication about changing local memory and non-local (non-uniform) memory. Most NUMA systems are SMP at the level of local uniform memory sub-systems. (In fact the example of NUMA systems here is one of these, but this is not necessarily so.) There are machines, like the Origin 2000 at GPL, where some classes of problems require that the cache coherency not be used, i.e. searching and manipulating geographic and atmospheric datasets, and where cache coherency also speeds up problems like atmospheric simulation significantly. Problems in Q.E.D. can fit the whole range, from simple SMP to NUMA to ccNUMA to hardware specifically designed with interprocessor communication and memory management interconnects wired specifically to the problem.
"On SMP Machines you don't have to care about significant asymmetries" No. Case in point, a MasPar MP-1: SMP (to the tune of 1024 CPUs), cache coherent at the level of a 4-processor group. Very fast for Smith-Waterman, fast for image convolution, slow to sluggish for multiple-dimension atmospheric modeling. Origin 2000 (ccNUMA): fast for Smith-Waterman, slow for image convolution, great at multiple-dimension atmospheric modeling (cache coherent to the extreme; in fact, one of the largest cache coherent machines ever designed).
"On NUMA, you don't write any code because the OS and hardware handle this for you, but if you dont care about them, and architect your code for them, your program will be slow(er)." In programming a NUMA system, you have to make assumptions and set up your code so that it is parallel in the extreme, and accesses data mostly at the local SMP processor level, and at worst, on secondary storage.
Source for all this information is "In Search of Clusters", Second Edition, by Gregory Pfister.
Artoftransformation 08:20, 19 September 2007 (UTC)
NO. Flat out NOT. NUMA is a successor to SMP. Ask the experts: [Faq for Linux NUMA Kernel developers]

The NUMA architecture was designed to surpass the scalability limits of the SMP architecture. With SMP, which stands for Symmetric Multi-Processing, all memory access are posted to the same shared memory bus. This works fine for a relatively small number of CPUs, but the problem with the shared bus appears when you have dozens, even hundreds, of CPUs competing for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus, and connecting the various nodes by means of a high speed interconnect.

— Preceding unsigned comment added by 67.188.118.64 (talkcontribs) 01:42, 27 September 2007‎
The article is wrong in claiming that NUMA is a successor to SMP. NUMA (or, respectively, UMA) is an attribute of shared-memory multiprocessors. Symmetric in SMP means that all cores have identical capabilities, e.g., that every interrupt can be steered to any core. For example, multiprocessors conforming to the Intel MP specification are SMPs, since the APIC bus connects all APICs. 192.35.17.13 (talk) 14:16, 17 June 2011 (UTC)

Disinformation Graphic

In the example graphic the third processor is an I/O processor NOT AVAILABLE FOR SMP processing, unless the OS or application makes it available through software. Although it's a dual-processing system, capable of SMP, showing the third processor at the level of the others is entirely misleading and counterproductive. Artoftransformation 08:24, 19 September 2007 (UTC)

I think you are right. There have been no objections in three and a half years. I would put "input/output" connected to the shared bus to make it more truly symmetric. W Nowicki (talk) 20:37, 29 May 2011 (UTC)

Amdahl's law

Missing from this article also is any mention of Amdahl's law. I'll come back and fix it soon. Artoftransformation 08:24, 19 September 2007 (UTC)

Here is the case for including Amdahl's law.

"In some applications, particularly software compilers and some distributed computing projects, one will see an improvement by a factor of (nearly) the number of additional processors"

I would ask for citations, but I can see that clearly there are some in mind. I would add that rendering (use of software such as Renderman Pro, DreamNet, Backrounder or Extreme3D) and other embarrassingly parallel applications will see a linear improvement, but having run DistCC (the distributed C compiler) on both an AMD-based rendering farm and an Intel-based rendering farm, neither showed linear results (and I never got to the heart of the problem of WHY an AMD machine could never successfully compile for Intel P6).
I would also like to add a corollary of Amdahl's law: ANY process that involves return communication will eventually stop giving linear response. SETI@HOME ( godamnyou Stewart ) only sends out work units. Since it has had only 3 events in 41 million packets, it can be considered embarrassingly parallel. Since compiling requires a huge amount of communication, its response (the marginal improvement from adding additional processors) will never approach linearity.
In the application realm of program compiling and kernel building, the more processors I threw at the problem, the marginally faster it became, APC=0.11 (Amdahl's parallel coefficient). After running 8 processors, it would saturate the server (dual processor), and near the end of the process, it would saturate the backbone (100base, now upgraded to GOC (gigabit over copper), nothing as exotic as Myrinet). Since this information is anecdotal, and primary research, it's not usable in the main article, and I am trying to get some actual statistics out of the DistCC group.
—Preceding unsigned comment added by Artoftransformation (talkcontribs) 04:09, 1 October 2007 (UTC)
This is just plain wrong: just because you run a compiler on a quadcore doesn't mean you'll get a 4x increase. The compiler has to be designed to compile using multiple threads. This may be true of distcc, but is certainly NOT true of compilers in general. —Preceding unsigned comment added by 65.28.12.137 (talk) 00:32, 9 March 2008 (UTC)
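For reference, the standard statement of Amdahl's law under discussion here is speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of processors. A small Python sketch of that formula follows; the function name `amdahl_speedup` and the sample fractions are illustrative assumptions and are not derived from the measurements described above.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that can be parallelised (0..1).
    processors: number of processors applied to the parallel part (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative values only: even a 90%-parallel job flattens out quickly.
    for n in (1, 2, 4, 8, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The formula makes the 4x-on-a-quad-core objection concrete: only a job with p equal to 1 scales linearly, and with p = 0.9 the speedup can never reach 10 no matter how many processors are added. The formula also ignores communication overhead, so real results such as the distcc runs described above can be worse than it predicts.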

Shared memory on AMD K8, K9 and K10?

I think the first sentence of this article doesn't fit well because on newer AMD SMP systems each processor has its own exclusive memory (because each processor has its own memory controller) —Preceding unsigned comment added by 84.167.76.110 (talk) 15:09, 4 January 2008 (UTC)

I believe that the newer AMD Opterons use a NUMA memory architecture. The introduction for this article is indeed outdated. I might fix it if I have the time. Rilak (talk) 06:55, 5 January 2008 (UTC)
K8, K9 and K10 are indeed NUMA architectures, but according to the German Wikipedia article about it, NUMA is the next logical step for more scalability in symmetric multiprocessing architectures. —Preceding unsigned comment added by 84.167.72.118 (talk) 14:01, 7 January 2008 (UTC)
The introduction for this article states, "Symmetric multiprocessing, or SMP, is a multiprocessor computer architecture where two or more identical processors are connected to a single shared main memory." If this definition is correct, then NUMA cannot be classified as being SMP, because in a NUMA system, each processor has its own memory, and is connected to the other processor's memories via multiple interconnects. Consider the Alpha 21364 with its 2D torus network - each processor has a link to its own local memory and connects to the remote memories of other processors using four independent links. The recent Athlons are similar I believe. Rilak (talk) 14:37, 7 January 2008 (UTC)

This is a badly written page

The first section is "Alternatives to SMP", followed by "Pros and cons". How about a section that actually goes into what SMP *IS* before debating it? Most of the people using this article are going to want a description of SMP first, not an opinion page on it. —Preceding unsigned comment added by 98.207.59.116 (talk) 18:44, 30 April 2008 (UTC)

Advantages and disadvantages

Removed this line as it is just not true: "The nature of the different programming methods would generally require two separate code-trees to support both uniprocessor and SMP systems with maximum performance." While programs may have to be adapted to be SMP safe (if they were not safe to begin with), there is no inherent reason why a program which has been designed to make use of multiple processors cannot also be designed to run optimally on a single processor (and do this adaptation at run time). Maintaining separate code and binaries is a valid design choice, but it is not the only way. —Preceding unsigned comment added by 206.165.101.124 (talk) 15:22, 17 July 2008 (UTC)

I restored the removed statement as your argument is currently not strong enough to justify the statement's removal. Note that the statement specifically states that it applies generally. Your argument basically goes like this, "there is no theoretical reason as to why a program cannot be optimised on both uniprocessor and multiprocessor systems." While you are correct, so is the statement you've removed. The fact that a certain operating system is criticized frequently for being slow, because it was optimised for systems with over a hundred processors, lends credibility to the statement removed. Rilak (talk) 06:54, 18 July 2008 (UTC)

SMP Aware/Pseudo SMP

Hi, I'll be first to admit I'm something of a layman here, but I think it may be useful to include definitions of the above (partly because I'm looking for some). They're terms I've come across when reading about SMP, and I'll admit that I can't make sense of entire sentences about it because I've not had these explained to me. A quick Google search suggests that the terms get thrown around enough to make them worth mentioning.

Does anyone fancy giving it a go? BOMBkangaroo (talk) 21:54, 29 July 2008 (UTC)

Dual-Core SMP vs Multi-Processor SMP?

Maybe we could also make a difference between CPUs supporting "multi-processor" and "multi-core" setups.

The first is the "original" sense of SMP: you use a mainboard with more than one socket and put in multiple CPUs to get more power. However, the newer processors only support this in the server lines AFAIK (Opteron and Xeon). The latter is to create one or more dies on a single chip so that they can act like multiple processors.

This is interesting for people who, for example, want to build themselves a cheap workstation, as these usually require all the CPU power they can get, and it is better to purchase a number of cheap CPUs than to get a single but expensive octal-core solution (provided you have MT apps).

84.161.62.124 (talk) 18:33, 8 August 2008 (UTC)

The original sense of SMP was multiple boxes, each of which had many discrete transistors. Multi-core CPU chips are a natural step in the progression. For that matter, virtual multiprocessors are not different in kind, only in implementation. Shmuel (Seymour J.) Metz Username:Chatul (talk) 14:59, 29 November 2010 (UTC)

Core I7 listed as SMP Processor

The Intel Core I7 cpu is listed as SMP CPU, but isn't it a ccNUMA cpu? —Preceding unsigned comment added by 193.172.64.171 (talk) 15:39, 9 March 2009 (UTC)

Is SMP = UMA?

According to my knowledge, although SMP relies on UMA, it is not the same thing. It is a multiprocessor architecture rather than a memory architecture. Every processor of an SMP has equal opportunity to execute the kernel and I/O. It is opposed to master-slave architectures, where different functions are assigned to different processors. The equal right to execute any function makes the processors symmetric. Also, uniform memory access does not imply SMP. I can easily imagine an asymmetric system relying on UMA. --Javalenok (talk) 20:36, 15 May 2009 (UTC)

How can one task in the system be executed on two or more processors at the same time?

How is the text in bold possible - isn't it an error state?

SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time

—Preceding unsigned comment added by 203.91.193.50 (talk) 11:07, 8 April 2010 (UTC)

Windows Embedded Compact

fix the link to Windows Embedded Compact maybe to go to Windows CE (alias) —Preceding unsigned comment added by 77.49.231.74 (talk) 17:30, 5 May 2011 (UTC)

64 GiB limitation ?? or 4 GiB ?

In section "mid-level systems":

they were all limited by the physical memory addressing limitation of 64 GiB.

I believe that 32-bit systems are limited to 4 GiB (2^32), not 64 GiB. Could anyone confirm? SN74LS00 (talk) —Preceding undated comment added 07:50, 5 December 2011 (UTC).

A system with 32-bit physical addresses would be limited to 4 GiB of main memory. A system with 36-bit physical addresses, such as a 32-bit x86 processor with Physical Address Extension, would be limited to 64 GiB of physical memory, even if, at any given time, a single address space could only include 4 GiB of that memory. Many OSes running on x86 processors would give different processes different address spaces, and many of them would let the code running in that process map files and other objects into and out of the address space, so, whilst one process cannot conveniently access more than 4 GiB of memory, it can inconveniently do so on many OSes (for example, most UN*Xes, as well as Windows NT; I don't think Windows 9x supported PAE) by mapping files in and out of the address space, and multiple processes can, together, access more than 4 GiB of memory. Guy Harris (talk) 08:29, 5 December 2011 (UTC)
Excellent explanation, thanks ! SN74LS00 (talk) 20:59, 13 December 2011 (UTC)
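The arithmetic behind the two limits discussed in this section is simply 2 raised to the number of physical address bits, counted in bytes. A tiny Python check follows; the helper name `addressable_gib` is invented for this sketch, and it assumes byte-addressable memory and at least 30 address bits so that the result is a whole number of GiB.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Whole GiB reachable with `address_bits` bits of byte address.

    Assumes byte-addressable memory and address_bits >= 30.
    """
    return 2 ** address_bits // GIB


if __name__ == "__main__":
    print(addressable_gib(32))  # plain 32-bit physical addresses
    print(addressable_gib(36))  # 36-bit physical addresses, as with PAE
```

So 32 address bits give 2^32 bytes = 4 GiB and 36 bits give 2^36 bytes = 64 GiB, which matches both figures quoted above.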

What can SMP systems do/allow?

SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.


I note that the first sentence does not convey correct information; SMP systems do not impose any restraints on how tasks are distributed across processors or cores.

Point #1, Flynn's taxonomy describes an architecture like the one discussed in the article as multi-core parallelism - one should refrain from using "processor" in the sentence above, as cores are not physical processors but logical units of processing (whilst many of us do comprehend the distinction, there are others that do not).

Point #2, what an SMP system does, in broad strokes, is to have a uniform memory accessible to all logical processing units; this in itself presents programming challenges, as the programmer has to partition the data set in order to work correctly with the data. Case in point: OpenCL

[1] Frederico.almeida.marques (talk) 09:49, 24 March 2014 (UTC)

The term "chip-level multiprocessing" (or "chip multiprocessing", see http://www.computerworld.com/s/article/54343/Chip_Multiprocessing) has been used for multi-core processors. What constitutes a processor is not clearly a single chip - it could be a CPU (which, as the processor page claims, "contains an arithmetic logic unit (ALU) and processor registers", so a dual-core chip has two central processing units in that sense), or it could be a single chip. Guy Harris (talk) 08:25, 25 March 2014 (UTC)

