Heterogeneous computing systems refer to electronic systems that use a variety of different types of computational units. A computational unit could be a general-purpose processor (GPP), a special-purpose processor (e.g. a digital signal processor (DSP) or graphics processing unit (GPU)), a co-processor, or custom acceleration logic (an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA)). In general, a heterogeneous computing platform consists of processors with different instruction set architectures (ISAs).
The demand for increased heterogeneity in computing systems is partially due to the need for high-performance, highly reactive systems that interact with other environments (audio/video systems, control systems, networked applications, etc.). In the past, huge advances in technology and frequency scaling allowed the majority of computer applications to increase in performance without requiring structural changes or custom hardware acceleration. While these advances continue, their effect on modern applications is not as dramatic as other obstacles such as the memory wall and the power wall come into play. Now, with these additional constraints, the primary method of gaining extra performance out of computing systems is to introduce additional specialized resources, thus making a computing system heterogeneous. This allows a designer to use multiple types of processing elements, each able to perform the tasks that it is best suited for. The addition of extra, independent computing resources means that most heterogeneous systems can also be considered parallel computing, or multi-core, systems. Another term sometimes seen for this type of computing is "hybrid computing".
The level of heterogeneity in modern computing systems gradually rises as increases in chip area and further scaling of fabrication technologies allow formerly discrete components to become integrated parts of a system-on-chip, or SoC. For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, cryptography co-processors, programmable network processors, A/V encoders/decoders, etc.).
Common features 
Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:
- ISA or instruction set architecture
- Compute elements may have different instruction set architectures, leading to binary incompatibility.
- ABI or application binary interface
- Compute elements may interpret memory in different ways. This may include endianness, calling convention, and memory layout, and depends on both the architecture and the compiler being used.
- API or application programming interface
- Library and OS services may not be uniformly available to all compute elements.
- Low-Level Implementation of Language Features
- Language features such as functions and threads are often implemented using function pointers, a mechanism which requires additional translation or abstraction when used in heterogeneous environments.
- Memory Interface and Hierarchy
- Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform (UMA) or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte-, word-, or burst-sized accesses.
- Interconnect
- Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, scratchpad memories, etc.
Heterogeneous platforms often require the use of multiple compilers in order to target the different types of compute elements found in such platforms. This results in a more complicated development process than for homogeneous systems, as multiple compilers and linkers must be used together in a cohesive way to properly target a heterogeneous platform. Interpretive techniques can be used to hide heterogeneity, but the cost (overhead) of interpretation often requires the use of just-in-time compilation mechanisms that result in a more complex run-time system, which may be unsuitable in embedded or real-time scenarios.
Heterogeneous computing platforms 
Heterogeneous computing platforms can be found in every domain of computing—from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.
- Embedded Systems (DSP and Mobile Platforms)
- Reconfigurable Computing
- General Purpose Computing, Gaming, and Entertainment Devices
Hybrid-core computing 
Hybrid-core computing is the technique of extending a commodity instruction set architecture (e.g. x86) with application-specific instructions to accelerate application performance. It is a form of heterogeneous computing wherein asymmetric computational units coexist with a "commodity" processor.
Hybrid-core processing differs from general heterogeneous computing in that the computational units share a common logical address space, and an executable is composed of a single instruction stream—in essence a contemporary coprocessor. The instruction set of a hybrid-core computing system contains instructions that can be dispatched either to the host instruction set or to the application-specific hardware.
Typically, hybrid-core computing is best deployed where the predominance of computational cycles are spent in a few identifiable kernels, as is often seen in high-performance computing applications. Acceleration is especially pronounced when the kernel’s logic maps poorly to a sequence of commodity processor instructions, and/or maps well to the application-specific hardware.
Hybrid-core computing is used to accelerate applications beyond what is currently physically possible with off-the-shelf processors, or to lower power and cooling costs in a data center by reducing the computational footprint (i.e., to circumvent obstacles such as the power and density challenges faced by today's commodity processors).
Programming Heterogeneous Computing Architectures 
Programming heterogeneous machines can be difficult, since developing programs that make best use of the characteristics of different processors increases the programmer's burden. Requiring hardware-specific code to be interleaved throughout application code increases the complexity and decreases the portability of software on heterogeneous architectures. Balancing the application workload across processors can be problematic, especially given that they typically have different performance characteristics. There are different conceptual models to deal with the problem; for example, by using a coordination language and program building blocks (programming libraries and/or higher-order functions). Each block can have a different native implementation for each processor type. Users simply program using these abstractions and an intelligent compiler chooses the best implementation based on the context.