Coprocessor

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may include floating-point arithmetic, graphics, signal processing, string processing, encryption or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors also allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

Functionality

Coprocessors vary in their degree of autonomy. Some (such as FPUs) rely on direct control via coprocessor instructions embedded in the CPU's instruction stream. Others are independent processors in their own right, capable of working asynchronously; even so, they are either not optimized for general-purpose code or incapable of running it, because their limited instruction set is focused on accelerating specific tasks. It is common for these to be driven by DMA, with the host processor building a command list. The PlayStation 2's Emotion Engine contained an unusual DSP-like SIMD vector unit capable of both modes of operation.
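The asynchronous, command-list style of operation can be illustrated with a small software model. The following is only a sketch under stated assumptions: the `Command` record, its "fill" and "copy" opcodes, and the function names are hypothetical, a thread stands in for the independent processor, and a queue stands in for the DMA-fetched command list. It does not describe any particular hardware.

```python
from dataclasses import dataclass
from queue import Queue
from threading import Thread


@dataclass
class Command:
    """One entry in a hypothetical command list."""
    op: str          # assumed opcodes: "fill" or "copy"
    dst: int         # destination offset in shared memory
    length: int      # number of bytes to write
    value: int = 0   # byte value used by "fill"
    src: int = 0     # source offset used by "copy"


def coprocessor(commands: Queue, memory: bytearray, done: Queue) -> None:
    """Toy coprocessor: drains the command list independently of the host."""
    while True:
        cmd = commands.get()
        if cmd is None:  # sentinel marking the end of the command list
            break
        end = cmd.dst + cmd.length
        if cmd.op == "fill":
            memory[cmd.dst:end] = bytes([cmd.value]) * cmd.length
        elif cmd.op == "copy":
            memory[cmd.dst:end] = memory[cmd.src:cmd.src + cmd.length]
    done.put("idle")  # completion signal, analogous to an interrupt


def run_command_list(memory: bytearray, command_list: list) -> bytearray:
    """Host side: build the command list, start the coprocessor, wait."""
    commands, done = Queue(), Queue()
    worker = Thread(target=coprocessor, args=(commands, memory, done))
    worker.start()
    for cmd in command_list:
        commands.put(cmd)
    commands.put(None)
    # The host would be free to do unrelated work here.
    done.get()
    worker.join()
    return memory
```

In this model the host only describes the work; it never touches the data itself, which mirrors the division of labour described above.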

History

To make the best use of mainframe computer processor time, input/output tasks were delegated to separate systems called channel I/O. The mainframe CPU performed no I/O processing itself; instead it set the parameters for an input or output operation and then signaled the channel processor to carry out the whole operation. By dedicating relatively simple sub-processors to time-consuming I/O formatting and processing, overall system performance was improved.

Coprocessors for floating-point arithmetic first appeared in desktop computers in the 1970s and became common throughout the 1980s and into the early 1990s. Early 8-bit and 16-bit processors used software to carry out floating-point arithmetic operations. Where a coprocessor was supported, floating-point calculations could be carried out many times faster. Math coprocessors were popular purchases for users of computer-aided design (CAD) software and for those performing scientific and engineering calculations. Some floating-point units, such as the AMD 9511, Intel I8231 and Weitek FPUs, were treated as peripheral devices, while others, such as the Intel 8087, Motorola 68881 and National 32081, were more closely integrated with the CPU.

Another form of coprocessor was the video display coprocessor, used in the Atari 8-bit family, the Texas Instruments TI-99/4A and MSX home computers, where such chips were called "Video Display Controllers". The Commodore Amiga custom chipset included such a unit, known as the Copper, as well as a Blitter for accelerating bitmap manipulation in memory.

As microprocessors developed, the cost of integrating the floating point arithmetic functions into the processor declined. High processor speeds also made a closely integrated coprocessor difficult to implement. Separately packaged mathematics co-processors are now uncommon in desktop computers. The demand for a dedicated graphics co-processor has grown, however, particularly due to an increasing demand for realistic 3D graphics in computer games.

Intel coprocessors

The original IBM PC included a socket for the Intel 8087 floating-point coprocessor (also known as an FPU), which was a popular option for people using the PC for CAD or mathematics-intensive calculations. In that architecture, the coprocessor sped up floating-point arithmetic on the order of fiftyfold. Users who only used the PC for word processing, for example, saved the high cost of the coprocessor, which would not have accelerated text manipulation operations.

The 8087 was tightly integrated with the 8086/8088 and responded to floating-point machine code operation codes inserted in the 8088 instruction stream. An 8088 processor without an 8087 could not interpret these instructions, requiring separate versions of programs for FPU and non-FPU systems, or at least a test at run time to detect the FPU and select appropriate mathematical library functions.
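The run-time test mentioned above can be sketched as a dispatch table that is filled in once, after detection. This is an illustrative model only: `select_math_library`, `hardware_sqrt` and `software_sqrt` are hypothetical names, the `fpu_present` flag stands in for whatever detection a real program would perform, and Python's `math.sqrt` merely plays the role of a routine that would issue coprocessor instructions.

```python
import math


def software_sqrt(x: float) -> float:
    """Stand-in for a software floating-point routine (Newton's method)."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = max(x, 1.0)  # starts at or above the true root
    while True:
        nxt = 0.5 * (guess + x / guess)
        if nxt >= guess:  # no further decrease: converged
            return guess
        guess = nxt


def hardware_sqrt(x: float) -> float:
    """Stand-in for a routine that would use coprocessor instructions."""
    return math.sqrt(x)


def select_math_library(fpu_present: bool) -> dict:
    """Choose the implementation once, based on a run-time detection flag."""
    return {"sqrt": hardware_sqrt if fpu_present else software_sqrt}
```

A program would call `select_math_library` once at start-up and then route every call through the returned table, so a single binary can serve both FPU and non-FPU systems at the cost of one indirection per call.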

Figure: Intel 80386DX CPU with 80387DX math coprocessor.

Another coprocessor for the 8086/8088 central processor was the 8089 input/output coprocessor. It used the same programming technique as the 8087 for input/output operations, such as the transfer of data from memory to a peripheral device, thereby reducing the load on the CPU. IBM did not use it in the IBM PC design, however, and Intel stopped development of this type of coprocessor.

The Intel 80386 microprocessor used an optional "math" coprocessor (the 80387) to perform floating point operations directly in hardware. The Intel 80486DX processor included floating-point hardware on the chip. Intel released a cost-reduced processor, the 80486SX, that had no floating point hardware, and also sold an 80487SX co-processor that essentially disabled the main processor when installed, since the 80487SX was a complete 80486DX with a different set of pin connections.^[1]

Intel processors later than the 80486 integrated floating-point hardware on the main processor chip; the advances in integration eliminated the cost advantage of selling the floating point processor as an optional element. It would be very difficult to adapt circuit-board techniques adequate at 75 MHz processor speed to meet the time-delay, power consumption, and radio-frequency interference standards required at gigahertz-range clock speeds. These on-chip floating point processors are still referred to as coprocessors because they operate in parallel with the main CPU.

During the era of 8- and 16-bit desktop computers another common source of floating-point coprocessors was Weitek. These coprocessors had a different instruction set from the Intel coprocessors, and used a different socket, which not all motherboards supported. The Weitek processors did not provide transcendental mathematics functions (for example, trigonometric functions) like the Intel x87 family, and required specific software libraries to support their functions.^[2]

Motorola coprocessors

The Motorola 68000 family had the 68881/68882 coprocessors, which provided floating-point acceleration similar to that of the Intel coprocessors. Computers using the 68000 family but not equipped with the hardware floating-point processor could trap and emulate the floating-point instructions in software, which, although slower, allowed one binary version of a program to be distributed for both cases. The 68451 memory-management coprocessor was designed to work with the 68020 processor.^[3]
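The trap-and-emulate approach can be modelled with a toy interpreter. The sketch below is an assumption-laden illustration rather than a description of the 68000 exception mechanism: the `ToyCPU` class, the "fadd" and "fmul" opcodes and the `IllegalInstruction` exception are all hypothetical, and a Python exception stands in for the hardware trap.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it cannot execute."""

    def __init__(self, opcode, operands):
        super().__init__(opcode)
        self.opcode = opcode
        self.operands = operands


# Operations the (hypothetical) hardware FPU would execute directly.
HARDWARE_FPU = {
    "fadd": lambda a, b: a + b,
    "fmul": lambda a, b: a * b,
}

# Software routines installed in the trap handler to emulate the same opcodes.
SOFTWARE_EMULATION = {
    "fadd": lambda a, b: a + b,
    "fmul": lambda a, b: a * b,
}


class ToyCPU:
    def __init__(self, has_fpu: bool):
        self.has_fpu = has_fpu
        self.traps = 0  # counts how often emulation was needed

    def execute(self, opcode, a, b):
        if opcode == "add":  # ordinary integer instruction
            return a + b
        if self.has_fpu and opcode in HARDWARE_FPU:
            return HARDWARE_FPU[opcode](a, b)
        raise IllegalInstruction(opcode, (a, b))

    def run(self, program):
        """Run the same program whether or not an FPU is fitted."""
        results = []
        for opcode, a, b in program:
            try:
                results.append(self.execute(opcode, a, b))
            except IllegalInstruction as trap:
                handler = SOFTWARE_EMULATION.get(trap.opcode)
                if handler is None:
                    raise  # genuinely unknown instruction
                self.traps += 1
                results.append(handler(*trap.operands))
        return results
```

The program itself never checks for the coprocessor; the cost of its absence is paid only when a floating-point instruction is actually reached, which is why one binary can serve both configurations.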

Modern coprocessors

As of 2002, dedicated graphics processing units (GPUs) in the form of graphics cards are commonplace. Certain models of sound cards were fitted with dedicated processors providing digital multichannel mixing and real-time DSP effects as early as 1990 to 1994 (the Gravis Ultrasound and Sound Blaster AWE32 being typical examples), while the Sound Blaster Audigy and the Sound Blaster X-Fi are more recent examples.

In 2006, AGEIA announced an add-in card for computers that it called the PhysX PPU. PhysX was designed to perform complex physics computations so that the CPU and GPU would not have to perform these time-consuming calculations. It was designed to work with video games, although other mathematical uses could theoretically be developed for it. In 2008 Nvidia acquired AGEIA and began to phase out the card line; the functionality was instead provided in software, allowing PhysX computations to run on GPU cores normally used for graphics processing, in the same way that CUDA workloads do.

In 2006, Bigfoot Networks unveiled a PCI add-in card that it called the KillerNIC, which ran its own special Linux kernel on a Freescale PowerQUICC running at 400 MHz. The company called the Freescale chip a Network Processing Unit (NPU).

The SpursEngine is a media-oriented add-in card with a coprocessor based on the Cell microarchitecture. The SPUs are themselves vector coprocessors.

In 2008 the Khronos Group released OpenCL, with the aim of supporting general-purpose CPUs, ATI/AMD and Nvidia GPUs, and other accelerators with a single common language for compute kernels.

In 2012 Intel announced the Intel Xeon Phi coprocessor.^[4]

In the 2010s, some mobile computing devices implemented a sensor hub as a coprocessor. Examples of coprocessors used for handling sensor integration in mobile devices include the Apple M7 and M8 motion coprocessors, the Qualcomm Snapdragon Sensor Core and Qualcomm Hexagon, and the Holographic Processing Unit for the Microsoft HoloLens.

As of 2016, various companies are developing coprocessors aimed at accelerating artificial neural networks for vision and other cognitive tasks (e.g. vision processing units, TrueNorth, and Zeroth).

Other coprocessors

The MIPS architecture supports up to four coprocessor units, used for memory management, floating-point arithmetic, and two undefined coprocessors for other tasks such as graphics accelerators.^[5]
Using FPGAs (field-programmable gate arrays), custom coprocessors can be created to accelerate particular processing tasks such as digital signal processing (e.g. Zynq, which combines ARM cores with an FPGA on a single die).
TLS/SSL accelerators, used on servers.
Some multi-core chips can be programmed so that one of their processors is the primary processor, and the other processors are supporting coprocessors.

Trends

Over time CPUs have tended to grow to absorb the functionality of the most popular coprocessors. FPUs are now considered an integral part of a processor's main pipeline; SIMD units provided multimedia acceleration, taking over the role of various DSP accelerator cards; and even GPUs have become integrated on CPU dies. Nonetheless, specialized units remain popular away from desktop machines and where additional power is needed, and they allow continued evolution independently of the main processor product lines.

References

[1] Scott Mueller, Upgrading and Repairing PCs, 15th edition, Que Publishing, 2003, ISBN 0-7897-2974-1, pp. 108-110.
[2] Scott Mueller, Upgrading and Repairing PCs, Second Edition, Que Publishing, 1992, ISBN 0-88022-856-3, pp. 412-413.
[3] William Ford, William R. Topp, Assembly Language and Systems Programming for the M68000 Family, Jones & Bartlett Learning, 1992, ISBN 0-7637-0357-5, p. 892 ff.
[4] "Intel Delivers New Architecture for Discovery with Intel® Xeon Phi™ Coprocessors". Newsroom.intel.com. 2012-11-12. Retrieved 2013-06-16.
[5] Erin Farquhar, Philip Bunce, The MIPS Programmer's Handbook, Morgan Kaufmann, 1994, ISBN 1-55860-297-6, Appendix A3, p. 330.