A floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some systems (particularly older, microcode-based architectures) can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines.
In most modern general-purpose computer architectures, one or more FPUs are integrated with the CPU; however, many embedded processors, especially older designs, do not have hardware support for floating-point operations.
In the past, some systems have implemented floating point via a coprocessor rather than as an integrated unit; in the microcomputer era, this was generally a single integrated circuit, while in older systems it could be an entire circuit board or a cabinet.
Not all computer architectures have a hardware FPU. In the absence of an FPU, many FPU functions can be emulated, which saves the added hardware cost of an FPU but is significantly slower. Emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user space code.
In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some, like the Intel x86, have dedicated floating-point registers, while others go as far as independent clocking schemes.
Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations. Since the early and mid-1990s, many microprocessors for desktops and servers have more than one FPU.
When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out:
- A floating-point unit emulator (a floating-point library)
- Add-on FPU
- Integrated FPU
Floating-point library 
Some floating-point hardware supports only the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware supports only a finite set of operations; for example, none directly supports arbitrary-precision arithmetic.
When a CPU is executing a program that calls for a floating-point operation not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates each floating-point operation using a series of simpler fixed-point (integer) arithmetic operations that run on the integer arithmetic logic unit.
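As an illustration of the first case, an operation the hardware lacks can be built from simpler floating-point operations. The sketch below approximates the exponential function using only multiplication, addition, and subtraction, plus a final exponent adjustment. The function name `soft_exp`, the constants, and the choice of a degree-13 Taylor polynomial are assumptions made for this example; production math libraries typically use more carefully tuned polynomials and more precise range reduction.

```python
import math

LN2 = 0.6931471805599453        # ln(2)
INV_LN2 = 1.4426950408889634    # 1 / ln(2), stored so no division is needed at run time
# Lookup table of Taylor coefficients 1/n! for n = 0..13, computed once.
COEFFS = [1.0 / math.factorial(n) for n in range(14)]


def soft_exp(x: float) -> float:
    """Approximate e**x for moderate finite x (hypothetical helper)."""
    # Range reduction: write x = k*ln(2) + r with |r| at most about ln(2)/2.
    k = round(x * INV_LN2)
    r = x - k * LN2
    # Horner evaluation of the Taylor polynomial of e**r:
    # one multiply and one add per coefficient.
    p = 0.0
    for c in reversed(COEFFS):
        p = p * r + c
    # e**x = 2**k * e**r; ldexp only adjusts the binary exponent.
    return math.ldexp(p, k)
```

Range reduction keeps the polynomial argument small, so a short polynomial is enough; the accuracy of this sketch degrades for large arguments because the stored value of ln(2) is itself rounded.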
The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library.
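To make integer-only emulation concrete, the following is a minimal sketch of how a floating-point library might multiply two IEEE 754 single-precision values using nothing but integer operations. The names `soft_mul32`, `float_to_bits`, and `bits_to_float` are hypothetical, and the sketch assumes normal, finite operands and a normal result; real libraries also handle zeros, subnormals, infinities, NaNs, overflow, and underflow.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_mul32(a_bits: int, b_bits: int) -> int:
    """Multiply two binary32 bit patterns with integer operations only.

    Assumption: inputs and result are normal, finite numbers.
    """
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    # Restore the implicit leading 1 to get 24-bit significands.
    sig_a = (a_bits & 0x7FFFFF) | 0x800000
    sig_b = (b_bits & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127      # add exponents, remove one bias
    prod = sig_a * sig_b           # 47- or 48-bit integer product

    # Normalize so the leading 1 sits at bit 47.
    if prod & (1 << 47):
        exp += 1
    else:
        prod <<= 1

    sig = prod >> 24               # top 24 bits become the significand
    rem = prod & 0xFFFFFF          # discarded bits decide the rounding
    half = 0x800000
    # Round to nearest, ties to even.
    if rem > half or (rem == half and (sig & 1)):
        sig += 1
        if sig == 1 << 24:         # rounding carried out of the significand
            sig >>= 1
            exp += 1

    return sign | (exp << 23) | (sig & 0x7FFFFF)
```

For example, `bits_to_float(soft_mul32(float_to_bits(1.5), float_to_bits(2.5)))` gives `3.75`. The `struct` calls are used only to move values in and out of the bit-level representation; the multiplication itself touches integers alone.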
Integrated FPUs 
In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware and/or microcode, while the more complex operations are implemented as machine code routines (i.e. written in assembly language or a compiled high level language).
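One way a more complicated operation such as division can be built from the simple ones is Newton-Raphson iteration for the reciprocal, which needs only multiplication and subtraction. The sketch below is an illustration under stated assumptions, not a description of any particular processor: the names `soft_reciprocal` and `soft_div` are hypothetical, inputs are assumed finite, and the result is accurate to within a few units in the last place rather than correctly rounded.

```python
import math


def soft_reciprocal(d: float) -> float:
    """Approximate 1/d using multiply and subtract (hypothetical helper)."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # Scale the operand: abs(d) = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    # Linear initial estimate of 1/m; the constants are 48/17 and 32/17.
    x = 2.823529411764706 - 1.8823529411764706 * m
    # Each Newton-Raphson step roughly doubles the number of correct bits.
    for _ in range(4):
        x = x * (2.0 - m * x)
    # Undo the scaling and restore the sign.
    return math.copysign(math.ldexp(x, -e), d)


def soft_div(n: float, d: float) -> float:
    """Divide n by d as a multiplication by the approximate reciprocal."""
    return n * soft_reciprocal(d)
```

Four iterations are enough here because the initial estimate is already within about 6% of the true reciprocal on the scaled range, and the error is squared at every step.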
In some current architectures, the FPU functionality is combined with units that perform SIMD computation; an example of this is the replacement of the x87 instruction set with the SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.
Add-on FPUs 
In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs.
The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286-based systems were generally socketed for the 80287, and 80386- and 80386SX-based machines for the 80387 and 80387SX respectively, although early 80386 machines were socketed for the 80287, since the 80387 did not exist yet.
Starting with the i486 (Intel dropped the '80' prefix for the 486 series, substituting 'i'), the floating-point unit in x86 chips was integrated with the CPU, as it has been in almost all later x86-architecture processors. One notable exception is the i486SX, which was also unusual in that no actual coprocessor was available for it. Early i486SX chips were full CPUs with an integrated FPU: if the FPU of an i486 chip failed hardware testing while the rest of the CPU passed, the FPU was disabled and the chip was packaged as a lower-cost i486SX. If yields of i486DX chips were high enough, a fully working FPU would be physically disabled to meet demand for the i486SX. Eventually, i486SX chips were specifically manufactured without the FPU on the die. On later boards where the i486SX was soldered directly onto the board, it was not possible to replace the chip with a fully functional i486DX. Instead, the i487SX was marketed as the coprocessor for boards that had coprocessor sockets. In reality, the i487SX was a full i486 chip that completely disabled the original i486SX and took over all CPU operations for the board. The i487SX was almost electrically identical to the i486DX; the only difference was an extra pin whose sole purpose was to disable the existing i486SX when installed. The i487SX could be used as a substitute i486DX by clipping off the extra pin.
In addition to the Intel x87 series, several other companies manufactured coprocessors for the x86 series. These included Cyrix, which marketed its FasMath series as higher-performance but fully x87-compatible, and Weitek, which offered a high-performance but not fully x87-compatible series of coprocessors.
In addition to the Intel architectures, FPUs as coprocessors were available for the Motorola 68000 family. These FPUs, the 68881 and 68882, were common in Motorola 68020/68030-based workstations like the Sun 3 series. They were also commonly added to higher-end models of the Apple Macintosh and Commodore Amiga series, but unlike in IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems. With the 68040, Motorola integrated the FPU and CPU, but as with the x86 series, a lower-cost 68LC040 without an integrated FPU was also available.
There are also add-on FPU coprocessor units for microcontroller units (MCUs/µCs) and single-board computers (SBCs), which provide floating-point arithmetic capability in systems that might not otherwise have it. Unlike more traditional floating-point coprocessors such as the 80x87 series, these add-on FPUs are host-processor-independent, have their own programming requirements, and are often provided with their own integrated development environments (IDEs).
See also 
- Arithmetic logic unit (ALU)
- Execution unit
- IEEE floating-point standard (also known as IEEE 754)
- IBM Floating Point Architecture
- Newnes 8086 Family Pocket Book - Ian Sinclair (ISBN 0 4349 1872 5)
- Raymond Filiatreault (2003). "SIMPLY FPU".