Cell (processor): Difference between revisions

From Wikipedia, the free encyclopedia
Editor: Terrapin (minor edit, no edit summary)

Revision as of 20:45, 13 February 2005

The Cell is a microprocessor design being developed by IBM in cooperation with Toshiba and Sony. The Cell chip is intended to scale from handheld devices to mainframe computers through parallel processing. Sony plans to use the chip in its PlayStation 3 game console.

While the Cell chip can have a number of different configurations, the basic configuration is composed of one "Processing Element" ("PE") and eight "Synergistic Processing Units" ("SPUs"). The PE is based on IBM's POWER Architecture, the basis of IBM's existing POWER line and related to the PowerPC used by Apple Computer and others. The PE is not the primary processor for the system, but acts as a controller for the eight SPUs, which handle most of the computational workload.

Each SPU is a VLIW 128-bit vector processor with 256 kB of local high-speed memory, which is also visible to the PE so that it can be loaded with data and programs as needed. An SPU's memory is also accessible from other SPUs, allowing data to be processed by one SPU and then handed off to the next at very high speed. In general use the system will load the SPUs with small programs, chaining the SPUs together to handle each step in a complex operation. For instance, a set-top box could load programs for reading a DVD, decoding video and audio, and driving the display, and the data would be passed from SPU to SPU until finally ending up on the TV. Each SPU gives 32 GFLOPS of performance, so the eight SPUs together give 256 GFLOPS. The performance of the PE's VMX unit is unclear, but should add around 32 GFLOPS to that of the SPUs.
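
The chaining idea can be illustrated with a small conceptual sketch. It is plain Python, not Cell code: the stage names (`read_disc`, `decode_video`, `display`) and the `run_pipeline` helper are hypothetical, and each function merely stands in for a small program loaded onto one SPU.

```python
from typing import Callable, Iterable, List

# Conceptual model only: each stage stands in for a small program loaded
# onto one SPU. None of these names come from a real Cell toolchain.
Stage = Callable[[bytes], bytes]


def read_disc(block: bytes) -> bytes:
    # Hypothetical stage 1: pass the raw block along unchanged.
    return block


def decode_video(block: bytes) -> bytes:
    # Hypothetical stage 2: "decode" by doubling every byte, so the
    # output is larger (less compressed) than the input.
    return bytes(b for byte in block for b in (byte, byte))


def display(block: bytes) -> bytes:
    # Hypothetical stage 3: tag the block as ready for output.
    return b"FRAME:" + block


def run_pipeline(stages: List[Stage], blocks: Iterable[bytes]) -> List[bytes]:
    """Hand each block from one stage to the next, in order."""
    results = []
    for block in blocks:
        for stage in stages:
            block = stage(block)
        results.append(block)
    return results


frames = run_pipeline([read_disc, decode_video, display], [b"ab", b"c"])
```

This sketch runs the stages one after another on a single processor. In the design described above, each stage would sit on its own SPU, so different blocks could be at different stages at the same time.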

In some ways the Cell system resembles early Seymour Cray designs in reverse. The famed CDC 6600 used a single very fast processor to handle the math, while a series of ten slower peripheral processors were given smaller programs to keep the main memory fed with data. In the Cell the problem has been reversed: reading the data is no longer the difficult part. Because of the complex encodings used in industry, the problem today is efficiently decoding that data into an ever-less-compressed version as quickly as possible.

In other ways the Cell resembles a modern desktop computer on a single chip. Modern graphics cards have multiple elements very similar to the SPUs, known as vertex shader units, with an attached high-speed memory. Programs, known as shaders, are downloaded onto the units to process the basic geometry fed from the computer's CPU, apply styles and display it. The main differences are that the Cell's SPUs appear to be much more general purpose than the average graphics card's shader units, and that the ability to chain the SPUs under program control offers considerably more flexibility, allowing the Cell to handle graphics, sound, or anything else. Given that the Cell is intended to be used in the PlayStation 3, the idea of a very fast combined CPU and graphics card is not entirely surprising.

Cell allows multiple processing units to be placed on one die. The patent showed four on one die, a configuration called the "Broadband Engine", potentially giving over 1 teraflops of performance. It is unclear how many processing units will be incorporated into either the PlayStation 3 or workstations.
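
The "over 1 teraflops" figure can be checked against the per-SPU numbers quoted earlier. The following is a rough sketch of that arithmetic, assuming that each of the four units on the die is one PE with eight SPUs; the constant and function names are illustrative only, and the VMX figure is the article's own uncertain estimate.

```python
# Figures quoted in the article (peak estimates, in GFLOPS).
GFLOPS_PER_SPU = 32
SPUS_PER_PE = 8
VMX_GFLOPS_ESTIMATE = 32  # described in the article as unclear
PES_PER_DIE = 4           # assumption: four PE-plus-SPU units per die


def spu_gflops(spus: int = SPUS_PER_PE) -> int:
    """Peak throughput of the SPUs attached to one PE."""
    return spus * GFLOPS_PER_SPU


def die_gflops(pes: int = PES_PER_DIE, include_vmx: bool = False) -> int:
    """Peak throughput of a die holding several PEs and their SPUs."""
    per_pe = spu_gflops() + (VMX_GFLOPS_ESTIMATE if include_vmx else 0)
    return pes * per_pe


single_pe = spu_gflops()                        # 8 * 32
four_pe_die = die_gflops()                      # 4 * 256
four_pe_with_vmx = die_gflops(include_vmx=True) # 4 * (256 + 32)
```

Under these assumptions the SPUs alone give 1,024 GFLOPS on a four-unit die, which is consistent with the claim of just over 1 teraflops.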

Early versions of Cell may clock at around 4 to 5 GHz, and have been tested up to 5.2 GHz. According to Sony, the chips are in early production for workstations using IBM's 90-nm process. Full production is to use Sony's 65-nm process at its Nagasaki fabrication plant, with a 45-nm process a distinct possibility for the PlayStation 3. Sony is currently using its 90-nm process to produce the integrated GS/EE for the PSX, the Japan-only combination PlayStation 2/DVR unit.

There will be several versions of the Cell chip, with the number of processing units varying according to the device in which the chip is used. The companies designing the chip have claimed that by scaling the number of units in the chip and the number of PEs on a single die, or by linking multiple chips to each other via a network or memory bus, supercomputer-like performance can be made available in consumer devices.

Similar multiple-core designs include Sun Microsystems' MAJC (pronounced "magic"). The first MAJC chip was originally designed for multimedia processing, although Sun have subsequently repositioned the MAJC chip as a high-end graphics processor for workstations. In addition, Stanford University's Imagine Stream Processor shares a similar conceptual underpinning.