Each 64-bit Cyclops64 chip (processor) will run at 500 megahertz and contain 80 processors. Each processor will have two thread units and a floating-point unit. A thread unit is an in-order 64-bit RISC core with 32 kB of scratchpad memory, using a 60-instruction subset of the Power Architecture instruction set. Each group of five processors shares a 32 kB instruction cache.
The processors will be connected by a 96-port, 7-stage, non-internally-blocking crossbar switch. They will communicate with each other via global interleaved memory (memory that can be written to and read by all threads) in the SRAM.
The theoretical peak performance of a Cyclops64 chip is 80 gigaflops (this assumes a continuous stream of multiply–accumulate instructions, each of which is counted as two floating-point operations). A full system (consisting of 2 thread units per processor, 80 processors per chip, 1 chip per board, 48 boards per midplane, 3 midplanes per rack, and 96 (12 x 8) racks per system) would contain 13,824 C64 chips, comprising 1,105,920 processors capable of running 2,211,840 concurrent threads.
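The figures above can be checked with a short calculation. The following Python sketch is illustrative only: the constant and function names are hypothetical, and it assumes that each processor's floating-point unit completes one multiply–accumulate per clock cycle, which is the reading under which the stated 80 gigaflops follows from the other numbers.

```python
# Figures taken from the text; all names here are illustrative.
CLOCK_HZ = 500_000_000           # 500 MHz
PROCESSORS_PER_CHIP = 80         # one floating-point unit per processor
THREAD_UNITS_PER_PROCESSOR = 2
FLOPS_PER_MAC = 2                # a multiply-accumulate counts as two operations

CHIPS_PER_BOARD = 1
BOARDS_PER_MIDPLANE = 48
MIDPLANES_PER_RACK = 3
RACKS_PER_SYSTEM = 12 * 8        # 96 racks


def chip_peak_flops():
    """Peak chip throughput, assuming one multiply-accumulate per FPU per cycle."""
    return PROCESSORS_PER_CHIP * CLOCK_HZ * FLOPS_PER_MAC


def system_totals():
    """Return (chips, processors, threads) for a full system."""
    chips = (CHIPS_PER_BOARD * BOARDS_PER_MIDPLANE
             * MIDPLANES_PER_RACK * RACKS_PER_SYSTEM)
    processors = chips * PROCESSORS_PER_CHIP
    threads = processors * THREAD_UNITS_PER_PROCESSOR
    return chips, processors, threads


if __name__ == "__main__":
    chips, processors, threads = system_totals()
    print(f"chip peak: {chip_peak_flops() / 1e9:.0f} gigaflops")
    print(f"chips: {chips:,}  processors: {processors:,}  threads: {threads:,}")
```

Under the same assumption, multiplying the per-chip peak by the number of chips gives a theoretical system peak of roughly 1.1 petaflops.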
Cyclops64 exposes much of the underlying hardware to the programmer, allowing the programmer to write very high-performance, finely tuned software. One negative consequence is that programming Cyclops64 efficiently is difficult.
Design and fabrication
Verification testing and system software development are being done at the University of Delaware.