The Cray-3/SSS (Super Scalable System) was a pioneering massively parallel supercomputer project that bonded a two-processor Cray-3 to a new SIMD processing unit based entirely in the computer's main memory. It was apparently later considered as an add-on for the Cray T90 series in the form of the T94/SSS, but it seems highly unlikely this was ever built.
The SSS project started after a Supercomputing Research Center (SRC) engineer, Ken Iobst, noticed a novel way to implement a parallel computer. Previous massively parallel SIMD designs, like the Connection Machines, were built from a large number of individual processing elements, each consisting of a simple processor and some local memory. Results that needed to pass from element to element traveled along networking links at relatively slow speeds. This was a serious bottleneck in most parallel designs, and it limited their use to roles where these interdependencies could be reduced. Iobst's idea was to use the very fast scatter/gather hardware from the Cray-3 to move the data around instead of using a separate network. This would offer at least an order of magnitude better performance than systems based on "commodity" hardware. Better yet, the machine would still include a complete Cray-3 CPU, allowing the machine as a whole to use either SIMD or vector instructions depending on the particulars of the problem.
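The idea of using scatter/gather in place of a dedicated network can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Cray-3 hardware interface: the function names `gather`, `scatter`, and `shift_between_elements`, and the list standing in for memory, are all hypothetical.

```python
def gather(memory, indices):
    """Read memory[i] for every i in indices (a vector gather)."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Return a copy of memory with values[k] written to indices[k] (a vector scatter)."""
    out = list(memory)
    for i, v in zip(indices, values):
        out[i] = v
    return out


def shift_between_elements(element_data, offset):
    """Pass each element's value to the element `offset` places along, wrapping around.

    The "communication" is just a gather of every value followed by a
    scatter to permuted addresses; no per-element network link is modeled.
    """
    n = len(element_data)
    values = gather(element_data, list(range(n)))
    destinations = [(i + offset) % n for i in range(n)]
    return scatter(element_data, destinations, values)


print(shift_between_elements([10, 20, 30, 40], 1))  # [40, 10, 20, 30]
```

Any permutation of element data can be expressed the same way by changing the destination index list, which is why a fast scatter/gather unit can stand in for a general interconnect.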
All that remained was the selection of a processor. Since the machine had a vector processor for heavy computing, the SIMD processors themselves could be considerably simpler, handling only the most basic instructions. This is where the SSS concept was truly unique: since the problem with most SIMD machines was moving data around, Iobst suggested that the processors be built into the SRAM chips themselves. Memory is normally organized within the RAM chips in a row/column format, with a controller on the chip reading requested data in parallel across the rows, then assembling the results into 32- or 64-bit words for processing by the CPU. In the SSS concept the chips would also be equipped with a series of single-bit computers, each operating on a particular column of all the rows at once. This meant that the processors could access data about 100 times as fast as normal. Add to this the speed of the "network" implemented by the scatter/gather hardware, and the system could be scaled to sizes considerably greater than existing SIMD systems.
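A rough Python model may help show how single-bit processors attached to memory columns can still do multi-bit arithmetic. It is a sketch under stated assumptions, not the actual SSS instruction set: each column is assumed to hold one operand with its bits spread down the rows (least significant bit first), and the helper names `to_rows`, `from_rows`, and `bit_serial_add` are hypothetical.

```python
def to_rows(values, width):
    """Lay values out so that rows[k][c] is bit k (LSB first) of column c's operand."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_rows(rows):
    """Reassemble one integer per column from the row-major bit layout."""
    n_cols = len(rows[0])
    return [sum(rows[k][c] << k for k in range(len(rows))) for c in range(n_cols)]


def bit_serial_add(rows_a, rows_b):
    """Add two operands per column, one memory row (one bit position) per step.

    The inner loop over columns stands in for what the hardware would do
    simultaneously: every column's one-bit processor runs the same
    full-adder step on its own bits, keeping a private carry.
    """
    n_cols = len(rows_a[0])
    carry = [0] * n_cols
    result = []
    for row_a, row_b in zip(rows_a, rows_b):
        out_row = []
        for c in range(n_cols):
            a, b, cin = row_a[c], row_b[c], carry[c]
            out_row.append(a ^ b ^ cin)             # sum bit
            carry[c] = (a & b) | (cin & (a ^ b))    # carry out
        result.append(out_row)
    return result  # the final carry is dropped, so sums wrap modulo 2**width


a = to_rows([1, 2, 3, 7], 4)
b = to_rows([4, 5, 6, 8], 4)
print(from_rows(bit_serial_add(a, b)))  # [5, 7, 9, 15]
```

The number of steps depends on the operand width rather than on the number of columns, which is the property that lets a very wide array of very simple processors pay off.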
Each processor could accept two commands every 200 nanoseconds, for an effective cycle time of 100 ns (a 10 MHz rate). A fully equipped system with 1,024,000 processors would have an aggregate processing capability of 32 TFlops.
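The per-processor timing figures can be checked with a short calculation. This sketch uses only the command rate and processor count given in the text; the variable names are illustrative, and it counts single-bit commands only, so it does not attempt to derive the quoted aggregate figure.

```python
COMMANDS_PER_WINDOW = 2    # commands accepted per timing window
WINDOW_NS = 200            # length of the timing window in nanoseconds
PROCESSORS = 1_024_000     # fully equipped system

cycle_ns = WINDOW_NS / COMMANDS_PER_WINDOW    # effective time per command
clock_hz = 1e9 / cycle_ns                     # commands per second per processor
aggregate_commands_per_s = PROCESSORS * clock_hz

print(cycle_ns)                    # 100.0
print(clock_hz / 1e6)              # 10.0 (MHz)
print(aggregate_commands_per_s)    # 10240000000000.0
```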
In August 1994 the NSA contracted Cray Computer Corporation (CCC) to build a 512,000-processor design with 2,048 processors per RAM chip. National Semiconductor was selected to produce Iobst's design; there, Mark Norder and Jennifer Schrader modified the design and laid it out for production. The first half of the machine, with 256,000 processors, was run for the first time on 2 March 1995.
- http://www.techagreements.com/agreement-preview.aspx?num=121632 CCC Annual return to end 1994
- http://www.secinfo.com/dsVQy.a1u4.htm CCC 10Q May 1995
- Ken Iobst et al., "Processing in Memory: The Terasys Massively Parallel PIM Array", Computer, Volume 28 Issue 4 (April 1995)
- Norris Parker Smith, "Seymour & NSA Revive SIMD as Thinking Machines Buried", Supercomputing Review, 26 August 1994
- "Cray Computer Corp. Completes Initial Demonstration of the Cray-3 Super Scalable System", Cray Computer press release, 7 March 1995