Stream Processors, Inc
|Headquarters||Sunnyvale, California, United States|
|Key people||Bill Dally, Co-Founder and ex-Chairman|
|Products||Digital Signal Processor|
|Employees||Approximately 100 (2007)|
Stream Processors, Inc was a Silicon Valley-based fabless semiconductor company specializing in the design and manufacture of high-performance digital signal processors for applications including video surveillance, multi-function printers and video conferencing. The company ceased operations in 2009.
Foundational work in stream processing was initiated in 1995 by a research team led by MIT professor Bill Dally. In 1996, he moved to Stanford University, where he continued this work. He received a multimillion-dollar grant from DARPA, with additional resources from Intel and Texas Instruments, to fund the development of a project called "Imagine": the first stream processor chip and its accompanying compiler tools.
The Imagine Project
The goal of the Imagine project was to develop a C-programmable signal and image processor intended to provide both the performance density and the efficiency of a special-purpose processor (such as a hard-wired ASIC). The project successfully demonstrated the advantages of stream processing. Details on the Imagine project and its results are posted on the Stanford Imagine project page. The work also showed that a number of applications, including wireless baseband processing, 3D graphics, encryption, IP forwarding and video processing, could take advantage of the efficiency of stream processing. This research inspired other designs, such as GPUs from ATI Technologies and the Cell microprocessor from Sony, Toshiba, and IBM.
The main deliverables from the Imagine program included:
- The Imagine Stream Architecture
- The Stream programming model
- Software development tools
- Programmable graphics and real-time media applications
- VLSI prototype (fabricated by TI)
- Stream processor development platform (a prototype development board)
Dally, together with other team members, obtained a license from Stanford to commercialize the resulting technology. Stream Processors, Incorporated (SPI) was incorporated in California in 2004. Professor Dally remained at Stanford, and the company hired industry veteran Chip Stearns as President and CEO in December of that year. By June 2006, SPI had raised a total of $26M from three venture capital firms: Austin Ventures, Norwest Venture Partners and the Woodside Fund.
In January 2009, co-founder Bill Dally accepted a position as Chief Scientist at NVIDIA Corporation and, at the same time, resigned as chairman of SPI. In an interview, Dally reflected on his experiences with startups: "I have done several chip startups myself. It’s getting hard. The ante is very high. If you do a chip startup, you need patient investors with very deep pockets. It’s many tens of millions of dollars to get to a first product and $50 million to get to profits. That’s very difficult to do because investors want an exit some multiple over that investment. I am hoping we return to the days of frequent IPOs and get beyond the fire-sale acquisitions. That’s not what you can see right now. If it’s a programmable chip, the cost is even more."
In September 2009 the company ceased operations.
Like graphics and scientific computing, media and signal processing are characterized by abundant data parallelism, locality and a high ratio of computation to global memory accesses. Stream processing exploits these characteristics using data-parallel processing fed by a distributed memory hierarchy managed by the compiler. The main challenge for next-generation massively parallel processors is data bandwidth, not computational resources. Unlike most conventional processors, the technology does not rely on a hardware cache; instead, data movement is explicitly managed by the compiler and hardware.
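The ratio of computation to global memory accesses can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the 8-tap filter figures are assumptions chosen for the example, not measurements of any SPI device.

```python
def ops_per_global_access(ops_per_record, words_read, words_written):
    """Arithmetic operations performed per word moved to or from global memory."""
    return ops_per_record / (words_read + words_written)


# Hypothetical 8-tap filter: 8 multiply-accumulates per output sample.
# Without reuse, all 8 input samples are fetched from global memory each time.
no_reuse = ops_per_global_access(ops_per_record=8, words_read=8, words_written=1)

# With locality exploited, each input sample is loaded on chip once and reused,
# so each output costs one global read and one global write.
with_reuse = ops_per_global_access(ops_per_record=8, words_read=1, words_written=1)

print(round(no_reuse, 2), with_reuse)  # about 0.89 versus 4.0
```

The higher the ratio, the less external memory bandwidth limits the computation, which is the property stream processing is designed to exploit.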
The execution model is based on accelerating performance-critical functions (kernels) that process and produce data records (streams). Kernels and streams are scheduled at compile time and moved to on-chip memory at runtime via a scoreboard. The compiler analyses the lifetimes of streams to optimize allocation and minimize external memory bandwidth needs. Stream and kernel loads can overlap with execution to improve latency tolerance, and the explicit data movement provides predictable performance. There are no CPU cache misses, and the design presents a single-core model to the programmer; data parallelism is within the kernels.
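The kernel-and-stream model can be sketched in a few lines. This is a conceptual illustration, not SPI's programming interface: `run_pipeline`, `scale`, `offset` and the strip size are hypothetical names and values, and the sketch does not model the scoreboard or the overlap of loads with execution.

```python
def scale(record):
    """Example kernel: doubles each record."""
    return 2 * record


def offset(record):
    """Example kernel: adds one to each record."""
    return record + 1


def run_pipeline(kernels, stream, strip_size):
    """Apply a chain of kernels to a stream, one strip at a time.

    Each strip stands in for a block of records staged in on-chip memory.
    Intermediate results stay in the strip, so only the original input and
    the final output would touch external memory.
    """
    output = []
    for start in range(0, len(stream), strip_size):
        strip = stream[start:start + strip_size]    # explicit stream load
        for kernel in kernels:                      # kernels run on local data
            strip = [kernel(record) for record in strip]
        output.extend(strip)                        # explicit stream store
    return output


result = run_pipeline([scale, offset], [1, 2, 3, 4, 5], strip_size=2)
print(result)  # [3, 5, 7, 9, 11]
```

Because every load and store is written out explicitly, the amount of data moved per strip is known before the program runs, which is the basis for the predictable performance described above.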
The architecture includes a host CPU (System MIPS) for system-level tasks and a DSP Coprocessor Subsystem, where the DSP MIPS runs the main threads that make kernel function calls to the Data Parallel Unit (DPU). For users who rely on libraries and do not intend to develop DSP code, the architecture is a MIPS-based system-on-a-chip with an API to a “black box” coprocessor. The DPU Dispatcher receives kernel function calls and manages runtime kernel and stream loads. One kernel at a time is executed across the lanes, operating on local stream data stored in the Lane Register File of each lane. Each lane has a set of VLIW ALUs, and distributed operand register files (ORFs) allow for a large working data set and processing bandwidth exceeding 1 terabyte/s. The Stream Load/Store Unit provides gather/scatter with a wide variety of access patterns. The InterLane Switch is a compiler-scheduled full crossbar for high-speed access between lanes.
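The idea of one kernel executing across several lanes, each working on its own local data, can be modelled roughly as follows. The lane count of four, the round-robin distribution and all function names are assumptions made for this sketch; it does not describe the actual DPU layout or the Stream Load/Store Unit's access patterns.

```python
NUM_LANES = 4  # illustrative lane count, not a device specification


def distribute(stream, num_lanes):
    """Deal records round-robin into per-lane local storage."""
    return [stream[lane::num_lanes] for lane in range(num_lanes)]


def execute_across_lanes(kernel, lane_files):
    """Run the same kernel on every lane's local records."""
    return [[kernel(record) for record in local] for local in lane_files]


def collect(lane_files):
    """Interleave per-lane results back into the original stream order."""
    output = []
    longest = max((len(local) for local in lane_files), default=0)
    for i in range(longest):
        for local in lane_files:
            if i < len(local):
                output.append(local[i])
    return output


lanes = distribute(list(range(10)), NUM_LANES)
squared = collect(execute_across_lanes(lambda x: x * x, lanes))
print(lanes)    # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
print(squared)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In this model the programmer writes a single kernel, and the parallelism comes entirely from applying it to every lane's data, which matches the single-core programming view described earlier.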
SPI’s RapiDev Tools Suite leverages the predictability of stream processing to provide a fast path to optimized results using C programming. Starting with C reference code, the Fast Functional Debugger (FFD) library plugs into standard tools, such as Microsoft Visual Studio and the GNU tools, and simulates the DPU to support restructuring code into kernels and streams. Because kernels are statically scheduled and data movement is explicit, DPU cycle accuracy can be obtained even at this high functional level. This is one source of the predictability of the architecture. For targeting code to the device, the Stream Processor Compiler (SPC) generates the VLIW executable and pre-processed C code that is compiled and linked via standard GCC for MIPS. SPC allocates streams in the Lane Register Files and provides dependency information for the kernel function calls. Software pipelining and loop unrolling are supported. Branch penalties are avoided by using predicated selects, and larger conditionals use conditional streams. Running under Eclipse, the Target Code Simulator provides comprehensive Host or Device binary code simulation with breakpoint and single-stepping capabilities, along with bandwidth and load statistics. A kernel view shows the VLIW pipeline for kernel optimizations, and a stream view shows kernel execution and stream loads to review global data movement for system profiling.
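A predicated select replaces a branch with a computation that always runs the same instructions and uses the condition only to pick the result. The sketch below shows the general technique, using a clamp as the example; `select`, `clamp_branching` and `clamp_predicated` are hypothetical names, and the code does not reproduce what SPC actually emits.

```python
def clamp_branching(x, limit):
    """Reference version: control flow depends on the data."""
    if x > limit:
        return limit
    return x


def select(pred, a, b):
    """Predicated select: both operands exist already; pred picks one."""
    p = int(bool(pred))            # 1 if the predicate holds, else 0
    return p * a + (1 - p) * b     # same arithmetic on every input


def clamp_predicated(x, limit):
    """Branch-free version: every record follows the same instruction path."""
    return select(x > limit, limit, x)


samples = [3, 10, 25]
print([clamp_predicated(x, 10) for x in samples])  # [3, 10, 10]
```

Since every record takes the same path, all lanes can stay in lockstep and the kernel's schedule does not depend on the data values.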
SPI marketed its Storm-1 family, which included four fully software-programmable DSPs of varying performance levels.
Note: GMACS stands for Giga (billions of) Multiply-Accumulate operations per Second, a common measure of DSP performance.
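As a worked illustration of the unit, the sketch below converts between multiply-accumulates per cycle, clock rate and GMACS. All figures are hypothetical and chosen for easy arithmetic; they are not Storm-1 specifications.

```python
def peak_gmacs(macs_per_cycle, clock_hz):
    """Peak rate in billions of multiply-accumulates per second."""
    return macs_per_cycle * clock_hz / 1e9


def required_gmacs(macs_per_sample, samples_per_second):
    """Rate a workload needs, in billions of multiply-accumulates per second."""
    return macs_per_sample * samples_per_second / 1e9


# Hypothetical device: 100 MACs per cycle at 500 MHz.
device = peak_gmacs(macs_per_cycle=100, clock_hz=500_000_000)

# Hypothetical workload: a 64-tap filter applied to 10 million samples per second.
workload = required_gmacs(macs_per_sample=64, samples_per_second=10_000_000)

print(device, workload)  # 50.0 0.64
```

Comparing the two figures gives a first-order estimate of how much of a device's peak capacity a given workload would consume.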
Support hardware and software
- The RapiDev tools suite delivers a fast, predictable path to optimized results, eliminating the complexities of assembly coding or manual cache management
- The Storm-1 DevKit is a PCI-based software development platform
- IP Camera Reference Design runs standard Linux 2.6 and supports multiple simultaneous codecs (e.g. H.264, MPEG-4 and MJPEG), arbitrary resolutions, CMOS and CCD sensor processing as well as video analytics in a fully software programmable platform
- Video Streamer Reference Design supports eight 4CIF input channels of video compressed to H.264 and a Gigabit Ethernet output