Super Harvard Architecture Single-Chip Computer
|This article relies on references to primary sources. (September 2010)|
The Super Harvard Architecture Single-Chip Computer (SHARC) is a high performance floating-point and fixed-point DSP from Analog Devices. SHARC is used in a variety of signal processing applications ranging from single-CPU guided artillery shells to 1000-CPU over-the-horizon radar processing computers. The original design dates to about January 1994.
SHARC processors are or were used because they have offered good floating-point performance per watt.
SHARC processors are typically intended to have a good number of serial links to other SHARC processors nearby, to be used as a low-cost alternative to SMP.
The SHARC is a Harvard architecture word-addressed VLIW processor; it knows nothing of 8-bit or 16-bit values since each address is used to point to a whole 32-bit word, not just a byte. It is thus neither little-endian nor big-endian, though a compiler may use either convention if it implements 64-bit data and/or some way to pack multiple 8-bit or 16-bit values into a single 32-bit word. Analog Devices chose to avoid the issue by using a 32-bit char in their C compiler.
The word size is 48-bit for instructions, 32-bit for integers and normal floating-point, and 40-bit for extended floating-point. Code and data are normally fetched from on-chip memory, which the user must split into regions of different word sizes as desired. Small data types may be stored in wider memory, simply wasting the extra space. A system that does not use 40-bit extended floating-point might divide the on-chip memory into two sections, a 48-bit one for code and a 32-bit one for everything else. Most memory-related CPU instructions can not access all the bits of 48-bit memory, but a special 48-bit register is provided for this purpose. The special 48-bit register may be accessed as a pair of smaller registers, allowing movement to and from the normal registers.
Off-chip memory can be used with the SHARC. This memory can only be configured for one single size. If the off-chip memory is configured as 32-bit words to avoid waste, then only the on-chip memory may be used for code execution and extended floating-point. Operating systems may use overlays to work around this problem, transferring 48-bit data to on-chip memory as needed for execution. A DMA engine is provided for this. True paging is impossible without an external MMU.
The SHARC has a 32-bit word-addressed address space. Depending on word size this is 16 GB, 20 GB, or 24 GB.
SHARC instructions may contain a 32-bit immediate operand. Instructions without this operand are generally able to perform two or more operations simultaneously. Many instructions are conditional, and may be preceded with "if condition " in the assembly language. There are a number of condition choices, similar to the choices provided by the x86 flags register.
There are two delay slots. After a jump, two instructions following the jump will normally be executed.
The SHARC processor has built-in support for loop control. Up to 6 levels may be used, avoiding the need for normal branching instructions and the normal bookkeeping related to loop exit.
The SHARC has two full sets of general-purpose registers. Code can instantly switch between them, allowing for fast context switches between an application and an OS or between two threads.
The Mercury multicomputer
Mercury Computer Systems sold a SHARC-based product that used byte addressing. To get this, the compiler was modified to transform addresses. To access memory, the compiler would shift the high 30 bits down by 2 bits. For a load, the compiler would need to rotate the obtained value according to the original low 2 address bits. The compiler would then mask off the higher data bits as needed to obtain the desired 8-bit or 16-bit value. Stores would require loads to be performed so that a masking and merging operation could be performed, much as is done by normal compilers to deal with bit fields. There were some additional complexities related to the desire to avoid low memory addresses (which would allow a NULL pointer to scramble critical motherboard registers) and the desire to avoid this overhead for 32-bit and 64-bit values. Both big-endian and little-endian layouts were possible.
Mercury also implemented 64-bit floating-point in software. They used 40-bit floating-point in the on-chip memory, with interrupts disabled to avoid corrupting the 40-bit registers via storage in 32-bit memory.
Together, these choices allowed Mercury to offer a compiler that would lay out data structures in a way that was fully compatible with their Intel i860 (big-endian) and PowerPC (either big-endian or little-endian) offerings. It was in fact possible to install all three types of CPU in a single shared-memory system, with a distributed OS running on all CPUs.