The ZPU is a relatively recent stack machine with a small economic niche and a growing number of users and implementations. It was designed to require very little electronic logic, leaving more of the field-programmable gate array (FPGA) available for other purposes. To make it easy to use, it has a port of the GNU Compiler Collection, which makes it much easier to apply than CPUs without compilers. It sacrifices speed for small size: intermediate results of calculations are kept in memory, in a push-down stack, rather than in registers.
Zylin Corp. made the ZPU open-source in 2008.
Many electronic projects place their electronic logic in an FPGA. Adding a separate microprocessor is wasteful, so it is commonplace to add a CPU to the logic in the FPGA. Often, a smaller, less expensive FPGA could be used if only the CPU used fewer resources. This is exactly the situation that the ZPU was designed to address.
The ZPU is designed to handle the miscellaneous tasks of a system that are best handled by software, for example, a user interface. The ZPU is very slow, but its small size leaves room in the FPGA for any high-speed algorithms the system needs.
Another issue is that most CPUs for FPGAs are closed-source and available only from a particular maker of FPGAs. Occasionally a project needs a design that can be widely distributed, for security inspections, educational uses or other reasons. The licenses on these proprietary CPUs can prevent such uses. The ZPU, by contrast, is open source.
Some projects need code that must be small, but run on a CPU that inherently has larger code. Alternatively, a project may benefit from the wide selection of code, compilers and debugging tools for the GNU Compiler Collection. In these cases, an emulator can be written to implement the ZPU's instruction set on the target CPU, and the ZPU's compilers can be used to produce the code. The resulting system is slow, but packs code into less memory than many CPUs and enables the project to use a wide variety of compilers and code.
The ZPU was designed explicitly to minimize the amount of electronic logic. It has a minimal instruction set, yet one that the GNU Compiler Collection can target. It also minimizes the number of registers that must be in the FPGA, which in turn minimizes the number of flip-flops. Instead of registers, intermediate results are kept on the stack, in memory.
It also has small code, saving on memory. Stack machine instructions do not need to contain register IDs, so the ZPU's code is smaller than that of other RISC CPUs; it is said to need only about 80% of the space of ARM Holdings' Thumb-2. For example, the signed immediate instruction lets the ZPU encode a 32-bit value in at most five bytes of instruction space, and in as few as one. Most RISC CPUs require at least eight bytes.
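The one-to-five-byte figure can be illustrated with a minimal sketch. It assumes the IM semantics summarized in the instruction table: the first IM of a run pushes a sign-extended 7-bit value, and each immediately following IM shifts the TOS left by seven bits and appends seven more. The function names `encode_im` and `decode_im` are hypothetical and do not come from any ZPU toolchain.

```python
def encode_im(value):
    """Encode a 32-bit constant as a run of IM bytes (1xxxxxxx)."""
    # Normalise the input to a signed 32-bit integer.
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    # Find the fewest 7-bit groups whose signed range holds the value.
    # Five groups give 35 bits, so the loop always stops by n == 5.
    n = 1
    while not -(1 << (7 * n - 1)) <= value < (1 << (7 * n - 1)):
        n += 1
    # Emit the groups most-significant first, each tagged with the high bit.
    return bytes(0x80 | ((value >> (7 * i)) & 0x7F) for i in reversed(range(n)))


def decode_im(code):
    """Replay a run of IM bytes and return the resulting 32-bit TOS."""
    tos = 0
    for i, byte in enumerate(code):
        x = byte & 0x7F
        if i == 0:
            # First IM of a run: sign-extend the 7-bit immediate.
            tos = x - 0x80 if x & 0x40 else x
        else:
            # Following IMs: shift the TOS left by 7 and append the new bits.
            tos = (tos << 7) | x
        tos &= 0xFFFFFFFF
    return tos
```

Under these assumptions, small constants such as 5 or -1 take a single byte, while a full-width value such as 0x12345678 takes five.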
Finally, about 2/3 of its instructions can be emulated by firmware written using the other 1/3, the "required" instructions. Although the result is very slow, the resulting CPU can require as few as 446 lookup tables (a measure of FPGA complexity, roughly equivalent to 1700 electronic logic gates).
The ZPU has a reset vector, consisting of 32 bytes of code space starting at location zero. It also has a single edge-sensitive interrupt, with a vector consisting of 32 bytes of code space beginning at address 32. Vectors 2 through 31 each have 32 bytes of space, but are reserved for code to emulate instructions 34 through 63.
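The vector layout reduces to simple arithmetic, sketched below. The sketch assumes that the vector number of an emulated instruction is the 5-bit field of its EMULATE_x encoding (001xxxxx); the names `VECTOR_SIZE`, `vector_address` and `emulate_vector` are hypothetical.

```python
VECTOR_SIZE = 32  # bytes of code space per vector


def vector_address(vector):
    """Return the byte address where a vector's code begins."""
    return vector * VECTOR_SIZE


def emulate_vector(opcode):
    """Return the vector number selected by an EMULATE_x opcode (001xxxxx)."""
    if opcode >> 5 != 0b001:
        raise ValueError("not an EMULATE_x opcode")
    return opcode & 0x1F  # assumption: the low five bits are the vector number
```

With these definitions the reset vector starts at address 0 and the interrupt vector at address 32, matching the text above.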
The base ZPU has a 32-bit data path. The ZPU also has a variant with a 16-bit-wide data path, to save even more logic.
Tools and resources
The ZPU has a well-tested port of the GNU Compiler Collection. Enthusiasts and firmware engineers have ported eCos, FreeRTOS and μClinux. At least one group of enthusiasts has copied the popular development environment of the Arduino and adapted it to the ZPU.
There are now multiple models of the ZPU core. Besides the original Zylin cores, there are also the ZPUino cores, and the ZPUFlex core. The Zylin core is designed for a minimal FPGA footprint, and includes a 16-bit version. The ZPUino has practical improvements for speed, can replace emulated instructions with hardware, and is embedded in a system-on-chip framework. The ZPUFlex is designed to use external memory blocks and can replace emulated instructions with hardware.
To improve speed, most implementors have implemented the emulated instructions, and added a stack cache. Beyond this, one implementor said that a two-stack architecture would permit pipelining (i.e. improving speed to one instruction per clock cycle), but this might also require compiler changes.
One implementor reduced power usage by 46% with a stack cache and automated insertion of clock gating. The power usage was then roughly equivalent to the small open-source Amber core, which implements the ARM v2a architecture.
The parts of the ZPU that would be most aided by fault-tolerance are the address bus, stack pointer and program counter.
"TOS" is an abbreviation of the "Top Of Stack." "NOS" is an abbreviation of the "Next to the top Of Stack."
|Name||Opcode (binary)||Description|
|BREAKPOINT||00000000||Halt the CPU and/or jump to the debugger.|
|IM_x||1xxxxxxx||Push or append a signed 7-bit immediate to the TOS.|
|STORESP_x||010xxxxx||Pop the TOS and store it into the stack at an offset from the top.|
|LOADSP_x||011xxxxx||Fetch a value from the stack at an offset from the top and push it onto the stack.|
|EMULATE_x||001xxxxx||Emulate an instruction with code at vector x.|
|ADDSP_x||0001xxxx||Fetch a value from the stack at an offset from the top and add it to the TOS.|
|POPPC||00000100||Pop an address from the TOS and store it to the PC.|
|LOAD||00001000||Pop an address and push the loaded memory value to the TOS.|
|STORE||00001100||Store the NOS into the memory pointed-to by the TOS. Pop both.|
|PUSHSP||00000010||Push the current SP into the TOS.|
|POPSP||00001101||Pop the TOS and store it to the SP.|
|ADD||00000101||Integer addition of TOS and NOS.|
|AND||00000110||Bitwise AND of the TOS and NOS.|
|OR||00000111||Bitwise OR of the TOS and NOS.|
|NOT||00001001||Bitwise NOT of the TOS.|
|FLIP||00001010||Reverse the bit order of the TOS.|
|NOP||00001011||No-Operation. (Usually used for delay loops or tables of code.)|
Code points 34 to 63 may be emulated by code in vectors 2 through 31: LOADH and STOREH (16-bit memory access), LESSTHAN (comparisons set 1 for true, 0 for false), LESSTHANOREQUAL, ULESSTHAN, ULESSTHANOREQUAL, SWAP (TOS with NOS), MULT, LSHIFTRIGHT, ASHIFTLEFT, ASHIFTRIGHT, CALL, EQ, NEQ, NEG, SUB, XOR, LOADB and STOREB (8-bit memory access), DIV, MOD, EQBRANCH, NEQBRANCH, POPPCREL, CONFIG, PUSHPC, SYSCALL, PUSHSPADD, HALFMULT, CALLPCREL.
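A software emulator of the kind described earlier can be sketched for a handful of the required instructions. This is a simplified illustration, not a complete or authoritative ZPU model: a Python list stands in for the in-memory stack, only IM, ADD, AND, OR, NOT, FLIP, NOP and BREAKPOINT are handled, and the rule that any non-IM instruction ends a run of IM bytes is an assumption about the IM semantics. The names `run`, `flip32` and `MASK` are hypothetical.

```python
MASK = 0xFFFFFFFF  # the base ZPU has a 32-bit data path


def flip32(value):
    """Reverse the bit order of a 32-bit word."""
    return int(f"{value:032b}"[::-1], 2)


def run(code):
    """Execute opcodes until BREAKPOINT (or end of code); return the stack."""
    stack = []         # assumption: a Python list models the in-memory stack
    in_im_run = False  # True while consecutive IM bytes extend one constant
    for op in code:
        if op & 0x80:                      # IM_x: 1xxxxxxx
            x = op & 0x7F
            if in_im_run:                  # append 7 more bits to the TOS
                stack[-1] = ((stack[-1] << 7) | x) & MASK
            else:                          # push a sign-extended 7-bit value
                stack.append((x - 0x80 if x & 0x40 else x) & MASK)
            in_im_run = True
            continue
        in_im_run = False                  # any other opcode ends an IM run
        if op == 0b00000000:               # BREAKPOINT: halt
            break
        elif op == 0b00000101:             # ADD
            tos = stack.pop()
            stack[-1] = (stack[-1] + tos) & MASK
        elif op == 0b00000110:             # AND
            tos = stack.pop()
            stack[-1] &= tos
        elif op == 0b00000111:             # OR
            tos = stack.pop()
            stack[-1] |= tos
        elif op == 0b00001001:             # NOT
            stack[-1] = ~stack[-1] & MASK
        elif op == 0b00001010:             # FLIP
            stack[-1] = flip32(stack[-1])
        elif op == 0b00001011:             # NOP
            pass
        else:
            raise ValueError(f"opcode {op:#04x} is outside this sketch")
    return stack


# IM 2, NOP, IM 3, ADD, BREAKPOINT: the NOP keeps the two constants separate.
program = bytes([0x82, 0x0B, 0x83, 0x05, 0x00])
result = run(program)
```

Without the NOP between the two IM bytes, the second IM would extend the first constant instead of pushing a new one, which is one use of NOP that follows from the "push or append" behaviour of IM.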