C.mmp

From Wikipedia, the free encyclopedia
The C.mmp memory unit, with three racks visible, including the front panel of the crossbar switch.

The C.mmp was an early MIMD multiprocessor system developed at Carnegie Mellon University by William Wulf (1971). The name C.mmp came from the PMS notation of Bell and Newell, in which a CPU was designated C and a variant was indicated with dot notation; mmp stood for Multi-Mini-Processor.

Sixteen PDP-11 minicomputers were used as the processing elements (called Compute Modules, or CMs, in the system). Each CM had a local memory of 8K and its own set of peripheral devices. One challenge was that each device was reachable only through the single processor it was attached to, so the I/O system (designed by Roy Levin) hid this connectivity and routed each request to the hosting processor. If a processor went down, the devices on its Unibus became unavailable, which hurt overall system reliability. Processor 0 (the boot processor) had the disk drives attached.
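The routing arrangement can be pictured as a lookup from each device to the Compute Module that hosts it. The sketch below is a hypothetical Python model, not Roy Levin's actual design: the class and method names and the device names `disk0` and `tty3` are invented for illustration. It shows why callers never needed to know where a device lived, and why a downed CM took its devices with it.

```python
class DeviceUnavailable(Exception):
    """Raised when the CM hosting a device is down (hypothetical name)."""


class IORouter:
    """Hypothetical model: each device hangs off exactly one CM's Unibus."""

    def __init__(self, device_host: dict[str, int]):
        # device name -> number of the CM whose Unibus the device is on
        self.device_host = dict(device_host)
        self.cm_up = {cm: True for cm in range(16)}

    def route(self, requesting_cm: int, device: str) -> int:
        """Return the CM that must service a request for `device`.

        The result may differ from `requesting_cm`; the router hides that
        from the caller by forwarding the request to the hosting CM.
        """
        host = self.device_host[device]
        if not self.cm_up[host]:
            # The device is physically reachable only through its host CM.
            raise DeviceUnavailable(f"{device}: host CM {host} is down")
        return host


router = IORouter({"disk0": 0, "tty3": 5})
print(router.route(requesting_cm=7, device="disk0"))  # 0
router.cm_up[0] = False
try:
    router.route(requesting_cm=7, device="disk0")
except DeviceUnavailable as err:
    print(err)  # disk0: host CM 0 is down
```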

The Compute Modules shared these communication pathways:

  • An interprocessor bus: used to distribute the system-wide clock, interrupts, and process-control messages among the CMs.
  • A 16x16 crossbar switch: connected the 16 CMs on one side to 16 banks of shared memory on the other. If all 16 processors were accessing different banks, all of the accesses proceeded concurrently. If two or more processors tried to access the same bank, one was granted access in that cycle and the others were served on subsequent memory cycles.
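The crossbar's behaviour under contention can be simulated one memory cycle at a time. This is a sketch under an assumed arbitration rule (the lowest-numbered waiting CM wins each bank); the text does not say how C.mmp actually chose among contenders, and `schedule_accesses` is a hypothetical name.

```python
from collections import defaultdict


def schedule_accesses(requests: dict[int, int]) -> list[dict[int, int]]:
    """Simulate a 16x16 crossbar; `requests` maps CM number -> memory bank.

    Each memory cycle, every bank serves at most one CM; CMs that lose the
    arbitration retry on later cycles. Returns one {cm: bank} dict per cycle.
    Assumption: the lowest-numbered waiting CM wins a contested bank.
    """
    waiting = defaultdict(list)  # bank -> CMs queued for it, in CM order
    for cm in sorted(requests):
        waiting[requests[cm]].append(cm)

    cycles = []
    while any(waiting.values()):
        granted = {}
        for bank, queue in waiting.items():
            if queue:
                granted[queue.pop(0)] = bank  # one CM per bank per cycle
        cycles.append(granted)
    return cycles


# All 16 CMs on distinct banks: everything completes in a single cycle.
print(len(schedule_accesses({cm: cm for cm in range(16)})))  # 1
# Three CMs contending for bank 4 are serialized over three cycles.
print(schedule_accesses({0: 4, 1: 4, 2: 4, 3: 7}))
# [{0: 4, 3: 7}, {1: 4}, {2: 4}]
```

In the worst case, all 16 CMs hitting the same bank, the accesses take 16 cycles, which is why spreading data across banks mattered.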

Since the PDP-11 had only a 16-bit address space, an additional address translation unit was added to expand the address space to 25 bits for the shared memory. The Unibus architecture provided 18 bits of address, and the two high-order bits were used to select one of four relocation registers, each of which selected a bank of memory. Managing the relocation registers correctly was one of the challenges of programming the operating system kernel.
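The address arithmetic can be made concrete. The text gives only the widths (an 18-bit Unibus address, four relocation registers, a 25-bit shared address), so this sketch assumes the simplest consistent layout: the top two Unibus bits select a register, the low 16 bits pass through unchanged as an offset, and the selected register supplies the remaining 25 - 16 = 9 high-order bits. The real C.mmp mapping hardware may have divided the bits differently, and the register contents below are made up.

```python
UNIBUS_BITS = 18
SHARED_BITS = 25
OFFSET_BITS = UNIBUS_BITS - 2          # 16 bits left after the two select bits
BANK_BITS = SHARED_BITS - OFFSET_BITS  # 9 bits must come from the register


def translate(unibus_addr: int, relocation: list[int]) -> int:
    """Map an 18-bit Unibus address to a 25-bit shared-memory address.

    Assumed model: bits 17..16 pick one of four relocation registers, the
    selected register supplies the upper 9 bits, and bits 15..0 pass through.
    """
    if not 0 <= unibus_addr < 1 << UNIBUS_BITS:
        raise ValueError("not an 18-bit Unibus address")
    if len(relocation) != 4:
        raise ValueError("expected four relocation registers")
    select = unibus_addr >> OFFSET_BITS              # top two bits: 0..3
    offset = unibus_addr & ((1 << OFFSET_BITS) - 1)  # low 16 bits
    bank = relocation[select] & ((1 << BANK_BITS) - 1)
    return (bank << OFFSET_BITS) | offset


regs = [0x000, 0x001, 0x1F0, 0x1FF]  # hypothetical register contents
print(hex(translate(0o600010, regs)))  # 0x1ff0008
```

Address 0o600010 has both high-order bits set, so it uses register 3; changing what a register holds remaps a whole 64 KB window at once under this model, which hints at why kernel code had to manage the registers carefully.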

The original C.mmp design used magnetic-core memory, but during the system's lifetime higher-performance dynamic RAM became available and the system was upgraded.

The original processors were PDP-11/20s, but in the final system only five of these remained; the other 11 were PDP-11/40s, modified to have additional writable microcode space. All modifications to these machines were designed and built at CMU.

Most of the 11/20 modifications were custom changes to the motherboard, but because the PDP-11/40 was implemented in microcode, a separate "proc-mod" board was designed that intercepted certain instructions and enforced the protection the operating system required. For example, operating system integrity required that the stack pointer register never be odd. On the 11/20, this was achieved by clipping the lead to the low-order bit of the stack register. On the 11/40, the proc-mod board intercepted every access to the stack and generated an illegal data access trap if the low-order bit was 1.
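The two stack-pointer safeguards differ in kind: the 11/20 made an odd value impossible, while the 11/40 let the value exist but trapped when it was used. A minimal Python sketch of that difference, with hypothetical function and exception names and the clipped lead modeled as a bit that always reads as zero:

```python
class IllegalDataAccessTrap(Exception):
    """Stands in for the trap raised by the 11/40 proc-mod board."""


def sp_on_1120(value: int) -> int:
    # 11/20: the lead for bit 0 is clipped, so that bit can never be set.
    # Modeled here as forcing bit 0 of a 16-bit value to zero.
    return value & 0xFFFE


def sp_check_on_1140(sp: int) -> int:
    # 11/40: the proc-mod board intercepts each stack access and traps
    # instead of letting an odd stack pointer be used.
    if sp & 1:
        raise IllegalDataAccessTrap(f"odd stack pointer {sp:#o}")
    return sp


print(oct(sp_on_1120(0o1001)))  # 0o1000
try:
    sp_check_on_1140(0o1001)
except IllegalDataAccessTrap as err:
    print(err)  # odd stack pointer 0o1001
```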

The operating system was called HYDRA. It was a capability-based, object-oriented, multi-user operating system: system resources were represented as objects and protected through capabilities.
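The core idea of capability-based protection is that a program can act on an object only through a capability that names both the object and the rights it grants. The sketch below illustrates that general model only; the rights (`READ`, `WRITE`, `DELETE`) and every class and function name are hypothetical and are not HYDRA's actual rights set or kernel interface.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Rights(Flag):
    # Illustrative rights only; not HYDRA's actual rights set.
    READ = auto()
    WRITE = auto()
    DELETE = auto()


@dataclass
class KernelObject:
    name: str
    data: bytes = b""


@dataclass(frozen=True)
class Capability:
    obj: KernelObject
    rights: Rights

    def restrict(self, keep: Rights) -> "Capability":
        # A holder may pass on a weaker capability, never a stronger one.
        return Capability(self.obj, self.rights & keep)


class ProtectionError(Exception):
    pass


def read(cap: Capability) -> bytes:
    # Every operation checks the capability's rights before touching the object.
    if Rights.READ not in cap.rights:
        raise ProtectionError(f"no READ right on {cap.obj.name}")
    return cap.obj.data


full = Capability(KernelObject("file", b"hi"), Rights.READ | Rights.WRITE)
read_only = full.restrict(Rights.READ)
print(read(read_only))  # b'hi'
```

In a real capability system the kernel keeps capabilities out of user reach so they cannot be forged; Python enforces no such boundary, so this shows only the rights check and the rule that rights can be weakened but not amplified when passed on.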

Among the programming languages available on the system was a subset of ALGOL 68. The language was in fact more a superset than a subset, since its support for parallelism was greatly improved to make good use of the C.mmp. The operating system and most applications were written in Bliss-11, which had to be cross-compiled on a PDP-10. The ALGOL 68 compiler ran natively on HYDRA. The operating system contained very little assembly code.

Because the system as a whole depended on all 16 CPUs running, hardware reliability was a serious problem: if each processor had an MTBF of 24 hours, the MTBF of the whole system was 24/16 hours, or about 1.5 hours (assuming independent failures). Many of these failures were caused by timing glitches in the numerous custom circuits added to the processors.

Considerable effort went into improving hardware reliability. When a processor was noticeably failing, it was partitioned out and ran diagnostics for several hours. Once it passed a first set of diagnostics, it was partitioned back in as an "I/O processor": it did not run application code, but its peripheral devices became available again, and it continued to run diagnostics. If it passed these after several more hours, it was reinstated as a full member of the processor set. Similarly, if a block of memory (one page) was detected as faulty, it was removed from the pool of available pages, and the operating system ignored that page until otherwise notified. The operating system thus became an early example of a fault-tolerant system, able to cope with the hardware problems that would inevitably arise.
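The whole-system MTBF can be estimated by treating the 16 processors as components in series, where any single failure counts as a system failure. Assuming independent failures at constant rates, the failure rates add, so the system MTBF is the per-processor MTBF divided by the number of processors. A minimal sketch of that arithmetic (`series_mtbf` is a hypothetical helper name):

```python
def series_mtbf(unit_mtbf_hours: float, units: int) -> float:
    """MTBF of a system that fails when any one of `units` identical parts fails.

    Assumes independent failures with constant rates (exponential lifetimes):
    the system failure rate is units / unit_mtbf, so its MTBF is the inverse.
    """
    return unit_mtbf_hours / units


print(series_mtbf(24, 16))  # 1.5 (hours), i.e. 90 minutes
```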
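The recovery procedure for failing hardware amounts to a small state machine per processor, plus a page allocator that skips known-bad pages. The following hypothetical sketch models both. The state names, the rule that a failure at any stage sends a CM back to diagnostics, and the allocator's lowest-page-first policy are assumptions, not details from the source; returning a repaired page to service ("until otherwise notified") is left out for brevity.

```python
from enum import Enum


class CMState(Enum):
    FULL = "full member"             # runs application code
    PARTITIONED = "partitioned out"  # isolated, running diagnostics only
    IO_ONLY = "I/O processor"        # devices usable, no application code


def next_state(state: CMState, *, failing: bool = False, passed: bool = False) -> CMState:
    """Advance one CM through the recovery sequence."""
    if failing:
        return CMState.PARTITIONED  # assumption: failure at any stage restarts it
    if passed and state is CMState.PARTITIONED:
        return CMState.IO_ONLY      # first set of diagnostics passed
    if passed and state is CMState.IO_ONLY:
        return CMState.FULL         # later diagnostics passed
    return state


class PagePool:
    """Free-page pool that skips pages reported as faulty."""

    def __init__(self, pages):
        self.free = set(pages)
        self.bad = set()

    def mark_faulty(self, page):
        self.free.discard(page)
        self.bad.add(page)

    def release(self, page):
        # A freed page rejoins the pool only if it is not known to be bad.
        if page not in self.bad:
            self.free.add(page)

    def allocate(self):
        if not self.free:
            raise MemoryError("no usable pages left")
        page = min(self.free)  # deterministic pick, for the example only
        self.free.remove(page)
        return page


state = CMState.FULL
for event in ({"failing": True}, {"passed": True}, {"passed": True}):
    state = next_state(state, **event)
    print(state.value)  # partitioned out, then I/O processor, then full member
```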

External links