This article needs additional citations for verification. (April 2016) (Learn how and when to remove this template message)
|Designed by||Simon Peyton Jones and Norman Ramsey|
|Typing discipline||static, weak|
C-- (pronounced cee minus minus) is a C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-level languages rather than written by human programmers. Unlike many other intermediate languages, its representation is plain ASCII text, not bytecode or another binary format.
There are two main branches of C--. One is the original C-- branch, with the final version 2.0 released in May 2005. The other is the Cmm fork actively used by the Glasgow Haskell Compiler as its intermediate representation.
C-- is a "portable assembly language", designed to ease the task of implementing a compiler which produces high quality machine code. This is done by having the compiler generate C-- code, delegating the harder work of low-level code generation and optimisation to a C-- compiler.
Work on C-- began in the late 1990s. Since writing a custom code generator is a challenge in itself, and the compiler back ends available to researchers at that time were complex and poorly documented, several projects had written compilers which generated C code (for instance, the original Modula-3 compiler). However, C is a poor choice for functional languages: it does not guarantee tail call optimization, or support accurate garbage collection or efficient exception handling. C-- is a simpler, tightly-defined alternative to C which does support all of these things. Its most innovative feature is a run-time interface which allows writing of portable garbage collectors, exception handling systems and other run-time features which work with any C-- compiler.
The language's syntax borrows heavily from C. It omits or changes standard C features such as variadic functions, pointer syntax, and aspects of C's type system, because they hamper certain essential features of C-- and the ease with which code-generation tools can produce it.
The name of the language is an in-joke, indicating that C-- is a reduced form of C, in the same way that C++ is basically an expanded form of C. (In C-like languages, "--" and "++" are operators meaning "decrement" and "increment".)
The first version of C-- was released in April 1998 as a MSRA paper, accompanied by a January 1999 paper on garbage collection. A revised manual was posted in HTML form in May 1999. Two sets of major changes proposed in 2000 by Norman Ramsey ("Proposed Changes") and Christian Lindig ("A New Grammar") lead to C-- version 2, which was finalized around 2004 and officially released in 2005.
The C-- type system is deliberately designed to reflect constraints imposed by hardware rather than conventions imposed by higher-level languages. In C--, a value stored in a register or memory may have only one type: bit vector. However, bit vector is a polymorphic type and may come in several widths, e.g., bits8, bits32, or bits64. A separate 32-or-64 bit family of floating-point types is supported. In addition to the bit-vector type, C-- also provides a Boolean type bool, which can be computed by expressions and used for control flow but cannot be stored in a register or in memory. As in an assembly language, any higher type discipline, such as distinctions between signed, unsigned, float, and pointer, is imposed by the C-- operators or other syntactic constructs in the language.
C-- version 2 removes the distinction between bit-vector and floating-point types. Programmers are allowed to annotate these types with a string "kind" tag to distinguish, among other things, a variable's integer vs float typing and its storage behavior (global or local). The first part is useful on targets that have separate registers for integer and floating-point values. In addition, special types for pointers and the native word is introduced, although all they do is mapping to a bit vector with a target-dependent length.:10 C-- is not type-checked, nor does it enforce or check the calling convention.:28
The specification page of C-- lists a few implementations of C--. The "most actively developed" compiler, Quick C--, was abandoned in 2013.
A C-- dialect called Cmm is the intermediate representation for the Glasgow Haskell Compiler. GHC backends are responsible for further transforming C-- into executable code, via LLVM IR, slow C, or directly through the built-in native backend.
Some of the developers of C--, including Simon Peyton Jones, João Dias, and Norman Ramsey, work or have worked on the Glasgow Haskell Compiler. Work on GHC has also led to extensions in the C-- language, forming the Cmm dialect. Cmm uses the C preprocessor for ergonomics.
Despite the original intention, GHC does perform many of its generic optimizations on C--. As with other compiler IRs, GHC allows for dumping the C-- representation for debugging. Target-specific optimizations are, of course, performed later by the backend.
- Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1998-04-23). "The C– Language Reference Manual". Cite journal requires
- Reig, Fermin; Ramsey, Norman; Jones, Simon Peyton (1999-01-01). "C–: a portable assembly language that supports garbage collection". Cite journal requires
- Ramsey, Norman; Jones, Simon Peyton. "The C-- Language Specification, Version 2.0" (PDF). Retrieved 11 December 2019.
- GHC Commentary: What the hell is a .cmm file?
- Nordin, Thomas; Jones, Simon Peyton; Iglesias, Pablo Nogueira; Oliva, Dino (1999-05-23). "The C– Language Reference Manual".
- "C-- Downloads". www.cs.tufts.edu. Retrieved 11 December 2019.
- "An improved LLVM backend".
- GHC Backends
- Debugging compilers with optimization fuel