Metacompiler

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Metacompilers are a subset of a specialized class of compiler writing tools called compiler-compilers. The feature that sets a metacompiler apart from a standard compiler-compiler is that a metacompiler is written in its own language and translates itself.

Metacompilers are not only useful for generating parsers and code generators, they are also useful for generating a wide range of other software engineering and analysis tools.[1]

Besides being useful for parsing domain-specific languages, a metacompiler is itself a prime example of a domain-specific language, designed for the domain of compiler writing.

A metacompiler is defined by a set of grammar productions defining itself, written in its own specialized language. The metacompiler translates this grammar definition into the executable form of itself. Usually the grammar reduction rules are intermixed with semantic translation rules. Defining itself and translating itself this way constitutes the meta-step that sets a metacompiler apart from other compiler-compilers.

Since the metacompiler defines and translates itself, the question arises as to how it is initially created (a chicken and egg problem). This is solved in one of two ways: by cross-compiling or by bootstrapping. Cross-compiling involves translating the new metacompiler using some other compiler or metacompiler running on some other platform. This is similar to how you make more sourdough starter.

Bootstrapping, the other method, is an elegant (and usually mind-bending) process whereby the metacompiler is defined in progressively sophisticated stages and "bootstraps" itself. The first version of the metacompiler translation is executed by hand. That is, the programmer pretends to be the metacompiler, parsing its rules and generating the code just as the metacompiler would do if it existed, which it doesn't, at least not at that first stage. Once the initial metacompiler is up and running, in a simple embryonic form, the full-strength metacompiler is created by successively defining and translating more sophisticated versions of itself. That is, at each succeeding stage, version n of the metacompiler is used to generate its successor, version n+1.

A runtime module consisting of support functions required for the translation process usually rounds out the full metacompiler package. This would include input/output, symbol table, and string processing functions.

Historical[edit]

Metacompilers have played a significant role in both computer science and the build-up of the computer industry. Initial metacompilers included Meta-II[2] and its descendant TreeMeta.[3]

A MetaII Tutorial provides an online way to learn about MetaII.

Information about later descendants of these metacompilers is not generally available. With the resurgence of domain-specific languages and the need for parser generators which are easy to use, easy to understand, and easy to maintain, metacompilers are becoming a valuable tool for advanced software engineering projects.

The ideas about grammars, self-generation and extensions are widely used by Program transformation systems.

CWIC[edit]

In 1969-1970, Erwin Book, Val Schorre, and Steve Sherman developed the Compiler for Writing and Implementing Compilers (CWIC) at System Development Corporation.[4] CWIC includes three languages:

  • Syntax language: a language that specified the valid syntax of the language to be compiled. It is similar to Backus-Naur Form, but adds operators for creating the parse tree.
  • MOL-360: a language for writing the underlying support library. MOL-360 is similar to C but lacks the concept of formal structures and typed pointers.
  • Generators language: a language for translating the parse tree into IBM 360 machine language (or some other binary or character-oriented form).[4]

Generators Language[edit]

Generators Language had semantics similar to Lisp. The parse tree was thought of as a recursive list. The general form of a Generators Language function is:

  function-name(first-tree-description) => code-to-process-that-tree
               (second-tree-description) => code-to-process-that-tree
               (third-tree-description) => code-to-process-that-tree
               ...

The code to process a given tree included the features of a general purpose programming language, plus a form: <stuff>, which would emit (stuff) onto the output file. For example:

  process-arithmetic(ADD[x,y]) => r1 = getreg(); r2 = getreg();
                                  load(r1, x); load(r2, y);
                                  <AR + (r1*16)+r2;>
                                  popreg(); popreg();
                    (MUL[x,y])=>...

That is, if the parse tree looks like ('ADD' (something) (something)), it would generate the 360 "AR" (Add Register) instruction with the appropriate values in general registers. The pattern matching in a real compiler would be more complex than this example, because it would include special code to handle the situation where X or Y was the name of a location in memory, or where X or Y was already in a general register because of previous instructions[4]

See also[edit]

References[edit]

  1. ^ Neighbors, J.M. Software Construction using Components. Technical Report 160, Department of Information and Computer Sciences, University of California, Irvine, 1980.
  2. ^ Shorre, D.V., META II a syntax-oriented compiler writing language, Proceedings of the 1964 19th ACM National Conference, pp. 41.301-41.3011, 1964
  3. ^ C. Stephen Carr, David A. Luther, Sherian Erdmann, The TREE-META Compiler-Compiler System: A Meta Compiler System for the Univac 1108 and General Electric 645, University of Utah Technical Report RADC-TR-69-83.
  4. ^ a b c Book, Erwin; Dewey Val Schorre; Steven J. Sherman (June 1970). "The CWIC/36O system, a compiler for writing and implementing compilers". ACM SIGPLAN Notices 5 (6): 11–29. doi:10.1145/954344.954345.