Metacompiler

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Metacompilers are a subset of a specialized class of compiler writing tools called compiler-compilers. The feature that sets a metacompiler apart from a standard compiler-compiler is that a metacompiler is written in its own language and translates itself.

Metacompilers are not only useful for generating parsers and code generators, they are also useful for generating a wide range of other software engineering and analysis tools.[1]

Besides being useful for parsing domain-specific languages, a metacompiler is itself a prime example of a domain-specific language, designed for the domain of compiler writing.

A metacompiler is defined by a set of grammar productions defining itself, written in its own specialized language. The metacompiler translates this grammar definition into the executable form of itself. Usually the grammar reduction rules are intermixed with semantic translation rules. Defining itself and translating itself this way constitutes the meta-step that sets a metacompiler apart from other compiler-compilers.

Since the metacompiler defines and translates itself, the question arises as to how it is initially created (a chicken and egg problem). This is solved in one of two ways: by cross-compiling or by bootstrapping. Cross-compiling involves translating the new metacompiler using some other compiler or metacompiler running on some other platform. This is similar to how you make more sourdough starter.

Bootstrapping, the other method, is an elegant (and usually mind-bending) process whereby the metacompiler is defined in progressively sophisticated stages and "bootstraps" itself. The first version of the metacompiler translation is executed by hand. That is, the programmer pretends to be the metacompiler, parsing its rules and generating the code just as the metacompiler would do if it existed, which it doesn't, at least not at that first stage. Once the initial metacompiler is up and running, in a simple embryonic form, the full-strength metacompiler is created by successively defining and translating more sophisticated versions of itself. That is, at each succeeding stage, version n of the metacompiler is used to generate its successor, version n+1.

A runtime module consisting of support functions required for the translation process usually rounds out the full metacompiler package. This would include input/output, symbol table, and string processing functions.

Historical[edit]

Metacompilers have played a significant role in both computer science and the build-up of the computer industry. Initial metacompilers included Meta-II[2] and its descendant TreeMeta.[3]

A MetaII Tutorial provides an online way to learn about MetaII.

Information about later descendants of these metacompilers is not generally available. With the resurgence of domain-specific languages and the need for parser generators which are easy to use, easy to understand, and easy to maintain, metacompilers are becoming a valuable tool for advanced software engineering projects.

The ideas about grammars, self-generation and extensions are widely used by Program transformation systems.

CWIC[edit]

In 1969-1970, Erwin Book, Val Schorre, and Steve Sherman developed the Compiler for Writing and Implementing Compilers (CWIC) at System Development Corporation.[4] CWIC includes three languages:

  • Syntax language: a language that specified the valid syntax of the language to be compiled. It is similar to Backus-Naur Form, but adds operators for creating the parse tree.
  • MOL-360: a language for writing the underlying support library. MOL-360 is similar to C but lacks the concept of formal structures and typed pointers.
  • Generators language: a language for translating the parse tree into IBM 360 machine language (or some other binary or character-oriented form).[4]

Generators Language[edit]

Generators Language had semantics similar to Lisp. The parse tree was thought of as a recursive list. The general form of a Generators Language function is:

  function-name(first-unparse_rule) => first-production_code_generator
                   (second-unparse_rule) => second-production_code_generator
                   (third-unparse_rule) => third-production_code_generator
               ...

The code to process a given tree included the features of a general purpose programming language, plus a form: <stuff>, which would emit (stuff) onto the output file. A generator call may be used in the unparse_rule. The generator is passed the element of unparse_rule pattern in which it is placed and returns values are listed in (). For example:

  expr_gen(ADD[expr_gen(x),expr_gen(y)]) =>
                                <AR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (SUB[expr_gen(x),expr_gen(y)])=>
                                <SR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (MUL[expr_gen(x),expr_gen(y)])=>
                              .
                              .
                              .
               (x)=>     r1 = getreg();
                            load(r1, x);
                            return r1;
...

That is, if the parse tree looks like ('ADD' (something1) (something2)), expr_gen(x) would be called with something1 and return x. A variable in the unparse rule is a local variable that can be used in the production_code_generator. expr_gen(y) is called with something2 and returns y. Of note here is a generator call in an unparse rule is passed the element in the position it occupies. Hopefully in the above x and y will be registers on return. The last transforms is intended to load an atomic into a register and return the register. The first production would be used to generate the 360 "AR" (Add Register) instruction with the appropriate values in general registers. The above example is only a part of a generator.

See also[edit]

References[edit]

  1. ^ Neighbors, J.M. Software Construction using Components. Technical Report 160, Department of Information and Computer Sciences, University of California, Irvine, 1980.
  2. ^ Shorre, D.V., META II a syntax-oriented compiler writing language, Proceedings of the 1964 19th ACM National Conference, pp. 41.301-41.3011, 1964
  3. ^ C. Stephen Carr, David A. Luther, Sherian Erdmann, The TREE-META Compiler-Compiler System: A Meta Compiler System for the Univac 1108 and General Electric 645, University of Utah Technical Report RADC-TR-69-83.
  4. ^ a b Book, Erwin; Dewey Val Schorre; Steven J. Sherman (June 1970). "The CWIC/36O system, a compiler for writing and implementing compilers". ACM SIGPLAN Notices 5 (6): 11–29. doi:10.1145/954344.954345.