From Wikipedia, the free encyclopedia
WikiProject Computer science (Rated Start-class, High-importance)


This article is a perfect example of an explanation written by and for people who already know the subject in detail. I am a tech writer specializing in software engineering, and even though I have more than just a few notions of what this is about, I am struggling to make sense of it. What it is missing is real-world anchors - e.g. "A compiler-compiler or compiler generator is a tool" should read "A compiler-compiler or compiler generator is a computer program" - there are many more examples throughout the text that show that the author assumes not only that all the readers of the article are software engineers but that they all work in the same field. Some readers may be well-informed onlookers; others may have no knowledge whatsoever of software engineering but have enough intelligence to understand what compiler-compilers do. Those people are not being catered for in this article. Tolken (talk) 05:57, 28 May 2010 (UTC)


Main description:

The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler.
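As a concrete, deliberately toy illustration of the grammar-in, parser-source-out idea: the sketch below invents its own one-rule "grammar" format and emits Python parser source. It is not the input syntax of any real generator.

```python
# Toy parser generator: given a rule of the form
#   name = item { sep item }
# emit the source code of a recursive-descent parsing function.
# The rule format and function names here are invented for illustration.
def gen_parser(name, sep, item):
    return (
        f"def parse_{name}(toks, pos):\n"
        f"    pos = parse_{item}(toks, pos)\n"
        f"    while pos < len(toks) and toks[pos] == {sep!r}:\n"
        f"        pos = parse_{item}(toks, pos + 1)\n"
        f"    return pos\n"
    )

# Generate parser source for:  expr = term { '+' term }
src = gen_parser("expr", "+", "term")
print(src)
```

A real parser generator does the same thing at a much larger scale: the whole grammar goes in, and a complete parser module comes out.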

What was the earliest parser generator? Reference needed!

The earliest Compiler-Compilers I can find referenced are:

"A SYNTAX DIRECTED COMPILER FOR ALGOL 60" Edgar T. Irons, Communications of the ACM Volume 4 Issue 1, Jan. 1961.

"A General Translation Program for Phrase Structure Languages" Brooker and Morris, Journal of the ACM (JACM) JACM Homepage archive Volume 9 Issue 1, Jan. 1962

"A compiler-building system developed by Brooker and Morris" Brooker and Morris, Communications of the ACM Volume 7 Issue 7, July 1964

"A general-purpose table-driven compiler" Stephen Warshall and Robert M. Shapiro, AFIPS '64 (Spring) Proceedings of the April 21-23, 1964, spring joint computer conference.

I found no early parser generators referenced. Maybe the term was just not used back then. The earliest ACM use of the topic is from 1971.

Also: the compiler-compiler was a seminal idea first presented at a British Computer Society conference in July 1960 by Brooker and Morris.

As far as I know, parser generators separate from a compiler-compiler system didn't appear until the mid-to-late 60's.

--Steamerandy (talk) 20:04, 1 November 2014 (UTC)--


I think it might be useful to merge this article (Compiler-compiler) with the article on Parsers (subsection "parser generator"). (comment by Netizen 2003-10-31)

Compiler-Compiler vs. Metacompiler[edit]

Are "metacompiler" and "compiler-compiler" synonyms? I'm not sure, but I believe that a metacompiler is a selfcompiler, i.e. it can compile itself, so it's not equal to a "compiler-compiler". --surueña 10:14, 2005 Jun 18 (UTC)

there is no such term as a "selfcompiler" [1], [2]
All compilers can compile themselves, since their grammar is deterministic/complete. By definition of the term "meta-" (read Gödel, Escher, Bach for a thorough explanation) a meta-compiler is a compiler-compiler and vice versa. yacc is a meta-compiler. A meta-meta-compiler would be a compiler that creates compilers that compile something - dynamic/on-the-fly creation of compiler modules from the BNF/ontological syntax of a language/ontology included within a file, which in turn is used to compile the actual file.
in short: Message, within a message, within the bottle. Project2501a 11:05, 18 Jun 2005 (UTC)
Yes, it seems that "selfcompiler" is the name of a specific compiler, but you know what I mean. But not every compiler can compile itself:
  • Some specialized compilers (e.g. a lot of C compilers) implement only a part of the language, so small that they cannot compile themselves.
  • A great many compilers are implemented in another language. For example, when there is no compiler yet for a given language (every language has a first compiler, and of course it cannot be implemented in that same language), or for languages with bad performance (the compiler would be very slow, so it is written in another language).
In short, a compiler that can compile itself has a property worth noting, and I knew that "type" of compiler as a metacompiler. But reading some pages it seems that somebody has another completely different meaning for metacompiling [3]. Do you know any source with a good definition? In my opinion, we shouldn't state that metacompiler is a synonym of compiler-compiler until we find more info about it. --surueña 12:34, 2005 Jun 18 (UTC)
hokay, are we talking computer science here or software engineering? "some specialized compilers" are the exception to the rule, afaik. that's software engineering: a specific implementation of a compiler/set of BNF instructions. if we are talking computer science, then this is a non-issue: all compilers can compile themselves.
when there is no compiler for a given language, that's what a compiler-compiler/metacompiler is. it's a compiler that will give you your first compiler. it's like a candy-dispensing machine, only you insert BNF for coins.
the page you mention still talks about metacompiling: using Forth to build a meta-compiler, basically to use Forth to generate applications in meta mode. Good source with good definition? Knuth! and Hofstadter! and yes, a metacompiler is a synonym of a compiler-compiler ever since 1969, when yacc came around.
what's your major, anyway? Project2501a 12:58, 18 Jun 2005 (UTC)
I don't understand your point. Actually I don't know whether the percentage of compilers that can build themselves is high or low, but it is a fact that there are compilers that cannot compile themselves. I don't understand the distinction between CS and SE, but the fact is not every compiler can compile itself. Anyway, if you are sure that "metacompiler" is a synonym of "compiler-compiler" we should put it in the article, with those references. Hope this helps --surueña 08:28, 2005 Jun 20 (UTC)
it's not a matter of percentages, it's a matter of compiler theory. Compiler theory is part of computer science. if there are some compilers out there in the field that do not compile themselves due to management/engineering decisions, it's a matter of applied computer science, aka software engineering. Matter of fact is, the set of BNF instructions that makes up any given computer language can always compile itself. Try it with pen and paper. Computer science theory is about the data, as the computer science article says, not about the computer. something like organising information theory into patterns. Project2501a 12:18, 20 Jun 2005 (UTC)
OK, I understand now your point, thanks for the explanation. But although in theory every compiler can compile itself in practice this isn't true, and no Computer Science book can change that... :-) I try to find the name given to that type of compilers... --surueña 13:21, 2005 Jun 20 (UTC)
well, yeah, see, an article on compilers of compilers/metacompilers/how ever you want to call it, falls under computer science. not software engineering, so, in this case, the practice is irrelevant, a note in the article, at best. Project2501a

Actually, hmmm...

Request to move to metacompiler


Zanaq 5 July 2005 21:19 (UTC) — Removed the request to move because:
  • The consensus seems to be that a metacompiler and a compiler-compiler might not be the same.
  • Google reveals "compiler-compiler" is the more common term: "Compiler Compiler" 114000 hits, metacompiler 763 hits.
  • I personally have heard compiler-compiler occasionally, but not metacompiler.

Consensus has nothing to do with this, it's a matter of definition, look up your compiler theory book. So, metacompiler shows up fewer times than compiler-compiler. I will admit that the request to move to metacompiler was a bad idea. compiler-compiler is more popular. but still, char *compiler_compiler; char *metacompiler; compiler_compiler = metacompiler; Project2501a 5 July 2005 22:16 (UTC)

This has been hashed out a lot here. However metacompiler and compiler compiler are not quite the same. Not all compilers are able to compile themselves.
In all metacompiler documentation I have found, the description reads: "A metacompiler is a computer program that takes as input source code written in a metalanguage." The exception is the FORTH fringe, who have the idea it means a self-compiling compiler. However, by the metalanguage-compiler definition FORTH would be a metacompiler. FORTH is simply a language of defined words. Programming in FORTH is defining new words, adding to or extending the language. So FORTH is a programming metalanguage.
The first metacompiler could not compile itself. It took as input a metalanguage specifying language constructs and produced a program that generated random strings conforming to the metalanguage specification.
All documented metacompilers can take as input a program written in a metalanguage and produce a running program. I think this is the difference that separates them from compiler compilers. yacc muddied the terminology by calling itself a compiler compiler. It's a matter of computer science history. Just because some new author uses terminology differently shouldn't make the terms concrete. These terms should firstly be explained as found in the majority of cases. I would think that the ACM would be one of the best sources for references.
Originally metacompiler and compiler compiler were the same. When yacc came out that changed. It would have been better to call yacc a metacompiler, keeping compiler compiler to mean exactly what it states. TREE-META is more powerful than yacc. In 1968 it produced compilers that output assembly code. In 1970 CWIC produced optimizing compilers that output IBM 360 binary machine code. SLIC in 1974 was producing compilers that could target different processors. Compilers produced ran on a DEC-SYSTEM-10 and produced DEC-10 and TI990 machine code.
What I have found is that all programs documented as a metacompiler take as input a program whose source code is written in a metalanguage. The metalanguage program is compiled into executable code. If the metalanguage program is a complete specification of a programming language and its translation to machine code, the result is a compiler for the specified language.
Compiler compiler has degraded to include parser generators and other such things. Of course a metacompiler could produce a parser module.
Not all compilers can compile themselves. A logic-equation compiler that produces an electronic circuit, for instance.
Steamerandy (talk) 12:37, 13 November 2015 (UTC)

METACOMPILER example[edit]

The history section looks a bit confusing, is there any example of a real compiler-compiler?

I am posting this for use here. I developed SLIC. It was based on work done by Val Schorre.

TREE-META and META II are compiler-compilers / translator systems that document themselves as metacompilers.

CWIC (Compiler for Writing and Implementing Compilers) is a compiler compiler developed by Val Schorre and Erwin Book of the System Development Corporation. Also documents itself as a metacompiler.

CWIC produced binary IBM 360 executable code. It was a further development by Val Schorre, the developer of META II.

The syntax language was similar to TREE-META's, producing a tree structure, but very different in that generator procedures were called to process the trees. The generator language was far more powerful, having several unique features.

A simple arithmetic expression example in the syntax language.

EXPR = TERM $(('+':ADD / '-':SUB) TERM !2);

TERM = FACTOR $(('*':MPY  / '/':DIV) FACTOR !2);


The syntax language is composed of rules that specify the valid syntax of the input. CWIC maintains two stacks: one containing nodes and the other the parse tree. The ":" operator puts a node on the node stack, and the "!" operator combines into a list the top node stack entry and the top n entries of the parse stack.

The EXPR rule defines an expression to be a TERM followed by a series of zero or more TERMs separated by + or - operators. The "$" operator specifies that zero or more of the following may be present.

The expression "a + 2*(b+c) + D" would be turned into a tree representation as

ADD[ADD[a,MPY[2,ADD[b,c]]],D]
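The two-stack scheme described above can be sketched in Python. This is a minimal re-imagining with invented names, not CWIC's actual implementation: the ":" operator becomes a push onto a node stack, and "!2" becomes a reduction that combines the top node with the top two parse-stack entries, so the "$" loops produce left-associative trees.

```python
import re

def tokenize(src):
    # identifiers, numbers, and single-character operators/parens
    return re.findall(r"[A-Za-z]\w*|\d+|[-+*/()]", src)

class Parser:
    def __init__(self, toks):
        self.toks, self.pos = toks, 0
        self.node_stack = []    # ':' pushes node names here
        self.parse_stack = []   # operands and built subtrees live here

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def next(self):
        t = self.peek(); self.pos += 1; return t

    def factor(self):
        if self.peek() == "(":
            self.next(); self.expr(); assert self.next() == ")"
        else:
            self.parse_stack.append(self.next())   # id or number

    def term(self):
        # TERM = FACTOR $(('*':MPY / '/':DIV) FACTOR !2);
        self.factor()
        while self.peek() in ("*", "/"):
            self.node_stack.append({"*": "MPY", "/": "DIV"}[self.next()])
            self.factor()
            self.reduce2()

    def expr(self):
        # EXPR = TERM $(('+':ADD / '-':SUB) TERM !2);
        self.term()
        while self.peek() in ("+", "-"):
            self.node_stack.append({"+": "ADD", "-": "SUB"}[self.next()])
            self.term()
            self.reduce2()

    def reduce2(self):
        # '!2': combine top node with the top two parse-stack entries
        right = self.parse_stack.pop()
        left = self.parse_stack.pop()
        self.parse_stack.append([self.node_stack.pop(), left, right])

p = Parser(tokenize("a + 2*(b+c) + D"))
p.expr()
print(p.parse_stack[0])
```

Because each "!2" fires inside the "$" loop, each + is reduced before the next one is parsed, which is what makes the resulting tree left-associative.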

CWIC provides three types of rules: syntax, token, and character class.

A digit character class rule for example:

DIGIT: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

A simple integer number.


MAKENUMBER[] is a call to a library generator function that takes the parsed token in the token buffer, converts it to a numeric value, and puts it on the parse stack. In the absence of a generator function such as MAKENUMBER, the token buffer would simply be entered into the symbol table and placed on the parse stack.
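The character-class plus token-rule pairing can be modeled in a few lines of Python. The function name and API below are invented; only the DIGIT class and the MAKENUMBER-style conversion step come from the description above.

```python
# DIGIT: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
DIGIT = set("0123456789")

def number_token(src, pos, parse_stack):
    """Scan a run of DIGIT characters; on success, convert the token
    buffer to a numeric value and push it (the MAKENUMBER[]-like step)."""
    start = pos
    while pos < len(src) and src[pos] in DIGIT:
        pos += 1
    if pos == start:
        return None                           # token rule fails to match
    parse_stack.append(int(src[start:pos]))   # MAKENUMBER[]: text -> value
    return pos                                # new scan position

stack = []
end = number_token("42+x", 0, stack)
print(end, stack)
```

Without the conversion step, the scanned text would instead be interned in the symbol table and that symbol pushed, as the paragraph above describes.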

The GENERATOR language was an algorithmic procedural language that emitted 360 machine code.

The CWIC generator language provided pattern matched arguments. For example:

gen(ADD[gen(x),gen(y)]) => {...}
   (SUB[gen(x),gen(y)]) => {...}

   integer(x) => return x;
   variable(x) => return x;

A generator consisted of its name (gen above) and pattern/production pairs.

 (<pattern>) => production

The pattern could be simply a variable list or, as in the above, a specific pattern match. The pattern validated the input and could break it apart. The unique feature of a function call in a pattern is that the function is called with that argument, and its parameter list receives the returned values.

In 1969 I implemented a metacompiler system called SLIC. SLIC was developed at Cerritos College in Norwalk, California on a DEC-SYSTEM-10.

It consisted of several special-purpose languages. The syntax language, much like CWIC's and TREE-META's, was a rule-based language producing tree or list structures. The generator language emitted pseudo-code instructions to named section lists. A section was simply a data structure that defined the section parameters and a list containing the emitted pseudo instructions. A call to flush a section would sequentially process the section's pseudo-code list, calling the pseudo-code procedures. The execution of the pseudo-code procedures output relocatable machine code.
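The section/flush mechanism just described can be modeled roughly as follows. All names here are invented for illustration; the real system's pseudo-code procedures emitted relocatable machine code rather than strings.

```python
# Named sections, each holding a list of deferred pseudo instructions.
sections = {}

def emit_pseudo(section, op, *args):
    """Generator-language side: append a pseudo instruction to a section."""
    sections.setdefault(section, []).append((op, args))

def flush(section, handlers):
    """Sequentially process the section's pseudo-code list, calling the
    pseudo-code procedure registered for each instruction."""
    for op, args in sections.pop(section, []):
        handlers[op](*args)

out = []
handlers = {"call": lambda proc: out.append(f"call {proc}")}

emit_pseudo("code", "call", "main")   # emitted during tree processing
flush("code", handlers)               # machine code produced at flush time
print(out)
```

The point of the indirection is the one the comment makes: the tree-processing phase decides *what* to do, while the pseudo-code procedures, swapped per target machine, decide *how* to encode it.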

A MACHOP defined a target machine instruction or instruction class. A MACHOP could define a single instruction using its mnemonic name, or a set of instructions having the same binary format using a vectored entry. In a vectored entry the MACHOP name was instead a variable name preceded by a "#" character, and that variable was set to the opcode value. A list of opcode mnemonic/value pairs followed the instruction definition.

Machine code was defined in a bit addressable memory space.

An example of a vectored entry MACHOP:

 .MORG 8: H(16): $/8;
  H(8): OP;

  #AAA     0b00110111;
  #AAS     0b00111111;
  #CBW     0b10011000;
  #CLC     0b11111000;
  #CLD     0b11111100;
  #CLI     0b11111010;
  #CMC     0b11110101;
  #CWD     0b10011001;
  #DAA     0b00100111;
  #DAS     0b00101111;
  #HLT     0b11110100;
  #INT3    0b11001100;
  #INTO    0b11001110;
  #IRET    0b11001111;
  #LAHF    0b10011111;
  #LEAVE   0b11001001;
  #NOP     0b10010000;
  #POPA    0b01100001;
  #POPF    0b10011101;
  #PUSHA   0b01100000;
  #PUSHF   0b10011100;
  #SAHF    0b10011110;
  #STC     0b11111001;
  #STD     0b11111101;
  #STI     0b11111011;
  #XLAT    0b11010111;

The above is a very simple instruction consisting of only an 8-bit opcode. The "#" prefix specifies a vectored entry point. OP is a variable that contains the opcode binary value. The line:

  #AAA     0b00110111;

is the AAA opcode entry. OP would contain the binary value 0b00110111.

The .MORG operation is a modular org. The first operand, 8, specifies aligning the plant location to an 8-bit boundary.

 .MORG 8: H(16): $/8;

H(16) specifies listing output of a 16-bit field, $/8, in hex format. $ is the plant location in the bit-addressable memory space.

 H(8): OP;

The above outputs OP to the output code file. H(8) specifies listing output in hex of the 8-bit opcode in OP. Assembly listing output is optional.
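A bit-addressable plant location with .MORG alignment can be sketched in Python. The class and method names are invented; only the behavior (align to an n-bit boundary, then plant a fixed-width bit field) follows the MACHOP description above.

```python
class BitSpace:
    """A bit-addressable output space; $ corresponds to len(self.bits)."""
    def __init__(self):
        self.bits = []              # one 0/1 entry per planted bit

    def morg(self, n):
        # .MORG n: pad with zero bits until the plant location
        # sits on an n-bit boundary
        while len(self.bits) % n:
            self.bits.append(0)

    def plant(self, width, value):
        # plant `value` as a `width`-bit field, most significant bit first
        for i in reversed(range(width)):
            self.bits.append((value >> i) & 1)

    def as_bytes(self):
        self.morg(8)                # byte-align before packing
        return bytes(int("".join(map(str, self.bits[i:i + 8])), 2)
                     for i in range(0, len(self.bits), 8))

sp = BitSpace()
sp.morg(8)
sp.plant(8, 0b00110111)             # the AAA entry: OP holds this value
print(sp.as_bytes().hex())
```

Fields of any width (the (1), (2), (3) fields in the larger example below) would simply be successive plant() calls, with .MORG re-aligning between instructions.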

Another MACHOP example of a more complex instruction:

 .MORG 8: H(16): $/8;
  if isnumber(source) then {
    if isreg(dest) then {
        (1): if size:(dest) = 8 then 0 else 1;
        (3): dest;}
    else {
      if !frame:(dest) and segment:(dest) != curSegment then
        +H(8): overide(dest);
          (1): if size:(dest) = 8 then 0 else 1;
        +H(2): mod(dest);
          (3): 0;
          (3): R_M(dest);
        +H(szoff): offset:(dest/8);}
    +H(size:(dest)):  source}

  else if isreg(source) && isreg(dest) then {
      (1): if size:(dest) = 8 then 0 else 1;}
  else if isreg(source) then {
      (1): if size:(dest) = 8 then 0 else 1;
    +H(2): mod(dest);}
  else if isreg(dest) then {
      (1): if size:(source) = 8 then 0 else 1;
    +H(2): mod(source);}
  else if issegreg(dest) then {
      (2): mod(source);
      (3): issegreg(dest);
      (3): R_M(source);
    if !isreg(source)
      +H(szoff): offset:(source/8);}
  else {
      (2): mod(dest);
      (3): issegreg(source);
      (3): R_M(dest);
    if !isreg(source)
      +H(szoff): offset:(dest/8);}

Many optimizations can be done in the generator language. To provide optimization across sequential instructions, an ISO language was planned but so far not implemented. The ISO language was planned to operate on the pseudo-instruction lists.

The main thing SLIC did was to separate target machine code generation from semantics processing. The pseudo instructions were designed with the language being compiled in mind. A pseudo instruction was much like an assembly macro. I.e., the pseudo code for a function call:

; Output a procedure call instruction to the object file
PSEUDO call proc {

; if a 'far proc' use a far call
; if a 'proc' use a near call
; if a memory reference use memory size to deduce whether a far or
;                       near call is needed
; Some of this could be done in the call and farcall MACHOPs

   if TypeAttr:(proc) == farproc
     if segment:(proc) == CurrentSegment {
        ^pushreg cs                     ; Simulate far call
        ^call    proc}
     else
        ^farcall proc;
   else if TypeAttr:(proc) == proc
        ^call    proc;
   else if SizeAttr(proc) == 2
        ^call    proc;
   else
        ^farcall proc}

The basic concept was to produce code for any target machine architecture, making it a simple matter to target different machines by simply defining the PSEUDO code using the target's instructions.

In 1972 SLIC was used to produce a COBOL compiler running on a DEC-SYSTEM-10, generating code for a TI990 minicomputer.

Development time for the COBOL compiler was 6 months for myself and one other programmer. — Preceding unsigned comment added by Steamerandy (talkcontribs) 10:49, 13 November 2012 (UTC)


I added a section about CWIC to the "metacompiler" article. (We might want to consider merging "Metacompiler" and "Compiler-compiler" -- does anybody know how to propose that?)

The problem I'm seeing is that CWIC is probably too small in the great scheme of things to deserve its own article, but adding it as a section to "metacompiler" means it takes up too much of the article, compared with other material. I'm not sure what the "right" thing to do is.

I came into the CWIC project (we called it "The CWIC Mafia" because Erwin Book always wore a suit and looked a little bit like a mafioso) shortly after the initial development, and I basically became the maintenance engineer while Erwin, Val, and Steve worked on direct and indirect applications of the language. The Generators Language compiler was a masterpiece of passing the buck, aka "recursion".

One thing that has bothered me is that the work done on code generation in CWIC seems to have been lost to posterity. YACC, Bison, and friends concentrate on the easy problem: parsing, but offer no help with code generation. CWIC had many features that aided code generation -- the pattern matching, the <...> operator that emitted code, and operators that automatically handled repetitive sections of the tree.

If you used a pattern like (...$x...)=>, then "$x" was a list of things, and in the right hand side of the production you could write something like:

  .do (... x ...)

This meant to repeatedly execute the code in parens, assigning successive elements of the list $x to "x" in each iteration. Or you could write $(...x...), which would generate a new list, with whatever transformation you specified on each element of $x. Similarly,

  $&(...x...) was true only if the parenthesized expression was true for every element of $x
  $|(...x...) was true if the expression was true for any element
  $+(...x...) was the sum; $* was the product.
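Read this way, the $-operators correspond closely to Python's built-in reductions and comprehensions. This mapping is my reading of the description above, not CWIC syntax:

```python
import math

xs = [1, 2, 3]                        # a pattern variable bound by (...$x...)

every  = all(x > 0 for x in xs)       # $&(...x...): true for every element
some   = any(x > 2 for x in xs)       # $|(...x...): true for any element
total  = sum(xs)                      # $+(...x...): the sum
prod   = math.prod(xs)                # $*(...x...): the product
mapped = [x * 2 for x in xs]          # $(...x...): new list, transformed
# .do (...x...) is a plain for-loop over xs, executed for side effects

print(every, some, total, prod, mapped)
```

That these 1969-era operators translate one-for-one into modern comprehension forms supports the comment's point about how far ahead CWIC's generator language was.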

So CWIC was a significant advance on parser-only metalanguages. What it lacked (and AFAIK no metacompiler has tackled since) was assistance in maintaining the symbol dictionary. Of course, every "atom" that got scanned turned into a symbol that you could assign Lisp-style attributes to, but by now we should have meta-compilers that assist the compiler writer in at least the following two problems:

  1. The syntax says there's supposed to be an ID here. There should be operators to check that (a) the ID has been previously defined (or not defined), and (b) the ID represents a specific type of name (e.g., a variable name, a function name, etc.)
  2. Block-structured languages: you should be able to say "start a new block", which allows re-defining already defined variable names (and even function names).
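Both requested facilities reduce to a stack of scopes. The sketch below is a hypothetical API (the class and method names are mine), in the spirit of the PUSHSYMTBL/POPSYMTBL-style generators mentioned elsewhere on this page:

```python
class SymbolTables:
    """A block-scoped symbol dictionary: a stack of name -> attributes maps."""
    def __init__(self):
        self.stack = [{}]                 # global scope

    def push(self):                       # "start a new block"
        self.stack.append({})

    def pop(self):                        # end the block; return its table
        return self.stack.pop()

    def define(self, name, attrs):
        # point 1(a): caller can check lookup() first to require that
        # the ID is previously defined (or not defined)
        self.stack[-1][name] = attrs

    def lookup(self, name):
        # innermost definition wins; point 1(b): inspect attrs["type"]
        # to require a specific kind of name
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        return None

syms = SymbolTables()
syms.define("x", {"type": "var"})
syms.push()
syms.define("x", {"type": "func"})        # point 2: redefinition in a block
inner = syms.lookup("x")["type"]
syms.pop()
outer = syms.lookup("x")["type"]
print(inner, outer)
```

A metacompiler offering this as a built-in, with syntax-level hooks for the "must be defined" and "must be a variable/function name" checks, is exactly the assistance the two numbered points ask for.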

Bgoldnyxnet (talk) 18:27, 5 June 2013 (UTC)

This article should probably be moved to Parser generator, as that seems to be the main focus. Compiler-compiler can then be made into a disambiguation page to both Parser generator and Metacompiler.
You might also want to have a look at attribute grammars, they add a semantic layer to parser generators to e.g. maintain symbol tables. —Ruud 20:02, 5 June 2013 (UTC)
Thank you, Ruud, I'll look at attribute grammar. Bgoldnyxnet (talk) 04:55, 7 November 2014 (UTC)
I thought CWIC had a symbol table stack. SLIC has a symbol table stack. Built-in generators are used to push and pop the symbol table stack. GETSYMTBL returns the top stack entry. It can be assigned to a symbol attribute, for example. Scoped local variables are handled that way. The debugger needed scope info. I am fairly sure CWIC had those functions. Block-structured languages were handled by CWIC, according to Erwin Book. POPSYMTBL also returned the popped symbol table object.
Bgoldnyxnet, you may be a bit rusty on CWIC symbol tables. Remember MAKEINT[], MAKESTR[], MAKEOCT[] etc. An interceding function stopped the default symbol table entry.
Early Schorre metacompilers all produced code. META II generated stack machine code. TREE-META added unparse rules and the tree-making constructs : and [ ]. A : followed by a node name is exactly the same as in CWIC. The [ ] enclosed a number and created a tree with the specified number of branches, just like the ! operator in CWIC. TREE-META did not have the power of CWIC, but was an intermediate step, creating a tree and using unparse rules, precursors to CWIC's generators. One could describe CWIC's generators as named, grouped unparse rules having LISP 2 actions. I do not find Schorre mentioned in the TREE-META documents.
I would say that CWIC should have its own article. Compared to TREE-META or META II, there is a lot more to CWIC - features that the earlier ones did not have: 1) Symbol tables. 2) Linkable binary output. 3) Token-making rules instead of the built-in functions .ID, .NUM, .STR. 4) Programmed backtracking: the backtrack alternative operator. 5) The much more powerful LISP 2 generator language action code, and more powerful unparse rules calling generators and returning results to named local variables. etc. Steamerandy (talk) 09:17, 10 November 2015 (UTC)


As was stated in #CWIC, we don't need two (or more) pages on the same topic:

I would pick the name of the first public occurrence. Thus, not "Language workbench". Ushkin N (talk) 05:22, 23 May 2016 (UTC)