Talk:Compiler-compiler

WikiProject Computer science (Rated Start-class, High-importance)

Clarity

This article is a perfect example of an explanation written by and for people who already know the subject in detail. I am a tech writer specializing in software engineering, and even though I have more than just a few notions of what this is about, I am struggling to make sense of it. What it is missing is real-world anchors - e.g. "A compiler-compiler or compiler generator is a tool" should read "A compiler-compiler or compiler generator is a computer program" - there are many more examples throughout the text showing that the author assumes not only that all readers of the article are software engineers, but that they all work in the same field. Some readers may be well-informed onlookers; others may have no knowledge whatsoever of software engineering but have enough intelligence to understand what compiler-compilers do. Those people are not being catered for in this article. Tolken (talk) 05:57, 28 May 2010 (UTC)


I think it might be useful to merge this article (Compiler-compiler) with the article on Parsers (subsection "parser generator"). (comment by Netizen 2003-10-31)

Compiler-Compiler vs. Metacompiler

Are "metacompiler" and "compiler-compiler" synonyms? I'm not sure, but I believe that a metacompiler is a selfcompiler, i.e. it can compile itself, so it's not equal to a "compiler-compiler". --surueña 10:14, 2005 Jun 18 (UTC)

there is no such term as a "selfcompiler" [1], [2]
All compilers can compile themselves, since their grammar is deterministic/complete. By definition of the term "meta-" (read Gödel, Escher, Bach for a thorough explanation) a meta-compiler is a compiler-compiler and vice versa. yacc is a meta-compiler. A meta-meta-compiler would be a compiler that creates compilers that compile something - dynamic/on-the-fly creation of compiler modules from the BNF/ontological syntax of a language/ontology included within a file, which in turn is used to compile the actual file.
in short: Message, within a message, within the bottle. Project2501a 11:05, 18 Jun 2005 (UTC)
Yes, it seems that "selfcompiler" is the name of a specific compiler, but you know what I mean. But not every compiler can compile itself:
  • Some specialized compilers (e.g. a lot of C compilers) only implement a part of the language, one so small that they cannot compile themselves.
  • A great many compilers are implemented in another language. For example, when there is no compiler yet for a given language (every language has a first compiler, and of course it cannot be implemented in the same language), or for languages with poor performance (the compiler would be very slow, so it is written in another language).
In short, a compiler that can compile itself is a property worth noting, and I knew that "type" of compiler as a metacompiler. But reading some pages it seems that somebody has a completely different meaning for metacompiling [3]. Do you know any source with a good definition? In my opinion, we shouldn't state that metacompiler is a synonym of compiler-compiler until we find more info about it. --surueña 12:34, 2005 Jun 18 (UTC)
hokay, are we talking computer science here or software engineering? "Some specialized compilers" are the exception to the rule, AFAIK. That's software engineering: a specific implementation of a compiler/set of BNF instructions. If we are talking computer science, then this is a non-issue: all compilers can compile themselves.
when there is no compiler for a given language, that's what a compiler-compiler/metacompiler is. It's a compiler that will give you your first compiler. It's like a candy-dispensing machine, only you insert BNF for coins.
the page you mention still talks about metacompiling: using Forth to build a meta-compiler, basically using Forth to generate applications in meta mode. Good source with good definition? Knuth! and Hofstadter! And yes, a metacompiler is a synonym of a compiler-compiler ever since 1969, when yacc came around.
what's your major, anyway? Project2501a 12:58, 18 Jun 2005 (UTC)
I don't understand your point. Actually I don't know whether the percentage of compilers that can build themselves is high or low, but it is a fact that there are compilers that cannot compile themselves. I don't understand the distinction between CS and SE, but the fact is that not every compiler can compile itself. Anyway, if you are sure that "metacompiler" is a synonym of "compiler-compiler" we should put it in the article, with those references. Hope this helps --surueña 08:28, 2005 Jun 20 (UTC)
it's not a matter of percentages, it's a matter of compiler theory. Compiler theory is part of computer science. If there are some compilers out there in the field that do not compile themselves due to management/engineering decisions, that's a matter of applied computer science, aka software engineering. Matter of fact is, the set of BNF instructions that makes up any given computer language can always compile itself. Try it with pen and paper. Computer science theory is about the data, as the computer science article says, not about the computer. Something like organising information theory into patterns. Project2501a 12:18, 20 Jun 2005 (UTC)
OK, I understand your point now, thanks for the explanation. But although in theory every compiler can compile itself, in practice this isn't true, and no Computer Science book can change that... :-) I'll try to find the name given to that type of compiler... --surueña 13:21, 2005 Jun 20 (UTC)
well, yeah, see, an article on compilers of compilers/metacompilers/however you want to call it falls under computer science, not software engineering, so in this case the practice is irrelevant, a note in the article at best. Project2501a

Actually, hmmm...

Request to move to metacompiler

{{move|metacompiler}}


Removed the request to move because:
  • The consensus seems to be that a metacompiler and a compiler-compiler might not be the same.
  • Google reveals that "compiler-compiler" is the more common term: "Compiler Compiler" gets 114,000 hits, metacompiler 763.
  • I personally have heard compiler-compiler occasionally, but not metacompiler.
Zanaq 5 July 2005 21:19 (UTC)


Consensus has nothing to do with this, it's a matter of definition; look it up in your compiler theory book. So, metacompiler shows up fewer times than compiler-compiler. I will admit that the request to move to metacompiler was a bad idea; compiler-compiler is more popular. But still, char *compiler_compiler; char *metacompiler; compiler_compiler = metacompiler; Project2501a 5 July 2005 22:16 (UTC)


The history section looks a bit confusing; is there any example of a real compiler-compiler?

I am posting this for use here. I developed SLIC. It was based on work done by Val Schorre.

TreeMeta and META II are compiler-compilers / translator systems.

CWIC (Compiler for Writing and Implementing Compilers) is a compiler compiler developed by Val Schorre and Erwin Book of the System Development Corporation.

CWIC produced binary IBM 360 executable code. It was a further development of the work of Val Schorre, the developer of TreeMeta.

The syntax language was similar to that of TreeMeta, producing a tree structure, but very different in that generator procedures were called to process the trees. The generator language was far more powerful, having several unique features.

A simple arithmetic expression example in the syntax language.

EXPR = TERM $(('+':ADD / '-':SUB) TERM !2);

TERM = FACTOR $(('*':MPY / '/':DIV) FACTOR !2);

FACTOR = '(' EXPR ')' / NUMBER / VAR;

The syntax language is composed of rules that specify the valid syntax of the input. CWIC maintains two stacks, one containing nodes and the other the parse tree. The ":" operator puts a node on the node stack, and the "!" operator combines the top node stack entry and the top n entries of the parse stack into a list.

The EXPR rule defines an expression to be a TERM followed by a series of zero or more TERMs separated by + or - operators. The "$" operator specifies that zero or more of the following may be present.

The expression "a + 2*(b+c) + D" would be turned into a tree representation as ADD[a,ADD[MPY[2.ADD[b,c]],D]] or in list form: [ADD,a,[ADD,[MPY,2.[ADD,b,c]],D]]

CWIC provides three types of rules: syntax, token, and character class.

A digit character class rule for example:

DIGIT: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

A simple integer number token rule:

NUMBER .. DIGIT $DIGIT MAKENUMBER[];

MAKENUMBER[] is a call to a library generator function that takes the token parsed into the token buffer, converts it to a numeric value, and puts it on the parse stack. In the absence of a generator function such as MAKENUMBER, the token buffer would simply be entered into the symbol table and placed on the parse stack.
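In Python terms, the NUMBER token rule behaves roughly like the following (a hypothetical sketch; the scanning functions and the failure convention are my own):

 parse_stack = []
 symbol_table = {}

 def number(src, pos):                        # NUMBER .. DIGIT $DIGIT MAKENUMBER[];
     start = pos
     while pos < len(src) and src[pos].isdigit():
         pos += 1
     if pos == start:
         return pos, False                    # no digit: the rule fails
     parse_stack.append(int(src[start:pos]))  # MAKENUMBER[]: numeric value onto the parse stack
     return pos, True

 def var(src, pos):                           # default action: intern the token as a symbol
     start = pos
     while pos < len(src) and src[pos].isalnum():
         pos += 1
     if pos == start:
         return pos, False
     tok = src[start:pos]
     parse_stack.append(symbol_table.setdefault(tok, {"name": tok}))
     return pos, True

 print(number("42+x", 0), parse_stack)        # (2, True) [42]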

The GENERATOR language was an algorithmic procedural language that emitted 360 machine code.

The CWIC generator language provided pattern matched arguments. For example:

gen(ADD[gen(x),gen(y)]) => {...}

   (SUB[gen(x),gen(y)]) => {...}
    ...
   (integer(x)) => return x;
   (variable(x)) => return x;

A generator consisted of its name (gen above) and pattern/production pairs.

  (<pattern>) => production

The pattern could simply be a variable list or, as above, a specific pattern match. The pattern validated the input and could break it apart. The unique feature of a function call in a pattern is that the function is called with that argument and its parameters are the returned values.
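Loosely, in Python (my own analogue, not CWIC syntax), a generator's pattern/production pairs behave like dispatch on the shape of the tree, with recursion on the operands standing in for calls inside a pattern:

 # Rough Python analogue of a generator; the printed text is a placeholder for
 # the emitted 360 code.
 def gen(tree):
     if isinstance(tree, list) and tree[0] == 'ADD':   # (ADD[gen(x),gen(y)]) => {...}
         x, y = gen(tree[1]), gen(tree[2])
         print(f"ADD  {x},{y}")
         return x
     if isinstance(tree, list) and tree[0] == 'SUB':   # (SUB[gen(x),gen(y)]) => {...}
         x, y = gen(tree[1]), gen(tree[2])
         print(f"SUB  {x},{y}")
         return x
     return tree                                       # integer(x) / variable(x) => return x

 gen(['ADD', 'a', ['SUB', 'b', 'c']])                  # prints SUB b,c then ADD a,b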


In 1969 I implemented a metacompiler system called SLIC. SLIC was developed at Cerritos College in Norwalk, California, on a DEC System-10.

It consisted of several special-purpose languages. The syntax language, much like those of CWIC and TreeMeta, was a rule-based language producing tree or list structures. The generator language emitted pseudo-code instructions to named section lists. A section was simply a data structure that defined the section parameters and held a list of the emitted pseudo instructions. A call to flush a section would sequentially process the section's pseudo-code list, calling the pseudo-code procedures. The execution of the pseudo-code procedures output relocatable machine code.
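The section/flush idea can be sketched in Python (purely illustrative; the names emit, flush, and the print standing in for planting machine code are mine):

 # A section collects pseudo instructions; flushing it runs them in order.
 sections = {"code": [], "data": []}

 def emit(section, pseudo_op, *args):
     sections[section].append((pseudo_op, args))   # queue the pseudo instruction

 def flush(section):
     for pseudo_op, args in sections[section]:     # sequential processing on flush
         pseudo_op(*args)
     sections[section].clear()

 def p_call(target):                               # an example pseudo-op procedure
     print("CALL", target)                         # stands in for emitting machine code

 emit("code", p_call, "print_line")
 flush("code")                                     # -> CALL print_line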

A MACHOP defined a target machine instruction or instruction class. A MACHOP could define a single instruction using its mnemonic name, or a set of instructions having the same binary format using a vectored entry. In a vectored entry the MACHOP name was instead a variable name preceded by a "#" character, and that variable was set to the opcode value. A list of opcode mnemonic/value pairs followed the instruction definition.

Machine code was defined in a bit addressable memory space.

An example of a vectored entry MACHOP:

.MACHOP #OP;

.MORG 8: H(16): $/8;
 H(8): OP;
 #AAA     0b00110111;
 #AAS     0b00111111;
 #CBW     0b10011000;
 #CLC     0b11111000;
 #CLD     0b11111100;
 #CLI     0b11111010;
 #CMC     0b11110101;
 #CWD     0b10011001;
 #DAA     0b00100111;
 #DAS     0b00101111;
 #HLT     0b11110100;
 #INT3    0b11001100;
 #INTO    0b11001110;
 #IRET    0b11001111;
 #LAHF    0b10011111;
 #LEAVE   0b11001001;
 #NOP     0b10010000;
 #POPA    0b01100001;
 #POPF    0b10011101;
 #PUSHA   0b01100000;
 #PUSHF   0b10011100;
 #SAHF    0b10011110;
 #STC     0b11111001;
 #STD     0b11111101;
 #STI     0b11111011;
 #XLAT    0b11010111;

The above is a very simple instruction consisting of only an 8-bit opcode. #OP specifies a vectored entry point. OP is a variable that contains the opcode binary value. The line:

 #AAA     0b00110111;
 

is the AAA opcode entry; OP would contain the binary value 0b00110111.

The .MORG operation is a modular org (origin). The first operand, 8, specifies aligning the plant location to an 8-bit boundary.

.MORG 8: H(16): $/8;

H(16) specifies listing output of a 16-bit field, $/8, in hex format. $ is the plant location in the bit-addressable memory space.

H(8): OP;

The above outputs OP to the output code file. H(8) specifies listing output, in hex, of the 8-bit opcode in OP. Assembly listing output is optional.
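A minimal Python sketch of that bit-addressable "plant" (my own approximation of .MORG and the H(n) fields, not SLIC code):

 plant_loc = 0                    # current plant location, in bits
 code_file = []

 def morg(align):                 # .MORG n: round the bit location up to an n-bit boundary
     global plant_loc
     plant_loc = (plant_loc + align - 1) // align * align

 def field(nbits, value):         # H(n): value -- plant an n-bit field, list it in hex
     global plant_loc
     code_file.append((plant_loc, nbits, value))
     print(f"{plant_loc // 8:04X}: {value:0{(nbits + 3) // 4}X}")
     plant_loc += nbits

 morg(8)
 field(8, 0b00110111)             # the AAA opcode from the vectored MACHOP above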


Another MACHOP example of a more complex instruction:

.MORG 8: H(16): $/8;
 if isnumber(source) then {
   if isreg(dest) then {
     +H(4):0b1011;
       (1); if size:(dest) = 8 then 0 else 1;
       (3): dest;}
   else {
     if !frame:(dest) and segment:(dest) != curSegment then
       +H(8): overide(dest)
       +H(7):0b1100011;
         (1): if size:(dest) = 8 then 0 else 1;
       +H(2): mod(dest);
         (3): 0;
         (3): R_M(dest);
       +H(szoff): offset:(dest/8);}
   +H(size:(dest)):  source}


 else if isreg(source) && isreg(dest) then {
   +H(7):0b1000100;
     (1): if size:(dest) = 8 then 0 else 1;
   +H(2):0b11;
     (3):source;
     (3):dest;}
 else if isreg(source) then {
   +H(7):0b1000100;
     (1): if size:(dest) = 8 then 0 else 1;
   +H(2): mod(dest);
     (3):source;
     (3):R_M(dest);}
 else if isreg(dest) then {
   +H(7):0b1000101;
     (1): if size:(source) = 8 then 0 else 1;
   +H(2): mod(source);
     (3):source;
     (3):R_M(source);}
 else if issegreg(dest) then {
   +H(8):0b10001110;
     (2): mod(source);
     (3): issegreg(dest);
     (3): R_M(source);
   if !isreg(source);
     +H(szoff): offset:(source/8);}
 else {
   +H(8):0b10001100;
     (2): mod(dest);
     (3): issegreg(source);
     (3): R_M(dest);
   if !isreg(source);
     +H(szoff): offset:(dest/8);}

Many optimizations can be done in the generator language. To provide optimization on the sequential instructions, an ISO language was planned but so far not implemented. The ISO language was planned to operate on the pseudo-instruction lists.

The main thing SLIC did was to separate the target machine code generation from the semantics processing. The pseudo instructions were designed with the language being compiled in mind. A pseudo instruction was much like an assembly macro, e.g. the pseudo code for a function call:

Output a procedure call instruction to the object file

PSEUDO call proc {

if a 'far proc', use a far call
if a 'proc', use a near call
if a memory reference, use the memory size to deduce whether a far or near call is needed
(Some of this could be done in the call and farcall MACHOPs)
  if TypeAttr:(proc) == farproc
    if segment:(proc) == CurrentSegment {
       ^pushreg cs                      ; Simulate far call
       ^call    proc}
    else
       ^farcall proc;
  else if TypeAttr:(proc) == proc
       ^call    proc;
  else if SizeAttr(proc) == 2
       ^call    proc;
  else
       ^farcall proc}

The basic concept was to make it a simple matter to produce code for any target machine architecture: retargeting only required redefining the PSEUDO code in terms of the target's instructions.
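A rough Python illustration of that retargeting idea (the emitters and opcodes shown are only illustrative, not SLIC output):

 # The same "call" pseudo op is bound to a different emitter per target.
 def x86_call(proc):
     print(f"E8 rel32      ; call {proc}")

 def ti990_call(proc):
     print(f"BLWP @{proc}")

 targets = {"x86": {"call": x86_call}, "ti990": {"call": ti990_call}}

 def pseudo_call(target, proc):
     targets[target]["call"](proc)     # resolve the pseudo op for the chosen target

 pseudo_call("ti990", "PRINT")         # retarget by swapping the table entry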

In 1972 SLIC was used to produce a COBOL compiler running on a DEC System-10, generating code for a TI990 minicomputer.

Development time for the COBOL compiler was six months for myself and one other programmer. — Preceding unsigned comment added by Steamerandy (talk · contribs) 10:49, 13 November 2012 (UTC)

CWIC

I added a section about CWIC to the "metacompiler" article. (We might want to consider merging "Metacompiler" and "Compiler-compiler" -- does anybody know how to propose that?)

The problem I'm seeing is that CWIC is probably too small in the great scheme of things to deserve its own article, but adding it as a section to "metacompiler" means it takes up too much of the article, compared with other material. I'm not sure what the "right" thing to do is.

I came into the CWIC project (we called it "The CWIC Mafia" because Erwin Book always wore a suit and looked a little bit like a mafioso) shortly after the initial development, and I basically became the maintenance engineer while Erwin, Val, and Steve worked on direct and indirect applications of the language. The Generators Language compiler was a masterpiece of passing the buck, aka "recursion".

One thing that has bothered me is that the work done on code generation in CWIC seems to have been lost to posterity. YACC, Bison, and friends concentrate on the easy problem: parsing, but offer no help with code generation. CWIC had many features that aided code generation -- the pattern matching, the <...> operator that emitted code, and operators that automatically handled repetitive sections of the tree.

If you used a pattern like (...$x...)=>, then "$x" was a list of things, and in the right-hand side of the production you could write something like:

  .do (... x ...)

This meant to repeatedly execute the code in parens, assigning successive elements of the list $x to "x" in each iteration. Or you could write

  $(...x...), which would generate a new list, with whatever transformation you specified on each element of $x.  Similarly,
  $&(...x...) was true only if the parenthesized expression was true for every element of $x
  $|(...x...) was true if the expression was true for any element
  $+(...x...) was the sum, $*(...x...) the product.
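In modern terms these map onto familiar list operations; a hedged Python paraphrase (ordinary Python constructs standing in for the CWIC operators):

 xs = [1, 2, 3, 4]

 for x in xs:                        # .do (... x ...): run the body for each element
     print(x)

 mapped   = [x * 2 for x in xs]      # $(...x...): build a new, transformed list
 all_true = all(x > 0 for x in xs)   # $&(...x...): true for every element
 any_true = any(x > 3 for x in xs)   # $|(...x...): true for some element
 total    = sum(xs)                  # $+(...x...): the sum
 product  = 1
 for x in xs:
     product *= x                    # $*(...x...): the product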

So CWIC was a significant advance on parser-only metalanguages. What it lacked (and AFAIK no metacompiler has tackled since) was assistance in maintaining the symbol dictionary. Of course, every "atom" that got scanned turned into a symbol that you could assign Lisp-style attributes to, but by now we should have meta-compilers that assist the compiler writer in at least the following two problems:

  1. The syntax says there's supposed to be an ID here. There should be operators to check that (a) the ID has been previously defined (or not defined), and (b) the ID represents a specific type of name (e.g., a variable name, a function name, etc.)
  2. Block-structured languages: you should be able to say "start a new block", which allows re-defining already defined variable names (and even function names).
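For what it's worth, here is a small Python sketch of what such assistance might look like (purely hypothetical, not a feature of any existing metacompiler):

 class SymbolDict:
     def __init__(self):
         self.scopes = [{}]                 # innermost block is last

     def enter_block(self):                 # "start a new block"
         self.scopes.append({})

     def leave_block(self):
         self.scopes.pop()

     def define(self, name, kind):
         self.scopes[-1][name] = kind       # may shadow an outer definition

     def check(self, name, kind):           # is the ID defined, and of the right kind?
         for scope in reversed(self.scopes):
             if name in scope:
                 return scope[name] == kind
         return False

 syms = SymbolDict()
 syms.define("x", "variable")
 syms.enter_block()
 syms.define("x", "function")               # legal redefinition inside the block
 print(syms.check("x", "function"))         # True
 syms.leave_block()
 print(syms.check("x", "variable"))         # True again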

Bgoldnyxnet (talk) 18:27, 5 June 2013 (UTC)

This article should probably be moved to Parser generator, as that seems to be the main focus. Compiler-compiler can then be made into a disambiguation page to both Parser generator and Metacompiler.
You might also want to have a look at attribute grammars, they add a semantic layer to parser generators to e.g. maintain symbol tables. —Ruud 20:02, 5 June 2013 (UTC)