Literate programming

From Wikipedia, the free encyclopedia
Revision as of 18:59, 26 December 2008

LITERATE PROGRAMMING is an approach to programming (a "paradigm") introduced by Donald Knuth, one of the programming legends of the earlier generation.

When presenting L.P. in a 1983 paper (see "Literate Programming" by Donald E. Knuth, in The Computer Journal), he positioned it directly as an alternative to the structured programming of the 1970s (which in turn succeeded the programming practices of earlier computers, before the advent of higher-level programming languages), i.e. as a programming paradigm, and sometimes called it a new "language" written on top of Pascal, or C, or any other language of machine instructions. When asked about his preferred programming language, Knuth replies "WEB" or "CWEB", the names of his L.P. systems.


Overview

The major idea of L.P. is to stop coding in the manner and order imposed by the computer, and instead to provide tools that let the human programmer develop a program in the order demanded by the logic and flow of his thought: imagine a human explaining the program's concepts to you, rather than you reading and deciphering machine source code line by line.


In contrast to the traditional practice an L.P. program is not machine code, sequenced to best suit the machine, with some side comments to help humans to decipher it post-factum.

Rather, in L.P. the program is written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay, into which macros hiding abstractions and/or direct code are included. Full code with macros expanded can then be extracted from this text in one step by a literate programming tool and "tangled" (i.e. convoluted, made non-natural for a human) into what the computer demands for further compilation and/or running. Fully formatted documentation is also produced in one step ("woven") from the same L.P. source file.

While the first-generation literate programming tools were computer-language-specific, the later ones are language-agnostic and exist "above" the programming languages.


Technically this is achieved by means of:

  • a tool which provides a macro preprocessor for the program, which is itself written "in an L.P. language". An L.P. language file is an explanation of the program logic in a natural tongue, e.g. English, interspersed with snippets of macros and/or machine-language code;
  • the preprocessor tool expands arbitrary hierarchies, or rather "interconnected webs", of macros, producing the machine source code with one command ("tangle") and documentation (typically fully formatted) with another ("weave");
  • macros in an L.P. source file are simply title-like or explanatory phrases in a human language, hiding chunks of code and/or lower-level macros and describing the human abstractions created while solving the problem. The best way to understand them is to think of an "algorithm in pseudocode", as typically used in CS teaching. These arbitrary explanatory phrases become precise new "operators of the language" in L.P., created on the fly by the programmer;
  • the preprocessor also provides the ability to add to already created macros in any place in the text of the L.P. source file, dispensing with the need to keep in one's mind the restrictions imposed by machine languages and to interrupt the flow of thought.


As a result, L.P. programming (according to its originator) provides:

  • Higher-quality programs: forced to state his own thinking explicitly, the programmer checks himself from kludging, which can no longer be hidden in the code as easily.
  • A first-rate documentation system, which is not an add-on but grows naturally in the process of expounding one's thoughts during a program's creation. The author can restart his own thinking process at any later time, and another programmer can understand the construction much more easily. This differs from traditional documentation, in which a human is presented with a convoluted set of machine instructions and deciphers their meaning from some side notes that follow the machine-imposed order of those instructions.

L.P. also facilitates thinking in general, giving a higher "bird's eye view" of the code and increasing the number of items, concepts, etc. that the mind can keep in view simultaneously and then successfully transform and handle.


Literate Programming is a much-misunderstood paradigm: people sometimes pick out bits like "documentation", "formatted documentation produced from a common file with both machine code and comments", or "voluminous commentaries to code" to stand for the concept, which leads to claims that the simple comment formatters common today (like Perl's POD) are "literate programming tools".

The usually missing or misunderstood parts are the "web of abstract concepts" hiding behind the system of natural-language macros, and the change of order from the machine-imposed one to one convenient to the operation of the human mind.


Example

From Word Count program in noweb

Chapter 12 of the "Literate Programming" book, a collection of essays by Knuth, presents the word-count (wc) program from UNIX, rewritten in CWEB to demonstrate literate programming in C.

The same example was later redone for the "noweb" L.P. tool, which is language-agnostic and very simple (literally, just 2 text markup conventions and 2 tool invocations are needed to use it) and allows for text formatting in HTML rather than going through Knuth's original TeX system.

Let's see the major concepts of L.P. illustrated in snippets from this example.


1. How macros are created

The purpose of wc is to count lines, words, and/or characters in a list of files. The 
number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw: 
    <<*>>=
    <<Header files to include>>
    <<Definitions>>
    <<Global variables>>
    <<Functions>>
    <<The main program>>
    @
    
We must include the standard I/O definitions, since we want to send formatted output 
to stdout and stderr. 
    <<Header files to include>>=
    #include <stdio.h>
    @

The first snippet shows how arbitrary descriptive phrases in a human language (English) are used in an L.P. to create macros, i.e. precise new "operators" of the L.P. language, which hide chunks of code and/or other macros. The mark-up notation is very simple: double angle brackets. A code section in an L.P. file written in "noweb" is ended with the "@" symbol. The "<<*>>" symbol stands for the "root", the topmost node from which the L.P. tool will start expanding the web of macros. Actually, writing out the expanded source code can be done from any section or subsection (i.e. a piece of code designated as "<<name of the chunk>>=", with the equal sign), so one L.P. file can contain several files with machine source code.

Note also that the unravelling of the chunks can be done in any place in the L.P. text file, not necessarily in the order in which they are sequenced in the enclosing chunk, but as demanded by the logic reflected in the explanatory text, which envelops the whole program.


2. Program as a Web: Macros are not just section names

Secondly, these macros are not simply the same as "section names" in standard documentation. L.P. macros can hide any chunk of code and/or other macros behind themselves, and can be used INSIDE low-level machine-language operators, e.g. inside "if", "while" or "case" statements. In fact, they can stand for any arbitrary chunk of code or other macros, and are more general than top-down or bottom-up "chunking", or than subsectioning. D. Knuth says that when he realized this, he began to think of a program as a web of various parts, from which the name of the original L.P. system ("WEB") obviously comes. (Illustrated in the next code snippet; see "Fill buffer..." inside the C "while" loop.)

3. Order of human logic, not that of machine codes

In L.P. the chunks behind macros, once introduced with "<<...>>=", can be grown later in any place in the file, simply by writing "<<name of the chunk>>+=" and adding more content to them. For example, in the "C" language all includes or declarations must be made at the beginning of a file, forcing the human to keep this housekeeping information in his mind and interrupting his thinking. This is not a small irritation but a limiting factor on thought, as memory capacity is limited, and a source of inferior code and errors. In L.P. the programmer avoids all this casually, and generally speaking he programs in the order determined by the logic of his thinking rather than that demanded by the machine code (note the macros with +=):

The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization
explicitly; however, C globals are automatically zeroed. (Or rather, ``statically
zeroed.'' (Get it?))

    <<Global variables>>+=
    long tot_word_count, tot_line_count,
         tot_char_count;
      /* total number of words, lines, chars */
    @

The present chunk, which does the counting that is wc's raison d'etre, was actually one of 
the simplest to write. We look at each character and change state if it begins or ends 
a word. 

    <<Scan file>>=
    while (1) {
      <<Fill buffer if it is empty; break at end of file>>
      c = *ptr++;
      if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
          word_count++;
          in_word = 1;
        }
        continue;
      }
      if (c == '\n') line_count++;
      else if (c != ' ' && c != '\t') continue;
      in_word = 0;
        /* c is newline, space, or tab */
    }
    @


4. And finally, this record of the train of thought creates superior documentation at the same time as it creates the program. No more side comments to machine-code instructions, but an explanation of concepts on each level, with subconcepts deferred to their appropriate place, which allows for better communication of thought.

Such exposition of ideas creates a flow of thought that is like a literary work. Knuth famously wrote a "novel" that explains the code of a computer strategy game and is perfectly readable. The applicability of the concept to programming on a large scale, that of commercial-grade programs, is proven by the publication of the TeX code as a literate program (volume B of the 5-volume Computers and Typesetting).


Quotes: author about the concept

History

Current implementations

The first published literate programming environment was WEB, introduced by Donald Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation.

The complete commented TeX source code was published in Knuth's TeX: The Program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979; he was inspired by the ideas[1] of Pierre Arnoul de Marneffe. The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems and can produce TeX and PDF documentation. Other implementations of the concept are noweb and FunnelWeb, both of which are independent of the programming language of the source code.

Noweb is language-agnostic and is well known for its simplicity: literally, just 2 text markup conventions and 2 tool invocations are needed to use it, and it allows for text formatting in HTML rather than going through Knuth's original TeX system.

FunnelWeb is another tool without a dependency on TeX, and it can produce HTML documentation output. It has more complicated markup (with "@" escaping any FunnelWeb command) but offers many more flexible options. This tool is also completely language-agnostic.

The Leo text editor is an outlining editor which supports optional noweb and CWEB markup.

The author of Leo (written in Python) actually mixes two different approaches: first, it is an outlining editor, a feature which helps enormously with the management of large texts, relieving the creator's attention and increasing the "horizon" of information he can keep in his mind.

But secondly, Leo incorporates some of the ideas of L.P. Using it for L.P. in its pure form (i.e. the way it is practiced with Knuth's WEB tool and/or tools like "noweb") is possible only with some degree of inventiveness, employing the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy, in some ways similar to L.P., or even extending L.P. with outlines, thus increasing the "bird's eye view" of code and concepts.[2]

The Haskell programming language has native support for literate programming, inspired by CWEB but with a simpler implementation. When aiming for TeX output, one writes a plain LaTeX file in which source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify the Haskell statements to compile, discarding the TeX documentation as if it were comments. Haskell's functional, modular nature[3] makes literate programming directly in the language sensible, and a separate code-generation pass unnecessary (compare WEB's TANGLE pass, which generates imperative Pascal code). This is made possible by Haskell's declarative, purely functional, lazy semantics: arbitrary sections of code can be factored into separate functions and documented as separate conceptual units, without changing the semantics of the compiled program.

Perl (as one example of the mentioned misconception) does NOT support literate programming. Its Plain Old Documentation (POD) format merely allows human-readable documentation inserts with basic formatting in the same file as the source code; no creation of "new operators" out of arbitrary phrases, nor any change of order from the language-prescribed one to that of logical thinking, is possible. This embedded Perl documentation is also commonly parsed from the code into other formats, including HTML or LaTeX, which is an absolutely standard feature of modern scripting languages, none of which has anything to do with literate programming per se. Literate programming in these languages remains possible, however, when external language-agnostic tools such as "noweb", the most popular and simplest in use, are employed.

See also

References

  1. ^ Pierre Arnoul de Marneffe, Holon Programming. Univ. de Liege, Service d'Informatique (December, 1973).
  2. ^ Leo homepage cites support for noweb and CWEB
  3. ^ Why Functional Programming Matters

Further reading

  • Donald E. Knuth, Literate Programming, Stanford, California: Center for the Study of Language and Information, 1992, CSLI Lecture Notes, no. 27.
  • Eitan M. Gurari, TeX & LaTeX: Drawing and Literate Programming. McGraw Hill 1994. ISBN 0-07-911616-7 (includes software).
  • Kurt Nørmark, Literate Programming - Issues and Problems

External links