Literate programming: Difference between revisions

From Wikipedia, the free encyclopedia
m Reverted edits by 62.140.253.8 to last version by Tilla2501 (HG)
reverted THIRD vandalization by a robot; no doubt it will ban me forever now
{{POV|date=December 2008}}
{{Unreferencedsection|date=December 2008}}
'''''LITERATE PROGRAMMING''''' is an approach to programming (a "paradigm")
which was introduced by one of the programming legends of the earlier generation,
Donald Knuth.

'''Literate programming''' in a nutshell is a programming paradigm which uses a "substitution macro processor" that allows arbitrary phrases in any human language to stand for meaningful chunks of code, which can build upon each other, thus forming hierarchies of arbitrary depth.{{Fact|date=December 2008}}
This is done in aid of thinking and to disentangle the programmer from the demands imposed on thinking by a lower-level computer language.{{Fact|date=December 2008}}


When presenting L.P. in a 1983 paper (see "Literate Programming" by Donald E. Knuth, in Computer Journal), he directly positioned it as an alternative to the "Structured Programming" of the 1970s (which in turn was a successor to programming practices on earlier computers before
the advent of higher-level programming languages), i.e. as a programming paradigm,
and sometimes called it a new "language" - written on top of Pascal, or C, or any other
language of machine instructions. When asked about his preferred programming language, Knuth
replies "WEB" or "CWEB", which are the names of his L.P. systems.

L.P. is very often wrongly thought of as based solely on the premise that a [[computer program]] should be written similarly to [[literature]], with human readability as a primary goal.{{Fact|date=December 2008}} According to this misconception, programmers should aim just for a “literate” style in their programming as writers aim for an intelligible and articulate style in their writing.{{Fact|date=December 2008}} This is claimed to contrast with the mainstream view that the programmer’s primary or sole objective is to create source code and that documentation should only be a secondary objective.{{Fact|date=December 2008}}


The misconception appeared as a result of attempts by the author of the concept, Donald Knuth, to explain its major real premise, "the continuity of thought".{{Fact|date=December 2008}} Moreover, readability by others is not the ONLY main point of the technique. Literate Programming is not a documentation tool or convention, although logically laid out explanations serve as great documentation.{{Fact|date=December 2008}} First and foremost it is a PROGRAMMING PARADIGM in aid of the creator of the program.{{Fact|date=December 2008}}


This means that a program with few comments can be a fully-fledged L.P. program, and that plenty of included documentation is not by itself a sufficient condition for the file to be considered an L.P. program.


== Overview ==
There are two main ideas Literate Programming is based on:


''The major idea'' of L.P. is to stop coding in the manner and order imposed by the
computer, and provide tools to enable the human programmer to develop
a program in the order demanded by the logic and flow of his thought: imagine
a human trying to explain the program concepts to you, rather than reading and
deciphering machine source code line by line.

1. Humans should be able to create arbitrarily complex hierarchies of abstractions while constructing a program. Those abstractions are made out of phrases of ordinary human language, similar to the imprecise usage of such phrases often employed as examples of "pseudocode" in CS teaching. In Literate Programming these phrases become precise "new operators" during program creation.{{Fact|date=December 2008}}
These new operators can both hide the lower-level implementation in actual programming code (e.g. in languages like "C", or "Perl" - the technique is not language-specific), and/or be interspersed with the lower programming language constructs, hiding behind themselves meaningful chunks of code. [http://www.cs.tufts.edu/~nr/noweb/examples/wc.html Example: "word count" program as a literate program]


2. The second leg on which the concept of Literate Programming rests is that humans should not discontinue or disrupt their thinking because machine coding order demands it.{{Fact|date=December 2008}}
For example, adding some code to a program in C may demand that additions to header files and/or new inclusions be made in one or several other places in the program (an actual example used in an explanatory article by Knuth).
The literate programming tools ensure that the human does not have to keep such "housekeeping information" in his mind, allowing all additions to be made in the place where the logical development is conducted. [http://www.cs.tufts.edu/~nr/noweb/examples/wc.html Example: "word count" program as a literate program]


''In contrast to the traditional practice'' an L.P. program is ''not'' machine code,
sequenced to best suit the machine, with some side comments to help humans to decipher
it post-factum.

As a result of these two ideas, programs can be created with continuity of thought and in the logical order of thinking, out of arbitrarily complex systems of abstractions.
Attempts to explain this concept by Knuth led to his metaphorical likening of the work of a programmer to the creation of a fine essay by an accomplished writer.{{Fact|date=December 2008}}


''Rather'' in L.P. the program is written as an uninterrupted exposition of logic in
an ordinary human language, like the text of an essay, in which macros which hide
abstractions and/or direct code are included. Full code with macros
expanded can then in one step be extracted from this text by a literate programming
tool and "entangled" (i.e. convoluted, made non-natural for a human) into what
the computer demands for further compilation and/or running.
Fully formatted documentation is also produced in one step ("woven") from the same
L.P. source file.

Further misinterpretation by the public led to a widespread belief that Literate Programming is nothing more than voluminous commentaries interspersed with code written in the same file (from which the actual L.P. tools will create pure source code and fully formatted documentation), and to claims that any system that inserts commentaries into code is in some way "literate", especially if it allows for some print formatting.{{Fact|date=December 2008}}


While the first generation Literate Programming tools were computer language-specific,
the later ones are language-agnostic and exist "above" the programming languages.

In practice, L.P. is achieved by combining human-readable [[software documentation|documentation]] and machine-readable [[source code]] into a single [[source file]], which is best explained by [http://www.cs.tufts.edu/~nr/noweb/examples/wc.html example: "word count" program as a literate program]. The order and structure of this source file are specifically designed to aid human comprehension: code and documentation together are organized in logical and/or hierarchical order (typically according to a scheme that accommodates detailed explanations and commentary as necessary to implement the two major premises as described above). At the same time, the structure and format of the source files accommodate external utilities that generate program documentation and/or extract the machine-readable code from the same source file(s) (''e. g.'', for subsequent processing by compilers or interpreters).



== History and current implementations ==
''Technically this is achieved by'' means of:
The first published literate programming environment was [[WEB]], introduced by [[Donald Knuth]] in 1981 for his [[TeX]] typesetting system; it uses [[Pascal programming language|Pascal]] as its underlying programming language and <nowiki>TeX</nowiki> for typesetting of the documentation.

* a tool which provides a macro preprocessor for the program, which itself is written "in an L.P. language". An L.P. language file is an explanation of the program logic in a natural tongue, e.g. English, interspersed with snippets of macros and/or machine language code.
* the preprocessor tool expands arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the machine source code with one command ("tangle") and documentation (which is typically fully formatted) with another ("weave").
* Macros in an L.P. source file are simply title-like or explanatory phrases in a human language, hiding chunks of code and/or lower-level macros, describing human abstractions created while solving the problem. The best way to understand these is to think of an ''"algorithm in pseudocode", typically used in CS teaching''. These arbitrary explanatory token phrases become precise new "operators of the language" in L.P., which are created on the fly by the programmer.
* the preprocessor also provides the ability to add to already created macros at any place in the text of the L.P. source file, thereby dispensing with the need to keep in one's mind the restrictions imposed by machine languages and to interrupt the flow of thought.
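
The "tangle" step listed above can be illustrated with a minimal sketch in Python. It is not the code of WEB, CWEB or noweb; it only assumes a simplified noweb-like markup in which a line "<<name>>=" opens a chunk, a lone "@" closes it, and a line consisting of "<<name>>" refers to another chunk. The names <code>parse_chunks</code>, <code>tangle</code> and the sample text <code>DOC</code> are made up for this illustration.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # "<<name>>=" opens a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # "<<name>>" alone on a line is a reference


def parse_chunks(source):
    """Collect code chunks by name; the prose between chunks is ignored."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":  # a lone "@" ends the code chunk
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name="*", indent=""):
    """Expand chunk `name` recursively into plain source lines."""
    output = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            # A reference is replaced by the referenced chunk, keeping indentation.
            output.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            output.append(indent + line)
    return output


# Hypothetical literate source: English prose interspersed with named chunks.
DOC = """\
The program greets the user.
<<*>>=
<<Header files to include>>
int main(void) {
    <<Print the greeting>>
    return 0;
}
@
We need standard I/O for printf.
<<Header files to include>>=
#include <stdio.h>
@
<<Print the greeting>>=
printf("hello\\n");
@
"""

print("\n".join(tangle(parse_chunks(DOC))))
```

The explanatory phrases act as macro names: the root chunk reads like pseudocode, and the expansion turns it into the order the C compiler expects. A real tool additionally handles errors such as undefined or cyclic references, which this sketch leaves out.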

''As a result'' L.P. programming (according to its originator) provides for

* Higher-quality programs: when forced to state his own thinking explicitly, the programmer restrains himself from kludging, which can no longer be hidden in the code as easily.
* A first-rate documentation system, which is not an add-on but grows naturally in the process of expounding one's thoughts during the creation of a program. The author can restart his own thinking process at any later time, and another programmer can understand the construction of the program much more easily. This differs from traditional documentation, where a human is presented with a convoluted set of machine instructions and has to decipher their meaning from side notes which follow the machine-imposed order of those instructions.
L.P. also facilitates thinking in general, giving a higher "bird's eye view" of the code and increasing the number of items, concepts etc. the mind can hold simultaneously and then successfully transform and handle.


Literate Programming is a ''much misunderstood paradigm'', with people sometimes picking
bits like "documentation" or "formatted documentation produced from a common file with
both machine code and comments" or "voluminous commentaries to code" to stand for the
concept, which leads to claims that simple comment formatters which are common today
(like Perl POD) are "literate programming tools".

''The usually missing or misunderstood part is'' that of a "web of abstract concepts" hiding
behind the system of natural-language macros and the "change of order from machine-imposed
to that convenient to the human mind operation".



== Example ==

From [http://www.cs.tufts.edu/~nr/noweb/examples/wc.html Word Count program in noweb]

Chapter 12 of the book "Literate Programming", a collection of essays by Knuth, presents the word count (wc) program from UNIX, rewritten in CWEB to demonstrate "literate programming" in C.

The same example was redone later for the "noweb" L.P. tool, which is language-agnostic and very simple (just 2 text markup conventions and 2 tool invocations are needed to use it), and allows for text formatting in HTML rather than going through Knuth's original TeX system.

Let's see the major concepts of L.P. illustrated in snippets from this example.


''1. How macros are created''

<blockquote><source lang="c">
The purpose of wc is to count lines, words, and/or characters in a list of files. The
number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
<<*>>=
<<Header files to include>>
<<Definitions>>
<<Global variables>>
<<Functions>>
<<The main program>>
@
We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
<<Header files to include>>=
#include <stdio.h>
@
</source></blockquote>

The first snippet shows how arbitrary descriptive phrases in a human language (English) are
used in an L.P. to create macros, i.e. precise new "operators" of the L.P. language, which
hide chunks of code and/or other macros.
The mark-up notation is very simple: double angle brackets. The code section in an L.P. file
written in "noweb" is ended with the "@" symbol.
The "<<*>>" symbol stands for the "root", the topmost node from which the L.P. tool will start expanding
the web of macros.
Actually, writing out the expanded source code can be done from any section or subsection
(i.e. a piece of code designated as "<<name of the chunk>>=", with the equal sign), so one
L.P. file can contain several files with machine source code.

Note also that the unravelling of the chunks can be done in any place in the L.P. text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text, which envelops the whole program.
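
The point that one L.P. file can contain several machine source files can be sketched in Python as follows. This is only an illustration under assumptions: the markup is a simplified noweb-like one, and the function names (<code>read_chunks</code>, <code>expand</code>, <code>tangle_roots</code>), the chunk names and the file names are hypothetical. In noweb itself the root chunk is selected with the <code>-R</code> option of <code>notangle</code>.

```python
import re


def read_chunks(text):
    """Map each chunk name to its code lines (simplified noweb-style syntax)."""
    chunks, name = {}, None
    for line in text.splitlines():
        opened = re.fullmatch(r"<<(.+)>>=", line.strip())
        if opened:
            name = opened.group(1)
            chunks.setdefault(name, [])
        elif line.strip() == "@":
            name = None
        elif name is not None:
            chunks[name].append(line)
    return chunks


def expand(chunks, name):
    """Return the lines of chunk `name` with every reference line expanded."""
    lines = []
    for line in chunks[name]:
        used = re.fullmatch(r"<<(.+)>>", line.strip())
        lines.extend(expand(chunks, used.group(1)) if used else [line])
    return lines


def tangle_roots(text, roots):
    """Tangle one literate source once per root chunk: one output per root."""
    chunks = read_chunks(text)
    return {root: "\n".join(expand(chunks, root)) + "\n" for root in roots}


# Hypothetical literate source holding two output files.
SOURCE = """\
The buffer size is shared, so it lives in a header of its own.
<<config.h>>=
<<Buffer size>>
@
The program proper includes that header.
<<wc.c>>=
#include "config.h"
<<Global variables>>
@
<<Buffer size>>=
#define BUF_SIZE 512
@
<<Global variables>>=
char buffer[BUF_SIZE];
@
"""

files = tangle_roots(SOURCE, ["config.h", "wc.c"])
for filename, content in files.items():
    print("---", filename)
    print(content, end="")
```

Here any named chunk can serve as a root, so the same explanatory text yields both the header and the C file.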


''2. Program as a Web: Macros are not just section names''
Secondly, these macros are not simply the same as "section names" in standard documentation.
L.P. macros can hide any chunk of code behind themselves (and/or other macros), and be
used ''INSIDE'' low-level machine language operators, e.g. "if", "while" or "case" would be typical.
In fact, they can stand for any arbitrary chunk of code or other macros and are more general than top-down or bottom-up "chunking" or than subsectioning. D. Knuth says that when he realized this, he began to think of a program as a ''web'' of various parts, which is where the name of the original L.P. system ("WEB") comes from. ''(Illustrated in the next code snippet, see "Fill buffer.." inside the C "while" loop)''

''3. Order of human logic, not that of machine codes''
In L.P. the chunks behind macros, once introduced with "<<...>>=", can be grown later
in any place in the file by adding more content under the same chunk name (shown in the example as "<<name of the chunk>>+="). For example, in the "C" language includes and declarations must precede the code that uses them, which forces the human to keep this housekeeping information in his mind and interrupts his thinking. This is not a small irritation but a limiting factor on thought, as memory capacity is limited, and a source of inferior code and errors. In L.P. the programmer avoids this and, generally speaking, programs in the order determined by the logic of his thinking rather than that demanded by machine code (note the macros with "+="):
<blockquote><source lang="c"> The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization
explicitly; however, C globals are automatically zeroed. (Or rather, ``statically
zeroed.'' (Get it?))

<<Global variables>>+=
long tot_word_count, tot_line_count,
tot_char_count;
/* total number of words, lines, chars */
@

The present chunk, which does the counting that is wc's raison d'etre, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.

<<Scan file>>=
while (1) {
  <<Fill buffer if it is empty; break at end of file>>
  c = *ptr++;
  if (c > ' ' && c < 0177) {
    /* visible ASCII codes */
    if (!in_word) {
      word_count++;
      in_word = 1;
    }
    continue;
  }
  if (c == '\n') line_count++;
  else if (c != ' ' && c != '\t') continue;
  in_word = 0;
  /* c is newline, space, or tab */
}
@

</source></blockquote>
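
How a chunk "grows" across distant places in the file can be sketched in Python. The sketch assumes, purely for illustration, that a chunk header may be written either as "<<name>>=" or as "<<name>>+=" and that both simply append to whatever is already stored under that name; the function <code>collect</code> and the sample text are hypothetical.

```python
from collections import defaultdict


def collect(text):
    """Gather chunk bodies; a repeated chunk name extends the earlier body."""
    chunks = defaultdict(list)
    name = None  # chunk currently being read, or None while in prose
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<<") and stripped.endswith((">>=", ">>+=")):
            name = stripped[2:stripped.rindex(">>")]
        elif stripped == "@":
            name = None
        elif name is not None:
            chunks[name].append(line)
    return chunks


# Hypothetical literate source: the same chunk is opened twice, far apart.
SOURCE = """\
The exit status is needed from the very start.
<<Global variables>>=
int status = 0;
@
Much later, next to the code that needs them, the totals are declared.
<<Global variables>>+=
long tot_word_count, tot_line_count, tot_char_count;
@
"""

for line in collect(SOURCE)["Global variables"]:
    print(line)
```

The declarations end up together in one chunk, in the order in which they were written, although the author introduced each of them at the point where the explanation called for it.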


''4.'' And finally, ''this record of the train of thought creates superior documentation''
at the same time as creating a program. No more side comments to instructions in
machine code, but the explanation of concepts on each level, with subconcepts deferred to
their appropriate place, which allows for better communication of thought.

Such exposition of ideas creates a flow of thought that is like a literary work. Knuth famously wrote a "novel" which explains the code of a computer strategy game in a perfectly readable way.
The applicability of the concept to programming on a large scale, that of commercial-grade programs, is demonstrated by the publication of the TeX code as a literate program (''TeX: The program'', volume B of the 5-volume ''Computers and Typesetting'').



== Quotes: author about the concept ==
Literate Programming is a much misunderstood paradigm, with people sometimes picking bits like "documentation" or "formatted documentation produced from a common file with both machine code and comments" or "voluminous commentaries to code" to stand for the concept, which leads to claims that simple comment formatters which are common today (like Perl POD) are "literate programming tools".

The usually missing or misunderstood part is that of a "web of abstract concepts" hiding behind the system of natural-language macros and the "change of order from machine-imposed to that convenient to the human mind operation".

Therefore it is necessary to confirm the outlined understanding with direct quotes
from the author of the concept:


1. On productivity and L.P. as aid of thinking

From: Interview with Donald Knuth - By Donald E. Knuth, Andrew Binstock - Apr 25, 2008
<blockquote>
Yet to me, literate programming is certainly the most important thing that
came out of the TeX project. Not only has it enabled me to write and
maintain programs faster and more reliably than ever before, and been one
of my greatest sources of joy since the 1980s-it has actually been
indispensable at times. Some of my major programs, such as the MMIX
meta-simulator, could not have been written with any other methodology
that I've ever heard of. The complexity was simply too daunting for my
limited brain to handle; without literate programming, the whole
enterprise would have flopped miserably.
...
Literate programming is what you need to rise
above the ordinary level of achievement.
</blockquote>

From: ibid.
<blockquote>
According to the current directories on my machine, I've written 68
different CWEB programs so far this year. There were about 100 in 2007, 90
in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely
convenient "change file" mechanism, with which I can rapidly create
multiple versions and variations on a theme; so far in 2008 I've made 73
variations on those 68 themes. (Some of the variations are quite short,
only a few bytes; others are 5KB or more. Some of the CWEB programs are
quite substantial, like the 55-page BDD package that I completed in
January.) Thus, you can see how important literate programming is in my
life.

</blockquote>



2. On the program as a "web of ideas"

From: "Literate Programming", submitted to Computer Journal in Sep 1983
<blockquote>
I chose the name WEB partly because it was one of
the few three-letter words of English that hadn't already
been applied to computers. But as time went on,
I've become extremely pleased with the name, because
I think that a complex piece of software is, indeed, best
regarded as a web that has been delicately pieced together
from simple materials. We understand a complicated
system by understanding its simple parts, and by
understanding the simple relations between those parts
and their immediate neighbors. If we express a program
as a web of ideas, we can emphasize its structural
properties in a natural and satisfying way.
</blockquote>


3. On reordering: human logic and continuity of thought, not machine order

From: ibid.
<blockquote>
I had the feeling that
top-down and bottom-up were opposing methodologies:
one more suitable for program exposition and the other
more suitable for program creation.
But after gaining experience with WEB, I have come to
realize that there is no need to choose once and for all
between top-down and bottom-up, because a program
is best thought of as a web instead of a tree. A hierarchical
structure is present, but the most important
thing about a program is its structural relationships. A
complex piece of software consists of simple parts and
simple relations between those parts; the programmer's
task is to state those parts and those relationships,
in whatever order is best for human comprehension
not in some rigidly determined order like top-down or
bottom-up.

</blockquote>

Same even more forcefully.
Note also that Web is called "a language":

<blockquote>
Thus the WEB language allows a person to express
programs in a ''"stream of consciousness" order''. TANGLE
is able to scramble everything up into the arrangement
that a PASCAL compiler demands. This feature of WEB
is perhaps its greatest asset; it makes a WEB-written
program much more readable than the same program
written purely in PASCAL, even if the latter program is
well commented. And the fact that there's no need to
be hung up on the question of top-down versus bottom-up,
since a programmer can now ''view a large program as a web, to be explored in a '''psychologically correct order''' is perhaps the greatest lesson'' I have learned
from my recent experiences.
</blockquote>


4. L.P. is much more than a tool of documentation.
Originally thought of as "mere" documentation, it is
now a programming tool:

From: ibid.
<blockquote>
Another surprising thing that I learned while using
WEB was that traditional programming languages had
been causing me to write inferior programs, although I
hadn't realized what I was doing. My original idea was
that WEB would be merely a tool for documentation, but
I actually found that my WEB programs were better than
the programs I had been writing in other languages.
</blockquote>


5. On the system of macros in the original WEB implementation
(based on Pascal); included as evidence because the modern misconception
that sees L.P. simply as a documentation tool does not even realize
the nature of this macro language on top of the computer coding language:

From: ibid.
<blockquote>
WEB's macros are allowed to have at most one parameter.
Again, I did this in the interests of simplicity,
because I noticed that most applications of multiple parameters
could in fact be reduced to the one-parameter
case. For example, suppose that you want to define
something like......
......
In other words, the name of one macro can usefully be
a parameter to another macro. This particular trick
makes it possible to.......
</blockquote>


== History ==



== Current implementations ==

'''The first''' published literate programming environment was '''[[WEB]]''', introduced by [[Donald Knuth]] in 1981 for his [[TeX]] typesetting system; it uses [[Pascal programming language|Pascal]] as its underlying programming language and <nowiki>TeX</nowiki> for typesetting of the documentation.


The complete commented <nowiki>TeX</nowiki> source code was published in Knuth's ''TeX: The program'', volume B of his 5-volume ''[[Computers and Typesetting]]''. Knuth had privately used a literate programming system called DOC as early as 1979; he was inspired by the ideas<ref>[[Pierre Arnoul de Marneffe]], ''Holon Programming''. Univ. de Liege, Service d'Informatique (December, 1973).</ref> of [[Pierre Arnoul de Marneffe]]. The free [[CWEB]], written by Knuth and Silvio Levy, is WEB adapted for [[C (programming language)|C]] and [[C++]], runs on most operating systems and can produce <nowiki>TeX</nowiki> and [[Portable Document Format|PDF]] documentation. Other implementations of the concept are [[noweb]] and FunnelWeb, both of which are independent of the programming language of the source code.


'''Noweb''' is language-agnostic and is well known for its simplicity: just 2 text markup conventions and 2 tool invocations are needed to use it, and it allows for text formatting in HTML rather than going through Knuth's original TeX system.
'''FunnelWeb''' is another program without dependency on TeX which can produce HTML documentation output. It has more complicated markup (with "@" escaping any FunnelWeb command), but has many more flexible options. This tool is also completely language-agnostic.

'''The [[Leo (text editor)|Leo text editor]]''' is an ''outlining'' editor which supports optional noweb and CWEB markup.

Leo (written in Python) actually ''mixes two different approaches'': first, it is an outlining editor, and this feature helps enormously with the management of large texts, relieving the creator's attention and widening the "horizon" of information he can keep in his mind.


Secondly, Leo incorporates some of the ideas of L.P.; using L.P. in its pure form (i.e. the way it is used by Knuth's WEB tool and/or tools like "noweb") is possible only with some degree of inventiveness and by using the editor in a way not exactly envisioned by its author (in modified @root nodes).
<ref>[http://webpages.charter.net/edreamleo/front.html Leo homepage cites support for noweb and CWEB]</ref>


'''The [[Haskell (programming language)|Haskell]] programming language''' has native support for literate programming, inspired by CWEB but with a simpler implementation. When aiming for <nowiki>TeX</nowiki> output, one writes a plain [[LaTeX]] file where source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify Haskell statements to compile, removing the TeX documentation as if they were comments. Haskell's functional, modular nature<ref>[http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html Why Functional Programming Matters]</ref> makes literate programming directly in the language sensible, making a separate code generation pass unnecessary (compare WEB's TANGLE pass that generates imperative Pascal code). This is made possible by Haskell's declarative, purely functional, lazy semantics: arbitrary sections of code can be factored into separate functions and documented as separate conceptual units, without changing the semantics of the compiled program.
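
The marker-based extraction described for Haskell can be imitated in a few lines of Python. This is a sketch and not the compiler's actual implementation: it assumes the LaTeX style of literate Haskell, in which code sits between <code>\begin{code}</code> and <code>\end{code}</code>, and the function name <code>extract_code</code> and the sample file content are made up for the illustration.

```python
def extract_code(lhs_text):
    """Keep only the lines between \\begin{code} and \\end{code} markers."""
    code, inside = [], False
    for line in lhs_text.splitlines():
        if line.strip() == r"\begin{code}":
            inside = True
        elif line.strip() == r"\end{code}":
            inside = False
        elif inside:
            code.append(line)
    return "\n".join(code)


# Hypothetical literate Haskell file: a LaTeX document with one code block.
LHS = r"""\documentclass{article}
\begin{document}
The factorial function is defined by recursion.
\begin{code}
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
\end{code}
\end{document}
"""

print(extract_code(LHS))
```

Unlike "tangle", this step never reorders anything: the code is taken in the order in which it appears in the document, and everything outside the markers is treated as a comment.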


'''[[Perl]] (as one example of the mentioned misconception)''' does NOT support literate programming. Its [[Plain Old Documentation]] or POD format allows only human-readable documentation inserts with basic formatting in the same file as the source code; it offers neither "new operators" made out of arbitrary phrases nor a change of order from the language-prescribed one to that of logical thinking. This embedded Perl documentation is also commonly converted into other formats, including HTML or LaTeX, which is a standard feature of modern scripting languages and has nothing to do with Literate Programming per se, although literate programming always remains possible when external language-agnostic tools, such as the popular and easy-to-use "noweb", are employed.


==See also==
* [[Intentional programming]]
* [[Leo (text editor)]]



Revision as of 20:21, 26 December 2008

LITERATE PROGRAMMING is an approach to programming (a "paradigm") which was introduced by one of the programming legends of the earlier generation, Donald Knuth.

When presenting L.P. in a 1983 paper (see "Literate Programming" by Donald E. Knuth, in Computer Journal), he directly positioned it as an alternative to the "Structured Programming" of the 1970s (which in turn was a successor to programming practices on earlier computers before the advent of higher-level programming languages), i.e. as a programming paradigm, and sometimes called it a new "language" - written on top of Pascal, or C, or any other language of machine instructions. When asked about his preferred programming language, Knuth replies "WEB" or "CWEB", which are the names of his L.P. systems.


Overview

The major idea of L.P. is to stop coding in the manner and order imposed by the computer, and provide tools to enable the human programmer to develop a program in the order demanded by the logic and flow of his thought: imagine a human trying to explain the program concepts to you, rather than reading and deciphering machine source code line by line.


In contrast to the traditional practice an L.P. program is not machine code, sequenced to best suit the machine, with some side comments to help humans to decipher it post-factum.

Rather in L.P. the program is written as an uninterrupted exposition of logic in an ordinary human language, like the text of an essay, in which macros which hide abstractions and/or direct code are included. Full code with macros expanded can then in one step be extracted from this text by a literate programming tool and "entangled" (i.e. convoluted, made non-natural for a human) into what the computer demands for further compilation and/or running. Fully formatted documentation is also produced in one step ("woven") from the same L.P. source file.

While the first generation Literal Programming tools were computer language-specific, the later ones are language-agnostic and exist "above" the programming languages.


Technically this is achieved by means of:

  • a tool which provides a macro preprocessor for the program, which is itself written "in an L.P. language". An L.P. language file is an explanation of the program logic in a natural language, e.g. English, interspersed with snippets of macros and/or programming-language code.
  • the preprocessor tool expands arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle") and documentation (which is typically fully formatted) with another ("weave").
  • macros in an L.P. source file are simply title-like or explanatory phrases in a human language, which hide chunks of code and/or lower-level macros and describe the human abstractions created while solving the problem. The best way to understand these is to think of an "algorithm in pseudocode", as typically used in CS teaching. These arbitrary explanatory phrases become precise new "operators of the language" in L.P., created on the fly by the programmer.
  • the preprocessor also makes it possible to add to already created macros at any place in the text of the L.P. source file, thus dispensing with the need to keep in mind the restrictions imposed by programming languages, which would interrupt the flow of thought.
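The substitution step described in the list above can be sketched as a short recursive expansion. The following Python is a minimal illustration, not the implementation of TANGLE or of any real tool: the function name `expand` is hypothetical, chunks are assumed to be already collected into a dictionary, a reference is assumed to occupy a line of its own, and the bodies of "The main program" and "Say hello" are invented for the example.

```python
import re

# A chunk reference is a line consisting only of <<name>>, possibly indented.
# (Assumption: references embedded in the middle of a line are not handled.)
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def expand(chunks, name, indent="", active=()):
    """Return the lines of chunk `name` with all references expanded."""
    if name in active:
        raise ValueError("chunk %r refers to itself" % name)
    if name not in chunks:
        raise KeyError("undefined chunk %r" % name)
    out = []
    for line in chunks[name]:
        m = REF.match(line)
        if m:
            # Substitute the referenced chunk, keeping the reference's indentation.
            out.extend(expand(chunks, m.group(2), indent + m.group(1),
                              active + (name,)))
        else:
            out.append(indent + line if line else line)
    return out

# Hypothetical chunk bodies; only the chunk names echo the article's example.
chunks = {
    "*": ["<<Header files to include>>", "<<The main program>>"],
    "Header files to include": ["#include <stdio.h>"],
    "The main program": ["int main(void) {", "  <<Say hello>>", "  return 0;", "}"],
    "Say hello": ['puts("hello");'],
}

tangled = "\n".join(expand(chunks, "*"))
print(tangled)
```

Starting from the root chunk "*", the phrase "Say hello" is replaced by its code inside the body of `main`, at the indentation of the reference. This is the sense in which a descriptive phrase acts as an operator that can appear anywhere in the code.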


As a result, L.P. (according to its originator) provides:

  • Higher-quality programs: when forced to state his thinking explicitly, the programmer restrains himself from kludges, which can no longer be hidden in the code as easily.
  • A first-rate documentation system, which is not an add-on but grows naturally in the process of expounding one's thoughts during the creation of a program. The author can restart his own thinking process at any later time, and another programmer can understand the construction of the program much more easily. This differs from traditional documentation, where a human is presented with a convoluted set of machine instructions and must decipher their meaning from side notes that follow the machine-imposed order of those instructions.

L.P. also facilitates thinking in general, giving a higher "bird's eye view" of the code and increasing the number of items, concepts, etc. that the mind can hold simultaneously and then successfully transform and handle.


Literate programming is a much misunderstood paradigm. People sometimes pick out aspects such as "documentation", "formatted documentation produced from a common file with both machine code and comments", or "voluminous commentaries to code" to stand for the whole concept, which leads to claims that simple comment formatters, common today (like Perl's POD), are "literate programming tools".

The usually missing or misunderstood part is that of a "web of abstract concepts" hiding behind the system of natural-language macros and the "change of order from machine-imposed to that convenient to the human mind operation".


Example

From Word Count program in noweb

Chapter 12 of the book "Literate Programming", a collection of essays by Knuth, presents the word count (wc) program from UNIX, rewritten in CWEB to demonstrate "literate programming" in C.

The same example was later redone for the "noweb" L.P. tool, which is language-agnostic and very simple (just 2 text markup conventions and 2 tool invocations are needed to use it), and which allows the text to be formatted in HTML rather than through Knuth's original TeX system.

The following snippets from this example illustrate the major concepts of L.P.


1. How macros are created

The purpose of wc is to count lines, words, and/or characters in a list of files. The 
number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw: 
    <<*>>=
    <<Header files to include>>
    <<Definitions>>
    <<Global variables>>
    <<Functions>>
    <<The main program>>
    @
    
We must include the standard I/O definitions, since we want to send formatted output 
to stdout and stderr. 
    <<Header files to include>>=
    #include <stdio.h>
    @

The first snippet shows how arbitrary descriptive phrases in a human language (English) are used in an L.P. source file to create macros, i.e. precise new "operators" of the L.P. language, which hide chunks of code and/or other macros. The mark-up notation is very simple: double angle brackets. A code section in an L.P. file written in "noweb" is ended with the "@" symbol. The "<<*>>" symbol stands for the "root", the topmost node from which the L.P. tool starts expanding the web of macros. In fact, the expanded source code can be written out starting from any section or subsection (i.e. a piece of code designated as "<<name of the chunk>>=", with the equals sign), so one L.P. file can contain several files of machine source code.

Note also that the chunks can be defined at any place in the L.P. text file, not necessarily in the order in which they are sequenced in the enclosing chunk, but as demanded by the logic of the explanatory text, which envelops the whole program.
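How a tool can cope with chunks defined in any order, and with several possible starting points in one file, can be sketched as follows. This is a hedged Python illustration rather than noweb's actual code: `collect_chunks` and `roots` are hypothetical names, the chunk "notes.txt" and all chunk bodies are invented, and the sketch assumes that a repeated definition (written with "=" or "+=") appends to the chunk and that a reference occupies a line of its own.

```python
import re

DEFN = re.compile(r"^<<(.+?)>>\+?=\s*$")   # chunk header: <<name>>= or <<name>>+=
USE = re.compile(r"^\s*<<(.+?)>>\s*$")     # a reference on a line of its own

def collect_chunks(source):
    """Map each chunk name to its code lines; repeated definitions are appended."""
    chunks, current = {}, None
    for line in source.splitlines():
        m = DEFN.match(line.strip())
        if m:
            # The position of the definition in the file does not matter:
            # the chunk is simply indexed by its name.
            current = chunks.setdefault(m.group(1), [])
        elif current is not None and line.strip() == "@":
            current = None
        elif current is not None:
            current.append(line)
        # Lines outside any chunk are prose and are ignored here.
    return chunks

def roots(chunks):
    """Names of chunks that no other chunk refers to (candidate output files)."""
    used = set()
    for body in chunks.values():
        for line in body:
            m = USE.match(line)
            if m:
                used.add(m.group(1))
    return sorted(set(chunks) - used)

source = """\
Here is an overview of the file.
<<*>>=
<<Header files to include>>
<<Global variables>>
@
The headers are explained only now, after they were used.
<<Header files to include>>=
#include <stdio.h>
@
<<Global variables>>=
long tot_word_count;
@
A second output file can live in the same source.
<<notes.txt>>=
scratch notes
@
More globals, added where the text gets to them.
<<Global variables>>+=
long tot_line_count;
@
"""
chunks = collect_chunks(source)
print(roots(chunks))                # ['*', 'notes.txt']
print(chunks["Global variables"])   # both declarations, in order of appearance
```

The two names reported by `roots` are the chunks from which expansion could start, which corresponds to one source file holding several output files.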


2. Program as a Web: Macros are not just section names

These macros are not simply the same as "section names" in standard documentation. L.P. macros can hide any chunk of code (and/or other macros) behind themselves and be used INSIDE low-level programming language constructs; "if", "while" or "case" would be typical. In fact, they can stand for any arbitrary chunk of code or other macros, and are more general than top-down or bottom-up "chunking" or subsectioning. Knuth says that when he realized this, he began to think of a program as a web of various parts, which is where the name of the original L.P. system ("WEB") comes from. (This is illustrated in the next code snippet; see "Fill buffer.." inside the C "while" loop.)

3. Order of human logic, not that of machine code

In L.P. the chunks behind macros, once introduced with "<<...>>=", can be extended later at any place in the file by simply defining the chunk again (written here as "<<name of the chunk>>+=") and adding more content to it. For example, in the C language all includes or declarations must be made at the beginning, which forces the programmer to keep this housekeeping information in mind and interrupts his thinking. This is not a small irritation but a limiting factor on thought, since memory capacity is limited, and a source of inferior code and errors. In L.P. the programmer avoids this as a matter of course and, generally speaking, programs in the order determined by the logic of his thinking rather than the order demanded by the machine code (note the macros with "+="):

The grand totals must be initialized to zero at the beginning of the program. 
If we made these variables local to main, we would have to do this initialization 
explicitly; however, C globals are automatically zeroed. (Or rather, ``statically 
zeroed.'' (Get it?))

    <<Global variables>>+=
    long tot_word_count, tot_line_count, 
         tot_char_count;
      /* total number of words, lines, chars */
    @

The present chunk, which does the counting that is wc's raison d'etre, was actually one of 
the simplest to write. We look at each character and change state if it begins or ends 
a word. 

    <<Scan file>>=
    while (1) {
      <<Fill buffer if it is empty; break at end of file>>
      c = *ptr++;
      if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
          word_count++;
          in_word = 1;
        }
        continue;
      }
      if (c == '\n') line_count++;
      else if (c != ' ' && c != '\t') continue;
      in_word = 0;
        /* c is newline, space, or tab */
    }
    @
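The state change that the "Scan file" chunk describes can be tried out in isolation. The following is a Python port of the loop's logic, offered as a sketch only: the function name `count` is hypothetical, the text is passed in as a string instead of being read through a buffer, and only the line and word counters that appear in the chunk are kept.

```python
def count(text):
    """Return (lines, words) for `text`, following the logic of the chunk."""
    line_count = word_count = 0
    in_word = False
    for c in text:
        if " " < c < "\x7f":
            # Visible ASCII code: a word starts if we were not already in one.
            if not in_word:
                word_count += 1
                in_word = True
            continue
        if c == "\n":
            line_count += 1
        elif c != " " and c != "\t":
            # Any other character neither starts nor ends a word.
            continue
        in_word = False  # c is newline, space, or tab
    return line_count, word_count

print(count("hello world\nliterate  programming\n"))  # (2, 4)
```

As in the C original, a run of spaces ends a word only once, and characters outside the visible ASCII range leave the current state unchanged.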


4. And finally, this record of the train of thought creates superior documentation at the same time as it creates a program. There are no more side comments attached to instructions in machine code; instead, concepts are explained at each level, with subconcepts deferred to their appropriate place, which allows for better communication of thought.

Such an exposition of ideas creates a flow of thought that reads like a literary work. Knuth famously wrote a "novel" which explains the code of a computer strategy game and is perfectly readable. The applicability of the concept to programming on a large scale, that of commercial-grade programs, is shown by the publication of the TeX source code as a literate program in Knuth's 5-volume Computers and Typesetting.


Quotes: author about the concept

The understanding outlined above can be confirmed with direct quotes from the author of the concept:


1. On productivity and L.P. as aid of thinking

From: Interview with Donald Knuth - By Donald E. Knuth, Andrew Binstock - Apr 25, 2008

Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s-it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably. ... Literate programming is what you need to rise above the ordinary level of achievement.

From: ibid.

According to the current directories on my machine, I've written 68 different CWEB programs so far this year. There were about 100 in 2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely convenient "change file" mechanism, with which I can rapidly create multiple versions and variations on a theme; so far in 2008 I've made 73 variations on those 68 themes. (Some of the variations are quite short, only a few bytes; others are 5KB or more. Some of the CWEB programs are quite substantial, like the 55-page BDD package that I completed in January.) Thus, you can see how important literate programming is in my life.


2. On the program as a "web of ideas"

From: "Literate Programming", submitted to Computer Journal in Sep 1983

I chose the name WEB partly because it was one of the few three-letter words of English that hadn't already been applied to computers. But as time went on, I've become extremely pleased with the name, because I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials. We understand a complicated system by understanding its simple parts, and by understanding the simple relations between those parts and their immediate neighbors. If we express a program as a web of ideas, we can emphasize its structural properties in a natural and satisfying way.


3. On reordering: human logic and continuity of thought, not machine order

From: ibid.

I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for program creation. But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships. A complex piece of software consists of simple parts and simple relations between those parts; the programmer's task is to state those parts and those relationships, in whatever order is best for human comprehension, not in some rigidly determined order like top-down or bottom-up.

The same point is made even more forcefully below. Note also that WEB is called a "language":

Thus the WEB language allows a person to express programs in a "stream of consciousness" order. TANGLE is able to scramble everything up into the arrangement that a PASCAL compiler demands. This feature of WEB is perhaps its greatest asset; it makes a WEB-written program much more readable than the same program written purely in PASCAL, even if the latter program is well commented. And the fact that there's no need to be hung up on the question of top-down versus bottom-up, since a programmer can now view a large program as a web, to be explored in a psychologically correct order, is perhaps the greatest lesson I have learned from my recent experiences.


4. L.P. is much more than a tool of documentation. Originally thought of as "mere" documentation, it is now a programming tool:

From: ibid.

Another surprising thing that I learned while using WEB was that traditional programming languages had been causing me to write inferior programs, although I hadn't realized what I was doing. My original idea was that WEB would be merely a tool for documentation, but I actually found that my WEB programs were better than the programs I had been writing in other languages.


5. On the system of macros in the original WEB implementation (based on Pascal); included as evidence because the modern misconception that sees L.P. simply as a documentation tool does not even realize the nature of this macro language on top of the computer coding language:

From: ibid.

WEB's macros are allowed to have at most one parameter. Again, I did this in the interests of simplicity, because I noticed that most applications of multiple parameters could in fact be reduced to the one-parameter case. For example, suppose that you want to define something like...... ...... In other words, the name of one macro can usefully be a parameter to another macro. This particular trick makes it possible to.......


History

Current implementations

The first published literate programming environment was WEB, introduced by Donald Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation.

The complete commented TeX source code was published in Knuth's TeX: The program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979; he was inspired by the ideas[1] of Pierre Arnoul de Marneffe. The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems and can produce TeX and PDF documentation. Other implementations of the concept are noweb and FunnelWeb, both of which are independent of the programming language of the source code.

Noweb is language-agnostic and is well known for its simplicity: just 2 text markup conventions and 2 tool invocations are needed to use it, and it allows the text to be formatted in HTML rather than through Knuth's original TeX system.

FunnelWeb is another program without dependency on TeX which can produce HTML documentation output. It has more complicated markup (with "@" escaping any FunnelWeb command), but has many more flexible options. This tool is also completely language-agnostic.

The Leo text editor is an outlining editor which supports optional noweb and CWEB markup.

Leo (written in Python) actually mixes two different approaches. First, it is an outlining editor, a feature which helps enormously with the management of large texts, relieving the author's attention and widening the "horizon" of information he can keep in mind.

Secondly, Leo incorporates some of the ideas of L.P. Using L.P. in its pure form (i.e. the way it is used with Knuth's WEB tool and/or tools like "noweb") is possible in Leo only with some degree of inventiveness and by using the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy, in some ways similar to L.P., or even extending L.P. with outlines, thus improving the "bird's eye view" of code and concepts. [2]

The Haskell programming language has native support for literate programming, inspired by CWEB but with a simpler implementation. When aiming for TeX output, one writes a plain LaTeX file in which the source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify the Haskell statements to compile, removing the TeX documentation as if it were comments. Haskell's functional, modular nature[3] makes literate programming directly in the language sensible, making a separate code generation pass unnecessary (compare WEB's TANGLE pass that generates imperative Pascal code). This is made possible by Haskell's declarative, purely functional, lazy semantics: arbitrary sections of code can be factored into separate functions and documented as separate conceptual units, without changing the semantics of the compiled program.
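The marker-based extraction described above can be illustrated with a small Python sketch. It is not the Haskell compiler's own preprocessing, only an approximation of the idea: the function name `unlit` is used here as a hypothetical helper, the markers are assumed to be the LaTeX-style `\begin{code}` and `\end{code}` lines, and the `factorial` definition is an invented sample.

```python
def unlit(source):
    """Keep only the lines between \\begin{code} and \\end{code}."""
    out, in_code = [], False
    for line in source.splitlines():
        if line.strip() == r"\begin{code}":
            in_code = True
        elif line.strip() == r"\end{code}":
            in_code = False
        elif in_code:
            out.append(line)
        # Everything else is LaTeX documentation and is dropped like a comment.
    return "\n".join(out)

source = r"""
\documentclass{article}
\begin{document}
The factorial function is defined by recursion.
\begin{code}
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
\end{code}
\end{document}
"""
code = unlit(source)
print(code)
```

Note that nothing is reordered here: the code reaches the compiler in the order in which it appears in the document, which is the difference from a tangling pass.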

Perl (as one example of the misconception mentioned above) does NOT support literate programming. Its Plain Old Documentation (POD) format allows only human-readable documentation inserts, with basic formatting, in the same file as the source code; it offers neither "new operators" made out of arbitrary phrases nor any change of order from the language-prescribed one to that of logical thinking. This embedded Perl documentation is also commonly converted into other formats, including HTML or LaTeX, which is an absolutely standard feature of modern scripting languages, none of which has anything to do with literate programming per se. Literate programming nevertheless always remains possible when external language-agnostic tools are employed, such as "noweb", the most popular and simplest to use.

See also

References

  1. ^ Pierre Arnoul de Marneffe, Holon Programming. Univ. de Liege, Service d'Informatique (December, 1973).
  2. ^ Leo homepage cites support for noweb and CWEB
  3. ^ Why Functional Programming Matters

Further reading

  • Donald E. Knuth, Literate Programming, Stanford, California: Center for the Study of Language and Information, 1992, CSLI Lecture Notes, no. 27.
  • Eitan M. Gurari, TeX & LaTeX: Drawing and Literate Programming. McGraw Hill 1994. ISBN 0-07-911616-7 (includes software).
  • Kurt Nørmark, Literate Programming - Issues and Problems

External links