Literate Programming

Literate programming is a philosophy of computer programming pioneered by Donald Knuth. The main idea of literate programming is that computer programs should be written to be read by people, and not just by the computer. Programs should be works of literature, intended to be understood and enjoyed by others.

Most computer languages allow commentary to be included in programs, but with literate programming the commentary is the focus. For this reason:

  • Literate programming systems generally try to provide high-quality formatted output. Commentary is usually expressed using a document markup language such as TeX/LaTeX, HTML, or SGML/XML. Facilities for including mathematics or explanatory diagrams are often provided.
  • In a literate program, the program text is embedded in the commentary, and not the other way around. Some systems “pretty-print” the program text, while others maintain the original formatting (or lack of it).
  • Literate programming systems allow the programmer to order the program in a way that makes sense for the reader. This may not be the same order that makes sense to the compiler, however, so a literate programming system lets the programmer leave “placeholders” for fragments of the program that are explained elsewhere in the text.

The two main things you may want to do to a literate program are tangling and weaving. (Knuth's original system was called WEB; these terms are a pun on “What a tangled web we weave...”.) Tangling takes the literate source and extracts the program code, replacing placeholders with the source code they refer to. Tangled output can then be fed to the compiler and linked to build a library, component, or executable.
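
To make the tangling step concrete, here is a minimal Python sketch. It is not taken from WEB, CWEB or Monday: it assumes the program is held as a dictionary of named fragments, and it borrows noweb's <<name>> notation, written on a line by itself, for placeholders. Tangling starts from a root fragment and recursively replaces each placeholder with the fragment it names, keeping the placeholder's indentation and rejecting cycles.

```python
import re

# Assumed placeholder syntax (borrowed from noweb): a line holding only
# <<name>>, possibly indented, stands for the fragment called "name".
PLACEHOLDER = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(fragments: dict[str, str], root: str,
           active: tuple[str, ...] = ()) -> str:
    """Expand fragment `root`, recursively replacing placeholders with code."""
    if root in active:
        raise ValueError("cyclic placeholder: " + " -> ".join(active + (root,)))
    if root not in fragments:
        raise KeyError(f"undefined fragment: {root}")
    out = []
    for line in fragments[root].splitlines():
        m = PLACEHOLDER.match(line)
        if m is None:
            out.append(line)  # ordinary code line: copy it unchanged
            continue
        indent, name = m.groups()
        expanded = tangle(fragments, name, active + (root,))
        # Indent every expanded line to the placeholder's column
        # (blank lines stay blank rather than gaining trailing spaces).
        out.extend(indent + sub if sub else sub for sub in expanded.splitlines())
    return "\n".join(out)


# Fragments may be written in any order; only the placeholders link them.
fragments = {
    "main.c": ("#include <stdio.h>\n"
               "<<declarations>>\n"
               "int main(void) {\n"
               "    <<print greeting>>\n"
               "    return 0;\n"
               "}"),
    "print greeting": "puts(greeting);",
    "declarations": 'static const char *greeting = "hello";',
}

print(tangle(fragments, "main.c"))
```

Because the order of entries in the dictionary does not matter, the author is free to present fragments in whatever order suits the reader; the placeholders alone determine the order the compiler sees.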

Weaving takes the same program and prepares it for typesetting or online display. Depending on the system, this may involve:

  • Pretty-printing the program
  • Generating cross-references for program functions and variables, and for placeholders
  • Converting the markup used by the literate programming system into some sort of output formatting markup.
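
A rough Python sketch of two of these weaving steps follows, again assuming noweb-style <<name>> placeholders on lines of their own. It builds a "used in" cross-reference for every fragment and converts the fragments to HTML, turning each placeholder into a link to that fragment's definition. The HTML layout, the frag- anchor naming and the "(root)" label are illustrative choices only, and no pretty-printing is attempted.

```python
import html
import re

# Same assumed noweb-style syntax: a line holding only <<name>> is a placeholder.
PLACEHOLDER = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def cross_reference(fragments: dict[str, str]) -> dict[str, list[str]]:
    """Map each fragment name to the sorted names of the fragments using it."""
    used_by: dict[str, set[str]] = {name: set() for name in fragments}
    for user, body in fragments.items():
        for line in body.splitlines():
            m = PLACEHOLDER.match(line)
            if m:
                used_by.setdefault(m.group(2), set()).add(user)
    return {name: sorted(users) for name, users in used_by.items()}


def anchor(name: str) -> str:
    """Illustrative anchor scheme: 'print greeting' -> 'frag-print-greeting'."""
    return "frag-" + re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")


def weave_html(fragments: dict[str, str]) -> str:
    """Render every fragment as escaped HTML with placeholder links."""
    xref = cross_reference(fragments)
    parts = []
    for name, body in fragments.items():
        lines = []
        for line in body.splitlines():
            m = PLACEHOLDER.match(line)
            if m:
                indent, ref = m.groups()
                lines.append(f'{indent}<a href="#{anchor(ref)}">'
                             f"&lt;&lt;{html.escape(ref)}&gt;&gt;</a>")
            else:
                lines.append(html.escape(line))  # escape <, >, & and quotes
        code = "\n".join(lines)
        users = ", ".join(html.escape(u) for u in xref[name]) or "(root)"
        parts.append(f'<h3 id="{anchor(name)}">&lt;&lt;{html.escape(name)}'
                     f"&gt;&gt;</h3>\n<pre>{code}</pre>\n<p>Used in: {users}</p>")
    return "\n".join(parts)


fragments = {
    "main.c": "#include <stdio.h>\nint main(void) {\n    <<print greeting>>\n}",
    "print greeting": 'puts("hello");',
}
print(weave_html(fragments))
```

The cross-reference is computed from the same placeholders that drive tangling, so the woven document and the compiled program cannot disagree about which fragment goes where.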

Literate programming requires a bit more discipline than “seat-of-the-pants” programming, as well as time to learn the literate programming system, but the potential rewards are great. Explaining the program to others ensures that programmers are clear in their own minds about how the program works, contributing to the quality of the result. The knowledge that other people will probably read the program, and the joy of producing a “work of art”, discourage shoddy work and make programming more fun.

Monday's System

The Monday literate programming system is based on a fairly simple XML architecture. This architecture can be embedded in any of several document architectures using their extension mechanisms. Currently the TEI P4 DTD is used for marking up commentary in the existing Monday libraries and tools, but other architectures such as DocBook would be equally suitable. (In fact, a DocBook customization is planned for the near future.)

In the Monday literate programming system, a single document corresponds to a library or executable program/application. This was inspired by the fact that the Dylan compilers compile programs one complete library at a time, though it is also a good fit for many other programming languages. In particular, most libraries are organized around a single “theme”, suitable for exposition in a single article-sized document.

A library contains one or more modules. This corresponds exactly to program organization in Dylan; in C, C++ or Java, a module generally corresponds to a single source file. Modules consist of fragments of program source code interleaved with placeholders that refer to sections, which are smaller fragments of source code. Sections can, in turn, contain placeholders referencing other sections.

It is simple to extract (tangle) such a structure into source code files. For portability and simplicity, the Monday literate programming system currently uses a short XSLT script to do the job.
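
The Monday markup itself is not reproduced here, so the following Python sketch invents a tiny stand-in vocabulary (library, module with a file attribute, section, and an empty ref element as the placeholder); none of these names should be read as Monday's actual elements. It also uses Python's xml.etree.ElementTree in place of the XSLT script the system really uses. The idea is the same, though: find every module, copy its code, and replace each ref with the recursively expanded section it names, producing one source file per module.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in markup: <library>, <module file=...>, <section name=...>
# and the empty <ref section=.../> placeholder are invented for this sketch.
# Commentary (<p> here) can sit anywhere around the code.
SOURCE = """\
<library name="hello">
  <p>The module defines a single function.</p>
  <module name="hello" file="hello.dylan">define function hello ()
  <ref section="body"/>
end function hello;
</module>
  <p>The body just prints a greeting.</p>
  <section name="body">format-out("Hello, world");</section>
</library>
"""


def expand(elem: ET.Element, sections: dict[str, ET.Element],
           active: tuple[str, ...] = ()) -> str:
    """Concatenate an element's code, replacing each <ref> with its section."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "ref":
            name = child.get("section")
            if name in active:
                raise ValueError(f"cyclic reference to section {name!r}")
            if name not in sections:
                raise KeyError(f"undefined section: {name}")
            parts.append(expand(sections[name], sections, active + (name,)))
        # Any other child element is skipped; the code after it (its tail) is kept.
        parts.append(child.tail or "")
    return "".join(parts)


def tangle_library(xml_text: str) -> dict[str, str]:
    """Return a mapping from output file name to tangled source code."""
    root = ET.fromstring(xml_text)
    sections = {s.get("name"): s for s in root.iter("section")}
    return {m.get("file"): expand(m, sections) for m in root.iter("module")}


for path, code in tangle_library(SOURCE).items():
    print(f"--- {path}")
    print(code, end="")
```

ElementTree's text/tail model maps neatly onto this structure: the code between placeholders is the text of the module plus the tail of each ref. Whitespace is kept verbatim, so only the first line of a multi-line section picks up the indentation that precedes its ref; any further lines must be indented in the section itself. A real tool would write each entry of the returned mapping to disk.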

Additional XSLT transforms are used to weave literate source programs into documents for printing or online viewing. For printing, documents are normally transformed into XSL formatting objects (XSL-FO) and then rendered with FOP or another XSL formatter. For online viewing, a transform to HTML will be used. Currently, no program indexing or cross-referencing is done, though integration with the compiler will hopefully provide this in the future.

The Monday build tool mmk is designed to provide specialized build support for literate programs. It uses a completely declarative project description (as opposed to make's partly declarative, partly procedural description), improving build parallelism and making it possible to use project descriptions for many different purposes.
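
The following Python sketch illustrates why a purely declarative description is useful. The project format and target names are hypothetical, not mmk's: each target simply lists what it depends on, with no commands attached. Because the whole description is data, a tool can compute which targets can be built at the same time (here with the standard library's graphlib module) and can reuse the same description for other questions, such as which targets are affected when a file changes.

```python
from graphlib import TopologicalSorter

# Hypothetical project description: each target names only what it depends
# on. There are no commands, so the whole description is plain data.
PROJECT: dict[str, set[str]] = {
    "hello.xml": set(),            # the literate source document
    "hello.dylan": {"hello.xml"},  # tangled source
    "hello.html": {"hello.xml"},   # woven for online viewing
    "hello.fo": {"hello.xml"},     # woven for printing (XSL-FO)
    "hello.pdf": {"hello.fo"},     # formatted by an XSL formatter
    "hello-lib": {"hello.dylan"},  # compiled library
}


def build_waves(project: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into waves; every target in a wave can be built in parallel."""
    sorter = TopologicalSorter(project)
    sorter.prepare()  # also raises graphlib.CycleError on a cyclic description
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished building
    return waves


def affected(project: dict[str, set[str]], changed: str) -> set[str]:
    """Another use of the same data: every target downstream of `changed`."""
    result: set[str] = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, deps in project.items():
            if node in deps and target not in result:
                result.add(target)
                frontier.append(target)
    return result


for i, wave in enumerate(build_waves(PROJECT), start=1):
    print(i, wave)
print(sorted(affected(PROJECT, "hello.fo")))
```

Waves are a simplification: a real scheduler could start hello.pdf as soon as hello.fo is finished, without waiting for the rest of its wave. TopologicalSorter supports that style too, by calling done() for each target individually as it completes.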

Other Systems

  • Knuth's original WEB system (designed for Pascal) has been succeeded by CWEB (adapted for C and C++).
  • The most popular programming language-independent system is Norman Ramsey's noweb.
  • The xmLP system is another XML-based literate programming system. The author of xmLP is the moderator of the (low-traffic) xml-litprog-l mailing list.
  • Norman Walsh has designed his own DocBook-based literate programming system.
  • Many other literate programming resources are available at