2008-10-19

General-purpose text processing library

Using Novelang to produce real-world documents (I mean: more than the Novelang documentation itself), I discovered how convenient it is for creating custom idioms without touching the main grammar. Here is what I mean. Novelang syntax supports well-known constructs like quotes, parentheses, square brackets, punctuation signs, chapter headers, and so on. The text gets abstracted into a tree-like structure, which is then processed by a stylesheet, possibly a custom one. The default stylesheet recognizes the "bracketed" item of the structure and outputs brackets around the text inside that tree fragment, and everything looks fine. Now consider the case where:
  1. Your text doesn't need square brackets.
  2. You need to express something else, like a special name with special typographical effect.
Soon enough, you start giving the square brackets a new effect. Because that effect corresponds to a new meaning, you have just started building your own semantic markup. And, let me say it again: without touching the main grammar. It's even possible to assign different semantics to different parts of a document. From a Book, you can tag an inserted Part with a special style:
insert file:mybibliography.nlp
  $style=bibliography
Then the content of the Part has a style element containing the "bibliography" string. So the stylesheet may use a special template to process entries like the ones below, where italics inside a section don't mean emphasis but mark the text to sort the author list on:
=== Paul //Graham//

On Lisp [Prentice Hall]

=== Allen //Holub//

Taming Java Threads [APress]
That's incredibly lightweight compared to semantic markup like DocBook. The magic comes only from:
  • The choice to avoid too-specific markup whenever possible.
  • The choice of a distinct presentation layer.
With this in mind, I see a chance to turn parts of Novelang into a general-purpose text processing library with a pluggable presentation layer.
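
To make the idea more concrete, here is a minimal sketch in Python of generic markup with a pluggable presentation layer. It is not Novelang's API: the Node class, the node kinds ("part", "section", "title", "paragraph", "italic", "bracketed", "text"), the RENDERERS table and the output format of the bibliography are all assumptions made up for illustration. The same tree is rendered two ways, depending only on the style attached to the Part.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; the kinds used here are illustrative only."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    style: Optional[str] = None


def plain_text(node):
    """Concatenate all text under a node, ignoring markup."""
    if node.kind == "text":
        return node.text
    return "".join(plain_text(child) for child in node.children)


def find_first(node, kind):
    """Depth-first search for the first node of the given kind."""
    if node.kind == kind:
        return node
    for child in node.children:
        found = find_first(child, kind)
        if found is not None:
            return found
    return None


def render_default(node):
    """Default presentation: brackets stay brackets, italics are emphasis."""
    if node.kind == "text":
        return node.text
    inner = "".join(render_default(child) for child in node.children)
    if node.kind == "bracketed":
        return "[" + inner + "]"
    if node.kind == "italic":
        return "*" + inner + "*"
    if node.kind in ("title", "paragraph"):
        return inner + "\n"
    return inner


def render_bibliography(part):
    """Custom presentation: italics give the sort key, brackets the publisher."""
    sections = [child for child in part.children if child.kind == "section"]

    def sort_key(section):
        italic = find_first(section, "italic")
        return plain_text(italic if italic is not None else section)

    lines = []
    for section in sorted(sections, key=sort_key):
        author = plain_text(find_first(section, "title"))
        paragraph = find_first(section, "paragraph")
        book = "".join(c.text for c in paragraph.children if c.kind == "text").strip()
        publisher = plain_text(find_first(paragraph, "bracketed"))
        lines.append(f"{author}. {book} ({publisher})")
    return "\n".join(lines)


# The pluggable part: the style string selects the presentation.
RENDERERS = {"bibliography": render_bibliography}


def render(part):
    return RENDERERS.get(part.style, render_default)(part)


def entry(first, last, book, publisher):
    """Build the tree for one entry like '=== Paul //Graham//'."""
    text = lambda s: Node("text", text=s)
    return Node("section", children=[
        Node("title", children=[text(first + " "), Node("italic", children=[text(last)])]),
        Node("paragraph", children=[text(book + " "), Node("bracketed", children=[text(publisher)])]),
    ])


def sample_part(style=None):
    return Node("part", style=style, children=[
        entry("Allen", "Holub", "Taming Java Threads", "APress"),
        entry("Paul", "Graham", "On Lisp", "Prentice Hall"),
    ])


if __name__ == "__main__":
    print(render(sample_part()))
    print(render(sample_part("bibliography")))
```

With no style, the brackets and italics come out as written, in document order. With the "bibliography" style, the very same tree is sorted on the italic text and the bracketed text is treated as the publisher. Nothing in the tree itself had to change, which is the whole point.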
