2008-12-05

Tree manipulation languages

As a background task I'm thinking on how to improve the Book definition. By now, Novelang books are a sequence of imperative tasks:
insert file:the-preamble.nlp 
  $style=preamble

insert file:. 
  $recurse 
  $createchapter

mapstylesheets 
    $pdf=my-pdf-stylesheet.xsl
Apart of the mapstylesheets command, insert command is all about inserting trees in the main document. Default behavior is to create one tree from a Part file. The $recurse command creates many trees (one per Part file) from given directory. The $style command creates a style node right under the root of the inserted tree. One of the planned features of Novelang is to build Books from identifiers, which are subtrees. As it's all manipulating trees one can note that XSLT transforming a Book tree into a PDF or a HTML document are all about manipulating trees, too, but they act at a different stage: once the content of the document is well-defined. And XSLT work as producing a changed version of one input tree, they don't build a tree from multiple sources. As XSLT make a good work, I googled on "tree manipulation language" to see if there is something useful here, at least to take inspiration from. TXL I found TXL, which seems backed by serious research. Unfortunately it doesn't come an an embeddable Java library. TXL scripts define an input grammar for building up the tree, then rules for creating / adding / moving / deleting nodes. It looks sweet for experimenting with languages. Tregex and Tsurgeon Tregex is a regex-like language for extracting nodes from a tree. Tsurgeon is Tregex extension for manipulating the trees extracted by Tregex. The Tregex language itself looks good. Given a tree like this:
    NP
  / |  \
NP  PP  PP
The following Tregex expression means something like "Call 'n' the node that has a 'NP' node as a parent and with a sibling which has a 'PP' node as a sibling while this last node should be called 'pp2' by the way." :
NP=np < (NP $+ (PP $+ PP=pp2))
Then here is a Tsurgeon expression :
adjoin (NP=new_np NP@) np
move pp2 >- new_np
Applying the Trex and Tsurgeon expressions on the tree above give a new tree like this:
      NP
     /  \  
   NP    PP
  / \ 
NP   PP
Tregex and Tsurgeon are bundled in a Java library. I don't like their design of the Tree class I don't discuss the fact their Tree class should be mutable, at least because this may save memory with some algorithms. But the Tree class is a concrete class declaring more than one hundred of public methods. Most of them could have been part of a utility class. You are dishonestly invited to compare to Novelang's Tree definition! Conclusion As far as I see, there is much work done on tree manipulation languages. While they enable to do anything on a given tree, they are a very special jargon that won't help non-geeks to create Novelang Book files. So I should find other areas to investigate if I want new ideas in this area.

No comments: