As a background task I'm thinking on how to improve the Book definition. By now, Novelang books are a sequence of imperative tasks:
insert file:the-preamble.nlp
$style=preamble
insert file:.
$recurse
$createchapter
mapstylesheets
$pdf=my-pdf-stylesheet.xsl
Apart of the
mapstylesheets
command,
insert
command is all about inserting trees in the main document. Default behavior is to create one tree from a Part file.
The
$recurse
command creates many trees (one per Part file) from given directory.
The
$style
command creates a
style
node right under the root of the inserted tree.
One of the planned features of Novelang is to build Books from
identifiers, which are subtrees.
As it's all manipulating trees one can note that XSLT transforming a Book tree into a PDF or a HTML document are all about manipulating trees, too, but they act at a different stage: once the content of the document is well-defined. And XSLT work as producing a changed version of one input tree, they don't build a tree from multiple sources.
As XSLT make a good work, I googled on "tree manipulation language" to see if there is something useful here, at least to take inspiration from.
TXL
I found
TXL, which seems backed by serious research. Unfortunately it doesn't come an an embeddable Java library.
TXL scripts define an input grammar for building up the tree, then rules for creating / adding / moving / deleting nodes. It looks sweet for experimenting with languages.
Tregex and Tsurgeon
Tregex is a regex-like language for extracting nodes from a tree. Tsurgeon is Tregex extension for manipulating the trees extracted by Tregex.
The Tregex language itself looks good. Given a tree like this:
NP
/ | \
NP PP PP
The following Tregex expression means something like "Call 'n' the node that has a 'NP' node as a parent and with a sibling which has a 'PP' node as a sibling while this last node should be called 'pp2' by the way." :
NP=np < (NP $+ (PP $+ PP=pp2))
Then here is a Tsurgeon expression :
adjoin (NP=new_np NP@) np
move pp2 >- new_np
Applying the Trex and Tsurgeon expressions on the tree above give a new tree like this:
NP
/ \
NP PP
/ \
NP PP
Tregex and Tsurgeon are bundled in a Java library. I don't like their design of the
Tree
class I don't discuss the fact their
Tree
class should be mutable, at least because this may save memory with some algorithms. But the
Tree
class is a concrete class declaring more than one hundred of public methods. Most of them could have been part of a utility class. You are dishonestly invited to compare to
Novelang's Tree definition!
Conclusion
As far as I see, there is much work done on tree manipulation languages. While they enable to do anything on a given tree, they are a very special jargon that won't help non-geeks to create Novelang Book files. So I should find other areas to investigate if I want new ideas in this area.
No comments:
Post a Comment