2008-05-17

The Book feature

I'm thinking about the new Book feature: a way to define how to fetch different parts from here and there and assemble them into something called a Book. Splitting the work into small files is more convenient for version tracking and collaborative work, gives more accurate search, and so on. A Book is a separate file with its own syntax. It's the perfect place to hack the Parts in many ways without polluting their structure.

What kind of hacking? Parts are parsed into an AST (Abstract Syntax Tree) where Chapters, Sections and Words are well-identified nodes. Nodes can be added and removed, then handled at the rendering stage. One such hack is word counting: a function walks recursively through the whole tree, counting occurrences of the WORD node, then adds a METADATA node right under the root. Another hack is defining scoped styles (for one given Chapter or Section), so different pieces of text look different while the underlying Part files stay clean and simple. In effect, the Book file holds the internal "logic" of the final document, while Part files hold plain content.

That's how I like to think about Novelang: a platform to parse, hack, then render trees representing text documents. Sure, that sounds much like Cocoon, which does the same thing using XML. But I have already played with Cocoon, and its generic approach adds a lot of burden to things that should remain simple, like picking parts of different files with the XPath syntax.

Of course AST hacking should be open to any developer, with a syntax for embedding custom functions while keeping the Book syntax unchanged. That's an additional constraint when defining the Book syntax.

First, I'd like to insert a file as it is, i.e. insert its AST in place. For this I introduce the :: symbol (double colon), which announces a function name. The ::insert function takes a single parameter: a URL.
::insert file://foo.nlp
I can also insert several files at once. Since the insertion order may not be deterministic, I'll have to think about passing a parameter here or adding a global sort option, but I won't solve every problem now. So far we have this:
::insert file://*.nlp
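One way the ordering question could be settled is simply to sort the expanded file list. The sketch below is only an illustration in Python, not how Novelang does it: the function name resolve_insert_targets is made up, and it takes a plain filesystem pattern rather than a file:// URL.

```python
import glob


def resolve_insert_targets(pattern):
    """Expand a wildcard pattern into file paths in a stable order.

    Hypothetical helper: `pattern` is a plain filesystem pattern such as
    "parts/*.nlp", not the file:// URL used in the Book syntax.
    """
    # glob.glob returns matches in arbitrary order, so sort explicitly
    # to get the same insertion order on every run and every platform.
    return sorted(glob.glob(pattern))
```

A sort parameter on ::insert or a global sort option would then only have to choose the sort key.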
Now I want to pick some elements inside a Part. Chapters and Sections support identifiers. This is the way to say "load this file and become aware of all Chapters and Sections that have an identifier":
::import file://foo.nlp
Of course, wildcards are supported:
::import file://*.nlp
Now I create a Chapter that doesn't exist in any Part. As in the Part syntax, a Chapter title may contain punctuation signs, parentheses and so on:
*** Chapter title
I want to change the style for this very Chapter, so I use another function, which inserts a STYLE node in the AST. The style identifier must be a well-formed Word. The ::style function is understood as relative to the previous Chapter definition.
::style define-chapter-specific-style
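Here is a rough Python sketch of what "relative to the previous Chapter definition" could mean in terms of tree hacking. Everything in it is an assumption made for illustration: the Node class, the apply_style name, and the choice to hang the STYLE node under the Chapter as a child. It also does not check that the identifier is a well-formed Word.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Minimal stand-in for an AST node (hypothetical shape)."""
    name: str                      # e.g. "BOOK", "CHAPTER", "STYLE"
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def apply_style(book: Node, style_id: str) -> Node:
    """Attach a STYLE node to the most recently declared Chapter."""
    chapters = [child for child in book.children if child.name == "CHAPTER"]
    if not chapters:
        raise ValueError("::style must follow a Chapter definition")
    style = Node("STYLE", text=style_id)
    # Assumption: the STYLE node becomes a child of the Chapter it scopes.
    chapters[-1].children.append(style)
    return style
```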
I can include a Section the same way:
=== Section title
Now I add a Section whose identifier is "Some chapter or section without its title." (ending dot included). There should be one and only one Section or Chapter with this identifier across all imported Parts.
::add Some chapter or section without its title.
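The "one and only one" rule is easy to enforce at lookup time. A minimal Python sketch, assuming that ::import leaves behind a mapping from Part name to the identifiers found in it (both that mapping and the resolve_identifier name are hypothetical):

```python
from typing import Dict, List


def resolve_identifier(identifier: str, imported: Dict[str, List[str]]) -> str:
    """Return the name of the Part holding `identifier`.

    `imported` maps a Part name to the identifiers found in it, a made-up
    shape for whatever ::import collects. Raises LookupError unless exactly
    one Chapter or Section carries the identifier.
    """
    matches = [
        part
        for part, identifiers in imported.items()
        for candidate in identifiers
        if candidate == identifier
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one element identified by {identifier!r}, "
            f"found {len(matches)}"
        )
    return matches[0]
```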
If some Part element has a title, I may want to preserve it. Note the introduction of a function parameter (colon prefix):
::add :withtitle Some section
Now the tricky thing: adding Paragraphs. This implies finding a way to identify Paragraphs inside a Section, but that is about Part file syntax and won't be discussed here; let's just pretend it works. I must reference the Section (as above, except that the :withtitle parameter is illegal), then add a valued parameter for each Paragraph.
::add Some Section
 :p  Some Paragraph  
 :p+ Some other      
I'm proud of this syntax because it is unambiguous even though Section and Paragraph identifiers may contain any number of Words (identifiers are themselves parsed as Paragraphs). The AST for an invocation of this shape looks like this:
 (MACRO
   (MACRO_NAME (WORD add))
   (PARAGRAPH (WORD Some) (WORD Chapter) (WORD or) (WORD Section))
   (MACRO_PARAMETER p (PARAGRAPH (WORD paragraph-id1)))
   (MACRO_PARAMETER p+ (PARAGRAPH (WORD paragraph-id2)))
 )
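To convince myself that no closing delimiter is needed, here is a small Python sketch that turns such invocations into nested lists mirroring the tree above. It is only an illustration under my own assumptions: the parse_invocations and paragraph names are made up, an invocation simply ends where the next :: line starts, and same-line parameters such as :withtitle are not handled.

```python
def paragraph(words):
    """Wrap a list of words in a PARAGRAPH node, one WORD node each."""
    return ["PARAGRAPH"] + [["WORD", word] for word in words]


def parse_invocations(text):
    """Parse function invocations and their valued parameters.

    A line starting with '::' opens a new MACRO; a line starting with a
    single ':' adds a MACRO_PARAMETER to the MACRO opened last.
    """
    macros = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("::"):
            name, *target = line[2:].split()
            macro = ["MACRO", ["MACRO_NAME", ["WORD", name]]]
            if target:
                macro.append(paragraph(target))
            macros.append(macro)
        elif line.startswith(":") and macros:
            key, *words = line[1:].split()
            macros[-1].append(["MACRO_PARAMETER", key, paragraph(words)])
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return macros
```

Feeding it the ::add example with its :p and :p+ lines yields one MACRO with a MACRO_NAME, a PARAGRAPH for the Section identifier, and two MACRO_PARAMETER children.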
This syntax puts some burden on the function code, which has to interpret many things, but the goal was to provide a clean syntax with no delimiter, such as a closing brace, to mark the end of a list. More generally, the Book syntax relies on the idea of declaring a tree of known maximum depth, with well-known relationships between nodes (a Book contains Chapters, which contain Sections, which contain Paragraphs; and Functions may have parameters). So it's always clear which node a Function invocation refers to, and this avoids things like closing braces.
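To make the word-counting hack from the beginning of this post concrete, here is a minimal Python sketch. The Node class, the function names, the WORD_COUNT node and the choice to put METADATA first under the root are all assumptions made for illustration; only the WORD and METADATA node names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Minimal stand-in for an AST node (hypothetical shape)."""
    name: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def count_words(node: Node) -> int:
    """Walk the tree recursively and count WORD nodes."""
    own = 1 if node.name == "WORD" else 0
    return own + sum(count_words(child) for child in node.children)


def add_word_count(root: Node) -> Node:
    """Add a METADATA node holding the word count right under the root."""
    total = count_words(root)  # count before the tree is modified
    metadata = Node("METADATA", children=[Node("WORD_COUNT", text=str(total))])
    # Assumption: METADATA goes first so renderers find it early.
    root.children.insert(0, metadata)
    return metadata
```

The renderer can then pick up the METADATA node like any other node, without the Part files knowing anything about it.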
