I have started refactoring the Novelang grammar, and it's a big job. Along the way I switched to ANTLR-3.1.1, the latest version of ANTLR. Development happens in the ANTLR-3.1.1 branch.

ANTLR is the greatest tool for generating parsers. Since version 3.1 it supports grammar imports, which means a complex grammar can be split into smaller files. With careful design it would be possible to let third-party developers extend the Novelang grammar, as ANTLR supports rule overriding. Alas, ANTLR-3.1.1 doesn't work well with multiple import levels, so I'm keeping one huge grammar file for now. You can have a look at the current Novelang grammar (master branch).

For the end user, the biggest feature brought by this refactoring is support for "monoline" text items. Basically, these are items delimited by a pair of line breaks that may stand in the middle of a paragraph. For now, Novelang only recognizes URLs when they are delimited by two pairs of line breaks:
(This is some paragraph before the URL.)

http://novelang.sourceforge.net

(This is some paragraph after the URL.)
Recognizing a URL as a "monoline" text item would allow something like this:
This is a paragraph.
http://with-url-inside.com
...Same paragraph, continued.
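As a rough illustration of the idea, here is a minimal Python sketch that tags each line of a paragraph as plain text or as a URL. It is not how Novelang does it (the real work happens in the ANTLR grammar); the names `URL_LINE` and `split_paragraph` are hypothetical, and the pattern assumes a URL fills a whole line and starts in the first column.

```python
import re

# Assumption: a "monoline" URL occupies a whole line and starts at column 0.
# The pattern is a deliberately loose stand-in for real URL syntax.
URL_LINE = re.compile(r"^[a-z][a-z0-9+.-]*://\S+$")


def split_paragraph(paragraph):
    """Return one (kind, text) pair per line, where kind is 'url' or 'text'."""
    items = []
    for line in paragraph.split("\n"):
        kind = "url" if URL_LINE.match(line) else "text"
        items.append((kind, line))
    return items


example = (
    "This is a paragraph.\n"
    "http://with-url-inside.com\n"
    "...Same paragraph, continued."
)

for kind, text in split_paragraph(example):
    print(kind, text)
```

A URL that is indented or preceded by other words on the same line stays plain text in this sketch, which matches the start-of-line requirement.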
Keeping the URL inside the paragraph is a lot more natural. The URL must still start at the beginning of a line, because that makes it much easier to copy from the text editor. I previously discussed URL syntax here. The Coding Horror blog has a nice post that should deter anyone from including URLs in plain text with no machine-understandable delimiter.

The full-blown URL syntax supports URL decorations like this:
Go to
"Novelang website"
[Novelang website on Sourceforge.net]
http://novelang.sourceforge.net
and see all useful links.
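To make the example concrete, here is a minimal Python sketch of a post-parse pass that folds a quoted line and a bracketed line into the URL that follows them. It assumes the quoted line is the display text and the bracketed line is the alternate text, and it works on a flat list of lines instead of a real AST. The function names and the dictionary node shape are hypothetical, not Novelang's.

```python
def is_url(line):
    # Simplification (assumption): anything with "://" and no spaces is a URL.
    return "://" in line and " " not in line


def attach_decorations(lines):
    """Fold optional "display" and [alternate] lines into the next URL line."""
    nodes = []
    pending = []  # (key, value, raw line) for decorations not yet attached

    def flush_as_text():
        # Decorations that are not followed by a URL remain ordinary text.
        for _, _, raw in pending:
            nodes.append({"type": "text", "value": raw})
        pending.clear()

    for line in lines:
        if len(line) >= 2 and line[0] == '"' and line[-1] == '"':
            pending.append(("display", line[1:-1], line))
        elif len(line) >= 2 and line[0] == "[" and line[-1] == "]":
            pending.append(("alternate", line[1:-1], line))
        elif is_url(line):
            node = {"type": "url", "href": line}
            for key, value, _ in pending:
                node[key] = value
            pending.clear()
            nodes.append(node)
        else:
            flush_as_text()
            nodes.append({"type": "text", "value": line})
    flush_as_text()
    return nodes


decorated = [
    "Go to",
    '"Novelang website"',
    "[Novelang website on Sourceforge.net]",
    "http://novelang.sourceforge.net",
    "and see all useful links.",
]

for node in attach_decorations(decorated):
    print(node)
```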
The quoted and bracketed text blocks are optional; they provide the display text and the alternate text. I can see no reasonable way to support them at the grammar level. The best way to handle them is at the tree-mangling stage (reordering the Abstract Syntax Tree generated by the parser). This means the parser-generated AST should include nodes describing whitespace and line breaks.

Support for monoline items is helpful (necessary?) for supporting lists. As previously discussed, here is how I want to write a list:
Here is a list on two levels:
* First item
* Second item
  * First subitem
  * Second subitem
* Third item
...And the paragraph continues here.
As with URL decorations, the grouping of list items is done at the tree-mangling stage. Because indentation matters, whitespace nodes in the AST should record their width. A list that can appear inside a paragraph will be called a small list. There is also a need for another kind of list whose items are paragraphs, to be called a big list. The symbols for designating list items ("*", "#", "-", "---", ...) are left to another discussion.
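The indentation-based grouping of list items could be sketched as follows. This is only an illustration in Python under stated assumptions: every item starts with "* ", subitems are indented with spaces (two here, though any larger width than the parent works), and the input is a list of raw lines instead of AST nodes. The name `group_items` and the dictionary shape are hypothetical.

```python
def group_items(lines):
    """Nest "* item" lines into a tree according to their indentation width."""
    root = []
    # Stack of (indent width, children list); the sentinel -1 is the list root.
    stack = [(-1, root)]
    for line in lines:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Assumption: the marker is always "* ", so drop two characters.
        item = {"text": stripped[2:], "children": []}
        # Climb back up until the top of the stack is a shallower level.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(item)
        stack.append((indent, item["children"]))
    return root


small_list = [
    "* First item",
    "* Second item",
    "  * First subitem",
    "  * Second subitem",
    "* Third item",
]

tree = group_items(small_list)
for item in tree:
    print(item["text"], [child["text"] for child in item["children"]])
```

Because only the relative width matters here, the whitespace nodes need to carry nothing more than their length for a pass like this to work.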