The Novelang blog: June 2008

2008-06-30

Novelang-0.5.0 is here!

I just posted about this first release on a French mailing list gathering friends and former colleagues. I received valuable feedback and ideas.

Factory templates: come with nice PDF / HTML templates.
Split output: multiple HTML files from one single Book.
Write something like a roadmap somewhere (this is what this post looks like, right?).
Links inside documents.
Point out differences with other wikis.

Nice factory templates mean the ability to switch between stylesheets. For now the stylesheet is determined by the extension (MIME type) of the requested document. Split output means adding some metadata:

On the stylesheet itself. Does it support split output?
On the document to be rendered: it's nice to know where it is located inside the list of all subdocuments for generating a navigation bar.

2008-06-26

How to log in on SourceForge for website maintenance

SourceForge documentation is hard to find. So here is the link: http://sourceforge.net/docman/?group_id=1

And this is what I'm looking for most of the time, for website maintenance:

ssh -l caillette shell.sf.net
cd /home/groups/n/no/novelang/htdocs

Aside from this, it's nice that SourceForge supports OpenID. But it works in parallel with the Unix account, which must still be used for ssh sessions.

Novelang project ranked first on Google!

"Novelang" also appears to be a word of Tagalog, the principal language in the Philippines. It seems to mean "novel".

2008-06-25

Now I'm working hard on a first public release. I don't expect many people to use Novelang because it lacks many features and is still subject to change, but "think big, start small". Here comes the question of versioning. APR versioning provides a good conceptual framework for thinking about how to set version numbers:

Versions are denoted using a standard triplet of integers: MAJOR.MINOR.PATCH. The basic intent is that MAJOR versions are incompatible, large-scale upgrades of the API. MINOR versions retain source and binary compatibility with older minor versions, and changes in the PATCH level are perfectly compatible, forwards and backwards.
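To make the quoted rule concrete, here is a minimal Java sketch of the compatibility check it implies. The Version class and its canReplace method are hypothetical names made up for this illustration; they are not part of Novelang or APR.

```java
/** Hypothetical MAJOR.MINOR.PATCH triplet, following the APR rule quoted above. */
final class Version {
    final int major;
    final int minor;
    final int patch;

    Version(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses a string such as "0.5.0". */
    static Version parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH, got: " + text);
        }
        return new Version(
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2]));
    }

    /**
     * True if something written against {@code older} should keep working when
     * this version replaces it: same MAJOR, and MINOR not lower.
     * PATCH is ignored because patch levels are compatible both ways.
     */
    boolean canReplace(Version older) {
        return major == older.major && minor >= older.minor;
    }
}
```

With this reading, 0.6.0 can replace 0.5.3, but 0.5.0 cannot replace 0.6.0 and 1.0.0 cannot replace 0.9.0.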

Novelang is not unlike a code library: you provide sources (documents) and I deliver software interpreting them. The specification of the software may change, including breaking backward compatibility (which I'll try to avoid, if only because of my own Novelang documents). But wait, there is more. Since the Book feature was checked in, Novelang has become a platform running arbitrary Java code that foreign developers may plug in. The same compatibility problems occur on a different dimension. How to reflect changes on those two dimensions? Should I mix them, just telling whether backward compatibility is maintained for both, since users are just users, whether they are developers or text writers? I don't need to find an answer right now but the solution may not be obvious, so I have to start thinking about it now. Do you know about other tools having the same problem?

2008-06-17

Book feature just started working!

Oh yes I just committed the Book feature and it looks good! A book file looks like this:

insert file:relative/path/part1.nlp

section
  This Section was defined from inside a Book file!

insert file:part2.nlp

This means: "fetch a pair of Part files and define some Section between them". After launching the HTTP daemon you can see the resulting document at this URL: http://localhost:8080/samples/book.html. First it attempts to load a samples/book.nlb file ("nlb" like NoveLang Book). If the file doesn't exist, it attempts to load its .nlp counterpart ("nlp" like NoveLang Part). If the .nlp file doesn't exist either, the preview fails with an error message.

The syntax for Book files is easy to extend, as functions like section or insert are not defined inside the grammar. Instead, they are Java classes that receive arguments and apply to a document tree. For Novelang, a Book is made of function calls. The general form of a function call is a function name, then a URL or an identifier or a paragraph body (all optional), then any number of arguments, which can be identifiers or words preceded by a dollar sign ("$"). Here is the formal definition as an ANTLR grammar fragment:

functionCall
  : word 
     (   smallBreak url 
       | smallBreak headerIdentifier 
       | WHITESPACE? SOFTBREAK WHITESPACE? paragraphBody
     )?
    ( mediumBreak valuedArgument )*
  ; 

valuedArgument
  : ( PLUS_SIGN? blockIdentifier )
  | ( DOLLAR_SIGN word )
  ;
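Going back to the lookup order described before the grammar (Book file first, then Part file, then failure), here is a minimal Java sketch of it. SourceLocator and its locate method are hypothetical names for illustration only. The existence check is passed in as a predicate so the sketch doesn't depend on a real file system.

```java
import java.util.function.Predicate;

/** Hypothetical helper: picks the source file behind a requested document. */
final class SourceLocator {
    /**
     * For a request such as "samples/book.html", returns "samples/book.nlb"
     * if it exists, else "samples/book.nlp" if it exists, else fails.
     */
    static String locate(String requestedPath, Predicate<String> exists) {
        int slash = requestedPath.lastIndexOf('/');
        int dot = requestedPath.lastIndexOf('.');
        // Strip the extension only if the dot belongs to the last path segment.
        String base = dot > slash ? requestedPath.substring(0, dot) : requestedPath;
        String book = base + ".nlb";
        if (exists.test(book)) {
            return book;
        }
        String part = base + ".nlp";
        if (exists.test(part)) {
            return part;
        }
        throw new IllegalArgumentException("No .nlb or .nlp source for " + requestedPath);
    }
}
```

A real daemon would probably pass something like `path -> new java.io.File(contentRoot, path).isFile()` as the predicate, where contentRoot is whatever directory it serves from.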

In order to keep the grammar simple, there must be a line break between the function name and the paragraph body. This constraint is OK as it makes the Book file more readable. A paragraph body is the same as for a Part: a sequence of words, punctuation signs, and blocks made of quotes, parentheses, square brackets and interpolated clauses (all can be nested). A header identifier is a reference to some block of text defined elsewhere (more on this here). The dollar-prefixed words indicate some other options.

On the Java side, functions are defined by implementing the FunctionDefinition interface. This interface defines the contract for instantiating a FunctionCall, which captures all the values for a given call (like the file:part2.nlp parameter value in the example above). The FunctionRegistry knows FunctionDefinitions by their names. As a Book evaluates itself, it receives the result of parsing a Book file: basically an Abstract Syntax Tree with FUNCTION_CALL nodes. For each node it attempts to get a FunctionDefinition instance, then a FunctionCall object. The FunctionDefinition checks parameter sanity and consistency, for a fail-fast approach.

The FunctionCall operates on an immutable tree-like structure called Treepath. A Treepath indicates the relative position of a Tree inside another "bigger" Tree. It solves the problem of changing the value of an immutable Tree (by creating an evolved copy) while the original Tree is the child of some other immutable Tree. Immutable objects make it easier to get safer and cleaner code.

Before implementing more functions, I'll play a bit with the insert and section functions and investigate error cases like having a broken Part file.
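The Treepath idea is easier to grasp with code. Here is a minimal Java sketch of the general technique (path copying over immutable trees). The classes and method names below are assumptions made for this illustration; they do not reproduce Novelang's actual Tree and Treepath classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Immutable tree node: some text plus an unmodifiable list of children. */
final class Tree {
    final String text;
    final List<Tree> children;

    Tree(String text, Tree... children) {
        this(text, Arrays.asList(children));
    }

    private Tree(String text, List<Tree> children) {
        this.text = text;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    /** Returns a copy of this node whose child at index is newChild. */
    Tree withChild(int index, Tree newChild) {
        List<Tree> copy = new ArrayList<>(children);
        copy.set(index, newChild);
        return new Tree(text, copy);
    }
}

/** Position of a Tree inside a bigger Tree: a chain of steps back to the root. */
final class Treepath {
    final Treepath parent;    // null when this path is the root itself
    final int indexInParent;  // -1 for the root
    final Tree tree;

    private Treepath(Treepath parent, int indexInParent, Tree tree) {
        this.parent = parent;
        this.indexInParent = indexInParent;
        this.tree = tree;
    }

    static Treepath root(Tree tree) {
        return new Treepath(null, -1, tree);
    }

    /** Extends the path down to the child at the given index. */
    Treepath child(int index) {
        return new Treepath(this, index, tree.children.get(index));
    }

    /** Replaces the Tree at the end of the path by rebuilding every ancestor. */
    Treepath replaceEnd(Tree newTree) {
        if (parent == null) {
            return root(newTree);
        }
        Tree newParentTree = parent.tree.withChild(indexInParent, newTree);
        return new Treepath(parent.replaceEnd(newParentTree), indexInParent, newTree);
    }

    Tree rootTree() {
        return parent == null ? tree : parent.rootTree();
    }
}
```

Only the nodes along the path are copied; untouched subtrees are shared between the original and the evolved copy, and the original root stays exactly as it was.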

2008-06-12

Images

My goal is to not introduce syntax constructs into Novelang for every need, but handling images with paired delimiters quickly becomes a mess. So image: syntax is fine but it collides with what a plain Paragraph can be made of. Let's twist it a bit using a double colon for now.

image::path/file.ext

As with URLs it seems more readable to force image declaration to start at the beginning of a line. The all-on-the-same-line requirement comes with the same advantages: it's easy to add metadata.

image::path/file.ext "title"

There are several ways to reference an image: the image itself for displaying it, or its name. Many documents have textual references to images (like Fig. 5). Numbering must be automated, and text must reference images through a symbolic name. It's a clear need for identifiers. Defining identifiers works the same as for Paragraphs:

  \\image-identifier
image::path/file.ext "title"

Referencing the image is done with a new image-ref:: keyword, that should appear at the start of a new line, too:

Have a look at
image-ref::\\image-identifier
.

Ok I'm not happy with the full stop coming on its own line but I'm just trying to be honest so don't hit me. Depending on numbering scheme this should translate to something like:

Have a look at Figure 5-12.
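Just to fix ideas on automated numbering, here is a minimal Java sketch that turns image identifiers into labels like the one above. It assumes a "chapter-rank" numbering scheme; FigureNumbering and its methods are hypothetical names, nothing of this exists in Novelang yet.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry giving each declared image a "Figure C-N" label. */
final class FigureNumbering {
    private final Map<String, String> labels = new HashMap<>();
    private final Map<Integer, Integer> countPerChapter = new HashMap<>();

    /** Records an image declaration; must be called in document order. */
    void declare(int chapter, String identifier) {
        if (labels.containsKey(identifier)) {
            throw new IllegalArgumentException("Duplicate image identifier: " + identifier);
        }
        // Rank of the image inside its chapter, starting at 1.
        int rank = countPerChapter.merge(chapter, 1, Integer::sum);
        labels.put(identifier, "Figure " + chapter + "-" + rank);
    }

    /** Returns the text replacing an image-ref:: to the given identifier. */
    String reference(String identifier) {
        String label = labels.get(identifier);
        if (label == null) {
            throw new IllegalArgumentException("Unknown image identifier: " + identifier);
        }
        return label;
    }
}
```

Failing on an unknown or duplicate identifier keeps broken references from silently reaching the rendered document.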

Image display can happen through a reference, too:

image::\\image-identifier

With all that in mind, here is how to use images in big documents. Define one Part file or more with all images, then reference them as needed. The Part files containing images may define Sections with their own absolute identifiers corresponding to some themes or categories, and all images inside the Sections have a relative identifier. Another advantage of having all images in separate files is to get an overview for free.

More on identifiers

Since I've been blogging on identifiers, something has come to my mind. There are absolute identifiers (for Chapters and Sections) and relative ones (for paragraphs). The syntax should reflect this. What about:

  \\absolute-identifier

  \relative-identifier

The reverse solidus (well known to Windows users) is convenient for expressing path-like structures, while not conflicting with the solidus used in URLs. A relative identifier requires an absolute identifier. For a Part like this:

== Section
  \\section-identifier

  \paragraph-identifier-1
Blah. 

  \paragraph-identifier-2
Blah blah. 

  \paragraph-identifier-3
Blah blah blah.

Here is a valid reference to the paragraph:

\\section-identifier\paragraph-identifier

The absolute identifier may be carried by the context. As described in the Book post, only some Paragraphs in a Section may be included so referencing the Section is a way to define such a context.

expand \\section-identifier
  \paragraph-identifier-1
  \paragraph-identifier-3
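The resolution rule (a relative identifier needs an absolute one as context) can be sketched in a few lines of Java. This is only an illustration under the syntax proposed in this post; the Identifiers class is a hypothetical name.

```java
/** Hypothetical helper resolving identifiers written with reverse solidus. */
final class Identifiers {
    /** Absolute identifiers start with a double reverse solidus. */
    static boolean isAbsolute(String identifier) {
        return identifier.startsWith("\\\\");
    }

    /**
     * Resolves a reference against the absolute identifier carried by the context.
     * An absolute reference is returned as is; a relative one is appended
     * to the context.
     */
    static String resolve(String context, String reference) {
        if (isAbsolute(reference)) {
            return reference;
        }
        if (!reference.startsWith("\\")) {
            throw new IllegalArgumentException("Not an identifier: " + reference);
        }
        if (context == null || !isAbsolute(context)) {
            throw new IllegalArgumentException(
                "A relative identifier requires an absolute identifier: " + reference);
        }
        return context + reference;
    }
}
```

In the expand example, the context is the Section identifier, so each relative paragraph identifier resolves to the full path-like form shown earlier.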

Ok the syntax has slightly evolved since but you get the idea. The reverse-solidus based notation looks good to me. Double path separator for an absolute path is a well-known pattern (URL spec). I like the idea of asking a bit more work for an absolute identifier, as they should be used carefully in order to avoid global namespace pollution. With that syntax we now support a uniform relationship model between Chapters, Sections and Paragraphs.

Unstyled inline literal and collateral damage

I'm not satisfied with the need for escaping characters one by one in situations like an acronym. There should be whole blocks of text looking like normal text, but disabling most transformations. Could be like:

I want my ``T.L.A``.

The double grave accent makes the text inside quite readable but there is a consistency issue with the rest of the syntax. The blockquotes use double square brackets and the block literal uses triple square brackets, and it looks good.

To keep consistent, the unstyled inline literal must use single grave accents and the code-like inline literal must use double grave accents ("plus one" rule).

So rewriting examples in the previous post gives this:

I want my `T.L.A.`
This is double slash ``//`` delimiter.

I'm starting to like it because the unstyled inline literal is just eye candy while the code-like inline literal has a stronger meaning that is better carried by a thicker delimiter.

2008-06-09

Novelang syntax for Parts

I haven't documented the Novelang syntax yet but there are already plenty of things to change. WikiCreole's reasoning and Markdown give a great start for yet another discussion on Wiki markup.

In this document, character names refer to the Unicode specification.

Headings

The chapter marker should be a double equals sign, and the section marker a triple one. There is a single character to know about and it's dedicated (asterisks are used by bold and unordered lists, see below). And it's eye-catching without the crippling effect of many asterisks.

== Chapter

=== Section

This makes the markup "scalable" in the sense it becomes easy to support a subsection level (though it may reflect that Parts are becoming too complex).

Identifiers and Tags may decorate Headers when they appear just below the Header declaration (one linebreak away). Header identifiers are prefixed by an ampersand. Tags are prefixed by a commercial at.

== Chapter
  &identifier @tag

Paragraphs

Paragraphs are just lines of text. They are delimited from the rest by two linebreaks or more (aka hardbreak). They support identifiers and tags immediately above (one linebreak away, no more). Paragraph identifiers are prefixed with a plus sign immediately followed by an ampersand, to indicate they don't work the same as Header identifiers, which are global.

This is one paragraph, continuing
on this line.

  +&identifier
This is another paragraph with an identifier.

Words are any sequence of letters and numbers. There can be a single dash between two letters or numbers. Apostrophe is a word delimiter.

C'mon, just a two-worded word!
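The word rule is simple enough to be checked with a regular expression. Here is a minimal Java sketch; it is not the real ANTLR grammar, and the Words class is a hypothetical name. It only shows that the apostrophe cuts "C'mon" into two words while the single dash keeps "two-worded" whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical word splitter following the rule stated above. */
final class Words {
    // Letters and digits, with single dashes allowed between them.
    private static final Pattern WORD =
        Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    /** Returns the words of the text, in order of appearance. */
    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }
}
```

Applied to the sample line above, this yields C, mon, just, a, two-worded, word.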

There are some combinations which require character escaping, like acronyms with dots. That looks messy but trying to turn this into a generic case seems to make things even worse.

I want my T~.~L~.~A (Three-Letter Acronym)!

Character escaping

I've been discussing character escaping and now I think that there should be no difference between single and multiple character escape, in order to avoid confusion.

Ampersand: ~&~
O and E ligatured: ~OE~ 
Tilde: ~tilde~

Backslash character was an option but I like the tilde character as it carries the meaning of something linked to the rest.
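Here is how the tilde-based escaping could be interpreted, as a minimal Java sketch. The Escapes class and its table of names are assumptions for illustration: only the three escapes listed above (plus the dot from the acronym example) are known for sure, and a single escaped character is taken to stand for itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical interpreter for ~name~ escapes. */
final class Escapes {
    private static final Pattern ESCAPE = Pattern.compile("~([^~\\s]+)~");
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("OE", "\u0152");  // O and E ligatured
        NAMED.put("tilde", "~");
    }

    /** Replaces every escape by the character it stands for. */
    static String unescape(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String replacement = NAMED.get(name);
            if (replacement == null) {
                if (name.length() != 1) {
                    throw new IllegalArgumentException("Unknown escape: " + name);
                }
                replacement = name;  // a single escaped character stands for itself
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Single and multiple character escapes go through the same code path, which is the point of using one delimiter for both.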

Inline literal

Inline literal requires a delimiter with little visual clutter, available on most keyboards, and generating minimal conflict with casual use. The grave accent (backquote) is such a gem.

This is double slash `//` delimiter.

Tilde was a serious candidate but it has a better meaning for escaping, while the grave accent looks more like quotation.

Bold

Double asterisk looks good and is consistent with italics' double slash.

This is **bold**.

Subscript and superscript

A delimiter made of a single character is more concise than a double one. As it takes less visual space it reflects semantically weaker meaning. Circumflex means superscript and low line (underscore) means subscript.

L^A^T_E_X is expected to render as L^AT_EX.

Supporting subscript and superscript will be a mess because whether it is attached to a word or not does matter for the rendering.
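For the simple case where everything is attached to a word, here is a minimal Java sketch of a naive translation to HTML. The Scripts class is a hypothetical name, and this deliberately ignores the attached/detached problem mentioned above as well as HTML escaping.

```java
/** Hypothetical naive renderer for ^superscript^ and _subscript_. */
final class Scripts {
    /** Wraps circumflex-delimited text in sup and low-line-delimited text in sub. */
    static String toHtml(String text) {
        return text
            .replaceAll("\\^([^\\^]+)\\^", "<sup>$1</sup>")
            .replaceAll("_([^_]+)_", "<sub>$1</sub>");
    }
}
```

On the sample above, `Scripts.toHtml("L^A^T_E_X")` gives `L<sup>A</sup>T<sub>E</sub>X`.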

Links

I often got annoyed when copying a URL from the middle of a line. Now I'm reinventing a better world and I want to force the URL to appear at the beginning of the line.

Many Wikis have messy syntax for URLs / URIs because of the related title and text. This can be avoided by some contextualization: the quotes immediately following a URL become the text to show. Same for the link title, which could be a parenthesized block.

The URL here belongs to current paragraph:
http://novelang.sf.net "Go there" (Novelang home page)

Then HTML output is expected to look like this:

The URL here belongs to current paragraph: Go there

If the quoted text really should appear as quoted text then a line break cuts it away from the URL while keeping it inside the paragraph.

URLs are an easy case as they start with a scheme ("http:" or "file:") but URIs are harder to handle. They are left out for the moment.
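The contextualization described in this section (URL at the start of the line, then optional quoted text, then optional parenthesized title) can be sketched with one regular expression. This is a minimal Java illustration, not Novelang's parser; LinkLines is a hypothetical name, and accepting "https:" along with "http:" and "file:" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical renderer for a line starting with a URL. */
final class LinkLines {
    // URL, then optional "text to show", then optional (link title).
    private static final Pattern LINK = Pattern.compile(
        "((?:https?|file):\\S+)(?:\\s+\"([^\"]*)\")?(?:\\s+\\(([^)]*)\\))?\\s*");

    /** Returns the HTML anchor for the line, or null if it is not a link line. */
    static String toHtml(String line) {
        Matcher matcher = LINK.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        String url = matcher.group(1);
        String text = matcher.group(2) != null ? matcher.group(2) : url;
        StringBuilder html = new StringBuilder("<a href=\"").append(escape(url)).append('"');
        if (matcher.group(3) != null) {
            html.append(" title=\"").append(escape(matcher.group(3))).append('"');
        }
        return html.append('>').append(escape(text)).append("</a>").toString();
    }

    private static String escape(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

When the quoted text is missing, the URL itself is shown, which seems the least surprising default.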

Lists

Unordered lists have items starting with an asterisk.

Ordered lists have items starting with a number sign.

Sublevels could repeat the list item sign but a level 2 unordered list item marker would clash with the bold marker. The trick is to use indentation.

* Item 1
  * Item 1.1
  * Item 1.2
* Item 2

And it goes the same for ordered lists. Some text editors recognize indentation and perform wrapping under the first indented character.
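To check that indentation is enough to recover the nesting, here is a minimal Java sketch turning the unordered list above into nested HTML. It assumes two spaces per level and handles only the asterisk marker; UnorderedLists is a hypothetical name.

```java
/** Hypothetical renderer for indentation-based unordered lists. */
final class UnorderedLists {
    /** Renders list item lines as nested ul/li elements. */
    static String toHtml(String... lines) {
        StringBuilder html = new StringBuilder();
        int depth = 0;  // number of lists currently open
        for (String line : lines) {
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') {
                indent++;
            }
            if (indent == line.length() || line.charAt(indent) != '*') {
                throw new IllegalArgumentException("Not a list item: " + line);
            }
            int level = indent / 2 + 1;  // assumption: two spaces per level
            if (level > depth + 1) {
                throw new IllegalArgumentException("Level skipped: " + line);
            }
            if (level > depth) {
                html.append("<ul>");  // first item of a deeper list
                depth++;
            } else {
                while (depth > level) {  // close deeper lists left open
                    html.append("</li></ul>");
                    depth--;
                }
                html.append("</li>");  // close the previous sibling
            }
            html.append("<li>").append(line.substring(indent + 1).trim());
        }
        while (depth > 0) {
            html.append("</li></ul>");
            depth--;
        }
        return html.toString();
    }
}
```

Ordered lists would work the same way with the number sign and an ol element.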

Blockquotes

A pair of double angled brackets looks fine for defining blockquotes. They must be at the start of the line and alone on the line where they appear.

<< 
This is a blockquote. 
>>

I try to avoid closing delimiters whenever possible, but the alternative approach, which is to use a special character at the beginning of a paragraph, would require editing each paragraph when pasting foreign text.

Literal

Literal is text appearing the same as in the Novelang markup. It appears inside triple angled brackets; opening and closing brackets must be at the very beginning of the line and their trailing space is not rendered. To render triple angled brackets at the beginning of a line, character escaping is required. But such a combination is quite rare and shouldn't be a hassle.

<<< 
This is litteral, 
  preserving indentation.
>>>

The need of a closing delimiter is a no-brainer here.

Tables

As the Book feature supports including other files' content, there should be a function to read a CSV or whatever and display it nicely. So we don't pollute the markup with a feature that bloats other wikis' syntax with more and more complex style stuff.

Features I'm happy with

Interpolated clauses must be more than a special character like —. They must be declared as blocks (with opening and closing), so rendering can insert non-breakable spaces after opening dash and before closing dash. And the closing may be hinted to be not-renderable.

Interpolated clause delimiter is double dash. A dash then a low line define a non-renderable ("silent") closing.

Interpolated clauses -- like this one -- do rock.

Silent ends rock, too -- yeah-_.
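Here is a minimal Java sketch of what the rendering of interpolated clauses could do with this block information. It is only one possible interpretation (em dashes with non-breaking spaces inside, nothing at all for a silent closing); InterpolatedClauses is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical renderer for double-dash interpolated clauses. */
final class InterpolatedClauses {
    // Opening "--", clause body, then a normal "--" or silent "-_" closing.
    private static final Pattern CLAUSE = Pattern.compile("--\\s*(.+?)\\s*(--|-_)");
    private static final String DASH = "\u2014";  // em dash
    private static final String NBSP = "\u00A0";  // non-breaking space

    /** Replaces each clause by its typographic form. */
    static String render(String text) {
        Matcher matcher = CLAUSE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String rendered = DASH + NBSP + matcher.group(1);
            if (matcher.group(2).equals("--")) {
                rendered += NBSP + DASH;  // a silent closing renders nothing
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(rendered));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With a single special character instead of a block, the renderer couldn't know on which side of the dash the non-breaking space belongs.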

Parentheses, brackets and quotes are blocks, too, using conventional characters.

(parenthesis) [brackets] "quotes"

Italics use the double slash delimiter. A double character looks "big" so it discourages overuse.
This is //italics//.

Punctuation signs come with no surprise.
Question mark ? Exclamation mark ! Colon : Semicolon; Comma, Ellipsis... Full stop.

Comments. Because of italics, the double slash made popular by Java and C++ is not an option. In order to reduce confusion, the corresponding slash-asterisk combination cannot be used either because, for most people, they are both parts of the same set of conventions.
So line comments start with a double percent sign, and block comments are delimited with double curly brackets.
%% Single-line comment.

{{ Block 
   comment }}

Single curly brackets may have a special meaning but they are free for now.
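For the record, removing both kinds of comments takes two regular expressions. This is a minimal Java sketch, assuming comments never appear inside literal blocks (a real parser would have to care about that); Comments is a hypothetical name.

```java
/** Hypothetical comment remover for Novelang-like sources. */
final class Comments {
    /** Removes {{ block comments }} first, then %% line comments. */
    static String strip(String source) {
        return source
            .replaceAll("(?s)\\{\\{.*?\\}\\}", "")  // block comments may span lines
            .replaceAll("(?m)%%.*$", "");           // line comments run to end of line
    }
}
```

Linebreaks following a line comment are kept, so paragraph boundaries don't move.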