The Novelang blog: July 2008

2008-07-31

Novelang-0.8.0 released!

Latest release of Novelang can be downloaded here. Coolest features of this release:

Directory listing.
New --port option for HttpDaemon.

See release notes (in the "Status" chapter) for details.

2008-07-29

Roadmap

Priorities are:

Bug fixing.
Documentation.
Error handling.
New features.

Stability is the key feature for adoption. Novelang will grow slowly, and I won't advertise it much until all the features I think necessary are present. I don't want to harass my testers with bugs or missing features I already know about.

Short-term improvements:

Report location on every error.
Try to recover from an unmatched delimiter (like a missing closing parenthesis).
Document some tricks.

Requirements for 1.0:

Better URLs: inside paragraphs, alt and text properties.
Fix potential punctuation problems.
Lists, ordered and unordered.
Images. May turn into an infinite feature list -- be careful!
Curly braces and angle brackets (used for footnotes and index entries).
Identifiers. These are needed for generating a table of contents.
Bold, small caps, superscript, subscript, a few levels of headers below section.
"Beautiful" PDF generation with a look inspired by Manning's books, a table of contents, an index and so on.

After release 1.0 there are two different paths to follow in parallel (will depend on feedback I suppose):

Improve content generation.
Open Novelang to other developers, as an embeddable / extensible software component.

Content generation improvements include:

Identifier-based inclusions.
Multi-document output (useful for generating web sites with several pages).
Resource scan for automatic copy in batch mode.
Some optimizations for speed / memory consumption.

Componentizing Novelang means:

Remove the dependency on Jetty and rely on the pure Servlet API.
Pluggable tree manipulation functions. For now such a mechanism is used internally but it deserves to be opened up. This would require some generics to support a custom Environment class.
Extensible grammars. Thanks to ANTLR 3.1 it will be possible to write a grammar reusing parts of an existing one. So developers could write their own additions to Novelang's standard grammar, while ANTLR performs all consistency checks. By the way, making a grammar evolve is not a quiet game.
Extensible grammars mean redefining the token list.
Component weaving with Guice. Guice is the coolest way to assemble components which are, basically, functions.
Configurable escape codes, and whitespace triggers.

Novelang as a component could be advertised through a plugin for a tool like Maven or Eclipse.

2008-07-28

Directory listing

I just finished the directory listing feature and it seems terribly addictive. Let's say you started Novelang HTTP Daemon from $NOVELANG_HOME. The sample directory is full of samples. Given a URL like http://localhost:8080/samples, your browser displays a page listing all Novelang documents, including those in subdirectories.

..
samples/
 samples/book.html
 samples/broken.html
samples/scanned/
 samples/scanned/book.html
 samples/scanned/file1.html
 samples/scanned/file2.html
samples/scanned/sub/
 samples/scanned/sub/file3.html
samples/showcase/
 samples/showcase/showcase.html
 samples/simple-structure.html
 samples/unicode-1.html

All lines are links to subdirectories and documents. There is also a link to the parent directory, as long as it's not a parent of the content root itself (for security reasons).

For a consistent URL scheme, the URL of a directory listing ends with "/". In the example above, the browser is redirected to http://localhost:8080/samples/ (note the trailing solidus).

There is another trick required by Safari. Safari doesn't take the MIME type of the document into account, just the resource extension. No matter how loud you say "it's HTML, stupid" it tries to download the file instead of displaying the page. So Safari is handled as a special case and is redirected to a URL like http://localhost:8080/samples/-.html. Yeah, it sucks. I chose the "-" name because it's not a valid filename, so it won't conflict with document sources (it's perfectly legal to have an "index.nlp" file).

There are many possible improvements:

Show directories containing no Novelang documents in a dimmed color (not showing them at all could be confusing).
Add a link to every supported format (first, PDF).
Add breadcrumbs like / > samples > served
Add some metadata like number of files and the date of the last modification.
Display files in the same directory on several columns.
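
The redirection rules described above (trailing solidus, plus the Safari special case) could be sketched roughly as follows. This is not Novelang's actual code: the class name, the method names and the User-Agent test are all assumptions made for illustration, and the sketch supposes the requested path is already known to be a directory.

```java
import java.util.Optional;

/** Hypothetical helper; names are illustrative, not Novelang's real API. */
final class DirectoryListingRedirect {

    /** Pseudo-document name served to Safari so the URL ends with ".html". */
    static final String SAFARI_LISTING_NAME = "-.html";

    /**
     * Returns the path the browser should be redirected to for a directory
     * request, or an empty Optional when the listing can be served as-is.
     */
    static Optional<String> redirectTarget(String path, String userAgent) {
        if (path.endsWith("/" + SAFARI_LISTING_NAME)) {
            return Optional.empty(); // Already the Safari-friendly URL.
        }
        String withSlash = path.endsWith("/") ? path : path + "/";
        if (isSafari(userAgent)) {
            return Optional.of(withSlash + SAFARI_LISTING_NAME);
        }
        // Other browsers only need the trailing solidus.
        return path.endsWith("/") ? Optional.empty() : Optional.of(withSlash);
    }

    /** Crude User-Agent sniffing (an assumption, not the real detection logic). */
    static boolean isSafari(String userAgent) {
        return userAgent != null
                && userAgent.contains("Safari")
                && !userAgent.contains("Chrome");
    }
}
```

With this sketch, a request for /samples from Safari would be sent to /samples/-.html, while any other browser would simply be sent to /samples/.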

2008-07-26

Novelang-0.7.0 released!

Version 0.7.0 is hot! You can download it here. It comes with a complete redesign of literal and character escaping.

Literal blocks are still here, much improved as they support any character on the inside.
Hard inline literal, corresponding to "technical" text inside plain text, like code citation. Renderers will use monospace font. Every character will appear as it is.
Soft inline literal works the same way hard inline literal does, but it should not be rendered in a different manner than casual text. Soft inline literal is a convenient answer for supporting almost any character and disabling standard formatting that occurs with punctuation, while avoiding conflict with other style delimiters.
Character escape is a last-resort option for displaying characters used as delimiters for one of the literal forms described above.

There were previous posts describing how this should work. During the implementation, there were minor adjustments so refer to the documentation.

2008-07-24

Character escaping

I just fixed a few bugs. The literal form now supports nested less-than / greater-than signs, except if there are three greater-than signs in sequence at the beginning of a line. Very sweet (at least for Novelang documentation) to make this a correct literal block (starting with '<<<' and ending with '>>>', both at the beginning of the line):

<<<
<<<
 >>>
>> >
>>>

This dramatically reduces the need for character escaping. Of course there is always a weird language to quote with three greater-than signs at the beginning of a line. And there may be other weird characters in a non-supported encoding. So we're hitting the character escaping problem again.

In the refactoring-characterescape branch I already pushed new character escaping based on the tilde '~' character, but having a non-symmetrical delimiter makes the document source much less readable. Of course this is because I'm using character escaping as a workaround, until I implement a better literal. But that unreadable stuff is like a warning that the tilde character is inappropriate. And I realize that it's commonly used in programming languages, so it would have to be escaped in literals. That gets tedious when you copy-paste from your favorite programming language.

As a Mac user I'm a bit stuck with their keyboard layout, but I think that left and right pointing double angle quotation marks (don't laugh, it's the official Unicode name) are ok. Instead of this:

~escapecode~

I'm about to switch to this:

«escapecode»

The benefit is obvious when there are several escaped characters to juxtapose:

«escape1»«escape2»«escape3»

is better than

 ~escape1~~escape2~~escape3~

On a Mac AZERTY keyboard the two characters are obtained with Alt-7 and Shift-Alt-7. There must be something similar on other platforms (Windows, QWERTY). Anyway, this doesn't have to be used often, so it's ok to use a weird character that doesn't appear in common text or programming languages. It would then be possible to document Novelang correctly by giving a sample of a literal like this:

<<<
<<<
Some literal here.
«greaterthan»«greaterthan»«greaterthan»
>>>

Or even like this:

<<<
Escape character like this: «lpdaqm»escapecode«rpdaqm».
>>>

Of course lpdaqm and rpdaqm stand for "left (respectively right) pointing double angle quotation mark". I prefer to avoid acronyms but this name is really too crazy.
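
To make the idea concrete, here is a minimal sketch of how such escape codes might be expanded. It is not the code from the refactoring-characterescape branch: the class, the lookup table and the choice to reject unknown codes are assumptions. Only the three code names (greaterthan, lpdaqm, rpdaqm) come from the examples above. The delimiters are written as \u00AB and \u00BB so the Java source itself doesn't depend on any file encoding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical escape code expander; not Novelang's actual implementation. */
final class EscapeCodes {

    /** Illustrative table holding only the codes mentioned in the post. */
    private static final Map<String, String> CODES = new HashMap<>();
    static {
        CODES.put("greaterthan", ">");
        CODES.put("lpdaqm", "\u00AB"); // left-pointing double angle quotation mark
        CODES.put("rpdaqm", "\u00BB"); // right-pointing double angle quotation mark
    }

    /** Matches an escape code name between the two quotation marks. */
    private static final Pattern ESCAPE =
            Pattern.compile("\u00AB([a-z0-9]+)\u00BB");

    /** Replaces every known escape code by its character, in a single pass. */
    static String unescape(String source) {
        Matcher matcher = ESCAPE.matcher(source);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = CODES.get(matcher.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException(
                        "Unknown escape code: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' from being interpreted.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Because the text is scanned only once, the quotation marks produced by lpdaqm and rpdaqm are not interpreted again as delimiters, which is what the second documentation sample above relies on.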

2008-07-14

Impressive XSL-FO resource

I was looking for a way to make the name of the current chapter appear in a PDF header. This is called a "running header". I found Dave Pawson's site on XSLT, DocBook, and Braille. The FO section contains very serious stuff, well above all the other tutorials! The running header requires no trick. It's a standard FO feature: define a marker corresponding to the current chapter title or whatever (fo:marker) and retrieve it from the header definition (fo:retrieve-*).

Novelang-0.6.0 is there!

Now you can do all sorts of amazing things with stylesheets, as explained in the documentation. There is also a nicer default stylesheet for PDF. Check out PDF version of Novelang documentation!

2008-07-07

New feature: selectable stylesheets

I just checked into GitHub the code for selectable stylesheets. Until now, a Novelang project could define its own stylesheets, using custom stylesheets. While Novelang can render PDF and HTML with its own built-in stylesheets, every user probably needs to define their own. When rendering a document, Novelang attempts to find the appropriate stylesheet:

In the directory given by novelang.stylesheet.dir system property, if defined.
In a style directory under the directory from which Novelang was launched (corresponding to user.dir).
Inside Novelang-x.x.x.jar under the /style directory.
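
The lookup order above could be sketched like this. It is only an illustration: the class and method names are made up, and falling through to the next location when a file is missing is my assumption about how the three locations combine.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical stylesheet lookup; not Novelang's actual implementation. */
final class StylesheetLocator {

    /**
     * Looks for the stylesheet in the three locations, in order.
     * Returns null when it is found nowhere.
     */
    static URL locate(String stylesheetName) throws MalformedURLException {
        // 1. Directory given by the novelang.stylesheet.dir system property.
        String customDir = System.getProperty("novelang.stylesheet.dir");
        if (customDir != null) {
            File candidate = new File(customDir, stylesheetName);
            if (candidate.isFile()) {
                return candidate.toURI().toURL();
            }
        }
        // 2. "style" directory under the launch directory (user.dir).
        File styleDir = new File(System.getProperty("user.dir"), "style");
        File candidate = new File(styleDir, stylesheetName);
        if (candidate.isFile()) {
            return candidate.toURI().toURL();
        }
        // 3. Built-in stylesheet bundled under /style on the classpath.
        return StylesheetLocator.class.getResource("/style/" + stylesheetName);
    }
}
```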

"Appropriate stylesheet" means a stylesheet corresponding to the MIME type of the requested document: pdf.xsl for a PDF document, html.xsl for an HTML document. That was not flexible enough because the same document of the same MIME type may deserve multiple renderings, like "miser printing", "visually impaired" and "tree-killer".

That's where selectable stylesheets come to the rescue. With selectable stylesheets, you give the name of the stylesheet to use. This can be done at query level, or at book level. Let's say this is your project layout, with three stylesheets under the style directory:

/
  book.nlb
  chapter-1.nlp
  chapter-2.nlp
  style/
    html-quick.xsl
    html-beautiful.xsl
    pdf-beautiful.xsl

After launching Novelang HTTP daemon, you can use the stylesheet query parameter to override any other stylesheet name:

http://localhost:8080/chapter-1.html?stylesheet=html-beautiful.xsl

Please note the html-beautiful.xsl path is still relative to the directory containing custom stylesheets! Another place to set stylesheet names is the Book file. Since a Book doesn't know how it will be rendered, you can define a stylesheet for multiple document MIME types. The book.nlb would look like this:

mapstylesheets 
    $html=html-beautiful.xsl
    $pdf=pdf-beautiful.xsl

insert file:chapter-1.nlp

insert file:chapter-2.nlp

I've not tested subdirectories yet but they are supposed to work. Keep in mind: they will be relative to the directory containing your stylesheets. Supporting multiple stylesheets is a necessary step before providing nice built-in stylesheets to be tried with documents of your own.
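
Put together, the way a stylesheet name gets chosen might look like the sketch below. The names are hypothetical and this is not the code I checked in. It assumes the stylesheet query parameter wins, then the Book's mapstylesheets entry for the requested format, then the default name derived from the format (html.xsl, pdf.xsl).

```java
import java.util.Map;

/** Hypothetical stylesheet name selection; names are illustrative only. */
final class StylesheetSelector {

    /**
     * @param queryParameter value of the "stylesheet" query parameter, or null
     * @param bookMapping    format key ("html", "pdf") to stylesheet name,
     *                       as declared by mapstylesheets in the Book file
     * @param formatKey      format of the requested document, e.g. "html"
     */
    static String select(
            String queryParameter,
            Map<String, String> bookMapping,
            String formatKey) {
        if (queryParameter != null && !queryParameter.isEmpty()) {
            return queryParameter; // The query overrides any other name.
        }
        String mapped = bookMapping.get(formatKey);
        if (mapped != null) {
            return mapped; // Book-level choice.
        }
        return formatKey + ".xsl"; // Default: html.xsl, pdf.xsl.
    }
}
```

The returned name is still relative to the directory containing the stylesheets, so it would then go through the usual directory lookup.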

2008-07-04

Links on typography and characters

Wikipedia
Punctuation: links to punctuation, plus interword separation, general typography, uncommon typography.
Ordinal indicator
Superscript
Superior letter

Others
XML character entities
Unicode character search

Some ideas for Novelang syntax extensions

Just some ideas here and here.

Here is some text ++- striked out -++.

Oh, yeah, looks like the ''++'' radix is powerful as it cleanly expresses strike. It can be composed with other characters, and supports symmetrical delimiters! I like the plus sign for designating the strike family, as it is a vertical line (standing for some character) with a horizontal strike.

Here we get ++= double strike =++.
Here we get ++$ highlight one $++.
Here we get ++£ highlight two £++.
This is a sample of ++/ oblique strike /++.

The same approach fits for underline. The _ (low-line) character is fine for underline (that was its purpose on mechanical typewriters).

Same for __- underlining -__.
So we have double __= underlining =__.
So we have waved __~ underlining ~__.

Novelang syntax already uses a low line for the "silent end" of interpolated clauses. In order to keep a strong meaning to the low line maybe we should revisit the silent end.

New silent end for interpolated clauses -- like this -<.

And now I've found what double circumflex accent is good for: small caps. Small caps sometimes carry a strong meaning, like for quoting a shouting person, or for names for which case matters (like Charles De Gaulle).

And this is for ^^ Small Caps ^^.

Talking about the circumflex, I have been thinking about superscript these days, and according to my research superscript text is always the last part of a compound text, before a space or a punctuation sign.

Like: 2^nd.

Subscript will work the same way with a single underscore. Anyway, I'll have to think a lot about all of this. I must take care not to lose the focus on content-oriented text and not to invent a messy markup.

2008-07-01

Encoding(s)

Attempting to run Novelang-0.5.0 the way the documentation said (java -jar Novelang-0.5.0.jar), I discovered that some characters (especially those with accents) were not rendered as they should be. As I'm working on Mac OS X I shouldn't have been surprised when reading the system properties dump:

  file.encoding = MacRoman

I couldn't see the mismatch in my development environment, as it was "kindly" (and rather stealthily) forcing the file.encoding system property of a new process to the encoding I had set for editing my files. If you, happy early adopter, hit such a problem and were too shy to post on the Novelang User list, you've got to try this:

  java -Dfile.encoding=ISO-8859-1 -jar Novelang-0.5.0.jar

For now Novelang expects all files to be in ISO-8859-1 (aka ISO Latin 1). This is defined as a constant somewhere, and passed through method calls. I thought this would make Novelang insensitive to the file.encoding system property but obviously I missed something. No doubt I'll find what sooner or later, but this led me to more interesting reflections.

File encoding must be known to convert the 8-bit characters from a file to Java's internal 16-bit Unicode characters. Novelang grammar defines very precisely the characters it accepts: a subset of ISO-8859-1. But it doesn't mean the document source file has to be encoded this way! The "é" character also exists in MacRoman encoding, so the file.encoding property must be taken for what it is: just a hint to read Unicode characters from a stripped-down format. This led me to the following conclusions:

Novelang grammar defines every supported character regardless of encoding, since they are all defined in Unicode. This guarantees a lot of fun with Greeks, Germans, Russians...
If one encoding for all files is the option then the file.encoding system property provides the simplest approach. This is the current option (because of a bug preventing ISO-8859-1 from being forced as the default).
Per-file encoding would be a must. A Book function would be ok, telling "from now on, read Part files with encoding xxx". An HTTP query parameter could provide such a hint when previewing Parts with special encodings in a Web browser.
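
To illustrate the point about encodings being only a hint for reading Unicode characters, here is a small sketch that decodes with an explicitly given charset instead of relying on file.encoding. The class and method names are made up, and I use UTF-8 rather than MacRoman as the second encoding because MacRoman is not guaranteed to be available in every Java runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; not part of Novelang. */
final class EncodingDemo {

    /** Reads a whole stream, decoding with the given charset, not file.encoding. */
    static String readFully(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // What the JVM would silently use when no charset is given.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        byte[] latin1 = { (byte) 0xE9 };            // "é" in ISO-8859-1
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 }; // "é" in UTF-8
        String fromLatin1 =
                readFully(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        String fromUtf8 =
                readFully(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Different bytes on disk, same Unicode character (U+00E9) in memory.
        System.out.println(fromLatin1.equals(fromUtf8)); // true
    }
}
```

A per-file encoding hint, whether it comes from a Book function or a query parameter, would then only have to supply the Charset argument.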

I strongly recommend this excellent paper about character encoding and Unicode, by Joel Spolsky: