The Novelang blog: November 2010

2010-11-19

Rule-based number spelling

Novelang comes with a Numbering class which formats an integer value in words. This adds a bit of magic when the stylesheet writes "Chapter forty-two" from a dumb counter.

Currently the Numbering class only supports French and English, and values from 0 to 50 (all values are hardcoded). The ICU project offers RuleBasedNumberFormat, which supports rule-based formatting. This makes it easy to support much larger ranges.
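To give an idea of why rules scale better than a hardcoded table, here is a minimal Python sketch of table-driven spelling for English. It is only an illustration: it uses neither ICU's rule syntax nor its API, and the names `ONES`, `TENS`, `SCALES` and `spell` are made up for this example.

```python
# Words that have to be listed explicitly.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
    6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety",
}
# Scale rules, largest base first: each one spells "<count> <name> <rest>".
SCALES = [(1_000_000, "million"), (1_000, "thousand"), (100, "hundred")]


def spell(n: int) -> str:
    """Spell a non-negative integer in English words."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] if rest == 0 else f"{TENS[tens]}-{ONES[rest]}"
    # Apply the first scale rule whose base fits, then recurse on both parts.
    for base, name in SCALES:
        if n >= base:
            count, rest = divmod(n, base)
            head = f"{spell(count)} {name}"
            return head if rest == 0 else f"{head} {spell(rest)}"
    raise AssertionError("unreachable: every n >= 100 matches a scale rule")


print("Chapter", spell(42))
```

A handful of rules covers values far beyond 50, which is the point of moving away from hardcoded values.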

2010-11-17

XSL mockup for multipage rendering

Here is how an XSL would render a multipage document.

First, let’s consider the whole document defining the opus:

== One

Some text of level one.

== Two

Some text of level two.

=== Two-one

Some text of level two-one.

The XML form of the document above is:

<?xml version="1.0" encoding="UTF-8" ?>
<opus>
  <level>
    <title>One</title>
    <paragraph>Some text of level one.</paragraph>
  </level>
  <level>
    <title>Two</title>
    <paragraph>Some text of level two.</paragraph>
    <level>
      <title>Two-one</title>
      <paragraph>Some text of level two-one.</paragraph>
    </level>
  </level>
</opus>

Let’s take for granted that Novelang supports XSL metadata. Our multipage-enabled stylesheet would define an embedded stylesheet that transforms a whole opus into a simple map of page names and page paths. A path is whatever the stylesheet may reprocess, but an XPath expression is quite good. For the document above, here is what our map could look like, if we want to support two levels:

page1 -> /opus/level[1]
page2 -> /opus/level[2]
page3 -> /opus/level[2]/level[1]

Please note that, at this point, the decision to support a given depth, or to exclude some tagged levels, belongs entirely to the page-extracting stylesheet.
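The page extraction itself is meant to be an embedded stylesheet, which doesn't exist yet. As a rough illustration of what it has to compute, here is a Python sketch that builds the same map from the opus above. The function name `build_page_map` and its `max_depth` parameter are assumptions made for this example, not Novelang features.

```python
import xml.etree.ElementTree as ET

OPUS_XML = """<opus>
  <level>
    <title>One</title>
    <paragraph>Some text of level one.</paragraph>
  </level>
  <level>
    <title>Two</title>
    <paragraph>Some text of level two.</paragraph>
    <level>
      <title>Two-one</title>
      <paragraph>Some text of level two-one.</paragraph>
    </level>
  </level>
</opus>"""


def build_page_map(opus, max_depth=2):
    """Map page names to XPath-like paths, in document order."""
    paths = []

    def visit(parent, parent_path, depth):
        if depth > max_depth:
            return  # The depth limit is the extractor's own decision.
        for index, level in enumerate(parent.findall("level"), start=1):
            path = f"{parent_path}/level[{index}]"
            paths.append(path)
            visit(level, path, depth + 1)

    visit(opus, "/" + opus.tag, 1)
    return {f"page{i}": path for i, path in enumerate(paths, start=1)}


for name, path in build_page_map(ET.fromstring(OPUS_XML)).items():
    print(name, "->", path)
```

Calling `build_page_map` with `max_depth=1` would keep only the two top-level pages.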

By merging the page map with the opus, we get the XML input for the rendering of one page. Novelang knows which page it is either because it is iterating over all known pages of the map (batch mode), or because the page name is a part of the request issued to the HTTP dæmon.

<opus>
  <meta>
    <page>
      <name>page2</name>
      <path>/opus/level[2]</path>
    </page>
  </meta>

  <level>
    <title>One</title>
    <paragraph>Some text of level one.</paragraph>
  </level>
  <level>
    <title>Two</title>
    <paragraph>Some text of level two.</paragraph>
    <level>
      <title>Two-one</title>
      <paragraph>Some text of level two-one.</paragraph>
    </level>
  </level>
</opus>

(Note: the n: namespace prefix is omitted here for brevity.)

The stylesheet gets this whole document as input for every page. All that changes is the (name, path) pair in the meta/page element. The stylesheet needs to know which page it is rendering, and the whole document tree as well, in order to create a navigation bar or any kind of header or footer corresponding to a specially-titled or tagged level of the document.

This involves some XSL trickery: evaluating an XPath expression at runtime. While this is not part of the XPath 1.0 specification, it is part of the semi-official EXSLT community initiative. The dyn:evaluate function (http://www.exslt.org/dyn/functions/evaluate) does that for us. It works well with Xalan-2.7.1, which is the XSLT engine bundled with Novelang (it works slightly better than the JDK’s one).

In the stylesheet below, we save useful expressions into variables.

The root template prints those variables, then a pseudo-navigation bar made of nested lists.

The nested loop for iterating over level elements is rather ugly, but it makes sense as we don’t want an infinite depth of titles in a navigation bar.

The title-with-locator template just adds bold on the title in the navigation bar that corresponds to the current page.

All other templates mimic Novelang’s standard rendering.

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    extension-element-prefixes="dyn"
>
  <!-- Be sure to use Xalan-2.7.1 (not JDK's default). -->

  <!--
    Here, expect a meta section, embedding a stylesheet 
    that extracts the pages we'll find in the meta section 
    of input document.
  -->

  <xsl:output method="html" />

  <xsl:variable name="page-name" select="/opus/meta/page/name" />
  <xsl:variable name="page-path" select="/opus/meta/page/path" />
  <xsl:variable name="page-nodeset" 
      select="dyn:evaluate( $page-path )" />
  <xsl:variable name="page-id" 
      select="generate-id( $page-nodeset )" />

  <xsl:template match="meta/page" >
    $page-name=<xsl:value-of select="$page-name" />
    $page-path=<xsl:value-of select="$page-path" />
    $page-id=<xsl:value-of select="$page-id" />
  </xsl:template>

  <xsl:template match="/opus" >
    <html>
      <xsl:apply-templates select="meta" />

      <!-- Navigation bar -->
      <ul>
        <xsl:for-each select="level">
          <li>
            <xsl:call-template name="title-with-locator"/>
          </li>
          <xsl:if test="level">
            <ul>
              <xsl:for-each select="level">
                <li>
                  <xsl:call-template name="title-with-locator"/>
                </li>
              </xsl:for-each>
            </ul>
          </xsl:if>
        </xsl:for-each>
      </ul>

      <!-- Document body, same templates as usual -->
      <xsl:apply-templates select="$page-nodeset" />

    </html>

  </xsl:template>


  <xsl:template match="paragraph" >
    <p>
      <xsl:value-of select="." />
    </p>
  </xsl:template>

  <xsl:template match="title" />

  <xsl:template match="level" >
    <h2><xsl:value-of select="title" /></h2>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="level/level" >
    <h3><xsl:value-of select="title" /></h3>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template name="title-with-locator" >
    <xsl:text>
    </xsl:text>
    <xsl:choose>
      <xsl:when test="generate-id( . ) = $page-id" >
        <b><xsl:call-template name="title-alone" /></b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="title-alone" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="title-alone" >
    Title: <xsl:value-of select="title" />
  </xsl:template>

</xsl:stylesheet>
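For readers less at ease with XSLT, the two ideas of the stylesheet (resolving the page path at runtime, then highlighting the matching title in the navigation bar) can be restated in Python. This is only a sketch with made-up function names. Python's ElementTree has no absolute paths, so the leading /opus is turned into a relative path, and object identity plays the role of generate-id().

```python
import xml.etree.ElementTree as ET

MERGED_XML = """<opus>
  <meta>
    <page>
      <name>page2</name>
      <path>/opus/level[2]</path>
    </page>
  </meta>
  <level>
    <title>One</title>
    <paragraph>Some text of level one.</paragraph>
  </level>
  <level>
    <title>Two</title>
    <paragraph>Some text of level two.</paragraph>
    <level>
      <title>Two-one</title>
      <paragraph>Some text of level two-one.</paragraph>
    </level>
  </level>
</opus>"""


def current_page(opus):
    """Resolve meta/page/path against the tree (what dyn:evaluate does)."""
    path = opus.findtext("meta/page/path")
    prefix = "/" + opus.tag
    if path is None or not path.startswith(prefix + "/"):
        raise ValueError(f"unsupported page path: {path!r}")
    # "/opus/level[2]" becomes "./level[2]", relative to the root element.
    return opus.find("." + path[len(prefix):])


def navigation(opus):
    """Two-level navigation bar, with the current page's title in bold."""
    current = current_page(opus)

    def label(level):
        title = level.findtext("title")
        return f"<b>{title}</b>" if level is current else title

    lines = []
    for level in opus.findall("level"):
        lines.append(label(level))
        for sublevel in level.findall("level"):
            lines.append("  " + label(sublevel))
    return lines


print("\n".join(navigation(ET.fromstring(MERGED_XML))))
```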

Finally, this is what the rendering looks like:

2010-11-13

Novelang-0.52.0 released!

Just released Novelang-0.52.0!

Summary of changes:

Added n:block-inside-asterisk-pairs. The default stylesheet renders it as bold.

Download it from here.

Enjoy!

Grammar pattern: twin delimiters

This post describes a tricky point of Novelang’s grammar design: how to handle twin delimiters like // in a non-ambiguous manner for an ANTLR grammar. It’s a useful refresher before adding the long-awaited ** (asterisk pair) delimiter.

The problem

For paired delimiters like ( and ) or [ and ] it’s easy to know when to “open” or “close” a block, and support nested blocks. In contrast, a twin delimiter is an opening one if not preceded by a closing one inside the same block, regardless of what happens in subblocks. This is a complicated way to say we support this kind of nesting:

// block-1 ( block-2 //block-3// ) //

+ block-inside-solidus-pairs
    block-1
  + block-inside-parenthesis
      block-2
    + block-inside-solidus-pairs
        block-3

We also support this:

block-1 // block-2 // block-3 // block-4 //

  block-1
+ block-inside-solidus-pairs
    block-2
  block-3
+ block-inside-solidus-pairs
    block-4

(We have only one level of nesting here. 2 levels of nesting is counter-intuitive and would have required very complex lookahead.)

The pattern

The pattern is to define special grammatical elements for use inside a block defined by a twin delimiter. They propagate the fact that this delimiter cannot appear again, unless inside some other subblock.

Taking “XXX” for the name of some twin delimiter, here is a simplified version of the grammar for spreadblocks. The term “spreadblock” stands for a block that may spread over several lines (containing single line breaks).

paragraph
  : ... mixedDelimitedSpreadblock
  ;

mixedDelimitedSpreadblock
  : word ( punctuationSign | delimitedSpreadblock ) ...
  ;

delimitedSpreadblock
  : xxxSpreadblock
  | parenthesizedSpreadblock
  | squareBracketsSpreadblock
  | doubleQuotedSpreadblock
  | hyphenPairSpreadblock
  ;

parenthesizedSpreadblock  
  : '(' spreadblockBody ')' // Same for other paired delimiters.
  ;

spreadblockBody
  : ... mixedDelimitedSpreadblock
  ;

xxxSpreadblock
  : XXX spreadblockBodyNoXxx XXX
  ;

spreadblockBodyNoXxx
  : ... mixedDelimitedSpreadblockNoXxx ...
  ;

mixedDelimitedSpreadblockNoXxx
  : ... delimitedSpreadblockNoXxx ...
  ;

delimitedSpreadblockNoXxx
  : parenthesizedSpreadblock
  | squareBracketsSpreadblock
  | doubleQuotedSpreadblock
  | hyphenPairSpreadblock
  ;

This is more or less the same for tightblocks. “Tightblocks” stand for blocks containing no line breaks, like cells and embedded lists.

cell  // Same for embedded list items.
  : ... mixedDelimitedTightblock ...
  ;

mixedDelimitedTightblock
  : word ( punctuationSign | delimitedTightblock | ... ) ...
  ;

delimitedTightblock
  : xxxTightblock
  | parenthesizedTightblock
  | squareBracketsTightblock
  | doubleQuotedTightblock
  | hyphenPairTightblock
  ;

xxxTightblock
  : XXX tightblockBodyNoXxx XXX
  ;

tightblockBodyNoXxx
  : ... mixedDelimitedTightblockNoXxx ...
  ;

mixedDelimitedTightblockNoXxx
  : word ( punctuationSign | delimitedTightblockNoXxx ) ...
  ;

delimitedTightblockNoXxx
  : parenthesizedTightblock
  | squareBracketsTightblock
  | doubleQuotedTightblock
  | hyphenPairTightblock
  ; // That's all.

Thought it was over? There is another kind of block, the delimitedTightblockNoSeparator used inside the subblockAfterTilde, which reflects each block inside ~x~y~z! But at this point you probably got the idea.

Yes, this makes the grammar quite verbose, but factoring it would reduce ANTLR’s ability to check for inconsistencies. Anyway, the slightest addition requires writing test cases for every logical path inside each ANTLR grammar rule.
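Outside ANTLR, the same pattern boils down to one flag saying "we are inside a twin-delimited block", which a paired delimiter resets. Here is a small hand-written Python sketch of that idea. It is not Novelang's parser: it only knows // and parentheses, assumes words contain no slash, and all function names are invented for the example.

```python
import re

# Tokens: the twin delimiter, the paired delimiters, or a plain word.
TOKEN = re.compile(r"//|[()]|[^\s()/]+")


def parse(text):
    tokens = TOKEN.findall(text)
    tree, pos = parse_body(tokens, 0, inside_twin=False)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} at token {pos}")
    return tree


def expect(tokens, pos, delimiter):
    if pos >= len(tokens) or tokens[pos] != delimiter:
        raise ValueError(f"expected {delimiter!r} at token {pos}")
    return pos + 1


def parse_body(tokens, pos, inside_twin):
    """inside_twin=True plays the role of the ...NoXxx rules."""
    items = []
    while pos < len(tokens):
        token = tokens[pos]
        if token == "//":
            if inside_twin:
                break  # A second // in the same block closes it.
            inner, pos = parse_body(tokens, pos + 1, inside_twin=True)
            pos = expect(tokens, pos, "//")
            items.append(("block-inside-solidus-pairs", inner))
        elif token == "(":
            # A paired delimiter opens a subblock where // is allowed again.
            inner, pos = parse_body(tokens, pos + 1, inside_twin=False)
            pos = expect(tokens, pos, ")")
            items.append(("block-inside-parenthesis", inner))
        elif token == ")":
            break  # Let the caller check that it was expecting this.
        else:
            items.append(token)
            pos += 1
    return items, pos


print(parse("// block-1 ( block-2 //block-3// ) //"))
print(parse("block-1 // block-2 // block-3 // block-4 //"))
```

The flag does in one line what the duplicated NoXxx rules do in the grammar, but ANTLR can't check a flag for ambiguities, which is why the verbose form is kept there.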

2010-11-07

Novelang-0.51.1 released!

Just released Novelang-0.51.1!

Summary of changes:

Upgraded from FOP-0.95 to FOP-1.0. FOP is the library for generating PDF documents.

Various other library upgrades that shouldn’t affect normal users.

Download it from here.

Enjoy!

2010-11-06

Novelang-0.51.0 released!

Just released Novelang-0.51.0!

Summary of changes:

Fixed: list with double hyphen and number sign was using a “plus sign” everywhere (source documents and XML elements). This might break existing documents and stylesheets using this brand new feature.

Download it from here.

Enjoy!

Novelang-0.50.2 released!

Just released Novelang-0.50.2!

Summary of changes:

Fixed: support paragraphs as lists (n:list-with-triple-hyphen and n:list-with-double-hyphen-and-plus-sign) inside n:paragraphs-inside-angled-bracket-pairs.

Download it from here.

Enjoy!

Novelang-0.50.1 released!

Just released Novelang-0.50.1!

Summary of changes:

Minor fix on JavaShell for cleaner shutdown when there is no default JmxKit. This may only affect users of the Novelang-attirail subproject.

Fixed documentation generation where release notes for SNAPSHOT versions appeared for non-SNAPSHOT versions.

Download it from here.

Enjoy!

2010-11-05

Novelang-0.50.0 released!

Just released Novelang-0.50.0!

Summary of changes:

Embedded numbered lists (n:embedded-list-with-number-sign).

Paragraphs as numbered lists (n:list-with-double-hyphen-and-plus-sign).

Switched to Maven 3. This required no change but future build features may not work with formerly-used Maven 2.2.1.

Download it from here.

Enjoy!