2009-06-25

2009-06-21

Autotagging

The Tag feature is great. It turns out that I’m mainly using it to tag levels, and too often it leads to duplicate information like this:

  @Graphics
== Graphics

...

What I want is to simply write:

== Graphics

...

And then Novelang should guess that “Graphics” matches the Graphics tag. I call this “Autotagging”.

We need a few simple tranformations rules to turn titles into tags.

“foo”@foo
“Foo”@Foo
“Foo bar”@Foobar
“Foo, bar”@Foo @bar
“Foo. Bar”@Foo @Bar
“Foo-bar”@Foo-bar

Automatically-generated tags wouldn’t be part of the tag list, but they would be used for the matching of a known tag with existing level titles.

Depending on the document, this behavior may produce a lot of noise so it requires explicit activation.

Of course, the default tag list (with checkboxes) has a new option to enable autotagging, with some JavaScript adding an autotag parameter in the URL.

2009-06-19

Novelang-0.30.0 released!

Latest release available here.

This release fixes a few bugs and brings some minor enhancements. See documentation for details.

2009-06-07

Book index

Start small, think big: while I don’t even provide a decent default stylesheet, I’m not afraid to blog about a tough subject: book index.

An index is a table at the end of a book, referencing pages or chapters containing a pertinent usage of a key word.

A very simple kind of index could look like this:

icons, 21, 136, 138

You can have ranges:

gender in language, 4-6

References avoid duplicates:

GUIs, see graphical user interfaces (GUIs)

Words can group on several levels (3 being the maximum):

keys
  capitalizing name of, 68
  typographic conventions for, 144, 147
  writing about, 142

Simplest representation

How could Novelang help to represent this?

The simplest thing to do is to use some kind of delimiter to tell that a word is an index entry. (As usual the meaning of the delimiter would be a matter of stylesheet.)

The {icon} is displayed on the corner.

Obviously, this doesn’t work, because we want the index entry to show a plural. So we invent some new kind of syntactic form representing a tuple (which symbols are used doesn’t matter at this stage).

The { icon | icons } is displayed in the corner.

Great, but how to model several levels of entries? We could make the source document look like this:

In this case we must
{ capitalize | { keys | capitalizing name of } }
the name of a key.

This feels just unreadable!

External index entry declaration

The trick: split index entry declaration from its complete definition, using several files. Complete definition would rely on a subset of Novelang grammar for source files. Example above becomes:

%% source document:

In this case we must
{ capitalize | capitalizing-name-of-key }
the name of a key.


%% index definition file:

capitalizing-name-of-key
- capitalizing name of
  - keys

This deserves a few explainations. The first capitalize in the source document is still what’s displayed. The capitalize-name-of-key is the entry name that is not supposed to be read by anybody else than the document writer – could be 123456 as well. The capitalizing name of in the index definition file is what to display in the index. The keys subitem is the parent item.

Because name of the index entry has no semantic meaning, we can let Novelang generate it using simple replacement rules (spaces becoming hyphen minus…). Explicit names are useful for special cases (like homonyms) but now we expect to be able to write:

%% source document:

In this case we must {capitalize} the name
of a key.


%% index definition file:

capitalize
- keys
  - capitalizing name of

Because for the same index entry we may have another pertinent words which are not exactly “capitalize” we let the index entry support several names.

%% source document place 1:

In this case we must {capitalize} the name
of a key.


%% source document place 2:

Sometime the name of a key should not be
{in capitals}.


%% index definition file:

in-capitals
capitalize
- keys
  - capitalizing name of

The index definition file could support lots of features.Some styling. There are rare cases (like latin names) where italics are required.A kind of markup to tell which words to take in account in alphabetical sort.Multiple posting: the same keyword in the source document has several index entries. This can happen using several embedded list items.“See” and “See Also” references.

Reusing tags and identifiers

Now, entry names may give some feeling of déjà vu. We already have machine-processed names with tags (implemented) and identifiers (not implemented yet). Are entry names just redundant? As tags and identifier apply on a whole paragraph or level, they don’t have the same level of precision when it comes to refer to the exact location of a word. And it turns out that referencing a range of paragraphs (or even chapters) is what we need for supporting page ranges.

The obvious way of handling page range would be to add a new “end of index entry range” here polluting the source document and making it look like LaTeX. This is a convention of the Novelang grammar: avoid end delimiters whenever possible. Tags and identifiers provide a nice extension.

Here is how we use a tag in the index definition file. Instead of tagging, it refers to the tagged text as index entry name:

%% source document:

@gender
== Stylistic principles

=== Avoid jargon

 ...

=== Avoid sexist language

 ...

=== Common gender

 ...


%% index definition file:

@gender
- gender in language

This is not yet perfect because page ranges are bound to the scope of a tag​/​identifier. But it seems possible to Some tree-mangling could detect that several tagged nodes appear consecutively:

%% source document:

== Stylistic principles

=== Avoid jargon

 ...

@gender
=== Avoid sexist language

 ...


@gender
=== Common gender

 ...

Tree-mangling could add a special marker as the last child of the last node of the consecutive tagged ones. Our tree would look like:

level
  + level-title     "Stylistic principles"
  + level
  |   + level-title "Avoid jargon"
  + level
  |   + level-title "Avoid sexist language"
  |   + tag         "gender"
  |   + paragraph
  |   |   + start-of-range "gender"
  |   |   + ...
  |   + paragraph
  + level
      + level-title "Common gender"
      + tag         "gender"
      + paragraph
      + paragraph
         + ...
         + end-of-range    "gender"

Foreseen weaknesses

This is still unperfect because page ranges are bound to the scope of tags​/​identifiers but this looks like a pretty good approximation.

Another possible drawback is the ability to have overlapping ranges or page numbers with several different tags​/​identifiers, or index entries in the middle. Because FOP provides no hook to deal with page numbers once they are known, Novelang could not fix a serie of page numbers like 14-17, 15-18, 15, 16. It’s a long way to perfection.

Bibliography

All examples from A Style Guide for the Computer Industry, Sun Technical Publications, 1996.

2009-06-03

Novelang-0.29.0 released!

Latest release available here.

It brings various enhancements to whitespace handling, and the pretty color palette for tags.