2008-08-31

git's "detached head"

git is simply the best version control tool ever, period. Most of the time it's straightforward. Sometimes you really have to understand what is happening. A few days ago I synchronized with another repository on a USB key, then resumed my work. When running git status I noticed, several times, that there was no current branch. I decided to fix that by forcing the current branch with a git checkout master. Then all the work I had done since the synchronization appeared to be lost, both from the git repository and from my local filesystem! Since those commits had actually happened, I was pretty sure they were still somewhere inside my git repository. Reading the git documentation carefully, I learned that I had been working with a "detached head" (poor me).
It is sometimes useful to be able to checkout a commit that is not at the tip of one of your branches. [...] The state you are in while your HEAD is detached is not recorded by any branch (which is natural --- you are not on any branch). What this means is that you can discard your temporary commits and merges by switching back to an existing branch.
Clearly, I messed up the merge in some way. The way to recover was mentioned: the "reflog" keeps track of every change (including those not attached to a branch) until a git prune or a git gc. Here is what I did. First, read the reflog and find the last "lost" commit by eye:
$ git log -g --after=2008-08-14
It appeared to be 991ee3ebc11e1dc3434fab4c22e261b7e0711346. This time I created a branch:
$ git branch rescue_2008-08-23 
$ git checkout rescue_2008-08-23
Now get back all the "lost" stuff with one single command (commits are chained):
$ git checkout 991ee3ebc11e1dc3434fab4c22e261b7e0711346
This looked good. I committed inside the "rescue_2008-08-23" branch and switched back to the "master" branch:
$ git commit -a
$ git checkout master
The merge happened seamlessly as a "fast forward", wow!
$ git merge rescue_2008-08-23

Updating 13d16f3..31626f8
xxx: needs update
[...]
Fast forward
[every "lost" change listed here, there were many!]
This mess happened because I had some problems during the merge that I didn't try to understand. Next time I get such problems, I'll make the mess in a new branch of the target repository, and then perform a second merge.

2008-08-30

Barcode4J

I just threw Barcode4J's library files into Novelang, and in the next version you'll be able to include various kinds of beautiful barcodes in your PDFs. I could say: "look how powerful I am!" but as a fervent Novelang blog reader you now know who deserves the fame. PDF-embedded barcodes in a FO stylesheet require a namespace declaration like this:
<xsl:stylesheet version="1.0"
  ...
  xmlns:barcode="http://barcode4j.krysalis.org/ns"
>
And the barcode itself looks like this:
  <fo:block>
    <fo:instream-foreign-object>
      <barcode:barcode message="L loves L!">
        <barcode:datamatrix>
          <barcode:module-width>9.5mm</barcode:module-width>
          <barcode:shape>force-square</barcode:shape>
        </barcode:datamatrix>
      </barcode:barcode>
    </fo:instream-foreign-object>
  </fo:block>
Of course the "L loves L!" message could be replaced by something more serious like an EAN-13 barcode (the one used for ISBNs). In this case the <barcode:datamatrix> element becomes <barcode:ean-13>, but you get the idea (datamatrix looks very pretty). Barcode4J's documentation is excellent, so at best I would just be copy-pasting it. Just one piece of advice: in order to avoid FOP warnings, put the barcode: namespace prefix in front of each element. Regarding Novelang's development roadmap, this barcode feature may look a bit alien, but I needed it and it doesn't cripple Novelang's architecture or grammar at all: it's just a few more files in the lib/ directory. The only problem I ran into was some missing SVG-related classes, which turned out to be in the xml-apis-ext.jar file from Batik-1.7, which I renamed to batik-xml-apis-ext-1.7.jar. It's OK now, and I won't complain if projects like FOP or Batik come with many jar files that help to understand what's doing what. Barcode4J is definitely sweet and makes your project shine. Long live its developers.

Book configuration as rendering parameter

So far, the ?stylesheet=... request parameter has proved especially useful to render a single Part file using a stylesheet that can be globally defined at Book level (with the mapstylesheets command). As I blogged, the Book file is heading towards carrying more and more configuration. So a ?book-configuration=... request parameter would make sense, in order to reuse the Book's configuration: stylesheet, encoding, hyphenation language and probably more. Because a Book could define Part-specific configuration, it must be taken into account if the Part is one of those included by the Book.

About handling multiple languages

In my previous post I raised the subject of multiple languages. In order to keep things simple, one Part file should have one language, no more. The Book file provides the context for file encoding and hyphenation language.
insert file:in-english.nlp 
    $encoding=ISO-8859-1 
    $hyphenation-language="en_GB"
The language could become a parameter to pass to every renderer (especially FO stylesheets!).

Hyphenation at work

Novelang-0.9.0 comes with hyphenation support for PDFs. Basically, it just passes a directory containing hyphenation files to FOP (the PDF generator). When trying to make hyphenation work for real, I hit several problems. It was with the fr.xml hyphenation rules, which fortunately come under the GPL license. Hyphenation takes care of apostrophes. That's because with a "remain character count" of three it is correct to hyphenate a word like "l'attrait" like this: "l'at-trait". The fr.xml file was quite clear, with many occurrences of the APOSTROPHE character (U+0027), which is also called "single quote" and looks symmetrical. But hyphenation occurs after the FO-generating XSL has replaced the <apostrophe-wordmate> element by the RIGHT SINGLE QUOTATION MARK character (U+2019), which looks better than APOSTROPHE but was not understood by the hyphenation rules, causing a potential hyphenation bug on every word with a "relooked" apostrophe. I spent much time trying to hack the rules, which were correct, and finally the solution was to replace every APOSTROPHE in them by RIGHT SINGLE QUOTATION MARK (the &#x2019; XML character reference). Because hyphenation worked better, it changed the word distribution and raised another problem: some proper nouns got hyphenated. The FOP documentation mentions an <exceptions> element containing words not to hyphenate at all. At first it didn't work, and I had to trace into the FOP code to find out that every word in the exception list must be lower-cased. So Novelang could support:
  • An exception list declared in the Book file itself.
  • Automatic replacement of the quoting character.
As the quoting character may vary (it is defined in the stylesheet), this implies a metadata mechanism with the stylesheet exposing which character it uses. Such a mechanism would be useful for plenty of other things, like the expected image resolution for automatic resampling. When no licensing issue prevents distributing the hyphenation file, there could be built-in files providing standard stuff (including the easy-to-forget hyphenation.dtd). Generating temporary files may seem inelegant, but it makes debugging easier than in-memory structures and playing with custom URL protocols. Hyphenation would get really simple for French users! Now this opens another interesting question: how to handle documents with several languages?
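The two fixes described above can be sketched in Java. This is only an illustration under stated assumptions: the HyphenationFixes class and both method names are hypothetical (they are not part of Novelang or FOP), and the first method is meant to run on the pattern text only, not on a whole XML file where apostrophes may also delimit attribute values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of Novelang or FOP. */
final class HyphenationFixes {

  static final char APOSTROPHE = '\'';                       // U+0027
  static final char RIGHT_SINGLE_QUOTATION_MARK = '\u2019';  // U+2019

  /** Makes the patterns use the same apostrophe as the rendered text. */
  static String useTypographicApostrophe(String patternText) {
    return patternText.replace(APOSTROPHE, RIGHT_SINGLE_QUOTATION_MARK);
  }

  /** Lower-cases every word of an exception list. */
  static List<String> normalizeExceptions(List<String> words) {
    List<String> normalized = new ArrayList<>();
    for (String word : words) {
      normalized.add(word.toLowerCase(Locale.ROOT));
    }
    return normalized;
  }

  public static void main(String[] args) {
    // Sample values are made up for the demonstration.
    System.out.println(useTypographicApostrophe("l'at"));
    System.out.println(normalizeExceptions(Arrays.asList("Paris", "Novelang")));
  }
}
```

If the stylesheet ever exposes its quoting character as metadata, the second argument of the replacement could come from there instead of being a constant.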

2008-08-22

Fonts

I've been proud of Novelang's support for custom fonts, and now I'm reading this: FOP, the PDF generator used by Novelang, is able to scan multiple directories for fonts. It is even able to use system fonts (based on a directory scan, though). http://xmlgraphics.apache.org/fop/0.95/fonts.html#register I still believe that using system fonts is error-prone. I wonder if FOP is able to aggregate into the same family various files representing different weights and styles. Anyway, Novelang needs a font list for the font listing, which is really useful, especially when there are some broken fonts somewhere. But FOP should hold such a list somewhere internally. Now I'm tempted to give access to the FOP configuration file in some way instead of making FOP "transparent". Such transparency fails because the naming convention for fonts (Xxxx-bold-italic.ttf) was influenced by how FOP works.

2008-08-16

Novelang-0.9.0 released!

The latest release of Novelang is available from here. Coolest features of this release:
  • Custom fonts.
  • Hyphenation support.
  • New $style parameter for the insert function.
  • Superscript.
See release notes (in the "Status" chapter) for details. Enjoy!

2008-08-14

GPL-friendly fonts: Bitstream-Vera

It's not the purpose of Novelang to come with its own set of fonts, and this could turn licensing into a mess. But testing requires a set of valid fonts, and since the project is public, I'm reluctant to put a font I have no right to distribute under version control. Luckily, I discovered the Bitstream Vera fonts, whose license is GPL-compatible (they're used in a Linux-related project). Those fonts are designed for screen display rather than printing, but they behave quite well and provide a very complete family, which is perfect for various tests. If you're interested you can download them from here. Nice work, guys!

Extending standard stylesheet functions

The eXtensible Stylesheet Language for Transformations comes with its own set of functions, but Xalan, the XSLT processor shipped with Novelang, supports additional functions. Here is how it works, for a function that converts numbers into words. First we define a static method (the simplest approach) in some Java class, doing the conversion we need. Parameters are: the number, the name of the language, and whether we want lower case, upper case or capitals.
package novelang.rendering.xslt;

public class XsltFunctions {

  public static String numberAsText(
      Object numberObject,
      Object localeNameObject,
      Object caseObject
  ) {
    // ...
  }

}
The class must appear in the Novelang classpath (Java developers know what it means). In the stylesheet we add a special namespace that we call "nlx" like "NoveLang eXtensions":
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:n="http://novelang.org/book-xml/1.0"
    xmlns:xalan="http://xml.apache.org/xalan"
    xmlns:nlx="xalan://novelang.rendering.xslt.XsltFunctions"
 >
  ...
Here is what the call to convert a number into words looks like:
  <xsl:value-of 
      select="nlx:numberAsText(43,'EN','capital')" />
Of course function calls (like position()) can replace our literal number ("43"). The complete example is here and also contains a nice trick for hiding page numbers when they are not welcome. This function is useful for giving a special touch to lists or chapter numbers, but we can imagine many other uses.
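For the curious, here is a minimal sketch of what the elided method body could look like. It is not the actual Novelang code: the class name NumberAsTextSketch is made up, it only knows English and numbers from 0 to 99, and the "upper" and "lower" case names are assumptions (only 'capital' appears in the call above). Xalan typically hands an XSLT number over as a Double when the parameter type is Object, which is why the sketch accepts any Number.

```java
import java.util.Locale;

/** Illustrative stand-in for the elided method body; not the real Novelang code. */
final class NumberAsTextSketch {

  private static final String[] BELOW_TWENTY = {
      "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
      "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
      "seventeen", "eighteen", "nineteen"
  };

  private static final String[] TENS = {
      "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
  };

  static String numberAsText(Object numberObject, Object localeNameObject, Object caseObject) {
    // Accept any Number (Xalan usually passes a Double) or a numeric string.
    int number = (numberObject instanceof Number)
        ? ((Number) numberObject).intValue()
        : (int) Double.parseDouble(String.valueOf(numberObject));

    // Outside of what this sketch knows, fall back to plain digits.
    boolean english = "EN".equalsIgnoreCase(String.valueOf(localeNameObject));
    if (!english || number < 0 || number > 99) {
      return String.valueOf(number);
    }

    String words = number < 20
        ? BELOW_TWENTY[number]
        : TENS[number / 10] + (number % 10 == 0 ? "" : "-" + BELOW_TWENTY[number % 10]);

    String textCase = String.valueOf(caseObject);
    if ("upper".equals(textCase)) {        // assumed name
      return words.toUpperCase(Locale.ROOT);
    }
    if ("capital".equals(textCase)) {      // name taken from the stylesheet call
      return Character.toUpperCase(words.charAt(0)) + words.substring(1);
    }
    return words;                          // anything else: lower case
  }

  public static void main(String[] args) {
    System.out.println(numberAsText(43, "EN", "capital"));
  }
}
```

Declaring every parameter as Object keeps the method callable whatever type the XSLT processor decides to pass, at the price of doing the conversions by hand.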

Custom fonts in PDFs!

Custom font support in PDFs now works and will be available in the next release. Basically, here's all you have to do:
  • Create a fonts directory at the root of your Novelang project.
  • Copy all the True Type fonts (.ttf files) you need there, suffixing the font name with bold.ttf, italic.ttf, and bold-italic.ttf, according to the corresponding style and weight.
  • Check that all fonts are healthy by requesting the font listing from Novelang in your Web browser.
  • Enjoy, and set the font-family attribute in your stylesheets.
The URL for listing fonts is:
http://localhost:8080/~fonts.pdf
The listing displays all recognized fonts with their name, the file name, and most of the characters supported by the Novelang grammar. The font directory may be set explicitly, with the novelang.fonts.dir property. FOP, the PDF generator, needs to create a file with font metrics for each font. By default these are created in a fop-metrics directory under the Novelang project root, but this can be changed by setting the novelang.fop.fontmetrics.dir system property. If there is something wrong with any font in the directory, for now no custom font is available at all, and the error message only shows up in the log file. If you download free fonts from the Internet you will quickly learn that many are full of bugs and missing letters (especially accented ones), so I recommend adding them one by one and restarting Novelang each time.
FOP has its own limitations when dealing with fonts. True Type fonts define regular, bold, italic and bold + italic as four different fonts. Operating systems and desktop applications show the families as a whole (but you know, they lie all the time). In order to remain platform-independent, Novelang uses the simplest convention.
At first it seemed inconvenient to require a copy of every needed font, but now it appears to me as a good thing. People using publishing tools often complain about a missing font, or a buggy one. Making font files a part of your Novelang project, with the same sharing and backup strategy as for content and stylesheets, is a clear way to gain robustness. Possible usability improvements:
  • List broken fonts in the font listing.
  • Support several font directories.
  • Support True Type Collections (FOP does that).
  • Support Type One fonts (FOP does that).
  • Detect font file change on the disk and therefore provide an updated listing. Restarting Novelang after adding a font wouldn't be required anymore.
  • Use a temporary directory for font metrics files.
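To make the naming convention concrete, here is a small Java sketch that decodes a font file name into family, weight and style. It is an assumption-laden illustration, not Novelang's code: the FontFileName class is hypothetical, and it assumes the suffix is joined to the font name with a hyphen, as in Xxxx-bold-italic.ttf.

```java
import java.util.Locale;

/** Hypothetical helper decoding the "Xxxx-bold-italic.ttf" naming convention. */
final class FontFileName {

  final String family;
  final boolean bold;
  final boolean italic;

  private FontFileName(String family, boolean bold, boolean italic) {
    this.family = family;
    this.bold = bold;
    this.italic = italic;
  }

  static FontFileName parse(String fileName) {
    String extension = ".ttf";
    if (!fileName.toLowerCase(Locale.ROOT).endsWith(extension)) {
      throw new IllegalArgumentException("Not a .ttf file: " + fileName);
    }
    String base = fileName.substring(0, fileName.length() - extension.length());
    String lower = base.toLowerCase(Locale.ROOT);

    // Test the longest suffix first: "-bold-italic" also ends with "-italic".
    if (lower.endsWith("-bold-italic")) {
      return new FontFileName(strip(base, "-bold-italic"), true, true);
    }
    if (lower.endsWith("-bold")) {
      return new FontFileName(strip(base, "-bold"), true, false);
    }
    if (lower.endsWith("-italic")) {
      return new FontFileName(strip(base, "-italic"), false, true);
    }
    return new FontFileName(base, false, false);
  }

  private static String strip(String base, String suffix) {
    return base.substring(0, base.length() - suffix.length());
  }

  public static void main(String[] args) {
    // "Vera" is just a sample family name.
    FontFileName font = parse("Vera-bold-italic.ttf");
    System.out.println(font.family + " bold=" + font.bold + " italic=" + font.italic);
  }
}
```

With such a decoder, the four files of one family end up grouped under the same family name, which is what the font-family attribute in a stylesheet refers to.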

2008-08-04

Decorations revisited

In the previous post about URLs we extended the decoration concept to URLs. We stated that some indentation tells "this is a decoration for the construct right below". As this impacts the way identifiers are defined, here is an updated example, with 4-space indentations.
    \\chapter-one-identifier
    @tag-1
== Chapter one

    \section-one-identifier
    @tag-1 @tag-2
=== Section one

Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Nunc vulputate, elit ac accumsan sodales, libero nisi 
euismod erat, a semper dolor turpis non pede. 
Donec sem ligula, congue id, porta et, tincidunt quis, 
eros. Praesent ipsum. Ut at urna. Proin cursus condimentum
risus. Fusce at lacus tincidunt 
    "here"
    [Visit Novelang website]
    \\novelang-website
http://novelang.sf.net
metus tristique dictum. Pellentesque habitant morbi 
tristique senectus et netus et malesuada fames ac turpis 
egestas. Suspendisse potenti. Aliquam id quam. Quisque 
pellentesque est vitae est. Morbi faucibus ornare ligula. 
Pellentesque sed mi non elit vehicula ullamcorper. 
Pellentesque habitant morbi tristique senectus et netus 
et malesuada fames ac turpis egestas. Nunc vel eros nec 
leo mollis adipiscing.

    \paragraph-identifier
Pellentesque mollis, quam et tincidunt vulputate, ligula 
lectus ullamcorper lacus, non sagittis lorem lectus ut 
tellus. Praesent diam mi, convallis et, pharetra sed, 
tempus in, lacus. Integer aliquet, augue ac vestibulum 
sollicitudin, ligula erat molestie eros, sed feugiat diam 
felis et odio.


    \section-two-identifier
    @tag-2
=== Section two

In orci elit, porta id, volutpat ac, ornare sit amet, 
felis. Mauris vel ipsum eget mi gravida pellentesque. 
Vestibulum et pede et mi lobortis cursus. 
Phasellus fermentum, odio non auctor placerat, nisi pede 
aliquam nisi, in ultrices leo mi vitae risus. 
Lorem ipsum dolor sit amet,  
    "consectetuer adipiscing elit"
url-ref: \\novelang-website
. Quisque eu neque ac lectus consectetuer pharetra. Nulla 
rhoncus elementum mi. Phasellus vitae diam. Class aptent 
taciti sociosqu ad litora torquent per conubia nostra, per 
inceptos himenaeos. Sed bibendum, sem nec consectetuer 
laoreet, ante felis aliquam metus, non placerat nunc erat 
vitae dolor. 

URL syntax

For now, URLs must appear as a standalone paragraph. This is correct:
Go to:

http://novelang.sourceforge.net

And see all useful links.
But this is incorrect:
Go to http://novelang.sourceforge.net and see all useful links.
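The rule illustrated by these two examples can be sketched in a few lines of Java. This is a simplified illustration, not Novelang's actual parser: the StandaloneUrls class is hypothetical, it only looks for the http:// prefix, and it treats a line as a standalone paragraph when its neighbours are blank or absent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the "URL as a standalone paragraph" rule. */
final class StandaloneUrls {

  static List<String> find(String text) {
    String[] lines = text.split("\\r?\\n", -1);
    List<String> urls = new ArrayList<>();
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      // A URL candidate starts with the scheme and contains no whitespace.
      boolean looksLikeUrl = line.startsWith("http://") && !line.matches(".*\\s.*");
      boolean blankBefore = i == 0 || lines[i - 1].trim().isEmpty();
      boolean blankAfter = i == lines.length - 1 || lines[i + 1].trim().isEmpty();
      if (looksLikeUrl && blankBefore && blankAfter) {
        urls.add(line);
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    String correct = "Go to:\n\nhttp://novelang.sourceforge.net\n\nAnd see all useful links.";
    String incorrect = "Go to http://novelang.sourceforge.net and see all useful links.";
    System.out.println(find(correct));
    System.out.println(find(incorrect));
  }
}
```

The point of the sketch is that no delimiter is needed around the URL: its position on the page is the only hint.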
There is a good reason to keep the URL on its own line: most text editors make it easy to copy a whole line of text, so there is less chance of dropping some characters when moving the URL inside the text or copy-pasting it into a Web browser. Aside from this, "http", ":" and "//" are legal Novelang grammar constructs (as word, punctuation sign, and start of italics, respectively), so a hint on where to find a URL makes the Novelang grammar much simpler. I've looked at the way Markdown and WikiCreole define hyperlinks. Both require too many delimiters; I prefer to leverage the fact that a URL sits on its own line. The good thing to keep from Markdown is labelling (give a label to some URL and reuse it later through this label); one day Novelang will do the same through identifiers. So let's say this could become legal:
Go to 
http://novelang.sourceforge.net
and see all useful links.
There are at least two features missing: text for the URL and an advisory title (the one appearing in a tooltip). URL text and advisory title are about "decorating" some Novelang construct. This was previously discussed for identifiers. We could get something like:
Go to 
  "Novelang website" 
  [Novelang website on Sourceforge.net]
http://novelang.sourceforge.net
and see all useful links.
I don't want to use new delimiters for URL text and advisory title, so as not to turn the Novelang grammar into some new flavor of XML. So let's say double quotes are for URL text and square brackets for the advisory title. The good news is that this notation is consistent with the way paragraphs are decorated with identifiers. This strengthens the meaning of indentation as "here stands metadata stuff for the thing right below". So we're breaking the previous decision to put chapter and section decorations below the header. It's amazing to see how keeping a grammar consistent carries the shockwave of small changes over long distances: here it was about adding URL text, and now we're revising the way we write chapter and section headers.