2008-10-21

Stylesheet metadata and caching

As shown by previous posts on this blog, I've already spent a lot of time on fonts and how FOP deals with them. By now (before releasing Novelang-0.12.0) the result is far from perfect but I should move on to other topics in order to keep the product well-balanced. So today I'm blogging for the pleasure to write purely speculative things, you've been warned. Outstdanding defects of current solution are:
  • FOP caches font metadata in ~/.fop/fop-fonts.cache. Font deletion doesn't seem to be detected, so it's better to always delete the cache file at application startup.
  • Font changes are not detected while application runs. This is Novelang's fault (in RenderingConfiguration) as it always returns the same FopFactory instance.
  • Fonts defined as application parameters. Should be defined at stylesheet level in order to get a finer grain.
The last point relates to the previously-discussed point on how to enrich an XSL file with FOP-specific metadata. It could include a link to hyphenation directory, as hyphenation rules may depend on some characters defined in the stylesheet (like the apostrophe). By now, when the user touches one of her / his stylesheets, the next document rendition takes the changes in account, and that's the way it should be. Fonts or hyphenation directories are set at FopFactory level, which currently lives as long as the whole Novelang application. For supporting "live" changes in XSLs, Novelang should re-create a FopFactory each time. What happens if reading font metrics takes much time? Remember: FOP has a cache for this and I guess there is a reason. I know that "premature optimization is the root of all evil" so caching has low priority for Novelang but for my own personal comfort I must know how to perform caching. The goal here is to cache a whole FopFactory (possibly several ones, for several documents rendered concurrently) for reuse when the configuration is the same. From my Cocoon days, I remember the excellent caching system. Each cacheable resource involved in the making of the final document has a CacheValidity object that tells whenever another CacheValidity is valid regarding current one. Here is my own flavor of CacheValidity using Generics (the Javadoc of the original is here):
public interface CacheValidity< T > {
 boolean isValid( T other ) ;
}
I'm not sure on how to use Generics in this case but anyways I'll see. Now here is how to get a new resource from a cache. Let's say this is a method of an object holding an instance of a CacheableFile. Synchronization is omitted for brevity.
public String getContent( File file ) {
  final FileValidity current = 
      new FileCacheValidity( file ) ;
  if( ! cache.getValidity().isValid( current ) ) {
    cache = createCache( file ) ;
  } ;
  return cache.getCachedContent() ;  
}
From the simple and clever CacheValidity object, its is possible to implement various caching strategies in a transparent manner. It is also possible to compose them, like using a temporal cache for avoiding any refresh during an given time interval, or implement a directory cache from a list of file caches. More exotic CacheValidity objects may represent a whole XML fragment corresponding to the XSL metadata. If it is left untouched by the user and files are untouched, too (what should happen the most often) then the FopFactory may be used again. I don't know if it's worth reusing Avalon code (it's a lot of mess with no Generics) but it was definitely worth a look. So what's the lesson here? If I had started to look at caching issues from the beginning, attempting to make Novelang fit around Avalon or whatever, I would have lost the focus and started recoding Cocoon (which is the historical reason for building Avalon). But I started Novelang because I was unhappy with Cocoon. So I'll probably have a close look at Avalon, and rewrite caching stuff the way I feel. This may sound like horribly stupid "Not Invented Here" syndrome but my own experience on incremental developments shows that value is not in the code itself, but in the understanding of the problems. (That's why many organizations worship somewhat crappy home-grown frameworks: not because of their intrisic value, but because it's the proof they could bring many people thinking the same way.) Joel Spolsky develops this point of view under the light of the competitive advantage.

No comments: