2008-10-01

Inferring font characteristics

Now I'm trying to display a nicer font listing. FOP does a great job of reading font files and extracting the font name, style, and weight. A font name is disconnected from the font file name (though that was not the case before Novelang-0.11.0). A font name should correspond to a typeface, which is a family of fonts. For the "Linux Libertine" font name there can be several variants, like roman or bold italic. But a closer look at the information provided by FOP reveals "virtual" font names, each corresponding to the font variant of a given file or to an abbreviated name. Let's consider the files of the Linux Libertine typeface:
  • LinuxLibertine.ttf
  • LinuxLibertine-Italic.ttf
  • LinuxLibertine-Bold.ttf
  • LinuxLibertine-Bold-Italic.ttf
  • LinuxLibertine-SmallCaps.ttf
Beware of the trick: there are four files corresponding to the standard style / weight combinations, plus the small capitals, which can be considered a separate font. In FOP's terminology, a font-triplet associates a font name, a style (normal / italic) and a weight (normal, bold, extra-bold...). Each triplet has a priority, meaning (I guess) that triplets with higher priority should be used first when resolving a font-triplet. From the five files FOP extracts the following font-triplets:
"Linux Libertine" italic, bold, p=12
LinuxLibertine-Bold-Italic.ttf

"Linux Libertine" italic, normal, p=7
LinuxLibertine-Italic.ttf

"Linux Libertine C" normal, normal, p=7
LinuxLibertine-SmallCaps.ttf

"Linux Libertine" normal, bold, p=5
LinuxLibertine-Bold.ttf

"Linux Libertine Bold Italic" normal, normal, p=0
LinuxLibertine-Bold-Italic.ttf

"LinLibertineBI" normal, normal, p=0
LinuxLibertine-Bold-Italic.ttf

"Linux Libertine Bold" normal, normal, p=0
LinuxLibertine-Bold.ttf

"LinLibertineB" normal, normal, p=0
LinuxLibertine-Bold.ttf

"Linux Libertine Italic" normal, normal, p=0
LinuxLibertine-Italic.ttf

"LinLibertineI" normal, normal, p=0
LinuxLibertine-Italic.ttf

"Linux Libertine Capitals" normal, normal, p=0
LinuxLibertine-SmallCaps.ttf

"LinLibertineC" normal, normal, p=0
LinuxLibertine-SmallCaps.ttf

"Linux Libertine" normal, normal, p=0
LinuxLibertine.ttf

"LinLibertine" normal, normal, p=0
LinuxLibertine.ttf
This looks quite messy. Using this raw data, the font listing would reveal 14 fonts instead of the 5 expected. That is because FOP focuses on resolving font variants given a name, a style and a weight, while each font file may contain more than one font name. Novelang has to take FOP's information and turn it upside down to obtain a human-readable font list. First, sort all triplets by priority (as above). Let's say that all triplets with a priority greater than 0 define "good" font names: font names that are shared between triplets can be used safely to choose font variants (while there is no chance of getting a variant from a font name that already describes a variant, like "LinLibertineI"). Let's call those names the "clean names". In the list above we get the following clean names: "Linux Libertine" and "Linux Libertine C". Then it is easy to craft a structure like this:
"Linux Libertine"
  italic, bold, LinuxLibertine-Bold-Italic.ttf
  italic, normal, LinuxLibertine-Italic.ttf
  normal, bold, LinuxLibertine-Bold.ttf
"Linux Libertine C"
  normal, normal, LinuxLibertine-SmallCaps.ttf
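The grouping step can be sketched in a few lines of Python. This is only an illustration of the idea, not Novelang's actual code and not FOP's API: the `Triplet` type and the `group_by_clean_name` function are made-up names, and the data is simply the 14 font-triplets listed above, typed in by hand.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):  # hypothetical type, not a FOP class
    name: str
    style: str
    weight: str
    priority: int
    file: str


# The 14 font-triplets from the listing above.
TRIPLETS = [
    Triplet("Linux Libertine", "italic", "bold", 12, "LinuxLibertine-Bold-Italic.ttf"),
    Triplet("Linux Libertine", "italic", "normal", 7, "LinuxLibertine-Italic.ttf"),
    Triplet("Linux Libertine C", "normal", "normal", 7, "LinuxLibertine-SmallCaps.ttf"),
    Triplet("Linux Libertine", "normal", "bold", 5, "LinuxLibertine-Bold.ttf"),
    Triplet("Linux Libertine Bold Italic", "normal", "normal", 0, "LinuxLibertine-Bold-Italic.ttf"),
    Triplet("LinLibertineBI", "normal", "normal", 0, "LinuxLibertine-Bold-Italic.ttf"),
    Triplet("Linux Libertine Bold", "normal", "normal", 0, "LinuxLibertine-Bold.ttf"),
    Triplet("LinLibertineB", "normal", "normal", 0, "LinuxLibertine-Bold.ttf"),
    Triplet("Linux Libertine Italic", "normal", "normal", 0, "LinuxLibertine-Italic.ttf"),
    Triplet("LinLibertineI", "normal", "normal", 0, "LinuxLibertine-Italic.ttf"),
    Triplet("Linux Libertine Capitals", "normal", "normal", 0, "LinuxLibertine-SmallCaps.ttf"),
    Triplet("LinLibertineC", "normal", "normal", 0, "LinuxLibertine-SmallCaps.ttf"),
    Triplet("Linux Libertine", "normal", "normal", 0, "LinuxLibertine.ttf"),
    Triplet("LinLibertine", "normal", "normal", 0, "LinuxLibertine.ttf"),
]


def group_by_clean_name(triplets):
    """Keep triplets with priority > 0 and group them under their font name."""
    families = defaultdict(list)
    # Highest priority first; sorted() is stable, so ties keep listing order.
    for t in sorted(triplets, key=lambda t: t.priority, reverse=True):
        if t.priority > 0:
            families[t.name].append((t.style, t.weight, t.file))
    return dict(families)


for name, variants in group_by_clean_name(TRIPLETS).items():
    print(f'"{name}"')
    for style, weight, file in variants:
        print(f"  {style}, {weight}, {file}")
```

With this data the loop prints the two clean names with three variants and one variant respectively, in the same order as the structure above.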
The "Linux Libertine, normal, normal" font-triplet is missing. Using the clean name "Linux Libertine", it is easy to find among the font-triplets with priority zero. If we are looking for perfection, we can try to find a better name for "Linux Libertine C". How? Once the clean names are established, we look for singletons in the set of font-triplets with priority greater than zero, that is, clean names that appear in only one such triplet. For each of those, we replace the clean name with an "outstanding name", which is the longest name among the font-triplets with priority zero that share the same font file (here LinuxLibertine-SmallCaps.ttf). So now we have something like this:
"Linux Libertine"
  italic, bold, LinuxLibertine-Bold-Italic.ttf
  italic, normal, LinuxLibertine-Italic.ttf
  normal, bold, LinuxLibertine-Bold.ttf
  normal, normal, LinuxLibertine.ttf
"Linux Libertine Capitals"
  normal, normal, LinuxLibertine-SmallCaps.ttf
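Here is a rough Python sketch of the whole procedure, including the two refinements: filling in the variants that only exist at priority zero, and replacing singleton clean names with an "outstanding name". Again this is an illustration under my own assumptions, not Novelang's code: `build_listing` is a made-up name, triplets are plain tuples, and name collisions between families are not handled.

```python
# (name, style, weight, priority, file), copied from the FOP listing above.
TRIPLETS = [
    ("Linux Libertine", "italic", "bold", 12, "LinuxLibertine-Bold-Italic.ttf"),
    ("Linux Libertine", "italic", "normal", 7, "LinuxLibertine-Italic.ttf"),
    ("Linux Libertine C", "normal", "normal", 7, "LinuxLibertine-SmallCaps.ttf"),
    ("Linux Libertine", "normal", "bold", 5, "LinuxLibertine-Bold.ttf"),
    ("Linux Libertine Bold Italic", "normal", "normal", 0, "LinuxLibertine-Bold-Italic.ttf"),
    ("LinLibertineBI", "normal", "normal", 0, "LinuxLibertine-Bold-Italic.ttf"),
    ("Linux Libertine Bold", "normal", "normal", 0, "LinuxLibertine-Bold.ttf"),
    ("LinLibertineB", "normal", "normal", 0, "LinuxLibertine-Bold.ttf"),
    ("Linux Libertine Italic", "normal", "normal", 0, "LinuxLibertine-Italic.ttf"),
    ("LinLibertineI", "normal", "normal", 0, "LinuxLibertine-Italic.ttf"),
    ("Linux Libertine Capitals", "normal", "normal", 0, "LinuxLibertine-SmallCaps.ttf"),
    ("LinLibertineC", "normal", "normal", 0, "LinuxLibertine-SmallCaps.ttf"),
    ("Linux Libertine", "normal", "normal", 0, "LinuxLibertine.ttf"),
    ("LinLibertine", "normal", "normal", 0, "LinuxLibertine.ttf"),
]


def build_listing(triplets):
    """Return {display name: {(style, weight): file}} for a human-readable list."""
    prioritized = [t for t in triplets if t[3] > 0]
    fallbacks = [t for t in triplets if t[3] == 0]

    # Clean names: group the prioritized triplets by font name.
    families = {}
    for name, style, weight, _, file in prioritized:
        families.setdefault(name, {})[(style, weight)] = file

    # Singletons are judged on the prioritized triplets only, before filling.
    singletons = {
        name: next(iter(variants.values()))
        for name, variants in families.items()
        if len(variants) == 1
    }

    # Fill in variants that only exist at priority zero under a clean name.
    for name, style, weight, _, file in fallbacks:
        if name in families:
            families[name].setdefault((style, weight), file)

    # Rename each singleton to the longest priority-zero name for the same file.
    for clean_name, file in singletons.items():
        candidates = [name for name, _, _, _, f in fallbacks if f == file]
        outstanding = max(candidates, key=len, default=clean_name)
        families[outstanding] = families.pop(clean_name)
    return families


for name, variants in build_listing(TRIPLETS).items():
    print(f'"{name}"')
    for (style, weight), file in variants.items():
        print(f"  {style}, {weight}, {file}")
```

Keying the variants by (style, weight) means a priority-zero triplet can never overwrite a variant that was already found at a higher priority.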
Now there is the temptation to show all available font names in the list, like "Linux Libertine C" as an alias for "Linux Libertine Capitals". This would increase the complexity of the algorithm, and I don't see how useful it would be. Anyway, the algorithm described above may require additional work, considering the messy fonts of the real world.