2008-06-09

Novelang syntax for Parts

I haven't documented the Novelang syntax yet but there are already plenty of things to change. WikiCreole's reasoning and Markdown give a great start for yet another discussion on Wiki markup.
In this document, character names refer to the Unicode specification.
Headings
A chapter heading should start with a double equals sign, and a section heading with a triple one. There is a single character to learn and it is dedicated to headings (asterisks are already used by bold and unordered lists, see below). It is eye-catching without the visual clutter of many asterisks.
== Chapter

=== Section
This makes the markup "scalable" in the sense that it becomes easy to support a subsection level (though needing one may be a sign that Parts are becoming too complex).
Identifiers and Tags may decorate Headers when they appear just below the Header declaration (one linebreak away). Header identifiers are prefixed by an ampersand. Tags are prefixed by a commercial at.
== Chapter
  &identifier @tag
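A minimal Python sketch of how the heading rules above might be recognized, line by line. The function name `parse_heading`, the returned dictionary, and the idea of passing the decoration line as a second argument are all assumptions made for illustration, not part of Novelang.

```python
import re

# Assumption: two equals signs open a chapter, three open a section,
# and at least one space separates the marker from the title.
HEADING = re.compile(r"^(={2,3})\s+(.*\S)\s*$")
# Decoration line just below the heading: "&identifier" and "@tag" tokens.
DECORATION = re.compile(r"([&@])(\w+)")

LEVELS = {2: "chapter", 3: "section"}


def parse_heading(line, next_line=""):
    """Return a dict describing the heading, or None if `line` is not one."""
    match = HEADING.match(line)
    if match is None:
        return None
    heading = {
        "level": LEVELS[len(match.group(1))],
        "title": match.group(2),
        "identifiers": [],
        "tags": [],
    }
    # The caller is expected to pass only the line one linebreak below.
    for sigil, name in DECORATION.findall(next_line):
        key = "identifiers" if sigil == "&" else "tags"
        heading[key].append(name)
    return heading
```

Supporting a subsection level would only mean widening the `{2,3}` range and adding an entry to `LEVELS`, which is the "scalable" property described above.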
Paragraphs
Paragraphs are just lines of text. They are delimited from the rest by two linebreaks or more (aka hardbreak). They support identifiers and tags immediately above (one linebreak away, no more). Paragraph identifiers are prefixed with a plus sign immediately followed by an ampersand, to indicate they don't work the same as Header identifiers, which are global.
This is one paragraph, continuing
on this line.

  +&identifier
This is another paragraph with an identifier.
Words are any sequence of letters and numbers. There can be a single dash between two letters or numbers. The apostrophe is a word delimiter.
C'mon, just a two-worded word!
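The word rule can be sketched as a single regular expression in Python. This is only an illustration: `split_words` is a hypothetical name, and treating every Unicode letter or digit as a word character is an assumption.

```python
import re

# Letters and digits, optionally joined by single dashes (never doubled,
# never leading or trailing). The apostrophe is not matched, so it
# acts as a delimiter. [^\W_] means "alphanumeric, underscore excluded".
WORD = re.compile(r"[^\W_]+(?:-[^\W_]+)*")


def split_words(text):
    """Return the list of words found in `text`."""
    return WORD.findall(text)
```

With this pattern, `split_words("C'mon, just a two-worded word!")` splits `C'mon` into two words while keeping `two-worded` whole.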
There are some combinations which require character escaping, like acronyms with dots. That looks messy, but trying to turn this into a generic case seems to make things even worse.
I want my T~.~L~.~A (Three-Letter Acronym)!
Character escaping
I've been discussing character escaping and now I think that there should be no difference between single-character and multiple-character escapes, in order to avoid confusion.
Ampersand: ~&~
O and E ligatured: ~OE~ 
Tilde: ~tilde~ 
The backslash character was an option, but I like the tilde character as it carries the meaning of something linked to the rest.
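A small Python sketch of the uniform tilde escape. The table holds only the named escapes from the examples above; the rule that a single escaped character stands for itself and the choice to leave unknown names untouched are assumptions, as is the function name `unescape`.

```python
import re

# Named escapes taken from the examples; a real table would be larger.
NAMED_ESCAPES = {"OE": "\u0152", "tilde": "~"}

# Anything between two tildes, without whitespace or another tilde.
ESCAPE = re.compile(r"~([^~\s]+)~")


def unescape(text):
    """Replace ~name~ and ~c~ escapes with the characters they denote."""
    def replace(match):
        name = match.group(1)
        if name in NAMED_ESCAPES:
            return NAMED_ESCAPES[name]
        if len(name) == 1:
            return name  # assumption: a single character stands for itself
        return match.group(0)  # unknown name: leave the source untouched
    return ESCAPE.sub(replace, text)
```

Single and multiple character escapes go through the same pattern, which is the point of having no difference between the two.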
Inline literal
An inline literal requires a delimiter with little visual clutter, available on most keyboards, and generating minimal conflict with casual use. The grave accent (backquote) is such a gem.
 This is double slash `//` delimiter.
Tilde was a serious candidate but it has a better meaning for escaping, while the grave accent looks more like quotation. 
Bold
Double asterisk looks good and is consistent with italics' double slash. 
This is **bold**.
Subscript and superscript
A delimiter made of a single character is more concise than a double one. Because it takes less visual space, it reflects a semantically weaker meaning. Circumflex means superscript and low line (underscore) means subscript.
L^A^T_E_X is expected to render as LATEX.
Supporting subscript and superscript will be a mess because whether it is attached to a word or not does matter for the rendering.
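A naive Python sketch of the circumflex and low line delimiters, rendering to HTML `<sup>` and `<sub>` elements. The HTML target and the name `render_scripts` are assumptions, and the sketch deliberately ignores the word-attachment problem mentioned above.

```python
import re
from html import escape

SUPERSCRIPT = re.compile(r"\^([^^\s]+)\^")
SUBSCRIPT = re.compile(r"_([^_\s]+)_")


def render_scripts(text):
    """Turn ^x^ into <sup>x</sup> and _x_ into <sub>x</sub>."""
    html = escape(text, quote=False)  # protect &, < and > first
    html = SUPERSCRIPT.sub(r"<sup>\1</sup>", html)
    return SUBSCRIPT.sub(r"<sub>\1</sub>", html)
```

The same pattern would also rewrite a name such as `snake_case_name`, which hints at the kind of mess to expect.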
Links
I often get annoyed when copying a URL from the middle of a line. Now I'm reinventing a better world and I want to force the URL to appear at the beginning of the line.
Many Wikis have messy syntax for URLs / URIs because of the related title and text. This can be avoided with some contextualization: the quotes immediately following a URL become the text to show. The same goes for the link title, which could be a parenthesized block.
The URL here belongs to current paragraph:
http://novelang.sf.net "Go there" (Novelang home page)
The HTML output is then expected to look like this:
The URL here belongs to current paragraph: Go there
If the quoted text really should appear as quoted text, then a line break cuts it away from the URL while keeping it inside the paragraph.
URLs are an easy case as they start with a scheme ("http:" or "file:"), but URIs are harder to handle. They are left out for the moment.
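A Python sketch of the contextual link rule: a URL at the start of a line, optionally followed by quoted text and a parenthesized title. The exact grammar, the mapping of the title to an HTML `title` attribute, and the name `render_link` are assumptions.

```python
import re
from html import escape

# Assumed grammar: URL at line start, optional "text", optional (title).
LINK_LINE = re.compile(
    r'^(?P<url>(?:https?|file):\S+)'
    r'(?:[ \t]+"(?P<text>[^"]*)")?'
    r'(?:[ \t]+\((?P<title>[^)]*)\))?'
    r'[ \t]*$'
)


def render_link(line):
    """Return an HTML anchor for a link line, or None for any other line."""
    match = LINK_LINE.match(line)
    if match is None:
        return None
    url = match.group("url")
    text = match.group("text") or url  # no quoted text: show the URL itself
    title = match.group("title")
    title_attr = f' title="{escape(title)}"' if title else ""
    return f'<a href="{escape(url)}"{title_attr}>{escape(text, quote=False)}</a>'
```

Because the pattern is anchored at the start of the line, a quoted block on the following line is never taken as link text.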
Lists
Unordered lists have items starting with an asterisk.
Ordered lists have items starting with a number sign.
Sublevels could repeat the list item sign but a level 2 unordered list item marker would clash with the bold marker. The trick is to use indentation.
* Item 1
  * Item 1.1
  * Item 1.2
* Item 2
The same goes for ordered lists. Some text editors recognize indentation and wrap text under the first indented character.
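A Python sketch of indentation-based list levels. The two-space indentation step is taken from the example; the requirement of a space after the marker and the name `parse_list` are assumptions.

```python
import re

# Marker (* or #) followed by at least one space, after optional indentation.
ITEM = re.compile(r"^(?P<indent> *)(?P<marker>[*#]) +(?P<text>.*\S)\s*$")
INDENT_WIDTH = 2  # assumption: two spaces per sublevel, as in the example


def parse_list(lines):
    """Return (level, kind, text) tuples for the lines that are list items."""
    items = []
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            continue  # not a list item, e.g. a line starting with **bold**
        level = len(match.group("indent")) // INDENT_WIDTH + 1
        kind = "unordered" if match.group("marker") == "*" else "ordered"
        items.append((level, kind, match.group("text")))
    return items
```

Since a sublevel never repeats the marker, a line starting with `**bold**` is not mistaken for a level 2 item.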
Blockquotes
A pair of angled brackets looks fine for defining blockquotes. They must be at the start of the line and alone on the line where they appear.
<< 
This is a blockquote. 
>>
I try to avoid closing delimiters whenever possible, but the alternative approach, which is to use a special character at the beginning of each paragraph, would require editing each paragraph when pasting foreign text.
Literal
Literal is text appearing the same as in the Novelang markup. It appears inside triple angled brackets. Opening and closing brackets must be at the very beginning of the line, and their trailing space is not rendered. To render triple angled brackets at the beginning of a line, character escaping is required. But such a combination is quite rare and shouldn't be a hassle.
<<< 
This is literal, 
  preserving indentation.
>>>
The need for a closing delimiter is a no-brainer here.
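A Python sketch of a line scanner for the two delimited blocks, blockquote and literal. The name `split_blocks` and the `(kind, lines)` output shape are assumptions; nested blocks and escaping are not handled.

```python
def split_blocks(lines):
    """Return a list of (kind, lines) for text, blockquote and literal runs."""
    closers = {"<<": ("blockquote", ">>"), "<<<": ("literal", ">>>")}
    blocks = []
    kind, closing, body = "text", None, []
    for line in lines:
        stripped = line.rstrip()  # trailing space after a delimiter is ignored
        if closing is None and stripped in closers:
            if body:
                blocks.append((kind, body))
            kind, closing = closers[stripped]
            body = []
        elif closing is not None and stripped == closing:
            blocks.append((kind, body))
            kind, closing, body = "text", None, []
        else:
            body.append(line)  # kept verbatim, indentation included
    if body:
        blocks.append((kind, body))
    return blocks
```

Only `rstrip` is applied before comparing, so a delimiter that does not sit at the very beginning of its line is treated as ordinary content.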
Tables
As the Book feature supports including other files' content, there should be a function to read a CSV file (or whatever) and display it nicely. That way we don't pollute the markup with a feature that bloats other wikis' syntax with more and more complex style stuff.
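What such a function could look like, sketched in Python with the standard `csv` module. The name `csv_to_html_table` and the HTML output are assumptions; nothing here is an existing Novelang function.

```python
import csv
import io
from html import escape


def csv_to_html_table(csv_text):
    """Render CSV text as a plain HTML table, one <tr> per CSV row."""
    rows = csv.reader(io.StringIO(csv_text))
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

All styling decisions stay in the function, so the markup itself needs no table syntax.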
Features I'm happy with
Interpolated clauses must be more than a special character like —. They must be declared as blocks (with opening and closing), so rendering can insert non-breakable spaces after the opening dash and before the closing dash. The closing may also be marked as non-renderable.
The interpolated clause delimiter is a double dash. A dash followed by a low line defines a non-renderable ("silent") closing.
Interpolated clauses -- like this one -- do rock.
Silent ends rock, too -- yeah-_.
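A Python sketch of how the block form allows non-breakable spaces to be inserted. Rendering the double dash as a Unicode em dash (U+2014) and the name `render_clauses` are assumptions.

```python
import re

NBSP = "\u00a0"     # non-breakable space
EM_DASH = "\u2014"  # assumption: the rendered form of the double dash

# Opening "-- ", clause body, then either a rendered " --" or a silent "-_".
CLAUSE = re.compile(r"(?<!\S)-- (?P<body>.+?)(?: --(?!\S)|(?P<silent>-_))")


def render_clauses(text):
    """Render interpolated clauses with dashes glued to the clause body."""
    def replace(match):
        opening = EM_DASH + NBSP + match.group("body")
        if match.group("silent"):
            return opening  # silent closing: nothing is rendered
        return opening + NBSP + EM_DASH
    return CLAUSE.sub(replace, text)
```

A lone special character could not tell an opening dash from a closing one, so the renderer would not know on which side the non-breakable space belongs.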

Parentheses, brackets and quotes are blocks, too, using the conventional characters.
(parenthesis) [brackets] "quotes"

Italics use a double slash delimiter. A double character looks "big", which discourages overuse.
This is //italics//.
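A Python sketch that renders the three inline delimiters seen so far (grave accent, double asterisk, double slash) to HTML. The element choices, the absence of nesting, and the name `render_inline` are assumptions.

```python
import re
from html import escape

# The literal alternative comes first so its content is never reinterpreted.
INLINE = re.compile(
    r"`(?P<literal>[^`]*)`|\*\*(?P<bold>.+?)\*\*|//(?P<italics>.+?)//"
)
TAGS = {"literal": "code", "bold": "strong", "italics": "em"}


def render_inline(text):
    """Render inline literal, bold and italics spans as HTML."""
    parts, position = [], 0
    for match in INLINE.finditer(text):
        parts.append(escape(text[position:match.start()], quote=False))
        kind = match.lastgroup  # name of the group that matched
        tag = TAGS[kind]
        content = escape(match.group(kind), quote=False)
        parts.append(f"<{tag}>{content}</{tag}>")
        position = match.end()
    parts.append(escape(text[position:], quote=False))
    return "".join(parts)
```

A double slash inside grave accents stays literal because the scan reaches the opening grave accent first.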

Punctuation signs come with no surprise.
Question mark ? Exclamation mark ! Colon : Semicolon; Comma, Ellipsis... Full stop.
Comments. Because of italics, the double slash made popular by Java and C++ is not an option. To reduce confusion, the corresponding slash-asterisk combination cannot be used either because, for most people, both belong to the same set of conventions.
So line comments start with a double percent sign, and block comments are delimited with double curly brackets (accolades).
%% Single-line comment.
{{ Block
comment }}
Single curly brackets (accolades) may get a special meaning later, but they are free for now.
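A Python sketch that strips both comment forms from a source text. It assumes a line comment occupies its whole line and that block comments do not nest; literal blocks are not protected. The name `strip_comments` is hypothetical.

```python
import re

# Block comments may span several lines, hence DOTALL and a lazy match.
BLOCK_COMMENT = re.compile(r"\{\{.*?\}\}", re.DOTALL)
# Assumption: a line comment starts its line (optional indentation allowed).
LINE_COMMENT = re.compile(r"^[ \t]*%%.*$\n?", re.MULTILINE)


def strip_comments(source):
    """Remove {{ block }} comments, then whole-line %% comments."""
    without_blocks = BLOCK_COMMENT.sub("", source)
    return LINE_COMMENT.sub("", without_blocks)
```

Single curly brackets are left alone, so they remain free for a later meaning.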
