How to make your markup language pleasant: linear and unfolding

Posted on November 7, 2017 by Rick Jelliffe

Why is Schematron relatively pleasant to read, by all accounts, while something like XProc (or XSD) is relatively difficult?

Both are small, specialized languages which I have used in large projects, and I have been trying to put my finger on why I like one but am hesitant about the other (apart from parochial reasons: XProc is a language I *want* to like, but it never inspired warm feelings). While the content of text may be symphonic, with a lot of things going on, or its format may be in multiple dimensions, at its heart it is one letter after another, one word after another, one sentence after another, and so on. Text is an attempt to capture speech. And XML is text.

And if we define graphics as 2D media, we might say that graphics are a map or projection of something. If we make that 3D but static, we get holograms. And if we add movement, we get film.

And this has implications for the design of markup languages. The more the information we are marking up can be comprehended as an unfolding, where one thing follows the next, the more easily it can be understood as a small to medium program. So here is my theory: if the information we are marking up has a natural representation that is graphical, holographic or filmic, rather than linear like speech, then we find it difficult to understand once it is linearized into XML. In those cases, the user needs a tool to mediate it, and the XML (while still able to represent the information) loses its value as a medium of human communication.

Why are XML Schemas so hard to read? Not just because of the arcane rules (much improved in XSD 1.1), but because you forever have to jump around: to be effective you need a hypertext version or a tree version of the schema, and those IDE tools in effect turn text into graphics (wander around the tree), holograms (look at something from different viewpoints), or even films (run the step debugger).

And I think this goes some way to explain my frumpiness towards XProc, despite having found it useful (using Norman Walsh’s Calabash implementation in Java). XProc is crying out for integration into an IDE with some kind of hypertext or graphical layout: the XProc system of ports destroys any “narrative” of what the pipeline is, so you don’t get a top-down or unfolding view.

Schematron, on the other hand, can give quite a strong top-down and linear structure, which makes it approachable as text or speech, and not just because of its various titles, paragraphs and rich text.

At the top you get a declaration of all the namespaces: no complex searches for prefix binding.
After that you get a declaration of the phases, saying which patterns will be active (and which can be ignored): no complex searches for what is in or out of effect.
Then you get the patterns, which can be in any order but have no interconnections: no searching between patterns.
Then each rule, and the rules form a linear if-then-else chain within the pattern.
And the assertions are unordered, but have no interconnections.
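To make that linear reading concrete, here is a minimal sketch in Python (not Schematron syntax, and not how any real Schematron engine is implemented) of the evaluation order just described: a phase named up front selects which patterns run, patterns never refer to one another, and within a pattern the rules act as an if / else-if chain, so a node is checked by at most one rule, the first whose context matches. Contexts and tests are Python predicates standing in for XPath; the names `Assertion`, `Rule`, `Pattern`, `validate`, `PATTERNS` and `PHASES` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical, simplified model of a Schematron schema. Contexts and
# tests are Python predicates standing in for XPath expressions.

@dataclass
class Assertion:
    test: Callable[[ET.Element], bool]  # must be true, or the message is reported
    message: str

@dataclass
class Rule:
    context: Callable[[ET.Element], bool]  # which nodes this rule applies to
    assertions: List[Assertion]  # independent of each other

@dataclass
class Pattern:
    name: str
    rules: List[Rule]  # order matters: the first matching rule wins


def validate(root: ET.Element, patterns: List[Pattern],
             phases: Dict[str, Set[str]], phase: str) -> List[str]:
    active = phases[phase]  # the phase, declared up front, selects the patterns
    failures = []
    for pattern in patterns:  # patterns never refer to one another
        if pattern.name not in active:
            continue
        for node in root.iter():
            # The rules act as an if / else-if chain: at most one fires per node.
            for rule in pattern.rules:
                if rule.context(node):
                    for a in rule.assertions:
                        if not a.test(node):
                            failures.append(f"{pattern.name}/{node.tag}: {a.message}")
                    break
    return failures


PATTERNS = [
    Pattern("structure", [
        # Catches every chapter, so the catch-all rule below never sees one.
        Rule(lambda n: n.tag == "chapter",
             [Assertion(lambda n: n.find("title") is not None,
                        "chapter needs a title")]),
        Rule(lambda n: True,
             [Assertion(lambda n: len(n) > 0 or bool((n.text or "").strip()),
                        "element must not be empty")]),
    ]),
    Pattern("metadata", [
        Rule(lambda n: n.tag == "book",
             [Assertion(lambda n: "status" in n.attrib, "book needs a status")]),
    ]),
]
PHASES = {"quick": {"structure"}, "full": {"structure", "metadata"}}

doc = ET.fromstring("<book><title/><chapter><title>One</title></chapter><chapter/></book>")
for msg in validate(doc, PATTERNS, PHASES, "quick"):
    print(msg)
# structure/title: element must not be empty
# structure/chapter: chapter needs a title
```

The empty `<chapter/>` is reported only once, by the first rule that matches it; someone reading the schema to find out what happens to a chapter can stop at that rule, which is the "texty" property at stake here.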

So the popular parts of Schematron are all “texty”: a linear unfolding, with no forward references that you need to follow, limited context, and establishing information at the head. The unpopular parts of Schematron don’t accord with that rule: abstract patterns, abstract rules, even variables; you don’t know where in the scope that information might be, ahead or behind. But at least they give names to things. Phases, diagnostics and properties all have their implementation following in specific sections after the declaration, so that is still linear in a fashion, but it does require jumping back and forward to understand well, so it is perhaps 2D.
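A small sketch of why abstract patterns break the linear reading: a concrete pattern that instantiates one says little more than "use that template, with these values", so the reader must go and find the template. This is Python, not Schematron; ISO Schematron treats abstract patterns roughly as a macro substitution of `$`-parameters into XPath and message text, which is only approximated here with the standard library's `string.Template`. The names `ABSTRACT_PATTERNS` and `instantiate`, and the "required-child" pattern, are hypothetical.

```python
from string import Template
from typing import Dict, List, Tuple

# Hypothetical, simplified model of Schematron abstract patterns: each
# abstract pattern is a list of (context, test, message) templates whose
# $placeholders are filled in by a concrete pattern that names it via is-a.
ABSTRACT_PATTERNS: Dict[str, List[Tuple[str, str, str]]] = {
    "required-child": [
        ("$parent", "$child", "A $parent must contain a $child"),
    ],
}


def instantiate(is_a: str, params: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Expand a concrete pattern by substituting its parameters.

    The concrete pattern alone says only which template to use and with
    what values; to learn what it checks, the reader has to look up the
    abstract pattern, which may be declared anywhere in the schema.
    """
    template_rules = ABSTRACT_PATTERNS[is_a]
    return [
        tuple(Template(part).substitute(params) for part in rule)
        for rule in template_rules
    ]


# Usage: the concrete pattern reads as little more than a reference.
for context, test, message in instantiate("required-child",
                                          {"parent": "chapter", "child": "title"}):
    print(context, "|", test, "|", message)
```

The expansion itself is trivial; the cost is for the human reader, who sees `is-a="required-child"` and has to jump elsewhere to know what it means.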

Under my theory, we would expect people to use diagnostics and properties only when they need to be well organized, and to use abstract rules and abstract patterns only when they have no choice, which I think corresponds to reality.

Now, I don’t think that whether a new technical markup language has a nice texty flow (strictly keeping basic features linear and encouraging an unfolding of detail rather than a scatter of detail, etc.) is on the radar for most developers. I read the evidence of the markup languages I see as a clear, implicit “no”, or perhaps “huh?”, or at best “who cares about your untested theories about usability, Rick, leave me alone”. (Of course, multiple factors explain any successful language, including the luck of timing.)