Global Exclusions are so Easy in Schematron

Posted on July 17, 2021 by Rick Jelliffe

Very often, corporate or industry standard schemas have way more elements than are needed for some application. The application needs only a subset of the kitchen sink.

And sometimes elements lose favour: who now remembers HTML’s <marquee> element?

Declaring that some set of elements is not used is really hard in DTDs and XSD 1.0: you have to go through every content model that uses the elements and remove them (as part of “restricting the base”), and you have to make sure you have not caused any ambiguous content models on the way, which may involve removing dead branches and refactoring.

XML’s venerable predecessor, SGML, had a system called “exclusion exceptions” which allowed you to specify that for any element type and all its descendants, some other element types were not allowed. So validation meant checking the input document against the content model, but raising an error if an excluded element cropped up. Where these exceptions occur on the root element type of the document, we call them global exclusions.

Global exclusions are so easy in Schematron.

Let’s say we want to exclude elements A, B, C, D, E, F, G from our documents. In Schematron terms, we can call this a “pattern” throughout the document, because it hangs together in some way and is not dependent on any other constraints. And so we can easily make a very comprehensible pattern to implement it:

<sch:pattern id="global-exclusions">
   <sch:rule context= " A | B | C | D | E | F | G">
      <sch:report test="true()">The elements A, B, C, D, E, F
      and G are excluded</sch:report>
   </sch:rule>
</sch:pattern>
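
For anyone who wants to try this out, here is a minimal sketch of a complete schema wrapping that pattern. The title text is made up for illustration, and the sketch assumes the elements A-G are in no namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <!-- Illustrative title only; not part of the original example -->
   <sch:title>Global exclusions sketch</sch:title>

   <!-- Any A-G element, anywhere in the document, fires the report -->
   <sch:pattern id="global-exclusions">
      <sch:rule context="A | B | C | D | E | F | G">
         <sch:report test="true()">The elements A, B, C, D, E, F
         and G are excluded</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```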

In fact, if we are adding this restriction to an existing Schematron schema, because we are happy if this exclusion supersedes any other existing assertions about those elements, we could actually just add the rule in the first position of any existing pattern and it will work. (And, actually, all we need to do is make sure it comes before any rule whose context may be A-G.)
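
As a rough sketch of that placement, suppose an existing pattern already has a rule for A. The pattern id and the id-attribute constraint here are hypothetical, invented only to show the ordering.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <sch:pattern id="existing-pattern">
      <!-- New rule, placed first: it claims A-G before any later rule can -->
      <sch:rule context="A | B | C | D | E | F | G">
         <sch:report test="true()">The elements A, B, C, D, E, F
         and G are excluded</sch:report>
      </sch:rule>

      <!-- Hypothetical pre-existing rule: it no longer fires for A,
           because every A has already matched the rule above -->
      <sch:rule context="A">
         <sch:assert test="@id">An A element must have an id attribute.</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

Because the exclusion rule comes first, the older assertion about A is simply never evaluated in this pattern; nothing else needs to be edited.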

Series of Variants

Now, what if we have a series of these exclusions? For example, say our starting schema was made in year 2001 and it defined 26 elements A-Z. In 2011 we decide to exclude A-G as above. In 2021 we decide to exclude H as well, but re-allow A. Schematron lets us capture all this.

<sch:phase id="version-2001">
   <sch:active pattern="whatever"/>
</sch:phase>
<sch:phase id="version-2011">
   <sch:active pattern="global-exclusions-2011"/>
   <sch:active pattern="whatever"/>
</sch:phase>
<sch:phase id="version-2021">
   <sch:active pattern="global-exclusions-2021"/>
   <sch:active pattern="whatever"/>
</sch:phase>

<sch:pattern id="global-exclusions-2011">
   <sch:rule context= " A | B | C | D | E | F | G">
      <sch:report test="true()">The elements A, B, C, D, E, F
      and G are excluded</sch:report>
   </sch:rule>
</sch:pattern>

<sch:pattern id="global-exclusions-2021">
   <sch:rule context= " B | C | D | E | F | G | H">
      <sch:report test="true()">The elements B, C, D, E, F, G
      and H are excluded</sch:report>
   </sch:rule>
</sch:pattern>
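
To make the difference between the two exclusion patterns concrete, here is a small made-up instance document (the doc wrapper and the choice of child elements are assumptions for illustration), annotated with which pattern would report each element when it is active.

```xml
<doc>
   <A/>  <!-- reported by global-exclusions-2011 only: A is re-allowed in 2021 -->
   <B/>  <!-- reported by both global-exclusions-2011 and global-exclusions-2021 -->
   <H/>  <!-- reported by global-exclusions-2021 only -->
   <M/>  <!-- reported by neither exclusion pattern -->
</doc>
```

Validating with the 2001 phase selected activates neither exclusion pattern, so none of these elements is reported as excluded.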

Excluded and Allowed and Unexpected

What if we want to exclude some elements, but also use our Schematron schema to let us know if there are unexpected elements too? For example, someone spelled an element name wrong or put in one from the wrong namespace or just plain added an element that is not part of our intended document type?

Again, this is entirely straightforward: we make three rules, one each for excluded, allowed and unexpected.

<sch:pattern id="allowed-excluded-unexpected">
   <sch:rule id="excluded" context= " A | B | C | D | E | F | G">
      <sch:report test="true()">The elements A, B, C, D, E, F
      and G are excluded</sch:report>
   </sch:rule>

   <sch:rule id="allowed" context= " H | I | J | K">
      <sch:report test="false()">The elements H, I, J
      and K are allowed</sch:report>
   </sch:rule>

   <sch:rule id="unexpected" context="*">
     <sch:report test="true()">Any other element is unexpected.
     </sch:report>
   </sch:rule>
</sch:pattern>

How does this work? The Schematron schema is evaluated by looking at every node in the document. Within a pattern, a node is handled by only one rule: the first one whose context expression matches it. So the rule “excluded” swallows any element nodes A-G. Then the second rule swallows the allowed elements: it has @test=”false()” so its report will never fire; nevertheless we put in text saying which elements are allowed, as documentation. Finally, the third rule swallows any other elements and so complains about anything else.
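
Here is a small made-up instance document, annotated with the rule that would claim each element. The document itself is an assumption for illustration; it is not from any real vocabulary.

```xml
<H>        <!-- rule "allowed": nothing reported -->
   <A/>    <!-- rule "excluded": reported -->
   <K/>    <!-- rule "allowed": nothing reported -->
   <j/>    <!-- rule "unexpected": XML names are case-sensitive, so j is not J -->
   <Q/>    <!-- rule "unexpected": reported -->
</H>
```

Note that the context "*" matches the document element too, so the root element has to be listed in the allowed rule (as H is here) or it will be reported as unexpected.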

We could provide more useful diagnostics by, for example, dynamically retrieving the actual name that caused the problem:

<sch:rule id="unexpected" context="*">
   <sch:report test="true()">Any other element is unexpected.
   Found <sch:name/>.
   </sch:report>
</sch:rule>
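
Going one step further, the message could also say where the stray element was found. This is only a sketch: the pattern id is invented, and the rule is meant to sit last in a pattern, after the excluded and allowed rules.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <sch:pattern id="unexpected-with-location">
      <!-- The excluded and allowed rules would come first here -->
      <sch:rule id="unexpected" context="*">
         <!-- name(..) is the parent element's name; it is an empty
              string when the context node is the document element -->
         <sch:report test="true()">Unexpected element <sch:name/>
         found inside <sch:value-of select="name(..)"/>.</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```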

Non-Global Exclusions

You can see that it is easy to use this method where we only want to ban some elements from some context:

<sch:pattern id="non-global-exclusions">
   <sch:rule context= " Z/A | Z/B | Z/C ">
      <sch:report test="true()">The elements A, B, and C
     are excluded under Z</sch:report>
   </sch:rule>
</sch:pattern>

In Schematron using XPath2, XSLT2 or more recent, you should be able to use a context like this, to stop an explosion of particles:

( X | Y | Z)/( A | B | C)
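
If your implementation rejects the parenthesised form (rule contexts are often compiled into XSLT match patterns, which can be stricter than general XPath expressions), a predicate-based context should express the same idea and does not need XPath 2. The sketch below assumes we want to ban A, B and C directly under X, Y or Z; the pattern id is invented.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <sch:pattern id="non-global-exclusions-grouped">
      <!-- One context instead of nine: the first predicate picks the
           banned element, the second checks its parent -->
      <sch:rule context="*[self::A or self::B or self::C]
                          [parent::X or parent::Y or parent::Z]">
         <sch:report test="true()">The element <sch:name/> is excluded
         under <sch:name path=".."/>.</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```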

Global Exclusions of Attributes

We can use the same mechanism for attributes. (In some Schematron implementations, you may have to turn on attribute visiting for this.)

<sch:pattern id="global-attribute-exclusions">
   <sch:rule context= " @A | @B | @C ">
      <sch:report test="true()">The attributes
      A, B or C are excluded. </sch:report>
   </sch:rule>
</sch:pattern>
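
Since the context node in that rule is the attribute itself, the message can name the offending attribute and the element carrying it. A minimal sketch, with an invented pattern id:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <sch:pattern id="global-attribute-exclusions-named">
      <!-- sch:name gives the attribute's name; path=".." steps up
           to the element that owns the attribute -->
      <sch:rule context="@A | @B | @C">
         <sch:report test="true()">The attribute <sch:name/> is excluded
         (found on element <sch:name path=".."/>).</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```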

Another way to express the same thing follows, if you prefer:

<sch:pattern id="global-attribute-exclusions-alt">
   <sch:rule context= " *[@*] ">
      <sch:report test=" @A or @B or @C">The attributes
      A, B or C are excluded. </sch:report>
   </sch:rule>
</sch:pattern>

Hint: In the sch:rule/@context, we have to use “|” not “or” because we are selecting nodes. In the sch:report/@test here, we could use “|” or “or”. Using “or” evaluates the presence of each attribute as true() and the absence as false() and combines them with a logical or. If we used test=” @A | @B | @C ” here, we would be selecting the set of any such attributes found, and the @test would then cast that to boolean: true if the set is not empty.

You can see how these global attribute exclusions can be put into the allowed rule from the allowed/excluded/unexpected pattern above:

<sch:rule id="allowed" context= " H | I | J | K">
   <sch:report test="false()">The elements H, I, J
   and K are allowed</sch:report>
   <sch:report test=" @A or @B or @C">The attributes
   A, B or C are excluded. </sch:report>
</sch:rule>

This is very straightforward.

Caveat: Contradictory Schemas

Now, this will not help if there are other rules whose assertions require banned elements. What you then have is a contradictory schema. For example, if one assertion says an M must contain an N, but some other rule excludes all Ns, then every document containing an M will fail: either the M lacks its N, or the N is reported as excluded.

These global exclusions work by matching elements as rule contexts, so their presence will be reported; they do not rewrite the operation of any other assertion tests.
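
A minimal sketch of such a contradictory schema follows; both pattern ids are invented for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
   <!-- Hypothetical pre-existing constraint: every M needs an N child -->
   <sch:pattern id="m-requires-n">
      <sch:rule context="M">
         <sch:assert test="N">An M element must contain an N element.</sch:assert>
      </sch:rule>
   </sch:pattern>

   <!-- Later exclusion: no N anywhere -->
   <sch:pattern id="exclude-n">
      <sch:rule context="N">
         <sch:report test="true()">The element N is excluded.</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

With this schema, an M that contains an N trips the exclusion, and an M without an N trips the assertion, so no document containing an M can validate cleanly. The fix is to revise or deactivate the older assertion, not to expect the exclusion to override it.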