Converting XML Schemas to Schematron: (#7) Validating special complex content types

This article first appeared on a blog on O'Reilly on November 2, 2007.

A year ago I wrote a precursor to this series in this blog, Converting Content Models to Schematron, in which I outlined one approach. This blog item is an update on that, in particular for the following special cases; clearing the decks of them leaves us free to look at XML content models:

  • Empty elements
  • Text content (untyped)
  • Element content
  • XSD ALL content models

Empty Elements

Empty elements are easy. (Update: 2007-11-09)

<xsl:template match="xs:element[xs:complexType
                    [not(xs:simpleContent)]
                    [not(@mixed='true')]
                    [not(.//xs:element)]]"        priority="100">
        <sch:rule>
                <xsl:call-template name="generate-element-context"/>
                <xsl:comment>Check empty elements: they can't have
                        (1) text nodes, (2) elements, (3) comments, (4) processing instructions </xsl:comment>
                <sch:assert test="count(*|processing-instruction()|comment()|text()) = 0" diagnostics="d1">
                Element <sch:name/> should have no content.</sch:assert>
        </sch:rule>
</xsl:template>
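
As a rough illustration of what this template produces, take a hypothetical declaration of an empty br element in a schema with no target namespace. The sketch below assumes that generate-element-context (not shown in this article) adds a context attribute holding the element name, and that the d1 diagnostic is declared elsewhere in the generated schema. The example wrapper element exists only to keep the fragment well-formed, and the generated comment is left out.

```xml
<example xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Input: hypothetical declaration; not simpleContent, not mixed, no child declarations -->
  <xs:element name="br">
    <xs:complexType/>
  </xs:element>

  <!-- Output: context="br" is an assumption about what generate-element-context emits -->
  <sch:rule context="br">
    <sch:assert test="count(*|processing-instruction()|comment()|text()) = 0" diagnostics="d1">
    Element <sch:name/> should have no content.</sch:assert>
  </sch:rule>

</example>
```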

Text Elements (Untyped)

Text elements are easy too.

<xsl:template match="xs:element[xs:complexType[xs:simpleContent]]" priority="99">
        <sch:rule>
                <xsl:call-template name="generate-element-context"/>
                <xsl:comment>Check text only: they can't have
                        (1) elements </xsl:comment>
                <sch:assert test="count(*) = 0" diagnostics="d1">
                Element <sch:name/> should have text content and attributes only, but no sub-elements.
                (They may have processing instructions and comments.)</sch:assert>
        </sch:rule>
</xsl:template>

Element Content

For elements with element-only content, this pattern just checks that they don’t have text. (A different pattern, in a future blog, will check whether the child elements they do have are allowed.)

<xsl:template match="xs:element
                    [xs:complexType[not(@mixed='true')][not(xs:simpleContent)]]" priority="98">
        <sch:rule>
                <xsl:call-template name="generate-element-context"/>
                <xsl:comment>Check no text found: they can't have
                        (1) any text content </xsl:comment>
                <sch:assert test="string-length(normalize-space(string-join(text(), ''))) = 0" diagnostics="d1">
                Element <sch:name/> should have no text content.</sch:assert>
        </sch:rule>
</xsl:template>
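
The normalize-space() call in that test matters in practice: it lets indentation between child elements through while still catching real text. A small sketch with two hypothetical instances of an element-only address element (the element names and the examples wrapper are made up for illustration):

```xml
<examples>

  <!-- Passes: the child text nodes of address are only line breaks and indentation,
       so the joined, normalized string has length 0 -->
  <address>
    <street>1 Main St</street>
    <city>Sydney</city>
  </address>

  <!-- Fails: "Home:" is a text node directly inside address,
       so the normalized string is "Home:" (length 5) -->
  <address>
    Home: <street>1 Main St</street>
    <city>Sydney</city>
  </address>

</examples>
```

Because text() selects only the child text nodes of the context element, the text inside street and city is not counted.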

The ALL Content Model

The ALL content model, in XSD, is a way of saying that all the elements are required (or optional) but can be in any order. Doing this with a grammar runs the risk of a combinatorial explosion. In Schematron the ALL content model is very straightforward to implement, but we have to break it into its component assertions.

First, the ALL content model is closed (we don’t implement wildcards), so we check that the total number of child elements is equal to the sum of the counts of the allowed elements. If the element requires all of A, B and C, then the test is count(A) + count(B) + count(C) = count(*), which is another example of how in Schematron you solve many problems by counting.
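
To make the arithmetic concrete, here are three hypothetical instances of a parent element whose ALL model allows A, B and C (the parent and examples names are invented for this sketch):

```xml
<examples>

  <!-- count(A) + count(B) + count(C) = 1 + 1 + 1 = 3, and count(*) = 3:
       the check passes, whatever the order -->
  <parent><C/><A/><B/></parent>

  <!-- 1 + 1 + 0 = 2, but count(*) = 3: the stray D is caught -->
  <parent><A/><D/><B/></parent>

  <!-- 1 + 1 + 0 = 2, and count(*) = 2: this check passes even though C is missing -->
  <parent><A/><B/></parent>

</examples>
```

The last case shows that this test only establishes that the model is closed. Whether each allowed element occurs the right number of times needs separate assertions.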

<xsl:template match="xs:element[.//xs:all]" priority="90">
<xsl:comment>======= Handle XS:ALL ========</xsl:comment>
<sch:rule>
        <xsl:call-template name="generate-element-context"/>
        <xsl:comment>check allowed elements</xsl:comment>
        <sch:assert>
                <xsl:attribute name="test">
                        <!-- get names of each allowed element -->
                        <xsl:for-each select=".//xs:all/xs:element">
                                <xsl:text>count(</xsl:text>
                                <xsl:value-of select="if (@name) then @name else @ref"/>
                                <xsl:text>)</xsl:text>
                                <xsl:if test="following-sibling::xs:element"> + </xsl:if>
                        </xsl:for-each>
                        <xsl:text> = count(*)</xsl:text>
                </xsl:attribute>
                The element <xsl:value-of select="@name"/> can only have the following elements:
                <!-- get names of each allowed element -->
                <xsl:for-each select=".//xs:all/xs:element">
                        <xsl:value-of select="if (@name) then @name else @ref"/>
                        <xsl:if test="following-sibling::xs:element">, </xsl:if>
                </xsl:for-each>.
        </sch:assert>

Next we generate assertions that each element occurs only with the cardinality given by its minOccurs and maxOccurs (each defaulting to 1 when absent).

        <xsl:for-each select=".//xs:all/xs:element">
                <xsl:variable name="ancestor-element" select="ancestor::xs:element/@name"/>
                <xsl:variable name="element-name" select="if (@name) then @name else @ref"/>
                <!-- minOccurs and maxOccurs both default to 1 in XSD -->
                <xsl:variable name="MAXOccurs" select="if (@maxOccurs) then @maxOccurs else '1'"/>
                <xsl:variable name="MINOccurs" select="if (@minOccurs) then @minOccurs else '1'"/>
                <xsl:choose>
                        <!-- fixed cardinality: one exact-count assertion -->
                        <xsl:when test="$MAXOccurs = $MINOccurs">
                                <sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">
                                        <xsl:attribute name="test">
                                                count(<xsl:value-of select="$element-name"/>) = <xsl:value-of select="$MAXOccurs"/>
                                        </xsl:attribute>
                                        There should be <xsl:value-of select="$MAXOccurs"/> of element <xsl:value-of select="$element-name"/>
                                </sch:assert>
                        </xsl:when>
                        <!-- a range: separate upper-bound and lower-bound assertions -->
                        <xsl:otherwise>
                                <sch:assert>
                                        <xsl:attribute name="test">
                                                count(<xsl:value-of select="$element-name"/>) &lt;= <xsl:value-of select="$MAXOccurs"/>
                                        </xsl:attribute>
                                        There should be at most <xsl:value-of select="$MAXOccurs"/> of element <xsl:value-of select="$element-name"/>
                                </sch:assert>
                                <sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">
                                        <xsl:attribute name="test">
                                                count(<xsl:value-of select="$element-name"/>) &gt;= <xsl:value-of select="$MINOccurs"/>
                                        </xsl:attribute>
                                        There should be at least <xsl:value-of select="$MINOccurs"/> of element <xsl:value-of select="$element-name"/>
                                </sch:assert>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:for-each>
</sch:rule>
</xsl:template>
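
As a sketch of the result, here is a hypothetical person declaration and roughly what the template above would generate for it, with whitespace tidied. As before, context="person" is an assumption about generate-element-context, the schema is assumed to have no target namespace, and the example wrapper is only there to keep the fragment well-formed.

```xml
<example xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Input: name is required (occurs defaults to 1..1), email is optional (0..1) -->
  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- Output: one rule holding the closed-model check plus the cardinality checks -->
  <!--======= Handle XS:ALL ========-->
  <sch:rule context="person">
    <!--check allowed elements-->
    <sch:assert test="count(name) + count(email) = count(*)">
      The element person can only have the following elements: name, email.</sch:assert>
    <!-- name: minOccurs = maxOccurs, so a single exact-count assertion -->
    <sch:assert diagnostics="d2-person-name" test="count(name) = 1">
      There should be 1 of element name</sch:assert>
    <!-- email: a range, so an upper-bound and a lower-bound assertion -->
    <sch:assert test="count(email) &lt;= 1">
      There should be at most 1 of element email</sch:assert>
    <sch:assert diagnostics="d2-person-email" test="count(email) &gt;= 0">
      There should be at least 0 of element email</sch:assert>
  </sch:rule>

</example>
```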

So every element with an ALL type only requires a single rule to implement.

Now we want to add some more information for better diagnostics, so each of the count assertions refers to a diagnostic by ID:

<sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">

and we generate the corresponding diagnostics to give an actual count of the overpopulation:

<xsl:for-each select="xs:element[.//xs:all]//xs:all/xs:element">
        <xsl:variable name="ancestor-element" select="ancestor::xs:element/@name"/>
        <xsl:variable name="element-name" select="if (@name) then @name else @ref"/>
        <sch:diagnostic id="{concat('d2-',$ancestor-element,'-',$element-name)}">
                <!-- report the actual number of occurrences found in the instance -->
                <sch:value-of select="count({$element-name})"/> elements were found
        </sch:diagnostic>
</xsl:for-each>

In Schematron, we make a distinction between the assertion text, which is a positive statement of what is true, and diagnostics, which give extra help to humans. Very often people new to Schematron want to put diagnostic messages in the assertion text. (Indeed, some of the programmers working on this project did it, so the distinction is not always obvious.) To get the idea, think about what happens if you want to generate a paper document with the schema printed out, with one bullet point per assertion: the diagnostic information would not make much sense, while good assertions would usually be perfectly readable and useful for domain experts.
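
A minimal hand-written sketch of that split, using a hypothetical person element and an invented diagnostic ID rather than anything generated by the converter:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="person">
      <!-- Assertion text: a positive statement that still reads well as a printed bullet point -->
      <sch:assert test="count(name) = 1" diagnostics="name-count">
        A person should have exactly one name.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:diagnostics>
    <!-- Diagnostic: instance-specific detail, only meaningful when a particular document fails -->
    <sch:diagnostic id="name-count">
      <sch:value-of select="count(name)"/> name elements were found.</sch:diagnostic>
  </sch:diagnostics>
</sch:schema>
```

Putting "3 name elements were found" into the assertion itself would work mechanically, but a printed list of such messages would tell a domain expert nothing about what the schema requires.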

Housekeeping

Finally, here are a couple of useful housekeeping templates, to be used in the same pattern as above: these emit warnings for element declarations that are not actually handled, to prove the converter.

<xsl:template match="xs:element[@ref]" priority="1" >
        <xsl:message>PROGRAMMING ERROR: trying to process an element reference.</xsl:message>
</xsl:template>

<xsl:template match="xs:element" >
        <xsl:message>I don't know how to handle this kind of element declaration yet.</xsl:message>
</xsl:template>