Chapter 3. Running Schematron

If you are used to running XSLT transforms, this won't be much of a surprise for you. If it's new and you're learning, I hope it's enough to get you started. For this sequence I'm using Saxon, an XSLT 2.0 implementation from Mike Kay. Saxon [3] may be downloaded from [4]. Having installed Saxon, I use a couple of scripts to run Schematron. I'll show the Windows version first, in Example 3.1, “Schematron Script, for windows”.
Example 3.1. Schematron Script, for windows
rem Note the use of \ as a continuation character.
rem Such lines should be put all on one line
@echo off
cls
echo Usage: build %%1 = iso schematron file, no extension.
echo %%2 is the input xml file, with the extension.
echo E.g. build input input.xml will produce input.report.xml as output
del tmp.xsl
echo Generate the stylesheet from %1
java -mx250m -ms250m -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 \
-o tmp.xsl %1.sch iso_svrl.xsl
echo Now run the input file %2 against the generated stylesheet \
tmp.xsl to produce %1.report.xml
java -mx250m -ms250m -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 \
-o %1.report.xml %2 tmp.xsl
type %1.report.xml
Please note the comment about long lines. Where a line ends in a backslash, please remove it and join it to the following line.
The first point to note is that I have installed saxon8.jar, the Saxon XSLT processor jar file, into a directory called myjava on the root of the current disk. If you installed it elsewhere, please change the classpath in the script to match. I'm assuming you're running Java 1.5. If you're not, you're on your own!
For Linux, Example 3.2, “Schematron Script, for Linux” is suitable, with the same constraints concerning where your XSLT 2.0 engine is installed.
Example 3.2. Schematron Script, for Linux

#!/bin/sh
clear
echo 'Usage: build $1 = iso schematron file, no extension. $2 is the input xml file, with extension.'
echo 'E.g. build input input.xml will produce input.report.xml as output'

if [ $# -ne 2 ]
   then
   echo "Usage: build <filename> <filename.ext> to use filename.sch to validate filename.ext"
   exit 2
fi

if [ -f $1.sch ]
   then
     echo
   else
     echo Schema file $1.sch not found
     exit 2
fi

if [ -e $2 ]
   then
   echo
   else
     echo input file $2 not found
     exit 2
fi



if [ -e tmp.xsl  ]
  then
    rm -f tmp.xsl
fi

if [ -e $1.report.xml ]
   then
    rm $1.report.xml
fi
echo Validate the schema
cp=/myjava/jing.jar:/myjava/saxon652.jar:/myjava/xercesImpl.jar:/myjava/xml-apis.jar

java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/isoSchematron.rng $1.sch

if [ $? -eq 0 ]
   then
     echo $1.sch is valid
   else
     echo Invalid Schematron file
     exit 2
fi


echo Generate the stylesheet from $1

java  -mx250m -ms250m  -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
       net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   \
       -o tmp.xsl    $1.sch /sgml/schematron/iso/iso_svrl.xsl  "generate-paths=yes"

# Add source document paths with the parameter "generate-paths=yes"





if [ $? -eq 0 ]
  then 
  echo run the input file $2 against the generated stylesheet tmp.xsl to produce $1.report.xml

  java  -mx250m -ms250m  -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
    net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   -o $1.report.xml $2 tmp.xsl

  if [ -e $1.report.xml ]
   then
    #cat $1.report.xml
    java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/svrlDP.rng $1.report.xml
    if [ $? -eq 0  ]
      then
      echo $1.report.xml is valid
    else
      echo $1.report.xml is invalid
    fi
  fi

fi
echo Done
Using input.sch and input.xml as the Schematron file and input file (just as in the examples above), the first transform generates an XSLT stylesheet called tmp.xsl. The next transform uses this stylesheet and the input file, input.xml, to produce an output file called input.report.xml.
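The same two-stage sequence can be sketched outside a shell script. The following Python fragment is only an illustration: it assumes the classpath, Saxon options, and file locations used in the Linux script above, and the names `saxon_command` and `schematron_commands` are my own. It builds the two command lines; nothing is executed.

```python
# Sketch only: builds the two Saxon command lines used by the scripts above.
# CLASSPATH and ISO_SVRL are assumptions copied from the Linux script;
# change them to match your own installation.

CLASSPATH = ".:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar"
ISO_SVRL = "/sgml/schematron/iso/iso_svrl.xsl"


def saxon_command(output, source, stylesheet, *params):
    """Return the argument list for one transform: source + stylesheet -> output."""
    return [
        "java", "-mx250m", "-ms250m", "-cp", CLASSPATH,
        "net.sf.saxon.Transform",
        "-x", "org.apache.xerces.parsers.SAXParser",
        "-w1",
        "-o", output,
        source, stylesheet, *params,
    ]


def schematron_commands(schema_base, input_file):
    """Return the two commands: compile the schema, then validate the input."""
    compile_schema = saxon_command(
        "tmp.xsl", schema_base + ".sch", ISO_SVRL, "generate-paths=yes")
    validate_input = saxon_command(
        schema_base + ".report.xml", input_file, "tmp.xsl")
    return compile_schema, validate_input


if __name__ == "__main__":
    # Print the commands rather than running them.
    for command in schematron_commands("input", "input.xml"):
        print(" ".join(command))
```

To actually run the stages, each list could be handed to `subprocess.run(command, check=True)` in turn, so that a failure in the first stage stops the second.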

Note

Note that this does not include any include processing, nor any abstract pattern processing. These require a further two stages of processing prior to the above.
If you run it, you should see something like the following, which the script writes to the console as its last action.
Example 3.3. Schematron output
Warning: at xsl:stylesheet on line 89 of file:/C:/sgml/schematron/iso/iso_svrl.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
run the input file input.xml against the generated stylesheet input1.report.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:fired-rule context="chapter"/>
<svrl:fired-rule context="chapter"/>
</svrl:schematron-output>
No, not very interesting, is it! This is the reality of testing: the less output the better! The lines containing fired-rule simply indicate that the rules within the chapter context were fired (i.e. they ran) three times. Exactly what we'd expect, with three chapters in our input file! So I'd class that as a success. There is a little more about this language in Chapter 12, Schematron Validation Report Language (SVRL). If you are curious, see Annex D of [2].
Just to see what happens, you could remove the title from one of the chapters in input.xml and re-run the script. I removed the title from the second chapter. The output changed to:
Example 3.4. Schematron output
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:failed-assert test="title">
<svrl:text>Chapter should have a title</svrl:text>
</svrl:failed-assert>
<svrl:fired-rule context="chapter"/>
</svrl:schematron-output>
This output tells us that the second time the rule fired, the assertion failed, hence the output message. That's the output received, derived directly from the statement in the input Schematron file. XSLT can process that into any format you might need.
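Reading a report by eye works for three chapters, but not for three hundred. As a rough sketch (in Python rather than XSLT, purely for illustration), the fragment below walks an SVRL report and notes which firing of the rule each failed assertion follows. The embedded report is a trimmed, hypothetical one with three chapters, the second lacking a title, and the function name `failed_asserts` is my own.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical, trimmed report: three chapters, the second has no title.
REPORT = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern/>
  <svrl:fired-rule context="chapter"/>
  <svrl:fired-rule context="chapter"/>
  <svrl:failed-assert test="title">
    <svrl:text>Chapter should have a title</svrl:text>
  </svrl:failed-assert>
  <svrl:fired-rule context="chapter"/>
</svrl:schematron-output>"""


def failed_asserts(svrl_text):
    """Return (rule firing number, test, message) for each failed assertion."""
    root = ET.fromstring(svrl_text)
    firing = 0
    failures = []
    for child in root:
        if child.tag == "{%s}fired-rule" % SVRL_NS:
            # Each fired-rule marks one more context node being checked.
            firing += 1
        elif child.tag == "{%s}failed-assert" % SVRL_NS:
            text = child.findtext("{%s}text" % SVRL_NS, default="").strip()
            failures.append((firing, child.get("test"), text))
    return failures


if __name__ == "__main__":
    for firing, test, message in failed_asserts(REPORT):
        print("firing %d, test '%s': %s" % (firing, test, message))
```

The sketch relies on the order of elements in the report: a failed-assert is attributed to the most recent fired-rule before it.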
Moving on from the assert statement.
The second element of prime interest in Schematron is essentially the inverse of the assert element: the report element, as defined in ¶ 5.4.11 of [2]. The syntax is just the same as for assert; the only changes are the element name and the semantics. This can lead to confusion. The standard reads, “if the test evaluates positive, the report succeeds”, which reads almost identically to the assert semantics! Yet if you play around with it, you'll find that an output message is seen under the inverse conditions of the assert: an assert produces its message when its test is false, a report when its test is true. My view on this is that we should use a report when something is not as it should be. The logic here is that the test should make a positive statement. That way the test of a report element describes invalid content and reports it when found, while the test of an assert statement describes what should hold and reports an error when it does not. Yet again, think on that for a while and the difference should become clear.
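The difference is perhaps easier to see in a few lines of code. This is a toy model in Python, not how any Schematron implementation works, and the function names are my own. Both functions take the result of a test and a message; they differ only in when the message comes back.

```python
# Toy model of the two elements: each returns its message when it "speaks",
# or None when it stays silent.

def schematron_assert(test_result, message):
    """An assert speaks when its test is false."""
    return None if test_result else message


def schematron_report(test_result, message):
    """A report speaks when its test is true."""
    return message if test_result else None


if __name__ == "__main__":
    has_title = False  # hypothetical chapter with no title
    # The assert states what should be true, and complains when it is not.
    print(schematron_assert(has_title, "Chapter should have a title"))
    # The report states the unwanted condition, and complains when it is found.
    print(schematron_report(not has_title, "Chapter has no title"))
```

Both calls in the usage section describe the same missing title, once as a statement of what should hold and once as a statement of what was found.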
Returning to the title element, we could generate a report each time we found a title in a chapter. In my view that's not very useful. What I'm going to suggest is a report element which counts the number of paragraphs within a chapter and reports that number. Feeble, but it shows two aspects of Schematron: firstly the use of report, and secondly the extraction of information from the source document for use in the report. Example 3.5, “Using the report element” shows the updated Schematron file. The input file has also changed, as shown below.
Example 3.5. Using the report element
<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
queryBinding='xslt2'
schemaVersion="ISO19757-3">
<iso:title>Test ISO schematron file. Introduction mode</iso:title>
<!-- Not used in first run -->
<iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />
<iso:pattern >
<iso:rule context="chapter">
<iso:assert test="title">Chapter should have a title</iso:assert>
<iso:report test="count(para)">
<iso:value-of select="count(para)"/> paragraphs</iso:report>
</iso:rule>
</iso:pattern>
</iso:schema>
That is the full file used.
The input file has changed insofar as a few more para elements have been added. It now looks like Example 3.6, “The updated input.xml file”. The value-of element in the Schematron namespace is used to obtain information from the source document.
Example 3.6. The updated input.xml file
<?xml version="1.0" encoding="utf-8" ?>
<doc>
<chapter id="c1">
<title>chapter title</title>
<para>Chapter content</para>
</chapter>
<chapter id="c2">
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
</chapter>
<chapter id="c3">
<para>Para in the wrong position</para>
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
</chapter>
</doc>
When the schema is run against this file, the output should be something like Example 3.7, “Resultant svrl output file”.
Example 3.7. Resultant svrl output file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>1 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>3 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>4 paragraphs</svrl:text>
</svrl:successful-report>
</svrl:schematron-output>
Note how it has suddenly become far more dense? The level of markup is starting to hide the actual information content, hence the need for a further transform to format it the way you want. The only items of interest are the lines which output the paragraph count for each of the three chapters. Informative? Maybe, but I think the ideas are clear.
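As a small illustration of such a further transform, here is a sketch in Python (XSLT would do the same job) that reduces an SVRL report to one line of text per message. The sample report is a hypothetical, cut-down one, and `summarize` and the REPORT/FAILED labels are my own choices.

```python
import xml.etree.ElementTree as ET

SVRL = "{http://purl.oclc.org/dsdl/svrl}"

# Hypothetical, cut-down report with one report message and one failed assert.
SAMPLE = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="chapter"/>
  <svrl:successful-report test="count(para)">
    <svrl:text>3 paragraphs</svrl:text>
  </svrl:successful-report>
  <svrl:fired-rule context="chapter"/>
  <svrl:failed-assert test="title">
    <svrl:text>Chapter should have a title</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""

# Map the two message-bearing elements to a short label for the text output.
LABELS = {
    SVRL + "successful-report": "REPORT",
    SVRL + "failed-assert": "FAILED",
}


def summarize(svrl_text):
    """Return one 'LABEL: message' line per report or failed assertion."""
    lines = []
    for element in ET.fromstring(svrl_text):
        label = LABELS.get(element.tag)
        if label is not None:
            # Collapse any line breaks or runs of spaces inside the message.
            message = " ".join(
                element.findtext(SVRL + "text", default="").split())
            lines.append("%s: %s" % (label, message))
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(SAMPLE)))
```

Everything else in the report (namespaces, fired-rule, active-pattern) is simply skipped, which is the point: the reader sees only the messages.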
Before moving on to other aspects of Schematron, I want to diverge just a little into decorations. Sometimes they are useful, other times you may have no use for them at all. I find them useful for adding to the output such things as versioning information, the date and time processed, etc. I'll show how and where they are added, then you can use them if you choose. Chapter 10, Decorating the output discusses this further. There is no change to the input file, but the Schematron file input.sch has a few additions. See Example 3.8, “A decorated Schematron file”.
Example 3.8. A decorated Schematron file
<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
queryBinding='xslt2'
schemaVersion="ISO19757-3">
<iso:title>Test ISO schematron file. Introduction mode </iso:title>
<!-- Not used in first run -->
<iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />
<iso:pattern id="doc.checks">
<iso:title>checking an XXX document</iso:title>
<iso:rule context="doc">
<iso:report test="chapter">Report date.
<iso:value-of select="current-dateTime()"/></iso:report>
</iso:rule>
</iso:pattern>
<iso:pattern id="chapter.checks">
<iso:title>Basic Chapter checks</iso:title>
<iso:p>All chapter level checks. </iso:p>
<iso:rule context="chapter">
<iso:assert test="title">Chapter should have a title</iso:assert>
<iso:report test="count(para)"><iso:value-of select="count(para)"/> paragraphs</iso:report>
<iso:assert test="count(para) >= 1">A chapter must have one or more paragraphs</iso:assert>
<iso:assert test="*[1][self::title]">Title must be first child of chapter</iso:assert>
<iso:assert test="@id">All chapters must have an ID attribute</iso:assert>
</iso:rule>
</iso:pattern>
</iso:schema>
Going through this, notice the following:
  1. Another pattern has been added for document level checks
  2. The report in that section outputs the current date and time using XSLT functionality.
  3. In that same pattern a title has been added which produces output in the final report. (The first element of decoration)
  4. In the chapter.checks pattern, a title and a p element have been added, which may be useful for your purposes.
  5. An assert statement has been added to check that the title element is the first child of a chapter element.
  6. An assert statement has been added to ensure that a chapter has one or more paragraphs.
  7. An assert statement has been added to check that each chapter has an id attribute.
  8. The report statement counting paragraphs has been left in.
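To make the three new assertions concrete, here is a rough Python equivalent applied to a hypothetical chapter. It is only a sketch of what the XPath tests mean, not part of Schematron; the function name `check_chapter` and the sample chapter are my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter: a para comes before the title and there is no id.
CHAPTER = (
    "<chapter><para>Para in the wrong position</para>"
    "<title>chapter title</title></chapter>"
)


def check_chapter(chapter):
    """Return the messages of the assertions that fail for one chapter element."""
    children = list(chapter)  # element children, in document order
    messages = []
    # test="count(para) >= 1"
    if len(chapter.findall("para")) < 1:
        messages.append("A chapter must have one or more paragraphs")
    # test="*[1][self::title]": the first element child must be a title
    if not children or children[0].tag != "title":
        messages.append("Title must be first child of chapter")
    # test="@id"
    if chapter.get("id") is None:
        messages.append("All chapters must have an ID attribute")
    return messages


if __name__ == "__main__":
    for message in check_chapter(ET.fromstring(CHAPTER)):
        print(message)
```

Each check mirrors an assert: the message is collected only when the stated condition does not hold.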
Running this version produces output as shown in Example 3.9, “The output report with decorations”.
Example 3.9. The output report with decorations
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode "
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern name="doc.checks" id="doc.checks"/>
<svrl:fired-rule context="doc"/>
<svrl:successful-report test="chapter">
<svrl:text>Report date.2007-01-19T14:33:41.153Z</svrl:text>
</svrl:successful-report>
<svrl:active-pattern name="chapter.checks" id="chapter.checks">
<svrl:text>All chapter level checks. </svrl:text>
</svrl:active-pattern>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>1 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>3 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>4 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:failed-assert test="*[1][self::title]">
<svrl:text>Title must be first child of chapter</svrl:text>
</svrl:failed-assert>
</svrl:schematron-output>
You should be able to see where each of the additions arises. The decorations as I've called them are all within the text elements. It's your choice if you use them.
That summarizes the basics of Schematron. The functionality has grown from this basis. You can achieve a great deal with these two elements, assert and report, which formed the basis of the initial Schematron.