Converting XML Schemas to Schematron: (#2) macro processing the XSD

This article appeared in an O'Reilly blog on October 11, 2007. Links have not been checked.

Here are some XSLT scripts for macro-expanding a set of XSD schemas into a single file with references removed, a form better suited to schema interrogation and conversion.

Converting an XSD schema into Schematron involves three stages:

  • Preparing the XSD schemas so that they are in a form that is convenient to transform from
  • Converting the grammar and datatype constraints of this prepared schema into Schematron for elements and datatypes
  • Converting the other constraints such as KEY and ID into Schematron.

This blog item gives some beta XSLT code for the first stage. A pipeline of three XSLT scripts is used:

  • INCLUDE: starting from a schema, substitute all the included and imported schemas in-place. (<redefine> is not supported in this version.)
  • FLATTEN: move schemas for different namespaces to the top level, removing duplicates.
  • EXPAND: substitute references to complexType, group and attributeGroup, and remove their declarations. (Substitution groups and wildcards are not supported in this version.)
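
To make the INCLUDE step concrete, here is a minimal sketch in Python rather than XSLT. It is not the posted code: the in-memory SCHEMAS table, the file names and the inline function are all hypothetical, and the guard against circular includes is my own assumption about how such a step would need to behave.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
REFS = {f"{{{XS}}}include", f"{{{XS}}}import"}

# Hypothetical stand-in for files on disk: schemaLocation -> schema text.
SCHEMAS = {
    "main.xsd": f'''<xs:schema xmlns:xs="{XS}" targetNamespace="urn:a">
  <xs:include schemaLocation="types.xsd"/>
  <xs:import namespace="urn:b" schemaLocation="other.xsd"/>
  <xs:element name="doc" type="xs:string"/>
</xs:schema>''',
    "types.xsd": f'''<xs:schema xmlns:xs="{XS}" targetNamespace="urn:a">
  <xs:include schemaLocation="main.xsd"/>
  <xs:simpleType name="code"><xs:restriction base="xs:string"/></xs:simpleType>
</xs:schema>''',
    "other.xsd": f'''<xs:schema xmlns:xs="{XS}" targetNamespace="urn:b">
  <xs:element name="other" type="xs:string"/>
</xs:schema>''',
}


def inline(location, load=SCHEMAS.get, seen=None):
    """Return the schema at `location` with each include/import replaced,
    in place, by the schema it points to (recursively)."""
    seen = set() if seen is None else seen
    seen.add(location)
    root = ET.fromstring(load(location))
    children = []
    for child in root:
        if child.tag in REFS:
            target = child.get("schemaLocation")
            # References already expanded (circular includes) or lacking a
            # schemaLocation are dropped rather than expanded.
            if target and target not in seen:
                children.append(inline(target, load, seen))
        else:
            children.append(child)
    root[:] = children  # replace the child list in one go
    return root


root = inline("main.xsd")
```

The result nests schema elements inside schema elements, so it is an intermediate form rather than a valid XSD; pulling those nested schemas back out is the job of the FLATTEN step.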

The result is a document with a top-level <schemas> element containing <namespace> elements, each of which contains an XML Schema module for a single namespace. These modules contain element, attribute and simpleType declarations, but structural references have been replaced. This resolved form makes the job of converting to Schematron much easier, because there are fewer cases to consider, the paths are simpler, and all the schemas are gathered into a single file.
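
As a rough illustration of that output shape, the following Python sketch groups nested schemas by target namespace under a <schemas> wrapper. It is an assumption-laden sketch, not the posted XSLT: the flatten function, the uri attribute on <namespace>, and the rule that duplicates are recognised by element kind plus name are all hypothetical choices of mine.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SCHEMA = f"{{{XS}}}schema"


def flatten(root):
    """Collect the declarations of nested xs:schema elements into one
    module per target namespace, skipping repeated declarations."""
    out = ET.Element("schemas")
    modules = {}  # namespace URI -> merged xs:schema element
    seen = {}     # namespace URI -> set of (tag, name) already copied

    def visit(schema):
        ns = schema.get("targetNamespace", "")
        if ns not in modules:
            # "uri" is an assumed attribute name, not taken from the scripts.
            wrapper = ET.SubElement(out, "namespace", {"uri": ns})
            attrs = {"targetNamespace": ns} if ns else {}
            modules[ns] = ET.SubElement(wrapper, SCHEMA, attrs)
            seen[ns] = set()
        for child in schema:
            if child.tag == SCHEMA:
                visit(child)  # nested schema left behind by an include/import
                continue
            key = (child.tag, child.get("name"))
            if key[1] is None or key not in seen[ns]:
                seen[ns].add(key)
                modules[ns].append(child)

    visit(root)
    return out


# Hypothetical nested input, of the kind an include-expansion step might emit.
nested = ET.fromstring(f'''<xs:schema xmlns:xs="{XS}" targetNamespace="urn:a">
  <xs:schema targetNamespace="urn:a">
    <xs:simpleType name="code"><xs:restriction base="xs:string"/></xs:simpleType>
  </xs:schema>
  <xs:schema targetNamespace="urn:b"><xs:element name="other"/></xs:schema>
  <xs:simpleType name="code"><xs:restriction base="xs:string"/></xs:simpleType>
  <xs:element name="doc"/>
</xs:schema>''')

flat = flatten(nested)
```

Here flat holds one <namespace> element per target namespace, and the second copy of the code simpleType is dropped because a declaration of the same kind and name was already collected for that namespace.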

I have put the beta XSLT files here. They will go to SourceForge or somewhere eventually: watch this space. I have been frustrated by the lack of tools that expand out XSD schemas, so this code may be useful for other things as well. For example, I may rewrite Topologi’s XSD to RELAX NG converter to use this as the front end.

I would like to acknowledge JSTOR as the sponsor for this code. Thanks to Matt Stoeffler. The code is open source, licensed under the GPL.