Path Validation Language

PVL is a small language for concisely expressing simple validation constraints and a policy for stripping information items. It is suitable for derivation from a schema, and for tight, low-level integration into an XML parser or SAX stream.

A PVL schema is made from one or more namespace declarations (ns elements) followed by an actions element. The actions element is line-oriented: each line holds a kind of XPath, followed by an error keyword (+ allow, w warn, X error), optionally followed by the keyword - strip (do not pass the information on) or 0 fail (halt processing). Actions are tried starting from the top, and the first pattern that matches decides what happens to the item. If no pattern matches, the item fails.
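
The line format and first-match rule can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the fields on an action line are separated by whitespace, and the names `Action`, `parse_action`, `parse_actions` and `first_match`, as well as the field names `severity` and `disposition`, are hypothetical. The pattern test is passed in as a function so the sketch does not depend on how the semi-XPath is evaluated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Keyword tables taken from the text: + allow, w warn, X error; - strip, 0 fail.
SEVERITIES = {"+": "allow", "w": "warn", "X": "error"}
DISPOSITIONS = {"-": "strip", "0": "fail"}


@dataclass(frozen=True)
class Action:
    pattern: str                       # the semi-XPath, kept as raw text here
    severity: str                      # "allow", "warn" or "error"
    disposition: Optional[str] = None  # "strip", "fail" or None


def parse_action(line: str) -> Action:
    """Parse one action line (assumed to be whitespace-separated)."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields: {line!r}")
    if fields[1] not in SEVERITIES:
        raise ValueError(f"unknown error keyword: {fields[1]!r}")
    disposition = None
    if len(fields) == 3:
        if fields[2] not in DISPOSITIONS:
            raise ValueError(f"unknown keyword: {fields[2]!r}")
        disposition = DISPOSITIONS[fields[2]]
    return Action(fields[0], SEVERITIES[fields[1]], disposition)


def parse_actions(text: str) -> List[Action]:
    """Parse the body of an actions element, skipping blank lines."""
    return [parse_action(line) for line in text.splitlines() if line.strip()]


def first_match(actions: List[Action],
                matches: Callable[[str], bool]) -> Optional[Action]:
    """Return the first action whose pattern matches, scanning from the top.

    None means no pattern matched; the caller is expected to treat
    that as a failure.
    """
    for action in actions:
        if matches(action.pattern):
            return action
    return None
```

Because the list is scanned in order, specific patterns belong above wildcard patterns, which then act as a catch-all for everything not already handled.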

The semi-XPath is just something like

     ((( prefix ":")? name)? "/")?
            ( (prefix ":")? ("@")? name
            | "#DATA" | "#WS" | "#COMMENT" | "#PI" | "#DOCTYPE" )
where prefix and name may each be the wildcard "*", and #WS matches runs of whitespace.
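
One way to read this grammar is as a single regular expression plus a small matcher, sketched below. Several points are assumptions rather than anything the text states: a leading "/" with no parent name is taken to mean the document root, a pattern with no "/" matches in any context, an unprefixed name (including a bare "*") matches only unprefixed names, and prefixes are compared as literal strings instead of being resolved to namespace URIs through the ns declarations. The names `PATTERN_RE` and `matches` are hypothetical.

```python
import re

# One name or prefix token: any run of characters that are not delimiters.
# A real implementation would use the XML NCName rules instead.
PATTERN_RE = re.compile(
    r"""^
    (?: (?: (?: (?P<pprefix>[^:/@#\s]+) : )? (?P<pname>[^:/@#\s]+) )?
        (?P<slash>/) )?
    (?: (?: (?P<prefix>[^:/@#\s]+) : )? (?P<attr>@)? (?P<name>[^:/@#\s]+)
      | (?P<kind>\#(?:DATA|WS|COMMENT|PI|DOCTYPE)) )
    $""",
    re.VERBOSE,
)


def _name_ok(pat_prefix, pat_name, qname):
    """Compare a (prefix, name) pattern pair with an actual (prefix, name)."""
    prefix, name = qname
    if pat_prefix is None:
        prefix_ok = prefix is None           # assumption: no prefix means no prefix
    elif pat_prefix == "*":
        prefix_ok = prefix is not None       # assumption: "*:" needs some prefix
    else:
        prefix_ok = pat_prefix == prefix
    return prefix_ok and (pat_name == "*" or pat_name == name)


def matches(pattern, kind, qname=None, parent=None):
    """Test one information item against one semi-XPath pattern.

    kind   -- "element", "attribute", or one of "#DATA", "#WS",
              "#COMMENT", "#PI", "#DOCTYPE"
    qname  -- (prefix, name) for elements and attributes; prefix may be None
    parent -- (prefix, name) of the enclosing element, or None at root level
    """
    m = PATTERN_RE.match(pattern)
    if m is None:
        raise ValueError(f"not a valid pattern: {pattern!r}")
    if m["slash"]:
        if m["pname"] is None:
            if parent is not None:           # bare "/" : root level only
                return False
        elif parent is None or not _name_ok(m["pprefix"], m["pname"], parent):
            return False
    if m["kind"]:
        return kind == m["kind"]
    wanted = "attribute" if m["attr"] else "element"
    return (kind == wanted and qname is not None
            and _name_ok(m["prefix"], m["name"], qname))
```

For instance, under these assumptions `matches("xxx:yyy/#WS", "#WS", parent=("xxx", "yyy"))` holds, while the same pattern does not match whitespace inside any other element.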

PVL addresses several problems: how to remove non-significant whitespace pre-DOM without full validation; how to fail early on gross validation errors, so that more complete validation can be performed as a separate pass or phase without tying up resources; how to have most of the flexibility of order-free, contextual validation of elements and attributes without building a grammar or risking DFA blowout; how to get some path benefits without a full random-access, Schematron-style DOM blowout; and how to enforce extra requirements, such as SOAP's no-PI rule, without confusing them with XML well-formedness. Most particularly, it addresses how to do all of these with a small language that would be trivial to implement using the parser's existing stack.
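
To illustrate the stack-based idea, here is a rough sketch using Python's standard `xml.sax` module. It does not interpret PVL itself: the two rules (a required root name, and a set of parents whose whitespace-only text is dropped) are hard-coded constructor arguments, and the names `StackFilter` and `filter_events` are hypothetical. Namespace processing is left off, so element names are compared as raw prefixed strings, which is a simplification.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class StackFilter(ContentHandler):
    """Record SAX events, failing early and stripping whitespace.

    Only a stack of open element names is kept, so nothing like a DOM
    has to be built before the rules are applied.
    """

    def __init__(self, root, strip_ws_in):
        super().__init__()
        self.root = root                     # required root element name
        self.strip_ws_in = set(strip_ws_in)  # parents whose whitespace is dropped
        self.stack = []                      # names of currently open elements
        self.events = []                     # events passed on downstream

    def startElement(self, name, attrs):
        if not self.stack and name != self.root:
            # Gross error: halt at once instead of parsing the rest.
            raise xml.sax.SAXException(f"unexpected root element {name!r}")
        self.stack.append(name)
        self.events.append(("start", name))

    def endElement(self, name):
        self.stack.pop()
        self.events.append(("end", name))

    def characters(self, content):
        parent = self.stack[-1] if self.stack else None
        if content.strip() == "" and parent in self.strip_ws_in:
            return                           # strip: do not pass it on
        self.events.append(("text", content))


def filter_events(document: bytes):
    """Parse the document and return the events that survive filtering."""
    handler = StackFilter("xxx:yyy", {"xxx:yyy"})
    xml.sax.parseString(document, handler)
    return handler.events
```

A document whose root is not `xxx:yyy` raises on the first start tag, before any of its content is read, and the indentation between children of `xxx:yyy` never reaches the downstream consumer.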

The following schema specifies that a document must start with xxx:yyy as the root element, that whitespace children of xxx:yyy are to be stripped, that xxx:zzz children are allowed with data content or an xxx:eee element, that xxx:eee is an empty element, that comments and PIs are stripped, that a DOCTYPE declaration should generate a warning, and that any other element is an error that halts processing.

  <pvl:schema xmlns:pvl="...">
     <pvl:ns prefix="xxx" uri="..."/>

     /xxx:yyy        +
     /*              X 0
     /*:*            X 0
     xxx:yyy/#WS     + -
     xxx:yyy/xxx:zzz +
     xxx:zzz/#DATA   +
     xxx:zzz/xxx:eee +
     xxx:eee/#DATA   X 0
     #COMMENT        + -
     #PI             + -
     #DOCTYPE        w
     *:*/*:*         X 0 
     *:*/*           X 0