Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

PMML validation

Predictive Model Markup Language (PMML) is the leading standard for statistical and data mining models. A PMML file is an XML document with a root element of type PMML that describes the structure of one or more data mining models.

Our Pervasive DataRush-Analytics project provides the following data mining models: AssociationModel, NaiveBayesModel, and RegressionModel. The PMML generated from these models can be shared and exchanged between environments, but it first needs to be validated against the schema to find any problems that need to be fixed.

To make validation as thorough as possible, Pervasive DataRush-Analytics uses both XSD validation and XSLT validation, as recommended by the Data Mining Group (DMG).

First step: XSD Validation:

Get the PMML XSD 3.2 schema

Here is an example of validating a PMML file against the PMML XSD schema:

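import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public void pmmlXSDValidate(String schemaPath, String sourcePath) {
    try {
        // Compile the PMML XSD into a reusable Schema object
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source schemaFile = new StreamSource(new File(schemaPath));
        Schema schema = factory.newSchema(schemaFile);

        // Validate the PMML file; a schema violation surfaces as a SAXException
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File(sourcePath)));
    } catch (SAXException e) {
        // ..........
    } catch (IOException e) {
        // ..........
    }
}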

XSD validation is necessary, but it is not sufficient by itself to determine whether a PMML model is valid.
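When the goal is to find every problem that needs fixing, it can help to collect all schema violations rather than stop at the first one. The following is a minimal sketch of that idea, not part of Pervasive DataRush-Analytics: the class name XsdErrorCollector and the method collectErrors are hypothetical, and it assumes the schema and the PMML document are both local files.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: gathers schema violations instead of failing on the first one.
public class XsdErrorCollector {

    public static List<String> collectErrors(File schemaFile, File xmlFile)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaFile);
        Validator validator = schema.newValidator();

        final List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable violation: record it and let validation continue
                problems.add(describe("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // The document is not well-formed XML, so validation cannot continue
                throw e;
            }
        });

        validator.validate(new StreamSource(xmlFile));
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

An empty list means the document passed XSD validation; each entry otherwise names the line and the reason reported by the validator.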

Second step: XSLT Validation:

Get the PMML XSLT style sheet

Here is an example of XSLT validation:

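import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public void pmmlXSLTvalidate(String stylesheetPath, String sourcePath, String resultPath) {
    try {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        // This setting will ignore the namespace
        docFactory.setNamespaceAware(false);
        DocumentBuilder parser = docFactory.newDocumentBuilder();

        // Parse from a File so that no FileInputStream is left open
        Document document = parser.parse(new File(sourcePath));
        Source pmmlSource = new DOMSource(document);
        Source xsltSource = new StreamSource(new File(stylesheetPath));

        // Apply the PMML XSLT style sheet and write its output to resultPath
        TransformerFactory transFactory = TransformerFactory.newInstance();
        Transformer transformer = transFactory.newTransformer(xsltSource);
        transformer.transform(pmmlSource, new StreamResult(new File(resultPath)));

        // check result after transformation
        // ......................
    } catch (TransformerConfigurationException e) {
        // ..............
    } catch (TransformerException e) {
        // ..............
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // the PMML file could not be read or parsed
    }
}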
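To check the result after transformation, one option is to capture the style sheet's output in memory instead of writing it to a file. The sketch below is illustrative only: the class name XsltReport and the method run are hypothetical, and how the returned text should be interpreted depends entirely on the style sheet being used, which this example does not assume anything about.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: runs a validation style sheet and returns its output as a String.
public class XsltReport {

    public static String run(File stylesheet, File pmmlFile)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        // Parse without namespace awareness, as in the example above
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(false);
        DocumentBuilder parser = docFactory.newDocumentBuilder();
        Document document = parser.parse(pmmlFile);

        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheet));

        // Collect the transformation output in memory so the caller can inspect it
        StringWriter report = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(report));
        return report.toString();
    }
}
```

The caller can then log the returned report or search it for whatever messages the chosen style sheet emits.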

Problems may still exist even in PMML that passes both validations, but running these checks lowers the probability. Once validated, Pervasive DataRush-Analytics models will provide specified results to help you analyze your business data and predict customer needs.
