Validation With SHACL

The SHapes Constraint Language (SHACL) is a language for validating RDF graphs.

This documentation is for RDF4J 2.5 and onwards, as the SHACL Engine was experimental until this release.

Understanding the RDF4J SHACL Engine

The SHACL Engine works by analyzing the changes made in a transaction and creating a set of validation plans (similar to query plans) and executing these as part of the transaction commit() call.

The SHACL Engine can often validate your changes based solely on the changes themselves. However, there are times when your alterations may have an impact on, or be influenced by, pre-existing data in the database. In these cases, the engine retrieves the related data from the database.

Let’s consider an example to understand how the current data in the database can influence the validation plans:

ex:PersonShape
    a sh:NodeShape  ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:int ;
        sh:message "A person's age must be an integer (xsd:int)." ;
    ] .

Here is the initial data in the database:

ex:pete a ex:Person.

Next, a transaction adds the following data:

ex:pete ex:age "eighteen".

In this scenario, the SHACL Engine will match the predicate ex:age with ex:PersonShape and realize that ex:pete may be already defined as an ex:Person in the database. As a result, the validation plan will include a check to confirm if ex:pete is indeed categorized as ex:Person.

How to load and update SHACL shapes

By default, the ShaclSail uses a reserved graph (http://rdf4j.org/schema/rdf4j#SHACLShapeGraph) for storing the SHACL shapes. Utilize a normal connection to load your shapes into this graph. SPARQL is not supported.

The Shapes Graph section explains how to store your shapes in a different named graph.

ShaclSail shaclSail = new ShaclSail(new MemoryStore());
Repository repo = new SailRepository(shaclSail);

try (RepositoryConnection connection = repo.getConnection()) {

    Reader shaclRules = ....

    // add shapes
    connection.begin();
    connection.add(shaclRules, "", RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();
    
    // clear existing shapes and add new ones in single transaction (eg. update shapes)
    connection.begin();
    connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
    connection.add(shaclRules, "", RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();

    // clear all shapes
    connection.begin();
    connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();
    
    // connection.clear(); will not clear the Shapes graph!


}

You can at any point update your shapes. Updating shapes will cause your data to be re-validated. The transaction will fail if the data is not valid according to the changed shapes.

The simplest way to update your shapes is within a transaction, drop the shapes graph and load your updated shapes. The ShaclSail will only apply the actual changes. Modifying shapes in parallel should not be a problem, but may cause slowdowns.

Remember that a full validation is run for all the affect shapes. Depending on the amount of data already in your store this may take a significant amount of time. If you are sure that your data will not violate the shapes, or otherwise need to skip validation then you can read the section on Performance.

Do not use SPARQL to update your shapes!

Supported SHACL features

The SHACL W3C Recommendation defines the SHACL features that should be supported and RDF4J is working hard to support them all. At the moment a fairly large subset of these features are supported by the ShaclSail, however quite a few are not yet implemented.

An always-up-to-date list of features can be found by calling the static method ShaclSail.getSupportedShaclPredicates().

As of writing this documentation the following features are supported.

sh:targetClass
sh:targetNode
sh:targetSubjectsOf
sh:targetObjectsOf
sh:target for use with DASH targets and sh:SPARQLTarget
sh:path, sh:inversePath, sh:alternativePath and sequence paths
sh:node
sh:property
sh:and
sh:or
sh:not
sh:minCount
sh:maxCount
sh:qualifiedMinCount
sh:qualifiedMaxCount
sh:qualifiedValueShape
sh:nodeKind
sh:minLength
sh:maxLength
sh:pattern and sh:flags
sh:languageIn
sh:datatype
sh:class
sh:hasValue
sh:in
sh:minExclusive
sh:minInclusive
sh:maxExclusive
sh:maxInclusive
sh:uniqueLang
sh:sparql
sh:select
sh:prefixes
sh:prefix
sh:namespace
sh:declare
sh:deactivated
sh:severity
sh:message
sh:shapesGraph
dash:hasValueIn
rsx:targetShape

DASH and RSX features need to be explicitly enabled, for instance with setDashDataShapes(true) and setEclipseRdf4jShaclExtensions(true). These are currently experimental features. For more information about the RSX features, see the RSX section of this document.

Implicit sh:targetClass is supported for nodes that are rdfs:Class and either of sh:PropertyShape or sh:NodeShape. Validation for all nodes, equivalent to owl:Thing or rdfs:Resource in an environment with a reasoner, can be enabled by setting setUndefinedTargetValidatesAllSubjects(true).

sh:pathsupports single predicate paths, e.g. ex:age, inverse paths, sequence paths and alternate paths.

Validation results

On commit() the ShaclSail will validate your changes and throw an exception if there are violations. The exception contains a validation report and can be retrieved like this:

try {
    connection.commit();
} catch (RepositoryException exception) {
    Throwable cause = exception.getCause();
    if (cause instanceof ValidationException) {
        Model validationReportModel = ((ValidationException) cause).validationReportAsModel();
        // use validationReportModel to understand validation violations

        Rio.write(validationReportModel, System.out, RDFFormat.TURTLE);
    }
    throw exception;
}

The validationReportModel follows the report format specified by the W3C SHACL recommendation. It does not provide all the information specified in the recommendation. Example report:

[]
    a sh:ValidationReport ;
    sh:conforms false ;
    rdf4j:truncated false;
    sh:result [
        a sh:ValidationResult ;
        sh:value "eighteen";
        sh:focusNode ex:pete ;
        sh:resultPath ex:age ;
        sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
        sh:resultSeverity sh:Violation;
        sh:resultMessage "A person's age must be a number (xsd:int).";
        sh:sourceShape [ a sh:PropertyShape;
            sh:message "A persons age must be a number (xsd:int).";
            sh:path ex:age;
            sh:datatype xsd:int
        ]
    ] .

The ValidationReport class provides the same information as the validationReportModel, but as a Java object with getters for accessing the report data.

Limiting the validation report

Large validation reports take time to generate and can use large amounts of memory. Limiting the size of the report can be useful to speed up validation and to reduce the number of similar violations.

Limitations can either be configured directly in the ShaclSail or through the configuration files.

setValidationResultsLimitTotal(1000) limits the total number of validation results per report to 1000. (1 000 000 by default)
- <http://rdf4j.org/config/sail/shacl#validationResultsLimitTotal>
setValidationResultsLimitPerConstraint(10) limits the number of validation results per constraint component to 10. (1000 by default)
- <http://rdf4j.org/config/sail/shacl#validationResultsLimitPerConstraint>

Use -1 to remove a limit and 0 to validate but return an empty validation report.

A truncated validation report will have isTruncated() return true and the model will have rdf4j:truncated true.

Retrieving violated shapes

The validation report contains the portions of the source shape responsible for the violation. As the SHACL shapes graph stores all shapes, you can obtain the entire violated shape from the ShaclSail when a transaction fails.

try {
    connection.commit();
} catch (RepositoryException exception) {
    Throwable cause = exception.getCause();
    if (cause instanceof ValidationException) {
        Model validationReportModel = ((ValidationException) cause).validationReportAsModel();

        validationReportModel
            .filter(null, SHACL.SOURCE_SHAPE, null);
            .forEach(s -> {
                Value object = s.getObject();

                try (SailRepositoryConnection connection = shaclSail.getConnection()) {

                    try (Stream<Statement> stream = connection.getStatements((Resource) object, null, null, RDF4J.SHACL_SHAPE_GRAPH).stream()) {
                        List<Statement> collect = stream.collect(Collectors.toList());

                        // collect contains the shape!
                    }

                }

            });
    }
    throw exception;
}

Transactional support

When running multiple transactions in parallel, the results from two transactions may jointly cause the validation to fail. A good example is as follows:

ex:PersonShape
    a sh:NodeShape  ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:int ;
    ] .

One transaction adds:

ex:pete a ex:Person.

In parallel another transaction adds:

ex:pete ex:age "eighteen".

Neither of these transactions will by themselves cause the validation to fail, but together they will.

Typically, in order to handle this scenario a user would need to use SERIALIZABLE transactions, which are slow and prone to failure. The ShaclSail instead uses locking to run transactions one-after-the-other if the isolation level is set to SNAPSHOT. This is typically 2-4x faster than using SERIALIZABLE.

Locking only affects transactions that write to the ShaclSail, and locking is only applied when calling commit().

It is possible to disable this type of validation with setSerializableValidation(false).

Performance

The ShaclSail is built for performance. Each transaction is analyzed so that only the minimal set of shapes need to be validated, and for each of those shapes only the least amount of data is retrieved in order to perform the validation.

Parallel validation further increases performance and is enabled by default. This can be disabled with setParallelValidation(false).

Some workloads will not fit in memory and need to be validated while stored on disk. This can be achieved by using a NativeStore and specifying the Bulk transaction setting.

ShaclSail.TransactionSettings.ValidationApproach.Auto: Let the ShaclSail choose the best approach.
ShaclSail.TransactionSettings.ValidationApproach.Bulk: Optimized for large transactions, disables caching and parallel validation and runs a full validation step at the end of the transaction.
ShaclSail.TransactionSettings.ValidationApproach.Disabled: Disable validation.

Disabling validation for a transaction may leave your data in an invalid state. Running a transaction with bulk validation will force a full validation. This is a useful approach if you need to use multiple transactions to bulk load your data. You can always skip this if you are certain that your changes are valid.

There are also a set of experimental transaction settings for hinting about performance aspects of the validation.

ShaclSail.TransactionSettings.PerformanceHint.CacheEnabled: Enable the cache that stores intermediate results so these only need to be computed once.
ShaclSail.TransactionSettings.PerformanceHint.CacheDisabled: Disable the cache.
ShaclSail.TransactionSettings.PerformanceHint.ParallelValidation: Run validation in parallel (multithreaded).
ShaclSail.TransactionSettings.PerformanceHint.SerialValidation: Run validation in serial (single threaded).

ShaclSail shaclSail = new ShaclSail(new NativeStore(new File(...), "spoc,ospc,psoc"));
SailRepository sailRepository = new SailRepository(shaclSail);

try (SailRepositoryConnection connection = sailRepository.getConnection()) {

      connection.begin(
          IsolationLevels.NONE, 
          ShaclSail.TransactionSettings.ValidationApproach.Bulk, 
          ShaclSail.TransactionSettings.PerformanceHint.CacheDisabled, 
          ShaclSail.TransactionSettings.PerformanceHint.SerialValidation
      );    
        
//    You can enable parallel validation and the intermediate cache for better performance if you have sufficient memory 
//    connection.begin(
//        IsolationLevels.NONE, 
//        ShaclSail.TransactionSettings.ValidationApproach.Bulk, 
//        ShaclSail.TransactionSettings.PerformanceHint.CacheEnabled, 
//        ShaclSail.TransactionSettings.PerformanceHint.ParallelValidation
//    );    
    
    // load shapes
    try (InputStream inputStream = new FileInputStream("shacl.ttl")) {
        connection.add(inputStream, "", RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
    }

    // load data
    try (InputStream inputStream = new BufferedInputStream(new FileInputStream("data.ttl"))) {
        connection.add(inputStream, "", RDFFormat.TURTLE);
    }

    // commit transaction and catch any exception
    try {
        connection.commit();
    } catch (RepositoryException e){
        if(e.getCause() instanceof ValidationException){
            Model model = ((ValidationException) e.getCause()).validationReportAsModel();
            Rio.write(model, System.out, RDFFormat.TURTLE);
        }
    }

}

sailRepository.shutDown();

Automatic bulk validation

Large transactions will take up significant amounts of memory because the transactional validation needs to track and analyze the transactional changes in order to decide what needs to be validated. Very large transactions could exceed the amount of memory available and cause the JVM to crash.

As of 4.0.0 transactions will automatically be switched to bulk validation if they exceed a set limit.

setTransactionalValidationLimit(1000) will make transactions switch to bulk validation if the transaction size is more than 1000 statements. Default is 500 000.
- <http://rdf4j.org/config/sail/shacl#transactionalValidationLimit>

Automatic bulk validation is not compatible with serializable validation.

Reasoning

By default, the ShaclSail supports the simple rdfs:subClassOf reasoning required by the SHACL W3C recommendation. There is no support for sh:entailment, however the entire reasoner can be disabled with setRdfsSubClassReasoning(false).

Shapes graph

When shapes are used to validate the contents of a database it makes sense to consider them part of the database schema and as such keep them seperated from each other. This separation is achieved by storing shapes in a reserved graph (http://rdf4j.org/schema/rdf4j#SHACLShapeGraph) which is hidden unless specifically named when loading or removing data.

As of 4.0.0 the ShaclSail can be configured to load shapes from other graphs (both named graphs and the default graph).

setShapesGraphs(Set.of(RDF4J.SHACL_SHAPE_GRAPH, Values.iri("http://example.org/myShapeGraph")) will read shapes from both the normal reserved graph and the named graph http://example.org/myShapeGraph.
- <http://rdf4j.org/config/sail/shacl#shapesGraph>

Shapes stored in the reserved graph (http://rdf4j.org/schema/rdf4j#SHACLShapeGraph) are used to validate the union of all triples in the default graph and any other named graph. The ShaclSail relies on sh:shapesGraph statements to understand how shapes stored in other graphs should be used for validation.

SHACL uses the term shapes graph to refer to an RDF graph where the shapes are defined, and the term data graph to refer to an RDF graph where the data to be validated is stored. Data graphs and shapes graphs are linked together using sh:shapesGraph statements which are used by the ShaclSail to decide which shapes should be used to validate which graphs.

For security and performance the ShaclSail ignores sh:shapesGraph statements that are not in a graph that has been configured for shapes, as explained above. This means that you can always trust data you load into your database to not tamper with your shapes or with which shapes are used for validation, as long as you limit which graphs your load the data into.

Example

ex:shapesGraph1 {
    ex:PersonShape
        a sh:NodeShape  ;
        sh:targetClass ex:Person ;       
}  


ex:shapesGraph2 {
    ex:PersonShape       
        sh:property [
            sh:path ex:age ;
            sh:datatype xsd:int ;
        ] .

    ex:dataGraph sh:shapesGraph ex:shapesGraph1, ex:shapesGraph2.         
}


ex:shapesGraph3 {
    ex:PersonShape       
        sh:property [
            sh:path ex:age ;
            sh:minCount 1;
        ] .
}

The above shapes will result in all the data in ex:dataGraph being validated against the shapes defined in the union of both ex:shapesGraph1 and ex:shapesGraph2. The shape defined in ex:shapesGraph3 is effectively ignored.

The following data is valid.

ex:dataGraph {
    ex:steve a ex:Person.
    
    ex:jane a ex:Person;
        ex:age 40.
}

While the following data is invalid.

ex:dataGraph {    
    ex:john a ex:Person;
        ex:age "seventy two".
}

It is also possible to validate the data in the default (unnamed) graph. Use rdf4j:nil to refer to the default graph.

To validate all the data in your database, e.g. the union of the default graph and all named graphs, you simply need to put your shapes in the rdf4j:SHACLShapeGraph.

RSX - Eclipse RDF4J SHACL Extensions

RDF4J has seen a need to develop its own extension the W3C SHACL Recommendation in order to support new and innovative features. We always strive to collaborate with the community when developing these features.

RSX currently contains rsx:targetShape which will allow a Shape to be the target for your constraints. This means that it will be possible to model more complex use-cases like “all norwegian companies with 10 or more employees, a revenue greater than or equal to 6 million NOK or valued at 23 million or above are required to have a registered accountant”. This also allows for considerably faster implementations than what is currently possible with SPARQL Targets.

The RSX specification will be published soon together with the limited support for rsx:targetShape in 3.3.0 (sh:or, sh:and, sh:hasValue, dash:hasValueIn, sh:path, sh:property).

Logging and debugging

By default, there is no logging enabled in the ShaclSail. There are four methods for enabling logging:

shaclSail.setLogValidationPlans(true);
shaclSail.setGlobalLogValidationExecution(true);
shaclSail.setLogValidationViolations(true);
shaclSail.setPerformanceLogging(true);

All these will log as INFO.

First step to debugging and understanding an unexpected violation is to enable shaclSail.setLogValidationViolations(true);.

Log validation plans

Validation plans are logged as Graphviz DOT. Validations plans are a form of query plan.

Here is the validation plan for the example above: Link

The structure of this log and its contents may change in the future, without warning.

Log validation execution

The execution of the validation plan shows what data was requested during the execution and how that data was joined together and filtered.

Enabling this logging will enable it for all ShaclSails on all threads.

Enabling this logging will make your validation considerably slower and take up considerably more memory.

Following on from the example above

01. [main] INFO   Select.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
02. [main] INFO    DatatypeFilter;falseNode:  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
03. [main] INFO     DirectTupleFromFilter.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
04. [main] INFO      InnerJoin;discardedRight:  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
05. [main] INFO       BufferedTupleFromFilter.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
06. [main] INFO         ExternalTypeFilterNode.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
07. [main] INFO           UnionNode.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
08. [main] INFO             UnionNode.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
09. [main] INFO              DatatypeFilter;falseNode:  Tuple{line=[http://example.com/ns#pete, "eighteen"]}
10. [main] INFO               DirectTupleFromFilter.next():  Tuple{line=[http://example.com/ns#pete, "eighteen"]}

The log is best read in conjunction with the validation plan. By taking the bottom log line (10.) as the bottom node in the plan and for each indentation following the plan upwards. Multiple lines at a given indentation mean that that node produced multiple tuples.

Line 6 shows a query to the underlying database. Line 1 is the query for everything matching the path (ex:age) against the added data in the transaction.

The indentation is best-effort.

The structure of this log and its contents may change in the future, without warning.

Log validation violations

As the commit() call iterates over the shapes it can log the results (tuples) from the execution of each validation plan.

Following on from the example above:

1. [main] INFO  SHACL not valid. The following experimental debug results were produced:
2.     NodeShape: http://example.com/ns#PersonShape
3.         Tuple{line=[http://example.com/ns#pete, "eighteen"], propertyShapes= DatatypePropertyShape <_:node1d285h2ktx1>} -cause->  [ Tuple{line=[http://example.com/ns#pete, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://example.com/ns#Person]} ]

Line 2 shows the shape that triggered this violation. Line 3 shows the ultimate tuple produced and which PropertyShape produced the exception followed by a cause listing other tuples that caused the violation. In this case the existing type statement.

The structure of this log and its contents may change in the future, without warning.

Full working example

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.eclipse.rdf4j.common.exception.ValidationException;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.repository.RepositoryException;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.repository.sail.SailRepositoryConnection;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.WriterConfig;
import org.eclipse.rdf4j.rio.helpers.BasicWriterSettings;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.eclipse.rdf4j.sail.shacl.ShaclSail;
import org.eclipse.rdf4j.sail.shacl.results.ValidationReport;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.StringReader;

public class ShaclSampleCode {

    public static void main(String[] args) throws IOException {

        ShaclSail shaclSail = new ShaclSail(new MemoryStore());

//        Logger root = (Logger) LoggerFactory.getLogger(ShaclSail.class.getName());
//        root.setLevel(Level.INFO);
//
//        shaclSail.setPerformanceLogging(true);


        SailRepository sailRepository = new SailRepository(shaclSail);
        sailRepository.init();

        try (SailRepositoryConnection connection = sailRepository.getConnection()) {

            connection.begin();

            StringReader shaclRules = new StringReader(
                String.join("\n", "",
                    "@prefix ex: <http://example.com/ns#> .",
                    "@prefix sh: <http://www.w3.org/ns/shacl#> .",
                    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
                    "@prefix foaf: <http://xmlns.com/foaf/0.1/>.",

                    "ex:PersonShape",
                    "  a sh:NodeShape  ;",
                    "  sh:targetClass foaf:Person ;",
                    "  sh:property ex:PersonAgeIntShape, ex:PersonMustHaveAge, ex:PersonCanNotHaveMultipleAge  .",

                    "ex:PersonAgeIntShape ",
                    "  sh:path foaf:age ;",
                    "  sh:message \"A person's age must be a number (xsd:int).\" ;",
                    "  sh:datatype xsd:int .",

                    "ex:PersonMustHaveAge ",
                    "  sh:path foaf:age ;",
                    "  sh:message \"A must have an age.\" ;",
                    "  sh:minCount 1 .",

                    "ex:PersonCanNotHaveMultipleAge ",
                    "  sh:path foaf:age ;",
                    "  sh:message \"A person can only have one age.\" ;",
                    "  sh:maxCount 1 ."

                ));

            connection.add(shaclRules, "", RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
            connection.commit();

            connection.begin();

            StringReader invalidSampleData = new StringReader(
                String.join("\n", "",
                    "@prefix ex: <http://example.com/ns#> .",
                    "@prefix foaf: <http://xmlns.com/foaf/0.1/>.",
                    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",

                    "ex:peter a foaf:Person ;",
                    "  foaf:age 20, \"30\"^^xsd:int  ."

                ));

            connection.add(invalidSampleData, "", RDFFormat.TURTLE);
            try {
                connection.commit();
            } catch (RepositoryException exception) {
                Throwable cause = exception.getCause();
                if (cause instanceof ValidationException) {
                    Model validationReportModel = ((ValidationException) cause).validationReportAsModel();

                    WriterConfig writerConfig = new WriterConfig()
                        .set(BasicWriterSettings.INLINE_BLANK_NODES, true)
                        .set(BasicWriterSettings.XSD_STRING_TO_PLAIN_LITERAL, true)
                        .set(BasicWriterSettings.PRETTY_PRINT, true);

                    Rio.write(validationReportModel, System.out, RDFFormat.TURTLE, writerConfig);
                }
                throw exception;
            }
        }
    }
}

Breadcrumb

Validation With SHACL

Understanding the RDF4J SHACL Engine

How to load and update SHACL shapes

Supported SHACL features

Validation results

Limiting the validation report

Retrieving violated shapes

Transactional support

Performance

Automatic bulk validation

Reasoning

Shapes graph

Example

RSX - Eclipse RDF4J SHACL Extensions

Logging and debugging

Log validation plans

Log validation execution

Log validation violations

Full working example

Further reading