Class SketchBasedJoinEstimator

java.lang.Object
org.eclipse.rdf4j.query.algebra.evaluation.sketch.SketchBasedJoinEstimator
All Implemented Interfaces:
QueryOptimizationScopeProvider

@Experimental public class SketchBasedJoinEstimator extends Object implements QueryOptimizationScopeProvider
ArrayOfDoublesSketch‑based selectivity and join‑size estimator for RDF4J.

Features:

  • Array-of-doubles tuple sketches over the single components S, P, O, C (subject, predicate, object, context), component degree sketches, and all six component pairs.
  • Reads are synchronized on shared buffer locks; rebuilds are double‑buffered.
  • Incremental addStatement / deleteStatement with signed multiplicity summaries.
  • Configurable via SketchBasedJoinEstimator.Config and system properties (see below).

Configuration

Applications should prefer SketchBasedJoinEstimator(SketchStatementSource, Config) to set options programmatically. For convenience, SketchBasedJoinEstimator(SketchStatementSource, int, long, long) delegates to SketchBasedJoinEstimator.Config.defaults() and will pick up system properties as well.

System properties (overlay)

All options can be overridden at construction time by JVM system properties whose names start with the prefix org.eclipse.rdf4j.query.algebra.evaluation.sketch.SketchBasedJoinEstimator. (including the trailing dot). When a system property is present, its value takes precedence over the corresponding value provided through SketchBasedJoinEstimator.Config. The legacy prefix org.eclipse.rdf4j.sail.base.SketchBasedJoinEstimator. (also with a trailing dot) is accepted as a fallback. Supported keys (defaults are defined in SketchBasedJoinEstimator.Config):

  • nominalEntries (int ≥ 16, sketch nominal entries)
  • subjectBucketCount (int ≥ 4)
  • predicateBucketCount (int ≥ 4)
  • objectBucketCount (int ≥ 4)
  • contextBucketCount (int ≥ 4)
  • contextPairSketchesEnabled (boolean)
  • throttleEveryN (long)
  • throttleMillis (long)
  • refreshSleepMillis (long)
  • estimateCacheSeconds (long)
  • defaultContextString (String)
  • roundJoinEstimates (boolean)
  • churnSampleMin (int)
  • churnSamplePercent (double 0..1)
  • churnSampleMax (int)
  • churnReaddThreshold (double 0..1)
  • churnRemovalRatioThreshold (double 0..1)
  • incrementalQueueInitialLimit (int)
  • incrementalQueueIdleResetMillis (long)
  • incrementalQueueEstimatedStatementBytes (long)

Example (configure default context and reduce refresh cadence):


System.setProperty(
    "org.eclipse.rdf4j.query.algebra.evaluation.sketch.SketchBasedJoinEstimator.defaultContextString", "urn:ctx");
System.setProperty(
    "org.eclipse.rdf4j.query.algebra.evaluation.sketch.SketchBasedJoinEstimator.refreshSleepMillis", "500");
var est = new SketchBasedJoinEstimator(source, Config.defaults().withNominalEntries(1024));
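
The precedence rule described under "System properties (overlay)" can be pictured with the following stand-alone sketch. It is not the estimator's actual implementation: the class OverlayLookupSketch and the helpers overlay and resolveLong are hypothetical names introduced for illustration, and checking the current prefix before the legacy one is an assumption based on the legacy prefix being described as a fallback. Only the two prefixes and the key names come from the documentation above.

```java
import java.util.Optional;

class OverlayLookupSketch {
    // Prefixes quoted from the documentation; each ends with a dot.
    static final String PREFIX =
            "org.eclipse.rdf4j.query.algebra.evaluation.sketch.SketchBasedJoinEstimator.";
    static final String LEGACY_PREFIX =
            "org.eclipse.rdf4j.sail.base.SketchBasedJoinEstimator.";

    // Hypothetical helper: look up the key under the current prefix,
    // then under the legacy prefix (assumed order), else return empty.
    static Optional<String> overlay(String key) {
        String value = System.getProperty(PREFIX + key);
        if (value == null) {
            value = System.getProperty(LEGACY_PREFIX + key);
        }
        return Optional.ofNullable(value);
    }

    // Hypothetical helper: a system property, when present, wins over
    // the value that was configured programmatically.
    static long resolveLong(String key, long configured) {
        return overlay(key).map(String::trim).map(Long::parseLong).orElse(configured);
    }

    public static void main(String[] args) {
        System.setProperty(PREFIX + "refreshSleepMillis", "500");
        System.setProperty(LEGACY_PREFIX + "throttleMillis", "20");

        System.out.println(resolveLong("refreshSleepMillis", 1000L)); // prints 500
        System.out.println(resolveLong("throttleMillis", 5L));        // prints 20
        System.out.println(resolveLong("throttleEveryN", 10000L));    // prints 10000
    }
}
```

In this sketch the programmatic value is used only when neither prefix supplies the key, which mirrors the documented rule that system properties take precedence over SketchBasedJoinEstimator.Config.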