Class SourceSelection

java.lang.Object
org.eclipse.rdf4j.federated.optimizer.SourceSelection

public class SourceSelection extends Object
Performs source selection during query optimization by determining, for each statement pattern in a basic graph pattern (BGP), which federation members (endpoints) can contribute results.

Algorithm

For each statement pattern and each endpoint the SourceSelectionCache is consulted first. Patterns whose source membership is already known from a previous query are resolved without any remote communication. Only patterns with a POSSIBLY_HAS_STATEMENTS assurance require a remote check. All remote checks are executed in parallel using the FedX worker-thread infrastructure (SourceSelection.SourceSelectionExecutorWithLatch), and the calling thread blocks until every check has completed or the query timeout is reached.
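The cache-first flow described above can be sketched as follows. This is a simplified, self-contained model and not the FedX implementation: the class `CacheFirstSourceSelectionSketch`, the string-based patterns and endpoints, the `remoteCheck` callback, and the enum values other than POSSIBLY_HAS_STATEMENTS are all hypothetical names chosen for illustration. Throwing an exception on timeout is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiPredicate;

// Hypothetical, simplified model of cache-first source selection.
class CacheFirstSourceSelectionSketch {

    // Only POSSIBLY_HAS_STATEMENTS is named in the text; the other values are assumptions.
    enum Assurance { HAS_STATEMENTS, NONE, POSSIBLY_HAS_STATEMENTS }

    /** Returns a map from pattern to the endpoints that can contribute results. */
    static Map<String, List<String>> select(
            List<String> patterns,
            List<String> endpoints,
            Map<String, Assurance> cache,            // key: endpoint + "|" + pattern
            BiPredicate<String, String> remoteCheck, // (endpoint, pattern) -> has statements
            long timeoutMillis) {

        Map<String, List<String>> sources = new ConcurrentHashMap<>();
        List<String[]> pending = new ArrayList<>();

        // Step 1: consult the cache; unknown entries count as POSSIBLY_HAS_STATEMENTS.
        for (String p : patterns) {
            for (String e : endpoints) {
                Assurance a = cache.getOrDefault(e + "|" + p, Assurance.POSSIBLY_HAS_STATEMENTS);
                if (a == Assurance.HAS_STATEMENTS) {
                    add(sources, p, e);
                } else if (a == Assurance.POSSIBLY_HAS_STATEMENTS) {
                    pending.add(new String[] { e, p });
                }
            }
        }
        if (pending.isEmpty()) {
            return sources; // fully resolved from the cache, no remote communication
        }

        // Step 2: run all remaining checks in parallel; the caller blocks on the latch.
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(4, pending.size()));
        CountDownLatch latch = new CountDownLatch(pending.size());
        try {
            for (String[] check : pending) {
                pool.execute(() -> {
                    try {
                        if (remoteCheck.test(check[0], check[1])) {
                            add(sources, check[1], check[0]);
                        }
                    } finally {
                        latch.countDown();
                    }
                });
            }
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("source selection timed out");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during source selection", ex);
        } finally {
            pool.shutdownNow();
        }
        return sources;
    }

    private static void add(Map<String, List<String>> sources, String pattern, String endpoint) {
        sources.computeIfAbsent(pattern, k -> new CopyOnWriteArrayList<>()).add(endpoint);
    }
}
```

The latch is counted down in a `finally` block so that a failing check cannot leave the calling thread waiting until the timeout.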

Remote check strategies

Two strategies are supported for performing the remote checks, selected per endpoint based on the configuration:

Grouped source selection (default, enableGroupedSourceSelection=true)
All statement patterns that require a remote check for a given endpoint are batched into a single SPARQL SELECT query of the form:
SELECT * WHERE {
  BIND(EXISTS { <pattern_0> } AS ?stmt_0)
  BIND(EXISTS { <pattern_1> } AS ?stmt_1)
  ...
}
This reduces the number of remote requests from O(S × M) to O(M), where S is the number of statement patterns and M the number of federation members. It is particularly effective in high-latency settings. Grouped checks are only used when more than two patterns require a check for the same endpoint; otherwise, individual ASK queries are sent. The grouped check is implemented in SourceSelection.ParallelGroupedCheckTask.
Individual ASK queries (classic FedX, enableGroupedSourceSelection=false)
One ASK query is sent per statement pattern and endpoint, yielding up to S × M remote requests. Implemented in SourceSelection.ParallelCheckTask.
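As an illustration of the two strategies, the following sketch builds the query strings for a single endpoint. It is a minimal example under stated assumptions: the class `GroupedCheckQuerySketch` and its methods are hypothetical, patterns are plain SPARQL triple-pattern strings, and the threshold of more than two patterns is taken from the description above. The real tasks work on algebra nodes rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates the query shapes only.
class GroupedCheckQuerySketch {

    /** Builds one SELECT query with a BIND(EXISTS ...) per pattern. */
    static String groupedQuery(List<String> patterns) {
        StringBuilder sb = new StringBuilder("SELECT * WHERE {\n");
        for (int i = 0; i < patterns.size(); i++) {
            sb.append("  BIND(EXISTS { ")
              .append(patterns.get(i))
              .append(" } AS ?stmt_")
              .append(i)
              .append(")\n");
        }
        return sb.append("}").toString();
    }

    /** Builds an individual ASK query for one pattern. */
    static String askQuery(String pattern) {
        return "ASK { " + pattern + " }";
    }

    /** Returns the queries to send to one endpoint for the given patterns. */
    static List<String> queriesForEndpoint(List<String> patterns, boolean enableGroupedSourceSelection) {
        // Grouping is only used when more than two patterns need a check.
        if (enableGroupedSourceSelection && patterns.size() > 2) {
            return List.of(groupedQuery(patterns));
        }
        List<String> asks = new ArrayList<>();
        for (String p : patterns) {
            asks.add(askQuery(p));
        }
        return asks;
    }
}
```

With three pending patterns and grouping enabled, the endpoint receives one request instead of three; with grouping disabled, or with only two pending patterns, it receives one ASK query per pattern.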

Configuration

The source selection strategy is controlled via FedXConfig, using the enableGroupedSourceSelection setting described above.

Author:
Andreas Schwarte
  • Method Details

    • doSourceSelection

      public void doSourceSelection(List<StatementPattern> stmts)
      Performs source selection for the provided statement patterns, using the cache where possible and remote checks otherwise. Remote checks are evaluated in parallel using the concurrency infrastructure of FedX. Note that this method blocks until every source is resolved. The statement patterns are replaced by appropriate annotations in this optimization.
      Parameters:
      stmts - the statement patterns for which source selection is performed
    • getRelevantSources

      public Set<Endpoint> getRelevantSources()
      Retrieve a set of relevant sources for this query.
      Returns:
      the relevant sources
    • addSource

      protected void addSource(StatementPattern stmt, StatementSource source)
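      Adds a source to the given statement pattern in the map (synchronization is handled through the map).
      Parameters:
      stmt - the statement pattern to which the source is added
      source - the statement source to add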
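One way to picture the relationship between addSource and getRelevantSources is a thread-safe map from statement pattern to its sources. The sketch below is an assumption about the general shape, not the actual FedX code: the class `StatementSourceMapSketch`, the use of strings for patterns and endpoints, and the choice of `ConcurrentHashMap` with `CopyOnWriteArrayList` are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model: patterns and endpoints are plain strings.
class StatementSourceMapSketch {

    // The map provides the synchronization, so addSource needs no explicit lock.
    private final Map<String, List<String>> stmtToSources = new ConcurrentHashMap<>();

    /** Adds a source for the given pattern; safe to call from parallel check tasks. */
    void addSource(String stmt, String endpointId) {
        stmtToSources.computeIfAbsent(stmt, k -> new CopyOnWriteArrayList<>()).add(endpointId);
    }

    /** Collects the distinct endpoints that are a source for at least one pattern. */
    Set<String> getRelevantSources() {
        Set<String> relevant = new TreeSet<>();
        for (List<String> sources : stmtToSources.values()) {
            relevant.addAll(sources);
        }
        return relevant;
    }
}
```

Because the remote checks run in parallel, several tasks may register sources for the same pattern at once; `computeIfAbsent` ensures that only one list is created per pattern.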