Class SourceSelection
java.lang.Object
org.eclipse.rdf4j.federated.optimizer.SourceSelection
Performs source selection during query optimization by determining, for each statement pattern in a BGP, which
federation members (endpoints) can contribute results.
Algorithm
For each statement pattern and each endpoint, the SourceSelectionCache is consulted first. Patterns whose
source membership is already known from a previous query are resolved without any remote communication. Only patterns
with a POSSIBLY_HAS_STATEMENTS assurance require a remote check. All remote checks are executed in parallel using the FedX
worker-thread infrastructure (SourceSelection.SourceSelectionExecutorWithLatch), and the calling thread blocks until every
check has completed or the query timeout is reached.
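The cache-first step can be sketched as a partition of (pattern, endpoint) pairs into sources that are already known and pairs that still need a remote check. This is a minimal illustration, not the FedX implementation: patterns and endpoints are plain strings, and the Assurance enum, the Pair and Partition records, and the rule that an uncached pair counts as "possibly has statements" are assumptions of the sketch (only POSSIBLY_HAS_STATEMENTS is named above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: consult a cache before deciding which pairs need a remote check. */
public class CacheFirstSketch {

    // Hypothetical assurance values; only POSSIBLY_HAS_STATEMENTS is named in the Javadoc.
    enum Assurance { NONE, HAS_STATEMENTS, POSSIBLY_HAS_STATEMENTS }

    record Pair(String pattern, String endpoint) {}

    /** Pairs resolved from the cache, and pairs that still require a remote check. */
    record Partition(List<Pair> knownSources, List<Pair> remoteChecks) {}

    static Partition partition(List<String> patterns, List<String> endpoints,
                               Map<Pair, Assurance> cache) {
        List<Pair> known = new ArrayList<>();
        List<Pair> remote = new ArrayList<>();
        for (String p : patterns) {
            for (String e : endpoints) {
                Pair pair = new Pair(p, e);
                // Assumption of this sketch: a pair missing from the cache is "possibly" relevant.
                Assurance a = cache.getOrDefault(pair, Assurance.POSSIBLY_HAS_STATEMENTS);
                switch (a) {
                    case HAS_STATEMENTS -> known.add(pair);          // no remote communication needed
                    case POSSIBLY_HAS_STATEMENTS -> remote.add(pair); // must be checked remotely
                    case NONE -> { }                                 // endpoint cannot contribute
                }
            }
        }
        return new Partition(known, remote);
    }

    public static void main(String[] args) {
        Map<Pair, Assurance> cache = Map.of(
                new Pair("?s a :Person", "ep1"), Assurance.HAS_STATEMENTS,
                new Pair("?s a :Person", "ep2"), Assurance.NONE);
        Partition result = partition(List.of("?s a :Person", "?s :name ?n"),
                List.of("ep1", "ep2"), cache);
        // 1 known source, 2 pairs left for remote checks
        System.out.println(result.knownSources().size() + " known, "
                + result.remoteChecks().size() + " remote");
    }
}
```

Only the pairs in remoteChecks would be handed to the parallel check tasks; everything else is settled locally.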
Remote check strategies
Two strategies are supported for performing the remote checks, selected per endpoint based on the configuration:
- Grouped source selection (default, enableGroupedSourceSelection=true): All statement patterns that require a remote check for a given endpoint are batched into a single SPARQL SELECT query of the form:
      SELECT * WHERE { BIND(EXISTS { <pattern_0> } AS ?stmt_0) BIND(EXISTS { <pattern_1> } AS ?stmt_1) ... }
  This reduces the number of remote requests from O(S × M) to O(M), where S is the number of statement patterns and M the number of federation members. It is particularly effective in high-latency settings. Grouped checks are only used when more than two patterns require a check for the same endpoint; otherwise, individual ASK queries are sent. The grouped check is implemented in SourceSelection.ParallelGroupedCheckTask.
- Individual ASK queries (classic FedX, enableGroupedSourceSelection=false): One ASK query is sent per statement pattern and endpoint, yielding up to S × M remote requests. Implemented in SourceSelection.ParallelCheckTask.
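The per-endpoint strategy choice and the grouped query text can be sketched as below, assuming each statement pattern is already available as a SPARQL triple-pattern string (the real code works on StatementPattern objects). The class and method names are hypothetical; the threshold mirrors the "more than two patterns" rule stated above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical sketch of the grouped SELECT vs. individual ASK decision for one endpoint. */
public class GroupedCheckSketch {

    /** Builds SELECT * WHERE { BIND(EXISTS { p_i } AS ?stmt_i) ... } for the given patterns. */
    static String buildGroupedQuery(List<String> patterns) {
        String binds = IntStream.range(0, patterns.size())
                .mapToObj(i -> "BIND(EXISTS { " + patterns.get(i) + " } AS ?stmt_" + i + ")")
                .collect(Collectors.joining(" "));
        return "SELECT * WHERE { " + binds + " }";
    }

    /** Builds one ASK query for a single pattern (classic FedX strategy). */
    static String buildAskQuery(String pattern) {
        return "ASK { " + pattern + " }";
    }

    /** Returns the queries that would be sent to one endpoint for the patterns needing a check. */
    static List<String> queriesForEndpoint(List<String> patterns, boolean groupedEnabled) {
        // Grouping only pays off when more than two patterns need a check for this endpoint.
        if (groupedEnabled && patterns.size() > 2) {
            return List.of(buildGroupedQuery(patterns));
        }
        return patterns.stream()
                .map(GroupedCheckSketch::buildAskQuery)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> patterns = List.of(
                "?s a <http://ex.org/Person>",
                "?s <http://ex.org/name> ?n",
                "?s <http://ex.org/age> ?a");
        queriesForEndpoint(patterns, true).forEach(System.out::println); // one grouped SELECT
        System.out.println(queriesForEndpoint(patterns, false).size());  // 3 ASK queries
    }
}
```

As a worked count: with S = 5 patterns needing a check on each of M = 4 endpoints, the classic strategy sends up to 20 ASK queries, while the grouped strategy sends 4 SELECT queries, one per endpoint. Each ?stmt_i binding in the result row then tells whether pattern i has matches at that endpoint.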
Configuration
The source selection strategy is controlled via FedXConfig:
- withEnableGroupedSourceSelection(boolean): enables or disables grouped source selection (default: true)
- withSourceSelectionCacheSpec(String): configures the Guava CacheBuilderSpec for the source selection cache
- withSourceSelectionCacheFactory(...): supplies a custom SourceSelectionCacheFactory
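A configuration sketch that switches back to classic per-pattern ASK checks and bounds the source selection cache. It assumes the with... methods listed above return the FedXConfig instance so the calls can be chained; the cache spec values are illustrative only and follow Guava CacheBuilderSpec syntax (comma-separated key=value pairs). The class and method names are hypothetical.

```java
import org.eclipse.rdf4j.federated.FedXConfig;

public class SourceSelectionConfigExample {

    /** Classic FedX source selection plus a bounded source selection cache (illustrative values). */
    static FedXConfig classicAskConfig() {
        return new FedXConfig()
                // one ASK query per statement pattern and endpoint
                .withEnableGroupedSourceSelection(false)
                // Guava CacheBuilderSpec: at most 1000 entries, expire 6 hours after write
                .withSourceSelectionCacheSpec("maximumSize=1000,expireAfterWrite=6h");
    }
}
```

Leaving withEnableGroupedSourceSelection at its default (true) keeps the grouped strategy, which mainly matters when endpoints are far away and per-request latency dominates.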
- Author:
- Andreas Schwarte
Nested Class Summary
- protected class
- protected static class: Task for sending an ASK request to the endpoints (for source selection)
- protected static class
- protected static class
- protected static class
Field Summary
- protected final SourceSelectionCache cache
- protected final FederationContext federationContext
- protected final QueryInfo queryInfo
- protected Map<StatementPattern, List<StatementSource>> stmtToSources: Map statements to their sources.
Constructor Summary
SourceSelection(List<Endpoint> endpoints, SourceSelectionCache cache, FederationContext federationContext, QueryInfo queryInfo)
Method Summary
- protected void addSource(StatementPattern stmt, StatementSource source): Add a source to the given statement in the map (synchronized through map).
- void doSourceSelection(List<StatementPattern> stmts): Perform source selection for the provided statements using cache or remote ASK queries.
- getRelevantSources: Retrieve a set of relevant sources for this query.
Field Details
endpoints
cache
federationContext
queryInfo
stmtToSources
Map statements to their sources. Use synchronized access!
Constructor Details
SourceSelection
public SourceSelection(List<Endpoint> endpoints, SourceSelectionCache cache, FederationContext federationContext, QueryInfo queryInfo)
Method Details
doSourceSelection
Perform source selection for the provided statements using cache or remote ASK queries. Remote ASK queries are evaluated in parallel using the concurrency infrastructure of FedX. Note that this method blocks until every source is resolved. The statement patterns are replaced by appropriate annotations in this optimization.
- Parameters:
- stmts
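The blocking contract (parallel remote checks, caller waits until all complete or a timeout is hit) can be illustrated with a java.util.concurrent.CountDownLatch. This is a hedged sketch: FedX uses its own worker-thread infrastructure (SourceSelection.SourceSelectionExecutorWithLatch), whereas the sketch swaps in a plain ExecutorService; the runChecks method, the string task ids, the predicate standing in for a remote query, and the exception raised on timeout are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch: run remote checks in parallel and block until all finish or time out. */
public class LatchSketch {

    /** Runs check.test(task) for every task on the pool and returns each task's result. */
    static Map<String, Boolean> runChecks(List<String> tasks, Predicate<String> check,
                                          ExecutorService pool, long timeoutMillis) {
        Map<String, Boolean> results = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(tasks.size());
        for (String task : tasks) {
            pool.execute(() -> {
                try {
                    results.put(task, check.test(task)); // stands in for an ASK or grouped SELECT
                } finally {
                    latch.countDown(); // count down even if the check throws
                }
            });
        }
        try {
            // The calling thread blocks here until every check is done or the timeout elapses.
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("source selection timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for source selection", e);
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, Boolean> r = runChecks(List.of("ep1", "ep2", "ep3"),
                    ep -> !ep.equals("ep2"), pool, 1_000);
            System.out.println(r.get("ep1") + " " + r.get("ep2")); // true false
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting down in a finally block matters: a check that fails with an exception must still release the waiting caller, otherwise the caller would only ever return via the timeout.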
getRelevantSources
Retrieve a set of relevant sources for this query.
addSource
Add a source to the given statement in the map (synchronized through map).
- Parameters:
- stmt
- source
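The "use synchronized access" rule for stmtToSources, together with addSource and getRelevantSources, can be sketched with a map that is locked on itself. Strings stand in for StatementPattern and StatementSource, and the class and method bodies are hypothetical; the actual return type of getRelevantSources is not shown on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a statement-to-sources map guarded by synchronizing on the map. */
public class StmtToSourcesSketch {

    // Plain strings replace StatementPattern / StatementSource in this sketch.
    private final Map<String, List<String>> stmtToSources = new HashMap<>();

    /** Adds a source for a pattern; safe to call concurrently from worker threads. */
    void addSource(String stmt, String endpointId) {
        synchronized (stmtToSources) {
            stmtToSources.computeIfAbsent(stmt, k -> new ArrayList<>()).add(endpointId);
        }
    }

    /** Union of all endpoints that are a source for at least one pattern, without duplicates. */
    Set<String> getRelevantSources() {
        Set<String> relevant = new LinkedHashSet<>();
        synchronized (stmtToSources) {
            stmtToSources.values().forEach(relevant::addAll);
        }
        return relevant;
    }

    public static void main(String[] args) {
        StmtToSourcesSketch s = new StmtToSourcesSketch();
        s.addSource("?s a :Person", "ep1");
        s.addSource("?s :name ?n", "ep1");
        s.addSource("?s :name ?n", "ep3");
        System.out.println(s.getRelevantSources()); // [ep1, ep3]
    }
}
```

Locking on the map itself (rather than a separate lock object) matches the "synchronized through map" wording; reads such as getRelevantSources must take the same lock so they never observe a list mid-update.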