Federation with FedX

(new in RDF4J 3.1)

FedX provides transparent federation of multiple SPARQL endpoints under a single virtual endpoint. As an example, a knowledge graph such as Wikidata can be queried in a federation with endpoints that are linked to Wikidata as an integration hub. In a federated SPARQL query in FedX, one no longer needs to explicitly address specific endpoints using SERVICE clauses. Instead, FedX automatically selects relevant sources, sends statement patterns to these sources for evaluation, and joins the individual results. FedX seamlessly integrates into RDF4J using the Repository API and can be used as a drop-in component in existing applications including the RDF4J Workbench.

Core Features

  • Virtual Integration of heterogeneous Linked Data sources (e.g. as SPARQL endpoints)
  • Transparent access to data sources through a federation
  • Efficient query processing in federated environments
  • On-demand federation setup at query time
  • Fast and effective query execution due to new optimization techniques for federated setups
  • Practical applicability & easy integration as an RDF4J Repository

Getting Started

Below we present examples for getting started with FedX.

The examples query data from http://dbpedia.org/ and join it with data from https://www.wikidata.org/. It turns out that these endpoints are currently the most reliable ones that are publicly accessible.

Example query

Retrieve the European Union countries from DBpedia and join them with the GDP data coming from Wikidata.

SELECT * WHERE { 
  ?country a yago:WikicatMemberStatesOfTheEuropeanUnion .
  ?country owl:sameAs ?countrySameAs . 
  ?countrySameAs wdt:P2131 ?gdp .
}

Note that the query is a bit artificial. However, it illustrates quite well the power of federating different data sources.
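To make the source selection and join steps behind this query more concrete, the following self-contained Java sketch models them on a toy in-memory dataset. It is only an illustration under stated assumptions and not how FedX is implemented: the class SourceSelectionSketch, the ex: resources, and the predicate-based relevance check are all hypothetical, whereas FedX works against the real endpoints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of federated evaluation: select relevant sources per statement
// pattern, evaluate the pattern there, then join the partial results.
class SourceSelectionSketch {

    // Minimal triple holder; all data below is made up for illustration.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Hypothetical in-memory "endpoints": source name -> triples it holds.
    static final Map<String, List<Triple>> SOURCES = new LinkedHashMap<>();
    static {
        SOURCES.put("dbpedia", Arrays.asList(
            new Triple("ex:countryA", "rdf:type", "ex:EUMemberState"),
            new Triple("ex:countryA", "owl:sameAs", "ex:entityA")));
        SOURCES.put("wikidata", Arrays.asList(
            new Triple("ex:entityA", "wdt:P2131", "ex:gdpValueA")));
    }

    // Source selection: a source is relevant for a pattern if it holds
    // at least one triple with the pattern's predicate.
    static List<String> relevantSources(String predicate) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Triple>> e : SOURCES.entrySet()) {
            for (Triple t : e.getValue()) {
                if (t.p.equals(predicate)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    // Evaluate "?s <predicate> ?o" against the relevant sources only.
    static List<Triple> evaluate(String predicate) {
        List<Triple> result = new ArrayList<>();
        for (String name : relevantSources(predicate)) {
            for (Triple t : SOURCES.get(name)) {
                if (t.p.equals(predicate)) {
                    result.add(t);
                }
            }
        }
        return result;
    }

    // Join "?country owl:sameAs ?same" with "?same wdt:P2131 ?gdp" on ?same.
    static List<String> countryGdp() {
        List<String> rows = new ArrayList<>();
        for (Triple left : evaluate("owl:sameAs")) {
            for (Triple right : evaluate("wdt:P2131")) {
                if (left.o.equals(right.s)) {
                    rows.add(left.s + " -> " + right.o);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("owl:sameAs answered by " + relevantSources("owl:sameAs"));
        System.out.println("wdt:P2131 answered by " + relevantSources("wdt:P2131"));
        System.out.println("joined rows: " + countryGdp());
    }
}
```

In this toy model each pattern is sent only to the source that can contribute to it, which is the effect the query author would otherwise have to spell out with SERVICE clauses.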

Using a Java program

The following Java code can be used to execute our example query against the federation.

Repository repository = FedXFactory.newFederation()
	.withSparqlEndpoint("http://dbpedia.org/sparql")
	.withSparqlEndpoint("https://query.wikidata.org/sparql")
	.create();
		
try (RepositoryConnection conn = repository.getConnection()) {

	String query = 
		"PREFIX wd: <http://www.wikidata.org/entity/> "
		+ "PREFIX wdt: <http://www.wikidata.org/prop/direct/> "
		+ "SELECT * WHERE { "
		+ " ?country a <http://dbpedia.org/class/yago/WikicatMemberStatesOfTheEuropeanUnion> ."
		+ " ?country <http://www.w3.org/2002/07/owl#sameAs> ?countrySameAs . "
		+ " ?countrySameAs wdt:P2131 ?gdp ."
		+ "}";

	TupleQuery tq = conn.prepareTupleQuery(query);
	try (TupleQueryResult tqRes = tq.evaluate()) {

		int count = 0;
		while (tqRes.hasNext()) {
			BindingSet b = tqRes.next();
			System.out.println(b);
			count++;
		}

		System.out.println("Results: " + count);
	}
}
		
repository.shutDown();

The full code is also available as source in the “demos” package of the test source folder.

Instead of defining the federation via code, it is also possible to use data configurations. See the following sections for further examples.

FedX in Java Applications

FedX is implemented as an RDF4J Repository. To initialize FedX and the underlying federation SAIL, we provide the FedXFactory class, which offers various methods for intuitive configuration. In the following, we present Java code snippets that illustrate how FedX can be used in an application.

FedX can be used and accessed using the SAIL architecture (see the RDF4J SAIL documentation for details). The Repository can be obtained from any FedXFactory initialization method. Besides using the Repository interface for creating queries, we also provide a QueryManager class to conveniently create queries. The advantage of the QueryManager over the RepositoryConnection is that preconfigured PREFIX declarations are added to the query automatically, i.e. the user can use common prefixes (such as rdf, foaf, etc.) without specifying them in the prologue of the query. See PREFIX Declarations for detailed documentation.

Example 1: Using a simple SPARQL Federation as a Repository

In the following example, we configure a federation with the publicly available DBpedia and SemanticWebDogFood SPARQL endpoints. Please refer to Configuring FedX for details.

Repository repo = FedXFactory.createSparqlFederation(Arrays.asList(
			"http://dbpedia.org/sparql",
			"http://data.semanticweb.org/sparql"));
repo.init();

String q = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
	+ "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
	+ "SELECT ?President ?Party WHERE {\n"
	+ "?President rdf:type dbpedia-owl:President .\n"
	+ "?President dbpedia-owl:party ?Party . }";

try (RepositoryConnection conn = repo.getConnection()) {
	TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, q);
	try (TupleQueryResult res = query.evaluate()) {

		while (res.hasNext()) {
			System.out.println(res.next());
		}
	}
}

repo.shutDown();
System.out.println("Done.");
System.exit(0);

Example 2: Using a data configuration file

In this example we use a data configuration file to set up the federation members (see section on member configuration below for more details). Note that in this example we use an initialized Repository to create the query, as well as the connection.

File dataConfig = new File("local/dataSourceConfig.ttl");
Repository repo = FedXFactory.createFederation(dataConfig);
repo.init();

String q = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
	+ "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
	+ "SELECT ?President ?Party WHERE {\n"
	+ "?President rdf:type dbpedia-owl:President .\n"
	+ "?President dbpedia-owl:party ?Party . }";

try (RepositoryConnection conn = repo.getConnection()) {
	TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, q);
	try (TupleQueryResult res = query.evaluate()) {

		while (res.hasNext()) {
			System.out.println(res.next());
		}
	}
}

repo.shutDown();
System.out.println("Done.");
System.exit(0);

Example 3: Setting up FedX using the Endpoint utilities

This example shows how to set up FedX using a mechanism to include dynamic endpoints.

List<Endpoint> endpoints = new ArrayList<>();
endpoints.add( EndpointFactory.loadSPARQLEndpoint("dbpedia", "http://dbpedia.org/sparql"));
endpoints.add( EndpointFactory.loadSPARQLEndpoint("swdf", "http://data.semanticweb.org/sparql"));

Repository repo = FedXFactory.createFederation(endpoints);
repo.init();

String q = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
	+ "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
	+ "SELECT ?President ?Party WHERE {\n"
	+ "?President rdf:type dbpedia-owl:President .\n"
	+ "?President dbpedia-owl:party ?Party . }";

TupleQuery query = QueryManager.prepareTupleQuery(q);
try (TupleQueryResult res = query.evaluate()) {

	while (res.hasNext()) {
		System.out.println(res.next());
	}
}

repo.shutDown();
System.out.println("Done.");
System.exit(0);

Federation Management

FedX federations can be managed both at initialization and at runtime. This is possible since FedX is capable of on-demand federation setup, meaning that we do not require any prior knowledge about data sources.

The federation can be controlled at runtime using the FederationManager. This class provides all means for interacting with the federation at runtime, e.g. adding or removing federation members.

Endpoints can be added to the federation using the method addEndpoint(Endpoint) and removed with removeEndpoint(Endpoint). Note that new endpoints can be initialized using the Endpoint Management facilities.

Endpoint Management

In FedX any federation member is mapped to an Endpoint. The endpoint maintains all relevant information for a particular endpoint, e.g. how triples can be retrieved from the endpoint. Endpoints can be added to the federation at initialization time or at runtime.

In FedX we provide support methods to create Endpoints for SPARQL endpoints and RDF4J NativeStores. The methods can be used to create endpoints easily.

Example: Using the endpoint Manager to create endpoints

Config.initialize();
List<Endpoint> endpoints = new ArrayList<>();

// initializing a SPARQL endpoint (with explicit name)
endpoints.add( EndpointFactory.loadSPARQLEndpoint("http://dbpedia", "http://dbpedia.org/sparql"));

// another SPARQL endpoint (name is constructed from url)
endpoints.add( EndpointFactory.loadSPARQLEndpoint("http://data.semanticweb.org/sparql"));

// load a RDF4J NativeStore (path either absolute or relative to Config#getBaseDir)
endpoints.add( EndpointFactory.loadNativeEndpoint("http://mystore", "path/to/myNativeStore"));

FedXFactory.initializeFederation(endpoints);

For details about the methods please refer to the javadoc of the class EndpointFactory.

Note: With the Endpoint mechanism it is basically possible to support any kind of Repository or SAIL implementation as a federation member. For documentation see the javadoc, in particular EndpointFactory and EndpointProvider.

FedX configuration

FedX provides various means for configuration. Configuration settings can be defined using the FedXConfig facility, which can be passed at initialization time. Note that certain settings can also be changed at runtime. Please refer to the API documentation for details.

Available Properties

Property                  Description
prefixDeclarations        Path to prefix declarations file, see PREFIX Declarations
cacheLocation             Location where the memory cache gets persisted at shutdown, default cache.db
joinWorkerThreads         The number of join worker threads for parallelization, default 20
unionWorkerThreads        The number of union worker threads for parallelization, default 20
boundJoinBlockSize        Block size for bound joins, default 15
enforceMaxQueryTime       Max query time in seconds, 0 to disable, default 30
enableServiceAsBoundJoin  Flag for evaluating a SERVICE expression (contacting non-federation members) using vectored evaluation, default false. For today's endpoints it is more efficient to disable vectored evaluation of SERVICE
debugQueryPlan            Print the optimized query execution plan to stdout, default false
enableMonitoring          Flag to enable/disable monitoring features, default false
logQueryPlan              Flag to enable/disable query plan logging via Java class QueryPlanLog, default false
logQueries                Flag to enable/disable query logging via QueryLog, default false. The QueryLog facility allows to log all queries to a file
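The effect of boundJoinBlockSize is easiest to picture with a small model. The sketch below assumes that a bound join ships intermediate bindings to a remote endpoint in blocks, one request per block, rendered here as a SPARQL VALUES clause. The class BoundJoinBlocks, its methods, and the example.org IRIs are hypothetical, and the sketch is not FedX's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of block-wise evaluation of a join against a remote source.
class BoundJoinBlocks {

    // Split the intermediate bindings into blocks of at most blockSize items.
    static List<List<String>> partition(List<String> bindings, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int i = 0; i < bindings.size(); i += blockSize) {
            int end = Math.min(i + blockSize, bindings.size());
            blocks.add(new ArrayList<>(bindings.subList(i, end)));
        }
        return blocks;
    }

    // Render one block as a single remote sub-query for the pattern
    // "?countrySameAs wdt:P2131 ?gdp" from the example query.
    static String toSubQuery(List<String> block) {
        StringBuilder sb = new StringBuilder();
        sb.append("PREFIX wdt: <http://www.wikidata.org/prop/direct/> ");
        sb.append("SELECT * WHERE { VALUES ?countrySameAs { ");
        for (String iri : block) {
            sb.append('<').append(iri).append("> ");
        }
        sb.append("} ?countrySameAs wdt:P2131 ?gdp }");
        return sb.toString();
    }

    public static void main(String[] args) {
        // 40 made-up bindings produced by the left-hand side of the join.
        List<String> bindings = new ArrayList<>();
        for (int i = 1; i <= 40; i++) {
            bindings.add("http://example.org/country" + i);
        }
        List<List<String>> blocks = partition(bindings, 15);
        System.out.println(blocks.size() + " remote requests for " + bindings.size() + " bindings");
        System.out.println(toSubQuery(blocks.get(blocks.size() - 1)));
    }
}
```

Under this assumption a larger block size means fewer but larger remote requests, which is the trade-off the setting controls.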

Query timeouts

FedX supports defining a maximum execution time for a query. This can be set at query level via Query#setMaxExecutionTime or globally using the FedX config setting enforceMaxQueryTime.

Note that the query engine attempts to abort any running evaluation of a subquery when the maximum execution time has been reached.

If a query timeout occurs, a QueryInterruptedException is thrown.

Prefix declarations

FedX optionally allows defining commonly used prefixes (e.g. rdf, foaf, etc.) in a configuration file. These configured prefixes are then automatically inserted into a query, meaning that the user has to specify neither full URIs nor the PREFIX declaration in the query.

The prefixes can either be specified in a configuration file as key-value pairs or be configured directly via Java code (see examples below). When using a configuration file, its location is configured via the prefixDeclarations property.

Example: Prefix configuration via configuration file

# this file contains a set of prefix declarations
=http://example.org/
foaf=http://xmlns.com/foaf/0.1/
rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#
dbpedia=http://dbpedia.org/ontology/
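What the automatic insertion amounts to can be sketched in plain Java. The class PrefixPrologue below is hypothetical and not part of FedX: it parses key-value lines in the format shown above and prepends a PREFIX declaration for every configured prefix. A real implementation would likely add only those prefixes that the query uses and has not declared itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns "prefix=namespace" lines into a SPARQL prologue.
class PrefixPrologue {

    // Parse key-value lines; blank lines and '#' comment lines are skipped.
    static Map<String, String> parse(String fileContent) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (String line : fileContent.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue; // not a key-value line
            }
            // An empty key (line starts with '=') denotes the default prefix.
            prefixes.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return prefixes;
    }

    // Prepend one PREFIX declaration per configured prefix to the query.
    static String withPrologue(Map<String, String> prefixes, String query) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            sb.append("PREFIX ").append(e.getKey()).append(": <")
              .append(e.getValue()).append(">\n");
        }
        return sb.append(query).toString();
    }

    public static void main(String[] args) {
        String config = "# this file contains a set of prefix declarations\n"
            + "=http://example.org/\n"
            + "foaf=http://xmlns.com/foaf/0.1/\n";
        System.out.println(withPrologue(parse(config), "SELECT * WHERE { ?s foaf:name ?name }"));
    }
}
```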

Example: Setting prefixes at runtime

The QueryManager can be used to define additional prefixes at runtime.

QueryManager qm = repo.getQueryManager();
qm.addPrefixDeclaration("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
qm.addPrefixDeclaration("dbpedia", "http://dbpedia.org/ontology/");

Member configuration

Federation members can be added to a federation either directly as a list of endpoints, or using a data configuration file (see section FedX in Java Applications). In a data configuration the federation members are specified using Turtle syntax.

Example 1: SPARQL Federation:

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix fedx: <http://www.fluidops.com/config/fedx#> .

<http://DBpedia> a sd:Service ;
	fedx:store "SPARQLEndpoint";
	sd:endpoint "http://dbpedia.org/sparql";
	fedx:supportsASKQueries false .

<http://SWDF> a sd:Service ;
	fedx:store "SPARQLEndpoint" ;
	sd:endpoint "http://data.semanticweb.org/sparql".

<http://LinkedMDB> a sd:Service ;
	fedx:store "SPARQLEndpoint";
	sd:endpoint "http://data.linkedmdb.org/sparql".

Note: if a SPARQL endpoint does not support ASK queries, the endpoint can be configured to use SELECT queries instead using fedx:supportsASKQueries false. This is for instance useful for Virtuoso based endpoints like DBpedia. Moreover note that for convenience the public DBpedia endpoint is automatically configured to use SELECT queries.

Example 2: SPARQL Federation with RDF4J remote repositories

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix fedx: <http://www.fluidops.com/config/fedx#> .

<http://dbpedia> a sd:Service ;
	fedx:store "RemoteRepository";
	fedx:repositoryServer "http://host/rdf4j-server" ;
	fedx:repositoryName "repoName" .

Example 3: Local Federation (NativeStore):

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix fedx: <http://www.fluidops.com/config/fedx#> .

<http://DBpedia> a sd:Service ;
	fedx:store "NativeStore";
	fedx:repositoryLocation "repositories\\native-storage.dbpedia36".

<http://NYTimes> a sd:Service ;
	fedx:store "NativeStore";
	fedx:repositoryLocation "repositories\\native-storage.nytimes".

Example 4: Federation with resolvable endpoints:

FedX supports using resolvable endpoints as federation members. These resolvable repositories are not managed by FedX, but are resolved using a provided RepositoryResolver. An example use case is to reference a repository managed by the rdf4j-server (i.e. from within the RDF4J Workbench). Alternatively, any custom resolver can be provided to FedX during initialization using the FedXFactory, e.g. a LocalRepositoryManager.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix fedx: <http://www.fluidops.com/config/fedx#> .

<http://myNativeStore> a sd:Service ;
	fedx:store "ResolvableRepository" ;
	fedx:repositoryName "myNativeStore" .

Note that hybrid combinations are also possible.

Monitoring & Logging

FedX does not rely on a specific logging backend implementation at runtime. Any of the SLF4J adapters can be used to integrate with a logging backend.

FedX provides certain facilities to monitor the application state. These facilities are described in the following.

Note: for the following features enableMonitoring must be set in the FedX configuration.

Logging queries

By setting logQueries=true in the FedX configuration, all incoming queries are traced to a logger with the name QueryLogger. If a corresponding configuration is added to the logging backend, the queries can for instance be traced to a file.

Logging the query plan

There are two ways of seeing the optimized query plan:

a) by setting debugQueryPlan=true, the query plan is printed to stdout (which is handy in the CLI or for debugging).

b) by setting logQueryPlan=true, the optimized query plan is written to a variable local to the executing thread. The optimized query plan can be retrieved via the QueryPlanLog service, as illustrated in the following abstract snippet.

FedXConfig config = new FedXConfig().withEnableMonitoring(true).withLogQueryPlan(true);
Repository repo = FedXFactory.newFederation()
		.withSparqlEndpoint("http://dbpedia.org/sparql")
		.withSparqlEndpoint("https://query.wikidata.org/sparql")
		.withConfig(config)
		.create();

TupleQuery query = repo.getConnection().prepareTupleQuery(QueryLanguage.SPARQL, <SOME_QUERY>);
// ... evaluate query ...

System.out.println("# Optimized Query Plan:");
System.out.println(QueryPlanLog.getQueryPlan());
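Because the plan is kept in a variable local to the executing thread, it has to be read on the same thread that evaluated the query. The following plain-Java sketch illustrates that behaviour with a ThreadLocal. PlanHolder is a hypothetical stand-in for illustration and not the actual QueryPlanLog class.

```java
// Hypothetical stand-in showing how a thread-local query plan slot behaves.
class PlanHolder {

    // One slot per thread; other threads never see this thread's value.
    private static final ThreadLocal<String> PLAN = new ThreadLocal<>();

    static void setQueryPlan(String plan) {
        PLAN.set(plan);
    }

    static String getQueryPlan() {
        return PLAN.get();
    }

    // Read the plan from a different thread and hand the result back.
    static String readFromOtherThread() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = getQueryPlan());
        t.start();
        try {
            t.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        setQueryPlan("plan of the query evaluated on the main thread");
        System.out.println("same thread:  " + getQueryPlan());
        System.out.println("other thread: " + readFromOtherThread());
    }
}
```

In applications that hand query evaluation to a worker thread, this suggests reading the plan inside that worker rather than in the calling thread.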

Monitoring the number of requests

If monitoring is enabled, the number of requests sent to each individual federation member is monitored. All available information can be retrieved from the MonitoringService, which is obtained via

MonitoringUtil.getMonitoringService()

The following snippet illustrates a monitoring utility that prints all monitoring information to stdout.

FedXConfig config = new FedXConfig().withEnableMonitoring(true).withLogQueryPlan(true);
FedXRepository repo = FedXFactory.newFederation()
		.withSparqlEndpoint("http://dbpedia.org/sparql")
		.withSparqlEndpoint("https://query.wikidata.org/sparql")
		.withConfig(config)
		.create();
repo.init();

TupleQuery query = ...

// ... evaluate queries ...

MonitoringUtil.printMonitoringInformation(repo.getFederationContext());
