Class LuceneSail

All Implemented Interfaces:
FederatedServiceResolverClient, NotifyingSail, Sail, StackableSail

public class LuceneSail extends NotifyingSailWrapper
A LuceneSail wraps an arbitrary existing Sail and extends it with support for full-text search on all Literals.

Setting up a LuceneSail

LuceneSail works in two modes: it stores its index either in a directory on the hard disk or in a RAMDirectory in memory (which is discarded when the program ends). Example with storage in a folder:
 // create a sesame memory sail
 MemoryStore memoryStore = new MemoryStore();

 // create a lucenesail to wrap the memorystore
 LuceneSail lucenesail = new LuceneSail();
 // set this parameter to store the lucene index on disk
 lucenesail.setParameter(LuceneSail.LUCENE_DIR_KEY, "./data/mydirectory");

 // wrap memorystore in a lucenesail
 lucenesail.setBaseSail(memoryStore);

 // create a Repository to access the sails
 SailRepository repository = new SailRepository(lucenesail);
 repository.init();
 

Example with storage in a RAM directory:

 // create a sesame memory sail
 MemoryStore memoryStore = new MemoryStore();

 // create a lucenesail to wrap the memorystore
 LuceneSail lucenesail = new LuceneSail();
 // set this parameter to let the lucene index store its data in ram
 lucenesail.setParameter(LuceneSail.LUCENE_RAMDIR_KEY, "true");

 // wrap memorystore in a lucenesail
 lucenesail.setBaseSail(memoryStore);

 // create a Repository to access the sails
 SailRepository repository = new SailRepository(lucenesail);
 

Asking full-text queries

Text queries are expressed using the virtual properties of the LuceneSail.

In SPARQL:


 SELECT ?subject ?score ?snippet ?resource
 WHERE {
   ?subject <http://www.openrdf.org/contrib/lucenesail#matches> [
      a <http://www.openrdf.org/contrib/lucenesail#LuceneQuery> ;
      <http://www.openrdf.org/contrib/lucenesail#query> "my Lucene query" ;
      <http://www.openrdf.org/contrib/lucenesail#score> ?score ;
      <http://www.openrdf.org/contrib/lucenesail#snippet> ?snippet ;
      <http://www.openrdf.org/contrib/lucenesail#resource> ?resource
   ]
 }
 
When defining queries, the properties type and query are mandatory, as is the matches relation. If one of these is missing, the query will not be executed as expected. The failure behavior can be configured: setting the Sail property "incompletequeryfail" to true throws a SailException when such patterns are found. This is the default behavior, to help find inaccurate queries. Set it to false to have warnings logged instead. Multiple queries can be issued to the sail; the results of the queries will be integrated. Note that you cannot use the same variable for multiple text queries. If you want to combine text searches, use Lucene's query syntax.
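The mandatory parts of a text query can be assembled programmatically. The following is a minimal sketch that uses only the JDK: the class LuceneQuerySketch and its methods are hypothetical helpers, not part of the LuceneSail API, and the escaping covers only backslashes and double quotes.

```java
// Hypothetical helper, not part of RDF4J: builds the SPARQL text of a LuceneSail search.
final class LuceneQuerySketch {

    // Namespace of the virtual properties, as used in the SPARQL example above.
    static final String NS = "http://www.openrdf.org/contrib/lucenesail#";

    /** Escapes backslashes and double quotes so the text fits into a SPARQL string literal. */
    static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Returns a query with the mandatory matches relation, type and query, plus the score. */
    static String buildSearchQuery(String luceneQuery) {
        return "SELECT ?subject ?score WHERE {\n"
                + "  ?subject <" + NS + "matches> [\n"
                + "    a <" + NS + "LuceneQuery> ;\n"
                + "    <" + NS + "query> \"" + escapeLiteral(luceneQuery) + "\" ;\n"
                + "    <" + NS + "score> ?score\n"
                + "  ]\n"
                + "}";
    }

    public static void main(String[] args) {
        // Two search terms are combined inside one Lucene query
        // instead of reusing a variable across two text patterns.
        System.out.println(buildSearchQuery("sail AND \"full text\""));
    }
}
```

The resulting string can be passed to whatever query API the application already uses; combining terms inside the Lucene query keeps a single matches pattern per variable.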

Fields are stored/indexed

All fields are stored and indexed. The "text" fields (gathering all literals) have to be stored because, when a new literal is added to a document, the previous texts need to be copied from the existing document to the new document. This does not work when they are only indexed. Fields that are not stored cannot be retrieved using full-text querying.

Deleting a Lucene index

At the moment, deleting the lucene index can be done in two ways:

Handling of Contexts

Each Lucene document contains a field for every context ID that contributed to the document. NULL contexts are marked using the String SearchFields.CONTEXT_NULL ("null") and stored in the Lucene field SearchFields.CONTEXT_FIELD_NAME ("context"). This means that when adding or appending to a document, all additional context URIs are added to the document. When deleting individual triples, the context is ignored.

In clear(Resource ...) we query all Lucene documents that were possibly created by the given context(s). Given a document D to which the contexts C(1-n) contributed, let D' be the new document after clear().

- If there is only one C, then D can be safely removed and there is no D'. (Hopefully this is the standard case, as in ontologies, where all triples about a resource are in one document.)
- If there are multiple C, remember the URI of D, delete D, and query (s, p, o, ?) from the underlying store after committing the operation. This returns the literals of D'; add D' as a new document.

This will probably be both fast in the common case and capable enough in the multiple-C case.
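The decision made for each document during clear() can be modelled with plain collections. This is a simplified sketch and not the RDF4J implementation: documents are represented only by their resource URI and the set of contributing context IDs, the names ContextClearSketch, ClearPlan and planClear are hypothetical, and the single-context rule is generalized to "all contributing contexts are being cleared".

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model, not the RDF4J implementation.
final class ContextClearSketch {

    /** Documents to drop for good, and documents to delete and rebuild as D'. */
    static final class ClearPlan {
        final Set<String> removed = new TreeSet<>();
        final Set<String> toRebuild = new TreeSet<>();
    }

    static ClearPlan planClear(Map<String, Set<String>> contextsByDocument,
                               Set<String> clearedContexts) {
        ClearPlan plan = new ClearPlan();
        for (Map.Entry<String, Set<String>> entry : contextsByDocument.entrySet()) {
            Set<String> contributing = entry.getValue();
            if (Collections.disjoint(contributing, clearedContexts)) {
                continue; // no cleared context contributed to this document
            }
            if (clearedContexts.containsAll(contributing)) {
                // Every contributing context is cleared: D is removed, there is no D'.
                plan.removed.add(entry.getKey());
            } else {
                // Other contexts remain: delete D, then re-read its literals
                // from the underlying store and add them as D'.
                plan.toRebuild.add(entry.getKey());
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = new LinkedHashMap<>();
        docs.put("ex:a", Set.of("ctx1"));
        docs.put("ex:b", Set.of("ctx1", "ctx2"));
        docs.put("ex:c", Set.of("ctx2"));
        ClearPlan plan = planClear(docs, Set.of("ctx1"));
        System.out.println("removed=" + plan.removed + " toRebuild=" + plan.toRebuild);
    }
}
```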

Defining the indexed Fields

The property INDEXEDFIELDS configures which fields to index and lets you project one property onto another. Syntax:
 # only index label and comment
 index.1=http://www.w3.org/2000/01/rdf-schema#label
 index.2=http://www.w3.org/2000/01/rdf-schema#comment
 # project http://xmlns.com/foaf/0.1/name to rdfs:label
 http\://xmlns.com/foaf/0.1/name=http\://www.w3.org/2000/01/rdf-schema#label
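The syntax above follows the java.util.Properties file format, which is why the colon in a key has to be escaped. The sketch below assumes the value is read with Properties.load and shows how the escaped entries come out. IndexedFieldsSketch is a hypothetical helper and not part of LuceneSail.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: parses an indexedfields value with the standard Properties syntax.
final class IndexedFieldsSketch {

    static Properties parse(String indexedFields) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(indexedFields));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String value = String.join("\n",
                "# only index label and comment",
                "index.1=http://www.w3.org/2000/01/rdf-schema#label",
                "index.2=http://www.w3.org/2000/01/rdf-schema#comment",
                "# project foaf:name to rdfs:label",
                "http\\://xmlns.com/foaf/0.1/name=http\\://www.w3.org/2000/01/rdf-schema#label");
        Properties props = parse(value);
        // An unescaped colon would end the key, so the key's colon is written as "\:".
        System.out.println(props.getProperty("http://xmlns.com/foaf/0.1/name"));
        System.out.println(props.getProperty("index.1"));
    }
}
```

When the value is passed through setParameter as a Java string literal, each backslash must itself be doubled, as in the example.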
 

Set and select Lucene sail by id

The property INDEX_ID configures the id of the index. When it is set, every request without the search:indexid predicate is filtered out. A request would then look like this:

 ?subj search:matches [
 	      search:indexid my:lucene_index_id;
 	      search:query "search terms...";
 	      search:property my:property;
 	      search:score ?score;
 	      search:snippet ?snippet ] .
 

If a LuceneSail is using another LuceneSail as a base sail, the evaluation mode should be set to TupleFunctionEvaluationMode.NATIVE.

Defining the indexed Types/Languages

The properties INDEXEDTYPES and INDEXEDLANG configure which fields to index by their type or language. INDEXEDTYPES syntax:
 # only index object of rdf:type ex:mytype1, rdf:type ex:mytype2 or ex:mytypedef ex:mytype3
 http\://www.w3.org/1999/02/22-rdf-syntax-ns#type=http://example.org/mytype1 http://example.org/mytype2
 http\://example.org/mytypedef=http://example.org/mytype3
 

INDEXEDLANG Syntax:

 # syntax to index only French(fr) and English(en) literals
 fr en
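A language filter of this form can be pictured as a whitespace-separated set of language tags. The following sketch is an illustration only: IndexedLangSketch is hypothetical, and the case-insensitive comparison, the "empty filter accepts everything" rule and the rejection of untagged literals are assumptions that are not confirmed by this documentation.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: models an indexedlang value such as "fr en".
final class IndexedLangSketch {

    /** Splits the value on whitespace and normalizes the tags to lower case. */
    static Set<String> parseLanguages(String indexedLang) {
        Set<String> langs = new LinkedHashSet<>();
        for (String tag : indexedLang.trim().split("\\s+")) {
            if (!tag.isEmpty()) {
                langs.add(tag.toLowerCase(Locale.ROOT));
            }
        }
        return langs;
    }

    /** Assumption: an empty filter accepts every literal; otherwise the tag must be listed. */
    static boolean accepts(Set<String> langs, String literalLanguage) {
        if (langs.isEmpty()) {
            return true;
        }
        return literalLanguage != null
                && langs.contains(literalLanguage.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> langs = parseLanguages("fr en");
        System.out.println(accepts(langs, "en")); // listed language
        System.out.println(accepts(langs, "de")); // language not listed
    }
}
```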
 

Datatypes

Datatypes are ignored in the LuceneSail.
  • Field Details

    • REINDEX_QUERY_KEY

      public static final String REINDEX_QUERY_KEY
      Set the parameter "reindexQuery=" to configure the statements to index over. The default value is "SELECT ?s ?p ?o ?c WHERE {{?s ?p ?o} UNION {GRAPH ?c {?s ?p ?o.}}} ORDER BY ?s". NB: the query must contain the bindings ?s, ?p, ?o and ?c and must be ordered by ?s.
    • INDEXEDFIELDS

      public static final String INDEXEDFIELDS
      Set the parameter "indexedfields=..." to configure a selection of fields to index, and projections of properties. Only the configured fields will be indexed. A property P projected to Q will cause the index to contain Q instead of P when triples with P are indexed. For the syntax of indexedfields, see above.
    • INDEXEDTYPES

      public static final String INDEXEDTYPES
      Set the parameter "indexedtypes=..." to configure a selection of field types to index. Only fields with one of the specified types will be indexed. For the syntax of indexedtypes, see above.
    • INDEXEDLANG

      public static final String INDEXEDLANG
      Set the parameter "indexedlang=..." to configure a selection of field languages to index. Only fields with one of the specified languages will be indexed. For the syntax of indexedlang, see above.
    • INDEX_TYPE_BACKTRACE_MODE

      public static final String INDEX_TYPE_BACKTRACE_MODE
    • LUCENE_DIR_KEY

      public static final String LUCENE_DIR_KEY
      Set the key "lucenedir=<path>" as sail parameter to configure the directory on the filesystem in which to store the Lucene index.
    • DEFAULT_LUCENE_DIR

      public static final String DEFAULT_LUCENE_DIR
      The default directory of the Lucene index files. The value is always relative to the dataDir location, which acts as its parent directory.
    • LUCENE_RAMDIR_KEY

      public static final String LUCENE_RAMDIR_KEY
      Set the key "useramdir=true" as sail parameter to let the LuceneSail store its Lucene index in RAM. This is not intended for production environments.
    • DEFAULT_NUM_DOCS_KEY

      public static final String DEFAULT_NUM_DOCS_KEY
      Set the key "defaultNumDocs=<n>" as sail parameter to limit the maximum number of documents to return from a search query. The default is to return all documents. NB: this may involve extra cost for some SearchIndex implementations as they may have to determine this number.
    • MAX_DOCUMENTS_KEY

      public static final String MAX_DOCUMENTS_KEY
      Set the key "maxDocuments=<n>" as sail parameter to limit the maximum number of documents the user can request at a time from a search query. The default is the value of the DEFAULT_NUM_DOCS_KEY parameter.
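The fallback from maxDocuments to defaultNumDocs can be sketched as follows. This is an illustration only and not the LuceneSail code: DocumentLimitSketch is hypothetical, and the sentinel -1 for "return all documents" is an assumption made for the example.

```java
import java.util.Properties;

// Hypothetical helper: resolves the two document limits from sail parameters.
final class DocumentLimitSketch {

    // Assumed sentinel meaning "no limit, return all documents".
    static final int ALL_DOCUMENTS = -1;

    static int defaultNumDocs(Properties parameters) {
        String value = parameters.getProperty("defaultNumDocs");
        return value == null ? ALL_DOCUMENTS : Integer.parseInt(value.trim());
    }

    /** Falls back to defaultNumDocs when maxDocuments is not set. */
    static int maxDocuments(Properties parameters) {
        String value = parameters.getProperty("maxDocuments");
        return value == null ? defaultNumDocs(parameters) : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties parameters = new Properties();
        parameters.setProperty("defaultNumDocs", "100");
        // maxDocuments is not set, so it takes the defaultNumDocs value.
        System.out.println(maxDocuments(parameters));
    }
}
```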
    • WKT_FIELDS

      public static final String WKT_FIELDS
      Set this key to configure which fields contain WKT and should be spatially indexed. The value should be a space-separated list of URIs. Default is http://www.opengis.net/ont/geosparql#asWKT.
    • INDEX_CLASS_KEY

      public static final String INDEX_CLASS_KEY
      Set this key to configure the SearchIndex class implementation. Default is org.eclipse.rdf4j.sail.lucene.LuceneIndex.
    • INDEX_ID

      public static final String INDEX_ID
      Set this key to configure the filtering of queries. If this parameter is set, the match object should contain the search:indexid parameter; see the syntax above.
    • DEFAULT_INDEX_CLASS

      public static final String DEFAULT_INDEX_CLASS
    • ANALYZER_CLASS_KEY

      public static final String ANALYZER_CLASS_KEY
      Set this key as sail parameter to configure the Lucene analyzer class implementation to use for text analysis.
    • QUERY_ANALYZER_CLASS_KEY

      public static final String QUERY_ANALYZER_CLASS_KEY
      Set this key as sail parameter to configure the Lucene analyzer class implementation used for query analysis. In most cases this should be set to the same value as ANALYZER_CLASS_KEY.
    • SIMILARITY_CLASS_KEY

      public static final String SIMILARITY_CLASS_KEY
      Set this key as sail parameter to configure the Similarity class implementation to use for text analysis.
    • INCOMPLETE_QUERY_FAIL_KEY

      public static final String INCOMPLETE_QUERY_FAIL_KEY
      Set this key as sail parameter to influence whether incomplete queries are treated as failures (malformed queries) or whether they are ignored. Set it to either "true" or "false". When omitted from the properties, the default is true (failure on incomplete queries). See isIncompleteQueryFails().
    • EVALUATION_MODE_KEY

      public static final String EVALUATION_MODE_KEY
    • FUZZY_PREFIX_LENGTH_KEY

      public static final String FUZZY_PREFIX_LENGTH_KEY
      Set this key as sail parameter to influence the fuzzy prefix length.
    • parameters

      protected final Properties parameters
  • Constructor Details

    • LuceneSail

      public LuceneSail()
  • Method Details

    • setLuceneIndex

      public void setLuceneIndex(SearchIndex luceneIndex)
    • getLuceneIndex

      public SearchIndex getLuceneIndex()
    • getConnection

      public NotifyingSailConnection getConnection() throws SailException
      Description copied from interface: Sail
      Opens a connection on the Sail which can be used to query and update data. Depending on how the implementation handles concurrent access, a call to this method might block when there is another open connection on this Sail.
      Specified by:
      getConnection in interface NotifyingSail
      Specified by:
      getConnection in interface Sail
      Overrides:
      getConnection in class NotifyingSailWrapper
      Throws:
      SailException - If no transaction could be started, for example because the Sail is not writable.
    • shutDown

      public void shutDown() throws SailException
      Description copied from interface: Sail
      Shuts down the Sail, giving it the opportunity to synchronize any stale data. Care should be taken that all initialized Sails are being shut down before an application exits to avoid potential loss of data. Once shut down, a Sail can no longer be used until it is re-initialized.
      Specified by:
      shutDown in interface Sail
      Overrides:
      shutDown in class SailWrapper
      Throws:
      SailException - If the Sail object encountered an error or unexpected situation internally.
    • setDataDir

      public void setDataDir(File dataDir)
      Description copied from interface: Sail
      Sets the data directory for the Sail. The Sail can use this directory for storage of data, parameters, etc. This directory must be set before the Sail is initialized.
      Specified by:
      setDataDir in interface Sail
      Overrides:
      setDataDir in class SailWrapper
    • init

      public void init() throws SailException
      Description copied from interface: Sail
      Initializes the Sail. Care should be taken that required initialization parameters have been set before this method is called. Please consult the specific Sail implementation for information about the relevant parameters.
      Specified by:
      init in interface Sail
      Overrides:
      init in class SailWrapper
      Throws:
      SailException - If the Sail could not be initialized.
    • createSearchIndex

      @Deprecated protected static SearchIndex createSearchIndex(Properties parameters) throws Exception
      Deprecated.
      Parameters:
      parameters -
      Returns:
      search index
      Throws:
      Exception
    • initializeLuceneIndex

      protected void initializeLuceneIndex() throws Exception
      Throws:
      Exception
    • setParameter

      public void setParameter(String key, String value)
    • getParameter

      public String getParameter(String key)
    • getParameterNames

      public Set<String> getParameterNames()
    • getReindexQuery

      public String getReindexQuery()
      See REINDEX_QUERY_KEY parameter.
    • setReindexQuery

      public void setReindexQuery(String query)
      See REINDEX_QUERY_KEY parameter.
    • isIncompleteQueryFails

      public boolean isIncompleteQueryFails()
      When this is true, incomplete queries will trigger a SailException. You can set this value either using setIncompleteQueryFails(boolean) or using the parameter "incompletequeryfail".
      Returns:
      Returns the incompleteQueryFails.
    • setIncompleteQueryFails

      public void setIncompleteQueryFails(boolean incompleteQueryFails)
      Set this to true so that incomplete queries trigger a SailException. Otherwise, incomplete queries are logged with level WARN. The default is true. You can also set this value using the parameter "incompletequeryfail".
      Parameters:
      incompleteQueryFails - true or false
    • getEvaluationMode

      public TupleFunctionEvaluationMode getEvaluationMode()
      See EVALUATION_MODE_KEY parameter.
    • setEvaluationMode

      public void setEvaluationMode(TupleFunctionEvaluationMode mode)
      See EVALUATION_MODE_KEY parameter.
    • getIndexBacktraceMode

      public TypeBacktraceMode getIndexBacktraceMode()
    • setIndexBacktraceMode

      public void setIndexBacktraceMode(TypeBacktraceMode mode)
    • setFuzzyPrefixLength

      public void setFuzzyPrefixLength(int fuzzyPrefixLength)
    • getTupleFunctionRegistry

      public TupleFunctionRegistry getTupleFunctionRegistry()
    • setTupleFunctionRegistry

      public void setTupleFunctionRegistry(TupleFunctionRegistry registry)
    • getFederatedServiceResolver

      public FederatedServiceResolver getFederatedServiceResolver()
      Description copied from interface: FederatedServiceResolverClient
      Gets the FederatedServiceResolver used by this client.
      Specified by:
      getFederatedServiceResolver in interface FederatedServiceResolverClient
      Overrides:
      getFederatedServiceResolver in class SailWrapper
    • setFederatedServiceResolver

      public void setFederatedServiceResolver(FederatedServiceResolver resolver)
      Description copied from interface: FederatedServiceResolverClient
      Sets the FederatedServiceResolver to use for this client.
      Specified by:
      setFederatedServiceResolver in interface FederatedServiceResolverClient
      Overrides:
      setFederatedServiceResolver in class SailWrapper
      Parameters:
      resolver - The resolver to use.
    • reindex

      public void reindex() throws SailException
      Starts reindexing the whole sail. This deletes all indexed data and adds it again, which can be a long-running process.
      Throws:
      SailException - If the Sail could not be reindexed.
    • registerStatementFilter

      public void registerStatementFilter(IndexableStatementFilter filter)
      Sets a filter which determines whether a statement should be considered for indexing when performing complete reindexing.
    • acceptStatementToIndex

      protected boolean acceptStatementToIndex(Statement s)
    • mapStatement

      public Statement mapStatement(Statement statement)
    • getSearchQueryInterpreters

      protected Collection<SearchQueryInterpreter> getSearchQueryInterpreters()