The queryIndex/10 built-in executes a search in a Lucene full-text index. Typically this is the index that OntoBroker manages automatically, but external indexes can also be queried. The built-in has 10 arguments; the first five must be bound.
| Argument | Bound/Free | Description |
|---|---|---|
| <module> | b | The module whose index is queried |
| <option list> | b | List of optional parameters (see the list below for details) |
| <lucene query text> | b | The query string (Lucene query syntax) |
| <offset> | b | Index of the first hit to return (starts at 0) |
| <limit> | b | Maximum number of hits to return |
| <object> | f | Term of the object that was hit |
| <total count> | f | Total number of hits |
| <score> | f | Lucene ranking score of the hit |
| <order> | f | Order number for sorting the hits into the correct order |
| <optional output list> | f | Contents depend on <option list> |
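Together, <offset>, <limit> and <total count> let you page through a large result set. The following Python sketch is only an illustration. run_query is a hypothetical stand-in for one _queryIndex call with a given offset and limit. It returns the bound (object, score, order) tuples and the total count. It is not an OntoBroker API. Because the solutions of a single call may be bound in any order, the hits of each page are sorted by their order number.

```python
from typing import Callable, List, Tuple

# One hit as (object term, score, order number), mirroring <object>, <score>, <order>.
Hit = Tuple[str, float, int]
# Hypothetical stand-in for one _queryIndex call: (offset, limit) -> (hits, total count).
QueryFn = Callable[[int, int], Tuple[List[Hit], int]]


def fetch_all(run_query: QueryFn, page_size: int) -> List[Hit]:
    """Collect all hits by advancing <offset> in steps of <limit>."""
    results: List[Hit] = []
    offset = 0
    while True:
        page, total = run_query(offset, page_size)
        # Restore the ranking within the page via the order number.
        results.extend(sorted(page, key=lambda hit: hit[2]))
        offset += page_size
        if not page or offset >= total:
            return results


def fake_query(offset: int, limit: int) -> Tuple[List[Hit], int]:
    """Simulated index with 7 ranked hits; each page is returned in reverse order."""
    ranked = [(f"obj{i}", 1.0 - i / 10, i) for i in range(7)]
    page = ranked[offset:offset + limit]
    return list(reversed(page)), len(ranked)


print([obj for obj, _, _ in fetch_all(fake_query, page_size=3)])
# ['obj0', 'obj1', 'obj2', 'obj3', 'obj4', 'obj5', 'obj6']
```

Stopping on `offset >= total` avoids one extra empty call at the end. The `not page` check guards against an index that changes between calls.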
The <option list> argument is a list of optional parameters. If no optional parameters are needed, use the empty list [].
Supported optional parameters for the <option list> argument:
return(<field>)
The content of the field is returned for the hit in the <optional output list> variable. Note that the field must be defined as “stored” in the Lucene index, otherwise nothing is returned.
Example:
?- _queryIndex(module1, [return("name_en")], "name_en:foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
For every “name_en” field of a matching document in the Lucene index, the <optional output list> contains an item name_en("content of field").
stringmetric(<metric>)
Sets the string metric used for fuzzy search. If this option is not set, the value of the property “defaultStringMetric” in fulltextindex-config.xml is used. If that property is not set either, the string metric “Jaro” is used.
Supported string metric values are:
"Levenstein", "MongeElkan", "NeedlemanWunch", "QGrams", "Jaro",
"JaroScaled", "JaroWinkler", "DamerauLevenshtein", "DamerauLevenshteinScaled", "MaxJaroDamerauLevenshteinScaled", "DamerauLevenshteinSoundex", "Jaccard", "Soundex", "SmithWaterman"
Side remark:
You can use the built-in distance2 to see how two strings compare using one of these string metrics, e.g.
?- _distance2("Jaro", "good", "food", 0, ?X).
?X is bound to a similarity value between 0 and 1.0, here 0.833.
If you perform a fuzzy search with the string metric Jaro, e.g. with the Lucene query text "good~0.8", it matches "food", because 0.833 >= 0.8.
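The following Python sketch makes the similarity value concrete. It implements the textbook Jaro similarity and the threshold test described in the remark above. It assumes that OntoBroker's "Jaro" metric follows the classic definition. It is not OntoBroker's implementation, and distance2 or the other metrics may differ in detail.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters only count as a match within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def fuzzy_match(term: str, candidate: str, min_similarity: float) -> bool:
    """Threshold test as described for a query such as "term~0.8"."""
    return jaro(term, candidate) >= min_similarity


print(round(jaro("good", "food"), 3))   # 0.833
print(fuzzy_match("good", "food", 0.8))  # True
```

For "good" and "food", three of four characters match and none are transposed, so the result is (3/4 + 3/4 + 3/3) / 3 ≈ 0.833.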
Example:
?- _queryIndex(module1, [stringmetric("Jaro")], "name_en:good~0.8", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Uses the string metric "Jaro" for the fuzzy search.
includeall
Includes all imported modules of <module> (first argument) in the search.
Example:
?- _queryIndex(module1, [includeall], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
defaultfield(<field>)
Sets the default field used for search terms that do not name a field explicitly. For example, in the Lucene query text "all:foo bar", the term "foo" is searched in the field "all" and the term "bar" is searched in the default field.
If this option is not given, the default field specified in fulltextindex-schema.xml (tag defaultSearchField) is used. If that is not set either, the default field is "all".
Example:
?- _queryIndex(module1, [defaultfield("name_en")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Uses the field "name_en" as the default field.
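The default-field rules above can be sketched in Python. This is a deliberately simplified illustration, and the helper names are hypothetical. The tokenizer splits only on whitespace and on the first colon. It ignores phrases, grouping, +/- operators and escaped colons, all of which the real Lucene query parser handles.

```python
from typing import List, Optional, Tuple


def resolve_default_field(option_field: Optional[str],
                          schema_default: Optional[str]) -> str:
    """defaultfield(<field>) option first, then defaultSearchField, then "all"."""
    return option_field or schema_default or "all"


def assign_fields(query: str, default_field: str) -> List[Tuple[str, str]]:
    """Pair each whitespace-separated term with the field it is searched in."""
    pairs: List[Tuple[str, str]] = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            pairs.append((field, term))           # explicit field, e.g. all:foo
        else:
            pairs.append((default_field, token))  # no field given: use the default
    return pairs


default = resolve_default_field("name_en", None)
print(assign_fields("all:foo bar", default))
# [('all', 'foo'), ('name_en', 'bar')]
```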
solrparam(<name>,<value>)
Sets additional Apache Solr parameters. The queryIndex built-in also uses the Apache Solr core on top of Lucene, and this option lets you set one or more parameters for that layer.
Example:
?- _queryIndex(module1, [solrparam("hl", "true"), solrparam("hl.fl", "name_en"),
solrparam("hl.snippets", "2"),
solrparam("hl.fragsize", "200")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
These parameters enable Solr highlighting. Note that only stored fields can be used for highlighting. More details about the Solr highlighting parameters can be found here:
http://wiki.apache.org/solr/HighlightingParameters
externalindexesonly
If this option is set, the module given in the first argument is ignored. In this case, at least one externalindex(<path>) option must be set.
externalindex(<path>)
Adds an external Lucene index to the search. Repeat the option to add several indexes. Note that the fields used must still be defined in fulltextindex-config.xml.
Example:
?- _queryIndex(dummy, [externalindexesonly, externalindex("d:/index1"),externalindex("d:/index2")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Searches only the Lucene indexes located in the directories d:\index1 and d:\index2. The module dummy is ignored.
Extended syntax for <Lucene query text> argument
You can use the Solr query syntax extensions in the <Lucene query text> argument. They let you plug in custom query parsers that add functionality to the search. A custom query parser is selected by starting the query text with “{!parsername param1=value1 param2=value2}”. Here parsername is the name of the query parser, and param1=value1 and param2=value2 are sample parameter/value pairs.
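The prefix is a Solr local-parameters block. The following Python sketch shows how such a prefix splits into a parser name, its parameters and the remaining query text. It is simplified. It relies on shlex for the 'quoted values', so it only approximates Solr's own local-parameter parsing, and a value containing "}" is not supported.

```python
import re
import shlex
from typing import Dict, Optional, Tuple

# "{!parsername k=v ...}" at the start of the query text, followed by the query.
LOCAL_PARAMS = re.compile(r"^\s*\{!(?P<body>[^}]*)\}\s*(?P<query>.*)$", re.DOTALL)


def split_local_params(text: str) -> Tuple[Optional[str], Dict[str, str], str]:
    """Return (parser name, parameters, remaining query text)."""
    match = LOCAL_PARAMS.match(text)
    if not match:
        return None, {}, text  # plain query without a prefix
    tokens = shlex.split(match.group("body"))  # honours 'quoted values'
    if not tokens:
        return None, {}, match.group("query")
    params: Dict[str, str] = {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        params[key] = value
    return tokens[0], params, match.group("query")


print(split_local_params("{!lucene df=name_en q.op=OR sort='id asc'} foo bar"))
# ('lucene', {'df': 'name_en', 'q.op': 'OR', 'sort': 'id asc'}, 'foo bar')
```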
OntoBroker currently supports two extended query parsers: lucene and multifield.
lucene
This is a normal Lucene query with some additional parameters specified directly in the query text.
| Parameter | Description |
|---|---|
| q.op | Default operator (either AND or OR). The standard default operator is AND. |
| df | Default field (see defaultfield above) |
| stringMetric | String metric (see stringmetric above) |
| sort | Sorts the results, e.g. sort='id desc'. Important restriction: sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field, provided that the field is either non-tokenized (i.e. has no Analyzer) or uses an Analyzer that produces only a single term (i.e. uses the KeywordTokenizer). |
Example:
?- _queryIndex(module1, [], "{!lucene df=name_en q.op=OR sort='id asc'} foo bar", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Queries for "foo OR bar" in the default field "name_en" and sorts the results in ascending order of the field id.
multifield
This query parser searches for the search terms in multiple default fields. It supports the same parameters as the lucene query parser, plus the following:
| Parameter | Description |
|---|---|
| fields | Fields to search, e.g. fields='name_en^2 docu_en'. Here a hit in the field name_en is additionally boosted by a factor of 2. |
Example:
?- _queryIndex(module1, [return(name_en),return(docu_en)], "{!multifield fields='name_en docu_en' q.op=OR sort='id asc'} foo bar", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Queries for "foo OR bar" in the fields "name_en" and "docu_en", sorts the results in ascending order of the field id, and returns the fields "name_en" and "docu_en".
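One way to picture the fields parameter is to expand every search term into a disjunction over the listed fields, with the ^ boosts attached. This is similar to what Lucene's MultiFieldQueryParser does. The Python sketch below shows that expansion. It is an assumption about the semantics, made for illustration only. It does not describe how OntoBroker's multifield parser is implemented.

```python
from typing import List, Tuple


def parse_fields(spec: str) -> List[Tuple[str, float]]:
    """Turn "name_en^2 docu_en" into [("name_en", 2.0), ("docu_en", 1.0)]."""
    fields: List[Tuple[str, float]] = []
    for item in spec.split():
        name, _, boost = item.partition("^")
        fields.append((name, float(boost) if boost else 1.0))
    return fields


def expand(terms: List[str], fields: List[Tuple[str, float]], op: str = "OR") -> str:
    """Search every term in every field; a boost of 1 is left implicit."""
    clauses = []
    for term in terms:
        alternatives = " ".join(
            f"{name}:{term}" + (f"^{boost:g}" if boost != 1.0 else "")
            for name, boost in fields
        )
        clauses.append(f"({alternatives})")
    return f" {op} ".join(clauses)


print(expand(["foo", "bar"], parse_fields("name_en^2 docu_en")))
# (name_en:foo^2 docu_en:foo) OR (name_en:bar^2 docu_en:bar)
```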
Available fields for objects in modules
If full-text indexing is enabled, OntoBroker creates an index entry for every ObjectLogic object used in the given module as a concept, instance, attribute or relation. Every hit therefore refers to an indexed ObjectLogic object, whose term is returned in the <object> argument.
Side remark
Full-text indexing is enabled with the following parameter in OntoConfig.prp:
FullTextIndex = on
The fields of the full-text index are defined in fulltextindex-config.xml and fulltextindex-schema.xml (see the section “Fulltext indexer settings” in the OntoBroker Manual Appendix for details).
By default, the following fields are filled for every object:
| Field | Stored | Indexed | Description |
|---|---|---|---|
| id | yes | yes (untokenized) | Stores the untokenized ObjectLogic term representation of the object. |
| lid | yes | yes | Indexes the local name of the ObjectLogic object term (for terms that are not IRIs, this is the same as the id). |
| type | yes | yes | Contains the types of the object: i = instance, c = concept, a = attribute specification, r = relation specification, p = property specification, u = rule, q = query, t = constraint |
| assertedisa | yes | yes | For instances, contains the ids of their concepts |
| repr_de, repr_en, … | yes | yes | Contains the language-dependent label of the object |
| docu_de, docu_en, … | no | yes | Contains the language-dependent documentation of the object |
| syn_de | yes | yes | Contains language-dependent synonyms |
| name_de, name_en, ... | yes | yes | Contains the label and the synonyms in the given language. By default, the indexer only creates fields for the languages “de” and “en”. If this field is returned (e.g. <option list> = […,return(name_en),…]), the first line always contains the label. |
| syn | yes | yes | Contains all synonyms in all languages |
| all | no | yes | Contains all text of the fields lid, name_{lang}, docu_{lang}, attval, syn |
| axiomtext | yes | yes | Contains the rule text of rules, queries and constraints |
All indexed fields can be used in the query text. A return option can be specified for every stored field.
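A return(<field>) option silently yields nothing for a field that is not stored, so it can help to check queries against the field table. The following hypothetical Python pre-flight check uses the stored/indexed flags from the default table above. Only the "en" variants of the language fields, plus syn_de, are listed. OntoBroker itself does not provide this function, and a customized fulltextindex-config.xml may define different fields.

```python
from typing import Dict, List, Tuple

# (stored, indexed) flags from the default field table; language fields shown for "en" only.
FIELDS: Dict[str, Tuple[bool, bool]] = {
    "id": (True, True), "lid": (True, True), "type": (True, True),
    "assertedisa": (True, True), "repr_en": (True, True),
    "docu_en": (False, True), "syn_de": (True, True), "name_en": (True, True),
    "syn": (True, True), "all": (False, True), "axiomtext": (True, True),
}


def check_fields(query_fields: List[str], return_fields: List[str]) -> List[str]:
    """List problems: unindexed fields in the query, unstored fields in return()."""
    problems: List[str] = []
    for field in query_fields:
        _, indexed = FIELDS.get(field, (False, False))
        if not indexed:
            problems.append(f"{field}: not indexed (or unknown), cannot be queried")
    for field in return_fields:
        stored, _ = FIELDS.get(field, (False, False))
        if not stored:
            problems.append(f"{field}: not stored, return({field}) yields nothing")
    return problems


print(check_fields(["name_en", "type"], ["name_en", "docu_en"]))
# ['docu_en: not stored, return(docu_en) yields nothing']
```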
Example:
?- _queryIndex(<http://company.com#onto1>,
    [return(name_en),return(type),includeall],
    "+name_en:city +type:i +assertedisa:\"<http://company.com#Region>\"",
    0, 20, ?OBJ,?TC,?SCORE,?ORDER,?OPT).
This query searches the module <http://company.com#onto1> and its imported modules (option includeall) for instances of <http://company.com#Region> whose English representation or synonym contains the word “city”.
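When the query text contains quoted phrases, the double quotes must be escaped inside the ObjectLogic string literal, as with \" in the example above. The following Python helper assembles the example query and escapes it. The helper name is hypothetical, and the type codes come from the field table. Escaping backslashes as \\ in ObjectLogic string literals is an assumption.

```python
# Type codes of the "type" field, taken from the field table above.
TYPE_CODES = {
    "instance": "i", "concept": "c", "attribute": "a", "relation": "r",
    "property": "p", "rule": "u", "query": "q", "constraint": "t",
}


def quoted(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and double quotes.

    Used both for Lucene phrases and (by assumption) for ObjectLogic string literals.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


query = " ".join([
    "+name_en:city",
    "+type:" + TYPE_CODES["instance"],
    "+assertedisa:" + quoted("<http://company.com#Region>"),
])
print(quoted(query))
# "+name_en:city +type:i +assertedisa:\"<http://company.com#Region>\""
```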
Here are some more examples of valid Lucene query text:

| Lucene query text | Meaning |
|---|---|
| all:city | Searches the field “all” for the word “city” |
| city | Same as “all:city” (if the default field is “all”) |
| +name_en:village +type:i | Searches for instances whose English representation or synonym contains the word “village” |
| id:"http://company.com#Project" | Searches for the object with the id <http://company.com#Project> |