A.4. Plugin Examples
A.4.1. Using Options in Solr
In Solr, you do not use the option enum classes directly. Instead, you specify options in the schema as attributes, using the format option="value".
When specifying options for the tokenizer (class BaseLinguisticsTokenizerFactory), use the options in the TokenizerOption enum class. When specifying options for a filter (class BaseLinguisticsTokenFilterFactory), use the options in the AnalyzerOption and FilterOption enum classes.
Here is an example:
<fieldType class="solr.TextField" name="basis-french">
  <analyzer type="index">
    <tokenizer class="com.basistech.rosette.lucene.BaseLinguisticsTokenizerFactory" language="fra" rootDirectory="${bt_root}"/>
    <filter class="com.basistech.rosette.lucene.BaseLinguisticsTokenFilterFactory" language="fra" rootDirectory="${bt_root}"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="com.basistech.rosette.lucene.BaseLinguisticsTokenizerFactory" language="fra" query="true" rootDirectory="${bt_root}"/>
    <filter class="com.basistech.rosette.lucene.BaseLinguisticsTokenFilterFactory" language="fra" query="true" rootDirectory="${bt_root}"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Here is an example of specifying alternative tokenization options [96]. The value of the alternativeTokenizationOptions attribute is written in YAML:
<fieldType class="solr.TextField" name="basis-japanese">
  <analyzer>
    <tokenizer alternativeTokenization="true"
               class="com.basistech.rosette.lucene.BaseLinguisticsTokenizerFactory"
               language="jpn"
               alternativeTokenizationOptions="{DecomposeCompounds: false}"
               rootDirectory="${bt_root}"/>
    <filter alternativeTokenization="true"
            class="com.basistech.rosette.lucene.BaseLinguisticsTokenFilterFactory"
            language="jpn"
            rootDirectory="${bt_root}"/>
  </analyzer>
</fieldType>
A.4.2. Using Options in Elasticsearch
For Elasticsearch, as with Solr, you do not use the option enum classes directly. Instead, you specify the option names as key-value pairs.
For the tokenizer, use the options in the TokenizerOption enum class. For the filter, use the options in the AnalyzerOption and FilterOption enum classes.
Note. When using RBL-JE within Elasticsearch, you must set the type to rbl for any RBL-JE analyzer, tokenizer, or filter.
Here is an example of using an English analyzer:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rbl": {
          "type": "rbl",
          "language": "eng"
        }
      }
    }
  }
}
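Once an index has been created with these settings, one way to check the analyzer is the Elasticsearch _analyze API. The following is a minimal sketch of a request body, assuming it is sent to the _analyze endpoint of the index that holds the settings above. The sample text is only an illustration, and the tokens returned depend on your RBL-JE installation:

```json
{
  "analyzer": "rbl",
  "text": "The children were running"
}
```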
Here is an example of specifying alternative tokenization options [96]. The value of the alternativeTokenizationOptions key is written in YAML:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rbl": {
          "type": "custom",
          "language": "jpn",
          "tokenizer": "rbl_tok",
          "filter": ["rbl_filter"]
        }
      },
      "tokenizer": {
        "rbl_tok": {
          "type": "rbl",
          "language": "jpn",
          "alternativeTokenization": "true",
          "alternativeTokenizationOptions": "{DecomposeCompounds: false}"
        }
      },
      "filter": {
        "rbl_filter": {
          "type": "rbl",
          "language": "jpn",
          "alternativeTokenization": "true"
        }
      }
    }
  }
}
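To apply the custom analyzer to a field, it can be referenced by name in the index mappings. The following sketch assumes a hypothetical field named content and the typeless mapping syntax of Elasticsearch 7 and later. Older Elasticsearch versions expect a mapping type name around properties, so adjust the fragment to the version your RBL-JE plugin supports:

```json
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "rbl"
      }
    }
  }
}
```

In an index-creation request, this mappings object would sit alongside the settings object as a sibling key.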
A.4.3. Lucene options
Go to the section on Lucene [42] for examples of the API options.
A.4.4. Core API options
Go to the section on Core [19] for examples of the API options.