Chapter 1. Introduction

Rosette Base Linguistics Java Edition (RBL-JE) provides a set of linguistic analysis tools. Language-specific modules can provide base forms (lemmas) of words, parts of speech, compound components, normalized tokens, stems, and roots. RBL-JE also includes a Chinese Script Converter (CSC) [72], which converts tokens in Traditional Chinese text to Simplified Chinese and vice versa. See the RBL-JE Feature Set [12].

1.1. APIs

You can use RBL-JE's core APIs in your own JVM application, use its Apache Lucene-compatible API in a Lucene application, or integrate it directly with either Apache Solr or Elasticsearch.¹

RBL-JE API Hierarchy

JVM Applications. You can use the core classes that RBL-JE provides to integrate base linguistics functionality into your own applications. See Using the RBL-JE Core Classes [19].

Lucene. In an Apache Lucene application, you can use a custom Lucene Analyzer in conjunction with a Base Linguistics Tokenizer and Token Filter to produce an enriched token stream for indexing documents and for queries. See Using RBL-JE in Lucene [41].
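The Analyzer pattern described above (a Tokenizer as the source of the token stream, wrapped by one or more Token Filters) can be sketched in Java. This is a minimal sketch assuming Lucene 8.x or later with lucene-core on the classpath. Lucene's own StandardTokenizer and LowerCaseFilter stand in for the RBL-JE Base Linguistics Tokenizer and Token Filter; their real class names and constructors are covered in Using RBL-JE in Lucene [41] and are not assumed here. The class name SketchAnalyzer and the field name "body" are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Hypothetical sketch of a custom Lucene Analyzer that chains a Tokenizer
 * and a TokenFilter. StandardTokenizer and LowerCaseFilter are stand-ins;
 * an RBL-JE setup would use the Base Linguistics Tokenizer and Token Filter.
 */
public class SketchAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Source of the chain: splits the field's text into tokens.
        Tokenizer source = new StandardTokenizer();
        // Filter stage: rewrites each token the tokenizer produces.
        TokenStream filtered = new LowerCaseFilter(source);
        return new TokenStreamComponents(source, filtered);
    }

    /** Runs an analyzer over text and collects the resulting terms. */
    public static List<String> terms(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        // "body" is a hypothetical field name; this analyzer ignores it.
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before consuming
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                        // finalize offsets/state
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new SketchAnalyzer()) {
            System.out.println(terms(analyzer, "Rosette Base Linguistics"));
        }
    }
}
```

Running main prints [rosette, base, linguistics]. Because the tokenizer is the source and each filter wraps the stream before it, replacing the stand-ins with other components only changes the body of createComponents; the consuming code stays the same for indexing and for queries.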

Solr. With a few simple steps, you can configure an Apache Solr search server to use RBL-JE for both indexing documents and for queries. See Using RBL-JE in Apache Solr [56].

Elasticsearch. Basis provides a separate Elasticsearch plugin that you can install to use RBL-JE for analysis, indexing, and queries.

1. Apache Lucene™, Lucene™, Apache Solr™, and Solr™ are trademarks of the Apache Software Foundation. Elasticsearch™ is a trademark of Elasticsearch BV.
