7.2. Compiling a User Dictionary
In the tools/bin directory, RBL-JE includes a shell script for Unix, rbl-build-user-dictionary, and a .bat file for Windows, rbl-build-user-dictionary.bat.
The script uses Java to compile the user dictionary. Compilation is performed in memory, so you may need more than the default heap size. You can set the heap size with the JAVA_OPTS environment variable. For example, to provide 8 GB of heap, set JAVA_OPTS to -Xmx8g.
Unix shell:
export JAVA_OPTS=-Xmx8g
Windows command prompt:
set JAVA_OPTS=-Xmx8g
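If you would rather not change JAVA_OPTS for the whole shell session, the variable can be set for a single command instead. The following is a minimal sketch for a Unix shell; with_heap is a hypothetical helper name, not part of RBL-JE, and it assumes only that the build script reads JAVA_OPTS from its environment as described above.

```shell
# Hypothetical helper (not part of RBL-JE): run one command with JAVA_OPTS
# set to the requested maximum heap size, leaving the calling shell unchanged.
# Arguments: HEAP_SIZE (for example 8g), followed by the command to run.
with_heap() {
    heap_size="$1"
    shift
    # env sets the variable only in the environment of the command it launches.
    env JAVA_OPTS="-Xmx${heap_size}" "$@"
}
```

For example, with_heap 8g tools/bin/rbl-build-user-dictionary followed by the usual arguments would run that one compilation with an 8 GB heap limit.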
7.2.1. Compiling a Lemma User Dictionary
From the RBL-JE root directory:
tools/bin/rbl-build-user-dictionary -type analysis LANG INPUT_FILE OUTPUT_FILE
where LANG is the three-letter ISO 639-3 code for the language, INPUT_FILE is the pathname of the source file you have created, and OUTPUT_FILE is the pathname of the binary compiled dictionary the tool creates. For example:
tools/bin/rbl-build-user-dictionary -type analysis jpn jpn_lemmadict.txt jpn_lemmadict.bin
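If you wrap the command in your own script, a quick check on the language argument can catch typos before the compiler starts. The sketch below is an illustration only: is_three_letter_code is a hypothetical name, and the check assumes lowercase codes as in the jpn example. It tests the shape of the code, not whether RBL-JE supports the language. In such a wrapper, avoid naming your own shell variable LANG, because Unix shells already use LANG for the locale.

```shell
# Hypothetical pre-check (not part of RBL-JE): succeeds only if the argument
# is exactly three lowercase ASCII letters, the shape of an ISO 639-3 code.
is_three_letter_code() {
    # LC_ALL=C makes [a-z] match ASCII lowercase letters only;
    # -x requires the whole line to match, -q suppresses output.
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z]{3}'
}
```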
7.2.2. Compiling a Segmentation User Dictionary
The command line requires -type segmentation. For example:
tools/bin/rbl-build-user-dictionary -type segmentation jpn jpn_segdict.txt jpn_segdict.bin
7.2.3. Compiling a Many-to-one Normalization User Dictionary
The command line requires -type m1norm. For example:
tools/bin/rbl-build-user-dictionary -type m1norm jpn jpn_m1norm_dict.txt jpn_m1norm_dict.bin
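The three commands in this section differ only in the -type value and the file names, so they can be wrapped in one small helper. The following is a sketch under stated assumptions: build_dict and the RBL_BUILD variable are hypothetical names, the helper is run from the RBL-JE root directory, and the output name is derived from the input name by replacing a trailing .txt with .bin, which is a convention taken from the examples rather than a requirement of the tool.

```shell
# Path to the build script, relative to the RBL-JE root directory.
# RBL_BUILD is a hypothetical override variable, not something RBL-JE defines.
RBL_BUILD="${RBL_BUILD:-tools/bin/rbl-build-user-dictionary}"

# Hypothetical helper: compile one user dictionary.
# Arguments: TYPE (analysis, segmentation, or m1norm), language code, input file.
build_dict() {
    dict_type="$1"
    lang_code="$2"
    input_file="$3"
    # Strip a trailing .txt if present, then append .bin.
    output_file="${input_file%.txt}.bin"
    "$RBL_BUILD" -type "$dict_type" "$lang_code" "$input_file" "$output_file"
}
```

With these definitions, build_dict segmentation jpn jpn_segdict.txt would run the same command as the segmentation example above.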