Chapter 10. Removing Unnecessary Files
Depending on the scope of your application, you can reduce its size by removing files that you know it will not need.
10.1. Tool Files
If you have finished building any user-defined dictionaries [59] that you need (see also CSC User Dictionaries [78]), and you will not be using the RBL-JE Command Line Utility [31], you may delete the entire root/tools directory.
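As an illustration only, the following sketch removes a directory tree such as root/tools using the standard java.nio.file API (Java 11 or later is assumed). The class name RemoveTools and the method deleteTree are hypothetical and are not part of RBL-JE; the default path root/tools is a placeholder for wherever your installation root actually lives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of RBL-JE.
public class RemoveTools {

    /** Deletes dir and everything beneath it; returns the number of entries removed. */
    static int deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0; // nothing to do
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories,
            // so each directory is empty by the time it is deleted.
            entries = walk.sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return entries.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the tools directory is passed as the first argument,
        // or lives at root/tools relative to the working directory.
        Path tools = Path.of(args.length > 0 ? args[0] : "root/tools");
        System.out.println("Removed " + deleteTree(tools) + " entries under " + tools);
    }
}
```

Run it against a copy of your distribution first; the deletion cannot be undone.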
10.2. Language-Specific Model and Dictionary Files
Generally, you can remove files from root/dicts and root/models [17] that belong to languages your application does not need to support. Additionally, if your deployment platform has a known endianness, you can remove the dictionary and model files of the opposite endianness. Where applicable, the endianness of a file is given at the end of the file name: for example, root/models/eng-model-LE.bin is a little-endian binary storing the English language model, whereas root/models/eng-model-BE.bin holds the same model in big-endian format. More details are given below in Required Files by Language [82].
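If you are unsure which endianness your deployment platform uses, a small check like the one below can report it. This is a minimal sketch that relies only on the standard java.nio.ByteOrder class; the class name EndianCheck and its two helper methods are hypothetical, and the program must be run on the target platform itself, not on the build machine.

```java
import java.nio.ByteOrder;

// Hypothetical helper, not part of RBL-JE.
public class EndianCheck {

    /** Returns the file-name tag ("BE" or "LE") that matches the given byte order. */
    static String tagFor(ByteOrder order) {
        return order == ByteOrder.BIG_ENDIAN ? "BE" : "LE";
    }

    /** Returns the tag of the opposite byte order, whose files are candidates for removal. */
    static String removableTagFor(ByteOrder order) {
        return order == ByteOrder.BIG_ENDIAN ? "LE" : "BE";
    }

    public static void main(String[] args) {
        // Byte order of the hardware this JVM is running on.
        ByteOrder nativeOrder = ByteOrder.nativeOrder();
        System.out.println("Native byte order:      " + nativeOrder);
        System.out.println("Keep files tagged:      " + tagFor(nativeOrder));
        System.out.println("Candidates for removal: " + removableTagFor(nativeOrder));
    }
}
```

If the application ships to platforms of both endiannesses, keep both sets of files.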
10.2.1. Required Files by Language
The following table gives a manifest of the files in root/dicts and root/models. For each file, it lists the language(s) that depend on the file and whether the file is needed on big-endian (BE) and little-endian (LE) machines.
File name | BE | LE | Language(s) |
---|---|---|---|
root/dicts/ara/*-BE.bin | ✓ | | Arabic |
root/dicts/ara/*-LE.bin | | ✓ | Arabic |
root/dicts/ces/*-BE.bin | ✓ | | Czech |
root/dicts/ces/*-LE.bin | | ✓ | Czech |
root/dicts/csc/sc2tc_codepoint.txt | ✓ | ✓ | Chinese (Simplified and Traditional) [a] |
root/dicts/csc/tc2sc_codepoint.txt | ✓ | ✓ | Chinese (Simplified and Traditional) |
root/dicts/csc/*_BE.bin | ✓ | | Chinese (Simplified and Traditional) |
root/dicts/csc/*_LE.bin | | ✓ | Chinese (Simplified and Traditional) |
root/dicts/dan/*-BE.bin | ✓ | | Danish |
root/dicts/dan/*-LE.bin | | ✓ | Danish |
root/dicts/deu/*-BE.bin | ✓ | | German |
root/dicts/deu/*-LE.bin | | ✓ | German |
root/dicts/ell/*-BE.bin | ✓ | | Greek |
root/dicts/ell/*-LE.bin | | ✓ | Greek |
root/dicts/eng/*-BE.bin | ✓ | | English |
root/dicts/eng/*-LE.bin | | ✓ | English |
root/dicts/fas/*-BE.bin | ✓ | | Persian (Western Farsi and Dari) |
root/dicts/fas/*-LE.bin | | ✓ | Persian (Western Farsi and Dari) |
root/dicts/fra/*-BE.bin | ✓ | | French |
root/dicts/fra/*-LE.bin | | ✓ | French |
root/dicts/hun/*-BE.bin | ✓ | | Hungarian |
root/dicts/hun/*-LE.bin | | ✓ | Hungarian |
root/dicts/ita/*-BE.bin | ✓ | | Italian |
root/dicts/ita/*-LE.bin | | ✓ | Italian |
root/dicts/jpn/jla/*_BE.bin | ✓ | | Japanese [b] |
root/dicts/jpn/jla/*_LE.bin | | ✓ | Japanese |
root/dicts/jpn/jla/*_BE_Reading.bin | ✓ | | Japanese |
root/dicts/jpn/jla/*_LE_Reading.bin | | ✓ | Japanese |
root/dicts/jpn/jla/JP_stop.utf8 | ✓ | ✓ | Japanese |
root/dicts/jpn/*-BE.bin | ✓ | | Japanese |
root/dicts/jpn/*-LE.bin | | ✓ | Japanese |
root/dicts/nld/*-BE.bin | ✓ | | Dutch |
root/dicts/nld/*-LE.bin | | ✓ | Dutch |
root/dicts/nno/*-BE.bin | ✓ | | Norwegian (Nynorsk) |
root/dicts/nno/*-LE.bin | | ✓ | Norwegian (Nynorsk) |
root/dicts/nob/*-BE.bin | ✓ | | Norwegian (Bokmål) |
root/dicts/nob/*-LE.bin | | ✓ | Norwegian (Bokmål) |
root/dicts/pol/*-BE.bin | ✓ | | Polish |
root/dicts/pol/*-LE.bin | | ✓ | Polish |
root/dicts/por/*-BE.bin | ✓ | | Portuguese |
root/dicts/por/*-LE.bin | | ✓ | Portuguese |
root/dicts/rus/*-BE.bin | ✓ | | Russian |
root/dicts/rus/*-LE.bin | | ✓ | Russian |
root/dicts/spa/*-BE.bin | ✓ | | Spanish |
root/dicts/spa/*-LE.bin | | ✓ | Spanish |
root/dicts/swe/*-BE.bin | ✓ | | Swedish |
root/dicts/swe/*-LE.bin | | ✓ | Swedish |
root/dicts/tha/*-BE.bin | ✓ | | Thai |
root/dicts/tha/*-LE.bin | | ✓ | Thai |
root/dicts/urd/*-BE.bin | ✓ | | Urdu |
root/dicts/urd/*-LE.bin | | ✓ | Urdu |
root/dicts/zho/cla/*_BE.bin | ✓ | | Chinese (Simplified and Traditional) [c] |
root/dicts/zho/cla/*_LE.bin | | ✓ | Chinese (Simplified and Traditional) |
root/dicts/zho/cla/zh_stop.utf8 | ✓ | ✓ | Chinese (Simplified and Traditional) |
root/dicts/zho/*-BE.bin | ✓ | | Chinese (Simplified and Traditional) |
root/dicts/zho/*-LE.bin | | ✓ | Chinese (Simplified and Traditional) |
root/models/ara-model-BE.bin | ✓ | | Arabic |
root/models/ara-model-LE.bin | | ✓ | Arabic |
root/models/dinflections-BE.bin | ✓ | | Hebrew |
root/models/dinflections-LE.bin | | ✓ | Hebrew |
root/models/dprefixes.data | ✓ | ✓ | Hebrew |
root/models/eng-model-BE.bin | ✓ | | English |
root/models/eng-model-LE.bin | | ✓ | English |
root/models/gimatria.data | ✓ | ✓ | Hebrew |
root/models/jpn-model-BE.bin | ✓ | | Japanese [d] |
root/models/jpn-model-LE.bin | | ✓ | Japanese |
root/models/jpn.model | ✓ | ✓ | Japanese |
root/models/kor-model-BE.bin | ✓ | | Korean |
root/models/kor-model-LE.bin | | ✓ | Korean |
root/models/kor.model | ✓ | ✓ | Korean |
root/models/tha.model | ✓ | ✓ | Thai |
root/models/zho.model | ✓ | ✓ | Chinese (Simplified and Traditional) [e] |
a. The CSC files are only necessary for Chinese Script Conversion [72].
b. The JLA files are only necessary if the alternativeTokenization option is set. For more information, see Tokenizers [9] and the Table of API Options [86].
c. The CLA files are only necessary if the alternativeTokenization option is set. For more information, see Tokenizers [9] and the Table of API Options [86].
d. The three Japanese model files (jpn-model-BE.bin, jpn-model-LE.bin, and jpn.model) are only necessary when alternativeTokenization is set to false.
e. zho.model is only necessary when alternativeTokenization is set to false.
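The naming patterns in the table above lend themselves to a simple automated check. The sketch below is one possible way to decide whether a given path is a candidate for removal; it is not an RBL-JE facility. The class PruneCheck, the method isRemovable, and the sample file names in main are all hypothetical. It makes two simplifying assumptions: each subdirectory of root/dicts is treated as a three-letter code (so add csc to the keep set if you need Chinese Script Conversion), and files in root/models are filtered by endianness only, so language-specific models and the footnote conditions on alternativeTokenization must still be checked by hand.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RBL-JE.
public class PruneCheck {

    // Endianness tag at the end of a name, e.g. eng-model-LE.bin,
    // x_BE.bin, or x_LE_Reading.bin (patterns taken from the table above).
    private static final Pattern ENDIAN_TAG =
            Pattern.compile("[-_](BE|LE)(_Reading)?\\.bin$");

    // Three-letter directory directly under root/dicts, e.g. root/dicts/jpn/...
    private static final Pattern DICT_DIR =
            Pattern.compile("^root/dicts/([a-z]{3})/");

    /**
     * Returns true if the path is a removal candidate, given the dictionary
     * directory codes to keep and the endianness tag ("BE" or "LE") to keep.
     */
    static boolean isRemovable(String path, Set<String> keepCodes, String keepEndian) {
        Matcher dir = DICT_DIR.matcher(path);
        if (dir.find() && !keepCodes.contains(dir.group(1))) {
            return true; // dictionary for a language that is not needed
        }
        Matcher tag = ENDIAN_TAG.matcher(path);
        if (tag.find() && !tag.group(1).equals(keepEndian)) {
            return true; // binary built for the opposite endianness
        }
        return false; // keep, or decide manually
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("eng", "jpn"); // requires Java 9 or later
        // Sample paths are made up to match the wildcard patterns in the table.
        String[] samples = {
            "root/dicts/fra/sample-LE.bin",
            "root/dicts/eng/sample-BE.bin",
            "root/dicts/eng/sample-LE.bin",
            "root/dicts/jpn/jla/JP_stop.utf8",
            "root/models/eng-model-BE.bin"
        };
        for (String path : samples) {
            System.out.println(path + " -> "
                    + (isRemovable(path, keep, "LE") ? "remove" : "keep"));
        }
    }
}
```

Because the check returns false for anything it does not recognize, it errs on the side of keeping files; review its output against the table before deleting anything.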