Appendix H. POS Tag Map

The universalPosTags option converts Basis POS tags to universal POS tags. The mappings are defined in POS tag map files. By default, the annotator uses the map in rootDirectory/upt-16/upt-16-language.yaml, where language is an ISO 639-3 language code [109]. Use customPosTagsUri to specify custom POS tag mappings.

H.1. POS Tag Map File Format

A POS tag map file is a YAML file encoded in UTF-8. It is a sequence of mapping rules.

A mapping rule is a sequence of two elements: the POS tag to be mapped and a sequence of submappings. Rules are checked in the order in which they appear in the map file. A token that matches a rule is not checked against any further rules.

A submapping is a mapping with the keys m, s, and t. m is a Java regular expression. s is a surface form. m and s are optional: they can be omitted or null. t specifies the output POS tag to use when the following criteria are met:

  • The input token's POS tag equals the POS tag to be mapped.
  • m (if any) matches a substring of the input token's morphological tags.
  • s (if any) equals the input token's surface form, compared case-insensitively.
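
The m and s criteria above can be sketched in Python as a single predicate. This is only an illustration of the matching logic, not the annotator's implementation. The function name submapping_matches is hypothetical, the morphological tags are assumed to arrive as one string, and Python's re module stands in for Java regular expressions (the two dialects agree on simple patterns such as \+Total but differ in some details).

```python
import re

def submapping_matches(sub, morph_tags, surface):
    """Check the optional m and s criteria of one submapping.

    sub is a dict with optional keys "m" and "s" and a required key "t".
    The POS tag equality check is assumed to be done by the caller.
    morph_tags is assumed to be a single string (an assumption of this sketch).
    """
    m = sub.get("m")  # an omitted key and an explicit null (None) are treated alike
    s = sub.get("s")
    if m is not None and re.search(m, morph_tags) is None:
        return False  # m must match somewhere inside the morphological tags
    if s is not None and s.lower() != surface.lower():
        return False  # s must equal the surface form, ignoring case
    return True
```

A submapping with neither m nor s, such as { t: NUM }, always passes this check, which is why it works as a catch-all when placed last.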

H.2. Example

-
    - NUM_VOC
    -
        - { m: \+Total, t: PRON }
        - { s: moc, t: DET }
        - { s: oba, t: DET }
        - { t: NUM }

This rule maps tokens that have the Basis NUM_VOC POS tag. If the input token's morphological tags contain a match for the regular expression \+Total (that is, the literal text +Total), the token becomes a PRON. Otherwise, if the token's surface form is moc or oba, the token becomes a DET. Otherwise, the token becomes a NUM.
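
To make the fall-through order concrete, the following Python sketch encodes the example rule as plain data and walks its submappings in order. It is a sketch under stated assumptions rather than the annotator's actual code: map_tag is a hypothetical name, the morphological tag strings and the surface form xyz are made-up inputs, returning None when the rule does not apply is a choice of this sketch, and Python's re module again stands in for Java regular expressions.

```python
import re

# The example rule as Python data: [POS tag to be mapped, [submappings...]].
RULE = ["NUM_VOC", [
    {"m": r"\+Total", "t": "PRON"},
    {"s": "moc", "t": "DET"},
    {"s": "oba", "t": "DET"},
    {"t": "NUM"},
]]

def map_tag(rule, pos, morph_tags, surface):
    """Return the t of the first submapping that matches, else None."""
    source_tag, submappings = rule
    if pos != source_tag:
        return None  # the rule does not apply to this token (sketch's choice)
    for sub in submappings:
        m, s = sub.get("m"), sub.get("s")
        if m is not None and re.search(m, morph_tags) is None:
            continue  # m is present but does not match the morphological tags
        if s is not None and s.lower() != surface.lower():
            continue  # s is present but differs from the surface form
        return sub["t"]
    return None

print(map_tag(RULE, "NUM_VOC", "+Total", "xyz"))  # PRON
print(map_tag(RULE, "NUM_VOC", "", "Oba"))        # DET
print(map_tag(RULE, "NUM_VOC", "", "xyz"))        # NUM
```

Because the submappings are tried in order, a token that has +Total in its morphological tags becomes a PRON even if its surface form is moc or oba.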
