KYTEA(1) | General Commands Manual | KYTEA(1) |
NAME
kytea — a word segmentation/pronunciation estimation tool
SYNOPSIS
kytea [options]
DESCRIPTION
This manual page briefly documents the kytea command.
This manual page was written for the Debian distribution because the original program does not have a manual page. Instead, it has documentation in the GNU Info format; see below.
kytea is a morphological analysis system based on pointwise predictors. It segments sentences into words, tags them, and predicts their pronunciations. The name KyTea is pronounced like "cutie".
OPTIONS
A summary of options is included below.
Analysis Options:
- -model
- The model file to use when analyzing text
- -nows
- Don't do word segmentation (raw input cannot be accepted)
- -notags
- Do only word segmentation, no tagging
- -notag
- Skip the nth tag (n starts at 1)
- -nounk
- Don't estimate the pronunciation of unknown words
- -wsconst
- Specifies character types to not be segmented (e.g. D for digits)
- -unkbeam
- The width of the beam to use in beam search for unknown words (default 50, 0 for full search)
- -debug
- The debugging level (0=silent, 1=simple, 2=detailed)
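As an illustration of how the analysis options combine, the following Python sketch assembles a kytea argument list and pipes text through the program. It is not part of kytea: the helper names build_kytea_args and run_kytea are hypothetical, and the sketch assumes that each option value is passed as the following argument, that kytea reads text from standard input and writes its result to standard output, and that both streams are UTF-8. This page does not state any of these, so check them against your installation.

```python
import shutil
import subprocess


def build_kytea_args(model=None, no_ws=False, skip_tag=None, no_unk=False,
                     unk_beam=None, debug=None):
    """Assemble an argument list from the analysis options listed above.

    Assumption: an option value follows its option as a separate argument.
    """
    args = ["kytea"]
    if model is not None:
        args += ["-model", str(model)]
    if no_ws:
        args.append("-nows")
    if skip_tag is not None:
        if skip_tag < 1:
            raise ValueError("tag indices start at 1")
        args += ["-notag", str(skip_tag)]
    if no_unk:
        args.append("-nounk")
    if unk_beam is not None:
        args += ["-unkbeam", str(unk_beam)]
    if debug is not None:
        args += ["-debug", str(debug)]
    return args


def run_kytea(text, **options):
    """Pipe text through kytea (assumed to read stdin and write stdout)."""
    if shutil.which("kytea") is None:
        raise FileNotFoundError("kytea executable not found on PATH")
    result = subprocess.run(build_kytea_args(**options), input=text,
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout


# Build (but do not run) a command that skips the second tag and uses
# full search for unknown words.
print(build_kytea_args(model="model.bin", skip_tag=2, unk_beam=0))
```

Keeping the argument construction separate from the subprocess call makes it possible to inspect or log the command line without having kytea installed.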
Format Options:
- -in
- The formatting of the input (raw/tok/full/part/conf, default raw)
- -out
- The formatting of the output (full/part/conf/eda/tags, default full)
- -tagmax
- The maximum number of tags to print for one word (default 3, 0 implies no limit)
- -deftag
- A tag for words that cannot be given any tag (for example, unknown words that contain a character not in the subword dictionary)
- -unktag
- A tag to append to indicate words not in the dictionary
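To make the default full output format more concrete, here is a minimal Python sketch that splits one line of full annotation into words, tag levels, and candidates. It assumes the default separators documented for the advanced format options (a space between words, "/" between tags, "&" between candidates). The function name parse_full is hypothetical, the sample line is hand-written rather than captured kytea output, and any escaping of separator characters inside words is not handled.

```python
def parse_full(line, wordbound=" ", tagbound="/", elembound="&"):
    """Split one line of full annotation into (surface, tags) pairs.

    tags holds one entry per tag level; each entry is the list of
    candidates printed for that level (at most -tagmax of them).
    Escaped separator characters are NOT handled in this sketch.
    """
    words = []
    for token in line.rstrip("\n").split(wordbound):
        if not token:
            continue  # ignore empty pieces from repeated separators
        surface, *levels = token.split(tagbound)
        words.append((surface, [level.split(elembound) for level in levels]))
    return words


# Hand-written sample, not real kytea output.
sample = "これ/代名詞/これ は/助詞/は&わ"
for surface, tags in parse_full(sample):
    print(surface, tags)
```

The separators are parameters so that the same function can be reused when -wordbound, -tagbound, or -elembound are changed from their defaults.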
Format Options (for advanced users):
- -wordbound
- The separator for words in full annotation (" ")
- -tagbound
- The separator for tags in full/partial annotation ("/")
- -elembound
- The separator for candidates in full/partial annotation ("&")
- -unkbound
- Indicates unannotated boundaries in partial annotation (" ")
- -skipbound
- Indicates skipped boundaries in partial annotation ("?")
- -nobound
- Indicates non-existence of boundaries in partial annotation ("-")
- -hasbound
- Indicates existence of boundaries in partial annotation ("|")
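The four boundary markers for partial annotation can be illustrated with a short Python sketch. It rests on an assumption this page does not spell out: that an untagged partial-annotation line strictly alternates single characters with single-character boundary markers. The function name parse_partial and the label strings are hypothetical, the sample line is hand-written, and tags and escaping are not handled.

```python
# Default markers as listed above, mapped to illustrative label names.
BOUNDARY_LABELS = {
    " ": "unannotated",  # -unkbound
    "?": "skipped",      # -skipbound
    "-": "no-boundary",  # -nobound
    "|": "boundary",     # -hasbound
}


def parse_partial(line, labels=BOUNDARY_LABELS):
    """Return (characters, boundary labels) for an untagged line.

    Assumes the layout c0 b0 c1 b1 ... cN, where each c is one character
    and each b is one boundary marker. Tags are NOT handled.
    """
    line = line.rstrip("\n")
    if not line:
        return [], []
    if len(line) % 2 == 0:
        raise ValueError("expected characters and markers to alternate")
    chars = list(line[0::2])
    bounds = []
    for marker in line[1::2]:
        if marker not in labels:
            raise ValueError(f"unknown boundary marker: {marker!r}")
        bounds.append(labels[marker])
    return chars, bounds


# Hand-written sample, not taken from a real corpus.
chars, bounds = parse_partial("こ-れ|は で?す")
print(chars)
print(bounds)
```

A line with N characters yields N-1 boundary labels, one for each gap between adjacent characters.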
AUTHOR
This manual page was written by Koichi Akabe vbkaisetsu@gmail.com for the Debian system (and may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.
On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.