NAME
genpyt - generate the PINYIN lexicon
SYNOPSIS
genpyt lexicon-file result-file log-file slm-file
DESCRIPTION
genpyt generates the PINYIN lexicon. It works only under the
zh_CN.UTF-8 locale.
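As an illustration of the locale requirement and the argument order given in the SYNOPSIS, the following Python sketch builds a command line and an environment for genpyt. It assumes that genpyt picks up its locale from the environment and that zh_CN.UTF-8 is installed on the system; the helper name build_genpyt_call and all file names in the usage note are hypothetical.

```python
import os
from typing import Dict, List, Tuple


def build_genpyt_call(
    lexicon_file: str,
    result_file: str,
    log_file: str,
    slm_file: str,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a genpyt run; nothing is executed here.

    The four positional arguments follow the order in the SYNOPSIS.
    Forcing LC_ALL is an assumption about how the locale requirement
    can be met, not documented genpyt behaviour.
    """
    argv = ["genpyt", lexicon_file, result_file, log_file, slm_file]
    env = dict(os.environ)          # copy, so the caller's environment is untouched
    env["LC_ALL"] = "zh_CN.UTF-8"   # assumption: the locale is read from the environment
    return argv, env
```

The returned pair could then be handed to subprocess.run(argv, env=env, check=True), for example with placeholder names such as build_genpyt_call("dict.utf8", "out.bin", "out.log", "lm.slm").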
ARGUMENTS
- lexicon-file
- Specify a dictionary file. It should be a line-based text
file in UTF-8 encoding. Each line looks like:
CCC id [pinyin'pinyin'pinyin]*
A default dictionary file can be found at
/usr/share/sunpinyin/dict.utf8.
- result-file
- The output binary PINYIN lexicon file. This lexicon
contains a trie representing the PINYIN key tree, and all of the candidate
words are sorted by their unigram values in slm-file. This file can be
used with sunpinyin input method engines.
- log-file
- Specify the file to which the log is written. The log-file
can be seen as a human-readable representation of the binary output
file.
- slm-file
- The language model from which the unigram information is
retrieved. Typically, the slm-file is generated by
slmthread.
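The lexicon-file line format described above (CCC id [pinyin'pinyin'pinyin]*) can be sketched as a small Python parser. This is only an illustration, not the parser genpyt itself uses: it assumes that fields are separated by whitespace, that the id is a decimal integer, and that each pronunciation is a run of syllables joined by apostrophes. The names LexiconEntry and parse_lexicon_line are hypothetical.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    word: str                         # the CCC field (the word itself)
    word_id: int                      # the id field
    pronunciations: List[List[str]]   # one syllable list per pronunciation


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse one dictionary line of the form: CCC id [pinyin'pinyin'pinyin]*

    Assumptions (not confirmed by the man page): whitespace separates the
    fields and the id is a decimal integer.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least a word and an id: {line!r}")
    word, raw_id = fields[0], fields[1]
    # Every remaining field is one pronunciation; split it into syllables.
    pronunciations = [field.split("'") for field in fields[2:]]
    return LexiconEntry(word, int(raw_id), pronunciations)
```

For a made-up line such as "中国 1234 zhong'guo", the function yields the word, the integer 1234, and a single pronunciation split into the syllables zhong and guo. A line with no pronunciation fields yields an empty pronunciation list, matching the optional [...]* part of the format.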
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently
maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO
slmthread(1).