ESTIMATE-NGRAM(1) | User Commands | ESTIMATE-NGRAM(1) |
NAME
estimate-ngram - estimate an n-gram language model
SYNOPSIS
estimate-ngram [Options]
DESCRIPTION
Estimates an n-gram language model by accumulating n-gram count statistics, smoothing the observed counts, and building a backoff n-gram model. Optionally, the model parameters can be tuned to optimize performance on a development set.
A filename argument can be an ASCII file, a compressed file (ending in .Z or .gz), or '-' to indicate stdin/stdout.
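The first step described above, accumulating n-gram counts, can be illustrated with a minimal Python sketch. This is not MITLM's implementation: the function name count_ngrams, the whitespace tokenization, and the <s>/</s> sentence-boundary padding are assumptions made purely for illustration.

```python
from collections import Counter

def count_ngrams(sentences, order=3):
    """Count all n-grams of length 1..order in an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumed convention: whitespace tokenization plus boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            # Slide a window of width n over the padded token sequence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

counts = count_ngrams(["the cat sat", "the cat ran"], order=2)
print(counts[("the", "cat")])  # bigram that occurs in both sentences
print(counts[("cat", "sat")])  # bigram that occurs in one sentence
```

Smoothing and backoff estimation would then operate on counts like these; they are not shown here.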
OPTIONS
-h, -help
    Print this help message.
-verbose <int>
    Set the verbosity level.
    Default: 1
-o, -order <int>
    Set the n-gram order of the estimated LM.
    Default: 3
-v, -vocab <file>
    Restrict the vocabulary to the words in the specified file.
-u, -unk <boolean>
    Replace all out-of-vocabulary words with <unk>.
    Default: false
-t, -text <files>
    Add counts from text files.
-c, -counts <files>
    Add counts from counts files.
-s, -smoothing <ML, FixKN, FixModKN, FixKN#, KN, ModKN, KN#>
    Specify the smoothing algorithms.
    Default: ModKN
-wf, -weight-features <features-template>
    Specify the n-gram weighting features.
-p, -params <file>
    Set the initial model parameters.
-oa, -opt-alg <Powell, LBFGS, LBFGSB>
    Specify the optimization algorithm.
    Default: Powell
-op, -opt-perp <file>
    Tune the parameters to minimize development set perplexity.
-ow, -opt-wer <file>
    Tune the parameters to minimize lattice word error rate.
-om, -opt-margin <file>
    Tune the parameters to minimize lattice margin.
-wb, -write-binary <boolean>
    Write LM/counts files in binary format.
    Default: false
-wp, -write-params <file>
    Write the tuned model parameters to a file.
-wv, -write-vocab <file>
    Write the LM vocabulary to a file.
-wc, -write-counts <file>
    Write the n-gram counts to a file.
-wec, -write-eff-counts <file>
    Write the effective n-gram counts to a file.
-wlc, -write-left-counts <file>
    Write the left-branching n-gram counts to a file.
-wrc, -write-right-counts <file>
    Write the right-branching n-gram counts to a file.
-wl, -write-lm <file>
    Write the ARPA backoff LM to a file.
-ep, -eval-perp <files>
    Compute the test set perplexity.
-ew, -eval-wer <files>
    Compute the test set lattice word error rate.
-em, -eval-margin <files>
    Compute the test set lattice margin.
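As a hedged illustration of how the options above might be combined, the following Python sketch assembles an estimate-ngram argument list. The helper build_command and the file names train.txt, dev.txt, and model.arpa are hypothetical; only the option names are taken from the list above, and the command is printed rather than executed.

```python
def build_command(text, lm_out, order=3, smoothing="ModKN", dev=None):
    """Assemble an estimate-ngram argument list (hypothetical helper)."""
    cmd = [
        "estimate-ngram",
        "-order", str(order),      # n-gram order of the estimated LM
        "-smoothing", smoothing,   # smoothing algorithm
        "-text", text,             # training text to count n-grams from
        "-write-lm", lm_out,       # output ARPA backoff LM
    ]
    if dev is not None:
        # Tune parameters to minimize perplexity on the dev set.
        cmd += ["-opt-perp", dev]
    return cmd

# File names below are placeholders, not files shipped with MITLM.
cmd = build_command("train.txt", "model.arpa", dev="dev.txt")
print(" ".join(cmd))
```

To actually run the command, the list could be passed to subprocess.run(cmd, check=True), assuming estimate-ngram is on the PATH and the input files exist.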
SEE ALSO
January 2013 | MITLM |