NAME

sequitur-g2p - grapheme-to-phoneme conversion tool

SYNOPSIS

sequitur-g2p [OPTION]... FILE...

DESCRIPTION

Grapheme-to-Phoneme Conversion

Samples can be either in plain format (one word per line followed by phonetic transcription) or Bliss XML Lexicon format.
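As an illustration of the plain format described above, the following is a minimal Python sketch of a reader for such a file. It is not part of sequitur-g2p. It assumes that whitespace separates the word from its transcription and the phoneme symbols from each other, and that a word may appear on several lines with different pronunciations. The function name parse_plain_lexicon and the phoneme symbols in the sample are hypothetical.

```python
from typing import Dict, List


def parse_plain_lexicon(text: str) -> Dict[str, List[List[str]]]:
    """Map each word to the list of transcriptions found for it.

    Assumption (not taken from the man page): fields are separated by
    whitespace, the first field is the word, the rest are phonemes.
    """
    lexicon: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without a transcription
        word, phonemes = fields[0], fields[1:]
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon


# Hypothetical sample data; the phoneme inventory is illustrative only.
sample = "cat k ae t\ndog d ao g\nread r iy d\nread r eh d\n"
lexicon = parse_plain_lexicon(sample)
print(lexicon["read"])
```

A file in this shape is what the --train, --devel, --test and --fake options are documented to read as a sample.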

OPTIONS

--version: show program's version number and exit
-h, --help: show this help message and exit
-p FILE, --profile=FILE: profile execution time and store the result in FILE
-R, --resource-usage: report resource usage and execution time
-Y, --psyco: use Psyco to speed up execution
--tempdir=PATH: store temporary files in PATH
-t FILE, --train=FILE: read training sample from FILE
-d FILE / N%, --devel=FILE / N%: read held-out training sample from FILE, or hold out N% of the training data
-x FILE, --test=FILE: read test sample from FILE
--checkpoint: save the state of training at regular time intervals. The name of the checkpoint file is derived from --write-model.
--resume-from-checkpoint=FILE: load checkpoint FILE and continue training
-T, --transpose: transpose model, i.e. perform phoneme-to-grapheme conversion
-m FILE, --model=FILE: read model from FILE
-n FILE, --write-model=FILE: write model to FILE
--continuous-test: report error rates on development and test set in each iteration
-S, --self-test: apply model to development set and report error rates
-s l1,l2,r1,r2, --size-constraints=l1,l2,r1,r2: multigrams must have l1 ... l2 left-symbols and r1 ... r2 right-symbols
-E, --no-emergence: do not allow new joint-multigrams to be added to the model
--viterbi: estimate model using maximum approximation rather than true EM
-r, --ramp-up: ramp up the model, i.e. increase its order (context length) by one
-W, --wipe-out: wipe out probabilities, retain only model structure
-C, --initialize-with-counts: initialize probability estimates by counting how often each graphone occurs in the training set, disregarding possible overlaps
-i MINITERATIONS, --min-iterations=MINITERATIONS: minimum number of EM iterations during training
-I MAXITERATIONS, --max-iterations=MAXITERATIONS: maximum number of EM iterations during training
--eager-discount-adjustment: re-adjust discounts in each iteration
--fixed-discount=D: set discount to D and keep it fixed
-e ENC, --encoding=ENC: use character set encoding ENC
-P, --phoneme-to-phoneme: train/apply a phoneme-to-phoneme converter
--test-segmental: evaluate only at segmental level, i.e. do not count syllable boundaries and stress marks
-B FILE, --result=FILE: store test result in table FILE (for use with bootlog or R)
-a FILE, --apply=FILE: apply grapheme-to-phoneme conversion to words read from FILE
-V Q, --variants-mass=Q: generate pronunciation variants until \sum_i p(var_i) >= Q (only effective with --apply)
--variants-number=N: generate up to N pronunciation variants (only effective with --apply)
-f FILE, --fake=FILE: use a translation memory (read from sample FILE) instead of a genuine model (use in combination with -x to evaluate two files against each other)
--stack-limit=N: limit size of search stack to N elements
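
The stopping rule described for --variants-mass and --variants-number can be sketched in Python as follows. This is an illustration of the rule as worded above, not the tool's actual code. It assumes that candidate pronunciations arrive sorted by descending probability, and that when both limits are given, generation stops as soon as either one is reached. The function name select_variants and the sample candidates are hypothetical.

```python
from typing import List, Optional, Tuple

Variant = Tuple[str, float]  # (pronunciation, probability)


def select_variants(
    ranked: List[Variant],
    mass: Optional[float] = None,
    number: Optional[int] = None,
) -> List[Variant]:
    """Keep variants until their summed probability reaches `mass`
    or `number` variants have been kept, whichever happens first.

    Assumption: `ranked` is already sorted, most probable first.
    With neither limit set, this sketch returns every candidate.
    """
    selected: List[Variant] = []
    total = 0.0
    for pronunciation, probability in ranked:
        if number is not None and len(selected) >= number:
            break  # the --variants-number style cap has been reached
        selected.append((pronunciation, probability))
        total += probability
        if mass is not None and total >= mass:
            break  # the --variants-mass style threshold has been reached
    return selected


# Hypothetical candidates for one word.
candidates = [("r iy d", 0.6), ("r eh d", 0.3), ("r ey d", 0.1)]
print(select_variants(candidates, mass=0.8))
print(select_variants(candidates, number=1))
```

With a mass threshold, likely words with one dominant pronunciation yield a single variant, while ambiguous words yield several. A fixed number gives every word the same upper bound regardless of how the probability is distributed.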

May 2016

sequitur-g2p 0+r1668

Source file:	sequitur-g2p.1.en.gz (from sequitur-g2p 0+r1668.r3-1)
Source last updated:	2016-05-06T17:47:33Z