
PHONETISAURUS(1) User Commands PHONETISAURUS(1)

NAME

phonetisaurus-arpa2fst - ARPA LM to FST conversion tool

SYNOPSIS

phonetisaurus-arpa2fst --input=arpa.lm --prefix=output_prefix [OPTIONS]

DESCRIPTION

phonetisaurus-arpa2fst converts an ARPA-format language model into a weighted finite-state transducer (WFST) that can be used with phonetisaurus-g2p.
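A minimal invocation sketch based on the SYNOPSIS above. The file names arpa.lm and output_prefix are placeholders taken from the synopsis, not real files; the guard that checks whether the tool and the input exist is an illustrative addition so the snippet is safe to paste.

```shell
# Placeholder names from the SYNOPSIS; substitute your own files.
arpa_lm="arpa.lm"
prefix="output_prefix"

# Build the command line from the two options shown in the SYNOPSIS.
cmd="phonetisaurus-arpa2fst --input=$arpa_lm --prefix=$prefix"

# Run only if the tool is installed and the input exists; otherwise just print.
if command -v phonetisaurus-arpa2fst >/dev/null 2>&1 && [ -f "$arpa_lm" ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```

Any of the options listed under OPTIONS can be appended in the same --name=value form, for example --write_syms=true to also write the symbol tables to disk.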

OPTIONS

--help=<bool> (default: false)

show usage information

--helpshort=<bool> (default: false)

show brief usage information

--tmpdir=<string> (default: "/tmp/")

temporary directory

--v=<int32> (default: 0)

verbose level

--fst_align=<bool> (default: false)

Write FST data aligned where appropriate

--fst_default_cache_gc=<bool> (default: true)

Enable garbage collection of cache

--fst_default_cache_gc_limit=<int64> (default: 1048576)

Cache byte size that triggers garbage collection

--fst_verify_properties=<bool> (default: false)

Verify FST properties queried by TestProperties

--fst_weight_parentheses=<string> (default: "")

Characters enclosing the first weight of a printed composite weight (e.g. pair weight, tuple weight and derived classes) to ensure proper I/O of nested composite weights; must have size 0 (none) or 2 (open and close parenthesis)

--fst_weight_separator=<string> (default: ",")

Character separator between printed composite weights; must be a single character

--save_relabel_ipairs=<string> (default: "")

Save input relabel pairs to file

--save_relabel_opairs=<string> (default: "")

Save output relabel pairs to file

--delim=<string> (default: "}")

Delimiter used to separate input and output tokens.

--eps=<string> (default: "<eps>")

Epsilon symbol.

--input=<string> (default: "")

Input ARPA-format LM.

--null_sep=<string> (default: "_")

Graphemic null symbol.

--phi=<string> (default: "<phi>")

Optional Phi (failure) symbol (not currently in use).

--prefix=<string> (default: "test")

Output filename prefix.

--sb=<string> (default: "<s>")

Sentence begin symbol.

--se=<string> (default: "</s>")

Sentence end symbol.

--split=<string> (default: "|")

Delimiter used to split multi-token symbols.

--start=<string> (default: "<start>")

Start symbol.

--write_syms=<bool> (default: false)

Write the symbol tables to disk.

--fst_compat_symbols=<bool> (default: true)

Require symbol tables to match when appropriate

--fst_field_separator=<string> (default: " ")

Set of characters used as a separator between printed fields

--fst_error_fatal=<bool> (default: true)

FST errors are fatal; otherwise, return objects flagged as bad: e.g., FSTs with the kError property set to true, FST weights that are not a Member()
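To illustrate how the --delim and --split defaults relate to each other, the sketch below takes one joint token apart using plain shell string operations. The token c|k}k is a hypothetical example, and the assumption that LM tokens take the form input}output with multi-token sides joined by | is inferred from the option descriptions; this is not the tool's own parsing code.

```shell
# Hypothetical joint token: input side "c|k", output side "k".
token='c|k}k'
delim='}'   # default value of --delim
split='|'   # default value of --split

# Everything before the delimiter is the input side, the rest is the output side.
input_side=${token%%"$delim"*}
output_side=${token#*"$delim"}

# Break a multi-token input side into its space-separated parts.
input_tokens=$(printf '%s' "$input_side" | tr "$split" ' ')

echo "input:  $input_tokens"
echo "output: $output_side"
```

If the ARPA model was built with different separator characters, the matching --delim and --split values would need to be passed to phonetisaurus-arpa2fst as well.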
February 2013 phonetisaurus 0.7.8