NAME¶
sylseg-sk - segments Slovak words into syllables
SYNOPSIS¶
sylseg-sk [--best] [--color] [--dl <debug_level>] [--help] [--ofile
<file_name>] [<input_file>]
DESCRIPTION¶
Syllabic segmentation is essential for some linguistic and speech recognition
applications. Depending on the language, either a rule-based or a statistical
approach is used. For Slovak, the statistical approach seems to be more
suitable.
sylseg-sk implements one of the statistical approaches to syllabic
segmentation. Each input word is segmented into syllables. Several possible
segmentations are generated and sorted by likelihood. If no input file is
specified, input is read from the standard input. If an input file is given,
the output is also written to a file whose name is the input filename with the
extension ".syllables".
The input and output code page is ISO 8859-2. To use a different code page,
combine sylseg-sk with a code page converter and pipes. For example, to have
input and output in UTF-8, use (for interactive use):
filterm UTF8-iso2 iso2-UTF8 sylseg-sk
or (for batch processing):
iconv -f UTF-8 -t ISO_8859-2 | sylseg-sk | iconv -f ISO_8859-2 -t UTF-8
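The batch pipeline above can be wrapped in a small shell function. This is only a sketch: the function name utf8_wrap is hypothetical and not part of sylseg-sk, and it assumes an iconv that accepts the encoding name ISO-8859-2. The command to run is passed as arguments, so any filter that reads and writes ISO 8859-2 can be substituted.

```shell
# Hypothetical helper (not shipped with sylseg-sk): run a filter that
# expects ISO 8859-2 on UTF-8 data read from the standard input.
# Usage: utf8_wrap <command> [<args>...]
utf8_wrap() {
    # Convert UTF-8 input to ISO 8859-2, run the given command on it,
    # then convert the command's output back to UTF-8.
    iconv -f UTF-8 -t ISO-8859-2 | "$@" | iconv -f ISO-8859-2 -t UTF-8
}
```

With this in place, a UTF-8 word list could be processed with something like utf8_wrap sylseg-sk --best < words.txt, where words.txt is a placeholder filename.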
The performance of the syllabic segmentation depends on the statistics used.
To improve the quality of the segmentation, it is possible to train a better
system with the sylseg-sk-training tool and replace the original file located in
/usr/share/sylseg_sk/sylseg-sk.stats
The design of sylseg-sk is language independent. With retrained statistics,
it should theoretically work for any language.
OPTIONS¶
- --best
- Print the best result only.
- --color
- Enable color output.
- --dl 1..5
- Set the debug level, which controls the amount of displayed
information. Debug level 0 displays nothing. The maximum level 5
displays a full debugging report. The default debug level is 1.
- --help
- Display a short help text.
- --ofile <file_name>
- Also write the output to the given file.
EXAMPLES¶
- Use standard input and debug level 3:
- sylseg-sk --dl 3
- Process all the words from the file aaa.txt and print just the best
segmentation:
- sylseg-sk --best aaa.txt
EXIT STATUS¶
sylseg-sk returns zero if it successfully processes all the input words.
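In a script, the exit status can be checked as in the following sketch. The function name check_status is hypothetical, and the sketch assumes that a failure is signalled by a non-zero status, which this page implies but does not state explicitly.

```shell
# Hypothetical wrapper: run a command and report whether it succeeded.
# Usage: check_status <command> [<args>...]
check_status() {
    if "$@"; then
        echo "ok"
    else
        # In the else branch, $? still holds the status of the command.
        echo "failed with status $?" >&2
        return 1
    fi
}
```

For example, check_status sylseg-sk --best aaa.txt would print "ok" after the segmentation output only if all the input words were processed.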
AUTHOR¶
Jozef Ivanecky (dodo (at) kanoistika.sk)
SEE ALSO¶
sylseg-sk-training(1),
filterm(1),
iconv(1),
konwert(1)