NAME
apertium-tagger - part-of-speech tagger and trainer for Apertium
SYNOPSIS
apertium-tagger [options] -g serialized_tagger [input [output]]
apertium-tagger [options] -r iterations corpus serialized_tagger
apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger
DESCRIPTION
apertium-tagger either trains the Apertium part-of-speech tagger or tags
text with it, depending on the mode option given. The command reads from
standard input only in tagging mode (-g, --tagger).
MODES
-g, --tagger
- Tags input text using the Viterbi algorithm.
-r n, --retrain n
- Retrains the model with n additional Baum-Welch
iterations (unsupervised). This option is incompatible with
-u (--unigram).
-s n, --supervised n
- Initializes parameters against a hand-tagged text (supervised) through the
maximum likelihood estimate method, then performs n
iterations of the Baum-Welch training algorithm (unsupervised). The corpus
argument (CRP) can be omitted only when n = 0.
-t n, --train n
- Initializes parameters through Kupiec's method (unsupervised), then
performs n iterations of the Baum-Welch training
algorithm (unsupervised).
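The training modes above can be sketched as shell commands. This is a minimal sketch, not a reference workflow: the file names (en.dic, en.crp, en.tsx, en.prob, en.tagged, en.untagged) and the iteration counts are hypothetical, and the argument order is taken from the SYNOPSIS. Each command is printed; the unsupervised ones are executed only if apertium-tagger is installed and the input files exist.

```shell
#!/bin/sh
# Hypothetical file names, for illustration only (no spaces in the names).
DIC=en.dic            # dictionary: full expanded dictionary
CRP=en.crp            # corpus: training text corpus
TSX=en.tsx            # tagger_spec: tagger specification (XML)
PROB=en.prob          # serialized_tagger: tagger data file
TAGGED=en.tagged      # tagged_corpus: hand-tagged text
UNTAGGED=en.untagged  # untagged_corpus: its morphological analysis

# -t: Kupiec initialization, then 8 Baum-Welch iterations (count is arbitrary).
train_cmd="apertium-tagger -t 8 $DIC $CRP $TSX $PROB"
# -r: 4 additional Baum-Welch iterations on the existing model.
retrain_cmd="apertium-tagger -r 4 $CRP $PROB"
# -s 0: supervised initialization only, so the corpus argument is omitted.
supervised_cmd="apertium-tagger -s 0 $DIC $TSX $PROB $TAGGED $UNTAGGED"

echo "$supervised_cmd"

for cmd in "$train_cmd" "$retrain_cmd"; do
    echo "$cmd"
    # Execute only when the tool and the training inputs are present.
    if command -v apertium-tagger >/dev/null 2>&1 \
        && [ -f "$DIC" ] && [ -f "$CRP" ] && [ -f "$TSX" ]; then
        $cmd   # unquoted on purpose: split the string into arguments
    fi
done
```

Building each command as a string keeps the argument order visible next to the SYNOPSIS; for file names containing spaces, passing the arguments directly would be safer.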
OPTIONS
-d, --debug
- Prints error messages (if any) and debug messages while operating.
-e, --skip-on-error
- Used with -xs to ignore certain types of errors
with the training corpus.
-f, --first
- Used with -g (--tagger). Makes the tagger output all lexical forms
of each word, with the chosen one in first place (after the
lemma).
-m, --mark
- Mark disambiguated words.
-p, --show-superficial
- Prints the superficial form of the word alongside the lexical form in the
output stream.
-z, --null-flush
- Used with -g (--tagger) to flush the output after each
null character.
--help
- Display a help message.
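As an illustration of combining an option with tagging mode, the following sketch pipes one analysed word into the tagger. The model file name en.prob is hypothetical, and the input line assumes the Apertium stream format as produced by a morphological analyser earlier in the pipeline; that format is not described in this page, so treat it as an assumption. The command is printed, and executed only if apertium-tagger and the model file are available.

```shell
#!/bin/sh
PROB=en.prob   # hypothetical serialized_tagger file

# One ambiguous word with two candidate analyses (assumed input format).
INPUT='^books/book<n><pl>/book<vblex><pri><p3><sg>$'

# -g tags the input read from stdin; -p also prints the superficial form.
tag_cmd="apertium-tagger -p -g $PROB"
echo "$tag_cmd"

# Execute only when the tool and a trained model are present.
if command -v apertium-tagger >/dev/null 2>&1 && [ -f "$PROB" ]; then
    printf '%s\n' "$INPUT" | $tag_cmd
fi
```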
FILES
These are the kinds of files used with each option:
- dictionary
- Full expanded dictionary file
- corpus
- Training text corpus file
- tagger_spec
- Tagger specification file, in XML format
- serialized_tagger
- Tagger data file, built in the training and used while tagging
- tagged_corpus
- Hand-tagged text corpus
- untagged_corpus
- Untagged text corpus: the morphological analysis of the hand-tagged
corpus. The -s option uses both files jointly.
- input
- Input file, stdin by default
- output
- Output file, stdout by default
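Because each mode expects a different set of these files, it can help to check that they exist before starting a training run. The helper below is a small sketch of such a check; the function name missing_files and the file names are hypothetical and not part of apertium-tagger.

```shell
#!/bin/sh
# Print the name of every argument that is not an existing regular file.
missing_files() {
    for f in "$@"; do
        [ -f "$f" ] || echo "$f"
    done
}

# Inputs that -t expects before it can write the serialized_tagger file
# (hypothetical names).
missing=$(missing_files en.dic en.crp en.tsx)
if [ -n "$missing" ]; then
    echo "missing input files:"
    echo "$missing"
fi
```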
COPYRIGHT
Copyright © 2005, 2006 Universitat d'Alacant / Universidad
de Alicante. This is free software. You may redistribute copies of it under
the terms of the
GNU General Public License.
BUGS
Many... lurking in the dark and waiting for you!