NAME
apertium-tagger - part-of-speech tagger and trainer for Apertium
SYNOPSIS
    apertium-tagger [options] -g serialized_tagger [input [output]]

    apertium-tagger [options] -r iterations corpus serialized_tagger

    apertium-tagger [options] -s iterations dictionary corpus tagger_spec
      serialized_tagger tagged_corpus untagged_corpus

    apertium-tagger [options] -s 0 dictionary tagger_spec
      serialized_tagger tagged_corpus untagged_corpus

    apertium-tagger [options] -s 0 -u model
      serialized_tagger tagged_corpus

    apertium-tagger [options] -t iterations dictionary corpus tagger_spec
      serialized_tagger
DESCRIPTION
apertium-tagger trains the Apertium part-of-speech tagger or tags text with
    it, depending on the mode selected on the command line. It reads from
    standard input only in tagging mode (-g, --tagger), and only when no
    input file is given.
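As an illustration of tagging from standard input, the sketch below pipes text through a morphological analyser into the tagger. The file names en-es.automorf.bin and en-es.prob are placeholders, and using lt-proc as the upstream analyser is an assumption about the surrounding pipeline. The script prints the pipeline and runs it only if both tools and both data files are present.

```shell
# Placeholder file names (assumptions, not shipped defaults).
ANALYSER_DATA="en-es.automorf.bin"   # compiled analyser dictionary
TAGGER_DATA="en-es.prob"             # serialized_tagger built by training

# With -g and no input/output arguments, the tagger reads standard input
# and writes standard output.
TAG_CMD="apertium-tagger -g $TAGGER_DATA"
echo "lt-proc $ANALYSER_DATA | $TAG_CMD"

# Run the pipeline only when both tools and both data files are present.
# $TAG_CMD is left unquoted on purpose so that it splits into words; this
# is safe here only because the placeholder names contain no spaces.
if command -v lt-proc >/dev/null 2>&1 &&
   command -v apertium-tagger >/dev/null 2>&1 &&
   [ -f "$ANALYSER_DATA" ] && [ -f "$TAGGER_DATA" ]; then
    echo "The cat sat on the mat." | lt-proc "$ANALYSER_DATA" | $TAG_CMD
fi
```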
MODES
  -g, --tagger
      Tags input text by means of the Viterbi algorithm.
  -r n, --retrain n
      Retrains the model with n additional Baum-Welch iterations
      (unsupervised). This option is incompatible with -u (--unigram).
  -s n, --supervised n
      Initializes parameters from a hand-tagged text (supervised) by
      maximum likelihood estimation, then performs n iterations of the
      Baum-Welch training algorithm (unsupervised). The CRP (corpus)
      argument can be omitted only when n = 0.
  -t n, --train n
      Initializes parameters through Kupiec's method (unsupervised), then
      performs n iterations of the Baum-Welch training algorithm
      (unsupervised).
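The unsupervised modes can be combined as in the following sketch, which trains a model with -t and then refines it with -r. The file names and the iteration counts (8 and 2) are arbitrary placeholders chosen for illustration. The argument order follows the synopsis. The commands are printed, and executed only if the tool and the input files exist.

```shell
# Placeholder names and iteration counts (assumptions).
DIC="en.dic"      # dictionary
CRP="en.crp"      # corpus
TSX="en.tsx"      # tagger_spec
PROB="en.prob"    # serialized_tagger

# Kupiec initialization followed by 8 Baum-Welch iterations.
TRAIN_CMD="apertium-tagger -t 8 $DIC $CRP $TSX $PROB"
# Two further Baum-Welch iterations on the model written above.
RETRAIN_CMD="apertium-tagger -r 2 $CRP $PROB"

echo "$TRAIN_CMD"
echo "$RETRAIN_CMD"

# Execute only when the tool and all input files exist.
if command -v apertium-tagger >/dev/null 2>&1 &&
   [ -f "$DIC" ] && [ -f "$CRP" ] && [ -f "$TSX" ]; then
    $TRAIN_CMD && $RETRAIN_CMD
fi
```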
    
  
OPTIONS
  -d, --debug
      Print error messages (if any) and debug messages while operating.
  -e, --skip-on-error
      Used with -xs to ignore certain types of errors in the training
      corpus.
  -f, --first
      Used with -g (--tagger); makes the tagger output all lexical forms of
      each word, with the chosen one in first place (after the lemma).
  -m, --mark
      Mark disambiguated words.
  -p, --show-superficial
      Print the superficial form of each word alongside its lexical form in
      the output stream.
  -z, --null-flush
      Used with -g (--tagger) to flush the output after each null character
      is read.
  --help
      Display a help message.
    
  
FILES
These are the kinds of files used with each option:
  dictionary
      Fully expanded dictionary file
  corpus
      Training text corpus file
  tagger_spec
      Tagger specification file, in XML format
  serialized_tagger
      Tagger data file, built during training and used while tagging
  tagged_corpus
      Hand-tagged text corpus
  untagged_corpus
      Untagged text corpus: the morphological analysis of the hand-tagged
      corpus, used together with it by the -s option
  input
      Input file, stdin by default
  output
      Output file, stdout by default
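To show how these file kinds map onto positional arguments in supervised mode, here is a minimal sketch. The helper supervised_cmd and all file names are hypothetical. It only prints the command line that the synopsis prescribes, leaving out the corpus argument when the iteration count is 0.

```shell
# Placeholder file names (assumptions).
DIC="en.dic"; CRP="en.crp"; TSX="en.tsx"; PROB="en.prob"
TAGGED="en.tagged"; UNTAGGED="en.untagged"

# Hypothetical helper: print the supervised-training command line for
# n Baum-Welch iterations. When n is 0 the corpus argument is left out,
# as in the synopsis.
supervised_cmd() {
    n=$1
    if [ "$n" -eq 0 ]; then
        echo "apertium-tagger -s 0 $DIC $TSX $PROB $TAGGED $UNTAGGED"
    else
        echo "apertium-tagger -s $n $DIC $CRP $TSX $PROB $TAGGED $UNTAGGED"
    fi
}

supervised_cmd 0
supervised_cmd 4
```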
COPYRIGHT
Copyright © 2005, 2006 Universitat d'Alacant / Universidad
    de Alicante. This is free software. You may redistribute copies of it under
    the terms of the GNU General Public License.
BUGS
Many... lurking in the dark and waiting for you!