OPTIONS
-c <file> or --config=<file>
set the configuration using 'file'.
You can use -c lang/config-file to select the configuration file 'config-file'
for an installed language 'lang'.
--debug=<module><level>,...
set the debug level per module, where each module is indicated by a single letter:
Tagger (T), Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a), Chunker
(c), Multi-Word Units (m), Named Entity Recognition (n), or Parser (p).
Different modules must be separated by commas.
(e.g. --debug=l5,n3 sets the level for the Lemmatizer to 5 and for
the NER to 3.)
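To make the format of the --debug value concrete, here is a minimal Python sketch that splits a value such as l5,n3 into per-module levels using the letter codes listed above. The helper parse_debug_spec is hypothetical and is not Frog's own option parser; it only illustrates the documented format. Note that the letters are case-sensitive: T is the Tagger, t the Tokenizer.

```python
# Hypothetical illustration of the --debug=<module><level>,... format;
# not taken from Frog's source code.
MODULES = {
    "T": "Tagger",
    "t": "Tokenizer",
    "l": "Lemmatizer",
    "a": "Morphological Analyzer",
    "c": "Chunker",
    "m": "Multi-Word Units",
    "n": "Named Entity Recognition",
    "p": "Parser",
}


def parse_debug_spec(spec: str) -> dict[str, int]:
    """Map each comma-separated <letter><level> item to {module: level}."""
    levels: dict[str, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        letter, digits = item[0], item[1:]
        if letter not in MODULES:  # letters are case-sensitive
            raise ValueError(f"unknown module letter: {letter!r}")
        if not digits.isdigit():
            raise ValueError(f"missing or invalid level in {item!r}")
        levels[MODULES[letter]] = int(digits)
    return levels


print(parse_debug_spec("l5,n3"))
# {'Lemmatizer': 5, 'Named Entity Recognition': 3}
```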
-d <level>
set the global debug level (for all modules).
--deep-morph
generate a deep morphological analysis and add it to the
XML. This also includes compound information. The default 'Tabbed' and JSON
output is also more detailed in the Morpheme field.
-e <encoding>
set the input encoding (default UTF8).
-h or --help
show a short help message
--language=<comma-separated list of languages>
Set the languages to work on. This parameter is passed to
the tokenizer. The strings are assumed to be ISO 639-2 codes.
The first language in the list will be the default; unspecified
languages are assumed to be that default.
e.g. --language=nld,eng,por means: detect Dutch, English and
Portuguese, with Dutch being the default.
IMPORTANT: Frog can currently handle only one language
at a time, as determined by the config file. Other languages mentioned
here will be tokenized correctly, but all further processing will treat them as
that language.
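The rule "first code is the default, anything unspecified falls back to it" can be sketched in a few lines of Python. This is a minimal sketch under assumptions: parse_language_option and tokenizer_language are hypothetical helpers, the detected language is assumed to arrive as an ISO 639-2 code, and the sketch only models the tokenizer step; as noted above, later modules still process everything as the single language set by the config file.

```python
# Hypothetical sketch of how a --language list could be interpreted;
# not Frog's or ucto's actual code.
def parse_language_option(value: str) -> tuple[str, list[str]]:
    """Return (default language, all listed ISO 639-2 codes)."""
    codes = [code.strip() for code in value.split(",") if code.strip()]
    if not codes:
        raise ValueError("the language list is empty")
    return codes[0], codes  # the first code is the default


def tokenizer_language(detected: str, value: str) -> str:
    """Keep a detected language if it was listed, else fall back to the default."""
    default, listed = parse_language_option(value)
    return detected if detected in listed else default


print(parse_language_option("nld,eng,por"))       # ('nld', ['nld', 'eng', 'por'])
print(tokenizer_language("eng", "nld,eng,por"))   # eng
print(tokenizer_language("deu", "nld,eng,por"))   # nld
```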
-n
assume the input file has one sentence per line (newline
separators).
Very useful when running interactively; otherwise an empty line is
needed to signal the end of input.
--nostdout
suppress the 'Tabbed' or JSON output to stdout (when no
output file was specified with -o or --outputdir).
Especially useful when XML output is requested with -X or
--xmldir.
-o <file>
send 'Tabbed' output to 'file' instead of stdout.
Defaults to the name of the inputfile with '.out' appended.
--outputdir <dir>
send all 'Tabbed' or JSON output to 'dir' instead of
stdout. Creates file names from the input file name(s) with '.out'
appended.
--retry
assume a re-run on the same input file(s). Frog will only
process those files that haven't been processed yet. This is accomplished by
looking at the output file names (so this has no effect unless -o,
--outputdir, -X or --xmldir is used).
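The --retry check can be sketched as "skip every input whose output file already exists". The Python below is a minimal sketch, not Frog's implementation: output_name and files_to_process are hypothetical helpers, and it assumes the --outputdir naming described above, i.e. the input's base name with '.out' appended, placed in the output directory.

```python
import os
import tempfile


def output_name(input_path: str, outdir: str) -> str:
    # Assumption: the output name is the input's base name plus ".out".
    return os.path.join(outdir, os.path.basename(input_path) + ".out")


def files_to_process(inputs: list[str], outdir: str) -> list[str]:
    """With --retry: skip every input whose output file already exists."""
    return [path for path in inputs if not os.path.exists(output_name(path, outdir))]


with tempfile.TemporaryDirectory() as outdir:
    # Pretend a previous run already produced output for a.txt.
    open(output_name("corpus/a.txt", outdir), "w").close()
    print(files_to_process(["corpus/a.txt", "corpus/b.txt"], outdir))
    # ['corpus/b.txt']
```

Because the decision is made purely from file names, an output file left behind by an interrupted run would also cause its input to be skipped in this sketch.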
--skip=[tlacnmp]
skip parts of the process: Tokenizer (t), Lemmatizer (l),
Morphological Analyzer (a), Chunker (c), Named Entity Recognition (n),
Multi-Word Units (m) or Parser (p).
Skipping the Multi-Word Units (m) implies disabling the Parser (p) too.
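The one non-obvious rule here, that skipping m also disables p, can be expressed as a small Python sketch. effective_skip is a hypothetical helper written for illustration only; it is not Frog's actual option handling.

```python
# Hypothetical sketch of the --skip rules described above;
# not Frog's actual implementation.
SKIPPABLE = {
    "t": "Tokenizer",
    "l": "Lemmatizer",
    "a": "Morphological Analyzer",
    "c": "Chunker",
    "n": "Named Entity Recognition",
    "m": "Multi-Word Units",
    "p": "Parser",
}


def effective_skip(value: str) -> set[str]:
    """Return the module letters that end up disabled for --skip=<value>."""
    skipped = set(value)
    unknown = skipped - set(SKIPPABLE)
    if unknown:
        raise ValueError(f"unknown module letter(s): {sorted(unknown)}")
    if "m" in skipped:
        skipped.add("p")  # skipping m implies disabling the Parser (p) as well
    return skipped


print(sorted(effective_skip("m")))   # ['m', 'p']
print(sorted(effective_skip("tc")))  # ['c', 't']
```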
--alpino
Use a locally installed Alpino parser.
--alpino=server
use a remotely installed Alpino server, as specified in the
Frog configuration file.
-S <port>
Run Frog as a server on 'port'
-t <file>
process 'file'.
-t can be omitted: Frog will run on any <file> found on the
command line. Wildcards are allowed too. When NO files are specified, Frog
will start in interactive mode.
-x <xmlfile>
process 'xmlfile', which is expected to be in FoLiA
format. If 'xmlfile' is empty and --testdir=<dir> is provided,
all '.xml' files in 'dir' will be processed as FoLiA XML.
-X <xmlfile>
When 'xmlfile' is specified, create a FoLiA XML output
file with that name.
When 'xmlfile' is empty, generate XML output for every
input file.
--textclass=<cls>
When -x is given, use 'cls' to find AND store text
in the FoLiA document(s). Using --inputclass and --outputclass is in general a
better choice.
--inputclass=<cls>
use 'cls' to find text in the FoLiA input
document(s).
--outputclass=<cls>
use 'cls' to output text in the FoLiA input document(s).
Preferably this is a different class than the input class.
--testdir=<dir>
process all files in 'dir'. When the input mode is XML,
only '.xml' files are taken from 'dir'. See also --outputdir.
--tmpdir=<dir>
location to store intermediate files (default /tmp). Currently NOT
USED!
--uttmarker=<mark>
assume all utterances are separated by 'mark' (the
default is none).
--threads=<n>
use a maximum of 'n' threads. The default is to take
whatever is needed. In server mode, Frog always runs 1 thread per session.
-V or --version
show version info
--xmldir=<dir>
generate FoLiA XML output and send it to 'dir'. Creates
file names from the input file name with '.xml' appended (except when it already
ends with '.xml').
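The naming rule for FoLiA output files (append '.xml' unless the name already ends with it) is shown below as a minimal Python sketch. xml_output_name is a hypothetical helper, and dropping the input's directory part before joining with 'dir' is an assumption about how the input path is combined with --xmldir.

```python
import os


def xml_output_name(input_path: str, xmldir: str) -> str:
    """Derive the FoLiA output path for --xmldir=<dir> (illustrative only)."""
    base = os.path.basename(input_path)  # assumption: directory part is dropped
    if not base.endswith(".xml"):
        base += ".xml"  # append '.xml' unless the name already ends with it
    return os.path.join(xmldir, base)


print(xml_output_name("corpus/doc.txt", "xml"))  # xml/doc.txt.xml on POSIX
print(xml_output_name("corpus/doc.xml", "xml"))  # xml/doc.xml on POSIX
```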
-X <file>
generate FoLiA XML output and send it to 'file'. Defaults
to the name of the input file(s) with '.xml' appended (except when it already
ends with '.xml').
--id=<id>
When FoLiA output is requested with -X, use 'id' as the
document ID. The default is an xml:id based on the file name.
--allow-word-corrections
Allow the ucto tokenizer to apply simple
corrections to words while processing FoLiA output, for instance splitting
off punctuation.
--max-parser-tokens=<num>
Limit the size of sentences to be handled by the Parser
(default 500 words).
The Parser is very memory-consuming: a sentence of 500 words will already need
16 GB of RAM.
--JSONin
The input is in JSON format. Mainly intended for server mode, but
it works on files too.
This implies --JSONout too!
--JSONout
Output will be in JSON instead of 'Tabbed'.
--JSONout=<indent>
Output will be in JSON instead of 'Tabbed'. The JSON will
be indented by the value
'indent' (default indent=0, meaning all the JSON will be on 1 line).
-T or --textredundancy=[full|minimal|none]
Set the text redundancy level in the tokenizer for text
nodes in FoLiA output:
full: add text to all levels: <p>, <s>, <w>, etc.
minimal: don't introduce text on higher levels, but retain what is already there.
none: only introduce text on <w>, AND remove all text from higher levels.
--override=<section>.<parameter>=<value>
Override a parameter from the configuration file with a
different value.
This option may be repeated several times.
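The <section>.<parameter>=<value> syntax and the effect of repeating the option can be modelled with a plain dictionary standing in for the parsed configuration file. This is a minimal sketch under stated assumptions, not Frog's code: parse_override and apply_overrides are hypothetical helpers, "mysection", "myparam" and "other.flag" are made-up names, the split happens at the first '.' and the first '=', and a later override of the same parameter is assumed to win. Whether Frog accepts sections or parameters that are absent from the config file is not stated here; the sketch simply adds them.

```python
def parse_override(value: str) -> tuple[str, str, str]:
    """Split '<section>.<parameter>=<value>' into its three parts."""
    key, eq, val = value.partition("=")           # the value may itself contain '='
    section, dot, parameter = key.partition(".")  # assumption: split at the first '.'
    if not (eq and dot and section and parameter):
        raise ValueError(f"expected <section>.<parameter>=<value>, got {value!r}")
    return section, parameter, val


def apply_overrides(
    config: dict[str, dict[str, str]], overrides: list[str]
) -> dict[str, dict[str, str]]:
    """Apply repeated overrides in order (assumption: a later one wins)."""
    for item in overrides:
        section, parameter, val = parse_override(item)
        config.setdefault(section, {})[parameter] = val
    return config


config = {"mysection": {"myparam": "1"}}  # hypothetical parsed config file
print(apply_overrides(config, ["mysection.myparam=2", "other.flag=yes"]))
# {'mysection': {'myparam': '2'}, 'other': {'flag': 'yes'}}
```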