frog(1) | General Commands Manual | frog(1) |
NAME
frog - Dutch Natural Language Toolkit
SYNOPSIS
frog [-t] test-file
frog [options]
DESCRIPTION
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. The current version will (optionally) tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, add IOB chunks and named entities, and assign a dependency graph to each sentence.
OPTIONS
-c <file> or --config=<file>
You can use -c lang/config-file to select the 'config-file' for an installed language 'lang'.
--debug=<module><level>,...
(e.g. --debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER to 3)
Debugging lines are written to a file named frog.<number>.debug.
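The --debug value is a comma-separated list of module letters, each followed by a level. As a minimal sketch, assuming a hypothetical helper named debug_option and relying only on the two module letters confirmed by the example above (l for the Lemmatizer, n for the NER), such a value could be assembled in Python like this:

```python
def debug_option(levels):
    """Build a --debug=<module><level>,... argument.

    `levels` maps a one-letter module code to an integer debug level.
    Only 'l' (Lemmatizer) and 'n' (NER) are confirmed by the man page
    example; any other letter is an assumption and is passed through
    without checking that Frog knows it.
    """
    parts = []
    for module, level in levels.items():
        if len(module) != 1 or not module.isalpha():
            raise ValueError(f"module code must be a single letter: {module!r}")
        if not isinstance(level, int) or level < 0:
            raise ValueError(f"debug level must be a non-negative integer: {level!r}")
        parts.append(f"{module}{level}")
    return "--debug=" + ",".join(parts)


print(debug_option({"l": 5, "n": 3}))  # prints --debug=l5,n3
```

The helper only formats the argument; it does not run Frog.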
-d <level>
--deep-morph
--compounds
-e <encoding>
-h or --help
--language=<comma separated list of languages>
The first language in the list is the default; unspecified languages are assumed to be that default.
E.g. --language=nld,eng,por means: detect Dutch, English and Portuguese using TextCat, with Dutch as the default. Mainly useful for XML processing.
Specifying an unsupported language is a fatal error. However, you can add the special language 'und', which ensures that sentences in an unknown language are labeled as such and processed no further.
IMPORTANT: Frog can currently handle only one language at a time, as determined by the config file. Other languages mentioned here will be tokenized correctly, but beyond that they will be processed as if they were that language.
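The ordering rule for --language (default first, optional 'und' to label unknown sentences) can be sketched as a small Python helper. The function name language_option and its parameters are hypothetical; only the language codes nld, eng, por and und come from this page.

```python
def language_option(default, others=(), label_unknown=False):
    """Build a --language=<list> argument with the default language first.

    `default` is placed first because the first language in the list is
    the default. With `label_unknown=True` the special code 'und' is
    appended so that sentences in unknown languages are labeled as such.
    """
    codes = [default]
    for code in others:
        if code not in codes:  # avoid listing a language twice
            codes.append(code)
    if label_unknown and "und" not in codes:
        codes.append("und")
    return "--language=" + ",".join(codes)


print(language_option("nld", ["eng", "por"]))
# prints --language=nld,eng,por
print(language_option("nld", ["eng"], label_unknown=True))
# prints --language=nld,eng,und
```

The helper does not check whether a language is actually installed; an unsupported language remains a fatal error when Frog starts.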
-n
Very useful when running interactively; otherwise an empty line is needed to signal end of input.
--nostdout
Especially useful when XML output is specified with -X or --xmldir.
-o <file>
--outputdir <dir>
--retry
--skip=[tlacnmp]
The Tagger cannot be skipped.
Skipping the Multiword Unit implies disabling the Parser too.
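The constraints on --skip can be checked before Frog is started. The sketch below is illustrative only: the helper name skip_option is hypothetical, and the mapping of 'm' to the Multiword Unit module and 'p' to the Parser is an assumption, since this page lists the accepted letters but not their meaning.

```python
SKIP_LETTERS = set("tlacnmp")  # letters accepted by --skip, per the option signature


def skip_option(letters):
    """Build a --skip=<letters> argument, rejecting unknown module letters.

    Assumption (not stated on this page): 'm' is the Multiword Unit module
    and 'p' is the Parser. Under that assumption, skipping 'm' also
    disables the Parser, so the second return value reports whether the
    Parser would still run.
    """
    if not letters:
        raise ValueError("at least one module letter is required")
    unknown = sorted(set(letters) - SKIP_LETTERS)
    if unknown:
        raise ValueError(f"unknown --skip letters: {''.join(unknown)}")
    # Drop duplicate letters but keep the caller's order.
    unique = "".join(dict.fromkeys(letters))
    parser_runs = "m" not in unique and "p" not in unique
    return "--skip=" + unique, parser_runs


print(skip_option("mc"))  # prints ('--skip=mc', False)
```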
--alpino
--alpino=server
-S <port>
-t <file>
This option can be omitted. Frog will run on any <file> found on the command line. Wildcards are allowed too. When no files are specified, Frog will start in interactive mode.
Files with the extension '.gz' or '.bz2' are handled too. The corresponding output files will be compressed using the same compression, except when an explicit output filename is specified.
-x <xmlfile>
This option can be omitted. Frog will process files with the 'xml' extension as FoLiA files.
Files with the extension '.xml.gz' or '.xml.bz2' are handled too. The corresponding output files will be compressed using the same compression, except when an explicit output filename is specified.
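The compression rule for both plain-text and FoLiA input can be expressed as a short Python sketch. The function name output_compression is hypothetical, and returning None for "no inherited compression" is a choice made for this illustration, not a description of Frog's internals.

```python
def output_compression(input_name, explicit_output=None):
    """Return the compression suffix the output inherits, or None.

    Mirrors the rule stated above: output for '.gz' or '.bz2' input is
    compressed the same way, except when an explicit output filename
    is given.
    """
    if explicit_output is not None:
        return None  # an explicit output filename switches the rule off
    for suffix in (".gz", ".bz2"):
        if input_name.endswith(suffix):
            return suffix
    return None


print(output_compression("corpus.xml.bz2"))  # prints .bz2
print(output_compression("corpus.txt.gz", explicit_output="out.txt"))  # prints None
```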
-X <xmlfile>
When 'xmlfile' is empty, generate FoLiA XML output for every input file.
--textclass=<cls>
--inputclass=<cls>
--outputclass=<cls>
--testdir=<dir>
--uttmarker=<mark>
--threads=<n>
-V or --version
--xmldir=<dir>
-X <file>
--id=<id>
--allow-word-corrections
--max-parser-tokens=<num>
The Parser is very memory-intensive: 500 words will already need 16 GB of RAM.
--JSONin
This implies --JSONout too!
--JSONout
--JSONout=<indent>
The default for 'indent' is 0, meaning all the JSON will be on one line.
-T or --textredundancy=[full|medium|none]
--override=<section>.<parameter>=<value>
This option may be repeated several times.
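Because --override may be repeated, several settings translate into several separate options. A minimal Python sketch follows, assuming a hypothetical helper named override_options; the section and parameter names in the example are placeholders, not real Frog configuration keys.

```python
def override_options(settings):
    """Turn (section, parameter, value) triples into repeated --override options."""
    args = []
    for section, parameter, value in settings:
        if not section or not parameter:
            raise ValueError("section and parameter must be non-empty")
        if "." in section or "=" in section or "=" in parameter:
            # These characters would make <section>.<parameter>=<value> ambiguous.
            raise ValueError(f"invalid override key: {section}.{parameter}")
        args.append(f"--override={section}.{parameter}={value}")
    return args


# Placeholder names, for illustration only.
print(override_options([("sectionA", "param1", "on"), ("sectionB", "param2", "3")]))
# prints ['--override=sectionA.param1=on', '--override=sectionB.param2=3']
```

The resulting list can be appended to an argument vector as is, one element per option.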
BUGS
likely
AUTHORS
Maarten van Gompel
Ko van der Sloot
Antal van den Bosch
e-mail: lamasoftware@science.ru.nl
SEE ALSO
2023 feb 22 |