YAZ-ICU(1) | Commands | YAZ-ICU(1) |
NAME
yaz-icu - YAZ ICU utility
SYNOPSIS
yaz-icu [-c config] [-p opt] [-s] [-x] [infile]
DESCRIPTION
yaz-icu is a utility that demonstrates the ICU chain module of YAZ (yaz/icu.h).
The utility can be used in two ways. It may read text, process it with an ICU chain defined in an XML configuration, and show the resulting text analysis. This mode is triggered by option -c, which specifies the configuration to be used. The input is read from standard input, or from a file if infile is specified.
The utility may also show ICU information. This is triggered by option -p.
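OPTIONS
-c config
    Specifies the file containing the ICU chain configuration, which is XML-based.
-p type
    Specifies extra information to be printed about the ICU system. If type is c, ICU converters are printed. If type is l, available locales are printed. If type is t, available transliterators are printed.
-s
    Specifies that output should include the sort key as well. Note that sort keys differ between ICU versions.
-x
    Specifies that output should be XML-based rather than text-based.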
ICU CHAIN CONFIGURATION
The ICU chain configuration specifies one or more rules to convert text data into tokens. The configuration format is XML-based.
The top-level element must be named icu_chain. The icu_chain element has one required attribute, locale, which specifies the ICU locale to be used in the conversion steps.
The icu_chain element must contain child elements, each of which specifies one conversion step. The conversion is performed in the order in which the conversion steps are specified. Each conversion element takes one attribute, rule, which serves as the argument to the conversion step.
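As an illustration of this structure, the following Python sketch builds a chain configuration with the standard library. The helper name build_chain and its argument layout are assumptions made for this example; they are not part of YAZ.

```python
import xml.etree.ElementTree as ET


def build_chain(locale, steps):
    """Return an icu_chain document as a string.

    steps is a list of (element_name, rule) pairs in execution order;
    rule may be None for elements written without a rule attribute.
    """
    root = ET.Element("icu_chain", {"locale": locale})
    for name, rule in steps:
        attrs = {} if rule is None else {"rule": rule}
        ET.SubElement(root, name, attrs)
    return ET.tostring(root, encoding="unicode")


# Child elements are emitted in list order, which is the order of conversion.
config = build_chain("en", [("tokenize", "w"), ("casemap", "l")])
print(config)
```

The printed text can be saved to a file and passed to yaz-icu with option -c.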
The following conversion elements are available:
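casemap
    Converts case. The rule selects the conversion:
    l   Lower case
    u   Upper case
    t   Title case
    f   Fold case
display
    A meta step which specifies that the term/token at this point in the chain is the one to be displayed.
transform
    Specifies an ICU transform. The rule is a transliterator identifier. See ICU Transforms[1].
transliterate
    Specifies a rule-based transliterator. The rule is the custom transformation rule to be used. See ICU Transforms[1].
tokenize
    Breaks a string into tokens. The rule selects the boundary type:
    l   Line
    s   Sentence
    w   Word
    c   Character
    t   Title
join
    Joins tokens into one string. The rule is the joining string.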
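The element names and rule letters listed above can be checked before a configuration is handed to yaz-icu. The following Python sketch is one way to do that; the function check_chain and the table ALLOWED_RULES are hypothetical names for this example, and the check is only a rough pre-flight test, not the validation that YAZ itself performs.

```python
import xml.etree.ElementTree as ET

# Rule letters as listed above. Elements mapped to None take free-form
# rules (or none at all), so this sketch does not check them.
ALLOWED_RULES = {
    "casemap": {"l", "u", "t", "f"},
    "tokenize": {"l", "s", "w", "c", "t"},
    "transform": None,
    "transliterate": None,
    "display": None,
    "join": None,
}


def check_chain(xml_text):
    """Return a list of problems found in an icu_chain document."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "icu_chain":
        problems.append(f"top-level element is {root.tag!r}, expected 'icu_chain'")
    if root.get("locale") is None:
        problems.append("icu_chain has no locale attribute")
    for step in root:
        if step.tag not in ALLOWED_RULES:
            problems.append(f"unknown conversion element {step.tag!r}")
            continue
        allowed = ALLOWED_RULES[step.tag]
        rule = step.get("rule")
        if allowed is not None and rule not in allowed:
            problems.append(f"{step.tag}: rule {rule!r} not in {sorted(allowed)}")
    return problems
```

An empty list means the sketch found nothing to complain about; yaz-icu may still reject a configuration for reasons this check does not cover, such as an invalid transform rule.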
EXAMPLES
The following command analyzes text in file text using ICU chain configuration chain.xml:
    cat text | yaz-icu -c chain.xml
The chain.xml might look as follows:
    <icu_chain locale="en">
      <transform rule="[:Control:] Any-Remove"/>
      <tokenize rule="w"/>
      <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
      <transliterate rule="xy > z;"/>
      <display/>
      <casemap rule="l"/>
    </icu_chain>
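When yaz-icu is driven from a script, the command line can be assembled from the SYNOPSIS. The Python sketch below does this and feeds text on standard input. The function and parameter names (yaz_icu_command, run_yaz_icu, sort_keys, xml_output) are assumptions of this example, and the sketch assumes yaz-icu is on the PATH.

```python
import shutil
import subprocess


def yaz_icu_command(config, infile=None, sort_keys=False, xml_output=False):
    """Build an argv list following: yaz-icu [-c config] [-s] [-x] [infile]."""
    argv = ["yaz-icu", "-c", config]
    if sort_keys:
        argv.append("-s")
    if xml_output:
        argv.append("-x")
    if infile is not None:
        argv.append(infile)
    return argv


def run_yaz_icu(text, config):
    """Send text to yaz-icu on standard input and return its output.

    Returns None when yaz-icu is not installed.
    """
    if shutil.which("yaz-icu") is None:
        return None
    result = subprocess.run(
        yaz_icu_command(config),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_yaz_icu(open("text").read(), "chain.xml") corresponds to the cat text | yaz-icu -c chain.xml pipeline shown above.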
SEE ALSO
yaz(7)
ICU Home[2]
ICU Transforms[1]
AUTHORS
Index Data
NOTES
 1. ICU Transforms
 2. ICU Home
01/14/2019 | YAZ 5.27.1 |