UNICODE(1) | General Commands Manual | UNICODE(1) |
NAME
unicode - command line Unicode database query tool
SYNOPSIS
unicode [options] string
DESCRIPTION
This manual page documents the unicode command.
unicode is a command-line tool for querying the Unicode database.
OPTIONS
- -h
- --help
Show help and exit.
- -x
- --hexadecimal
Assume string to be a hexadecimal number
- -d
- --decimal
Assume string to be a decimal number
- -o
- --octal
Assume string to be an octal number
- -b
- --binary
Assume string to be a binary number
- -r
- --regexp
Assume string to be a regular expression
- -s
- --string
Assume string to be a sequence of characters
- -a
- --auto
Try to guess the type of string from the types above (default)
- -mMAXCOUNT
- --max=MAXCOUNT
Maximum number of codepoints to display (default: 20); use 0 for unlimited
- -iIOCHARSET
- --io=IOCHARSET
I/O character set. For best results, run unicode on a UTF-8 capable terminal and set IOCHARSET to UTF-8. unicode tries to guess this value from your locale, so with a properly configured locale you should not need to specify it.
- --fcp=CHARSET
- --fromcp=CHARSET
Convert numerical arguments from this encoding, default: no conversion. Multibyte encodings are supported. This is ignored for non-numerical arguments.
- -cADDCHARSET
- --charset-add=ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
- -CUSE_COLOUR
- --colour=USE_COLOUR
USE_COLOUR is one of on, off or auto.
--colour=on will use ANSI colour codes to colourise the output
--colour=off won't use colours.
--colour=auto will test if standard output is a tty, and use colours only when it is.
--color is a synonym of --colour
- -v
- --verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
- -w
- --wikipedia
Open a browser at the English Wikipedia entry for the character.
- --wt
- --wiktionary
Open a browser at the English Wiktionary entry for the character.
- --brief
Display character information in brief format
- --format=fmt
Use your own format for character information display. See the README for details.
- --list
List (approximately) all known encodings.
- --download
Try to download UnicodeData.txt into ~/.unicode/
- --ascii
Display ASCII table
- --brexit-ascii
- --brexit
Display ASCII table (EU–UK Trade and Cooperation Agreement 2020 version)
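The numeric options -x, -d, -o and -b differ only in the base used to read the string. The Python sketch below illustrates that idea; the helper name parse_codepoint and the BASES table are assumptions made for this example and are not taken from the tool's source.

```python
# Hypothetical mapping from the numeric-type options to a number base.
BASES = {"-x": 16, "-d": 10, "-o": 8, "-b": 2}


def parse_codepoint(option, string):
    """Interpret string as a codepoint number in the base implied by option."""
    return int(string, BASES[option])


if __name__ == "__main__":
    # The same digits name a different codepoint under each option.
    for option in ("-x", "-d", "-o", "-b"):
        cp = parse_codepoint(option, "101")
        print(f"{option} 101 -> U+{cp:04X}")
```

With -a (the default) no base is forced, which is why an explicit option is useful when a string such as 101 is valid in several bases.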
USAGE
unicode tries to guess the type of an argument. In particular, if an argument looks like a valid hexadecimal representation of a Unicode codepoint, it will be treated as one. Using
unicode face
will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE; it will not search for 'face' in character descriptions. For the latter, use:
unicode -r face
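The difference between the two readings of "face" can be reproduced with Python's standard unicodedata module. This is only a sketch of the idea, not the tool's implementation: the function search_names is hypothetical, and the scan is limited to codepoints below U+3000 (an arbitrary choice to keep the example quick).

```python
import re
import unicodedata

# Read as a hexadecimal number, "face" is a valid codepoint.
cp = int("face", 16)
hex_name = unicodedata.name(chr(cp))


def search_names(pattern, start=0x0000, stop=0x3000):
    """Return (codepoint, name) pairs whose character name matches pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for n in range(start, stop):
        name = unicodedata.name(chr(n), "")  # "" for unnamed codepoints
        if rx.search(name):
            hits.append((n, name))
    return hits


# Read as a regular expression, "face" matches character names instead.
regex_hits = search_names("face")

if __name__ == "__main__":
    print(f"U+{cp:04X} {hex_name}")
    for n, name in regex_hits:
        print(f"U+{n:04X} {name}")
```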
For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (á):
unicode 00E1
unicode U+00E1
unicode á
unicode 'latin small letter a with acute'
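To see why these four spellings are equivalent, the following Python sketch resolves each one to the same codepoint. The function guess_codepoint and its order of checks are assumptions for illustration; the real guessing logic in unicode may differ.

```python
import re
import unicodedata


def guess_codepoint(arg):
    """Hypothetical guesser: hex number, U+ notation, single char, or name."""
    m = re.fullmatch(r"(?:U\+)?([0-9A-Fa-f]{1,6})", arg)
    if m:
        # Looks like a hexadecimal codepoint, so treat it as one.
        return int(m.group(1), 16)
    if len(arg) == 1:
        # A single literal character.
        return ord(arg)
    # Otherwise assume a full character name.
    return ord(unicodedata.lookup(arg.upper()))


if __name__ == "__main__":
    forms = ["00E1", "U+00E1", "\u00e1", "latin small letter a with acute"]
    for form in forms:
        print(f"{form!r} -> U+{guess_codepoint(form):04X}")
```

Because the hexadecimal check comes first in this sketch, a one-letter argument such as "a" resolves to U+000A rather than to the letter itself, which mirrors the "face" ambiguity described above.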
You can specify a range of characters as an argument; unicode will show these characters in a tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.
unicode 0450..0520
will display the whole Cyrillic, Cyrillic Supplement, Armenian and Hebrew blocks (characters from U+0400 to U+05FF)
unicode 0400..
will display just characters from U+0400 up to U+04FF
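The boundary alignment in both examples amounts to rounding the start down and the end up to a multiple of 256. A minimal Python sketch, assuming a hypothetical helper aligned_range that is not part of the tool:

```python
def aligned_range(spec):
    """Expand 'START..END' or 'START..' to 256-codepoint boundaries."""
    start_text, end_text = spec.split("..")
    start = int(start_text, 16)
    # An open-ended range covers only the row containing the start.
    end = int(end_text, 16) if end_text else start
    first = start & ~0xFF  # round down to a multiple of 256
    last = end | 0xFF      # round up to the last codepoint of that row
    return first, last


if __name__ == "__main__":
    for spec in ("0450..0520", "0400.."):
        first, last = aligned_range(spec)
        print(f"{spec} -> U+{first:04X}..U+{last:04X}")
```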
Use --fromcp to query codepoints from other encodings:
unicode --fromcp cp1250 -d 200
Multibyte encodings are supported:
unicode --fromcp big5 -x aff3
and multi-char strings are supported, too:
unicode --fromcp utf-8 -x c599c3adc5a5
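Conceptually, --fromcp turns the number into bytes and decodes those bytes with the named encoding. The Python sketch below shows that conversion using the standard codecs; the function from_codepage is a hypothetical name, and the tool's own handling (for example of leading zero bytes) may differ.

```python
def from_codepage(charset, number_text, base):
    """Decode a number, read in the given base, as bytes in charset."""
    value = int(number_text, base)
    # Use as many bytes as the number needs, at least one.
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    return raw.decode(charset)


if __name__ == "__main__":
    # Single byte, given in decimal (as with -d).
    print(from_codepage("cp1250", "200", 10))
    # One multibyte character, given in hexadecimal (as with -x).
    print(from_codepage("big5", "aff3", 16))
    # Several UTF-8 encoded characters in one hexadecimal string.
    print(from_codepage("utf-8", "c599c3adc5a5", 16))
```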
BUGS
Tabular format does not deal well with full-width, combining, control and RTL characters.
SEE ALSO
AUTHOR
Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>
2003-01-31 |