Map(3pm) | User Contributed Perl Documentation | Map(3pm) |
NAME¶
Unicode::Map V0.112 - maps charsets from and to UTF-16 Unicode
SYNOPSIS¶
use Unicode::Map();
$Map = new Unicode::Map("ISO-8859-1");
$utf16 = $Map -> to_unicode ("Hello world!");
=> $utf16 == "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"
$locale = $Map -> from_unicode ($utf16);
=> $locale == "Hello world!"
A more detailed description below.
2do: short note about perl's Unicode perspectives.
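As an illustration of the byte layout shown in the synopsis, the following sketch reproduces the ISO-8859-1 case with plain pack/unpack and without any mapfile. It assumes big-endian 16-bit code units, as the "\0H..." result suggests, and relies on ISO-8859-1 byte values being equal to the first 256 Unicode code points. The two helper names are hypothetical and not part of Unicode::Map.

```perl
use strict;
use warnings;

# Assumption: for ISO-8859-1, byte value N is code point U+00NN, so the
# big-endian 16-bit form is each byte preceded by a zero byte.
sub latin1_to_utf16be {
    my ($bytes) = @_;
    return pack("n*", unpack("C*", $bytes));
}

sub utf16be_to_latin1 {
    my ($utf16) = @_;
    # Code units above 0xFF have no ISO-8859-1 equivalent; this sketch drops them.
    return pack("C*", grep { $_ <= 0xFF } unpack("n*", $utf16));
}

my $utf16  = latin1_to_utf16be("Hello world!");
my $locale = utf16be_to_latin1($utf16);
print $locale, "\n";
```

For any other character set the correspondence is not this simple, which is what the mapfiles are for.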
DESCRIPTION¶
This module converts strings from and to the 2-byte Unicode UCS2 format. All mappings go through 2-byte UTF16 encodings, not through the 8-bit UTF8 encoding. To convert between the two, use Unicode::String.

For historical reasons this module coexists with Unicode::Map8. Please use Unicode::Map8 unless you need to handle two-byte character sets, e.g. Chinese GB2312. If you stick to the basic functionality (the conversion methods below), you can use both modules interchangeably.

In practice this module will become obsolete sooner or later, as Unicode mapping support needs to find its way into perl's core. If you would like to work in this field, please don't hesitate to contact Gisle Aas!

This module cannot deal with utf8 directly. Use Unicode::String to convert utf8 to utf16 and vice versa.

Character mapping follows the data in the binary mapfiles of the Unicode::Map hierarchy. Binary mapfiles can also be created with this module, which lets you install your own character sets. Refer to mkmapfile or to the file REGISTRY in the Unicode::Map hierarchy.
CONVERSION METHODS¶
Probably these are the only methods you will need from this module. Their usage is compatible with Unicode::Map8.
- new
- $Map = new Unicode::Map("GB2312-80")
- from_unicode
- $dest = $Map -> from_unicode ($src)
- to_unicode
- $dest = $Map -> to_unicode ($src)
- to8
- Alias for from_unicode. For compatibility with Unicode::Map8.
- to16
- Alias for to_unicode. For compatibility with Unicode::Map8.
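The conversion methods can be combined with Unicode::String when the input is utf8, roughly as sketched below. The helper name utf8_roundtrip is hypothetical. The sketch also assumes that new returns a false value when no mapping is available, and it returns undef when either module is not installed.

```perl
use strict;
use warnings;

# Convert a utf8 byte string to a charset and back again.
# Returns undef if Unicode::Map or Unicode::String cannot be loaded,
# or if no mapping object could be created (assumed failure behaviour).
sub utf8_roundtrip {
    my ($charset, $utf8) = @_;
    return unless eval { require Unicode::Map; require Unicode::String; 1 };

    my $map = Unicode::Map->new($charset) or return;

    my $utf16  = Unicode::String::utf8($utf8)->utf16;   # utf8    -> utf16
    my $legacy = $map->from_unicode($utf16);            # utf16   -> charset
    my $back   = $map->to_unicode($legacy);             # charset -> utf16
    return Unicode::String::utf16($back)->utf8;         # utf16   -> utf8
}

my $result = utf8_roundtrip("ISO-8859-1", "Hello world!");
print defined $result ? "$result\n" : "conversion not available\n";
```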
WARNINGS¶
You can ask Unicode::Map to issue warnings
on deprecated or incompatible usage with the constants WARN_DEFAULT,
WARN_DEPRECATION and WARN_COMPATIBILITY. The latter two can be ORed together.
No special warnings:
$Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT
Warnings for deprecated usage:
$Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION
Warnings for incompatible usage:
$Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY
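Combining the two warning classes might look like the fragment below. It is a sketch that assumes the constants can be called with explicit parentheses, which works whether they are defined as constants or as plain subs.

```perl
use strict;
use warnings;
use Unicode::Map ();

# Warn on both deprecated and incompatible usage by ORing the constants.
$Unicode::Map::WARNINGS =
    Unicode::Map::WARN_DEPRECATION() | Unicode::Map::WARN_COMPATIBILITY();
```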
MAINTENANCE METHODS¶
Note: These methods exist solely for the maintenance of Unicode::Map. Using any of them makes a program incompatible with Unicode::Map8.
- alias
- @list = $Map -> alias ($csid)
- mapping
- $path = $Map -> mapping ($csid)
- id
- $real_id||"" = $Map -> id ($test_id)
- ids
- @ids = $Map -> ids()
- read_text_mapping
- 1||0 = $Map -> read_text_mapping ($csid, $path, $style)
  style      description
  "unicode"  A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
  ""         Same as "unicode"
  "reverse"  Similar to unicode, but both columns are switched
  "keld"     A text mapping as of ftp://dkuug.dk/i18n/charmaps/
- src
- $path = $Map -> src ($csid)
- style
- $path = $Map -> style ($csid)
- write_binary_mapping
- 1||0 = $Map -> write_binary_mapping ($csid, $path)
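To make the "unicode" and "reverse" styles concrete, here is a minimal stand-alone parser for text in the two-column layout of the files under ftp://ftp.unicode.org/MAPPINGS/ (two 0x... hex codes per line, "#" starting a comment). It is not the parser used by read_text_mapping. The sub name and the sample data are hypothetical, and the "keld" style is not covered.

```perl
use strict;
use warnings;

# Parse "<code> <code> # comment" lines into a hash.
# Style "unicode" (or ""): first column is the key, second the value.
# Style "reverse": the two columns are switched.
sub parse_text_mapping {
    my ($text, $style) = @_;
    $style = "unicode" if !defined $style || $style eq "";
    my %map;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                         # strip comments
        my ($col1, $col2) =
            $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
            or next;                              # skip blank or incomplete lines
        ($col1, $col2) = ($col2, $col1) if $style eq "reverse";
        $map{ hex $col1 } = hex $col2;
    }
    return \%map;
}

# Hypothetical sample data in the "unicode" layout.
my $sample = <<'END';
# charset code, Unicode code, comment
0x41  0x0041  # LATIN CAPITAL LETTER A
0xA4  0x20AC  # EURO SIGN
END

my $map = parse_text_mapping($sample, "unicode");
my $rev = parse_text_mapping($sample, "reverse");
printf "0x%02X -> U+%04X\n", $_, $map->{$_} for sort { $a <=> $b } keys %$map;
```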
DEPRECATED METHODS¶
Some functionality is no longer promoted.
- noise
- Deprecated! Don't use any longer.
- reverse_unicode
- Deprecated! Use Unicode::String::byteswap instead.
BINARY MAPPINGS¶
Structure of binary Mapfiles

Unicode character mapping tables contain runs of sequential key codes that map to sequential value codes. This property is used to crunch the maps easily: n (0<n<256) sequential characters are represented by a byte count n and the first character code key_start. The corresponding value sequences are crunched in the same way. The value 0 starts an extended information block (which is only partially implemented, though).

One could think of two ways to build a binary mapfile. The first would be to write a list of all key codes, followed by a list of all value codes. The second, used here, appends to each partial key code list the corresponding crunched value code list. This keeps value codes a little closer to their key codes.

Note: the file format is still very much in flux. Do not rely on it staying as it is, on this description being free of bugs, or on all features being implemented.

STRUCTURE:
- <main>:
  offset  structure  value
  0x00    word       0x27b8 (magic)
  0x02    @(<extended> || <submapping>)
- <submapping>:
  0x00    byte != 0  charsize1 (bits)
  0x01    byte       n1 number of chars for one entry
  0x02    byte       charsize2 (bits)
  0x03    byte       n2 number of chars for one entry
  0x04    @(<extended> || <key_seq> || <key_val_seq>)
  bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)
- <key_val_seq>:
  0x00    size=0|1|2|4  n, number of sequential characters
  size    bs1           key1
  +bs1    bs2           value1
  +bs2    bs1           key2
  +bs1    bs2           value2
  ...
- <key_seq>:
  0x00    byte  n, number of sequential characters
  0x01    bs1   key_start, first character of sequence
  1+bs1   @(<extended> || <val_seq>)
- <val_seq>:
  0x00    byte  m, number of sequential characters
  0x01    bs2   val_start, first character of sequence
- <extended>:
  0x00    byte         0
  0x01    byte         ftype
  0x02    byte         fsize, size of following structure
  0x03    fsize bytes  something
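The crunching idea and the byte size formula can be sketched as follows. This is an illustration only, not the code that reads or writes real mapfiles. Both sub names are hypothetical, and find_runs simplifies the format by giving key and value runs the same length, whereas <key_seq> and <val_seq> carry separate counts n and m.

```perl
use strict;
use warnings;

# Group a code mapping into runs in which keys and values both increase
# by one. Each run is [key_start, val_start, n] with 0 < n < 256.
sub find_runs {
    my ($map) = @_;
    my @runs;
    for my $key (sort { $a <=> $b } keys %$map) {
        my $last = $runs[-1];
        if ($last
            && $last->[2] < 255
            && $key         == $last->[0] + $last->[2]
            && $map->{$key} == $last->[1] + $last->[2]) {
            $last->[2]++;                          # extend the current run
        } else {
            push @runs, [$key, $map->{$key}, 1];   # start a new run
        }
    }
    return @runs;
}

# Decode the four <submapping> header bytes and derive bs1 and bs2
# with the formula from the structure description.
sub parse_submapping_header {
    my ($bytes) = @_;
    my ($charsize1, $n1, $charsize2, $n2) = unpack("C4", $bytes);
    return {
        charsize1 => $charsize1, n1 => $n1,
        charsize2 => $charsize2, n2 => $n2,
        bs1 => int(($charsize1 + 7) / 8),
        bs2 => int(($charsize2 + 7) / 8),
    };
}

# Hypothetical sample: three sequential codes plus one isolated code.
my %map  = (0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043, 0xA4 => 0x20AC);
my @runs = find_runs(\%map);
my $hdr  = parse_submapping_header(pack("C4", 8, 1, 16, 1));
```

With an 8-bit source charset and 16-bit targets, the header yields bs1 = 1 and bs2 = 2, so each key occupies one byte and each value two.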
TO BE DONE¶
- Something clever, when a character has no translation.
- Direct charset -> charset mapping.
- Better performance.
- Support for mappings according to RFC 1345.
SEE ALSO¶
- File "REGISTRY" and binary mappings in directory "Unicode/Map" of your perl library path
- recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3), Unicode::String(3), Unicode::CharName(3), mirrorMappings(1)
- RFC 1345
- Mappings at Unicode consortium ftp://ftp.unicode.org/MAPPINGS/
- Registered Internet character sets ftp://dkuug.dk/i18n/charmaps/
- 2do: more references
AUTHOR¶
Martin Schwartz <martin@nacho.de>
2002-03-20 | perl v5.14.2 |