FIX_LATIN(1p) User Contributed Perl Documentation FIX_LATIN(1p)

NAME

fix_latin - filters a data stream that is predominantly UTF8 and 'fixes' any Latin (i.e. non-ASCII 8-bit) characters

SYNOPSIS

  fix_latin options <input_file >output_file
  Options:
   --use-xs <value> 'auto' | 'always' | 'never'
   --version        list version number
   --help           detailed help message

DESCRIPTION

The script acts as a filter, taking source data which may contain a mix of ASCII, UTF8, ISO8859-1 and CP1252 characters, and producing output that is entirely ASCII/UTF8.

Multi-byte UTF8 characters will be passed through unchanged (although over-long UTF8 byte sequences will be converted to the shortest normal form). Single byte characters will be converted as follows:

  0x00 - 0x7F   ASCII - passed through unchanged
  0x80 - 0x9F   Converted to UTF8 using CP1252 mappings
  0xA0 - 0xFF   Converted to UTF8 using Latin-1 mappings
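
The rules in the table above can be illustrated with a short Python sketch. This is not the code used by Encoding::FixLatin; it is a simplified re-implementation for illustration only. The function name `fix_latin_sketch` is hypothetical, over-long UTF8 sequences are not normalised (their bytes are treated as individual single-byte characters), and bytes that CP1252 leaves undefined (such as 0x81) are replaced with U+FFFD, which is an assumption of this sketch rather than documented behaviour of the script.

```python
def fix_latin_sketch(data: bytes) -> bytes:
    """Illustrative only: convert mixed ASCII/UTF8/Latin-1/CP1252 bytes to UTF8."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # 0x00 - 0x7F: ASCII, passed through unchanged
            out.append(b)
            i += 1
            continue

        # Work out how long a UTF8 sequence starting with this byte would be.
        if 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0  # not a valid UTF8 lead byte

        chunk = data[i:i + n]
        if n and len(chunk) == n:
            try:
                chunk.decode("utf-8")  # strict: rejects malformed sequences
            except UnicodeDecodeError:
                pass
            else:
                # Well-formed multi-byte UTF8: passed through unchanged
                out += chunk
                i += n
                continue

        # Single byte: 0x80 - 0x9F uses CP1252, 0xA0 - 0xFF uses Latin-1.
        # Assumption: bytes undefined in CP1252 become U+FFFD.
        codec = "cp1252" if b <= 0x9F else "latin-1"
        out += bytes([b]).decode(codec, errors="replace").encode("utf-8")
        i += 1
    return bytes(out)
```

For example, `fix_latin_sketch(b"caf\xe9")` yields the UTF8 bytes for "café", while input that is already valid UTF8 comes back unchanged. Note that a byte pair which happens to form a valid UTF8 sequence is always treated as UTF8, even if it was intended as two Latin-1 characters.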

OPTIONS

--use-xs <value>

Override the default ('auto') behaviour of trying to use the XS module and falling back to the pure-Perl version if it is not available. Set to 'never' to always use the Perl version, or 'always' to always use XS and die if it is not available.

--version

Display the version number of the underlying Encoding::FixLatin and XS modules.

--help

Display this documentation.

EXAMPLES

This script was originally written to assist in converting a Postgres database from SQL-ASCII encoding to UNICODE UTF8 encoding. The following examples illustrate its use in that context.

If you have a SQL format dump file that you would normally restore by piping into 'psql', you can simply filter the dump file through this script:

  fix_latin < dump_file | psql -d database

If you have a compressed dump file that you would normally restore using 'pg_restore', you can omit the '-d' option on pg_restore and pipe the resulting SQL through this script and into psql:

  pg_restore -O dump_file | fix_latin | psql -d database

To take a look at non-ASCII lines in the dump file:

  perl -ne '/^COPY (\S+)/ and $t = $1; print "$t:$_" if /[^\x00-\x7F]/' dump_file
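
As a rough Python counterpart to the one-liner above, the following sketch reports each non-ASCII line along with the table named by the most recent COPY statement. It additionally flags whether the line is already valid UTF8, which is an extension not present in the one-liner. The function name `non_ascii_lines` is hypothetical, and the sketch assumes the dump is read in binary mode, one line at a time.

```python
import re

# Matches the start of a COPY statement and captures the table name.
COPY_RE = re.compile(rb"^COPY (\S+)")

def non_ascii_lines(lines):
    """Yield (table, is_valid_utf8, line) for every line containing a byte > 0x7F."""
    table = b""
    for line in lines:
        m = COPY_RE.match(line)
        if m:
            table = m.group(1)  # remember the table currently being loaded
        if any(b > 0x7F for b in line):
            try:
                line.decode("utf-8")
                valid = True
            except UnicodeDecodeError:
                valid = False
            yield table.decode("ascii", "replace"), valid, line
```

It could be used along these lines, where `dump_file` is the same placeholder name as in the examples above:

```python
with open("dump_file", "rb") as fh:
    for table, valid, line in non_ascii_lines(fh):
        print(table, "utf8" if valid else "not-utf8", line)
```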

SEE ALSO

This script is implemented using the Encoding::FixLatin Perl module. For more details see the module documentation with the command:

  perldoc Encoding::FixLatin

In particular you should read the 'LIMITATIONS' section to understand the circumstances under which data corruption might occur.

COPYRIGHT & LICENSE

Copyright 2009-2014 Grant McLean "<grantm@cpan.org>"

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2024-03-05 perl v5.38.2