DIRCONV(1)                General Commands Manual                DIRCONV(1)
NAME
dirconv - locate and transcode mixed-encoding file names
SYNOPSIS
dirconv [-078dFhnpruvw] [-f charset] [-x regex] [path ...]
DESCRIPTION
The dirconv utility recursively scans the specified
path(s) and classifies files and directories according to whether their names
are pure 7-bit ASCII, non-ASCII but valid UTF-8, double-UTF-8 (WTF-8), or
neither.
Names in the last category are assumed to be Latin-1 unless a
different encoding is specified with the -f option.
By default, the dirconv utility then
prints the names that are neither pure 7-bit ASCII nor valid UTF-8.
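The four categories can be illustrated with a short Python sketch. This is not the dirconv source; the function name classify is hypothetical, and it assumes that "double-UTF-8" means UTF-8 bytes that were misread as the 8-bit charset and then encoded to UTF-8 a second time.

```python
def classify(name: bytes, charset: str = "iso8859-1") -> str:
    """Return 'ascii', 'utf8', 'wtf8', or 'other' for a raw file name.

    Hypothetical helper for illustration; not part of dirconv.
    """
    try:
        name.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Neither ASCII nor valid UTF-8: assumed to be in `charset`.
        return "other"
    # The name is valid UTF-8. Map every character back to one byte in
    # the assumed 8-bit charset; if those bytes are valid UTF-8 again,
    # the name was probably encoded twice.
    try:
        text.encode(charset).decode("utf-8")
        return "wtf8"
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
```

Under these assumptions, classify(b"caf\xe9") returns "other", classify(b"caf\xc3\xa9") returns "utf8", and classify(b"caf\xc3\x83\xc2\xa9") returns "wtf8".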
The following options are available:
-0  Print a NUL character rather than a newline after each path. This option has no effect if the -n option was also specified.
-7  Select names that are pure 7-bit ASCII.
-8  Select names that contain non-ASCII characters but are not valid UTF-8. This is the default unless the -7, -u and / or -w options are specified.
-d  Show debugging information. This option can be specified multiple times to increase the level of detail.
-F  In conjunction with the -r option, force renaming a file when the target already exists.
-f charset  Specify the assumed character set for non-ASCII, non-UTF-8 names. The default is “iso8859-1”.
-h  Print a usage message and exit.
-n  In conjunction with the -r option, show what would have happened, but do not actually rename any files.
-p  Print the selected names.
-r  Attempt to convert the selected names to UTF-8 and rename the files and directories.
-u  Select names which contain non-ASCII characters and are valid UTF-8 but not WTF-8.
-v  Print the source revision number and exit.
-w  Select names which seem to be WTF-8-encoded.
-x regex  Do not inspect files and directories whose unconverted names match the specified POSIX extended regular expression.
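The interaction between -r, -f, -n and -F can be pictured with the following Python sketch. It is an assumption about the general shape of such a conversion, not the actual dirconv logic: the function plan_rename is hypothetical, it only handles names that are not valid UTF-8, and it uses POSIX-style byte paths.

```python
import os


def plan_rename(path: bytes, charset: str = "iso8859-1", force: bool = False):
    """Return (source, target) for a rename to UTF-8, or None to skip.

    Hypothetical helper for illustration; not part of dirconv.
    """
    directory, name = os.path.split(path)
    try:
        name.decode("utf-8")
        return None  # already ASCII or valid UTF-8, nothing to convert
    except UnicodeDecodeError:
        pass
    # Reinterpret the raw bytes in the assumed charset, then store UTF-8.
    target = os.path.join(directory, name.decode(charset).encode("utf-8"))
    if os.path.lexists(target) and not force:
        return None  # target exists and force (like -F) was not requested
    return path, target
```

A dry run in the spirit of -n would print the returned pair; an actual run would pass it to os.rename().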
SEE ALSO
iconv(1), regex(3)
AUTHORS
The dirconv utility and this manual page were written by
Dag-Erling Smørgrav ⟨des@des.no⟩
for the University of Oslo.
NOTES
The dirconv utility works by attempting to decode each
name as if it were a sequence of UTF-8 characters. It is possible, but highly
unlikely, that a random string of characters in a non-UTF single-byte encoding
would look like a valid UTF-8 sequence.
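A small Python sketch of this ambiguity, using a hypothetical helper is_valid_utf8: a typical Latin-1 name fails to decode as UTF-8, but a Latin-1 name that happens to consist of a UTF-8 lead byte followed by a continuation byte passes the check.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Report whether the raw name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" in Latin-1: the lone byte E9 is not a complete UTF-8 sequence.
latin1_name = "caf\u00e9".encode("iso8859-1")   # b'caf\xe9'

# The two Latin-1 characters U+00C3 U+00A9 are the bytes C3 A9, which
# are also the UTF-8 encoding of U+00E9, so the name looks like UTF-8.
ambiguous = "\u00c3\u00a9".encode("iso8859-1")  # b'\xc3\xa9'

print(is_valid_utf8(latin1_name))  # False
print(is_valid_utf8(ambiguous))    # True
```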
Reliable detection of WTF-8 is only possible if the original 8-bit encoding is known.
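The dependence on the original encoding can be shown with a Python sketch. The helper seems_double_encoded is hypothetical and only approximates what a detector might do: it reverses the assumed 8-bit step and checks whether valid UTF-8 comes out. A name mangled through Windows-1252 is recognized when that charset is assumed, but not when Latin-1 is assumed.

```python
def seems_double_encoded(name: bytes, charset: str) -> bool:
    """Guess whether `name` is UTF-8 that was re-encoded via `charset`.

    Hypothetical helper for illustration; not part of dirconv.
    """
    try:
        text = name.decode("utf-8")
        inner = text.encode(charset)  # undo the assumed 8-bit misreading
        inner.decode("utf-8")         # the result must be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    # Pure ASCII survives any number of round trips, so require a high byte.
    return any(b >= 0x80 for b in inner)


# U+00D6 stored as UTF-8 (C3 96), misread as Windows-1252, encoded again.
mangled = "\u00d6".encode("utf-8").decode("cp1252").encode("utf-8")

print(seems_double_encoded(mangled, "cp1252"))     # True
print(seems_double_encoded(mangled, "iso8859-1"))  # False
```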
The exclusion filter is applied before name conversion. Character classes are unlikely to work as expected on unconverted names.
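The effect on character classes can be sketched in Python. Python's re module has no POSIX bracket classes, so \w stands in here for a class such as [[:alnum:]]; this is an analogy, not the regex(3) behavior itself. Matching against the raw bytes treats the Latin-1 byte E9 as a non-word byte, while matching against the decoded name treats it as a letter.

```python
import re

raw_name = "caf\u00e9".encode("iso8859-1")  # b'caf\xe9', unconverted name

word_bytes = re.compile(rb"^\w+$")  # byte pattern: \w covers ASCII only
word_text = re.compile(r"^\w+$")    # text pattern: \w covers Unicode letters

# On the unconverted bytes the class does not cover 0xE9, so no match.
print(word_bytes.match(raw_name) is None)                         # True

# On the decoded name the same class matches the whole string.
print(word_text.match(raw_name.decode("iso8859-1")) is not None)  # True
```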
November 18, 2014