NAME
detoxrc - configuration file for detox(1)
OVERVIEW
detox allows for configuration of its sequences through config files. This
document describes how these files work.
IMPORTANT
When setting up a new set of rules, the safe and wipeup filters must always
be run after a translating filter (or series thereof), such as the utf_8 or
uncgi filters. Otherwise, illegal characters may end up in the filename.
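As an illustration of this ordering, a minimal sketch using the syntax described in the SYNTAX section might look like the following. The sequence name utf8_clean is hypothetical; the table filenames are the ones used in the EXAMPLE section.

```
# utf8_clean is a hypothetical sequence name
sequence utf8_clean {
    # translating filter first
    utf_8 {
        filename "unicode.tbl";
    };
    # then safe and wipeup, in that order
    safe {
        filename "safe.tbl";
    };
    wipeup {
        remove_trailing;
    };
};
```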
SYNTAX
The format of this configuration file is C-like, loosely based on the
configuration files of named. Each statement is semicolon terminated, and
modifiers on a particular statement are generally contained within braces.
- sequence "name" {...};
- Defines a sequence of filters to run a filename through.
"name" specifies how the user will refer to the particular
sequence during runtime. Quotes around the sequence name are generally
optional, but should be used if the sequence name does not start with a
letter.
There is a special sequence, named "default", which is the default
sequence used by detox. This can be overridden through the command line
option -s or the environment variable DETOX_SEQUENCE.
Sequence names are case sensitive and unique across all configuration
files; that is, if a system-wide file defines normal_seq and a user has a
sequence with the same name in their .detoxrc, the user's normal_seq will
take precedence.
- iso8859_1 {filename "/path/to/filename";};
- This translates ISO 8859-1 (aka Latin-1) characters into
lower ASCII equivalents. The output is not necessarily safe, and should
also be run through the safe filter.
Under normal circumstances, the filename syntax is not needed. Detox looks
in several locations for a file called iso8859_1.tbl,
which is a set of rules defining how an ISO 8859-1 character should be
translated.
If this table doesn't exist, you have two options. You can download or
create your own and tell detox where it is using the filename syntax shown
above, or you can let detox fall back on its internal tables. The internal
tables produce the same translations as the stock translation tables.
You can chain together multiple iso8859_1 translations, as long as the
default value of all but the last one is set to nothing. This is explained
in detox.tbl(5).
This filter is mutually exclusive with the utf_8 filter.
- utf_8 {filename "/path/to/filename";};
- This translates Unicode characters, encoded in UTF-8, into safe
equivalents.
This operates in a manner similar to iso8859_1, except it
looks for a translation table called unicode.tbl.
The default internal translation for Unicode characters only contains the
lower 256 characters of Unicode, which is equivalent to the set of Basic
Latin and Latin-1 characters.
- uncgi;
- This translates CGI escaped strings into their ASCII
equivalents. The output of this is not necessarily safe, and could contain
ISO 8859-1 or possibly UTF-8 characters.
- safe {filename "/path/to/filename";};
- This could also be called "safe for UNIX-like
operating systems". It translates characters that are difficult to
work with in UNIX environments into characters that are not.
In earlier versions this filter was entirely internal. Starting with 1.2.0,
this filter is controlled by a translation table. In the absence of the
translation table, the previous internal code is used for the translation.
Also, prior to 1.2.0, the safe filter removed leading dashes to prevent
the hassle of dealing with a filename in the format
-filename. This functionality is exclusively handled by
the wipeup filter now.
See the SAFE section for more details on what
this filter translates by default.
- wipeup {remove_trailing;};
- This wipes up any excess characters. For instance, multiple underscores
or dashes will be converted into a single underscore or dash. Any series of
dashes and underscores (e.g. "_-_") will be converted into a single dash.
The remove_trailing option removes any dash or underscore immediately
before or after a period.
See the WIPEUP section for more details on
what this filter translates.
- max_length {length value;};
- This trims a filename down to the length specified (or less). It is
aware of extensions and attempts to preserve anything following the last
period in a filename.
For instance, given a max length of 12, and a filename of
"this_is_my_file.txt", the filter would output
"this_is_.txt".
- lower;
- This translates uppercase characters into lowercase
characters.
- # Comments
- Anything after a # on any line is ignored.
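To show how the statements above combine, here is a minimal sketch of a user-defined sequence. The name "short_lower" is hypothetical, and the table filename is the one used in the EXAMPLE section; only statement forms documented above are used.

```
# "short_lower" is a hypothetical sequence name
sequence "short_lower" {
    safe {
        filename "safe.tbl";
    };
    wipeup {
        remove_trailing;
    };
    lower;          # uppercase to lowercase
    max_length {
        length 12;  # same length as in the max_length description
    };
};
```

A sequence like this would be selected at runtime with the -s option or the DETOX_SEQUENCE environment variable. Going by the max_length description, a length of 12 would turn "this_is_my_file.txt" into "this_is_.txt".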
EXAMPLE
sequence default {
    uncgi;
    iso8859_1 {
        filename "iso8859_1.tbl";
    };
#   utf_8 {
#       filename "unicode.tbl";
#   };
    safe {
        filename "safe.tbl";
    };
    wipeup {
        remove_trailing;
    };
#   max_length {
#       length 128;
#   };
};
SAFE
The following characters are translated by the stock safe filter. They can
be tuned by updating safe.tbl, or by creating a copy of safe.tbl and
referencing it from your rc file with the filename syntax.
Rules that apply anywhere in the filename:
| Safe  | Original |
| _and_ | & |
| _     | space ` ! @ $ * \ | : ; " ' < > ? / |
| -     | ( ) [ ] { } |
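As a sketch of the second approach, a sequence could point the safe filter at a modified copy of the table. Both the sequence name custom_safe and the path below are hypothetical placeholders, not locations that detox itself defines.

```
# custom_safe is a hypothetical sequence name
sequence custom_safe {
    safe {
        # hypothetical location of a locally edited copy of safe.tbl
        filename "/path/to/my_safe.tbl";
    };
    wipeup {
        remove_trailing;
    };
};
```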
WIPEUP
The following characters are translated by the wipeup filter.
Rules that apply anywhere in the filename:
| Wipeup | Original |
| -      | -_ |
| -      | _- |
| -      | -- |
| _      | __ |
Rules that apply only at the beginning of a filename:
Any leading dashes are stripped to prevent programs from interpreting these
files as command line options. Leading underscores and # characters are
removed as well.
| Wipeup  | Original |
| removed | - _ # |
Rules that apply when remove_trailing is enabled:
| Wipeup | Original |
| .      | .- |
| .      | -. |
| .      | ._ |
| .      | _. |
SEE ALSO
detox(1), detox.tbl(5).
AUTHORS
detox was written by Doug Harple.