NAME
dspamc - DSPAM Anti-Spam Agent (client)
SYNOPSIS
dspamc [--mode=[teft|toe|tum|notrain|unlearn]]
[--user user1 user2 ... userN]
[--feature=[ch,no,wh,tb=N,sb]] [--class=[spam|innocent]]
[--source=[error|corpus|inoculation]] [--profile=[PROFILE]]
[--deliver=[spam,innocent]] [--help] [--process]
[--classify] [--signature=[signature]]
[--stdout] [--debug] [--daemon]
[--client] [--rcpt-to] [--mail-from]
[delivery_arguments]
DESCRIPTION
The DSPAM agent provides a direct interface to mail servers for
command-line spam filtering. The agent can masquerade as the mail server's
local delivery agent and will process any email passed to it. The agent will
then call whatever delivery agent was specified at compile time or
quarantine/tag/drop messages identified as spam. The DSPAM agent can function
locally or as a proxy. It is also responsible for processing classification
errors so that DSPAM can learn from its mistakes. This version (dspamc) uses a
connection to a dspam server rather than re-creating contexts on each execution.
OPTIONS
- --user user1 user2 ... userN
- Specifies the destination users of the incoming message. In
most cases this is the local user on the system; however, some
implementations may call for virtual usernames, specific to DSPAM, to be
assigned. The agent processes an incoming message once for each user
specified. If the message is to be delivered, the $u (or %u) parameters of
the argument string will be interpolated for the current user being
processed.
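The per-user interpolation described above can be pictured with a short Python sketch. This is not DSPAM's own code: the function names and the sample delivery arguments are hypothetical, and the sketch simply assumes that every occurrence of $u or %u is replaced by the user currently being processed.

```python
import re


def interpolate_user(delivery_args, user):
    """Return a copy of delivery_args with every $u or %u replaced by user."""
    # A replacement function is used so the username is inserted literally.
    return [re.sub(r"[$%]u", lambda _match: user, arg) for arg in delivery_args]


def per_user_args(delivery_args, users):
    """Build one interpolated argument list per destination user."""
    return {user: interpolate_user(delivery_args, user) for user in users}


# Hypothetical delivery arguments, for illustration only.
sample_args = ["-d", "$u", "-m", "/var/mail/%u"]
print(per_user_args(sample_args, ["alice", "bob"]))
```

Each user named with --user gets its own argument list, which mirrors the agent processing the message once per user.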
- --mode=[toe|tum|teft|notrain|unlearn]
- Configures the training mode to be used for this process,
overriding any defaults in dspam.conf:
teft : Train-Everything. Trains on all messages processed. This is a
very thorough training approach and should be considered the standard
training approach for most users. TEFT may, however, prove too volatile on
installations with extremely high per-user traffic, or prove not very
scalable on systems with extremely large user-bases. In the event that
TEFT is proving ineffective, one of the other modes is recommended.
toe : Train-on-Error. Trains only on a classification error, once the
user's metadata has matured to 2500 innocent messages. This training mode
is much less resource intensive, as only occasional metadata writes are
necessary. It is also far less volatile than the TEFT mode of training.
One drawback, however, is that TOE only learns when DSPAM has made a
mistake - which means the data is sometimes too static, and unable to
"ease into" a different type of behavior.
tum : Train-until-Mature. This training mode is a hybrid of the
other two training modes and balances volatility against static
metadata. TuM trains on a per-token basis, updating only tokens that
have had fewer than 25 "hits", unless an error is being retrained,
in which case all tokens are trained. This training mode
provides a solid core of stable tokens to keep accuracy consistent, but
also allows for dynamic adaptation to any new types of email behavior a
user might be experiencing.
notrain : No training. Do not train the user's data, and do not keep
totals. This should only be used in cases where you want to process mail
for a particular user (based on a group, for example), but don't want the
user to accumulate any learning data.
unlearn : Unlearn original training. Use this if you wish to unlearn
a previously learned message. Be sure to specify --source=error and set
--class to the classification under which the message was originally
learned. If not using TrainPristine, this will require the original
signature from training.
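As a rough illustration of the tum rule described above, the following Python sketch selects which tokens would be trained. It is only a reading of the text, not DSPAM's implementation: the function name, the dictionary of per-token hit counts, and the retraining_error parameter are all hypothetical.

```python
MATURITY_HITS = 25  # threshold quoted in the tum description


def tokens_to_train(token_hits, retraining_error=False):
    """Return the tokens a tum-style pass would train, in sorted order.

    token_hits maps each token to its hit count (a hypothetical data shape).
    """
    if retraining_error:
        # When an error is being retrained, every token is trained.
        return sorted(token_hits)
    # Otherwise only tokens that have not yet matured are trained.
    return sorted(t for t, hits in token_hits.items() if hits < MATURITY_HITS)


hits = {"meeting": 3, "invoice": 24, "the": 400}
print(tokens_to_train(hits))
print(tokens_to_train(hits, retraining_error=True))
```

Mature tokens such as "the" are left alone during normal processing, which is what keeps the core statistics stable.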
- --feature=[chained,noise,tb=N,whitelist,sbph]
- Specifies the features that should be activated for this
filter instance. The following features may be used individually or
combined using a comma as a delimiter:
chained : Chained Tokens (also known as biGrams). Chained Tokens
combine adjacent tokens, presently with a window size of 2, to form token
"chains". Chained tokens use additional storage resources, but
greatly improve accuracy. Recommended as a default feature.
noise : Bayesian Noise Reduction (BNR). Bayesian Noise Reduction
kicks in at 2500 innocent messages and provides an advanced progressive
noise logic to reduce Bayesian Noise (wordlist attacks) in spam messages. See
http://bnr.nuclearelephant.com for more information.
tb=N : Sets the training loop buffering level. Training loop
buffering is the amount of statistical sedation performed to water down
statistics and avoid false positives during the user's training loop. The
training buffer sets the buffer sensitivity, and should be a number
between 0 (no buffering whatsoever) and 10 (heavy buffering). The default
is 5, half of what previous versions of DSPAM used. To avoid dulling down
statistics at all during the training loop, set this to 0.
whitelist : Automatic whitelisting. DSPAM will keep track of the
entire "From:" line for each message received per user, and
automatically whitelist messages from senders with more than 20 innocent
messages and no spam. Once the user reports a spam from the sender,
automatic whitelisting is deactivated for that sender.
Since DSPAM uses the entire "From:" line, and not just the
sender's email address, automatic whitelisting is a very safe approach to
improving accuracy, especially during initial training.
sbph : Sparse Binary Polynomial Hashing. Bill Yerazunis' tokenizer
method from CRM114. Tokenizer method only - works with existing
combination algorithms.
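To make the chained feature more concrete, here is a minimal Python sketch of forming chains from adjacent tokens with a window size of 2. It illustrates the idea only: the function name is hypothetical, and the "+" joiner is an assumption made for readability rather than a statement about how DSPAM stores chains.

```python
def chain_tokens(tokens, joiner="+"):
    """Return the chains formed from each pair of adjacent tokens.

    The joiner is an assumed separator, chosen here for illustration.
    """
    # zip(tokens, tokens[1:]) walks the list with a window size of 2.
    return [f"{left}{joiner}{right}" for left, right in zip(tokens, tokens[1:])]


print(chain_tokens(["buy", "cheap", "pills"]))
```

A message with N tokens yields N-1 chains in addition to its individual tokens, which is where the extra storage cost comes from.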
- --class=[spam|innocent]
- Identifies the disposition (if any) of the message being
presented. This flag should be used when a misclassification has occurred,
when the user is corpus-feeding a message, or when an inoculation is being
presented. This flag should not be used for standard processing. This flag
must be used in conjunction with the --source flag. Omitting this flag
causes DSPAM to determine the disposition of the message on its own (the
standard operating mode).
- --source=[error|corpus|inoculation]
- Where --class is used, the source of the
classification must also be provided. The source tells dspam how to learn
the message being presented:
error : The message being presented was a message previously
misclassified by DSPAM. When 'error' is provided as a source, DSPAM
requires that the DSPAM signature be present in the message, and will use
the signature to recall the original training metadata. If the signature
is not present, the message will be rejected. In this source mode, DSPAM
will also decrement the count for each token's previous classification, as
well as the user totals.
You should use error only when DSPAM has made an error in classifying the
message, and should present the modified version of the message with the
DSPAM signature when doing so.
corpus : The message being presented is from a mail corpus, and
should be trained as a new message, rather than re-trained based on a
signature. The message's full headers and body will be analyzed and the
correct classification will be incremented, without its opposite being
decremented.
You should use corpus only when feeding messages in from a corpus.
inoculation : The message being presented is in pristine form, and
should be trained as an inoculation. Inoculations are a more intense mode
of training designed to cause DSPAM to train the user's metadata
repeatedly on previously unknown tokens, in an attempt to vaccinate the
user from future messages similar to the one being presented. You should
use inoculation only on honeypots and the like.
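The difference between the error and corpus sources can be sketched as a pair of counter updates. The Python below is a simplified model, not DSPAM's storage code: the names are hypothetical, tokens are passed in directly instead of being recalled from a signature, user totals and inoculation are not modeled, and clamping counts at zero is an assumption.

```python
from collections import defaultdict

CLASSES = ("spam", "innocent")


def new_counts():
    """Create an empty token table: token -> per-class counters."""
    return defaultdict(lambda: {"spam": 0, "innocent": 0})


def train(counts, tokens, cls, source):
    """Apply one training pass for the given class and source."""
    if cls not in CLASSES:
        raise ValueError("cls must be 'spam' or 'innocent'")
    if source not in ("error", "corpus"):
        raise ValueError("only the error and corpus sources are modeled here")
    previous = "innocent" if cls == "spam" else "spam"
    for token in tokens:
        entry = counts[token]
        entry[cls] += 1  # both sources increment the correct class
        if source == "error":
            # An error also undoes the earlier, wrong classification.
            # Clamping at zero is an assumption of this sketch.
            entry[previous] = max(0, entry[previous] - 1)
    return counts


counts = new_counts()
train(counts, ["offer", "free"], "innocent", "corpus")
train(counts, ["offer"], "spam", "error")
print(dict(counts))
```

After the second call, the innocent count for "offer" has been taken back and its spam count raised, while "free" keeps its corpus-trained innocent count.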
- --profile=[PROFILE]
- Specify a storage profile from dspam.conf. The storage
profile selected will be used for all database connectivity. See
dspam.conf for more information.
- --deliver=[innocent,spam]
- Tells DSPAM to deliver the message if its result
falls within the criteria specified. For example, --deliver=innocent will
cause DSPAM to only deliver the message if its classification has been
determined as innocent. Providing --deliver=innocent,spam will cause DSPAM
to deliver the message regardless of its classification. This flag
provides a significant amount of flexibility for nonstandard
implementations.
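The delivery criteria amount to a membership test, which the following Python sketch spells out. The function name is hypothetical, and the case-insensitive comparison is an assumption of the sketch.

```python
def should_deliver(result, deliver_option):
    """Return True if the classification result is listed in --deliver.

    deliver_option is the comma-separated value given to --deliver,
    for example "innocent" or "innocent,spam".
    """
    allowed = {c.strip().lower() for c in deliver_option.split(",") if c.strip()}
    return result.strip().lower() in allowed


print(should_deliver("innocent", "innocent"))
print(should_deliver("spam", "innocent"))
print(should_deliver("spam", "innocent,spam"))
```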
- --stdout
- If the message is indeed deemed "deliverable" by
the --deliver flag, this flag will cause DSPAM to deliver the
message to stdout, rather than the configured delivery agent.
- --process
- Tells DSPAM to process the message. This is the
default behavior, and the flag is implied unless --classify is
used.
- --classify
- Tells DSPAM to only classify the message, and not
perform any writes to the user's data or attempt to deliver/quarantine the
message. The results of a classification are printed to stdout in the
following format:
X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80
NOTE: The output of the classification is specific to a user's own
data, and does not include the output of any groups they might be
affiliated with, so it is entirely possible that the message would be
caught as spam by a group the user belongs to, and appear as innocent in
the output of a classification. To get the classification for the
group, use the group name as the user instead of an individual.
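A script that consumes the --classify output might parse the result line along these lines. The Python sketch below assumes the output arrives as a single line in exactly the format shown above; the function name and the keys of the returned dictionary are hypothetical.

```python
def parse_result(line):
    """Parse an X-DSPAM-Result line into a dictionary."""
    header, _, rest = line.partition(":")
    if header.strip() != "X-DSPAM-Result":
        raise ValueError("not an X-DSPAM-Result line")
    fields = [f.strip() for f in rest.split(";") if f.strip()]
    if not fields:
        raise ValueError("X-DSPAM-Result line has no fields")
    parsed = {"user": fields[0]}  # the first field is the user name
    for field in fields[1:]:
        key, _, value = field.partition("=")
        parsed[key.strip()] = value.strip().strip('"')
    # Convert the numeric fields when they are present.
    for key in ("probability", "confidence"):
        if key in parsed:
            parsed[key] = float(parsed[key])
    return parsed


sample = 'X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80'
print(parse_result(sample))
```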
- --signature=[signature]
- If only the signature is available for training, and not
the entire message, the --signature flag may be used to feed the signature
into DSPAM and forego the reading of stdin. DSPAM will process the
signature with whatever commandline classification was specified. NOTE:
This should only be used with --source=error.
- --debug
- If DSPAM was compiled with --enable-debug
then using --debug will turn on debugging messages to /tmp/dspam.debug.
- --daemon
- If DSPAM was compiled with --enable-daemon
then using --daemon will cause DSPAM to enter daemon mode, where it will
listen for DSPAM clients to connect and actively service requests.
- --client
- If DSPAM was compiled with --enable-daemon
then using --client will cause DSPAM to act as a client and attempt to
connect to the DSPAM server specified in the client's configuration within
dspam.conf. If client behavior is desired, this option must be
specified; otherwise the agent simply operates in self-contained mode and
processes the message on its own, eliminating any benefit of using the
daemon.
- --rcpt-to
- If DSPAM is configured to deliver via LMTP or
SMTP, this flag may be used to define the RCPT TOs which will be used for
the delivery of each user specified with --user. If no recipients are
provided, the RCPT TOs will match the username. NOTE: The recipient list
should always be balanced with the user list, or empty. Specifying an
unbalanced number of recipients to users will result in undefined
behavior.
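Because an unbalanced recipient list leads to undefined behavior, a wrapper script may want to check the lists before invoking the agent. The Python sketch below shows one such check; the function name is hypothetical, and rejecting the unbalanced case with an error is a choice made by this sketch, not something the agent is documented to do.

```python
def pair_recipients(users, rcpt_to=()):
    """Pair each user with the RCPT TO address used for its delivery."""
    if not rcpt_to:
        # With no recipients given, the RCPT TO matches the username.
        return [(user, user) for user in users]
    if len(rcpt_to) != len(users):
        raise ValueError("recipient list must be empty or match the user list")
    return list(zip(users, rcpt_to))


print(pair_recipients(["alice", "bob"]))
print(pair_recipients(["alice", "bob"], ["alice@example.com", "bob@example.com"]))
```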
- --mail-from
- If DSPAM is configured to deliver via LMTP or
SMTP, this flag will set the MAIL FROM sent on delivery of the message.
The default MAIL FROM depends on how the message was originally relayed to
DSPAM. If it was relayed via the commandline, an empty MAIL FROM will be
used. If it was relayed via LMTP, the original MAIL FROM will be used.
EXIT VALUE
- 0
- Operation was successful.
- other
- Operation resulted in an error. If the error involved an
error in calling the delivery agent, the exit value of the delivery agent
will be returned.
AUTHORS
Jonathan A. Zdziarski
For more information, see
http://dspam.nuclearelephant.com.
SEE ALSO
dspam_stats(1),
dspam_corpus(1),
dspam_clean(1),
dspam_dump(1),
dspam_merge(1)