IFILE(1) | User Commands | IFILE(1) |
NAME
ifile - core executable for the ifile mail filtering system
SYNOPSIS
ifile [-b file] [-q|-Q] [-g] [-k] [-o] [-v num] [lexing options] file ...
DESCRIPTION
ifile is a mail filter client that uses machine learning to classify e-mail into folders (mailboxes). It uses the Naive Bayes algorithm, which treats each document as an unordered collection of words and assigns it to the folder whose word distribution most closely matches that of the document.
OPTIONS
- -b, --db-file=file
- Location to read/store ifile database. Default is ~/.idata
- -c, --concise
- equivalent of "ifile -v 0 | head -1 | cut -f1 -d". Must be used with -q or -Q.
- -d, --delete=folder
- Delete the statistics for each of the files from the category folder
- -f, --folder-calcs=folder
- Show the word-probability calculations for folder
- -g, --log-file
- Create and store debugging information in ~/.ifile.log
- -i, --insert=folder
- Add the statistics for each of the files to the category folder
- -k, --keep-infrequent
- Keep infrequently occurring words in the database (normally they are discarded)
- -l, --query-loocv=folder
- For each of the files, temporarily remove the file from folder, perform a query, and then reinsert the file in folder. The database is not modified.
- -o, --occur
- Use a bit-vector document representation: count each word once per document.
- -q, --query
- Output rating scores for each of the files
- -Q, --query-insert
- For each of the files, output rating scores and add statistics for the folder with the highest score
- -T, --threshold=threshold
- When used with both -c and -q, output the two highest-ranking categories if their scores differ by at most threshold / 1000, which can be used to detect border cases. When used with -q only and any threshold > 0, output the score difference as a percentage. For example,
ifile -T1 -q foo.txt
might result in
ifile -T93 -q -c foo.txt
will result in
foo.txt spam,non-spam
whereas
ifile -T92 -q -c foo.txt
will result in
foo.txt spam
- -r, --reset-data
- Erases all currently stored information
- -u, --update=folder
- Same as --insert, except that statistics are added only if folder already exists
- -v, --verbosity=num
- Amount of output while running: 0=silent, 1=quiet, 2=progress, 3=verbose, 4=debug
- -a, --alpha-lexer
- Lex words as sequences of alphabetic characters (default)
- -A, --alpha-only-lexer
- Only lex space-separated character sequences which are composed entirely of alphabetic characters
- -h, --strip-header
- Skip all of the header lines except Subject:, From: and To:
- -m, --max-length=char
- Ignore the portion of the message after the first char characters. Use the entire message if char is set to 0. Default is 50,000.
- -p, --print-tokens
- Just tokenize and print, don't do any other processing. Documents are returned as a list of word, frequency pairs.
- -s, --no-stoplist
- Do not throw out overly frequent (stoplist) words when lexing
- -S, --stemming
- Use 'Porter' stemming algorithm when lexing documents
- -w, --white-lexer
- Lex words as sequences of space separated characters
- -?, --help
- Give this help list
- --usage
- Give a short usage message
- -V, --version
- Print program version
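The border-case rule described for -T together with -c and -q can be illustrated with a short Python sketch. This is not ifile's code: the function name concise_categories and the scores are hypothetical, and the sketch assumes that "differ" means the plain difference between the two best scores. How ifile itself normalizes that difference is not stated here.

```python
def concise_categories(scores, threshold=0):
    """Return the best category, plus the runner-up for border cases.

    scores maps category name -> rating score (higher is better).
    Assumption: the two best scores are compared by plain subtraction.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0]
    if threshold > 0 and len(ranked) > 1:
        runner_up = ranked[1]
        # Border case: the two best scores are within threshold / 1000.
        if best[1] - runner_up[1] <= threshold / 1000:
            return [best[0], runner_up[0]]
    return [best[0]]


# Made-up scores for illustration only; this is not real ifile output.
example_scores = {"spam": -10.0, "non-spam": -10.0925, "friends": -12.5}
print(",".join(concise_categories(example_scores, threshold=93)))
print(",".join(concise_categories(example_scores, threshold=92)))
```

With these invented scores the gap between the two best categories is about 0.0925, so a threshold of 93 reports both categories and a threshold of 92 reports only the best one.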
FILES
- ~/.idata
- ifile database (default location). See FAQ included in ifile package for description of database format.
AUTHOR
Jason Rennie <jrennie@csail.mit.edu> and many others. See the ChangeLog for the full list.
EXAMPLES
Before using ifile, you need to train it. Let's say that you have three folders, "spam", "ifile" and "friends", and the following directory structure:
/--+--spam----+--1
   |          +--2
   |          +--3
   |
   +--ifile---+--1
   |          +--2
   |          +--3
   |
   +--friends-+--1
              +--2
              +--3
word age folder:count
[folder: count ...]
/--inbox--+--1
          +--2
          +--3
ifile -d ifile -i spam /inbox/1
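To make the Naive Bayes description under DESCRIPTION more concrete, here is a minimal Python sketch of a textbook multinomial Naive Bayes classifier. It is not ifile's implementation. The names tokenize and NaiveBayes are hypothetical, and the lowercasing, add-one smoothing and folder priors are assumptions of this sketch. The insert and query methods loosely mirror -i and -q, the lexer resembles the default -a lexer, and the occur flag resembles -o.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text, occur=False):
    """Lex words as runs of alphabetic characters and count them.

    Lowercasing is an assumption of this sketch. With occur=True each
    word is counted once per document (similar in spirit to -o).
    """
    words = re.findall(r"[A-Za-z]+", text.lower())
    if occur:
        words = set(words)
    return Counter(words)


class NaiveBayes:
    """Textbook multinomial Naive Bayes; hypothetical helper class."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> documents seen

    def insert(self, folder, text, occur=False):
        """Add the word statistics of one document to a folder."""
        self.word_counts[folder].update(tokenize(text, occur))
        self.doc_counts[folder] += 1

    def query(self, text, occur=False):
        """Return (folder, log score) pairs, best match first."""
        doc = tokenize(text, occur)
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for folder, counts in self.word_counts.items():
            total = sum(counts.values())
            # Folder prior; whether ifile uses one is an assumption here.
            score = math.log(self.doc_counts[folder] / total_docs)
            for word, n in doc.items():
                # Add-one (Laplace) smoothing, also an assumption.
                p = (counts[word] + 1) / (total + len(vocab))
                score += n * math.log(p)
            scores[folder] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


nb = NaiveBayes()
nb.insert("spam", "cheap pills buy cheap pills now")
nb.insert("friends", "lunch on friday with friends")
best_folder, _ = nb.query("buy cheap pills")[0]
print(best_folder)  # prints "spam"
```

The scores are log probabilities, so they are negative and the highest (closest to zero) wins. Because word order is ignored, only the word counts of each folder need to be stored.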
SEE ALSO
Examples of how to use ifile together with procmail(1) and metamail(1) can be found in the directory /usr/share/doc/ifile/examples.
November 2004 | ifile 1.3.4