NAME¶
Bloom::Filter - Sample Perl Bloom filter implementation
DESCRIPTION¶
A Bloom filter is a probabilistic algorithm for doing existence tests in less
memory than a full list of keys would require. The tradeoff to using Bloom
filters is a certain configurable risk of false positives. This module
implements a simple Bloom filter with configurable capacity and false positive
rate. Bloom filters were first described in a 1970 paper by Burton Bloom, see
<
http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal>.
SYNOPSIS¶
use Bloom::Filter
my $bf = Bloom::Filter->new( capacity => 10, error_rate => .001 );
$bf->add( @keys );
while ( <> ) {
chomp;
print "Found $_\n" if $bf->check( $_ );
}
CONSTRUCTORS¶
- new %PARAMS
- Create a brand new instance. Allowable params are
"error_rate", "capacity".
- init
- Calculates the best number of hash functions and optimum
filter length, creates some random salts, and generates a blank bit
vector. Called automatically by constructor.
ACCESSORS¶
- capacity
- Returns the total capacity of the Bloom filter
- error_rate
- Returns the configured maximum error rate
- length
- Returns the length of the Bloom filter in bits
- key_count
- Returns the number of items currently stored in the
filter
- on_bits
- Returns the number of 'on' bits in the filter
- salts
- Returns the list of salts used to create the hash
functions
PUBLIC METHODS¶
- add @KEYS
- Adds the list of keys to the filter. Will fail, return
"undef" and complain if the number of keys in the filter exceeds
the configured capacity.
- check @KEYS
- Checks the provided key list against the Bloom filter, and
returns a list of equivalent length, with true or false values depending
on whether there was a match.
INTERNAL METHODS¶
- _calculate_shortest_filter_length CAPACITY ERR_RATE
- Given a desired error rate and maximum capacity, returns
the optimum combination of vector length (in bits) and number of hash
functions to use in building the filter, where "optimum" means
shortest vector length.
- _get_cells KEY
- Given a key, hashes it using the list of salts and returns
an array of cell indexes corresponding to the key.
AUTHOR¶
Maciej Ceglowski <maciej@ceglowski.com>
CHANGELOG¶
Feb 2007 big speedup by Dmitriy Ryaboy <dmitriy.ryaboy@ask.com> (thanks!)
COPYRIGHT AND LICENSE¶
(c) 2004 Maciej Ceglowski
This is free software, distributed under version 2 of the GNU Public License
(GPL).