Amber Doctor
March 22, 2022
Code Available at:
https://github.com/amberdoctor/redis_bloom_filter
Slides available at:
A bit array and multiple hashing functions!
* typically the same function with different seeds.
Bit | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Item1
Item2
Fewer than 10 bits per item are needed for
1% false positive probability
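The figure above follows from the standard sizing formulas for an optimally tuned Bloom filter: bits per item = -ln(p) / (ln 2)^2, and number of hash functions k = -log2(p). A minimal sketch of that arithmetic; the function names are illustrative and not part of the talk's code:

```python
import math


def bits_per_item(p: float) -> float:
    """Bits needed per stored item for a target false positive probability p,
    assuming the optimal number of hash functions is used."""
    return -math.log(p) / (math.log(2) ** 2)


def optimal_hash_count(p: float) -> int:
    """Optimal number of hash functions k = -log2(p), rounded to an integer."""
    return max(1, round(-math.log2(p)))


if __name__ == "__main__":
    target = 0.01  # 1% false positive probability
    print(f"bits per item: {bits_per_item(target):.2f}")
    print(f"hash functions: {optimal_hash_count(target)}")
```

For p = 0.01 this comes to roughly 9.6 bits per item with 7 hash functions, which is where "fewer than 10 bits" comes from.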
Bit | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Grocery
Gas
Restaurant
Bit | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Grocery
Gas
Restaurant
???? : Probably Exists (False Positive)
Restaurant : Probably Exists (Correctly Found)
Scammer : Does Not Exist
Take an array of m bits. For each of up to n different elements, use k hash functions to choose k bit positions, then either set those bits (to add the element) or test them (to look it up).
If all bits are set, the element probably already exists, with a false positive rate of p;
If any of the bits are not set, the element certainly does not exist.
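The definition above can be sketched in a few lines of plain Python. This is an illustration only, not how Redis implements its Bloom filter: the class name, the choice of SHA-256, and the seed-prefix trick for deriving k hash functions are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class SimpleBloomFilter:
    """Toy Bloom filter: m bits and k seeded hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # One byte per bit keeps the example readable; a real filter packs bits.
        self.bits = bytearray(m)

    def _positions(self, item: str) -> Iterator[int]:
        # Same hash function with k different seeds, as described on the slide.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "probably exists"; False means "certainly does not exist".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = SimpleBloomFilter(m=1024, k=3)
    for place in ("Grocery", "Gas", "Restaurant"):
        bf.add(place)
    print(bf.might_contain("Restaurant"))
    print(bf.might_contain("Scammer"))
```

An added item is always reported as present, because `add` sets exactly the bits that `might_contain` tests. An item that was never added is usually reported as absent, but can collide with bits set by other items.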
p = (1 - e^{-kn/m})^k
Chances of false positives are low, but non-zero.
As the Bloom filter “fills up” the chances increase.
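To make the "filling up" effect concrete, the formula p = (1 - e^(-kn/m))^k can be evaluated for a fixed filter size as more items are added. A small sketch; the filter size and item counts below are arbitrary values chosen for illustration:

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive probability for m bits, k hashes, n items."""
    return (1 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    m, k = 9586, 7  # sized for about 1,000 items at roughly 1%
    for n in (250, 500, 1000, 2000, 4000):
        print(f"n={n:>5}  p={false_positive_rate(m, k, n):.4f}")
```

The rate stays close to the target while n is at or below the planned capacity and climbs quickly once the filter holds more items than it was sized for.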
Redis makes it easy to set the rate with the command
BF.RESERVE (exposed as create() in redis-py)
import redis
# Redis Client
r_client = redis.Redis(host=host, port=port, db=0)
# Create a named filter with the specified error rate and capacity
# where 0 < error_rate < 1 and capacity is the expected number of items
r_client.bf().create(bloom_filter_name, error_rate, capacity)
import redis
# Redis Client
r_client = redis.Redis(host=host, port=port, db=0)
# Add to bloom filter
# if the bf doesn't exist yet, it will be created
r_client.bf().add(bloom_filter_name, item)
# Check if item exists in bloom filter
r_client.bf().exists(bloom_filter_name, item)
# Add to set - if set doesn't exist, it will be created
r_client.sadd(set_name, item)
# Check if item exists in set
r_client.sismember(set_name, item)
Side-by-side examples of Redis Bloom Filter and Redis Set
Reminder: passwords should always be stored hashed.
As part of their fraud detection, credit card companies can store information about previous purchase locations to check whether you have shopped somewhere before.
Chrome checks your URL against a bloom filter with known threats and warns you if a site might be a threat.
Media providers can check a bloom filter to avoid showing you content that you may have already seen.
A GET request from ComputerA goes to its nearest proxy, Proxy1.
Proxy1 searches its cache summary (a Bloom filter) and finds that the page may have been cached by Proxy3.
Proxy1 sends a request to Proxy3 to get the page and returns the page to ComputerA.
If Proxy3 does not have the page (a false positive), Proxy1 just asks the provider.
Before caching an item, check whether it exists in a Bloom filter. If it does not, skip caching on this first lookup. Add the item to the filter after the check so it is there for future checks.
Saves caching items that are only accessed once.
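The admission rule above can be sketched as follows. To keep the example runnable without a Redis server, a plain Python `set` stands in for the Bloom filter; with Redis the membership check and the add would be the `bf().exists` and `bf().add` calls shown earlier. The class and method names are hypothetical.

```python
from typing import Callable, Dict, Set


class SecondHitCache:
    """Caches a value only once its key has been requested before."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()       # stand-in for a Bloom filter
        self.cache: Dict[str, str] = {}   # the real cache

    def get(self, key: str, load: Callable[[str], str]) -> str:
        if key in self.cache:
            return self.cache[key]
        value = load(key)
        if key in self.seen:
            # Second request or later: the item has earned a cache slot.
            self.cache[key] = value
        else:
            # First request: remember the key, but do not cache the value yet.
            self.seen.add(key)
        return value


if __name__ == "__main__":
    cache = SecondHitCache()
    for key in ("a", "b", "a", "a"):
        cache.get(key, lambda k: k.upper())
    print(sorted(cache.cache))
```

With a real Bloom filter in place of the set, a false positive would only cause an item to be cached one request early, which is harmless.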
Before performing an expensive hard disk look up,
check if the rows / columns exist in a bloom filter.
Increases query performance.
Especially good in situations where a high number of queries would result in no data.
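A minimal sketch of that guard, again with a Python `set` standing in for the Bloom filter and a dictionary standing in for the on-disk table. All names here are illustrative; the point is only the order of the two checks.

```python
from typing import Dict, Optional, Set


class GuardedStore:
    """Consults a membership filter before doing the expensive lookup."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = rows                      # stand-in for data on disk
        self.filter: Set[str] = set(rows)     # stand-in for a Bloom filter
        self.disk_reads = 0                   # counts expensive lookups

    def _read_from_disk(self, key: str) -> Optional[str]:
        self.disk_reads += 1
        return self.rows.get(key)

    def lookup(self, key: str) -> Optional[str]:
        if key not in self.filter:
            # "Certainly does not exist": skip the disk entirely.
            return None
        # "Probably exists": the disk read confirms, covering false positives.
        return self._read_from_disk(key)


if __name__ == "__main__":
    store = GuardedStore({"row1": "alpha", "row2": "beta"})
    for key in ("row1", "missing1", "missing2", "row2"):
        store.lookup(key)
    print(store.disk_reads)
```

Because a Bloom filter never reports a stored key as absent, skipping the disk on a negative answer cannot lose data; a false positive only costs one unnecessary read.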
Contact Amber Doctor at:
amberdoctor+RedisBloomFilter@gmail.com