Redis Bloom Filter

Amber Doctor

March 22, 2022

 

Code Available at:

https://github.com/amberdoctor/redis_bloom_filter

 

Slides available at:

https://slides.com/amberdoctor/redis-bloom-filter

protects grandma

What is a Redis Bloom Filter?

  • Space-efficient
  • Probabilistic data structure
  • Used to test whether something is
    • Definitely NOT in the collection
    • Possibly in the collection
  • Uses hashes for faster & smaller data structures
  • Trades certainty for efficiency
  • Available in the RedisBloom module
  • Great way to protect Grandma

How does it work?

A bit array and multiple hashing functions!

 * typically the same function with different seeds. 

Bit 0 1 0 0 0 0 1 1
Index 0 1 2 3 4 5 6 7

Item1

Item2

Fewer than 10 bits per item are needed for a 1% false positive probability.
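As a rough check of that figure, the standard sizing formula for an optimally configured Bloom filter is bits per item = -ln(p) / (ln 2)^2. The helper name below is made up for this sketch.

```python
import math


def bits_per_item(false_positive_rate: float) -> float:
    """Bits needed per stored item for an optimally sized Bloom filter."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


# For a 1% false positive probability this comes out just under 10 bits.
print(round(bits_per_item(0.01), 2))
```

Lower target rates cost more bits per item, but the growth is only logarithmic in 1/p.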

Adding Items

Bit 1 0 1 0 0 1 1 1
Index 0 1 2 3 4 5 6 7
  1. Create a filter of 8 bits
  2. Each item is hashed with more than one hashing function
  3. Each hash is then mapped to an index
  4. For each index, set the bit to 1 (bits that are already 1 stay 1)
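The steps above can be sketched in plain Python. This is a toy illustration, not how RedisBloom is implemented: the choice of SHA-256 with a seed prefix, the two hash functions, and the names `indexes_for` and `add` are all assumptions made for the example.

```python
import hashlib

NUM_BITS = 8    # tiny filter, matching the 8-bit example above
NUM_HASHES = 2  # assumed number of hash functions for this sketch


def indexes_for(item: str) -> list[int]:
    """Hash the item once per seed and map each digest to a bit index."""
    result = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        result.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return result


def add(bits: list[int], item: str) -> None:
    """Set the bit at every index the item hashes to."""
    for index in indexes_for(item):
        bits[index] = 1


bits = [0] * NUM_BITS
for place in ["Grocery", "Gas", "Restaurant"]:
    add(bits, place)
print(bits)
```

Different items may land on the same index, which is why bits are set rather than toggled.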

Grocery

Gas

Restaurant

Checking for an item

Bit 1 0 1 0 0 1 1 1
Index 0 1 2 3 4 5 6 7
  1. Use each hashing function on the item to check
  2. Check the bits at all of the resulting indexes
  3. If the bit at any index is 0, the item is definitely not in the set
  4. If the bits at all indexes are 1, the item is probably in the set
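A minimal sketch of the check, again in plain Python rather than Redis. The class name `ToyBloomFilter`, the seeded SHA-256 hashing, and the default sizes are assumptions for illustration only.

```python
import hashlib


class ToyBloomFilter:
    """Toy in-memory Bloom filter; sizes are arbitrary example values."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _indexes(self, item: str):
        # Same hash function with a different seed each time.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for index in self._indexes(item):
            self.bits[index] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 bit proves the item was never added; all 1s means "probably".
        return all(self.bits[index] for index in self._indexes(item))


seen = ToyBloomFilter()
for place in ["Grocery", "Gas", "Restaurant"]:
    seen.add(place)

print(seen.might_contain("Restaurant"))  # True: added items are always found
print(seen.might_contain("Scammer"))     # usually False; True would be a false positive
```

An item that was added can never be reported as missing, because its bits are only ever set, never cleared.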

Grocery

Gas

Restaurant

????: Probably Exists (False Positive)

Restaurant: Probably Exists (Correctly Found)

Scammer: Does Not Exist

Calculating Error Rate

Take an array of m bits and, for up to n different elements, either test or set the k bit positions chosen by the hash functions.

If all k bits are set, the element probably already exists, with a false positive rate of p.

If any of the bits is not set, the element certainly does not exist.

$p = (1 - e^{-kn/m})^k$
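To get a feel for the formula, here is a small sketch that evaluates it directly. The function name and the sample values for m, n, and k are made up for the example.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate for m bits, n items, k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k


# About 9.6 bits per item with 7 hash functions gives roughly 1%.
print(false_positive_rate(m=9586, n=1000, k=7))

# Doubling the number of items in the same array pushes the rate up sharply.
print(false_positive_rate(m=9586, n=2000, k=7))
```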

Controlling Error Rate

Chances of false positives are low, but non-zero.

 

As the Bloom filter “fills up” the chances increase.

 

Redis makes it easy to set the rate with the command

BF.RESERVE

(exposed in redis-py as bf().create)

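import redis

# Redis Client (host and port are placeholders for your connection settings)
r_client = redis.Redis(host=host, port=port, db=0)

# Create filter with specified error rate and capacity
# where 0 < error_rate < 1 and capacity is the expected number of items
# (the filter name is the first argument)
r_client.bf().create(bloom_filter_name, error_rate, capacity)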

When to use a Bloom Filter

Stronger Passwords

When not to use a Bloom Filter

Call Blocker

Pros

  • Memory Efficient
  • Accurately indicates item is NOT in the collection
  • Typically exhibits better performance and scalability when inserting items than a Cuckoo Filter (another option in the RedisBloom module)

Cons

  • Can report false positives that something is in the collection
  • Chance of a false positive increases as items are added to the collection
  • Cannot remove an item from the collection
  • Cannot tell which items are in the collection

Bloom Filter

  • When a little error is tolerable
  • Large sets of data where space efficiency is important
  • Do not need to know what has been stored

Set

  • No error tolerance
  • Need to be able to remove items
  • Need to know what items have been stored

Code Excerpts

import redis

# Redis Client
r_client = redis.Redis(host=host, port=port, db=0)


# Add to bloom filter
# if the bf doesn't exist yet, it will be created
r_client.bf().add(bloom_filter_name, item)

# Check if item exists in bloom filter
r_client.bf().exists(bloom_filter_name, item)


# Add to set - if set doesn't exist, it will be created
r_client.sadd(set_name, item)

# Check if item exists in set
r_client.sismember(set_name, item)

Code in Context

Side-by-side examples of Redis Bloom Filter and Redis Set

 

https://github.com/amberdoctor/redis_bloom_filter

A password uniqueness checker

Demo Links

Example Use Cases

  • Weak / Duplicate Password Detection
  • Financial Fraud Detection
  • Safe browsing in Google Chrome
  • Content Recommendation System
  • Internet Cache Protocol
  • Prevent One-Hit-Wonders Caching
  • Prevent Expensive Hard Disk Look Up

Weak Password Detection

  • Known Bad Password Checking:
    • Maintain a list of weak passwords to check against, and warn users to choose a better password.
  • Password Uniqueness:
    • Check that a user does not reuse their password.

 

Reminder: passwords should always be stored hashed.
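A minimal sketch of the known-bad password check, assuming a redis-py client connected to a server with the RedisBloom module loaded. The key name `weak_passwords` and both function names are hypothetical. SHA-256 is used here only to keep plaintext out of the filter; it is not meant as a password storage scheme.

```python
import hashlib

WEAK_PASSWORDS_FILTER = "weak_passwords"  # hypothetical key name


def _digest(password: str) -> str:
    """Hash the password so plaintext never goes into the filter."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register_weak_password(client, password: str) -> None:
    """Add a known weak password to the Bloom filter."""
    client.bf().add(WEAK_PASSWORDS_FILTER, _digest(password))


def is_probably_weak(client, password: str) -> bool:
    """True means "probably on the weak list"; False means definitely not."""
    return bool(client.bf().exists(WEAK_PASSWORDS_FILTER, _digest(password)))
```

With `client = redis.Redis(...)`, a signup form could call `is_probably_weak(client, candidate)` and ask for a different password when it returns True. A false positive only costs the user a second attempt, which is why a little error is tolerable here.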

Financial Fraud Detection

As part of their fraud detection, credit card companies can store information about previous purchase locations to check whether you have shopped somewhere before.

Safe browsing in Google Chrome

Chrome checks your URL against a Bloom filter of known threats and warns you if a site might be a threat.

Content Recommendation System

Media providers can check a Bloom filter to avoid showing you content that you may have already seen.

Internet Cache Protocol

A GET request from ComputerA goes to its nearest proxy, Proxy1.

Proxy1 searches its cache (a Bloom filter) and finds that the page may have been cached by Proxy3.

Proxy1 sends a request to Proxy3 to get the page and returns the page to ComputerA.

If Proxy3 does not have the page, Proxy1 will just ask the provider.

Prevent One-Hit-Wonders Caching

Before caching an item, check whether it already exists in a Bloom filter. If it does not, skip caching it on this first lookup. Add the item to the filter after the check so that it is there for future checks.

This avoids caching items that are only accessed once.
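One way this pattern could look with redis-py, as a sketch only. The key name `seen_once`, the `fetch` function, the plain dict standing in for the cache, and the `load` callback are all assumptions for the example.

```python
SEEN_FILTER = "seen_once"  # hypothetical key name


def fetch(client, cache: dict, key: str, load):
    """Return the value for key, caching it only from the second request on."""
    if key in cache:
        return cache[key]

    value = load(key)
    if client.bf().exists(SEEN_FILTER, key):
        # Seen before (or a false positive): worth caching now.
        cache[key] = value
    else:
        # First sighting: remember it in the filter, but do not cache yet.
        client.bf().add(SEEN_FILTER, key)
    return value
```

A false positive here just means an item gets cached one request early, so the error is cheap.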

Prevent Expensive Hard Disk Look Up

Before performing an expensive hard disk lookup, check whether the rows / columns exist in a Bloom filter.

This increases query performance, especially in situations where a high number of queries would return no data.
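A rough sketch of the guard, assuming a redis-py client with RedisBloom available and a filter that already holds every key stored on disk. The function name and the `slow_lookup` callback are hypothetical.

```python
def lookup_row(client, filter_name: str, row_key: str, slow_lookup):
    """Skip the expensive disk read when the filter says the key is absent."""
    if not client.bf().exists(filter_name, row_key):
        # Definitely not stored: no need to touch the disk.
        return None
    # Probably stored; on a false positive the disk read simply finds nothing.
    return slow_lookup(row_key)
```

The filter never hides a row that exists, so the only cost of a false positive is one wasted disk read.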

Thank You

Contact Amber Doctor at:

amberdoctor+RedisBloomFilter@gmail.com

 

