SASI and Secondary Indexes

Hi!

  • Computer Engineer
  • Programming
  • Electronics
  • Math <3 <3
  • Physics
  • Lego
  • Meetups
  • Animals
  • Coffee
  • GIFs

Let's talk about indexes

Regular index

ID    name    email
110   "Foo"   "foo@foo.com"
111   "Bar"   "bar@bar.com"

You can look up a user by ID and get their email.

ID => email

What happens if, given an email, you want to retrieve the user ID?

Let's do some math:

[Figure: a function plotted on x and y axes, next to its inverse function]

In terms of matrices:

ID    email
110   "foo@foo.com"
111   "bar@bar.com"

email           ID
"foo@foo.com"   110
"bar@bar.com"   111

Regular Index

Form Input

We would have to iterate over the users, compare the emails, and then get the ID.

But by analogy with the inverse function, we can find the ID very easily!

What if we could have an extra index on a column to help with that?

ID    name    email
110   "Foo"   "foo@foo.com"
111   "Bar"   "bar@bar.com"

Regular index (key)

Secondary index (on column)
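
A minimal Python sketch of the idea, assuming the whole table fits in a dictionary and that emails are unique. The names `users_by_id`, `ids_by_email`, `find_id_by_scan` and `find_id_by_index` are made up for illustration and are not Cassandra APIs. The regular index maps the key to the row; the secondary index is the inverse mapping from the email column back to the ID.

```python
# Toy model of the table above: the regular index maps the key (ID) to the row.
users_by_id = {
    110: {"name": "Foo", "email": "foo@foo.com"},
    111: {"name": "Bar", "email": "bar@bar.com"},
}


def find_id_by_scan(email):
    """Without a secondary index: iterate over users and compare emails."""
    for user_id, row in users_by_id.items():
        if row["email"] == email:
            return user_id
    return None


# Secondary index on the email column: the inverse mapping, email -> ID.
# Assumption: emails are unique, so each email maps to exactly one ID.
ids_by_email = {row["email"]: user_id for user_id, row in users_by_id.items()}


def find_id_by_index(email):
    """With the secondary index: a single dictionary lookup."""
    return ids_by_email.get(email)
```

Both functions return the same ID; the difference is that the scan touches every row while the index lookup touches one entry.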

Secondary Indexes - The good parts

  • Practical
  • Easy to make
  • Easy to understand
  • Good performance for small sets of data

Secondary Indexes - The bad parts

  • Slow for large sets of data
  • Each node indexes only its own data, so the index is local rather than global
  • In a ring with 5 instances, 1 query => 5 reads
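
The "1 query => 5 reads" point can be sketched in Python. This is a toy model, not Cassandra code: it assumes a replication factor of 1, places rows with a simple modulo on the ID, and uses hypothetical names (`Node`, `build_ring`, `query_by_email`). Because rows are placed by primary key, the email says nothing about where a row lives, so the query has to ask every node.

```python
class Node:
    """One instance in the ring, holding a local index over its own rows only."""

    def __init__(self):
        self.local_index = {}  # email -> list of IDs stored on this node
        self.reads = 0         # how many times this node was asked

    def insert(self, user_id, email):
        self.local_index.setdefault(email, []).append(user_id)

    def query(self, email):
        self.reads += 1
        return self.local_index.get(email, [])


def build_ring(size):
    return [Node() for _ in range(size)]


def insert(ring, user_id, email):
    # Placement depends on the primary key only (toy stand-in for a partitioner).
    ring[user_id % len(ring)].insert(user_id, email)


def query_by_email(ring, email):
    # The coordinator cannot tell which node indexed this email: ask them all.
    results = []
    for node in ring:
        results.extend(node.query(email))
    return results


ring = build_ring(5)
insert(ring, 110, "foo@foo.com")
insert(ring, 111, "bar@bar.com")
found = query_by_email(ring, "foo@foo.com")
total_reads = sum(node.reads for node in ring)
```

One query for a single email ends with `total_reads` equal to the number of nodes in the ring.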

Several Strategies for the Secondary Index

References

  • https://dzone.com/articles/cassandra-indexing-good-bad
  • http://www.slideshare.net/edanuff/indexing-in-cassandra
  • http://blog.websudos.com/2014/08/23/a-series-on-cassandra-part-2-indexes-and-keys/
  • http://brianoneill.blogspot.com.br/2012/03/cassandra-indexing-good-bad-and-ugly.html
  • https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes

All is not lost

Apple open-sourced their secondary index strategy

SSTableAttachedSecondaryIndex (SASI)

https://github.com/xedin/sasi

SASI through an example

CREATE KEYSPACE foo
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': '1'
  };

USE foo;

CREATE TABLE bar (
  id uuid,
  fname text,
  lname text,
  age int,
  created_at bigint,
  PRIMARY KEY (id)
) WITH COMPACT STORAGE;

NOTE: COMPACT STORAGE IS MANDATORY!!11!!111 (for now)

Creating the indexes

case_sensitive

CREATE CUSTOM INDEX ON bar (fname)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {
  'analyzer_class': 'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer',
  'case_sensitive': 'false'
};

https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/analyzer/StandardAnalyzer.java

'mode': 'SUFFIX'

CREATE CUSTOM INDEX ON bar (lname) 
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SUFFIX'};

Indexes each value by its suffixes.
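
To see why indexing by suffix helps, here is a toy Python sketch. It is not SASI's real data structure or on-disk format; `build_suffix_index` and `ends_with` are hypothetical helpers and the sample last names are invented. Storing every suffix of every value turns an "ends with" question into a plain lookup.

```python
from collections import defaultdict


def build_suffix_index(rows):
    """rows: dict of row id -> last name. Returns suffix -> set of row ids."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        # Register every suffix of the value: "Silva" -> "Silva", "ilva", ..., "a".
        for start in range(len(value)):
            index[value[start:]].add(row_id)
    return index


def ends_with(index, suffix):
    """Row ids whose value ends with the given suffix, in sorted order."""
    return sorted(index.get(suffix, set()))


rows = {1: "Silva", 2: "Souza", 3: "Paiva"}
suffix_index = build_suffix_index(rows)
```

With this data, `ends_with(suffix_index, "va")` finds rows 1 and 3 without scanning the table. The price is a larger index, since each value contributes one entry per character.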

'mode': 'SPARSE'

CREATE CUSTOM INDEX ON bar (created_at) 
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

Suited to range queries over timestamps.
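
A range query over timestamps can be sketched in Python as a sorted list plus binary search. This only illustrates the kind of query involved and is not how SPARSE mode is implemented; `TimestampIndex` and the sample values are assumptions made for the example.

```python
import bisect


class TimestampIndex:
    """Keeps (created_at, row_id) pairs sorted so ranges are two binary searches."""

    def __init__(self, rows):
        # rows: iterable of (row_id, created_at) pairs.
        self.entries = sorted((ts, row_id) for row_id, ts in rows)
        self.timestamps = [ts for ts, _ in self.entries]

    def between(self, start, end):
        """Row ids with start <= created_at <= end, ordered by timestamp."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return [row_id for _, row_id in self.entries[lo:hi]]


index = TimestampIndex([("a", 1000), ("b", 1500), ("c", 2000), ("d", 2500)])
```

Calling `index.between(1200, 2000)` returns the rows created in that window without touching the others.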

Understanding the basics of SASI architecture

[Diagram: SASI sits on top of Cassandra and has two parts, Indexing and Querying; each part works across memory and disk]

SASI takes advantage of Cassandra's architecture to build a powerful set of data structures that manage information both in memory and on disk.

Cassandra's key features for SASI

  • Write-once
  • Immutability
  • Ordered data sets

memory => disk

  • SSTable starts writing: SASI creates structures in memory
  • SSTable ends writing: SASI flushes structures to disk

Indexing

[Diagram: the indexed columns of an SSTable feed OnDiskIndexBuilder in memory; on disk, OnDiskIndex works with the index files written by SASI]

Optimised data structures (List<ByteBuffer> and custom iterators)

Information from the SSTable is turned into these structures.
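
The indexing flow (collect in memory while the SSTable is written, flush when it is done) can be sketched in Python. This is a toy model under loud assumptions: `ToyIndexBuilder` is a made-up class, not the real OnDiskIndexBuilder, and the "flushed" result is just a sorted tuple standing in for an immutable index file.

```python
class ToyIndexBuilder:
    """Collects (term, key) pairs in memory while an SSTable is being written."""

    def __init__(self):
        self.terms = {}  # term -> list of row keys seen so far

    def add(self, term, key):
        self.terms.setdefault(term, []).append(key)

    def finish(self):
        """Called once the SSTable is complete: freeze into a sorted, immutable form."""
        return tuple(
            (term, tuple(sorted(keys)))
            for term, keys in sorted(self.terms.items())
        )


builder = ToyIndexBuilder()
builder.add("foo", 110)
builder.add("bar", 111)
builder.add("foo", 112)
on_disk = builder.finish()
```

Because the SSTable never changes after it is written, the frozen index never needs updating either; a new SSTable simply gets its own index.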

Querying

  • Index files by SASI
  • QueryPlan: analysis, then execution
  • RangeUnionIterator and RangeIntersectionIterator
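
Going only by their names, the two iterators combine sorted streams of results: union for OR, intersection for AND. The Python sketch below shows that general technique on sorted lists of tokens. It is not a port of SASI's Java classes; `range_union` and `range_intersection` are hypothetical helpers that assume each input is sorted, holds no duplicates and contains no `None` values.

```python
import heapq


def range_union(*sorted_streams):
    """Merge sorted streams into one sorted stream, dropping duplicates (OR)."""
    previous = object()  # sentinel that equals no token
    for token in heapq.merge(*sorted_streams):
        if token != previous:
            yield token
            previous = token


def range_intersection(left, right):
    """Yield only the tokens present in both sorted streams (AND)."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)   # left is behind: advance it
        else:
            b = next(right, None)  # right is behind: advance it
```

Both helpers are lazy generators, so they walk each input once and never materialise the full result; this works only because the data sets are ordered.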

Limitations

  • Cluster must be configured to use a partitioner that produces LongTokens (Murmur3Partitioner). Does not work with ByteOrderedPartitioner.
  • CQL3 tables require COMPACT STORAGE.
  • Only Cassandra 2.0.x is supported. If you don't have plans to upgrade, you cannot use SASI properly.

Conclusion

Powerful tool for Secondary Indexes

Let's study data structures and more Mathematics :)

References

  • Our meetup group in Sao Paulo - http://www.meetup.com/Sao-Paulo-Cassandra-Users/
  • SASI repo https://github.com/xedin/sasi
  • Cassandra initial docs for secondary indexes http://docs.datastax.com/en/archived/cassandra/1.1/docs/ and http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
  • SecondaryIndexes Q/A https://wiki.apache.org/cassandra/SecondaryIndexes
  • SASI on Cassandra https://issues.apache.org/jira/browse/CASSANDRA-10661

Special Thanks

  • @PatrickMcFadin
  • @wheresLINA
  • @planetcassandra
  • @lafp, @romulostores and @pedrofelipee (GIFs)

Thank you :)

Questions?

 

hannelita@gmail.com

@hannelita

@planetcassandra

@datastax