SASI - A Revolution for Secondary Indexes in Cassandra

Hi!

  • Computer Engineer
  • Programming
  • Electronics
  • Math <3 <3
  • Physics
  • Lego
  • Meetups
  • Animals
  • Coffee
  • GIFs

Presentation goals

  • Intro about Secondary Indexes in Cassandra
  • SASI - Why is it cool?
  • Challenges
  • GIFs and funny images

Let's talk about indexes

Regular index

ID name email
110 "Foo" "foo@foo.com"
111 "Bar" "bar@bar.com"

You can search a user by ID and get his/her email.

ID => email

What happens if, given an email, you want to retrieve the user ID?

Let's have some Math:

[Figure: plot on x and y axes illustrating an inverse function]

In terms of matrices:

ID    email
110   "foo@foo.com"
111   "bar@bar.com"

email           ID
"foo@foo.com"   110
"bar@bar.com"   111

 Regular Index

Form Input

We would have to iterate over users, then compare the emails, then get the ID

But, by analogy with the inverse function, we can find the ID very easily!

What if we could have an extra index on a column to help with that?

ID name email
110 "Foo" "foo@foo.com"
111 "Bar" "bar@bar.com"

Regular index (key)

Secondary index (on column)
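To make the idea concrete, here is a minimal Python sketch (not Cassandra code) of the same two-user table. The function and variable names are made up for illustration, and it assumes emails are unique. It compares scanning every row with looking up an inverse email => ID mapping.

```python
# Primary ("regular") index: user ID -> row, as in the table above.
users = {
    110: {"name": "Foo", "email": "foo@foo.com"},
    111: {"name": "Bar", "email": "bar@bar.com"},
}


def find_id_by_scan(email):
    """Without a secondary index: visit every row and compare emails."""
    for user_id, row in users.items():
        if row["email"] == email:
            return user_id
    return None


# Secondary index on the email column: the inverse mapping email -> ID.
# Assumption: emails are unique; otherwise each entry would hold a set of IDs.
email_index = {row["email"]: user_id for user_id, row in users.items()}


def find_id_by_index(email):
    """With the secondary index: a single dictionary lookup."""
    return email_index.get(email)


print(find_id_by_scan("bar@bar.com"))   # 111
print(find_id_by_index("bar@bar.com"))  # 111
```

The price of the faster lookup is a second structure that has to be kept in sync on every write.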

Secondary Indexes - The good parts

  • Practical / convenient
  • Easy to make
  • Easy to understand
  • Performance is "okay" for small sets of data

Secondary Indexes - The bad parts

  • Slow for large sets of data
  • They are applied locally, instead of globally
  • 5 instances, 1 query => 5 reads.
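The "5 instances, 1 query => 5 reads" point can be simulated with a small Python sketch. The Node class and the data layout are hypothetical and only model the idea of a local index. Each node indexes just its own rows, so the coordinator cannot tell in advance which node holds a match.

```python
class Node:
    """Hypothetical node: holds some rows plus an index over ONLY those rows."""

    def __init__(self, rows):
        self.rows = rows            # user ID -> email
        self.local_index = {}       # email -> set of user IDs on this node
        for user_id, email in rows.items():
            self.local_index.setdefault(email, set()).add(user_id)
        self.reads = 0              # how many times this node was queried

    def query(self, email):
        self.reads += 1
        return self.local_index.get(email, set())


# Made-up cluster of 5 nodes with one row each.
nodes = [
    Node({110: "foo@foo.com"}),
    Node({111: "bar@bar.com"}),
    Node({112: "baz@baz.com"}),
    Node({113: "qux@qux.com"}),
    Node({114: "foo@foo.com"}),
]


def find_ids(email):
    """The index is local, so every node has to be asked."""
    result = set()
    for node in nodes:
        result |= node.query(email)
    return result


ids = find_ids("bar@bar.com")
total_reads = sum(node.reads for node in nodes)
print(ids, total_reads)  # {111} 5
```

Only one node had the answer, but all five were read.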

Several Strategies for the Secondary Index

References

  • https://dzone.com/articles/cassandra-indexing-good-bad
  • http://www.slideshare.net/edanuff/indexing-in-cassandra
  • http://blog.websudos.com/2014/08/23/a-series-on-cassandra-part-2-indexes-and-keys/
  • http://brianoneill.blogspot.com.br/2012/03/cassandra-indexing-good-bad-and-ugly.html
  • https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes

There is hope

Apple open-sourced their secondary index strategy

SSTableAttachedSecondaryIndex (SASI)

https://github.com/xedin/sasi

Available with Cassandra 3.4

"Cassandra 3.4 and beyond" - Jon Haddad

SASI through an example

cqlsh> CREATE KEYSPACE foo
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': '1'
  };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE bar (
  id uuid,
  fname text,
  lname text,
  age int,
  created_at bigint,
  PRIMARY KEY (id)
) WITH COMPACT STORAGE;

NOTE: COMPACT STORAGE IS MANDATORY!!11!!111  (at least until now)

Creating the indexes

case_sensitive

CREATE CUSTOM INDEX ON bar (fname)
  USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
  WITH OPTIONS = {
    'analyzer_class':
      'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
  };

https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/analyzer/StandardAnalyzer.java
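What 'case_sensitive': 'false' means for lookups can be sketched in a few lines of Python. This is only an illustration of the idea, not the SASI Java analyzer. The function names and sample first names are made up, and it assumes that a non-tokenizing analyzer keeps the whole value as one term and only normalises its case.

```python
from collections import defaultdict


def analyze(term, case_sensitive=False):
    """Non-tokenizing: the whole value stays one term; only case is normalised."""
    return term if case_sensitive else term.lower()


index = defaultdict(set)  # normalised term -> row ids


def add(row_id, fname):
    index[analyze(fname)].add(row_id)


def lookup(fname):
    # The query term goes through the same analyzer as the indexed values.
    return index.get(analyze(fname), set())


add(1, "Pavel")
add(2, "PAVEL")
add(3, "Jason")

print(sorted(lookup("pavel")))  # [1, 2]
```

The key design point is that values and query terms are normalised the same way, so "Pavel", "PAVEL" and "pavel" all land on the same index entry.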

'mode': 'SUFFIX'

CREATE CUSTOM INDEX ON bar (lname) 
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SUFFIX'};

Indexes lname so that values can be searched by suffix.
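One simple way to support suffix lookups, shown here as a Python sketch, is to keep the terms reversed in a sorted list, so that "ends with X" becomes "starts with reversed X". This is an assumption made for illustration and not a claim about how SASI lays out its index files. The SuffixIndex class and the sample last names are hypothetical.

```python
import bisect


class SuffixIndex:
    """Hypothetical suffix index: terms are stored reversed and kept sorted."""

    def __init__(self):
        self._keys = []  # sorted list of (reversed term, row id)

    def add(self, row_id, term):
        bisect.insort(self._keys, (term[::-1], row_id))

    def ends_with(self, suffix):
        prefix = suffix[::-1]
        # Jump to the first key that could start with the reversed suffix.
        start = bisect.bisect_left(self._keys, (prefix,))
        matches = []
        for key, row_id in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(row_id)
        return matches


index = SuffixIndex()
index.add(1, "yaroslav")
index.add(2, "jacobs")
index.add(3, "miroslav")

print(sorted(index.ends_with("slav")))  # [1, 3]
```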

'mode': 'SPARSE'

CREATE CUSTOM INDEX ON bar (created_at) 
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

Meant for range queries over dense numeric values, such as timestamps.
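A range query over timestamps can be pictured with this small Python sketch. It assumes the index is just a sorted list of (created_at, row id) pairs searched with binary search. The RangeIndex class and the sample values are made up, and the real on-disk format is more elaborate.

```python
import bisect


class RangeIndex:
    """Hypothetical range index over a numeric column such as created_at."""

    def __init__(self, entries):
        # entries: iterable of (created_at, row_id) pairs
        self._entries = sorted(entries)
        self._keys = [ts for ts, _ in self._entries]

    def between(self, low, high):
        """Row ids with low <= created_at <= high, found by binary search."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [row_id for _, row_id in self._entries[start:end]]


index = RangeIndex([
    (1000, "a"), (1001, "b"), (1002, "c"), (1003, "d"), (1004, "e"),
])

print(index.between(1001, 1003))  # ['b', 'c', 'd']
```

Because the keys are ordered, both ends of the range are found without scanning the rows in between.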

Understanding the basics of SASI architecture

[Diagram: SASI alongside Cassandra, with two parts, Indexing and Querying, each spanning memory and disk]

SASI builds its data structures without interfering with the Cassandra architecture. It follows the flow of events that happen at the SSTable.

Cassandra's key features for SASI

  • Write-once
  • Immutability
  • Ordered data sets

memory => disk

  • SSTable starts writing: SASI creates structures in memory
  • SSTable ends writing: SASI flushes structures to disk

SSTableFlushObserver was added to the Cassandra source code to support SASI.
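The observer idea can be sketched in Python. The class and callback names below (IndexFlushObserver, begin, next_row, complete) are hypothetical and are not the real SSTableFlushObserver interface. The sketch only shows the flow: build structures in memory while the SSTable is being written, then flush them when writing ends.

```python
class IndexFlushObserver:
    """Hypothetical observer; the callback names are illustrative only."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.in_memory = {}   # term -> list of row keys, built during the flush
        self.on_disk = None   # stand-in for the index file next to the SSTable

    def begin(self):
        # The SSTable starts writing: start with empty in-memory structures.
        self.in_memory = {}

    def next_row(self, key, columns):
        # Called for each row, in the order the SSTable writes them.
        value = columns.get(self.indexed_column)
        if value is not None:
            self.in_memory.setdefault(value, []).append(key)

    def complete(self):
        # The SSTable ends writing: "flush" the sorted structure.
        self.on_disk = dict(sorted(self.in_memory.items()))
        self.in_memory = {}


def flush_sstable(rows, observers):
    """Simulated flush: rows are written in key order, observers are notified."""
    for observer in observers:
        observer.begin()
    for key, columns in sorted(rows.items()):
        for observer in observers:
            observer.next_row(key, columns)
    for observer in observers:
        observer.complete()


observer = IndexFlushObserver("fname")
rows = {2: {"fname": "bar"}, 1: {"fname": "foo"}, 3: {"fname": "bar"}}
flush_sstable(rows, [observer])

print(observer.on_disk)  # {'bar': [2, 3], 'foo': [1]}
```

The write path itself does not change; the index only listens to it, which is the sense in which SASI follows the events that happen at the SSTable.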

[Diagram: in Cassandra, writes go to the CommitLog and the MemTable; SASI keeps its own IndexMemTable next to the MemTable]

Indexing

[Diagram: SSTable => indexed columns => OnDiskIndexBuilder (in memory) => OnDiskIndex (on disk), the index files written by SASI]

Optimised data structures (List<ByteBuffer> and custom iterators)

Information from the SSTable is turned into these structures

Querying

[Diagram: a QueryPlan over the index files written by SASI, with two phases, Analysis and Execution; results are combined with RangeUnionIterator and RangeIntersectionIterator]
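The two iterator names suggest set operations over ordered results. Here is a minimal Python sketch of that idea. It is not the SASI implementation, and it assumes each input yields row keys in ascending order, without duplicates and without None. Union corresponds to OR-ing the matches of two indexes, intersection to AND-ing them.

```python
import heapq


def range_union(*iterators):
    """Merge sorted iterators, yielding each key once (OR semantics)."""
    previous = object()  # sentinel that never equals a real key
    for key in heapq.merge(*iterators):
        if key != previous:
            yield key
            previous = key


def range_intersection(left, right):
    """Yield keys present in both sorted iterators (AND semantics)."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)   # advance whichever side is behind
        else:
            b = next(right, None)


# Made-up matches from two indexes, e.g. one on fname and one on age.
fname_matches = [1, 4, 7, 9]
age_matches = [4, 5, 9]

print(list(range_union(fname_matches, age_matches)))         # [1, 4, 5, 7, 9]
print(list(range_intersection(fname_matches, age_matches)))  # [4, 9]
```

Both functions read each input once and never hold the full result in memory, which is why ordered data sets matter here.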

Limitations

  • Cluster must be configured to use a partitioner that produces LongTokens (Murmur3Partitioner). Does not work with ByteOrderedPartitioner.
  • CQL3 requires COMPACT STORAGE
  • The standalone SASI repo supports only Cassandra 2.0.x. If you do not plan to upgrade, you cannot use SASI properly.

Conclusion

Powerful tool for Secondary Indexes

Friendly Reminder: Be careful when using secondary indexes.

Let's study data structures and more Mathematics :)

References

  • Our meetup group in Sao Paulo - http://www.meetup.com/Sao-Paulo-Cassandra-Users/
  • SASI repo https://github.com/xedin/sasi
  • Cassandra initial docs for secondary indexes http://docs.datastax.com/en/archived/cassandra/1.1/docs/ and http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
  • SecondaryIndexes Q/A https://wiki.apache.org/cassandra/SecondaryIndexes
  • SASI on Cassandra https://issues.apache.org/jira/browse/CASSANDRA-10661
  • http://www.doanduyhai.com/blog/?p=2058

Special Thanks

  • @PatrickMcFadin
  • @wheresLINA
  • @planetcassandra
  • @lafp, @romulostorel and @pedrofelipee (GIFs)

Thank you :)

Questions?

 

hannelita@gmail.com

@hannelita

@planetcassandra

@datastax