SASI - A Revolution for Secondary Indexes in Cassandra
Hi!
- Computer Engineer
- Programming
- Electronics
- Math <3 <3
- Physics
- Lego
- Meetups
- Animals
- Coffee
- GIFs
Presentation goals
- Intro about Secondary Indexes in Cassandra
- SASI - Why is it cool?
- Challenges
- GIFs and funny images
Let's talk about indexes
Regular index
ID | name | email |
---|---|---|
110 | "Foo" | "foo@foo.com" |
111 | "Bar" | "bar@bar.com" |
You can search a user by ID and get his/her email.
ID
What happens if given an email you want to retrieve the user ID?
Let's have some Math:
x → y
y → x (inverse function)
In terms of matrices:
ID | email |
---|---|
110 | "foo@foo.com" |
111 | "bar@bar.com" |
email | ID |
---|---|
"foo@foo.com" | 110 |
"bar@bar.com" | 111 |
Regular Index
Form Input
We would have to iterate over users, then compare the emails, then get the ID
But with the inverse mapping, finding the ID is just as easy!
What if we could have an extra index on a column to help with that?
ID | name | email |
---|---|---|
110 | "Foo" | "foo@foo.com" |
111 | "Bar" | "bar@bar.com" |
Regular index (key)
Secondary index (on column)
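To make the idea concrete, here is a minimal Python sketch of the same two-user table. It is only an illustration of the inverse-mapping idea, not how Cassandra stores anything; the function names `find_id_by_scan` and `find_id_by_index` are made up for this example.

```python
# Toy data mirroring the users table on the slides.
users = {
    110: {"name": "Foo", "email": "foo@foo.com"},
    111: {"name": "Bar", "email": "bar@bar.com"},
}

def find_id_by_scan(email):
    """Without an index: iterate over every user and compare emails."""
    for user_id, row in users.items():
        if row["email"] == email:
            return user_id
    return None

# The "secondary index": the inverse mapping email -> ID, built once.
email_index = {row["email"]: user_id for user_id, row in users.items()}

def find_id_by_index(email):
    """With the index: a single lookup instead of a full scan."""
    return email_index.get(email)

print(find_id_by_scan("bar@bar.com"))   # 111
print(find_id_by_index("bar@bar.com"))  # 111
```

Both calls return the same ID; the difference is that the scan touches every row, while the index lookup touches one entry.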
Secondary Indexes - The good parts
- Practical / convenient
- Easy to make
- Easy to understand
- Performance is "okay" for small sets of data
Secondary Indexes - The bad parts
- Slow for large sets of data
- They are applied locally on each node, instead of globally across the cluster
- 5 instances, 1 query => 5 reads.
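The "5 instances, 1 query => 5 reads" point can be sketched with a toy cluster. This is a deliberately simplified model (no replication, no coordinator logic); the `Node` class and the modulo placement are assumptions for illustration only.

```python
class Node:
    """A hypothetical node holding its own rows and a local email index."""

    def __init__(self):
        self.rows = {}          # id -> email
        self.local_index = {}   # email -> ids stored on THIS node only
        self.index_reads = 0    # how many times this node's index was read

    def insert(self, user_id, email):
        self.rows[user_id] = email
        self.local_index.setdefault(email, set()).add(user_id)

    def lookup(self, email):
        self.index_reads += 1
        return self.local_index.get(email, set())

nodes = [Node() for _ in range(5)]

def insert(user_id, email):
    # Rows are placed by partition key (the ID), not by the indexed value.
    nodes[user_id % len(nodes)].insert(user_id, email)

def query_by_email(email):
    # No node knows where a given email lives, so every node is asked.
    result = set()
    for node in nodes:
        result |= node.lookup(email)
    return result

for i in range(100, 120):
    insert(i, f"user{i}@example.com")

print(query_by_email("user111@example.com"))  # {111}
print(sum(n.index_reads for n in nodes))      # 5
```

A lookup by ID would go straight to one node; the lookup by email has to fan out to all five.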
Several Strategies for the Secondary Index
References
- https://dzone.com/articles/cassandra-indexing-good-bad
- http://www.slideshare.net/edanuff/indexing-in-cassandra
- http://blog.websudos.com/2014/08/23/a-series-on-cassandra-part-2-indexes-and-keys/
- http://brianoneill.blogspot.com.br/2012/03/cassandra-indexing-good-bad-and-ugly.html
- https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes
There is hope
Apple open-sourced their secondary index strategy
SSTableAttachedSecondaryIndex (SASI)
https://github.com/xedin/sasi
Available with Cassandra 3.4
"Cassandra 3.4 and beyond" - Jon Haddad
SASI through an example
cqlsh> CREATE KEYSPACE foo
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '1'
};
cqlsh> USE foo;
CREATE TABLE bar (
    id uuid,
    fname text,
    lname text,
    age int,
    created_at bigint,
    primary key (id))
WITH COMPACT STORAGE;
NOTE: COMPACT STORAGE IS MANDATORY!!11!!111 (at least until now)
Creating the indexes
case_sensitive
CREATE CUSTOM INDEX ON bar (fname)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};
https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/analyzer/StandardAnalyzer.java
'mode': 'SUFFIX'
CREATE CUSTOM INDEX ON bar (lname)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SUFFIX'};
Indexes every suffix of each term, so the column can be matched by suffix or substring.
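Why indexing suffixes helps: any substring of a term is a prefix of one of its suffixes, so a sorted list of suffixes can answer "contains" queries with a binary search. The sketch below shows that idea only; it is not SASI's on-disk format, and `build_suffix_index` and `search_contains` are hypothetical names.

```python
import bisect

def build_suffix_index(rows):
    """rows: dict of id -> term. Returns a sorted list of (suffix, id)."""
    entries = []
    for row_id, term in rows.items():
        for start in range(len(term)):
            entries.append((term[start:], row_id))
    entries.sort()
    return entries

def search_contains(index, needle):
    """Ids whose term contains needle (needle is a prefix of some suffix)."""
    # (needle,) sorts before every (needle..., id) entry, so this finds
    # the first suffix that is >= needle.
    pos = bisect.bisect_left(index, (needle,))
    matches = set()
    while pos < len(index) and index[pos][0].startswith(needle):
        matches.add(index[pos][1])
        pos += 1
    return matches

# Hypothetical lname values.
lnames = {1: "tavante", 2: "haddad", 3: "mcfadin"}
suffix_index = build_suffix_index(lnames)

print(search_contains(suffix_index, "ante"))  # {1}
print(search_contains(suffix_index, "ad"))    # {2, 3}
```

The price is index size: a term of length n contributes n entries.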
'mode': 'SPARSE'
CREATE CUSTOM INDEX ON bar (created_at)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
Optimised for range queries over dense numeric values such as timestamps.
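A range query over timestamps boils down to finding two positions in sorted data. The following Python sketch illustrates that with made-up `created_at` values in milliseconds; it says nothing about how SPARSE mode lays out its index files.

```python
import bisect

# Hypothetical rows: (created_at in milliseconds, row id), sorted by timestamp.
index = sorted([
    (1_455_000_000_000, "a"),
    (1_455_000_000_250, "b"),
    (1_455_000_000_500, "c"),
    (1_455_000_001_000, "d"),
])
timestamps = [ts for ts, _ in index]

def range_query(low, high):
    """Row ids with low <= created_at <= high, via two binary searches."""
    start = bisect.bisect_left(timestamps, low)
    end = bisect.bisect_right(timestamps, high)
    return [row_id for _, row_id in index[start:end]]

print(range_query(1_455_000_000_200, 1_455_000_000_600))  # ['b', 'c']
```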
Understanding the basics of SASI architecture
[Diagram: SASI sits alongside Cassandra and has two paths, Indexing and Querying, each working in memory and on disk]
SASI builds its data structures without interfering with Cassandra's architecture. It follows the flow of events that happen to the SSTable.
Cassandra's key features for SASI
- Write-once
- Immutability
- Ordered data sets
memory => disk
SSTable starts writing => SASI creates structures in memory
SSTable ends writing => SASI flushes structures to disk
SSTableFlushObserver was added to the Cassandra source code to support SASI
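The observer idea can be sketched in a few lines of Python: the index is built in memory while rows are written, then frozen when the write ends. The class and method names below (`IndexFlushObserver`, `begin`, `next_row`, `complete`, `write_sstable`) are hypothetical and chosen for this illustration; they are not Cassandra's actual Java interface.

```python
class IndexFlushObserver:
    """Hypothetical observer; names are illustrative, not Cassandra's API."""

    def __init__(self, column):
        self.column = column
        self.in_memory = {}   # term -> row keys, built while the SSTable is written
        self.on_disk = None   # stands in for the flushed index file

    def begin(self):
        # SSTable starts writing: start with empty in-memory structures.
        self.in_memory = {}

    def next_row(self, key, row):
        # Called for every row as it is written to the SSTable.
        self.in_memory.setdefault(row[self.column], []).append(key)

    def complete(self):
        # SSTable ends writing: freeze a sorted, immutable snapshot.
        self.on_disk = tuple(
            sorted((term, tuple(keys)) for term, keys in self.in_memory.items())
        )
        self.in_memory = {}

def write_sstable(rows, observers):
    """Write rows in key order, notifying each observer along the way."""
    for observer in observers:
        observer.begin()
    sstable = []
    for key in sorted(rows):
        sstable.append((key, rows[key]))
        for observer in observers:
            observer.next_row(key, rows[key])
    for observer in observers:
        observer.complete()
    return sstable

fname_observer = IndexFlushObserver("fname")
write_sstable(
    {2: {"fname": "bar"}, 1: {"fname": "foo"}, 3: {"fname": "bar"}},
    [fname_observer],
)
print(fname_observer.on_disk)  # (('bar', (2, 3)), ('foo', (1,)))
```

Because the SSTable never changes after it is written, the snapshot produced in `complete` never needs updating either.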
[Diagram: Cassandra writes go to the CommitLog and the MemTable; SASI keeps a matching IndexMemTable]
Indexing
[Diagram: in MEMORY, OnDiskIndexBuilder turns the SSTable's indexed columns into index files by SASI; on Disk, OnDiskIndex reads the index files by SASI]
Optimised data structures (List<ByteBuffer> and custom iterators)
Information from the SSTable is turned into these structures
Querying
[Diagram: a QueryPlan over the index files by SASI goes through Analysis and Execution, combining results with RangeUnionIterator and RangeIntersectionIterator]
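What a union iterator and an intersection iterator do can be shown over sorted streams of tokens. This is a sketch of the general technique (merge for OR, two-pointer walk for AND), assuming each index yields sorted, duplicate-free integer tokens; the function names echo the iterators above but the code is not taken from SASI.

```python
import heapq

def range_union(*streams):
    """Merge sorted token streams, yielding each token once (OR)."""
    last = object()  # sentinel that never equals a real token
    for token in heapq.merge(*streams):
        if token != last:
            yield token
            last = token

def range_intersection(left, right):
    """Yield tokens present in both sorted streams (AND)."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)   # advance the stream that is behind
        else:
            b = next(right, None)

# Hypothetical tokens matched by two different indexes.
fname_matches = [3, 8, 15, 42]
age_matches = [8, 9, 42, 77]

print(list(range_intersection(fname_matches, age_matches)))  # [8, 42]
print(list(range_union(fname_matches, age_matches)))         # [3, 8, 9, 15, 42, 77]
```

Both functions are lazy and read each stream once, which is why sorted input matters.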
Limitations
- Cluster must be configured to use a partitioner that produces LongTokens (Murmur3Partitioner). Does not work with ByteOrderedPartitioner.
- CQL3 requires COMPACT STORAGE
- The standalone SASI repo supports only Cassandra 2.0.x. On any other version you need to upgrade to Cassandra 3.4 or later, where SASI is built in, to use it properly
Conclusion
Powerful tool for Secondary Indexes
Friendly Reminder: Be careful when using secondary indexes.
Let's study data structures and more Mathematics :)
References
- Our meetup group in Sao Paulo - http://www.meetup.com/Sao-Paulo-Cassandra-Users/
- SASI repo https://github.com/xedin/sasi
- Cassandra initial docs for secondary indexes http://docs.datastax.com/en/archived/cassandra/1.1/docs/ and http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
- SecondaryIndexes Q/A https://wiki.apache.org/cassandra/SecondaryIndexes
- SASI on Cassandra https://issues.apache.org/jira/browse/CASSANDRA-10661
- http://www.doanduyhai.com/blog/?p=2058
Special Thanks
- @PatrickMcFadin
- @wheresLINA
- @planetcassandra
- @lafp, @romulostorel and @pedrofelipee (GIFs)
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
@planetcassandra
@datastax
SASI - A Revolution for Secondary Indexes in Cassandra
By Hanneli Tavante (hannelita)