SASI and Secondary Indexes
Hi!
- Computer Engineer
- Programming
- Electronics
- Math <3 <3
- Physics
- Lego
- Meetups
- Animals
- Coffee
- GIFs
Let's talk about indexes
Regular index
ID | name | email |
---|---|---|
110 | "Foo" | "foo@foo.com" |
111 | "Bar" | "bar@bar.com" |
You can look up a user by ID and get their email.
ID
What happens if given an email you want to retrieve the user ID?
Let's do some math:
$$f(x) = y$$
$$f^{-1}(y) = x$$
Inverse function
In terms of matrices:
ID | email |
---|---|
110 | "foo@foo.com" |
111 | "bar@bar.com" |
email | ID |
---|---|
"foo@foo.com" | 110 |
"bar@bar.com" | 111 |
Regular Index
Form Input
We would have to iterate over all users, compare each email, and then get the ID.
But by analogy with the inverse function, we could find the ID very easily!
What if we could have an extra index on a column to help with that?
ID | name | email |
---|---|---|
110 | "Foo" | "foo@foo.com" |
111 | "Bar" | "bar@bar.com" |
Regular index (key)
Secondary index (on column)
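The idea of an extra index on a column can be sketched in a few lines of Python. This is only an illustration with a hypothetical in-memory `users` dictionary, not Cassandra code: the scan mirrors "iterate over users, compare the emails, get the ID", and `email_index` plays the role of the inverse function.

```python
# Hypothetical in-memory model of the users table (not Cassandra code).
users = {
    110: {"name": "Foo", "email": "foo@foo.com"},
    111: {"name": "Bar", "email": "bar@bar.com"},
}


def find_id_by_scan(email):
    """Without an index: iterate over every user and compare emails."""
    for user_id, row in users.items():
        if row["email"] == email:
            return user_id
    return None


# The "inverse function": email -> ID, built once and then reused.
email_index = {row["email"]: user_id for user_id, row in users.items()}


def find_id_by_index(email):
    """With the extra index: a single lookup instead of a full scan."""
    return email_index.get(email)
```

Both functions return the same ID; the difference is that the scan touches every row, while the index lookup does not.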
Secondary Indexes - The good parts
- Practical
- Easy to make
- Easy to understand
- Good performance for small data sets
Secondary Indexes - The bad parts
- Slow for large sets of data
- They are applied locally, instead of globally
- In a ring with 5 instances, 1 query => 5 reads.
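A rough way to picture the "1 query => 5 reads" problem is the toy model below. It assumes a hypothetical `Node` class where each node indexes only its own rows, and rows are placed by primary key rather than by the indexed column, so a query on the indexed column has to ask every node. Real clusters add replication and coordination that this sketch ignores.

```python
class Node:
    """Hypothetical node holding a local index on email for its own rows."""

    def __init__(self):
        self.local_index = {}
        self.reads = 0

    def insert(self, user_id, email):
        self.local_index.setdefault(email, []).append(user_id)

    def lookup(self, email):
        self.reads += 1  # count every read this node serves
        return self.local_index.get(email, [])


ring = [Node() for _ in range(5)]


def insert(user_id, email):
    # Rows are placed by primary key, not by the indexed column.
    ring[user_id % len(ring)].insert(user_id, email)


def query_by_email(email):
    # The index is local, so no node knows where a given email lives:
    # every node in the ring must be asked.
    results = []
    for node in ring:
        results.extend(node.lookup(email))
    return results


insert(110, "foo@foo.com")
insert(111, "bar@bar.com")
matches = query_by_email("bar@bar.com")
total_reads = sum(node.reads for node in ring)
```

Here `matches` holds the single matching ID, but `total_reads` is 5: one read per node for one query.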
Several Strategies for the Secondary Index
References
- https://dzone.com/articles/cassandra-indexing-good-bad
- http://www.slideshare.net/edanuff/indexing-in-cassandra
- http://blog.websudos.com/2014/08/23/a-series-on-cassandra-part-2-indexes-and-keys/
- http://brianoneill.blogspot.com.br/2012/03/cassandra-indexing-good-bad-and-ugly.html
- https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes
Things are not lost
Apple open-sourced their secondary index strategy
SSTableAttachedSecondaryIndex (SASI)
https://github.com/xedin/sasi
SASI through an example
cqlsh> CREATE KEYSPACE foo
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '1'
};
cqlsh> USE foo;
cqlsh> CREATE TABLE bar (
id uuid,
fname text,
lname text,
age int,
created_at bigint,
PRIMARY KEY (id))
WITH COMPACT STORAGE;
NOTE: COMPACT STORAGE IS MANDATORY! (for now)
Creating the indexes
case_sensitive
CREATE CUSTOM INDEX ON bar (fname)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};
https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/analyzer/StandardAnalyzer.java
'mode': 'SUFFIX'
CREATE CUSTOM INDEX ON bar (lname)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SUFFIX'};
Indexes each term by its suffixes, so the column can be matched on suffixes.
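One way to picture suffix indexing is the sketch below. It is not SASI's actual on-disk format, only an assumed simplification: every suffix of each `lname` value is mapped to the rows that contain it, so an "ends with" lookup becomes a dictionary access. The row ids and names are made up for the example.

```python
from collections import defaultdict


def build_suffix_index(rows):
    """rows: dict of row id -> last name. Maps every suffix to row ids."""
    index = defaultdict(set)
    for row_id, lname in rows.items():
        for start in range(len(lname)):
            index[lname[start:]].add(row_id)
    return index


# Made-up sample data for illustration.
rows = {1: "Johnson", 2: "Jackson", 3: "Smith"}
suffix_index = build_suffix_index(rows)


def ends_with(suffix):
    """Return the sorted ids of rows whose last name ends with `suffix`."""
    return sorted(suffix_index.get(suffix, set()))
```

For example, `ends_with("son")` finds rows 1 and 2 without scanning every name. The trade-off is index size: a term of length n contributes n entries.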
'mode': 'SPARSE'
CREATE CUSTOM INDEX ON bar (created_at)
USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
Suited to range queries over sparse numeric values such as timestamps.
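The benefit of an ordered index for time ranges can be sketched with a sorted list and binary search. This is an assumed simplification, not how SASI stores SPARSE indexes: `entries` is a hypothetical list of `(created_at, row id)` pairs, and a range query becomes two binary searches plus a slice.

```python
import bisect

# Hypothetical (created_at, row id) pairs, kept sorted by timestamp.
entries = sorted([(1000, "a"), (1500, "b"), (2000, "c"), (2500, "d")])
timestamps = [ts for ts, _ in entries]


def rows_between(start, end):
    """Return row ids with start <= created_at <= end, in timestamp order."""
    lo = bisect.bisect_left(timestamps, start)   # first entry >= start
    hi = bisect.bisect_right(timestamps, end)    # first entry > end
    return [row_id for _, row_id in entries[lo:hi]]
```

Because the data is ordered, the cost of locating the range depends on the logarithm of the index size rather than on the number of rows.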
Understanding the basics of SASI architecture
[Diagram: SASI has two parts, Indexing and Querying; each works across memory and disk, on top of Cassandra.]
SASI takes advantage of Cassandra's architecture to build a powerful set of data structures that manage information both in memory and on disk.
Cassandra's key features to SASI
- Write-once
- Immutability
- Ordered data sets
memory => disk
- SSTable starts writing: SASI creates structures in memory
- SSTable ends writing: SASI flushes structures to disk
Indexing
[Diagram: the indexed columns of an SSTable are turned into index files by SASI. In MEMORY: OnDiskIndexBuilder. On Disk: OnDiskIndex.]
Optimised data structures (List<ByteBuffer> and custom iterators)
Information from the SSTable is turned into these structures
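The build-in-memory, then flush flow can be sketched as below. `IndexBuilderSketch` is a hypothetical stand-in for the role the slides give to OnDiskIndexBuilder; its methods are invented for illustration and do not reflect SASI's real API or file format. Terms are collected while the SSTable is being written, then frozen into a sorted, immutable structure when writing ends.

```python
class IndexBuilderSketch:
    """Hypothetical builder: collects terms in memory, then freezes them."""

    def __init__(self):
        self.terms = {}  # term -> list of row keys, mutable while building

    def add(self, term, row_key):
        """Called for each indexed column value while the SSTable is written."""
        self.terms.setdefault(term, []).append(row_key)

    def finish(self):
        """Called when the SSTable is complete: emit a sorted, immutable index."""
        return tuple(
            (term, tuple(sorted(keys)))
            for term, keys in sorted(self.terms.items())
        )


builder = IndexBuilderSketch()
builder.add("foo", 2)
builder.add("bar", 1)
builder.add("foo", 1)
frozen_index = builder.finish()
```

Since SSTables are never modified after they are written, the frozen index never needs updating; it simply lives and dies with its SSTable.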
Querying
Index files by SASI
QueryPlan
Analysis
Execution
RangeUnionIterator
RangeIntersectionIterator
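The two iterator names suggest merging sorted streams of matches. The sketch below assumes, based only on those names, that a union combines results of alternative conditions and an intersection combines results that must all hold. The functions `range_union` and `range_intersection` are illustrative Python, not SASI's Java classes, and they expect sorted inputs without `None` values.

```python
import heapq


def range_union(*sorted_iters):
    """Merge sorted inputs into one sorted stream without duplicates."""
    last = object()  # sentinel that never equals a real value
    for value in heapq.merge(*sorted_iters):
        if value != last:
            yield value
            last = value


def range_intersection(left, right):
    """Yield values present in both sorted inputs, advancing the smaller side."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)
        else:
            b = next(right, None)
```

Both functions are lazy and make a single pass over their inputs, which only works because the underlying data is ordered.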
Limitations
- Cluster must be configured to use a partitioner that produces LongTokens (Murmur3Partitioner). Does not work with ByteOrderedPartitioner.
- CQL3 requires COMPACT STORAGE
- Only Cassandra 2.0.x is supported. If you don't have plans to upgrade, you cannot use SASI properly
Conclusion
Powerful tool for Secondary Indexes
Let's study data structures and more Mathematics :)
References
- Our meetup group in Sao Paulo - http://www.meetup.com/Sao-Paulo-Cassandra-Users/
- SASI repo https://github.com/xedin/sasi
- Cassandra initial docs for secondary indexes http://docs.datastax.com/en/archived/cassandra/1.1/docs/ and http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
- SecondaryIndexes Q/A https://wiki.apache.org/cassandra/SecondaryIndexes
- SASI on Cassandra https://issues.apache.org/jira/browse/CASSANDRA-10661
Special Thanks
- @PatrickMcFadin
- @wheresLINA
- @planetcassandra
- @lafp, @romulostores and @pedrofelipee (GIFs)
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
@planetcassandra
@datastax
SASI and Secondary Indexes
By Hanneli Tavante (hannelita)