CQL and CQLengine
Experiences with Cassandra
using CQL and CQLEngine
Kai Lautaportti
kai@enquos.com
About me
- 10+ years professionally
- web development
- backend / frontend / ops
- love Python :)
- aspiring C* enthusiast
Cassandra & Python
- pycassa
- thrift RPC
- cql
- thrift transport
- db-api 2.0 compatible
- Datastax python driver
- native binary protocol
- async, server notifications, compression
- Cassandra 2.x
- cqlengine
- object mapper
- cql / native driver
CQL3 in Cassandra 1.2+
- approachable SQL-like syntax
- no joins, subqueries, foreign key constraints
- limited ordering
- DDL and DML support
- data dictionary
- tracing
- "upsert" semantics
CQL3 datatypes
- text (ascii, utf-8)
- numbers (int, float, double)
- boolean
- blob
- timestamp
- inet
- uuid (version 1, generic)
- collections (set, map, list)
- counter
Data modeling patterns
- non-clustered tables
- clustered tables
- composite partitions (sharding)
non-clustered tables
CREATE TABLE user (
id timeuuid,
username text,
emails set<text>,
PRIMARY KEY (id));
INSERT INTO user (id, username, emails) VALUES (
5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f,
'dokai',
{'kai@enquos.com', 'kai@gmail.com'});
SELECT * FROM user WHERE id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f;
In Python
user["5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f"] = {
"username": "dokai",
"emails": set(["kai@enquos.com", "kai@gmail.com"]),
}
clustered tables
CREATE TABLE actionlog (
user_id timeuuid,
tstamp timeuuid,
action text,
PRIMARY KEY (user_id, tstamp))
WITH CLUSTERING ORDER BY (tstamp DESC);
INSERT INTO actionlog (user_id, tstamp, action) VALUES (
5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f,
00e13923-e0e9-11e2-a657-28cfe91cca6f,
'login');
-- Latest 10 log entries for a user
SELECT tstamp, dateOf(tstamp), action
FROM actionlog
WHERE user_id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f
LIMIT 10;
clustered tables
In Python
user["5dbbf93a"] = {
("6e6efcb0", ): { "action": "logout"},
("00e13923", ): { "action": "login"},
}
# user["5dbbf93a"].items() is ordered in reversed key order
composite partition key
CREATE TABLE actionlog (
user_id timeuuid,
year int,
tstamp timeuuid,
action text,
PRIMARY KEY ((user_id, year), tstamp))
WITH CLUSTERING ORDER BY (tstamp DESC);
INSERT INTO actionlog (user_id, year, tstamp, action) VALUES (
5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f, 2013,
00e13923-e0e9-11e2-a657-28cfe91cca6f,'login');
-- Latest 10 log entries for a user
SELECT tstamp, dateOf(tstamp), action
FROM actionlog
WHERE user_id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f AND year = 2013
LIMIT 10;
composite partition key
In Python
user[("5dbbf93a", 2013)] = {
("6e6efcb0", ): { "action": "logout"},
("00e13923", ): { "action": "login"},
}
# user["5dbbf93a"].items() is ordered in reversed key order
cqlsh
- command-line CQL interface (Python)
- CQL statements + additional commands
- DESCRIBE
- COPY
- TRACING
- ...
- Tracing is extremely useful in debugging
- Scripting support
- cqlsh -f <file>
- echo "SELECT ..."|cqlsh -f /dev/stdin
cqlengine
- Object (Resource) Mapper
- connection management
- schema management
- query abstraction
Not available (yet)
- identity mapping
- sessions
- lazy loading attributes
- partial writes
cqlengine pros
- higher abstraction level
- short learning curve
- (django orm, sqlalchemy)
- actively developed
- tests and documentation
cqlengine cons
- performance penalty
- masks the C* data model
- work in progress
- limited feature set
- python 3 support (?)
- native driver support (?)
cqlengine models
from cqlengine.models import Model
from cqlengine import columns
class User(Model):
id = columns.TimeUUID(primary_key=True)
username = columns.Text()
emails = columns.Set(columns.Text)
class ActionLog(Model):
user_id = columns.TimeUUID(primary_key=True)
tstamp = columns.TimeUUID(primary_key=True, clustering_order='DESC')
action = columns.Text()
class ActionLogYear(Model):
user_id = columns.TimeUUID(primary_key=True, partition_key=True)
year = columns.Integer(primary_key=True, partition_key=True)
tstamp = columns.TimeUUID(primary_key=True, clustering_order='DESC')
action = columns.Text()
cqlengine example
# Connecting to Cassanda cluster
from cqlengine import connection
connection.setup('127.0.0.1 127.0.0.2 127.0.0.3'.split())
# Creating / updating tables
from cqlengine import management
management.sync_table(User)
management.sync_table(ActionLog)
management.sync_table(ActionLogYear)
cqlengine example
import uuid
user1 = User.create(
id=uuid.uuid1(),
username=u'kai',
emails=set([u'kai@enquos.com', u'lautaportti@acme.com']))
ActionLog.create(
user_id=user1.id,
tstamp=uuid.uuid1(),
action=u'login')
ActionLog.create(
user_id=user1.id,
tstamp=uuid.uuid1(),
action=u'logout')
CQLENGINE EXAMPLE
from cqlengine import functions
from datetime import datetime
from datetime import timedelta
threshold = datetime.utcnow() - timedelta(days=30)
query = ActionLog.objects.filter(
ActionLog.user_id == user1.id,
ActionLog.tstamp >= functions.MinTimeUUID(threshold)).limit(50)
for entry in query:
print(entry.action)
user1.emails.add(u'foo@bar.com')
user1.save()
cqlengine example
from cqlengine import BatchQuery
from cqlengine.query import BatchType
with BatchQuery() as b:
user1.emails.remove(u'foo@bar.com')
user1.batch(b).save()
ActionLog.objects.filter(
ActionLog.user_id == user1.id).batch(b).delete()
with BatchQuery(batch_type=BatchType.Unlogged) as b:
pass
sphinxcontrib-cqlengine
- Sphinx extension
- cqlengine model autodoc
- uses blockdiag, autodoc, intersphinx
- https://github.com/dokai/sphinxcontrib-cqlengine
thank you
Kai Lautaportti
kai@enquos.com
@dokai
Using Cassandra with CQL and Python
By Kai Lautaportti
Using Cassandra with CQL and Python
- 8,139