CQL and CQLengine

Experiences with Cassandra

using CQL and CQLEngine



Kai Lautaportti

kai@enquos.com

About me

  • 10+ years professionally
  • web development
  • backend / frontend / ops
  • love Python :)
  • aspiring C* enthusiast

Cassandra & Python

CQL3 in Cassandra 1.2+

  • approachable SQL-like syntax
  • no joins, subqueries, foreign key constraints
  • limited ordering
  • DDL and DML support
  • data dictionary
  • tracing
  • "upsert" semantics

CQL3 datatypes

  • text (ascii, utf-8)
  • numbers (int, float, double)
  • boolean
  • blob
  • timestamp
  • inet
  • uuid (version 1, generic)
  • collections (set, map, list)
  • counter

Data modeling patterns

  • non-clustered tables
  • clustered tables
  • composite partitions (sharding)

non-clustered tables

 CREATE TABLE user (
    id timeuuid,
    username text,
    emails set<text>,
    PRIMARY KEY (id));

INSERT INTO user (id, username, emails) VALUES (
    5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f,
    'dokai',
    {'kai@enquos.com', 'kai@gmail.com'});

SELECT * FROM user WHERE id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f;

In Python

 user["5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f"] = {
    "username": "dokai",
    "emails": set(["kai@enquos.com", "kai@gmail.com"]),
    }

clustered tables

CREATE TABLE actionlog (
    user_id timeuuid,
    tstamp timeuuid,
    action text,
    PRIMARY KEY (user_id, tstamp))
WITH CLUSTERING ORDER BY (tstamp DESC);

INSERT INTO actionlog (user_id, tstamp, action) VALUES (
    5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f,
    00e13923-e0e9-11e2-a657-28cfe91cca6f,
    'login');

-- Latest 10 log entries for a user
SELECT tstamp, dateOf(tstamp), action
FROM actionlog
WHERE user_id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f
LIMIT 10;

clustered tables

In Python
user["5dbbf93a"] = {
    ("6e6efcb0", ): { "action": "logout"},
    ("00e13923", ): { "action": "login"},
}

# user["5dbbf93a"].items() is ordered in reversed key order 

composite partition key

CREATE TABLE actionlog (
    user_id timeuuid,
    year int,
    tstamp timeuuid,
    action text,
    PRIMARY KEY ((user_id, year), tstamp))
WITH CLUSTERING ORDER BY (tstamp DESC);

INSERT INTO actionlog (user_id, year, tstamp, action) VALUES (
    5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f, 2013,
    00e13923-e0e9-11e2-a657-28cfe91cca6f,'login');

-- Latest 10 log entries for a user
SELECT tstamp, dateOf(tstamp), action
FROM actionlog
WHERE user_id = 5dbbf93a-e0e8-11e2-be2e-28cfe91cca6f AND year = 2013
LIMIT 10;

composite partition key

In Python
 user[("5dbbf93a", 2013)] = {
    ("6e6efcb0", ): { "action": "logout"},
    ("00e13923", ): { "action": "login"},
    }

# user["5dbbf93a"].items() is ordered in reversed key order

cqlsh

  • command-line CQL interface (Python)
  • CQL statements + additional commands
    • DESCRIBE
    • COPY
    • TRACING
    • ...
  • Tracing is extremely useful in debugging
  • Scripting support
    • cqlsh -f <file>
    • echo "SELECT ..."|cqlsh -f /dev/stdin

cqlengine

  • Object (Resource) Mapper
  • connection management
  • schema management
  • query abstraction

Not available (yet)

  • identity mapping
  • sessions
  • lazy loading attributes
  • partial writes

cqlengine pros

  • higher abstraction level
  • short learning curve
    • (django orm, sqlalchemy)
  • actively developed
  • tests and documentation

cqlengine cons

  • performance penalty
  • masks the C* data model 
  • work in progress
  • limited feature set
  • python 3 support (?)
  • native driver support (?)

cqlengine models

from cqlengine.models import Model
from cqlengine import columns

class User(Model):
    id = columns.TimeUUID(primary_key=True)
    username = columns.Text()
    emails = columns.Set(columns.Text)

class ActionLog(Model):
    user_id = columns.TimeUUID(primary_key=True)
    tstamp = columns.TimeUUID(primary_key=True, clustering_order='DESC')
    action = columns.Text()

class ActionLogYear(Model):
    user_id = columns.TimeUUID(primary_key=True, partition_key=True)
    year = columns.Integer(primary_key=True, partition_key=True)
    tstamp = columns.TimeUUID(primary_key=True, clustering_order='DESC')
    action = columns.Text()

cqlengine example

# Connecting to Cassanda cluster
from cqlengine import connection

connection.setup('127.0.0.1  127.0.0.2  127.0.0.3'.split())

# Creating / updating tables
from cqlengine import management

management.sync_table(User)
management.sync_table(ActionLog)
management.sync_table(ActionLogYear)

cqlengine example

import uuid

user1 = User.create(
    id=uuid.uuid1(),
    username=u'kai',
    emails=set([u'kai@enquos.com', u'lautaportti@acme.com']))

ActionLog.create(
    user_id=user1.id,
    tstamp=uuid.uuid1(),
    action=u'login')

ActionLog.create(
    user_id=user1.id,
    tstamp=uuid.uuid1(),
    action=u'logout')

CQLENGINE EXAMPLE

from cqlengine import functions
from datetime import datetime
from datetime import timedelta

threshold = datetime.utcnow() - timedelta(days=30)
query = ActionLog.objects.filter(
    ActionLog.user_id == user1.id,
    ActionLog.tstamp >= functions.MinTimeUUID(threshold)).limit(50)

for entry in query:
    print(entry.action)
    user1.emails.add(u'foo@bar.com')
    user1.save()

cqlengine example

from cqlengine import BatchQuery
from cqlengine.query import BatchType

with BatchQuery() as b:
    user1.emails.remove(u'foo@bar.com')
    user1.batch(b).save()
    ActionLog.objects.filter(
        ActionLog.user_id == user1.id).batch(b).delete()

with BatchQuery(batch_type=BatchType.Unlogged) as b:
    pass

sphinxcontrib-cqlengine

thank you





Kai Lautaportti
kai@enquos.com
@dokai
Made with Slides.com