kdbpy: readable q

It's possible

about me

MA Psychology
- Computational Neuroscience
Pandas core dev
blaze, odo, et al @ContinuumIO

Q overview

Who's hearD of q?

for something other than a binary store.

Who's used q?

the language

Written by Arthur Whitney
Derived from and written in K
K is similar to APL
- Tons of operators
Parses from right to left
- Unless you're in qsql land
Subset of the lang looks like sql if you squint

q demo!

Pray to the q gods and the demo gods

The database

Column store
Handles null values for all types
Nice support for slowly changing string values (symbols)
Excellent time series support
Support for different storage formats that scale to huge tables
Matrix math

The GOOD

Column store
Extremely fast queries
Concise querying via qsql
Great time series support
Nice foreign key syntax

The BAD

Hardly anyone knows q
Operators >>>>> Names
Cryptic error messages
Large programs are unreadable by anyone except the q gods
- q gods never make errors
Strange API breaks
- dev to sdev
Hard to apply traditional database knowledge

THE ugly

q)x:"racecar"
q)n:count x
q)ispal:all{[x;n;i]x[i]=x[n-i+1]}[x;n]each til _:[n%2]+1
q)ispal x
1b

q)1 % "cat"
0.01010101 0.01030928 0.00862069

q)-1["foo"]
foo
-1
q)1["bar"]
bar1

please q gods, give me whitespace

kDBPY

the stack

A SQLAlchemy dialect
That can be driven from Blaze
- With odo for moving data around
Using qpython to talk to a q process

sqlalchemy

I've never seen a project with this many layers

The good

the bad

10 year old, mature project for doing all things SQL, in Python
Huge community
Constantly evolving in a mostly sane way

sqlalchemy Dialect

class QDialect(Dialect):
    def visit_select(self, select, **kwargs):
        # ...

    def visit_insert(self, insert, **kwargs):
        # ...

    # and so on

Generate (some variant of SQL) using the visitor pattern

sqlalchemy Dialect

>>> import sqlalchemy as sa
>>> engine = sa.create_engine('q://localhost/path/to/db')
>>> meta = sa.MetaData(bind=engine)
>>> t = sa.Table('t', meta, autoload=True)
>>> expr = sa.select([t.c.name, sa.func.avg(t.c.amount)]).group_by(t.c.name)
>>> result = conn.execute().fetchall()  # <- a pandas DataFrame

Build up expressions

Blaze

Expression system + interpreter for analytic queries
Built on top of PyData
- pandas, numpy, and others
Works with existing data structures

the good

the bad

Too much hype early on
Suffers from the LCD problem

blaze

Expressions

Dispatch on different backends


>>> df = DataFrame({'name': ['Joe', 'Joe', 'Alice', 'Bob', 'Alice'],
...                 'amount': np.random.rand(5) * 100})
>>> t = symbol('t', discover(df))
>>> expr = by(t.name, avg_amount=t.amount.mean())


>>> result = compute(expr, df)
>>> result = compute(expr, df.to_records(index=False))
>>> result = compute(expr, list(df.itertuples(index=False)))

>>> from blaze import discover, symbol, by, compute
>>> from pandas import DataFrame

demo time!

wish/bug list

DDL Support (already PR'd)
- Creation of splayed and partitioned tables not yet ready for primetime
Foreign and primary key support in blaze expressions
Better time series support all around
Transparent insert to special table types
Optimizations
- Leaning projections in blaze
Views
Python 3 support
Streaming results from q
Open source it!

kdbpy: readable q

It's possible

about me

Q overview

Who's hearD of q?

for something other than a binary store.

Who's used q?

the language

q demo!

The database

The GOOD

The BAD

THE ugly

please q gods, give me whitespace

kDBPY

the stack

sqlalchemy

The good

the bad

sqlalchemy Dialect

sqlalchemy Dialect

Blaze

the good

the bad

blaze

demo time!

wish/bug list

thanks!

kdbpy

kdbpy

Phillip Cloud

kdbpy: readable q

It's possible

about me

Q overview

Who's hearD of q?

for something other than a binary store.

Who's used q?

the language

q demo!

The database

The GOOD

The BAD

THE ugly

please q gods, give me whitespace

kDBPY

the stack

sqlalchemy

The good

the bad

sqlalchemy Dialect

sqlalchemy Dialect

Blaze

the good

the bad

blaze

demo time!

wish/bug list

thanks!

kdbpy

More from Phillip Cloud