kdbpy: readable q

It's possible

about me

  • MA Psychology
    • Computational Neuroscience
  • Pandas core dev
  • blaze, odo, et al @ContinuumIO

Q overview

Who's hearD of q?

for something other than a binary store.

Who's used q?

the language

  • Written by Arthur Whitney
  • Derived from and written in K
  • K is similar to APL
    • Tons of operators
  • Parses from right to left
    • Unless you're in qsql land
  • Subset of the lang looks like sql if you squint

q demo!

Pray to the q gods and the demo gods

The database

  • Column store
  • Handles null values for all types
  • Nice support for slowly changing string values (symbols)
  • Excellent time series support
  • Support for different storage formats that scale to huge tables
  • Matrix math

The GOOD

  • Column store
  • Extremely fast queries
  • Concise querying via qsql
  • Great time series support
  • Nice foreign key syntax

The BAD

  • Hardly anyone knows q
  • Operators >>>>> Names
  • Cryptic error messages
  • Large programs are unreadable by anyone except the q gods
    • q gods never make errors
  • Strange API breaks
    • dev to sdev
  • Hard to apply traditional database knowledge

THE ugly

q)x:"racecar"
q)n:count x
q)ispal:all{[x;n;i]x[i]=x[n-i+1]}[x;n]each til _:[n%2]+1
q)ispal x
1b
q)1 % "cat"
0.01010101 0.01030928 0.00862069
q)-1["foo"]
foo
-1
q)1["bar"]
bar1

please q gods, give me whitespace

kDBPY

the stack

  • A SQLAlchemy dialect
  • That can be driven from Blaze
    • With odo for moving data around
  • Using qpython to talk to a q process 

sqlalchemy

  • I've never seen a project with this many layers

The good

the bad

  • 10 year old, mature project for doing all things SQL, in Python
  • Huge community
  • Constantly evolving in a mostly sane way

sqlalchemy Dialect

class QDialect(Dialect):
    def visit_select(self, select, **kwargs):
        # ...

    def visit_insert(self, insert, **kwargs):
        # ...

    # and so on
  • Generate (some variant of SQL) using the visitor pattern

sqlalchemy Dialect

>>> import sqlalchemy as sa
>>> engine = sa.create_engine('q://localhost/path/to/db')
>>> meta = sa.MetaData(bind=engine)
>>> t = sa.Table('t', meta, autoload=True)
>>> expr = sa.select([t.c.name, sa.func.avg(t.c.amount)]).group_by(t.c.name)
>>> result = conn.execute().fetchall()  # <- a pandas DataFrame
  • Build up expressions

Blaze

  • Expression system + interpreter for analytic queries
  • Built on top of PyData
    • pandas, numpy, and others
  • Works with existing data structures

the good

the bad

  • Too much hype early on
  • Suffers from the LCD problem

blaze

  • Expressions
  • Dispatch on different backends

>>> df = DataFrame({'name': ['Joe', 'Joe', 'Alice', 'Bob', 'Alice'],
...                 'amount': np.random.rand(5) * 100})
>>> t = symbol('t', discover(df))
>>> expr = by(t.name, avg_amount=t.amount.mean())

>>> result = compute(expr, df)
>>> result = compute(expr, df.to_records(index=False))
>>> result = compute(expr, list(df.itertuples(index=False)))
>>> from blaze import discover, symbol, by, compute
>>> from pandas import DataFrame

demo time!

wish/bug list

  • DDL Support (already PR'd)
    • Creation of splayed and partitioned tables not yet ready for primetime
  • Foreign and primary key support in blaze expressions
  • Better time series support all around
  • Transparent insert to special table types
  • Optimizations
    • Leaning projections in blaze
  • Views
  • Python 3 support
  • Streaming results from q
  • Open source it!

thanks!

Made with Slides.com