kdbpy: readable q
It's possible
about me
- MA Psychology
- Computational Neuroscience
- Pandas core dev
- blaze, odo, et al @ContinuumIO
Q overview
Who's hearD of q?
for something other than a binary store.
Who's used q?
the language
- Written by Arthur Whitney
- Derived from and written in K
- K is similar to APL
- Tons of operators
- Parses from right to left
- Unless you're in qsql land
- Subset of the lang looks like sql if you squint
q demo!
Pray to the q gods and the demo gods
The database
- Column store
- Handles null values for all types
- Nice support for slowly changing string values (symbols)
- Excellent time series support
- Support for different storage formats that scale to huge tables
- Matrix math
The GOOD
- Column store
- Extremely fast queries
- Concise querying via qsql
- Great time series support
- Nice foreign key syntax
The BAD
- Hardly anyone knows q
- Operators >>>>> Names
- Cryptic error messages
- Large programs are unreadable by anyone except the q gods
- q gods never make errors
- Strange API breaks
- dev to sdev
- Hard to apply traditional database knowledge
THE ugly
q)x:"racecar"
q)n:count x
q)ispal:all{[x;n;i]x[i]=x[n-i+1]}[x;n]each til _:[n%2]+1
q)ispal x
1b
q)1 % "cat"
0.01010101 0.01030928 0.00862069
q)-1["foo"]
foo
-1
q)1["bar"]
bar1
please q gods, give me whitespace
kDBPY
the stack
- A SQLAlchemy dialect
- That can be driven from Blaze
- With odo for moving data around
- Using qpython to talk to a q process
sqlalchemy
- I've never seen a project with this many layers
The good
the bad
- 10 year old, mature project for doing all things SQL, in Python
- Huge community
- Constantly evolving in a mostly sane way
sqlalchemy Dialect
class QDialect(Dialect):
def visit_select(self, select, **kwargs):
# ...
def visit_insert(self, insert, **kwargs):
# ...
# and so on
- Generate (some variant of SQL) using the visitor pattern
sqlalchemy Dialect
>>> import sqlalchemy as sa
>>> engine = sa.create_engine('q://localhost/path/to/db')
>>> meta = sa.MetaData(bind=engine)
>>> t = sa.Table('t', meta, autoload=True)
>>> expr = sa.select([t.c.name, sa.func.avg(t.c.amount)]).group_by(t.c.name)
>>> result = conn.execute().fetchall() # <- a pandas DataFrame
- Build up expressions
Blaze
- Expression system + interpreter for analytic queries
- Built on top of PyData
- pandas, numpy, and others
- Works with existing data structures
the good
the bad
- Too much hype early on
- Suffers from the LCD problem
blaze
- Expressions
- Dispatch on different backends
>>> df = DataFrame({'name': ['Joe', 'Joe', 'Alice', 'Bob', 'Alice'],
... 'amount': np.random.rand(5) * 100})
>>> t = symbol('t', discover(df))
>>> expr = by(t.name, avg_amount=t.amount.mean())
>>> result = compute(expr, df)
>>> result = compute(expr, df.to_records(index=False))
>>> result = compute(expr, list(df.itertuples(index=False)))
>>> from blaze import discover, symbol, by, compute
>>> from pandas import DataFrame
demo time!
wish/bug list
- DDL Support (already PR'd)
- Creation of splayed and partitioned tables not yet ready for primetime
- Foreign and primary key support in blaze expressions
- Better time series support all around
- Transparent insert to special table types
- Optimizations
- Leaning projections in blaze
- Views
- Python 3 support
- Streaming results from q
- Open source it!
thanks!
kdbpy
By Phillip Cloud
kdbpy
- 1,869