>>> from blaze import symbol, discover, compute
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Alice', 'Bob', 'Forrest', 'Bubba'],
... 'amount': [10, 20, 30, 40]})
...
>>> t = symbol('t', discover(df))
>>> t.amount.sum()
sum(t.amount)
>>> compute(t.amount.sum(), df)
100
>>> compute(t.amount.sum(), odo(df, list))
100
>>> compute(t.amount.sum(), odo(df, np.ndarray))
100
q)x:"racecar"
q)n:count x
q)all{[x;n;i]x[i]=x[n-i+1]}[x;n]each til _:[n%2]+1
1b
Check if a string is a palindrome
q)-1 x
racecar
-1
q)1 x
racecar1
Print to stdout, with and without a newline
Um, integers are callable?
1 divided by cat
q)1 % "cat"
0.01010101 0.01030928 0.00862069
so....
Why KDB+/Q?
*It's a little more nuanced than that
What is kdbpy?
≈16 GB (trip dataset only)
partitioned in KDB+ on date (year.month.day)
vs
blaze (bcolz + pandas + multiprocessing)
group by on
passenger count
medallion
hack license
sum on
trip time
trip distance
# trip time
avg_trip_time = trip.trip_time_in_secs.mean()
by(trip.medallion, avg_trip_time=avg_trip_time)
by(trip.passenger_count, avg_trip_time=avg_trip_time)
by(trip.hack_license, avg_trip_time=avg_trip_time)
Storage
Compute
Parallelization