StupiDB

Software that isn't too smart

How does a DB work?

How does SELECT work?

How do I learn SELECT?

Learning SELECT from (PostgreSQL) source

Knowledge

Decades of CS and domain knowledge baked into systems like PostgreSQL.

Compilers (algebra)
Query optimization (join)
Data structures (B+ Trees)
Mathematical logic (model)

What's the dumbest DB look like?

What is SELECT, really?

[(f[1](row), ..., f[n](row)) for row in rows]

where

f :: row -> column

SELECT

Yes.

StupiDB uses __iter__ to build relational algebra

Iterators turn out to be a very effective strategy here

Isn't that "just" a generator?

StupiDB

```
Relation = Iterable[Mapping[str, Any]]
```
Categories of relations
- Projection
- Selection (row filtering)
- Join
- Group By (sort of)
APIs for building relations

API

>>> from pprint import pprint
>>> from stupidb.api import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> pprint(list(average_age_by_timezone), width=79)
[Row({'tz': 'America/New_York', 'avg_age': 25.0}),
 Row({'tz': 'America/Los_Angeles', 'avg_age': 40.0})]

Window Function API

>>> from stupidb.api import Window, over, mean, select, table
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
...     {"name": "Alice", "balance": 400, "date": today},
...     {"name": "Alice", "balance": 300, "date": today + 1 * days},
...     {"name": "Alice", "balance": 100, "date": today + 2 * days},
...     {"name": "Bob", "balance": -150, "date": today - 4 * days},
...     {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> window = Window.range(
...     partition_by=[lambda r: r.name],
...     order_by=[lambda r: r.date],
...     preceding=lambda r: 2 * days  # two days behind + the current row
... )
>>> avg_balance_per_person = table(rows) >> select(
...     name=lambda r: r.name,
...     avg_balance=mean(lambda r: r.balance) >> over(window),
...     balance=lambda r: r.balance,
...     date=lambda r: r.date,
... ) >> order_by(lambda r: r.name, lambda r: r.date)
>>> pprint(list(avg_balance_per_person), width=79)  # noqa: E501
[Row({'name': 'Alice', 'balance': 400, 'date': datetime.date(2019, 2, 9), 'avg_balance': 400.0}),
 Row({'name': 'Alice', 'balance': 300, 'date': datetime.date(2019, 2, 10), 'avg_balance': 350.0}),
 Row({'name': 'Alice', 'balance': 100, 'date': datetime.date(2019, 2, 11), 'avg_balance': 266.667}),
 Row({'name': 'Bob', 'balance': -150, 'date': datetime.date(2019, 2, 5), 'avg_balance': -150.0}),
 Row({'name': 'Bob', 'balance': 200, 'date': datetime.date(2019, 2, 6), 'avg_balance': 25.0})]

Window Functions

Many aspects of software engineering involved
- API design
- Data structures
- Algorithms
  - E.g., range bounds
Four algorithms for aggregation
- Segment trees
- Removable cumulative
- "Append-only"
- Naive

O\left(n\log{n}\right)

O\left(n^{2}\right)\text{*}

O\left(n^{2}\right)

Segment Trees

Characteristics

O(log N) query
O(N log N) construction
O(fanout ** height) space

Conclusions

Databases are hard
Feel great about writing things from scratch*
Build for stupidly bad performance*

StupiDB

By Phillip Cloud

StupiDB

1,603

Phillip Cloud

phillipcloud

StupiDB

How does a DB work?

How does SELECT work?

How do I learn SELECT?

Learning SELECT from (PostgreSQL) source

Knowledge

What's the dumbest DB look like?

What is SELECT, really?

where

f :: row -> column

SELECT

Isn't that "just" a generator?

StupiDB

API

Window Function API

Window Functions

Segment Trees

Conclusions

StupiDB

More from Phillip Cloud