Product, goals, constraints
How we did it
Do this at home
Explore test data
JUnit files
Query it at will
Query on the fly
Exploratory querying
Accept any JUnit
Handle quotas
Handle grouping functions: count distinct, percentile, ...
Handle raw data
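The grouping functions above can be sketched in Elixir over a plain list of column values. A minimal sketch; the nearest-rank percentile method and the module/function names are assumptions, not the product's actual implementation:

```elixir
defmodule Grouping do
  # Count-distinct: deduplicate through a MapSet, then take its size.
  def count_distinct(values), do: values |> MapSet.new() |> MapSet.size()

  # Nearest-rank percentile (p in 0..100): sort the values and pick the
  # element whose rank is closest to p% of the way through the list.
  # (Interpolation strategy is an assumption; the talk does not specify one.)
  def percentile(values, p) do
    sorted = Enum.sort(values)
    index = round(p / 100 * (length(sorted) - 1))
    Enum.at(sorted, index)
  end
end
```

For example, `Grouping.percentile([1, 2, 2, 3, 100], 95)` picks the last element of the sorted list, 100.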
Primary key on timestamp
No update of an event
No need for joins
Limited budget
Sparse data
Queries return in under 5 seconds
AP (availability over consistency, in CAP terms)
Lots of data
Unstable schema
2 part-time devs
6 months
Distributed datastore
Columnar store of flat wide KV events
Kafka
Elixir Query Engine
Elixir Quota Enforcer
LiveView query builder
JSON events go into Kafka, one topic per dataset
Storage nodes read at their own pace and persist their offsets
In case of a crash, we rebuild from Kafka
Kafka consumer groups
Kafka divides topics into partitions, the natural distribution unit
Want two replicas per Kafka partition
Use Brod; upstreamed some advanced use cases
Have two nodes for each partition
Distribute the partitions over the cluster
Avoid churn
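One way to get a stable two-nodes-per-partition assignment that also avoids churn is rendezvous (highest-random-weight) hashing. The talk does not name its actual placement scheme, so this Elixir sketch is an assumption:

```elixir
defmodule Placement do
  # Rendezvous hashing: each partition scores every node and keeps the
  # top `replicas`. Adding or removing a node only moves the partitions
  # that scored that node highest, so the assignment is deterministic
  # and churn stays low.
  def owners(partition, nodes, replicas \\ 2) do
    nodes
    |> Enum.sort_by(fn node -> :erlang.phash2({node, partition}) end, :desc)
    |> Enum.take(replicas)
  end
end
```

Because the score depends only on the node list and the partition, every storage node can compute the same assignment locally; no coordinator is needed.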
Index | message
---|---
1 | If
2 | You
15 | Look
40 | There
... | ...
100 | is
250 | no
300 | null
... | ...
message.string
Index | timestamp
---|---
1 | 124125346864
2 | 123144576764
3 | 124353426546
... | ...
index.timestamp
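The sparse columns above can be modeled as maps from event index to value: an index with no value is simply absent, which is what "there is no null" means here. A minimal Elixir sketch, with illustrative names:

```elixir
defmodule SparseColumn do
  # A column maps event index -> value; absent indexes are not stored,
  # so sparse data costs nothing.
  def get(column, index), do: Map.get(column, index)

  # Aggregations only walk the indexes that actually hold a value.
  def count(column), do: map_size(column)
end
```

With `message = %{1 => "If", 2 => "You", 15 => "Look"}`, reading index 3 just returns nothing; no null is ever stored.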
Columnar store:
Ask Kafka for nodes holding partitions
Fan out the query through Task.yield_many/2
Timeout = drop the data
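The fan-out above can be sketched with `Task.async/1` plus `Task.yield_many/2`: every node holding a relevant partition is queried concurrently, and anything that has not answered by the timeout is shut down and its data dropped. Module and function names are illustrative:

```elixir
defmodule Fanout do
  # Fan `fun` out to every node holding a relevant partition, wait up
  # to `timeout` ms, and keep only the answers that arrived in time.
  def query(nodes, fun, timeout \\ 5_000) do
    tasks = Enum.map(nodes, fn node -> Task.async(fn -> fun.(node) end) end)

    tasks
    |> Task.yield_many(timeout)
    |> Enum.flat_map(fn
      {_task, {:ok, result}} ->
        [result]

      {task, nil} ->
        # Timeout: kill the straggler and drop its partial data.
        Task.shutdown(task, :brutal_kill)
        []

      {_task, {:exit, _reason}} ->
        # A crashed node yields nothing; AP means we answer anyway.
        []
    end)
  end
end
```

For example, `Fanout.query(nodes, &run_subquery/1, 5_000)` returns whatever subset of partial results came back within 5 seconds.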
Time manipulation is a pain
Versioning your IR is a pain