🐦 Ingesting Petabytes at Tinybird
PyData Madrid, 2024-01-18
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/11057190/pasted-from-clipboard.png)
🤳 Your Host Tonight
Alex Fernández "pinchito"
Silverback developer at Tinybird
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/10259817/pasted-from-clipboard.png)
🗂️ What we will see today
🤔 What does Tinybird do?
📑 Principles
🔧 Techniques
🪄 Tricks
🤔 What does Tinybird do?
And what does Tinybird not do?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/11057281/pasted-from-clipboard.png)
🙅‍♀️ First, what does Tinybird not do?
It is not a ClickHouse wrapper
It does not store logs
It is not a DWH
🤌 So, what does Tinybird do?
Real time analytics
Process data at scale
Publish API endpoints
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/11057295/pasted-from-clipboard.png)
🗣️ Customer quotes
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/11057256/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/11057268/pasted-from-clipboard.png)
🙊 Non-secret monthly numbers
Ingest petabytes of data
Process many petabytes of data
Serve billions of requests
📑 Principles
How do we do it?
![](https://www.justinkownacki.com/wp-content/uploads/2021/09/Wire-Omar-Code.gif)
⏱️ Real real-time
From days, hours or minutes — to seconds
Process data as it comes
Reduce latencies
Blog: Real-Time Data Ingestion: The Foundation for Real-time Analytics
🕵️ Customer focus
Really close to customers
Direct communication channels
Everyone does customer support
Fast iteration
🏗️ Production centric
Most engineers do on-call
Everyone suffers the pain
Everyone deploys
Top-notch production culture
🐶 Eat your own dog food
Use Tinybird as much as possible
Be your first user
⚡ Speed wins
Iterate fast
Communicate often
Don't wait for permission
Hardest principle to implement — and copy
🔧 Techniques
More to the Point
![](https://i.makeagif.com/media/3-08-2021/9qE3xL.gif)
🗿 Monorepo
Everything in the same repo
Ingest, backend, UI together
Includes code, tests, docs, tooling, infra, CI itself
A controversial practice (Google, Facebook)
When it works, it works great!
🛬 Continuous deployment
Deploy tens of times per day
Write → Test → Review → Merge → Deploy
Everything goes straight to production
🌐 HTTP interface
🚏 Requests vs events
Event
Request
![](https://i.pinimg.com/originals/4e/44/32/4e44325fb68bce84c15b497df68f761f.gif)
![](https://i.makeagif.com/media/7-07-2015/R0lAlD.gif)
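One way to picture the difference: a single HTTP request can carry many events. The sketch below assumes the request body is newline-delimited JSON, which the slides do not specify; `split_events` and the sample fields are hypothetical names used only for illustration.

```python
import json


def split_events(body: bytes) -> list[dict]:
    """Split one request body (assumed to be NDJSON) into individual events."""
    events = []
    for line in body.splitlines():
        if line.strip():  # skip blank lines
            events.append(json.loads(line))
    return events


# One request, two events (sample payload invented for the example)
body = b'{"user": "a", "clicks": 1}\n{"user": "b", "clicks": 2}\n'
events = split_events(body)
```

Counting requests and counting events then give different numbers for the same traffic, which matters when sizing ingestion.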
🥅 Gather data
Pre-aggregate data
Aggregate ClickHouse operations
Faster, more predictable responses
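A minimal sketch of the gathering idea, assuming rows are buffered in memory and written in batches rather than one by one. `BatchBuffer`, `max_rows` and the `flush` callback are hypothetical; in a real system the callback might issue one bulk insert into ClickHouse, but that part is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchBuffer:
    """Collects rows and hands them to a sink in batches."""

    flush: Callable[[list[dict]], None]  # hypothetical sink, e.g. one bulk insert
    max_rows: int = 3
    rows: list[dict] = field(default_factory=list)

    def add(self, row: dict) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush_now()

    def flush_now(self) -> None:
        if not self.rows:
            return
        # Swap the buffer out before flushing so new rows start a fresh batch
        batch, self.rows = self.rows, []
        self.flush(batch)


def demo_batches() -> list[int]:
    """Feed 7 rows through a buffer of size 3 and record each batch size."""
    sizes: list[int] = []
    buffer = BatchBuffer(flush=lambda batch: sizes.append(len(batch)), max_rows=3)
    for i in range(7):
        buffer.add({"id": i})
    buffer.flush_now()  # flush the remainder
    return sizes
```

Seven single-row writes become three batch writes. A production version would probably also flush on a timer so that a slow trickle of rows does not wait indefinitely.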
📈 Be ahead
Customers increase their data overnight by 2x, 5x, 10x
Be ready for deluges
Look for the next bottleneck
🪄 Tricks
Of the Trade
![](https://media4.giphy.com/media/fsoCk5kgOcYMM/giphy.gif)
🎛️ ClickHouse optimizations
Team of experts
Open source model
Contribute everything upstream
🖇️ Async programming
async/await? Copied from JavaScript? Or .NET?
Python servers: tornado, gunicorn, starlette
Fastest performance
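As a rough illustration of async/await in Python, the sketch below uses only the standard `asyncio` module. `handle_request` is a hypothetical handler, and `asyncio.sleep` stands in for a real non-blocking I/O wait.

```python
import asyncio


async def handle_request(request_id: int, delay: float) -> str:
    # asyncio.sleep stands in for a non-blocking I/O wait (network, disk)
    await asyncio.sleep(delay)
    return f"request {request_id} done"


async def serve_all() -> list[str]:
    # gather runs the coroutines concurrently and returns results
    # in the order the coroutines were passed in
    return await asyncio.gather(
        handle_request(1, 0.02),
        handle_request(2, 0.01),
        handle_request(3, 0.01),
    )


def run_demo() -> list[str]:
    return asyncio.run(serve_all())
```

While one handler waits, the event loop runs the others, so the three requests take about as long as the slowest one instead of the sum of all delays.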
🍱 Use types
Explicit types
Check with mypy every time
I hate it
Catches a lot of errors
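A small example of the kind of error explicit types can catch. `parse_row_count` is a hypothetical function, not taken from the Tinybird codebase.

```python
from typing import Optional


def parse_row_count(header: Optional[str]) -> int:
    # Without this None check, mypy rejects int(header) because header
    # may be None; at runtime the same bug would raise a TypeError.
    if header is None:
        return 0
    return int(header)
```

Running `mypy` on the file reports the problem before the code ever runs, instead of the bug surfacing in production on the first missing header.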
🔌 C extensions
🙏 Thanks!
❓ Questions?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/7769/images/10778304/pasted-from-clipboard.png)
Ingesting Petabytes of Data at Tinybird
By Alex Fernández
Presentation for PyData Madrid, 2024-01-18