🐦 Ingesting Petabytes at Tinybird
PyData Madrid, 2024-01-18
🤳 Your Host Tonight
Alex Fernández "pinchito"
Silverback developer at Tinybird
🗂️ What we will see today
🤔 What does Tinybird do?
📑 Principles
🔧 Techniques
🪄 Tricks
🤔 What does Tinybird do?
And what does Tinybird not do?
🙅‍♀️ First, what does Tinybird not do?
It is not a ClickHouse wrapper
It does not store logs
It is not a DWH
🤌 So, what does Tinybird do?
Real time analytics
Process data at scale
Publish API endpoints
🗣️ Customer quotes
🙊 Non-secret monthly numbers
Ingest petabytes of data
Process many petabytes of data
Serve billions of requests
📑 Principles
How do we do it?
⏱️ Real real-time
From days, hours or minutes — to seconds
Process data as it comes
Reduce latencies
Blog: Real-Time Data Ingestion: The Foundation for Real-time Analytics
🕵️ Customer focus
Really close to customers
Direct communication channels
Everyone does customer support
Fast iteration
🏗️ Production centric
Most engineers do on-call
Everyone suffers the pain
Everyone deploys
Top-notch production culture
🐶 Eat your own dog food
Use Tinybird as much as possible
Be your first user
⚡ Speed wins
Iterate fast
Communicate often
Don't wait for permission
Hardest principle to implement — and copy
🔧 Techniques
More to the Point
🗿 Monorepo
Everything in the same repo
Ingest, backend, UI together
Includes code, tests, docs, tooling, infra, CI itself
A controversial practice (used at Google and Facebook)
When it works, it works great!
🛬 Continuous deployment
Deploy tens of times per day
Write → Test → Review → Merge → Deploy
Everything goes straight to production
🌐 HTTP interface
🚏 Requests vs events
Event
Request
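One way to read the event/request distinction is that a single HTTP request can carry many events. The sketch below builds (but does not send) such a request, with one JSON object per line. The helper name `build_events_request`, the data source name `page_events`, the token, and the event fields are all hypothetical; the host, path and `name` parameter are assumptions modeled on Tinybird's public Events API and should be checked against the current docs.

```python
import json
import urllib.parse
import urllib.request


def build_events_request(events, datasource, token, host="https://api.tinybird.co"):
    """Pack many events into one HTTP request as NDJSON (one JSON object per line)."""
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    # Assumed endpoint shape: /v0/events?name=<data source>
    query = urllib.parse.urlencode({"name": datasource})
    return urllib.request.Request(
        f"{host}/v0/events?{query}",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Three events, one request. Nothing is sent over the network here.
events = [
    {"timestamp": "2024-01-18T19:00:00", "action": "view"},
    {"timestamp": "2024-01-18T19:00:01", "action": "click"},
    {"timestamp": "2024-01-18T19:00:02", "action": "view"},
]
request = build_events_request(events, "page_events", "HYPOTHETICAL_TOKEN")
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs offline.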
🥅 Gather data
Pre-aggregate data
Aggregate ClickHouse operations
Faster, more predictable responses
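A minimal sketch of the gathering idea, assuming rows are buffered in memory and written in batches so the database sees a few large inserts instead of many small ones. The `Gatherer` class, its thresholds, and the `flush` callback are hypothetical and not Tinybird's actual implementation.

```python
import time
from typing import Any, Callable

Row = dict[str, Any]


class Gatherer:
    """Buffer rows and hand them to `flush` in batches (hypothetical helper)."""

    def __init__(
        self,
        flush: Callable[[list[Row]], None],
        max_rows: int = 1000,
        max_wait_s: float = 1.0,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._rows: list[Row] = []
        self._first_row_at = 0.0

    def add(self, row: Row) -> None:
        if not self._rows:
            # Remember when the current batch started filling up.
            self._first_row_at = time.monotonic()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = time.monotonic() - self._first_row_at >= self._max_wait_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._rows:
            return
        rows, self._rows = self._rows, []
        self._flush(rows)  # e.g. a single batched insert


# Seven rows with a batch size of three produce three writes instead of seven.
batches: list[list[Row]] = []
gatherer = Gatherer(batches.append, max_rows=3, max_wait_s=60.0)
for n in range(7):
    gatherer.add({"n": n})
gatherer.flush()  # write whatever is left
```

The age check only runs when a new row arrives; a real service would also need a timer to flush a batch that stops receiving rows.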
📈 Be ahead
Customers increase their data overnight by 2x, 5x, 10x
Be ready for deluges
Look for the next bottleneck
🪄 Tricks
Of the Trade
🎛️ ClickHouse optimizations
Team of experts
Open source model
Contribute everything upstream
🖇️ Async programming
async/await? Copied from JavaScript? Or .NET?
Python servers: tornado, gunicorn, starlette
Fastest performance
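As a small illustration of async/await in Python, the sketch below runs 100 simulated requests concurrently on one thread using only the standard library. The `handle` coroutine and the sleep standing in for I/O are assumptions for the example, not code from any of the servers named above.

```python
import asyncio


async def handle(request_id: int) -> str:
    # Stands in for waiting on the network or the database.
    await asyncio.sleep(0.01)
    return f"done {request_id}"


async def main() -> list[str]:
    # While one request waits, the event loop runs the others,
    # so the waits overlap instead of adding up.
    return await asyncio.gather(*(handle(i) for i in range(100)))


results = asyncio.run(main())
```

`asyncio.gather` returns the results in the order the coroutines were passed in, regardless of which one finishes first.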
🍱 Use types
Explicit types
Check with mypy every time
I hate it
Catches a lot of errors
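A minimal sketch of the kind of mistake explicit types catch before the code runs. The `Event` class and `total_rows` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    rows: int


def total_rows(events: list[Event]) -> int:
    return sum(event.rows for event in events)


total = total_rows([Event("views", 2), Event("clicks", 3)])

# A type checker such as mypy flags the call below without running it:
# the list holds a tuple, not an Event. Without the check, the mistake
# would only surface at runtime as an AttributeError.
# total_rows([("views", 2)])
```

Running `mypy` on the file with the last line uncommented should report an incompatible argument type.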
🔌 C extensions
🙏 Thanks!
❓ Questions?