Florian Dambrine - Principal Engineer
(Directed Acyclic Graph)
How tasks are executed by Airflow
Runs one task at a time on the Airflow instance (SequentialExecutor, for development purposes)
Runs multiple tasks at a time on the Airflow instance (LocalExecutor, pre-forking model / vertical scaling)
Kicks off Kubernetes pods to execute tasks and cleans them up automatically on completion (KubernetesExecutor, dynamic horizontal scaling)
Delegates task execution to Celery workers; requires a message broker such as Redis (CeleryExecutor, common in production, horizontal scaling)
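The executor is chosen in `airflow.cfg` (or via the matching `AIRFLOW__CORE__EXECUTOR` environment variable). A minimal sketch; the Redis URL is a hypothetical example:

```ini
[core]
# One of: SequentialExecutor, LocalExecutor, KubernetesExecutor, CeleryExecutor
executor = CeleryExecutor

[celery]
# CeleryExecutor needs a message broker, e.g. Redis (hypothetical host)
broker_url = redis://redis:6379/0
```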
# XComs (Cross-Communications)
Let tasks exchange messages. XComs are made of a key, a value, a timestamp, and task/DAG info. They can be pushed or pulled.
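A minimal stand-in (plain Python, not the Airflow API) showing the shape of an XCom record and push/pull semantics; `XComStore` and its method names are illustrative only:

```python
from datetime import datetime, timezone

class XComStore:
    """Toy in-memory XCom store: each entry carries a key, a value,
    a timestamp, and the task/DAG it came from."""
    def __init__(self):
        self._entries = []

    def push(self, dag_id, task_id, key, value):
        self._entries.append({
            "dag_id": dag_id, "task_id": task_id,
            "key": key, "value": value,
            "timestamp": datetime.now(timezone.utc),
        })

    def pull(self, dag_id, task_id, key):
        # Most recent matching entry wins
        for e in reversed(self._entries):
            if (e["dag_id"], e["task_id"], e["key"]) == (dag_id, task_id, key):
                return e["value"]
        return None

store = XComStore()
store.push("etl", "extract", "row_count", 42)
print(store.pull("etl", "extract", "row_count"))  # prints 42
```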
Connection information for external systems is stored in the Airflow metadata database and managed through the UI
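Besides the UI, connections can also be supplied as environment variables in URI form (`AIRFLOW_CONN_<CONN_ID>`); a sketch with a hypothetical Postgres connection:

```shell
# Hypothetical connection named "my_db" (host and credentials are placeholders)
export AIRFLOW_CONN_MY_DB='postgresql://user:pass@db.example.com:5432/analytics'
```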
In Airflow, a DAG – or a Directed Acyclic Graph – is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
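A DAG can be modeled as tasks plus dependency edges; this plain-Python sketch (not Airflow code) derives a valid execution order with Kahn's algorithm, and rejects cycles, which is what makes the graph "acyclic":

```python
from collections import deque

def execution_order(deps):
    """deps maps task -> set of upstream tasks it depends on.
    Returns a topological order, or raises if the graph has a cycle."""
    indegree = {t: len(up) for t, up in deps.items()}
    downstream = {t: [] for t in deps}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

# extract -> transform -> load
print(execution_order({
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}))  # ['extract', 'transform', 'load']
```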
Airflow offers a generic toolbox for working with data. Using Airflow plugins can be a way for companies to customize their Airflow installation to reflect their ecosystem.
Focus on your business logic and leverage operators to glue things together...
Send a message to SQS
Submit tasks to Druid
Execute a command inside a Docker container
Perform actions on Jira
Send notifications to Slack
Submit a Spark job to Databricks
S3 > local > transform > S3
Run SparkSQL queries
Load files from S3 into Redshift
Execute a SQL query
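To illustrate the "glue" idea without a live Airflow install, this sketch mimics Airflow's `>>` dependency syntax with a tiny stand-in `Operator` class; the class and the task names are illustrative, not the real provider operators:

```python
class Operator:
    """Toy stand-in for an Airflow operator: holds a task_id and
    supports Airflow-style `a >> b`, meaning 'b runs after a'."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        self.downstream.append(other)
        return other  # returning `other` allows chaining: a >> b >> c

extract = Operator("s3_to_local")       # e.g. pull a file from S3
transform = Operator("transform_file")  # business logic lives here
load = Operator("local_to_redshift")    # e.g. load into Redshift
notify = Operator("slack_notify")       # e.g. ping Slack on completion

extract >> transform >> load >> notify
print([t.task_id for t in extract.downstream])  # ['transform_file']
```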