Faster, Cheaper, Easier: Choose 3
Automatically Optimize dbt at Scale
# Overview
# How it works
# Why dbt?
# 1. Query Incrementalization
{{ config(materialized='incremental') }}

select * from {{ ref('events') }}
{% if is_incremental() %}
-- on incremental runs, only reprocess rows from the latest loaded day onward
where event_time >= (select max(event_time)::date from {{ this }})
{% endif %}
This gets a lot more complicated.
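One complication is visible even in the small model above: the filter uses `>=` on a date, so the latest day is re-read on every incremental run and has to replace what is already loaded rather than be appended to it. The sketch below mimics that behavior with Python's built-in `sqlite3`. The table names, the sample rows, and the delete-then-insert strategy are assumptions for illustration, not how dbt or the tool in this talk necessarily does it.

```python
import sqlite3


def incremental_refresh(conn):
    """Refresh events_incremental from events, reprocessing only the latest loaded day onward."""
    cur = conn.cursor()
    # Watermark: the date of the newest row already in the target (None on the first run).
    (watermark,) = cur.execute(
        "select date(max(event_time)) from events_incremental"
    ).fetchone()
    if watermark is None:
        # First run: full load.
        cur.execute("insert into events_incremental select * from events")
    else:
        # Assumed strategy: drop the rows we are about to re-read, then reload them,
        # so re-reading the watermark day does not create duplicates.
        cur.execute("delete from events_incremental where event_time >= ?", (watermark,))
        cur.execute(
            "insert into events_incremental select * from events where event_time >= ?",
            (watermark,),
        )
    inserted = cur.rowcount  # rows written by the last insert
    conn.commit()
    return inserted


# Hypothetical demo data: ISO-8601 text timestamps, which compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_time text, payload text)")
conn.execute("create table events_incremental (event_time text, payload text)")
conn.executemany(
    "insert into events values (?, ?)",
    [("2024-01-01 09:00:00", "a"), ("2024-01-01 17:30:00", "b"), ("2024-01-02 08:15:00", "c")],
)
first_run = incremental_refresh(conn)   # full load

conn.executemany(
    "insert into events values (?, ?)",
    [("2024-01-02 21:00:00", "d"), ("2024-01-03 07:45:00", "e")],
)
second_run = incremental_refresh(conn)  # reloads 2024-01-02 onward only

target_rows = conn.execute("select count(*) from events_incremental").fetchone()[0]
print(first_run, second_run, target_rows)
```

The second run touches three rows instead of five, and the target still matches the source. Joins, aggregations, and late-arriving rows each need more care than this.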
# Scheduling
Default Snowflake Runtime:
6 minutes on 1 warehouse
Optimized Runtime:
3 minutes on 4 clusters
Békési, Galambos & Kellerer
“A 5/4 Linear Time Bin Packing Algorithm”
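To make the bin packing framing concrete: treat each query's predicted runtime as an item and each cluster as a bin whose capacity is the target wall-clock time. The sketch below uses first-fit decreasing, a simpler classic heuristic, not the 5/4 linear-time algorithm cited above. It also ignores dependencies between models, and the runtimes are made-up numbers.

```python
def first_fit_decreasing(runtimes, capacity):
    """Pack query runtimes (seconds) into as few clusters as the heuristic finds,
    with no cluster running longer than `capacity` seconds."""
    clusters = []  # each cluster is the list of runtimes assigned to it
    loads = []     # total seconds already assigned to each cluster
    for runtime in sorted(runtimes, reverse=True):
        if runtime > capacity:
            raise ValueError(f"query of {runtime}s cannot fit in a {capacity}s window")
        for i, load in enumerate(loads):
            if load + runtime <= capacity:
                # First cluster with room wins.
                clusters[i].append(runtime)
                loads[i] += runtime
                break
        else:
            # No existing cluster has room: open a new one.
            clusters.append([runtime])
            loads.append(runtime)
    return clusters


# Hypothetical predicted runtimes in seconds: 450s of work in total.
predicted_runtimes = [120, 90, 60, 60, 45, 30, 30, 15]
plan = first_fit_decreasing(predicted_runtimes, capacity=180)
for i, cluster in enumerate(plan):
    print(f"cluster {i}: {cluster} -> {sum(cluster)}s")
```

Run back to back on one cluster, these queries would take 450 seconds. Here they fit into three clusters that each finish within the 180-second window.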
We're seeing speed-ups of 30% on some customer workloads.
We use LLMs to predict query preprocessing time, runtime, scaling, and concurrency.
Runtime predictions feed directly into scheduling. They also tell us which queries to incrementalize.
Scaling predictions (runtime on different-size clusters) and concurrency predictions (runtime with multiple queries per cluster) let us reduce runtime and increase utilization.
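As a rough illustration of how a scaling prediction can be used: given a predicted runtime per warehouse size, pick the cheapest size that still meets a deadline. The predicted runtimes below are invented, `cheapest_size` is a hypothetical helper, and the credit rates assume Snowflake's usual doubling per size for standard warehouses. This is a sketch of the idea, not the scheduler described in this talk.

```python
# Assumed credits per hour for standard Snowflake warehouses (doubles with each size).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def cheapest_size(predicted_seconds, deadline_seconds):
    """Return (size, credits) for the cheapest size whose predicted runtime meets
    the deadline, or None if no size is fast enough.

    Simplification: cost is rate * runtime, ignoring any minimum billing increment.
    """
    best = None
    for size, seconds in predicted_seconds.items():
        if seconds > deadline_seconds:
            continue  # too slow on this size
        credits = CREDITS_PER_HOUR[size] * seconds / 3600
        if best is None or credits < best[1]:
            best = (size, credits)
    return best


# Hypothetical scaling prediction for one query: runtime does not halve with each size up.
predicted = {"XSMALL": 600, "SMALL": 280, "MEDIUM": 150, "LARGE": 100}

choice = cheapest_size(predicted, deadline_seconds=300)
print(choice[0])  # SMALL
```

Because this query scales sub-linearly, a bigger warehouse finishes sooner but costs more per run. That trade-off is what the scaling prediction exposes.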
We’re building towards an ML-powered runtime scheduler, not just an offline job scheduler.
This will work much like the Borg scheduler or the Kubernetes scheduler, but with ML-inferred resource annotations.
# Snowflake dbt Optimization
If you only take one thing away from this talk:
Increase your dbt thread count!
Snowflake will queue work for you.
No downside for standard warehouses.
Can increase costs for multi-cluster warehouses.
This can speed up a run as much as going up one warehouse size, and for free.
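To see why the thread count matters: dbt's `threads` setting caps how many models run at once, so a low value serializes models that could otherwise overlap. The toy simulation below runs a small DAG with different thread counts. The model names and durations are made up, and it assumes each model takes the same time regardless of what else is running, which is an optimistic simplification.

```python
import heapq


def simulate_wall_clock(durations, deps, threads):
    """Simulate a dbt-style run: a model starts once all of its parents
    have finished and a thread is free. Returns total wall-clock seconds."""
    unfinished_parents = {m: len(deps.get(m, ())) for m in durations}
    children = {m: [] for m in durations}
    for model, parents in deps.items():
        for parent in parents:
            children[parent].append(model)

    ready = sorted(m for m, n in unfinished_parents.items() if n == 0)
    running = []  # min-heap of (finish_time, model)
    now = 0
    while ready or running:
        # Start as many ready models as there are free threads.
        while ready and len(running) < threads:
            model = ready.pop(0)
            heapq.heappush(running, (now + durations[model], model))
        # Advance time to the next model that finishes.
        now, finished = heapq.heappop(running)
        for child in children[finished]:
            unfinished_parents[child] -= 1
            if unfinished_parents[child] == 0:
                ready.append(child)
    return now


# Hypothetical project: four independent staging models feeding one fact model.
durations = {"stg_a": 60, "stg_b": 60, "stg_c": 60, "stg_d": 60, "fct_orders": 30}
deps = {"fct_orders": ["stg_a", "stg_b", "stg_c", "stg_d"]}

wall_clock = {n: simulate_wall_clock(durations, deps, threads=n) for n in (1, 2, 4)}
print(wall_clock)  # {1: 270, 2: 150, 4: 90}
```

In this toy project, going from one thread to four cuts the run from 270 to 90 seconds. Real gains depend on how wide the DAG is and how much the warehouse can run concurrently.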