Data Engineering for Startups

Our journey and lessons learned

John McKim

VP of Product & Technology

A Cloud Guru


What is A Cloud Guru

Cloud Training for Organisations and Engineers

ACG in 2017

Small team and Growing Fast 

Series A

Oh shit. We need Metrics.

Data Eng v1

No experience. No problem

Data Eng v1

DynamoDB, Redshift and Segment

Data Eng v1

Not fun.

Time to Level Up

We need help

Levelling up


  • Stakeholders - unsure what they want
  • Missing many data sources
    • Firebase, Salesforce, Zendesk, Braintree, Hubspot
  • Modelling - non-existent

Data Eng v2

Starting with Firebase

Data Eng v2


Firebase Data Pipeline

Data Eng v2

Adopting Fivetran

Data Eng v2


Data Replication Pipeline


  • Pre-built connectors
  • Supports many SaaS services
  • Supports many Warehouses


  • "Guaranteed data delivery" ...

Data Eng v2

Using DBT for Modelling

Data Eng v2

Build Models on your Data

  • SQL Models
  • Reference other models
  • Materialise models as tables or views

-- Materialise as a table, sorted and distributed on the date column
{{ config(materialized='table',
    sort = 'full_date',
    dist = 'full_date') }}

select
    created_at::date as full_date,
    -- 'offered' means a survey was sent but not yet answered
    count(case score when 'offered' then 1 else null end) as surveys_sent,
    count(case score when 'offered' then null else 1 end) as responses,
    count(case score when 'good' then 1 else null end) as good_ratings,
    count(case score when 'bad' then 1 else null end) as bad_ratings
from {{ref('dim_zendesk_satisfaction_rating')}}
-- full_date is the only non-aggregated column
group by 1
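
Because models reference each other through ref(), the set of models forms a dependency graph that has to be built in order. The sketch below illustrates that idea only; it is not how DBT itself is implemented. Every model name except dim_zendesk_satisfaction_rating is hypothetical, as are the SQL bodies.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: model name -> SQL text.
# Only dim_zendesk_satisfaction_rating appears in the talk; the rest are made up.
MODELS = {
    "fct_zendesk_csat_daily": "select 1 from {{ref('dim_zendesk_satisfaction_rating')}}",
    "dim_zendesk_satisfaction_rating": "select 1 from {{ ref('stg_zendesk_ratings') }}",
    "stg_zendesk_ratings": "select 1",
}

# Matches {{ref('name')}} with optional whitespace inside the braces.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def build_order(models):
    """Return model names ordered so each model comes after the models it references."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)
```

A model that references nothing is built first, and a circular reference raises graphlib.CycleError instead of looping forever.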

Data Eng v2

Done?

Data Eng v2

Maybe not.

Fivetran Reliability

Incident Timeline

  • Historical reliability issue detected
  • First historical occurrence - May 2018*
  • First reported - 31 Oct 2018
  • Sad times...
  • Infra fixed - 3rd Jan
  • Data re-synced - 16th Jan

Fivetran Reliability

Incident metrics

Severity: Production Impacted

MTTD: > 5 months

MTTR: > 10 weeks
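
The two figures can be checked against the incident timeline with a little date arithmetic. This is a sketch under assumptions: the timeline gives only "May 2018" for the first occurrence and no year for the January dates, so the day of month (31) and the year (2019) below are assumed, and recovery is measured to the data re-sync, not the infra fix.

```python
from datetime import date

# Dates from the incident timeline; see the assumptions noted on each line.
first_occurrence = date(2018, 5, 31)  # assumed day: latest possible date in May 2018
first_reported = date(2018, 10, 31)
data_resynced = date(2019, 1, 16)     # assumed year

# Time to detect: first occurrence until the issue was reported.
time_to_detect = first_reported - first_occurrence
# Time to recover: reported until the data was fully re-synced.
time_to_recover = data_resynced - first_reported

detect_days = time_to_detect.days
recover_weeks = time_to_recover.days / 7

print(f"Time to detect:  {detect_days} days")
print(f"Time to recover: {recover_weeks:.1f} weeks")
```

Even with the most favourable start date in May, detection took five full months, and recovery took more than ten weeks.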


Semantic Layer

Build Performance

  • Increasing DBT build times & failures
  • Builds > 4 hrs for 150 models
  • Errors indicating deadlocks

Redshift Performance

Ever slowing queries

  • Up to 90 sec of query planning per model
  • 90 sec * 150 models = 3.75 hrs per run on query planning alone
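
The arithmetic above as a small sketch. It assumes models run one at a time and that every model hits the worst-case planning time; the function name is illustrative.

```python
PLANNING_SECONDS_PER_MODEL = 90  # worst case observed
MODEL_COUNT = 150


def planning_overhead_hours(seconds_per_model, model_count):
    """Total time spent on query planning for one serial build, in hours."""
    return seconds_per_model * model_count / 3600


hours = planning_overhead_hours(PLANNING_SECONDS_PER_MODEL, MODEL_COUNT)
print(f"{hours:.2f} hrs per run on query planning alone")
```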

Redshift Performance

Break through

  • Increased Cluster size as a hail mary
  • Performance improved drastically, then degraded again
  • Tested another reboot & saw same result

Orange = Query Planning

Redshift Performance

No answers. Only suspicions.

  • Bad Redshift node
  • Segment - COPY command
  • ...
  • Any ideas?

Data Eng - Future

New Replication Service & Data Lake

Lessons Learned

Changing role of Data Eng

  • Cloud provides - managed data infrastructure & pre-built connectors
  • Outsourcing ETL will have a big impact on data eng
    • designing, managing and optimizing core data infrastructure
    • building and maintaining custom ingestion pipelines

Lessons Learned

Outsourcing has challenges

  • DON'T build in-house just because we had a bad experience
  • BUT, Cloud service selection is important
    • Look for SLAs in contracts
    • Understand your risk
    • Have a Business Continuity Plan


Thanks for Listening!


