ADAR 2020

Ville Tuulos

Machine Learning Infrastructure @ Netflix

Infrastructure Stack for Modern Data Science

with a business problem: predict churn

a model to predict churn

[Diagram, built up over successive slides: data, data transforms, model, results, compute, schedule, action, data audits, model audits, versioning & tracking]
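The workflow elements named in the slides above (data, data transforms, model, results, model audits, versioning & tracking) can be illustrated with a minimal sketch. Nothing here comes from the talk itself: the toy data, the threshold "model", and every function name are hypothetical, chosen only to show how the pieces relate. Compute, schedule, and action are left out.

```python
import hashlib
import json
import math

# Hypothetical toy data: (weekly_hours_watched, churned) pairs, invented for illustration.
RAW_DATA = [(0.5, 1), (1.0, 1), (2.0, 1), (6.0, 0), (8.0, 0), (12.0, 0)]


def transform(rows):
    """Data transforms: turn raw rows into (feature, label) pairs."""
    return [(math.log1p(hours), label) for hours, label in rows]


def train(examples):
    """Model: a one-feature threshold halfway between the two class means."""
    churned = [x for x, y in examples if y == 1]
    stayed = [x for x, y in examples if y == 0]
    threshold = (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2
    return {"threshold": threshold}


def predict(model, feature):
    """Results: predict churn (1) when the feature falls below the threshold."""
    return 1 if feature < model["threshold"] else 0


def audit(model, examples):
    """Model audit: fraction of examples the model classifies correctly."""
    correct = sum(predict(model, x) == y for x, y in examples)
    return correct / len(examples)


def version_of(obj):
    """Versioning & tracking: a short content hash identifying data or a model."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(rows):
    """Run transforms, training, and audit, and record versions of the inputs and outputs."""
    examples = transform(rows)
    model = train(examples)
    return {
        "data_version": version_of(rows),
        "model_version": version_of(model),
        "accuracy": audit(model, examples),
        "model": model,
    }


if __name__ == "__main__":
    print(run_pipeline(RAW_DATA))
```

Hashing the inputs and the fitted model is one simple way to tie a result back to the exact data and model that produced it; a real system would likely track code and environment versions as well.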

Screenplay Analysis Using NLP

Fraud Detection

Title Portfolio Optimization

Estimate Word-of-Mouth Effects

Incremental Impact of Marketing

Classify Support Tickets

Predict Quality of Network

Content Valuation

Cluster Tweets

Intelligent Infrastructure

Machine Translation

Optimal CDN Caching

Predict Churn

Content Tagging

Optimize Production Schedules

Infrastructure Stack for Modern Data Science

Model Development
Feature Engineering
Model Operations
Versioning
Architecture
Orchestration
Compute
Data

[Figure: the stack layers (Model Development, Feature Engineering, Model Operations, Versioning, Architecture, Orchestration, Compute, Data) annotated with "How much data scientist cares" and "How much infrastructure is needed"]

Human-Centric Infrastructure Stack for Modern Data Science

From Prototype to Production and Back