Ville Tuulos
QCon SF, November 2018
You are hired!
We need a dynamic pricing model.
Optimal pricing model
Great job!
The model works perfectly!
Could you predict churn too?
Optimal pricing model
Optimal churn model
Alex's model
Good job again!
Promising results!
Can you include a causal attribution model for marketing?
Optimal pricing model
Optimal churn model
Alex's model
Attribution model
Are you sure these results make sense?
You are hired!
Pricing model
Churn model
Attribution model
VS
Screenplay Analysis Using NLP
Fraud Detection
Title Portfolio Optimization
Estimate Word-of-Mouth Effects
Incremental Impact of Marketing
Classify Support Tickets
Predict Quality of Network
Content Valuation
Cluster Tweets
Intelligent Infrastructure
Machine Translation
Optimal CDN Caching
Predict Churn
Content Tagging
Optimize Production Schedules
[Diagram: data, compute, prototyping, models]
How to run at scale?
Custom Titus executor.
How to schedule the model to update daily? Learn about the job scheduler.
How to access data at scale?
Slow!
How to expose the model to a custom UI? Custom web backend.
Time to production: 4 months
How to monitor models in production?
How to iterate on a new version without breaking the production version?
How to let another data scientist iterate on her version of the model safely?
How to debug yesterday's failed production run?
How to backfill historical data?
How to make this faster?
[Diagram: data, compute, prototyping, models]
def compute(input):
    output = my_model(input)
    return output
[Diagram: input → compute → output]
# python myscript.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.a, self.b)

    @step
    def a(self):
        self.next(self.join)

    @step
    def b(self):
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

MyFlow()
[DAG: start → (A, B) → join → end]
# python myscript.py run
metaflow("MyFlow") %>%
  step(
    step = "start",
    next_step = c("a", "b")
  ) %>%
  step(
    step = "a",
    r_function = r_function(a_func),
    next_step = "join"
  ) %>%
  step(
    step = "b",
    r_function = r_function(b_func),
    next_step = "join"
  ) %>%
  step(
    step = "join",
    r_function = r_function(join,
                            join_step = TRUE),
    next_step = "end"
  )
# Rscript myscript.R
134 projects on Metaflow
as of November 2018
# python myscript.py resume B
[DAG annotated with step artifacts: start x=0, A x+=2, B x+=3, join max(A.x, B.x)]
@step
def start(self):
    self.x = 0
    self.next(self.a, self.b)

@step
def a(self):
    self.x += 2
    self.next(self.join)

@step
def b(self):
    self.x += 3
    self.next(self.join)

@step
def join(self, inputs):
    self.out = max(i.x for i in inputs)
    self.next(self.end)
@titus(cpu=16, gpu=1)
@step
def a(self):
    tensorflow.train()
    self.next(self.join)

@titus(memory=200000)
@step
def b(self):
    massive_dataframe_operation()
    self.next(self.join)
16 cores, 1 GPU
200 GB RAM
# python myscript.py run
[DAG: start → foreach A → join → end]
@step
def start(self):
    self.grid = ['x', 'y', 'z']
    self.next(self.a, foreach='grid')

@titus(memory=10000)
@step
def a(self):
    self.x = ord(self.input)
    self.next(self.join)

@step
def join(self, inputs):
    self.out = max(i.x for i in inputs)
    self.next(self.end)
from metaflow import Table

@titus(memory=200000, network=20000)
@step
def b(self):
    # Load data from S3 to a dataframe
    # at 10Gbps
    df = Table('vtuulos', 'input_table')
    self.next(self.end)
[DAG: start → (A, B) → join → end, with step B loading data from S3]
1. Build a separate model for every new title with marketing spend.
   Parallel foreach (see the sketch after this list).
2. Load and prepare input data for each model.
   Download Parquet directly from S3. Total amount of model input data: 890GB.
3. Fit a model.
   Train each model on an instance with 400GB of RAM and 16 cores. The model is written in R.
4. Share updated results.
   Collect the results of the individual models and write them to a table. Results are shown on a Tableau dashboard.
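A minimal sketch of how such a flow could be laid out with the constructs shown above; the flow name, the helpers load_title_data() and fit_title_model(), and the example titles are hypothetical (the real per-title models were written in R, and each branch ran on a large instance requested with @titus as on the earlier slides):

from metaflow import FlowSpec, step

def load_title_data(title):
    # Hypothetical helper: input Parquet is downloaded directly from S3
    # (about 890GB of model input data in total).
    ...

def fit_title_model(df):
    # Hypothetical helper: stands in for the real R model.
    ...

class TitleModelFlow(FlowSpec):

    @step
    def start(self):
        # One parallel foreach branch per new title with marketing spend.
        self.titles = ['title_a', 'title_b']   # illustrative only
        self.next(self.fit, foreach='titles')

    # In the talk, each branch ran on a 400GB RAM, 16-core instance,
    # requested with a @titus(...) decorator as on the earlier slides.
    @step
    def fit(self):
        self.title = self.input
        df = load_title_data(self.title)
        self.model = fit_title_model(df)
        self.next(self.join)

    @step
    def join(self, inputs):
        # Collect results from the individual models; in the talk they were
        # written to a table that backs a Tableau dashboard.
        self.results = {i.title: i.model for i in inputs}
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    TitleModelFlow()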
from metaflow import Flow, namespace

# Access Savin's runs
namespace('user:savin')
run = Flow('MyFlow').latest_run
print(run.id)   # = 234
print(run.tags) # = ['unsampled_model']

# Access David's runs
namespace('user:david')
run = Flow('MyFlow').latest_run
print(run.id)   # = 184
print(run.tags) # = ['sampled_model']

# Access everyone's runs
namespace(None)
run = Flow('MyFlow').latest_run
print(run.id)   # = 184
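As a follow-up, a small sketch of how runs could be picked out by tag rather than by user namespace; the runs() tag filter and the loop below are assumptions based on the client API, not something shown on the slides:

from metaflow import Flow, namespace

# Look across everyone's runs, keep only the ones tagged 'sampled_model'.
namespace(None)
for run in Flow('MyFlow').runs('sampled_model'):
    print(run.id, run.created_at)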
[DAG: start → (A, B) → join → end; runs tagged savin: unsampled_model and david: sampled_model]
# python myscript.py meson create
Metaflow Hosting
from metaflow import WebServiceSpec
from metaflow import endpoint

class MyWebService(WebServiceSpec):

    @endpoint
    def show_data(self, request_dict):
        # TODO: real-time predict here
        result = self.artifacts.flow.x
        return {'result': result}

# curl http://host/show_data
{"result": 3}
1. Batch optimize launch date schedules for new titles daily.
   Batch optimization deployed on Meson.
2. Serve results through a custom UI.
   Results deployed on Metaflow Hosting.
3. Support arbitrary what-if scenarios in the custom UI.
   Run the optimizer in real time in a custom web endpoint (see the sketch after this list).
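A minimal sketch of what such a what-if endpoint could look like, following the WebServiceSpec pattern from the earlier slide; optimize_schedule(), the 'schedule' artifact, and the request fields are hypothetical:

from metaflow import WebServiceSpec
from metaflow import endpoint

def optimize_schedule(titles, constraints):
    # Hypothetical stand-in for the real launch date optimizer.
    ...

class WhatIfService(WebServiceSpec):

    @endpoint
    def what_if(self, request_dict):
        # The daily Meson-scheduled run publishes its result as a flow
        # artifact; here it is assumed to be called 'schedule'.
        baseline = self.artifacts.flow.schedule
        # Re-run the optimizer in real time for the scenario posted by the UI.
        new_schedule = optimize_schedule(
            titles=request_dict.get('titles', baseline),
            constraints=request_dict.get('constraints', {}),
        )
        return {'schedule': new_schedule}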
Bruno Coldiori
https://www.flickr.com/photos/br1dotcom/8900102170/
https://www.maxpixel.net/Isolated-Animal-Hundeportrait-Dog-Nature-3234285