Tools for
Human-Centric Machine Learning
@
Ville Tuulos
North Star AI, March 2019
a business
problem
predict
churn
a model
to predict
churn
data
a model
to predict
churn
data
model
data
transforms
data
model
data
transforms
results
data
model
data
transforms
results
compute
data
model
data
transforms
results
compute
schedule
action
data
data
transforms
results
compute
schedule
action
data
audits
model
model
audits
data
data
transforms
results
compute
schedule
action
data
audits
model
model
audits
data
transforms
data
audits
model
model
audits
versioning
Screenplay Analysis Using NLP
Fraud Detection
Title Portfolio Optimization
Estimate Word-of-Mouth Effects
Incremental Impact of Marketing
Classify Support Tickets
Predict Quality of Network
Content Valuation
Cluster Tweets
Intelligent Infrastructure
Machine Translation
Optimal CDN Caching
Predict Churn
Content Tagging
Optimize Production Schedules
Human-Centric?
data
model
data
transforms
results
compute
schedule
action
business
owner
data
scientist
product
engineer
data
engineer
ML
engineer
data
model
data
transforms
results
compute
schedule
action
business
owner
data
scientist
product
engineer
data
data
transforms
results
compute
schedule
action
data
audits
model
model
audits
data
transforms
data
audits
model
model
audits
versioning
data
scientist
data
data
transforms
results
compute
schedule
action
data
audits
model
model
audits
data
transforms
data
audits
model
model
audits
versioning
machine
learning
infrastructure
data
scientist
Metaflow
Notebooks: Nteract
Job Scheduler: Meson
Compute Resources: Titus
Query Engine: Spark
Data Lake: S3
{
data
compute
prototyping
ML Libraries: R, XGBoost, TF etc.
models
ML Wrapping: Metaflow
from metaflow import FlowSpec, step
class MyFlow(FlowSpec):
@step
def start(self):
self.next(self.a, self.b)
@step
def a(self):
self.next(self.join)
@step
def b(self):
self.next(self.join)
@step
def join(self, inputs):
self.next(self.end)
MyFlow()
start
How to structure my code?
B
A
join
end
start
How to prototype and test
my code locally?
B
A
join
end
x=0
x+=2
x+=3
max(A.x, B.x)
@step
def start(self):
self.x = 0
self.next(self.a, self.b)
@step
def a(self):
self.x += 2
self.next(self.join)
@step
def b(self):
self.x += 3
self.next(self.join)
@step
def join(self, inputs):
self.out = max(i.x for i in inputs)
self.next(self.end)
start
How to distribute work over
many parallel jobs?
A
join
end
@step
def start(self):
self.grid = [’x’,’y’,’z’]
self.next(self.a, foreach=’grid’)
@titus(memory=10000)
@step
def a(self):
self.x = ord(self.input)
self.next(self.join)
@step
def join(self, inputs):
self.out = max(i.x for i in inputs)
self.next(self.end)
start
How to get access to more CPUs,
GPUs, or memory?
B
A
join
end
@titus(cpu=16, gpu=1)
@step
def a(self):
tensorflow.train()
self.next(self.join)
@titus(memory=200000)
@step
def b(self):
massive_dataframe_operation()
self.next(self.join)
16 cores, 1GPU
200GB RAM
# Access Savin's runs
namespace('user:savin')
run = Flow('MyFlow').latest_run
print(run.id) # = 234
print(run.tags) # = ['unsampled_model']
# Access David's runs
namespace('user:david')
run = Flow('MyFlow').latest_run
print(run.id) # = 184
print(run.tags) # = ['sampled_model']
# Access everyone's runs
namespace(None)
run = Flow('MyFlow').latest_run
print(run.id) # = 184
start
How to version my results and
access results by others?
B
A
join
end
david: sampled_model
savin: unsampled_model
start
How to deploy my workflow to production?
B
A
join
end
start
How to monitor models and
examine results?
B
A
join
end
x=0
x+=2
x+=3
max(A.x, B.x)
...and much more to improve productivity of data scientists
- Deploy models as microservices
- Support for R
- High-throughput data access
- Flexible parametrization of workflows
- Isolated environments for 3rd party dependencies
- Slack bot!
1. Models are a tiny part of an end-to-end ML system.
2. With proper tooling, data scientists can own the system, end-to-end.
3. Design the tooling with a human-centric mindset.
Summary
To improve results of an ML system,
improve the productivity of humans who operate it.
Tools for Human-Centric Machine Learning Infrastructure north star ai 2019
By Ville Tuulos
Tools for Human-Centric Machine Learning Infrastructure north star ai 2019
North Star AI 2019
- 2,130