Kubeflow
Data Science on Steroids
About Me
saschagrunert
mail@saschagrunert.de
The Evolution of Machine Learning
1950
Stochastic Neural Analog Reinforcement Calculator (SNARC) Maze Solver
2000
Data Science
Workflow
Data Source Handling
fixed set of technologies
changing set of input data
Exploration
Regression?
Classification?
Supervised?
Unsupervised?
Baseline Modelling
Results
Trained Model
Results to share
Results
Deployment
Automation?
Infrastructure?
Yet Another Machine Learning Solution?
YES!
Cloud Native
Commercial off-the-shelf (COTS) applications
The awareness of an application that it runs inside something like Kubernetes.
vs
Can we improve
the classic data scientist's workflow
by utilizing Kubernetes?
announced in 2017
abstracting machine learning best practices
Deployment and Test Setup
SUSE CaaS Platform
github.com/SUSE/skuba
> skuba cluster init --control-plane 172.172.172.7 caasp-cluster
> cd caasp-cluster
> skuba node bootstrap --target 172.172.172.7 caasp-master
> skuba node join --role worker --target 172.172.172.8 caasp-node-1
> skuba node join --role worker --target 172.172.172.24 caasp-node-2
> skuba node join --role worker --target 172.172.172.16 caasp-node-3
> cp admin.conf ~/.kube/config
> kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
caasp-master Ready master 2h v1.15.2 172.172.172.7 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-1 Ready <none> 2h v1.15.2 172.172.172.8 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-2 Ready <none> 2h v1.15.2 172.172.172.24 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-3 Ready <none> 2h v1.15.2 172.172.172.16 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
Storage Provisioner
> helm install nfs-client-provisioner \
-n kube-system \
--set nfs.server=caasp-node-1 \
--set nfs.path=/mnt/nfs \
--set storageClass.name=nfs \
--set storageClass.defaultClass=true \
stable/nfs-client-provisioner
> kubectl -n kube-system get pods -l app=nfs-client-provisioner -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfs-client-provisioner-777997bc46-mls5w 1/1 Running 3 2h 10.244.0.91 caasp-node-1 <none> <none>
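Since the nfs class is set as the default StorageClass, PersistentVolumeClaims get provisioned automatically. A minimal claim sketch (the name test-claim and the size are just examples, not from the talk):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi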
Load Balancing
> kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.172.172.251-172.172.172.251
> kubectl apply -f metallb-config.yml
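With the address pool configured, any Service of type LoadBalancer receives an address from it. A minimal sketch; the service name, selector, and ports are assumptions for illustration:
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
    - port: 80
      targetPort: 8080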
Deploying Kubeflow
> wget -O kfctl_existing_arrikto.yaml \
https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_existing_arrikto.0.7.0.yaml
> kfctl apply -V -f kfctl_existing_arrikto.yaml
The deployment is done via the command line tool kfctl
Credentials for the default user are
admin@kubeflow.org:12341234
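To verify the deployment, the Kubeflow pods and the Istio ingress gateway (which this configuration exposes as a LoadBalancer Service, so it should pick up an address from the MetalLB pool above) can be checked:
> kubectl -n kubeflow get pods
> kubectl -n istio-system get service istio-ingressgateway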
Improving the Data Scientist's Workflow
Machine Learning Pipelines
reusable end-to-end machine learning workflows via pipelines
separate Python SDK
each pipeline step is executed directly in Kubernetes
within its own pod
A first pipeline step
executes a given command within a container
input and output can be passed around
from kfp.dsl import ContainerOp
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
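Passing inputs and outputs around, as mentioned above, works via file_outputs: a step writes a file, and its content can be referenced in the arguments of a later step. A minimal sketch (to be placed inside a pipeline function):
from kfp.dsl import ContainerOp

# Declare /tmp/out.txt as an output of the step
producer = ContainerOp(name='producer',
                       image='alpine:latest',
                       command=['sh', '-c'],
                       arguments=['echo "some value" > /tmp/out.txt'],
                       file_outputs={'out': '/tmp/out.txt'})

# Reference the output; the compiler wires it into the argument list
consumer = ContainerOp(name='consumer',
                       image='alpine:latest',
                       command=['sh', '-c'],
                       arguments=['echo "Got: %s"' % producer.outputs['out']])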
Execution order
Steps can be made dependent on each other
Arrange components using step2.after(step1)
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])

step2 = ContainerOp(name='step2',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])

step2.after(step1)
Creating a pipeline
Pipelines are created using the @pipeline
decorator and compiled afterwards via compile()
from kfp.compiler import Compiler
from kfp.dsl import ContainerOp, pipeline

@pipeline(name='My pipeline', description='')
def pipeline():
    step1 = ContainerOp(name='step1',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2 = ContainerOp(name='step2',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2.after(step1)

if __name__ == '__main__':
    Compiler().compile(pipeline, 'pipeline.tar.gz')
When to use ContainerOp?
Useful for deployment tasks
Running complex training scenarios
(making use of existing training scripts, see the sketch below)
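For the training scenario this could look like the following sketch; the image name, script path, and parameters are assumptions, not part of the talk:
from kfp.dsl import ContainerOp

# Hypothetical training image that ships the project's train.py
train = ContainerOp(name='train',
                    image='registry.example.com/my-project/training:latest',
                    command=['python3', '/opt/train.py'],
                    arguments=['--epochs', '10', '--output', '/tmp/model'],
                    file_outputs={'model': '/tmp/model'})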
Getting the pipeline to run
Pipelines are compiled via dsl-compile
> sudo pip install \
https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz
> dsl-compile --py pipeline.py --output pipeline.tar.gz
Running pipelines from Notebooks
- Create and run pipelines by importing the kfp package
- Can be useful to save notebook resources, since the heavy work runs in the cluster
- Enables developers to combine prototyping with creating an automated training workflow
from kfp.compiler import Compiler
# Setup the pipeline
pipeline_func = training_pipeline  # the @pipeline-decorated function defined earlier in the notebook
pipeline_filename = pipeline_func.__name__ + '.pipeline.yaml'
# Compile it
Compiler().compile(pipeline_func, pipeline_filename)
Running pipelines from Notebooks
import kfp

client = kfp.Client()

try:
    experiment = client.create_experiment("Prototyping")
except Exception:
    experiment = client.get_experiment(experiment_name="Prototyping")

arguments = {'pretrained': 'False'}
run_name = pipeline_func.__name__ + ' test_run'

run_result = client.run_pipeline(
    experiment.id,
    run_name,
    pipeline_filename,
    arguments)
Continuous Integration
Kubeflow Pipelines provides a REST API, so runs can be triggered from CI systems
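The kfp package wraps this REST API, so a CI job could, for example, compile and upload a fresh pipeline version on every commit. A rough sketch, assuming the pipeline function lives in a module called pipeline.py:
import kfp
from kfp.compiler import Compiler
from pipeline import pipeline  # hypothetical module containing the @pipeline function

# Compile the pipeline and upload the package via the REST API
Compiler().compile(pipeline, 'pipeline.tar.gz')
client = kfp.Client()  # the API host can be passed when running outside the cluster
client.upload_pipeline('pipeline.tar.gz', pipeline_name='my-pipeline-ci')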
That’s it.
https://github.com/saschagrunert/kubeflow-data-science-on-steroids
https://slides.com/saschagrunert/kubeflow-containerday-2019