Kubeflow
Data Science on Steroids
About Me
saschagrunert
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/5801520/GitHub-Mark-Light-120px-plus.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/5801555/6BEBD8D9-61EA-46D1-94DB-C74230E48925.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/5801560/891AD106-5C76-41AE-BE10-CA666600BF9B.png)
mail@saschagrunert.de
The Evolution of Machine Learning
1950
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496588/snarc.png)
Stochastic Neural Analog Reinforcement Calculator (SNARC) Maze Solver
2000
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496609/torch.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496608/tf.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496605/pandas-logo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496606/pytorch-logo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496604/numpy.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496607/python-logo.png)
Data Science
Workflow
Data Source Handling
fixed set of technologies
changing set of input data
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496622/jupyter.png)
Exploration
Regression?
Classification?
Supervised?
Unsupervised?
Baseline Modelling
Results
Trained Model
Results to share
Results
Deployment
Automation?
Infrastructure?
Yet Another Machine Learning Solution?
YES!
Cloud Native
Commercial off-the-shelf (COTS) applications
Applications that are aware of running inside an orchestrator like Kubernetes.
vs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496699/kubernetes-logo.png)
Can we improve
the classic data scientist's workflow
by utilizing Kubernetes?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496717/kubeflow-logo.png)
announced 2017
abstracting machine learning best practices
Deployment
and
Test Setup
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496735/suse.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6229087/crio-logo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496771/logo.svg.png)
SUSE CaaS Platform
github.com/SUSE/skuba
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496820/openstack.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496735/suse.png)
> skuba cluster init --control-plane 172.172.172.7 caasp-cluster
> cd caasp-cluster
> skuba node bootstrap --target 172.172.172.7 caasp-master
> skuba node join --role worker --target 172.172.172.8 caasp-node-1
> skuba node join --role worker --target 172.172.172.24 caasp-node-2
> skuba node join --role worker --target 172.172.172.16 caasp-node-3
> cp admin.conf ~/.kube/config
> kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
caasp-master Ready master 2h v1.15.2 172.172.172.7 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-1 Ready <none> 2h v1.15.2 172.172.172.8 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-2 Ready <none> 2h v1.15.2 172.172.172.24 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-3 Ready <none> 2h v1.15.2 172.172.172.16 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
Storage Provisioner
> helm install nfs-client-provisioner \
    -n kube-system \
    --set nfs.server=caasp-node-1 \
    --set nfs.path=/mnt/nfs \
    --set storageClass.name=nfs \
    --set storageClass.defaultClass=true \
    stable/nfs-client-provisioner
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496856/helm-horizontal-white.png)
> kubectl -n kube-system get pods -l app=nfs-client-provisioner -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfs-client-provisioner-777997bc46-mls5w 1/1 Running 3 2h 10.244.0.91 caasp-node-1 <none> <none>
Load Balancing
> kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.172.172.251-172.172.172.251
> kubectl apply -f metallb-config.yml
Deploying Kubeflow
> wget -O kfctl_existing_arrikto.yaml \
https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_existing_arrikto.0.7.0.yaml
> kfctl apply -V -f kfctl_existing_arrikto.yaml
kfctl is the command line tool for deploying Kubeflow
Credentials for the default user are
admin@kubeflow.org:12341234
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1005350/images/6496901/kfctl_existing_arrikto-architecture.png)
Improving the Data Scientist's Workflow
Machine Learning Pipelines
reusable end-to-end machine learning workflows via pipelines
separate Python SDK
pipeline steps are executed directly in Kubernetes
within its own pod
A first pipeline step
executes a given command within a container
input and output can be passed around
from kfp.dsl import ContainerOp

step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
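Inputs and outputs are wired up via file_outputs: a step declares files it writes, and later steps consume them. A minimal sketch, assuming the kfp 0.x SDK; the output name and paths are illustrative:

from kfp.dsl import ContainerOp

# Declare a file the step writes; Kubeflow exposes it as the output 'data'
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "some value" > /tmp/out.txt'],
                    file_outputs={'data': '/tmp/out.txt'})

# Referencing step1.outputs['data'] passes the value in and
# implicitly orders step2 after step1
step2 = ContainerOp(name='step2',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Got: %s"' % step1.outputs['data']])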
Execution order
Execution order can be made dependent
Arrange steps using step2.after(step1)
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])

step2 = ContainerOp(name='step2',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])

step2.after(step1)
Creating a pipeline
Pipelines are created using the @pipeline
decorator and compiled afterwards via compile()
from kfp.compiler import Compiler
from kfp.dsl import ContainerOp, pipeline

@pipeline(name='My pipeline', description='')
def my_pipeline():
    step1 = ContainerOp(name='step1',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])

    step2 = ContainerOp(name='step2',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])

    step2.after(step1)

if __name__ == '__main__':
    Compiler().compile(my_pipeline, 'pipeline.tar.gz')
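Pipeline function arguments become pipeline parameters that can be set per run in the Kubeflow UI. A hedged sketch, assuming the kfp 0.x SDK; this hypothetical training_pipeline matches the 'pretrained' argument used in the notebook example later on:

from kfp.compiler import Compiler
from kfp.dsl import ContainerOp, pipeline

@pipeline(name='Training pipeline', description='')
def training_pipeline(pretrained='False'):
    # The parameter resolves to its run-time value inside the arguments
    train = ContainerOp(name='train',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "pretrained=%s"' % pretrained])

if __name__ == '__main__':
    Compiler().compile(training_pipeline, 'training_pipeline.tar.gz')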
When to use ContainerOp?
Useful for deployment tasks
Running complex training scenarios
(for example, invoking dedicated training scripts; see the sketch below)
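A training step typically runs a project-specific image that already contains the training script. A minimal sketch; the image name, script path, and output path are hypothetical:

from kfp.dsl import ContainerOp

# Hypothetical image that bundles the training script and its dependencies
training = ContainerOp(name='training',
                       image='registry.example.com/my-model:latest',
                       command=['python3', '/opt/train.py'],
                       arguments=['--epochs', '10'],
                       # The script is expected to write the model here
                       file_outputs={'model': '/tmp/model.h5'})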
Getting the pipeline to run
Pipelines are compiled via dsl-compile
> sudo pip install \
https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz
> dsl-compile --py pipeline.py --output pipeline.tar.gz
Running pipelines from Notebooks
- Create and run pipelines by importing the kfp package
- Can be useful to save notebook resources
- Enables developers to combine prototyping with creating an automated training workflow
from kfp.compiler import Compiler

# Setup the pipeline
pipeline_func = training_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.yaml'

# Compile it
Compiler().compile(pipeline_func, pipeline_filename)
Running pipelines from Notebooks
import kfp

client = kfp.Client()

try:
    experiment = client.create_experiment("Prototyping")
except Exception:
    experiment = client.get_experiment(experiment_name="Prototyping")

arguments = {'pretrained': 'False'}
run_name = pipeline_func.__name__ + ' test_run'

run_result = client.run_pipeline(
    experiment.id,
    run_name,
    pipeline_filename,
    arguments)
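To keep the notebook in sync with the run, the client can wait for it to finish. A short sketch, assuming the kfp 0.x SDK:

# Block for up to one hour, then print the final run status
result = client.wait_for_run_completion(run_result.id, timeout=3600)
print(result.run.status)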
Continuous Integration
Kubeflow Pipelines provides a REST API for managing experiments, runs, and pipelines
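This makes it possible to trigger training runs from a CI job, for example via the kfp SDK, which wraps that REST API. A hedged sketch; the in-cluster host URL, experiment name, and package path are assumptions:

import kfp

# Hypothetical in-cluster endpoint of the Kubeflow Pipelines API
client = kfp.Client(host='http://ml-pipeline.kubeflow.svc.cluster.local:8888')

experiment = client.create_experiment('ci')
run = client.run_pipeline(experiment.id,
                          'ci-run',
                          'training_pipeline.tar.gz',
                          params={'pretrained': 'False'})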
That’s it.
https://github.com/saschagrunert/
kubeflow-data-science-on-steroids
https://slides.com/saschagrunert/
kubeflow-containerday-2019