Kubeflow
Data Science on Steroids
About us
saschagrunert
mail@
.de
About us
mbu93
The Evolution of Machine Learning
1950
Stochastic Neural Analog Reinforcement Calculator (SNARC) Maze Solver
2000
Data Science
Workflow
Data Source Handling
fixed set of technologies
Exploration
Regression?
Classification?
Supervised?
Unsupervised?
Baseline Modelling
Results
Trained Model
Results to share
Results
Deployment
Automation?
Infrastructure?
Data Exploration
- select the data source
- collect statistical measures
- data cleaning
- typically done in notebook systems such as Jupyter
import dill
# Load the serialized dataframe from disk
with open("dataframe.dill", "rb") as fp:
    dataframe = dill.load(fp)
# Shuffle the rows and show the first ten
dataframe.sample(frac=1).head(10)
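The exploration steps listed above (collecting statistical measures, cleaning data) can be sketched with the Python standard library alone. This is a minimal illustration, not the talk's notebook: the `raw_values` column and the rule of simply dropping missing entries are assumptions made for the example.

```python
import statistics

# Hypothetical numeric column with missing entries (None); not the talk's data.
raw_values = [4.0, None, 2.5, 3.5, None, 6.0]


def clean(values):
    """Drop missing entries, a very simple form of data cleaning."""
    return [v for v in values if v is not None]


def describe(values):
    """Collect a few basic statistical measures for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


cleaned = clean(raw_values)
summary = describe(cleaned)
print(summary)
```

In a notebook, the same measures would usually come from the dataframe library itself; the point here is only which numbers are worth looking at before modelling.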
Data Munging
- creating datasets
- preparing data for cross validation
from fastai.vision import ImageDataBunch
dataset = ImageDataBunch.from_folder("dataset",
                                     train="training",
                                     valid="test")
dataset
ImageDataBunch;
Train: LabelList (3040 items)
x: ImageList
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33)
y: CategoryList
3,3,3,3,3
Path: dataset;
Valid: LabelList (760 items)
x: ImageList
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33)
y: CategoryList
3,3,3,3,3
Path: dataset;
Test: None
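The dataset above uses a fixed folder split (3040 training and 760 validation items). Preparing data for cross validation generalizes this idea by rotating the validation part. The following is a minimal sketch of a k-fold split over item indices; the function `k_fold_indices` is hypothetical and not part of fastai.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, valid) index lists for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


# 3800 items in 5 folds give the same 3040/760 sizes as the folder split above.
splits = list(k_fold_indices(n_items=3800, k=5))
for train, valid in splits:
    print(len(train), len(valid))
```

Each item ends up in the validation set exactly once, so every model trained on a fold is evaluated on data it has not seen.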
Baseline modelling
- “Do we need supervised or unsupervised learning?”
- “Can we solve the problem with classification or regression?”
- most of the time experimental work
from fastai.vision import *
res18_learner = cnn_learner(dataset,
                            models.resnet18,
                            pretrained=False,
                            metrics=accuracy)
res18_learner.fit_one_cycle(5)
Evaluation and deployment
- check the model's performance
- select and deploy a model
- create an automated training process to update data and model
interp = res18_learner.interpret()
interp.plot_confusion_matrix(figsize=(8, 8))
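To make clear what the plotted confusion matrix contains, here is a minimal pure-Python sketch that counts (actual, predicted) pairs. The label lists are made up for illustration, and the helper functions are hypothetical, not fastai's implementation.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return the labels and a matrix with rows = actual, columns = predicted."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual, predicted):
    """Fraction of items whose prediction matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for three classes
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(actual, predicted))
```

The diagonal holds the correct predictions; everything off the diagonal shows which classes get confused with each other.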
Yet Another Machine Learning Solution?
YES!
Cloud Native
Commercially available off-the-shelf (COTS) applications
The awareness of an application to run inside something like Kubernetes.
vs
“Can we improve
the classic data scientist's workflow
by utilizing Kubernetes?”
announced 2017
abstracting machine learning best practices
Deployment and Test Setup
SUSE CaaS Platform
github.com/SUSE/skuba
> skuba cluster init --control-plane 172.172.172.7 caasp-cluster
> cd caasp-cluster
> skuba node bootstrap --target 172.172.172.7 caasp-master
> skuba node join --role worker --target 172.172.172.8 caasp-node-1
> skuba node join --role worker --target 172.172.172.24 caasp-node-2
> skuba node join --role worker --target 172.172.172.16 caasp-node-3
> cp admin.conf ~/.kube/config
> kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
caasp-master Ready master 2h v1.15.2 172.172.172.7 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-1 Ready <none> 2h v1.15.2 172.172.172.8 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-2 Ready <none> 2h v1.15.2 172.172.172.24 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
caasp-node-3 Ready <none> 2h v1.15.2 172.172.172.16 <none> SUSE Linux Enterprise Server 15 SP1 4.12.14-197.15-default cri-o://1.15.0
Storage Provisioner
> helm install nfs-client-provisioner \
-n kube-system \
--set nfs.server=caasp-node-1 \
--set nfs.path=/mnt/nfs \
--set storageClass.name=nfs \
--set storageClass.defaultClass=true \
stable/nfs-client-provisioner
> kubectl -n kube-system get pods -l app=nfs-client-provisioner -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfs-client-provisioner-777997bc46-mls5w 1/1 Running 3 2h 10.244.0.91 caasp-node-1 <none> <none>
Load Balancing
> kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.172.172.251-172.172.172.251
> kubectl apply -f metallb-config.yml
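The address pool above is written as a start-end range. As a small sanity check before applying such a config, the range can be expanded with Python's `ipaddress` module. This is a sketch only; `expand_pool` is a hypothetical helper and not part of MetalLB.

```python
import ipaddress


def expand_pool(address_range):
    """Expand a 'start-end' IPv4 range into the list of addresses it contains."""
    start_text, end_text = address_range.split("-")
    start = ipaddress.IPv4Address(start_text.strip())
    end = ipaddress.IPv4Address(end_text.strip())
    if end < start:
        raise ValueError(f"range end {end} is before range start {start}")
    return [ipaddress.IPv4Address(value) for value in range(int(start), int(end) + 1)]


pool = expand_pool("172.172.172.251-172.172.172.251")
print([str(address) for address in pool])
```

With identical start and end, the pool holds exactly one address, which is enough for a test setup that exposes a single load-balanced service.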
Deploying Kubeflow
> kfctl init kfapp --config=https://raw.githubusercontent.com/kubeflow/kubeflow/1fa142f8a5c355b5395ec0e91280d32d76eccdce/bootstrap/config/kfctl_existing_arrikto.yaml
> export KUBEFLOW_USER_EMAIL="sgrunert@suse.com"
> export KUBEFLOW_PASSWORD="my-strong-password"
> cd kfapp
> kfctl generate all -V
> kfctl apply all -V
Deployment uses the available command line tool kfctl
Improving the Data Scientist's Workflow
Machine Learning Pipelines
- Kubeflow provides reusable end-to-end machine learning workflows via pipelines
- pipeline components are built using Kubeflow's Python SDK
- Every pipeline step is executed directly in Kubernetes within its own pod
Basic component using ContainerOp
- simply executes a given command within a container
- input and output can be passed around
from kfp.dsl import ContainerOp
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
Execution order
- Execution order can be made dependent
- Arrange components using op1.after(op2) syntax
step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
step2 = ContainerOp(name='step2',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
# step2 only starts once step1 has finished
step2.after(step1)
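Declaring "runs after" relations turns the steps into a dependency graph, and a valid execution order is a topological order of that graph. The sketch below illustrates the idea with the standard library's `graphlib` (Python 3.9 or newer); it is not how kfp or Argo schedule steps internally, and `step3` is a hypothetical extra step.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it has to run after.
dependencies = {
    "step1": set(),
    "step2": {"step1"},           # step2.after(step1)
    "step3": {"step1", "step2"},  # hypothetical third step
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Steps without a dependency relation between them could run in parallel; only the declared relations constrain the order.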
Creating a pipeline
- Pipelines are created using the @pipeline decorator and compiled afterwards via compile()
from kfp.compiler import Compiler
from kfp.dsl import ContainerOp, pipeline

# Named my_pipeline so the function does not shadow the imported decorator
@pipeline(name='My pipeline', description='')
def my_pipeline():
    step1 = ContainerOp(name='step1',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2 = ContainerOp(name='step2',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2.after(step1)

if __name__ == '__main__':
    # compile() also needs the path of the pipeline package to write
    Compiler().compile(my_pipeline, 'pipeline.tar.gz')
When should I use ContainerOp?
- Useful for deployment tasks
- Good option to query for new data
- Running of complex training scenarios (make use of training scripts)
Lightweight Python Components
- Directly run Python functions in a container
- Limitations:
  - Functions need to be standalone (can't use code, imports or variables defined in another scope)
  - Imported packages need to be available in the container image
  - Input parameters need to be set to str (default), int or float
Example: Sum of squares
- Code will be processed as a string and passed to the interpreter as below
from kfp.components import func_to_container_op

def sum_sq(a: int, b: int) -> float:
    # Imports have to live inside the function body
    import numpy as np
    dist = np.sqrt(a**2 + b**2)
    return dist

add_op = func_to_container_op(sum_sq)(1, 2)
python -u -c "def sum_sq(a: int, b: int) -> float:\n\
    import numpy as np\n\
    dist = np.sqrt(a**2 + b**2)\n\
    return dist\n(...)"
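Why the function has to be standalone becomes visible when its source is run in a fresh namespace, which is roughly what happens when the code is handed to an interpreter as a string. The following is a minimal sketch using `exec`; it is not kfp's implementation, `run_isolated` and `FACTOR` are hypothetical names, and `math` replaces numpy so the sketch needs no third-party package.

```python
# Source of a standalone function: the import lives inside the body.
SOURCE = '''
def sum_sq(a: int, b: int) -> float:
    import math
    return math.sqrt(a**2 + b**2)
'''

# Source of a function that relies on a name from the defining module.
BROKEN_SOURCE = '''
def scaled(a: int) -> int:
    return a * FACTOR
'''

FACTOR = 10  # visible in this module, but not inside the isolated namespace


def run_isolated(source, func_name, *args):
    """Execute the source in an empty namespace and call the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


result = run_isolated(SOURCE, "sum_sq", 3, 4)
print(result)

try:
    run_isolated(BROKEN_SOURCE, "scaled", 2)
    failed = False
except NameError as error:
    failed = True
    print("not standalone:", error)
```

The second function fails because `FACTOR` does not travel with the function source, which mirrors the limitation listed for lightweight components.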
The need for storage
- Lightweight components only accept str, float or int inputs
- To pass objects around they must be stored/loaded
- For that purpose a volume must be attached
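# Assumption: k8s is the Kubernetes Python client
from kubernetes import client as k8s

op = func_to_container_op(fit_squeezenet,
                          base_image="alpine:latest")("out/model", False,
                                                      "squeeze.model")
# Attach a host path volume and mount it where the function writes its output
op.add_volume(
    k8s.V1Volume(name='volume',
                 host_path=k8s.V1HostPathVolumeSource(
                     path='/data/out'))).add_volume_mount(
                         k8s.V1VolumeMount(name='volume', mount_path="out"))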
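The store/load pattern behind the volume can be sketched without a cluster: one step writes an object to a path, the next step receives only that path as a string and loads the object again. In this minimal sketch a temporary directory stands in for the mounted volume, and `train_step`, `evaluate_step` and the dictionary "model" are hypothetical placeholders.

```python
import os
import pickle
import tempfile


def train_step(out_dir: str, model_name: str) -> str:
    """Store a stand-in model on the shared volume and return its path."""
    model = {"weights": [0.1, 0.2, 0.3], "epochs": 5}  # placeholder for a trained model
    path = os.path.join(out_dir, model_name)
    with open(path, "wb") as fp:
        pickle.dump(model, fp)
    return path


def evaluate_step(model_path: str) -> int:
    """Load the model from the shared volume; only the path was passed in."""
    with open(model_path, "rb") as fp:
        model = pickle.load(fp)
    return model["epochs"]


# The temporary directory plays the role of the volume mounted into both steps.
with tempfile.TemporaryDirectory() as shared_volume:
    model_path = train_step(shared_volume, "squeeze.model")
    epochs = evaluate_step(model_path)

print(epochs)
```

Only strings cross the step boundary here, which is why both steps need access to the same storage.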
Up- and Downsides of Lightweight Components
- Very useful to run simple functions
- Can reduce the effort to create pipelines
- May not be used for complex functions
- Practical alternative: Google Cloud Platform's options
Getting the pipeline to run
- Pipelines are compiled using the dsl-compile script
- They are then transformed to an Argo workflow and executed
> sudo pip install https://storage.googleapis.com/ml-pipeline/release/0.1.27/kfp.tar.gz
> dsl-compile --py pipeline.py --output pipeline.tar.gz
> kubectl get workflow
NAME AGE
my-pipeline-8lwcc 6m47s
> argo get my-pipeline-8lwcc
Name: my-pipeline-8lwcc
Namespace: kubeflow
ServiceAccount: pipeline-runner
Status: Succeeded
Created: Tue Aug 27 13:06:06 +0200 (4 minutes ago)
Started: Tue Aug 27 13:06:06 +0200 (4 minutes ago)
Finished: Tue Aug 27 13:06:22 +0200 (4 minutes ago)
Duration: 16 seconds
STEP PODNAME DURATION MESSAGE
✔ my-pipeline-8lwcc
├-✔ step1 my-pipeline-8lwcc-818794353 6s
└-✔ step2 my-pipeline-8lwcc-768461496 7s
Running pipelines from Notebooks
- Create and run pipelines by importing the kfp package
- Can be useful to save notebook resources
- Enables developers to combine prototyping with creating an automated training workflow
from kfp.compiler import Compiler
pipeline_func = training_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.yaml'
Compiler().compile(pipeline_func, pipeline_filename)
import kfp
client = kfp.Client()
# Create the experiment, or reuse it if it already exists
try:
    experiment = client.create_experiment("Prototyping")
except Exception:
    experiment = client.get_experiment(experiment_name="Prototyping")
arguments = {'pretrained': 'False'}
run_name = pipeline_func.__name__ + ' test_run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)
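The run above passes `'False'` as a string. If a pipeline step needs a real boolean, the string has to be parsed explicitly, because any non-empty string is truthy in Python. The helper below is a hypothetical sketch of such parsing; whether the talk's training code does something similar is not stated.

```python
def str_to_bool(value: str) -> bool:
    """Parse a pipeline parameter string into a boolean."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean string: {value!r}")


arguments = {"pretrained": "False"}

naive = bool(arguments["pretrained"])          # truthy, because the string is not empty
parsed = str_to_bool(arguments["pretrained"])  # the value that was actually meant

print(naive, parsed)
```

Raising on unknown values keeps typos in run arguments from silently turning into a default.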
Continuous Integration Outlook
Kubeflow provides a REST API
That’s it.
https://github.com/saschagrunert/kubeflow-data-science-on-steroids
Kubeflow - Data Science on Steroids
By Sascha Grunert