Kubeflow
Data Science on Steroids
About us
saschagrunert
mail@
.de
About us
mbu93
The Evolution of Machine Learning
1950
Stochastic Neural Analog Reinforcement Calculator (SNARC) Maze Solver
2000
Data Science
Workflow
Data Source Handling
fixed set of technologies
Exploration
Regression?
Classification?
Supervised?
Unsupervised?
Baseline Modelling
Results
Trained Model
Results to share
Results
Deployment
Automation?
Infrastructure?
Data Exploration
- select the data source
- collect statistical measures
- data cleaning
- typically done in notebook systems such as jupyter
import dill

# load the serialized dataframe
with open("dataframe.dill", "rb") as fp:
    dataframe = dill.load(fp)
# shuffle the rows and show the first ten
dataframe.sample(frac=1).head(10)
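A rough sketch of the "collect statistical measures" and "data cleaning" steps, using only the standard library. The column values and the convention that `None` marks a missing reading are assumptions for illustration; the slides work on a dataframe instead.

```python
import statistics

# Hypothetical raw column; None marks a missing reading (assumption).
raw = [4.0, None, 5.0, 3.0, None, 4.0]

# Data cleaning: drop the missing entries.
clean = [value for value in raw if value is not None]

# Statistical measures of the cleaned column.
summary = {
    "count": len(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": statistics.stdev(clean),
}
print(summary)
```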
Data Munging
- creating datasets
- preparing data for cross validation
from fastai.vision import ImageDataBunch

dataset = ImageDataBunch.from_folder("dataset",
                                     train="training",
                                     valid="test")
dataset
ImageDataBunch;
Train: LabelList (3040 items)
x: ImageList
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33)
y: CategoryList
3,3,3,3,3
Path: dataset;
Valid: LabelList (760 items)
x: ImageList
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33), Image (3, 17, 33),
Image (3, 17, 33)
y: CategoryList
3,3,3,3,3
Path: dataset;
Test: None
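"Preparing data for cross validation" can be sketched as a plain k-fold index split. This is a minimal illustration, not what fastai does internally; `k_fold_indices` is a hypothetical helper, and shuffling is left out for brevity. With 3800 items and k=5 each fold reproduces the 3040/760 split shown above.

```python
def k_fold_indices(n_items, k):
    """Yield (train, valid) index lists for k roughly equal folds."""
    indices = list(range(n_items))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


# 3800 = 3040 training + 760 validation items, i.e. an 80/20 split.
folds = list(k_fold_indices(3800, 5))
for train, valid in folds:
    print(len(train), len(valid))
```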
Baseline modelling
- “Do we need supervised or unsupervised learning?”
- “Can we solve the problem with classification or regression?”
- mostly experimental work
from fastai.vision import *

res18_learner = cnn_learner(dataset,
                            models.resnet18,
                            pretrained=False,
                            metrics=accuracy)
res18_learner.fit_one_cycle(5)
Evaluation and deployment
- check the model's performance
- select and deploy a model
- create an automated training process to update data and model
interp = res18_learner.interpret()
interp.plot_confusion_matrix(figsize=(8, 8))
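What the confusion matrix plot summarizes can be sketched in plain Python. The labels and predictions below are made up for illustration, and `confusion_matrix` and `accuracy` are hypothetical helpers, not the fastai implementation.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of all predictions that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels and predictions, for illustration only.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])
print(matrix, accuracy(matrix))
```

Off-diagonal cells show which classes get confused with each other, which the single accuracy number hides.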
Yet Another Machine Learning Solution?
YES!
Cloud Native
Commercially available off-the-shelf (COTS) applications
The awareness of an application to run inside something like Kubernetes.
vs
“Can we improve
the classic data scientist's workflow
by utilizing Kubernetes?”
announced in 2017
abstracting machine learning best practices
Deployment and Test Setup
SUSE CaaS Platform
github.com/SUSE/skuba
> skuba cluster init --control-plane 172.172.172.7 caasp-cluster
> cd caasp-cluster
> skuba node bootstrap --target 172.172.172.7 caasp-master
> skuba node join --role worker --target 172.172.172.8 caasp-node-1
> skuba node join --role worker --target 172.172.172.24 caasp-node-2
> skuba node join --role worker --target 172.172.172.16 caasp-node-3
> cp admin.conf ~/.kube/config
> kubectl get nodes -o wide
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                              KERNEL-VERSION           CONTAINER-RUNTIME
caasp-master   Ready    master   2h    v1.15.2   172.172.172.7    <none>        SUSE Linux Enterprise Server 15 SP1   4.12.14-197.15-default   cri-o://1.15.0
caasp-node-1   Ready    <none>   2h    v1.15.2   172.172.172.8    <none>        SUSE Linux Enterprise Server 15 SP1   4.12.14-197.15-default   cri-o://1.15.0
caasp-node-2   Ready    <none>   2h    v1.15.2   172.172.172.24   <none>        SUSE Linux Enterprise Server 15 SP1   4.12.14-197.15-default   cri-o://1.15.0
caasp-node-3   Ready    <none>   2h    v1.15.2   172.172.172.16   <none>        SUSE Linux Enterprise Server 15 SP1   4.12.14-197.15-default   cri-o://1.15.0
Storage Provisioner
> helm install nfs-client-provisioner \
    -n kube-system \
    --set nfs.server=caasp-node-1 \
    --set nfs.path=/mnt/nfs \
    --set storageClass.name=nfs \
    --set storageClass.defaultClass=true \
    stable/nfs-client-provisioner
> kubectl -n kube-system get pods -l app=nfs-client-provisioner -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
nfs-client-provisioner-777997bc46-mls5w   1/1     Running   3          2h    10.244.0.91   caasp-node-1   <none>           <none>
Load Balancing
> kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.172.172.251-172.172.172.251
> kubectl apply -f metallb-config.yml
Deploying Kubeflow
> kfctl init kfapp --config=https://raw.githubusercontent.com/kubeflow/kubeflow/1fa142f8a5c355b5395ec0e91280d32d76eccdce/bootstrap/config/kfctl_existing_arrikto.yaml
> export KUBEFLOW_USER_EMAIL="sgrunert@suse.com"
> export KUBEFLOW_PASSWORD="my-strong-password"
> cd kfapp
> kfctl generate all -V
> kfctl apply all -V
deployed with the available command line tool kfctl
Improving the Data Scientist's Workflow
Machine Learning Pipelines
- Kubeflow provides reusable end-to-end machine learning workflows via pipelines
- pipeline components are built using Kubeflow's Python SDK
- Every pipeline step is executed directly in Kubernetes within its own pod
Basic component using ContainerOp
- simply executes a given command within a container
- input and output can be passed around
from kfp.dsl import ContainerOp

step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
Execution order
- Execution order can be made dependent on other steps
- Arrange components using the op1.after(op2) syntax
from kfp.dsl import ContainerOp

step1 = ContainerOp(name='step1',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
step2 = ContainerOp(name='step2',
                    image='alpine:latest',
                    command=['sh', '-c'],
                    arguments=['echo "Running step"'])
# step2 only starts once step1 has finished
step2.after(step1)
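How `after()` dependencies turn into an execution order can be illustrated with a topological sort. This is only a conceptual sketch: `Step` and `execution_order` are hypothetical stand-ins and not part of the kfp SDK, which hands the actual scheduling to the workflow engine.

```python
from graphlib import TopologicalSorter


class Step:
    """Hypothetical stand-in for a pipeline op; not part of the kfp SDK."""

    def __init__(self, name):
        self.name = name
        self.dependencies = []

    def after(self, other):
        # Record that this step must wait until `other` has finished.
        self.dependencies.append(other.name)
        return self


def execution_order(steps):
    """Return step names so that every step comes after its dependencies."""
    graph = {step.name: set(step.dependencies) for step in steps}
    return list(TopologicalSorter(graph).static_order())


step1, step2, step3 = Step("step1"), Step("step2"), Step("step3")
step2.after(step1)
step3.after(step2)
order = execution_order([step3, step1, step2])
print(order)
```

Steps without a dependency between them have no fixed relative order and may run in parallel.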
Creating a pipeline
- Pipelines are created using the @pipeline decorator and compiled afterwards via compile()
from kfp.compiler import Compiler
from kfp.dsl import ContainerOp, pipeline

# the function gets its own name so it does not shadow the imported decorator
@pipeline(name='My pipeline', description='')
def my_pipeline():
    step1 = ContainerOp(name='step1',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2 = ContainerOp(name='step2',
                        image='alpine:latest',
                        command=['sh', '-c'],
                        arguments=['echo "Running step"'])
    step2.after(step1)

if __name__ == '__main__':
    # compile() needs an output path for the packaged workflow
    Compiler().compile(my_pipeline, 'pipeline.tar.gz')
When should I use ContainerOp?
- Useful for deployment tasks
- Good option to query for new data
- Running complex training scenarios (make use of training scripts)
Lightweight Python Components
- Directly run Python functions in a container
- Limitations:
  - need to be standalone (can't use code, imports or variables defined in another scope)
  - imported packages need to be available in the container image
  - input parameters need to be set to str (default), int or float
Example: Sum of squares
- Code will be processed as a string and passed to the interpreter as below
from kfp.components import func_to_container_op

# the import lives inside the function body so the function stays standalone
def sum_sq(a: int, b: int) -> float:
    import numpy as np
    dist = np.sqrt(a**2 + b**2)
    return dist

add_op = func_to_container_op(sum_sq)(1, 2)

python -u -c "def sum_sq(a: int, b: int) -> float:\n\
    import numpy as np\n\
    dist = np.sqrt(a**2 + b**2)\n\
    return dist\n(...)"
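The idea of handing function source to a fresh interpreter as a string can be tried locally. This is a simplified sketch, not how the SDK builds its command line: `run_in_fresh_interpreter` is a hypothetical helper, and `math` replaces numpy so that the example only needs the standard library.

```python
import subprocess
import sys

# Source of a standalone function: the import sits inside the body,
# so the code still works when it is the only thing the interpreter sees.
FUNCTION_SOURCE = '''
def sum_sq(a: int, b: int) -> float:
    import math
    return math.sqrt(a**2 + b**2)
'''


def run_in_fresh_interpreter(source, call):
    """Run `source` followed by print(<call>) via `python -u -c`."""
    program = source + "\nprint(" + call + ")\n"
    result = subprocess.run([sys.executable, "-u", "-c", program],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


output = run_in_fresh_interpreter(FUNCTION_SOURCE, "sum_sq(3, 4)")
print(output)
```

Because the new interpreter sees nothing but the string, anything defined outside the function body is missing at run time, which is where the "standalone" limitation comes from.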
The need for storage
- Lightweight components only accept str, float or int inputs
- To pass objects around they must be stored/loaded
- For that purpose a volume must be attached
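from kubernetes import client as k8s
from kfp.components import func_to_container_op

# fit_squeezenet is assumed to be defined elsewhere
op = func_to_container_op(fit_squeezenet,
                          base_image="alpine:latest")("out/model", False,
                                                      "squeeze.model")
# attach a host path volume and mount it into the container
op.add_volume(
    k8s.V1Volume(name='volume',
                 host_path=k8s.V1HostPathVolumeSource(
                     path='/data/out'))).add_volume_mount(
        k8s.V1VolumeMount(name='volume', mount_path="out"))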
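The store/load pattern itself can be sketched without a cluster: one step writes an object below a shared directory and returns only the path as a str, the next step loads it again. `train` and `evaluate` are hypothetical step functions, the dictionary is a placeholder for a real model, and a temporary directory stands in for the mounted volume.

```python
import os
import pickle
import tempfile


def train(out_dir: str) -> str:
    """Store a dummy model below out_dir and return its path as a str."""
    model = {"weights": [0.1, 0.2, 0.3]}  # placeholder for a real model
    model_path = os.path.join(out_dir, "model.pkl")
    with open(model_path, "wb") as fp:
        pickle.dump(model, fp)
    return model_path


def evaluate(model_path: str) -> int:
    """Load the stored model and return its number of weights."""
    with open(model_path, "rb") as fp:
        model = pickle.load(fp)
    return len(model["weights"])


# A temporary directory stands in for the volume shared by both steps.
with tempfile.TemporaryDirectory() as volume:
    n_weights = evaluate(train(volume))
print(n_weights)
```

Only str and int values cross the step boundary here, which matches the input restrictions of lightweight components.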
Up- and Downsides of Lightweight Components
- Very useful to run simple functions
- Can reduce the effort to create pipelines
- Not suitable for complex functions
- Practical alternative: Google Cloud Platform's options
Getting the pipeline to run
- Pipelines are compiled using the dsl-compile script
- They are then transformed to an Argo workflow and executed
> sudo pip install https://storage.googleapis.com/ml-pipeline/release/0.1.27/kfp.tar.gz
> dsl-compile --py pipeline.py --output pipeline.tar.gz
> kubectl get workflow
NAME AGE
my-pipeline-8lwcc 6m47s
> argo get my-pipeline-8lwcc
Name:                my-pipeline-8lwcc
Namespace:           kubeflow
ServiceAccount:      pipeline-runner
Status:              Succeeded
Created:             Tue Aug 27 13:06:06 +0200 (4 minutes ago)
Started:             Tue Aug 27 13:06:06 +0200 (4 minutes ago)
Finished:            Tue Aug 27 13:06:22 +0200 (4 minutes ago)
Duration:            16 seconds

STEP                  PODNAME                      DURATION  MESSAGE
 ✔ my-pipeline-8lwcc
 ├-✔ step1            my-pipeline-8lwcc-818794353  6s
 └-✔ step2            my-pipeline-8lwcc-768461496  7s
Running pipelines from Notebooks
- Create and run pipelines by importing the kfp package
- Can be useful to save notebook resources
- Enables developers to combine prototyping with creating an automated training workflow
from kfp.compiler import Compiler
pipeline_func = training_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.yaml'
Compiler().compile(pipeline_func, pipeline_filename)
import kfp

client = kfp.Client()
try:
    experiment = client.create_experiment("Prototyping")
except Exception:
    experiment = client.get_experiment(experiment_name="Prototyping")
arguments = {'pretrained': 'False'}
run_name = pipeline_func.__name__ + ' test_run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)
Continuous Integration Outlook
Kubeflow provides a REST API
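A CI job could talk to that API with nothing but the standard library. The sketch below rests on several assumptions: the `/apis/v1beta1/pipelines` path, the `/pipeline` route behind the load balancer address, and an endpoint reachable without authentication. `pipelines_url` and `list_pipelines` are hypothetical helpers.

```python
import json
import urllib.request

API_PREFIX = "/apis/v1beta1"  # assumed version prefix of the pipeline API


def pipelines_url(host):
    """Build the URL of the (assumed) pipeline listing endpoint."""
    return host.rstrip("/") + API_PREFIX + "/pipelines"


def list_pipelines(host):
    """Fetch the registered pipelines as parsed JSON (needs a reachable cluster)."""
    with urllib.request.urlopen(pipelines_url(host)) as response:
        return json.load(response)


# Hypothetical host; no request is sent until list_pipelines() is called.
url = pipelines_url("http://172.172.172.251/pipeline/")
print(url)
```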
That’s it.
https://github.com/saschagrunert/kubeflow-data-science-on-steroids
Kubeflow - Data Science on Steroids
By Sascha Grunert
A presentation about Kubeflow