Data Pipelines with OpenWhisk

Jowanza Joseph

@jowanza

Agenda

Problem Space
Data Pipelines
Serverless / Event-Driven
Apache OpenWhisk
Project Architecture
Kubernetes
Tips
Questions

The Problem Space

Ergonomics

The Problem Space

How do we deploy new pipelines quickly?
How do we keep their impact on the rest of the system minimal?
How do we maintain custom environments and runtimes?
How do we keep things ergonomically simple?

Data Pipelines

Serverless

Benefits

Cheap
Quick to implement and isolate
Event-driven
Easy to adopt

Costs

Cloud-specific implementations
Hard to test / version
Language-specific

Supported Runtimes

JavaScript (Node.js)
Python
PHP
Swift
Java
Docker (any language, packaged as a custom image)
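For the Python runtime, an action is a file exposing a `main` function: OpenWhisk passes the invocation parameters as a dict and expects a JSON-serializable dict back. A minimal sketch of a pipeline-style action, where the `records` parameter is a hypothetical name chosen for illustration:

```python
# Minimal Python action: OpenWhisk calls main() with the invocation
# parameters as a dict and returns the resulting dict as JSON.
def main(args):
    # "records" is a hypothetical parameter name, not an OpenWhisk convention.
    records = args.get("records", [])
    # Drop empty entries and normalize whitespace and case.
    cleaned = [r.strip().lower() for r in records if r and r.strip()]
    return {"records": cleaned, "count": len(cleaned)}


if __name__ == "__main__":
    # Local smoke test, outside of OpenWhisk.
    print(main({"records": ["  Foo", "", "BAR "]}))
```

Saved as `clean.py`, it could be deployed with `wsk action create clean clean.py`; the runtime is inferred from the file extension.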

More Stuff

Sequences
Packages
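A sequence chains actions so that each action's result becomes the next action's input, which maps naturally onto pipeline stages. The sketch below only simulates that composition locally; the stage functions are hypothetical.

```python
from typing import Callable, List

# An action takes a params dict and returns a result dict.
Action = Callable[[dict], dict]


def run_sequence(actions: List[Action], params: dict) -> dict:
    """Local stand-in for an OpenWhisk sequence: feed each result forward."""
    result = params
    for action in actions:
        result = action(result)
    return result


# Hypothetical pipeline stages.
def parse(args: dict) -> dict:
    return {"values": [int(x) for x in args["csv"].split(",")]}


def double(args: dict) -> dict:
    return {"values": [v * 2 for v in args["values"]]}


def total(args: dict) -> dict:
    return {"total": sum(args["values"])}


if __name__ == "__main__":
    print(run_sequence([parse, double, total], {"csv": "1,2,3"}))  # {'total': 12}
```

On a real deployment the same chain would be declared rather than coded, for example `wsk action create pipeline --sequence parse,double,total`, with the three stages deployed as separate actions.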

Architecture

Nginx
Load Balancing
That's pretty much it

Controller

Akka
Service Mapping
Job Queuing
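Part of the controller's job is deciding which invoker runs an activation. My understanding is that OpenWhisk's default balancer hashes the namespace and action name to a "home" invoker (so repeat calls tend to hit warm containers) and then steps through the other invokers when the home one is busy. The sketch below is a simplified assumption of that idea, not the controller's actual code; it ignores memory capacity and concurrency limits.

```python
import hashlib
from math import gcd
from typing import List, Optional


def stable_hash(key: str) -> int:
    # Deterministic across processes, unlike Python's built-in hash().
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def probe_order(namespace: str, action: str, num_invokers: int) -> List[int]:
    """Home invoker first, then a fixed stride that visits every invoker once."""
    h = stable_hash(f"{namespace}/{action}")
    home = h % num_invokers
    # A step coprime with num_invokers guarantees a full cycle.
    coprimes = [s for s in range(1, num_invokers + 1) if gcd(s, num_invokers) == 1]
    step = coprimes[h % len(coprimes)]
    return [(home + i * step) % num_invokers for i in range(num_invokers)]


def pick_invoker(namespace: str, action: str, busy: List[bool]) -> Optional[int]:
    for candidate in probe_order(namespace, action, len(busy)):
        if not busy[candidate]:
            return candidate
    return None  # Everything is busy; a real controller would need a fallback.
```

Hashing keeps the choice stable for a given action, and the coprime stride means an overloaded home invoker degrades into a deterministic search rather than a random one.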

CouchDB

Which code to run
History of code that has been run (activation records)
Logs
Authentication details

Consul

Service Discovery
That's it

Kafka

Pub/Sub
Exactly-Once Delivery
Retention
Distributed
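The pub/sub layer sits between the controller, which publishes activation requests, and the invokers, which consume them. The toy model below (plain Python, not Kafka's client API) shows the two properties the slide leans on: messages live in an append-only log addressed by offset, and retention drops the oldest entries while consumers keep their own position. Delivery guarantees and partitioning are deliberately not modeled.

```python
from collections import deque
from typing import Any, List


class Topic:
    """Toy append-only log with offsets and size-based retention."""

    def __init__(self, max_messages: int):
        self.max_messages = max_messages
        self._log: deque = deque()
        self.first_offset = 0  # Offset of the oldest retained message.

    def publish(self, message: Any) -> int:
        self._log.append(message)
        offset = self.first_offset + len(self._log) - 1
        # Retention: discard the oldest messages beyond the limit.
        while len(self._log) > self.max_messages:
            self._log.popleft()
            self.first_offset += 1
        return offset

    def read(self, offset: int, limit: int) -> List[Any]:
        start = max(offset, self.first_offset) - self.first_offset
        return list(self._log)[start:start + limit]


class Consumer:
    """Tracks its own position in the topic, like a consumer offset."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.position = 0

    def poll(self, limit: int = 10) -> List[Any]:
        start = max(self.position, self.topic.first_offset)
        messages = self.topic.read(start, limit)
        self.position = start + len(messages)
        return messages
```

A consumer that falls behind the retention window simply resumes at the oldest message still retained.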

Invoker

Docker Container
Isolation
Control

APIs

Whisk CLI
API Gateway
- Supports versioning
- Supports packaging

Whisk CLI

Whisk REST API

https://{APIHOST}/api/v1/namespaces/{namespace}/actions
https://{APIHOST}/api/v1/namespaces/{namespace}/triggers
https://{APIHOST}/api/v1/namespaces/{namespace}/rules
https://{APIHOST}/api/v1/namespaces/{namespace}/packages
https://{APIHOST}/api/v1/namespaces/{namespace}/activations
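Invoking an action over these endpoints is a POST to `.../actions/{action}` using the namespace's auth key as HTTP Basic credentials; `blocking=true` waits for the activation and `result=true` returns only the action's result. A sketch with the standard library that builds the request without sending it; the host, action name, and credentials in the usage are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_invoke_request(apihost: str, namespace: str, action: str,
                         auth: str, params: dict,
                         blocking: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST that invokes an action."""
    # Package actions are addressed as "package/action", so keep "/" unescaped.
    path = "/api/v1/namespaces/{}/actions/{}".format(
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(action, safe="/"),
    )
    query = urllib.parse.urlencode({"blocking": str(blocking).lower(), "result": "true"})
    # The auth key has the form "user:password" and is sent as Basic auth.
    token = base64.b64encode(auth.encode()).decode()
    return urllib.request.Request(
        f"https://{apihost}{path}?{query}",
        data=json.dumps(params).encode(),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_invoke_request("openwhisk.example.com", "_", "etl/clean", "<auth key>", {"records": []}))`, where `_` refers to the caller's default namespace.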

Custom Whisk Actions

How It Works

Train Models In Spark
Export the pipeline as an MLeap bundle
Run via MLeap Runtime
Profit
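The MLeap runtime itself is a JVM library, so it is not reproduced here. What carries over to any runtime is the shape of the action: load the exported model once per container, then score each invocation's parameters. In the Python stand-in below, the "bundle" is a hypothetical JSON document of logistic-regression weights, not MLeap's bundle format, and the feature names are invented for illustration.

```python
import json
import math
from functools import lru_cache

# Hypothetical stand-in for a model exported from Spark.
BUNDLE_JSON = '{"features": ["clicks", "minutes"], "weights": [0.8, -0.05], "bias": -1.0}'


@lru_cache(maxsize=1)
def load_model() -> dict:
    # Cached at module level: warm invocations in the same container reuse it.
    return json.loads(BUNDLE_JSON)


def main(args: dict) -> dict:
    model = load_model()
    x = [float(args[name]) for name in model["features"]]
    z = model["bias"] + sum(w * v for w, v in zip(model["weights"], x))
    return {"probability": 1.0 / (1.0 + math.exp(-z))}
```

Loading at module level rather than inside `main` is the design choice that matters: deserialization cost is paid on cold starts only.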

How It Works

Benefits

Type Safety
Simplified Execution Context
Similar APIs to Spark

Project Architecture

Really Nice Things

Logging
Performance Monitoring
Shared Actions
Scaling

Trade-offs

Benefits

Excellent isolation
Scalability
Customizability

Costs

OpenWhisk is a little hard
Some of the semantics are hard to grasp
MLeap requires extra effort


A Word On Images

Base OpenWhisk Image
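As I understand OpenWhisk's contract for Docker actions, a custom image runs an HTTP server on port 8080 that answers `POST /init` (container setup) and `POST /run`, where the invocation parameters arrive under the `value` key and the JSON response becomes the action's result. A standard-library sketch of that contract; `run_action` is hypothetical action logic, and the `/init` response body shown is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_action(params: dict) -> dict:
    # Hypothetical action logic.
    return {"greeting": "Hello, " + str(params.get("name", "stranger"))}


def handle(path: str, body: dict):
    """Route a request to /init or /run; returns (status, payload)."""
    if path == "/init":
        return 200, {"ok": True}  # Nothing to initialize in this sketch.
    if path == "/run":
        return 200, run_action(body.get("value", {}))
    return 404, {"error": "unknown path"}


class ActionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        status, payload = handle(self.path, body)
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ActionHandler).serve_forever()
```

Once built and pushed, such an image can be registered with `wsk action create greet --docker <image>`.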

A Word On Deployment
