Data Pipelines with OpenWhisk

Jowanza Joseph

@jowanza

Agenda

  • Problem Space
  • Data Pipelines
  • Serverless / Event Driven
  • Apache OpenWhisk
  • Project Architecture
  • Kubernetes
  • Tips
  • Questions

The Problem Space

Ergonomics

The Problem Space

  • How do we deploy new pipelines quickly?
  • How do we minimize impact on the existing system?
  • How do we maintain custom environments and runtimes?
  • How do we keep the ergonomics simple?

Data Pipelines

Serverless

Benefits

  • Cheap
  • Quick to implement and isolate
  • Event Driven
  • Easy to take advantage of

Costs

  • Cloud-specific implementations
  • Hard to test / version
  • Language-specific

Supported Runtimes

  • JavaScript
  • Python
  • PHP
  • Swift
  • Java
  • Docker

More Stuff

  • Sequences
  • Packages
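
A sequence chains actions so that the result of one becomes the input of the next. The sketch below imitates that behavior locally in plain Python. It is not the OpenWhisk implementation, and `make_sequence`, `parse`, and `summarize` are hypothetical names chosen for illustration.

```python
from functools import reduce


def make_sequence(*actions):
    """Compose actions left to right; each one receives the previous result dict."""
    def sequence(params):
        return reduce(lambda acc, action: action(acc), actions, params)
    return sequence


# Hypothetical pipeline step: turn a comma-separated string into numbers.
def parse(args):
    return {"values": [float(v) for v in args["raw"].split(",")]}


# Hypothetical pipeline step: compute simple summary statistics.
def summarize(args):
    values = args["values"]
    return {"count": len(values), "mean": sum(values) / len(values)}


pipeline = make_sequence(parse, summarize)
```

Calling `pipeline({"raw": "1,2,3"})` runs `parse` first and hands its output to `summarize`, which is the same dict-in, dict-out contract each action in a sequence follows.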

Architecture

  • Load Balancing
  • That's pretty much it

Controller

  • Akka
  • Service Mapping
  • Job Queuing
  • Which code to run
  • History of code that has been run
  • Logs
  • Authentication details
  • Service Discovery
  • That's it
  • Pub/Sub
  • Exactly Once Delivery
  • Retention
  • Distributed

Invoker

  • Docker Container
  • Isolation
  • Control
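
As a rough illustration of what runs inside the container: to my understanding, the invoker talks to an action container over HTTP on port 8080, posting to `/init` once and to `/run` for each invocation, with the parameters under a `value` key. The sketch below is a minimal stand-in under those assumptions, not the official runtime. `run_action`, `dispatch`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_action(params):
    """Hypothetical action body: count the records passed in."""
    return {"records": len(params.get("records", []))}


def dispatch(path, body):
    """Map a request path and parsed JSON body to (status, response dict)."""
    if path == "/init":
        # Nothing to initialize here; the code is already baked into the image.
        return 200, {"ok": True}
    if path == "/run":
        # Assumption: invocation parameters arrive under the "value" key.
        return 200, run_action(body.get("value", {}))
    return 404, {"error": "unknown path"}


class ActionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        status, payload = dispatch(self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Start the server; call this from the container entrypoint."""
    HTTPServer(("0.0.0.0", port), ActionHandler).serve_forever()
```

Keeping the routing in `dispatch` separate from the HTTP handler makes the action logic testable without starting a server or a container.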

APIs

  • Whisk CLI
  • API Gateway
    • Supports Versioning
    • Supports packaging

Whisk CLI

Whisk API Gateway

https://{APIHOST}/api/v1/namespaces/{namespace}/actions
https://{APIHOST}/api/v1/namespaces/{namespace}/triggers
https://{APIHOST}/api/v1/namespaces/{namespace}/rules
https://{APIHOST}/api/v1/namespaces/{namespace}/packages
https://{APIHOST}/api/v1/namespaces/{namespace}/activations
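
The endpoints above can be exercised with nothing but the standard library. The sketch below builds the URLs and prepares, but does not send, a blocking invocation request. It assumes HTTP Basic authentication with a `user:password` style key and the `blocking=true` query parameter, and the helper names `collection_url` and `build_invoke_request` are hypothetical.

```python
import base64
import json
import urllib.request

RESOURCES = ("actions", "triggers", "rules", "packages", "activations")


def collection_url(api_host, namespace, resource):
    """Return the collection URL for one of the resource types listed above."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    return f"https://{api_host}/api/v1/namespaces/{namespace}/{resource}"


def build_invoke_request(api_host, namespace, action, params, auth_key):
    """Build (not send) a POST request that invokes an action and waits for its result."""
    url = collection_url(api_host, namespace, "actions") + f"/{action}?blocking=true"
    # Assumption: auth_key has the form "user:password" and is sent as Basic auth.
    token = base64.b64encode(auth_key.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a real deployment. Host, namespace, and key are placeholders to fill in.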

Custom Whisk Actions
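
For reference, a Python action is a module that exposes a `main` function taking a dict of parameters and returning a dict. The sketch below is a made-up cleaning step for a pipeline, not code from the project. The field names `name` and `amount` are assumptions.

```python
def main(args):
    """Normalize one incoming record; OpenWhisk passes parameters as a dict."""
    name = str(args.get("name", "")).strip().lower()
    try:
        amount = float(args.get("amount", 0))
    except (TypeError, ValueError):
        # Report bad input in the result instead of raising.
        return {"error": "amount is not numeric"}
    return {"name": name, "amount": amount}
```

A file like this is typically registered with the CLI via `wsk action create <action-name> <file>.py`, after which it can be invoked directly or wired to a trigger through a rule.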

How It Works

  1. Train Models In Spark
  2. Bundle
  3. Run via MLeap Runtime
  4. Profit

How It Works

Benefits

  • Type Safety
  • Simplified Execution Context
  • Similar APIs to Spark

Project Architecture

Really Nice Things

  • Logging
  • Performance Monitoring
  • Shared Actions
  • Scaling

Trade Offs

  • Excellent Isolation
  • Scalability
  • Customizability
  • OpenWhisk is a little hard
  • Some of the semantics are hard to grasp
  • MLeap requires extra effort

A Word On Images

Base OpenWhisk Image

A Word On Deployment

