Data Pipelines With StreamSets

Jowanza Joseph

@jowanza

Agenda

  • About Me
  • The Problem Space
  • Streaming
  • StreamSets
  • Demo
  • Questions

About Me

  • Software Engineer at One Click Retail
  • Scala / Spark / Mesos / Kubernetes
  • Author: Apache Spark Fieldbook
  • Cyclist
  • Husband and father

Retail Intelligence

Data Size

Real-Time

Operational Complexity

Batch Processing

What Are Data Pipelines?

What Problems Do They Solve?

  • Scalability
  • Complexity
  • Observability
  • Extensibility

Lambda Architecture

Kappa Architecture

Goals

  • Data Provenance
  • Guaranteed Delivery 
  • Configurable
  • Extensible
  • Multi-Protocol Support
  • DAG
  • Distributed

Based on Streams

Architecture

Running on Mesos

Analytics Data

Real-Time Data

Our Use Case

Demo
