Data Pipelines With StreamSets
Jowanza Joseph
@jowanza
Agenda
About Me
The Problem Space
Streaming
StreamSets
Demo
Questions
About Me
Software Engineer at One Click Retail
Scala / Spark / Mesos / Kubernetes
Author: Apache Spark Fieldbook
Cyclist
Husband and father
Retail Intelligence
Data Size
Real-Time
Operational Complexity
Batch Processing
What Are Data Pipelines?
What Problems Do They Solve?
Scalability
Complexity
Observability
Extensibility
Lambda Architecture
Kappa Architecture
Goals
Data Provenance
Guaranteed Delivery
Configurable
Extensible
Multi-Protocol Support
DAG
Distributed
Based on Streams
Architecture
Running on Mesos
Analytics Data
Real-Time Data
Our Use Case
Demo