Data Pipelines With Apache Spark and Couchbase

Jowanza Joseph

@jowanza

www.jowanza.com

Agenda

  • About Me ~ 1 Min
  • What are data pipelines? ~ 3 Min
  • Challenges ~ 3 Min
  • Couchbase ~ 3 Min
  • Spark ~ 3 Min
  • Marshalling ~ 2 Min
  • Demo ~ 15 Min
  • Questions

About Me

  • Software Engineer at One Click Retail
  • Spark, Scala / Java
  • Apache Spark Field Book
  • Cycling / Golf
  • Father

What are data pipelines?

Challenges

  • Volume
  • Drowning
  • Latency
  • Swiss Army Knife
  • Flexibility

Volume

Drowning

Latency

Swiss Army Knife

Flexibility

CAP Theorem

Why Spark?

  • Glue Framework
  • Type Safety 
  • Distributed
  • Fault Tolerant

Why Couchbase?

  • Batch & Stream
  • Full-text search
  • Flexible Schema
  • N1QL
  • Easy Distribution
  • Minimal Operational Overhead

Marshalling
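
A minimal sketch of what marshalling means in a pipeline like this: a typed record is turned into a JSON document (the shape a document store such as Couchbase holds) and then rebuilt on the way back out. This is not the demo code from the talk. It uses only the Python standard library, and the `Product` type, its fields, and the `marshal` / `unmarshal` helpers are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Product:
    # Hypothetical record type; the fields are assumptions, not from the talk.
    sku: str
    name: str
    price: float


def marshal(product: Product) -> str:
    """Serialize a typed record into a JSON document string."""
    return json.dumps(asdict(product), sort_keys=True)


def unmarshal(document: str) -> Product:
    """Rebuild the typed record from a JSON document string.

    Raises TypeError if the document has missing or unexpected fields.
    """
    return Product(**json.loads(document))


original = Product(sku="A-100", name="Bike helmet", price=49.99)
document = marshal(original)
restored = unmarshal(document)
```

Keeping both directions in one place means a schema mismatch surfaces as an error at the pipeline boundary rather than somewhere downstream.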

Demo

Big Mountain Data Conference - Spring 2017