Elastic Search and Apache Spark

Jowanza Joseph

@jowanza

www.jowanza.com

Agenda

  • About Me ~ 1 Minute
  • Data Pipelines ~ 5 Minutes
  • Distributed Data ~ 5 Minutes
  • Batch / Streaming ~ 3 Minutes
  • Operations ~ 5 Minutes
  • Elastic + Spark ~ 3 Minutes
  • Code ~ 15 Minutes

About Me

  • Senior Software Engineer at One Click Retail
  • Scala / Java: Spark, Flink, Kafka
  • Writing a Book about Apache Spark
  • Cyclist / Golfer
  • Father

Data Pipelines

Distributed Data

A Brief History

Streaming

Operations

Lambda Architecture

ElasticSearch

  • Master-Master
  • Memory Optimizations
  • Supports The Hadoop Ecosystem
  • Supports Streaming & Batch
  • Full-text Search

Is JSON Good?

  • No Types
  • No Compression
  • Universal?
  • Difficult to Marshal
  • Not suitable for joins
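
A minimal sketch of the typing and marshalling points above, using only the Python standard library. The record and its field names are made up for illustration. JSON does have a few primitive types, but anything richer (dates, tuples, non-string keys) is coerced or must be encoded by hand, and the original types do not come back on decode.

```python
import json
from datetime import date

# Hypothetical record; field names are illustrative only.
record = {
    "sku": "A-100",
    "price": 19.99,
    "listed_on": date(2017, 5, 1),   # JSON has no date type
    "dimensions": (10, 20),          # tuples become JSON arrays
    "stock_by_store": {1: 4, 2: 0},  # object keys are always strings
}

# Without `default`, json.dumps raises TypeError on the date,
# so we pick a string encoding (ISO 8601) ourselves.
encoded = json.dumps(record, default=lambda value: value.isoformat())
decoded = json.loads(encoded)

print(decoded["listed_on"])       # a plain string, not a date
print(decoded["dimensions"])      # a list, not a tuple
print(decoded["stock_by_store"])  # keys are now strings
```

The round trip succeeds, but `decoded` is not equal to `record`: the consumer has to know, out of band, which strings are dates and which keys were integers.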

Marshalling
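
One way to picture marshalling, as a hedged sketch rather than the approach used in the talk: a typed record with hand-written conversions to and from JSON. The `Listing` class and its fields are hypothetical. Every conversion is written explicitly because the JSON text carries no schema of its own.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Listing:
    """Hypothetical typed record used only for illustration."""
    sku: str
    price: float
    listed_on: date

    @staticmethod
    def from_json(text: str) -> "Listing":
        raw = json.loads(text)
        # Unmarshal: convert each loosely typed JSON value by hand.
        return Listing(
            sku=str(raw["sku"]),
            price=float(raw["price"]),
            listed_on=date.fromisoformat(raw["listed_on"]),
        )

    def to_json(self) -> str:
        # Marshal: dates are written as ISO 8601 strings.
        return json.dumps({
            "sku": self.sku,
            "price": self.price,
            "listed_on": self.listed_on.isoformat(),
        })


# The producer sent the price as a string; the unmarshaller has to cope.
listing = Listing.from_json(
    '{"sku": "A-100", "price": "19.99", "listed_on": "2017-05-01"}'
)
round_tripped = Listing.from_json(listing.to_json())
```

The cost is that each new field needs matching code on both sides, which is the burden a schema-carrying format would remove.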

Apache Spark

  • In-Memory Computing
  • Rich Ecosystem
  • Mature Query APIs
  • Performance

Elastic + Spark

  • Easily Distributed
  • Batch Workloads
  • Full-Text Search
  • Streaming Workloads
  • Large Ecosystem
  • Costly (But that's OK)
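
A rough sketch of how the batch and streaming workloads above might look from PySpark, assuming the elasticsearch-hadoop Spark connector is on the classpath. The helper names, host, port, index names, and checkpoint path are placeholders, not taken from the talk. The streaming sink additionally assumes a connector version with Structured Streaming support (6.0 or later).

```python
def es_options(nodes="localhost", port=9200):
    """Connector settings; the host and port defaults are assumptions."""
    return {
        "es.nodes": nodes,
        "es.port": str(port),
    }


def write_batch(df, index, options):
    """Batch: append the rows of a Spark DataFrame to an index."""
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(index))


def read_batch(spark, index, options, query=None):
    """Batch: load an index as a DataFrame, optionally with a query."""
    reader = spark.read.format("org.elasticsearch.spark.sql").options(**options)
    if query is not None:
        # es.query accepts a URI query ("?q=...") or a query DSL string.
        reader = reader.option("es.query", query)
    return reader.load(index)


def write_stream(streaming_df, index, options, checkpoint_dir):
    """Streaming: continuously write a streaming DataFrame to an index."""
    return (streaming_df.writeStream
            .format("es")
            .options(**options)
            .option("checkpointLocation", checkpoint_dir)
            .start(index))
```

With a running `SparkSession` named `spark`, usage would be along the lines of `read_batch(spark, "products", es_options(), query="?q=title:bike")`, where the index name and query are again hypothetical.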

Batch Demo

Streaming Demo

Code

SLC ElasticSearch Meetup May 2017