Elasticsearch and Apache Spark
Jowanza Joseph
@jowanza
www.jowanza.com
Agenda
About Me ~ 1 Minute
Data Pipelines ~ 5 Minutes
Distributed Data ~ 5 Minutes
Batch / Streaming ~ 3 Minutes
Operations ~ 5 Minutes
Elastic + Spark ~ 3 Minutes
Code ~ 15 Minutes
About Me
Senior Software Engineer at One Click Retail
Scala / Java: Spark, Flink, Kafka
Writing a Book about Apache Spark
Cyclist / Golfer
Father
Data Pipelines
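A data pipeline can be pictured as a chain of stages (extract, transform, load), each consuming the previous stage's output. The sketch below shows that shape with plain Python generators. The stage names, the "price" filter and the sample records are hypothetical, and an in-memory list stands in for a real sink such as an Elasticsearch index.

```python
import json
from typing import Iterable, Iterator

def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON document per line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def transform(docs: Iterable[dict]) -> Iterator[dict]:
    """Keep only documents that carry a price and normalise the title."""
    for doc in docs:
        if "price" in doc:
            yield {**doc, "title": doc["title"].strip().lower()}

def load(docs: Iterable[dict], sink: list) -> int:
    """Append documents to an in-memory sink (hypothetical stand-in) and count them."""
    count = 0
    for doc in docs:
        sink.append(doc)
        count += 1
    return count

raw = ['{"title": " Bike ", "price": 400}', '', '{"title": "Golf Ball"}']
sink: list = []
written = load(transform(extract(raw)), sink)
print(written, sink)
```

Because every stage is lazy, records flow through one at a time. Distributed engines apply the same stage-by-stage structure across many machines.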
Distributed Data
A Brief History
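A common way to distribute data is to hash a key and take it modulo the number of partitions. Elasticsearch routes each document to a primary shard by hashing a routing value, by default the document `_id`, and this sketch works in the same spirit. It is only a sketch: CRC32 stands in for the real hash function, and `shard_for` and `partition` are hypothetical helpers.

```python
import zlib
from collections import defaultdict

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the routing key (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def partition(doc_ids, num_shards):
    """Group document ids by the shard they hash to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)

ids = [f"doc-{i}" for i in range(10)]
layout = partition(ids, 3)
for shard, members in sorted(layout.items()):
    print(shard, members)
```

The shard is a function of the shard count, so changing the count would move most documents. That is one reason Elasticsearch fixes the number of primary shards when an index is created.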
Streaming
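Spark Streaming treats a stream as a sequence of small batches and updates state after each one. The sketch below simplifies this by grouping events into count-based micro-batches, where Spark uses time intervals, so the output stays deterministic. The event names are made up.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List

def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) iterator into fixed-size batches; the last may be short."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def running_counts(events: Iterable[str], batch_size: int) -> Iterator[dict]:
    """Update running totals after every micro-batch and yield a snapshot each time."""
    totals: Counter = Counter()
    for batch in micro_batches(events, batch_size):
        totals.update(batch)
        yield dict(totals)

stream = ["click", "view", "click", "buy", "view", "click", "view"]
snapshots = list(running_counts(stream, batch_size=3))
for snapshot in snapshots:
    print(snapshot)
```

Once the stream is exhausted, the last snapshot equals a one-off batch count over the same events. Batch can be viewed as streaming over a bounded input.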
Operations
Lambda Architecture
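The Lambda Architecture has three layers:

- A batch layer periodically recomputes views from all historical data.
- A speed layer maintains views over recent data not yet covered by a batch run.
- A serving layer merges the two views at query time.

The sketch below shows that merge with simple counts. The function names and data are hypothetical.

```python
from collections import Counter

def batch_view(historical):
    """Batch layer: recompute the full view from all historical events."""
    return Counter(historical)

def speed_view(recent):
    """Speed layer: view over events the batch layer has not absorbed yet."""
    return Counter(recent)

def query(key, batch, speed):
    """Serving layer: merge batch and real-time views (missing keys count as 0)."""
    return batch[key] + speed[key]

historical = ["bike", "bike", "ball"]
recent = ["bike", "glove"]
b, s = batch_view(historical), speed_view(recent)
print(query("bike", b, s), query("glove", b, s))
```

The operational cost of this design is running and keeping in sync two code paths that compute the same answer.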
Elasticsearch
Master-Master
Memory Optimizations
Supports The Hadoop Ecosystem
Supports Streaming & Batch
Full-text Search
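Full-text search in Elasticsearch is built on Lucene's inverted index, which maps each term to the documents that contain it. The sketch below builds a toy inverted index with a crude tokenizer. It uses AND semantics for simplicity, whereas Elasticsearch's `match` query defaults to OR and ranks hits by relevance. The documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case and split on non-word characters (a crude analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "Road bike with carbon frame", 2: "Mountain bike, aluminium frame", 3: "Golf balls"}
idx = build_index(docs)
print(sorted(search(idx, "bike frame")))
print(sorted(search(idx, "carbon bike")))
```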
Is JSON Good?
No Types
No Compression
Universal?
Difficult to Marshal
Not Suitable for Joins
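The points above can be checked directly. JSON has only a handful of primitive types and no schema, so a date has to be converted by hand and comes back as a string. Field names are also repeated in every record, so size savings come only from an external codec such as gzip. The sample record below is hypothetical.

```python
import gzip
import json
from datetime import date

doc = {"sku": "A-1", "price": 19.99, "qty": 3, "sold_on": date(2017, 5, 1)}

# JSON has no date type: json.dumps raises TypeError unless the value is converted first.
try:
    json.dumps(doc)
except TypeError as exc:
    print("not serializable:", exc)

encoded = json.dumps({**doc, "sold_on": doc["sold_on"].isoformat()})
decoded = json.loads(encoded)
print(type(decoded["sold_on"]).__name__)  # the date comes back as a plain string

# Keys repeat in every record, so raw JSON is redundant until a codec compresses it.
records = "\n".join(json.dumps({"sku": f"A-{i}", "price": 19.99, "qty": i}) for i in range(1000))
raw = records.encode("utf-8")
print(len(raw), len(gzip.compress(raw)))
```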
Marshalling
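Because JSON carries no schema, every consumer has to marshal documents into typed records itself. The sketch below shows one way to do that with a dataclass and explicit coercion of each field. The `Sale` record and its fields are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Sale:
    sku: str
    price: float
    qty: int
    sold_on: date

def unmarshal(line: str) -> Sale:
    """Parse one JSON document and coerce each field to its declared type."""
    raw = json.loads(line)
    return Sale(
        sku=str(raw["sku"]),
        price=float(raw["price"]),
        qty=int(raw["qty"]),
        sold_on=date.fromisoformat(raw["sold_on"]),
    )

def marshal(sale: Sale) -> str:
    """Serialise a Sale back to JSON, writing the date as an ISO-8601 string."""
    return json.dumps({"sku": sale.sku, "price": sale.price, "qty": sale.qty,
                       "sold_on": sale.sold_on.isoformat()})

line = '{"sku": "A-1", "price": "19.99", "qty": 3, "sold_on": "2017-05-01"}'
sale = unmarshal(line)
print(sale)
```

Every field needs its own conversion rule, and malformed input only fails at runtime. This per-record cost is what the slide calls out.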
Apache Spark
In-Memory Computing
Rich Ecosystem
Mature Query APIs
Performance
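Spark's `reduceByKey` combines values within each partition before shuffling, so less data crosses the network. The sketch below mimics that two-step shape on in-memory lists. The helpers `combine_partition` and `reduce_by_key` are hypothetical stand-ins, not Spark's API.

```python
from collections import defaultdict
from functools import reduce
from operator import add

def combine_partition(pairs, func):
    """Map-side combine: reduce values per key inside a single partition."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

def reduce_by_key(partitions, func):
    """Merge per-partition results (the 'shuffle') into one value per key."""
    merged = defaultdict(list)
    for partial in (combine_partition(p, func) for p in partitions):
        for key, value in partial.items():
            merged[key].append(value)
    return {key: reduce(func, values) for key, values in merged.items()}

partitions = [
    [("bike", 1), ("ball", 1), ("bike", 1)],
    [("ball", 1), ("glove", 1)],
]
print(reduce_by_key(partitions, add))
```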
Elastic + Spark
Easily Distributed
Batch Workloads
Full-Text Search
Streaming Workloads
Large Ecosystem
Costly (But that's OK)
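When Spark writes to Elasticsearch, connectors such as elasticsearch-hadoop send documents through the `_bulk` API. A `_bulk` body is newline-delimited JSON, with an action line followed by the document source, and it must end with a newline. The sketch below builds such bodies for one partition in fixed-size chunks. The index name "sales", the `id` field and `batch_size` are hypothetical. Older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from typing import Iterable, Iterator, List

def bulk_body(index: str, docs: Iterable[dict], id_field: str = "id") -> str:
    """Build a newline-delimited _bulk body: an action line, then the document source."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

def bulk_batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Split one partition's documents into bulk-sized chunks."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

partition = [{"id": i, "sku": f"A-{i}"} for i in range(5)]
bodies = [bulk_body("sales", b) for b in bulk_batches(partition, batch_size=2)]
print(len(bodies))
print(bodies[0])
```

Batching keeps each HTTP request bounded. Writing from many partitions in parallel is part of why the combination is fast but "costly" in cluster resources.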
Batch Demo
Streaming Demo
Code