Speeding Up The Distributed File System

Jowanza Joseph

@jowanza

Agenda

  • About Me
  • A Brief History of the Distributed File System
  • Back-Breaking Scale
  • Alluxio
  • Demo: Spark + Alluxio
  • Questions

About Me

  • Senior Software Engineer, One Click Retail
  • Scala / Java, Distributed Systems
  • Husband / Father

The Need For Distributed Storage

Data Powers Applications

Slow Data Is Hard To Act On

Streaming Is Valuable

Data Storage Is Expensive

Disk Based Analytics Is Expensive

The Hero

  • Reliability
  • Scalability
  • Distributed
  • Fault Tolerant
  • Operationally Difficult

Why Hadoop?

Hadoop Ecosystem

SQL On Hadoop

Lowest Common Denominator

What Is Alluxio?

Architecture
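Alluxio sits between compute frameworks such as Spark and under-storage systems such as S3 or HDFS, keeping frequently read data in memory. The core read path can be pictured as a read-through cache. What follows is a toy model of that idea only, assuming a single node and an unbounded memory tier; `TieredReadCache` and `slow_s3_read` are hypothetical names, not Alluxio's API.

```python
import time
from typing import Callable, Dict


class TieredReadCache:
    """Toy read-through cache: serve from memory, fall back to a slow store.

    Hypothetical model of the idea only; this is not Alluxio's API.
    No eviction: the memory tier grows without bound in this sketch.
    """

    def __init__(self, under_store_read: Callable[[str], bytes]) -> None:
        self._read_from_under_store = under_store_read  # slow tier, e.g. S3
        self._memory: Dict[str, bytes] = {}             # fast in-memory tier
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._memory:
            self.hits += 1
            return self._memory[path]
        # Miss: fetch from the under-store once, then keep a copy in memory
        self.misses += 1
        data = self._read_from_under_store(path)
        self._memory[path] = data
        return data


def slow_s3_read(path: str) -> bytes:
    """Hypothetical stand-in for a remote object-store read (simulated latency)."""
    time.sleep(0.05)
    return f"contents of {path}".encode()


if __name__ == "__main__":
    cache = TieredReadCache(slow_s3_read)
    for _ in range(3):
        start = time.perf_counter()
        cache.read("s3://bucket/events.csv")  # hypothetical path
        print(f"read took {time.perf_counter() - start:.4f}s")
    print(f"hits={cache.hits} misses={cache.misses}")
```

Only the first read pays the simulated remote latency; later reads of the same path come from memory. A real system also has to handle eviction, multiple workers, and consistency with the under-store, none of which this sketch attempts.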

Works With

Demo

Reading Files From S3

Performance Across Multiple Test Runs
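One way to compare repeated runs like these is to time the same workload several times and report the first (cold) run separately from the later ones, which may benefit from caching. A minimal sketch, assuming the workload is wrapped in a zero-argument callable; `time_runs` and `summarize` are hypothetical helpers, not part of Spark or Alluxio.

```python
import statistics
import time
from typing import Callable, List


def time_runs(action: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `action` `runs` times; return each wall-clock duration in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Report the first (cold) run apart from the median of the later (warm) runs."""
    warm = durations[1:] or durations  # fall back to the single run if only one
    return (f"cold={durations[0]:.3f}s "
            f"warm_median={statistics.median(warm):.3f}s "
            f"runs={len(durations)}")


if __name__ == "__main__":
    # Hypothetical workload: swap in e.g. a count over files read from S3.
    print(summarize(time_runs(lambda: sum(range(100_000)), runs=5)))
```

Using the median of the warm runs keeps a single slow outlier (a GC pause, a network hiccup) from skewing the comparison.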

Cache Performance

Mount Alluxio

Cache Performance

Questions?