Gabor Ratky
CTO at Secret Sauce Partners

Decoupling Big Data

Budapest Big Data Meetup, 1/17/2017

Secret Sauce Partners

6 TB of Big Data

Conceptual model

Answer = Data + Computation

Conceptual model

Information = Raw data + Processing

Conceptual model

Analytics = Storage + Compute

Conceptual model

Computer = SSD/RAM + CPU

Conceptual model

Hadoop = HDFS + MapReduce

2006

"[..] from single servers to thousands of machines, each offering local computation and storage."

Conceptual model

Hadoop = (HDFS + MapReduce)
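
A minimal sketch of the coupled model, in plain Python rather than Hadoop itself: each "node" stores its own block of the input and runs the map step on that block locally, and only the intermediate pairs are shuffled and reduced. The node names, block contents, and function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical cluster: every node holds its own block of the input (storage)
# and runs the map step on that block itself (compute).
blocks_by_node = {
    "node-1": ["big data big answers"],
    "node-2": ["data beats opinions"],
}


def map_block(lines):
    """Map step: emit (word, 1) for every word in the locally stored lines."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle step: group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(blocks):
    # Each block is mapped where it lives; only intermediate pairs move.
    mapped = [map_block(lines) for lines in blocks.values()]
    return reduce_counts(shuffle(mapped))


counts = word_count(blocks_by_node)
```

In this model adding storage means adding nodes, and adding nodes means adding compute, whether or not the workload needs both.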

Conceptual model

Data Warehouse = (HDFS + MapReduce) + Hive

2008

Conceptual model

Data Warehouse = Amazon Redshift

2012

Conceptual model

Data Application = Storage + Spark

2012

Conceptual model

Data Application = HDFS + Spark

2012

Conceptual model

Data Application = Amazon S3 + Spark

2012

Conceptual model

Data Warehouse = HDFS + Spark SQL

2014

How much is this going to cost me?

Conceptual model

Cost = Storage + Compute

Conceptual model

Cost = Storage + Compute + Operations

Conceptual model

Value = (Storage + Compute) * Utilization

Utilization
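
The cost and value formulas can be written down directly. This is a sketch only: the class name and the monthly figures are hypothetical, and utilization is assumed to be a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class MonthlySpend:
    """Monthly spend in a single currency unit; the figures below are made up."""

    storage: float
    compute: float
    operations: float = 0.0

    def cost(self) -> float:
        # Cost = Storage + Compute + Operations
        return self.storage + self.compute + self.operations

    def value(self, utilization: float) -> float:
        # Value = (Storage + Compute) * Utilization
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be between 0 and 1")
        return (self.storage + self.compute) * utilization


spend = MonthlySpend(storage=1000.0, compute=4000.0, operations=500.0)
total_cost = spend.cost()                 # everything that is paid for
value_at_quarter_load = spend.value(0.25)  # what a cluster busy 25% of the time returns
```

With these assumed numbers, a cluster that is busy a quarter of the time returns 1250 of value against 5500 of cost.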

Decoupling Storage & Compute

  • Scale storage and compute independently of each other
  • Have unlimited storage and only pay for what you use
  • Run ephemeral clusters and only pay for what you use
  • Fine-grained control over performance, availability, durability, and cost
  • Run many different workloads on top of the same storage
  • Use cloud-based services to perform your workloads
  • Makes a lot of sense in the cloud, less so on-prem
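
A rough comparison of the two billing models, as a sketch under stated assumptions. All prices, the per-node disk capacity, and the 60 busy hours are hypothetical; only the 6 TB data size comes from the talk. Replication, data transfer, and operations costs are ignored.

```python
import math

HOURS_PER_MONTH = 730  # average hours in a month


def coupled_monthly_cost(data_tb, compute_nodes, tb_per_node, node_hour_price):
    """Always-on cluster: node count must cover both the data and the compute need."""
    nodes_for_storage = math.ceil(data_tb / tb_per_node)
    nodes = max(nodes_for_storage, compute_nodes)
    return nodes * node_hour_price * HOURS_PER_MONTH


def decoupled_monthly_cost(data_tb, compute_nodes, busy_hours,
                           storage_price_per_tb, node_hour_price):
    """Object storage billed per TB-month plus an ephemeral cluster billed per hour used."""
    storage = data_tb * storage_price_per_tb
    compute = compute_nodes * busy_hours * node_hour_price
    return storage + compute


# Hypothetical workload: 6 TB of data, 10 nodes needed while jobs run.
coupled = coupled_monthly_cost(
    data_tb=6, compute_nodes=10, tb_per_node=2, node_hour_price=0.5
)
decoupled = decoupled_monthly_cost(
    data_tb=6, compute_nodes=10, busy_hours=60,
    storage_price_per_tb=25.0, node_hour_price=0.5,
)
```

Under these assumptions the always-on cluster costs 3650 per month and the decoupled setup 450. The gap narrows as busy hours approach a full month, which is why utilization decides which model is cheaper.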

Conceptual model

Data Application = S3 + Spark (EMR)

2015

Conceptual model

Data Lake = S3 + Amazon Athena
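
A sketch of what querying data in S3 through Athena can look like from Python, using the boto3 `start_query_execution` call. The database name, table name, bucket, and SQL are hypothetical, and running the query assumes boto3 is installed and AWS credentials are configured.

```python
def athena_query_params(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution id (the query runs asynchronously)."""
    import boto3  # imported lazily so the parameter builder works without AWS libraries

    client = boto3.client("athena")
    response = client.start_query_execution(
        **athena_query_params(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names: the data stays in S3, Athena supplies the compute per query.
params = athena_query_params(
    "SELECT event_type, COUNT(*) AS events FROM clickstream GROUP BY event_type",
    database="example_lake",
    output_location="s3://example-bucket/athena-results/",
)
```

No cluster is started or stopped here. The execution id can later be passed to `get_query_execution` and `get_query_results` to poll for and fetch the result.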

2016

Conceptual model

BI = RDS/Redshift/S3 + Amazon QuickSight

2016

Conceptual model

Data Science = S3 + Databricks Notebook

2016

Serverless Computing

  • Amazon QuickSight (BI)
  • Amazon Athena (Hive/JDBC SQL)
  • AWS Glue (ETL)
  • Google BigQuery
  • Google Cloud Dataflow
  • Google Cloud Dataproc
  • Amazon Lex/Polly/Rekognition/ML (AI)
  • Google Cloud Machine Learning, Speech/Translation/Vision API

2017

Thanks!

Questions?

 

@rgabo

gabor@secretsaucepartners.com
