Big Data 2016

Gabor Ratky
CTO at Secret Sauce Partners

Climbing the slope of enlightenment

Secret Sauce Partners

Hype Cycle

Three things we're excited about

EMR 4

EMR is the largest Hadoop distribution by market share

EMR 4 based on Apache Bigtop

Integration with AWS services (EC2, S3, Redshift)

Ephemeral clusters

Intelligent resizing

EMR Sandbox (Zeppelin)

Spark 1.5

Not for everyone (cloud lock-in)
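
A minimal sketch of what an ephemeral cluster could look like in practice: a request that starts a cluster, runs one Spark step, and terminates when the step finishes. The field names follow the EMR RunJobFlow API as we understand it. The cluster name, release label, instance types, and S3 paths are illustrative assumptions, not values from this talk.

```python
def ephemeral_spark_cluster_request(script_s3_path, log_s3_path, worker_count=2):
    """Build a RunJobFlow-style request for a cluster that shuts down after its step."""
    return {
        "Name": "ephemeral-spark-job",        # hypothetical cluster name
        "ReleaseLabel": "emr-4.2.0",          # assumed 4.x release label
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_s3_path,                # logs go to S3 so they outlive the cluster
        "Instances": {
            "MasterInstanceType": "m3.xlarge",    # assumed instance type
            "SlaveInstanceType": "m3.xlarge",     # assumed instance type
            "InstanceCount": 1 + worker_count,    # one master plus the workers
            # False makes the cluster ephemeral: it terminates once no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "run-spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs a command on the master node
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. Input and output live in S3, so nothing is lost when the cluster goes away.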

Spark

We write distributed applications

Right level of abstraction to reason about

"Write once, run everywhere"

Huge momentum (also hype)

Frameworks on top of primitives (DataFrame, MLlib)
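
To illustrate the "right level of abstraction" point, here is a plain-Python stand-in for the programming model, not Spark code and not Spark's API. The application is written as independent per-partition work plus an associative merge, which is the shape that lets a framework spread the work over a cluster. All function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into partitions, as a cluster would spread them over workers."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words_in_partition(lines):
    """Per-partition work: needs no data from any other partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results; associative, so the merge order does not matter."""
    return left + right


def word_count(lines, num_partitions=3):
    partitions = split_into_partitions(lines, num_partitions)
    partials = [count_words_in_partition(p) for p in partitions]  # the "map" side
    return reduce(merge_counts, partials, Counter())              # the "reduce" side
```

The result does not depend on `num_partitions`, so the same logic can be reasoned about on a laptop and then run on many machines. In Spark the per-partition step and the merge would be expressed with its own primitives rather than these helpers.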

Zeppelin

80% of data science/engineering is data munging

Document datasets, share steps and results

Killer app to work with and collaborate on datasets

Familiar REPL experience

Predictions for 2016

Consolidation

SQL

Tooling

+1: IoT

By Secret Sauce Partners, Inc.