Serverless SQL

Lynn Langit

The Ultimate

Domain-specific language

~ 45 years old

@lynnlangit
@lynnlangit

SQL Server

  • On Linux
  • With Docker

Got $ 50 ?

Cost?  ~ $ 25 USD

Big Query

@lynnlangit

Serverless SQL since 2011

  • LOAD CSV, JSON, Avro
  • QUERY via ANSI SQL 
  • USE public datasets
  • PAY per Query Execution
  • CONTROL cost - alerts/quotas
@lynnlangit

Serverless SQL

BQ Details

  • Partition tables by date
  • Stream 100k rows/sec
  • Update via DML
  • Use JSON functions & UDFs
  • Connect via JDBC/ODBC
  • Secure via controls
  • Control cost - alerts/quotas
  • Integrate w/GCP
@lynnlangit

Why didn't
more people notice?

It's the pipeline...

@lynnlangit

Werner says...

Serverless SQL Services

@lynnlangit

AWS S3 Select

AWS S3 Select

@lynnlangit

Use it...

3 Steps

  • Store in S3
  • Format in CSV or JSON
  • Use 'Select From'

Use it better...

Works with

  • Encrypted files (server side)
  • Compressed files (gzip)
  • Files with headers
  • In console up to 40 MB from files up 128 MB
  • Subset of SQL expressions
@lynnlangit

AWS Athena

AWS Athena

@lynnlangit

Use it...

3 Steps

  • Store in S3
  • Define Table Schema
  • Run ANSI SQL query

Use it better...

Save 30-90% $/perf 

  • Compress
  • Partition by any key
  • Convert to columnar
@lynnlangit

Details

  • Uses Presto - serverless and auto-scaling
  • Supports joins, windows and arrays
  • Input via CSV, JSON, ORC, Avro, Parquet, CloudTrail
  • Includes JDBC driver
  • Supported by QuickSight
  • Secured via S3 IAM permissions and encryption
  • No UDFs or stored_procs
  • Compression - gzip, LZO, snappy or zlib
  • Query results streamed to console and also stored in S3
@lynnlangit

Cost

Item AWS Athena GCP Big Query
Query $ 5 / TB scanned $ 5 / TB scanned
Store Data $ 20 / TB** in S3 $ 20 / TB** in BQ

*   Loading and Exporting data is FREE

** Price per TB per month

@lynnlangit

Cost Example - Athena

@lynnlangit

Speed

Item AWS Athena GCP Big Query
Query
Input data
Output data

TBD

@lynnlangit

Moving Data

Use Cases

  • Explore / Experiment
  • Ad hoc reporting
  • Data Warehouse
  • Data Processing
@lynnlangit

ETL via
   Glue and...

@lynnlangit

Glue Tranformations

Glue Tranformations

Kinesis Analytics

@lynnlangit

Serverless Services

@lynnlangit
@lynnlangit

Athena

Best for 'ad-hoc' SQL queries of data in S3

Redshift Spectrum

Best for 'frequently-accessed' and 'highly-structured' data stored in S3

@lynnlangit

More about Redshift Spectrum

  • Datatypes - parquet*, textfile, sequencefile, rcfile
  • Compression - gzip, snappy, bz2
  • Encryption - server-side only
  • TIP - break files into equal sizes (100 MB - 1 GB)
  • CREATE EXTERNAL SCHEMA - Athena, Hive or external
@lynnlangit

Serverless Service

@lynnlangit

GCP Pipeline "A"

@lynnlangit

GCP Pipeline "B"

@lynnlangit

Pipeline Pattern

@lynnlangit

What's the bottom line?

Serverless SQL

Use Case AWS  GCP 
Ad hoc S3 Select/ Athena BigQuery
Data Warehouse Redshift Spectrum BigQuery
Batch Transform Glue - PySpark BigQuery - UDFs
Streaming Kinesis Analytics BigQuery
Relational Aurora Serverless BigQuery
@lynnlangit

What's Next?

What's Next?

@lynnlangit

"SQL still rules the world"

-- Eric Meijer

ML-SQL?

Serverless SQL Services

Lynn Langit

Serverless SQL

By Lynn Langit

Serverless SQL

Deck on AWS Athena and more for Velocity Con in San Jose June 2018

  • 1,603