Serverless SQL

Lynn Langit

SQL is the ultimate Domain-specific language

~ 45 years old

@lynnlangit

Microsoft
SQL Server

Show SQL

(Microsoft)

SQL Server

  • On Linux
  • With Docker
@lynnlangit
@lynnlangit

Got $ 50 ?

Cost?  ~ $ 25 USD

Show SQL

(Google)

@lynnlangit

Serverless SQL since 2011

  • LOAD CSV, JSON, Avro
  • QUERY via ANSI SQL 
  • USE public datasets
  • PAY per Query Execution
  • CONTROL cost - alerts/quotas
@lynnlangit

Serverless SQL

BQ Details

  • Partition tables by date
  • Stream 100k rows/sec
  • Update via DML
  • Use JSON functions & UDFs
  • Connect via JDBC/ODBC
  • Secure via controls
  • Control cost - alerts/quotas
  • Integrate w/GCP
@lynnlangit

Why didn't
more people notice?

It's the pipeline...

@lynnlangit

Serverless SQL Services

@lynnlangit

Werner says...

AWS S3 Select

@lynnlangit

Show SQL

(AWS)

Use it...

3 Steps

  • Store in S3
  • Format in CSV or JSON
  • Use 'Select From'

Use it better...

Works with

  • Encrypted files (server side)
  • Compressed files (gzip)
  • Files with headers
  • In console up to 40 MB from files up 128 MB
  • Subset of SQL expressions
@lynnlangit

AWS Athena

@lynnlangit

Show SQL

(AWS)

Use it...

3 Steps

  • Store in S3
  • Define Table Schema
  • Run ANSI SQL query

Use it better...

Save 30-90% $/perf 

  • Compress
  • Partition by any key
  • Convert to columnar
@lynnlangit

Athena Details

  • Uses Presto - serverless and auto-scaling
  • Supports joins, windows and arrays
  • Input via CSV, JSON, ORC, Avro, Parquet, CloudTrail
  • Includes JDBC driver
  • Supported by QuickSight
  • Secured via S3 IAM permissions and encryption
  • No UDFs or stored_procs
  • Compression - gzip, LZO, snappy or zlib
  • Query results streamed to console and also stored in S3
@lynnlangit

Cost Example - Athena

@lynnlangit
@lynnlangit

RDS Aurora Serverless

Aurora Serverless Details

  • Use for variable workloads
  • Set the min/max DB capacity 
  • Purchase ACUs (Aurora Capacity Units)
@lynnlangit
@lynnlangit

Redshift Spectrum

Redshift Spectrum Details

  • Datatypes - parquet*, textfile, sequencefile, rcfile
  • Compression - gzip, snappy, bz2
  • Encryption - server-side only
  • Partition files into equal sizes (100 MB - 1 GB)
  • Schema - Athena, Hive or external
@lynnlangit

Athena

Best for 'ad-hoc' SQL queries of data in S3

Redshift Spectrum

For aggregate queries

Use Parquet files

@lynnlangit

Aurora Serverless

Elastic relational database

Moving Data

ETL via
   AWS Glue 

@lynnlangit

Show SQL

(AWS)

Glue Tranformations

@lynnlangit

Streams via
   Kinesis 

@lynnlangit

Kinesis Analytics

@lynnlangit

Serverless Services

@lynnlangit

Serverless Use Cases

  • Explore / Experiment
  • Ad hoc reporting
  • Data Warehouse
  • Data Processing
@lynnlangit
@lynnlangit

AWS Pipeline

Serverless Service

@lynnlangit

GCP Pipeline

@lynnlangit

Comparing Vendors

Serverless SQL

Use Case AWS  GCP 
Ad hoc S3 Select/ Athena BigQuery
Data Warehouse Redshift Spectrum BigQuery
Batch Transform Glue - PySpark BigQuery - UDFs
Streaming Kinesis Analytics BigQuery
Relational Aurora Serverless BigQuery
@lynnlangit

Updated Pipeline

@lynnlangit

A Real World Example

CSIRO Example

@lynnlangit

CSIRO Futures

@lynnlangit

What's Next?

"SQL" Machine Learning

@lynnlangit

Show SQL

(Google)

Serverless SQL Services

@lynnlangit
@lynnlangit

Serverless SQL Services

Lynn Langit

Serverless SQL Queries

By Lynn Langit

Serverless SQL Queries

Deck on AWS Athena and more for NDC Sydney Sept 2018

  • 1,118