Snowplow at Secret Sauce

Gabor Ratky
CTO at Secret Sauce Partners

Event analytics at scale

Secret Sauce Partners

Founded in 2010

B2B apparel e-commerce company by 2012

SV startup in Budapest

Integrated into high traffic websites

New market, unproven technology

Millions of pageviews

Millions of events

Tens of millions of requests

Yesterday:

Hundreds of millions of pageviews + events per month
Billions of requests per month

Use case: A/B testing

Control group

Test group

Measure the difference between the two groups

Segment visitors and look at impact in different segments

Calculate the impact
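
As a rough illustration of measuring the difference between the two groups and calculating the impact, here is a minimal sketch using a two-proportion z-test. The talk does not say which statistical method was used, so the test itself, the function names, the segment names, and all counts below are assumptions made up for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def relative_lift(conv_a, n_a, conv_b, n_b):
    """Relative change of the test group's conversion rate over control."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    return (p_b - p_a) / p_a


# Hypothetical counts per segment:
# (control conversions, control visitors, test conversions, test visitors)
SEGMENTS = {
    "all": (200, 10_000, 260, 10_000),
    "mobile": (60, 4_000, 90, 4_000),
    "desktop": (140, 6_000, 170, 6_000),
}

for name, counts in SEGMENTS.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{name}: lift={relative_lift(*counts):+.1%} z={z:.2f} p={p:.4f}")
```

Running the same calculation per segment is one simple way to see whether the impact is concentrated in a particular group of visitors.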

We can also

Understand user behavior and tweak the product

Spot application issues/bugs in the data (QA in Analytics)

Solution #1: DIY

Tracking events at scale is hard

Collecting events at scale is hard

Processing events at scale is hard

Storing events and making them queryable at scale is hard

DNDIY (do not DIY), at first

Solution #2: Piwik

Open Analytics Platform

Open Source Web Analytics

Ecommerce tracking

Limited custom attributes, no event tracking at the time 

Robust JavaScript tracker

LAMP stack

Great Web and People analytics

Solution #3: Kissmetrics

Software-as-a-service Event Analytics

Nice Web UI for Analytics

Robust JavaScript Library

Track events and properties

Funnels, A/B testing

Very limited query capabilities

Broken attributes, funnels

Price: $31,000/month

Solution #4: Mixpanel

Software-as-a-service Event Analytics

Nice Web UI for Analytics

Robust JavaScript Library

Track events and properties

Funnels, A/B testing

Very limited query capabilities

Broken attributes, funnels

Price: $31,000/month

Solution #5: Snowplow

Self-hosted Event Analytics

Rely on AWS for scaling (CloudFront, EMR, Redshift)

Robust JavaScript Library (based on piwik.js)

Track unstructured events and properties

"Bring-your-own" analytics (Custom SQL, Tableau, Looker)

~$3K-5K/month

Snowplow

10TB (5x ds2.xlarge) Redshift cluster

3.5TB gzipped CloudFront logs collected since May 2013

5.2 GB of logs processed daily (on a 4-hour schedule)

1B+ rows in atomic.events table

Custom SQL for analysis and reports
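
To give a feel for what custom SQL over an events table can look like, here is a small sketch that counts visitors and converters per A/B group. It is not one of the actual queries from the talk. It runs against an in-memory SQLite table as a stand-in for Redshift, the table is a tiny subset of columns loosely modelled on Snowplow's structured event fields, and the event categories, actions, labels, and rows are all hypothetical.

```python
import sqlite3

# Stand-in for the Redshift events table; only a few illustrative columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        domain_userid TEXT,
        event         TEXT,
        se_category   TEXT,
        se_action     TEXT,
        se_label      TEXT
    )
    """
)

# Hypothetical sample data: group assignments and purchases.
rows = [
    ("u1", "struct", "ab_test", "assigned", "control"),
    ("u2", "struct", "ab_test", "assigned", "control"),
    ("u3", "struct", "ab_test", "assigned", "test"),
    ("u4", "struct", "ab_test", "assigned", "test"),
    ("u2", "struct", "checkout", "purchase", None),
    ("u3", "struct", "checkout", "purchase", None),
    ("u3", "struct", "checkout", "purchase", None),
    ("u4", "struct", "checkout", "purchase", None),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

QUERY = """
SELECT ab.se_label                        AS ab_group,
       COUNT(DISTINCT ab.domain_userid)   AS visitors,
       COUNT(DISTINCT conv.domain_userid) AS converters
FROM events AS ab
LEFT JOIN events AS conv
       ON conv.domain_userid = ab.domain_userid
      AND conv.se_category = 'checkout'
      AND conv.se_action = 'purchase'
WHERE ab.se_category = 'ab_test'
  AND ab.se_action = 'assigned'
GROUP BY ab.se_label
ORDER BY ab.se_label
"""


def conversions_by_group(connection):
    """Return (group, visitors, converters) tuples, one per A/B group."""
    return connection.execute(QUERY).fetchall()


for group, visitors, converters in conversions_by_group(conn):
    print(group, visitors, converters)
```

The per-group counts from a query like this are the kind of input an A/B impact calculation needs.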

Takeaways

Every technology decision is a trade-off

SaaS is great as long as it makes sense economically

Don't DIY until you understand the problem and hit the wall

Open source is often a great trade-off

Price is not the only cost

Thanks!

Questions?

@rgabo

gabor@secretsaucepartners.com