Open Source Business Intelligence

Jowanza Joseph

@jowanza

Agenda

  • Introduce myself
  • Data Engineering
  • History of Business Intelligence
  • Elements of modern BI
  • Explore modern BI Architecture
  • Open source elements of BI

About Me

Disclaimer

  • Don't do this if it's not an element of your product
  • Don't do this if you consider BI a trivial company expense
  • Don't do this unless your life depends on it

Data Engineering

  • Ingesting data from varied sources
  • Extracting valuable bits from those sources
  • Converting data from one format to another
  • Enriching data
  • Storing data
  • Creating interfaces for other teams and engineers to use the data
  • With those interfaces enhance the product*
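
The steps above can be sketched as a tiny pipeline. This is a minimal illustration, not the stack from the talk: the sample CSV, the `REGION_BY_COUNTRY` lookup, and all function names are hypothetical, and "storing" here just means serializing to a JSON Lines string in memory.

```python
import csv
import io
import json

# Hypothetical raw input, standing in for one of many varied sources.
RAW_CSV = """user_id,country,amount
1,us,10.50
2,de,7.25
3,us,3.00
"""

# Hypothetical lookup table used for the enrichment step.
REGION_BY_COUNTRY = {"us": "North America", "de": "Europe"}


def ingest(text):
    """Ingest: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def extract(rows):
    """Extract: keep only the useful fields and give them proper types."""
    return [{"country": r["country"], "amount": float(r["amount"])} for r in rows]


def enrich(rows):
    """Enrich: attach a region to each row from the lookup table."""
    return [{**r, "region": REGION_BY_COUNTRY.get(r["country"], "Unknown")} for r in rows]


def store(rows):
    """Convert and store: serialize rows as JSON Lines (an in-memory string here)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)


def run_pipeline(text):
    """Chain the steps: ingest -> extract -> enrich -> convert/store."""
    return store(enrich(extract(ingest(text))))


stored = run_pipeline(RAW_CSV)
```

In a real deployment each step would likely be a separate service or job, and `store` would write to object storage rather than return a string.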

Business Intelligence

Systems Side

BI Market

  • $16B Market
  • Growing to $22B by 2022*
  • Big Players & Upstarts
  • Lots of money on the table

Vendors

Traditional BI

  • Tightly coupled architecture
  • More costly to scale up
  • Non-extensible
  • Vendor 🔒 
  • Limited product use*

Hadoop

Advantages

  • Commodity Hardware
  • Scales out to thousands of nodes
  • Fast (enough)
  • Extensible

Downsides

  • Transactions
  • Complexity
  • Failover
  • Scalability Model

Let's Build Something New

Wish List

  • All (mostly) open source
  • Scale storage and compute separately
  • Allow for some elements of self-service
  • Provide a SQL Interface for the data
  • Allow for data discovery
  • Allow for querying of real-time data*
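
To make the "SQL interface" item concrete, here is a minimal sketch of what an analyst-facing query might look like. It uses Python's built-in `sqlite3` purely as a stand-in: an embedded database does not separate storage from compute, and the `events` table, its columns, and the `purchases_by_country` helper are all hypothetical.

```python
import sqlite3

# Hypothetical event records, as they might look after ingestion.
events = [
    ("signup", "us", 1),
    ("purchase", "us", 3),
    ("purchase", "de", 2),
]

# In-memory database used only as a stand-in for a real query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, country TEXT, qty INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)


def purchases_by_country(connection):
    """Return (country, total quantity purchased) pairs, sorted by country."""
    query = """
        SELECT country, SUM(qty) AS total
        FROM events
        WHERE kind = 'purchase'
        GROUP BY country
        ORDER BY country
    """
    return connection.execute(query).fetchall()


result = purchases_by_country(conn)
```

The point is the interface rather than the engine: anyone who knows SQL can ask this question without knowing how or where the data is stored.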

Primitives

Data Storage

Advantages

  • Has primitives for scalable storage
  • Large community
  • S3-compatible API
  • Cloud Native

Disadvantages

  • Durability
  • Replication (Global)
  • Industry focused on S3
  • Ecosystem

Ingestion

Advantages

  • Proven at massive scale 
  • Ecosystem
  • Extensibility

Disadvantages

  • Cumbersome to operate
  • Overhead of learning a new DSL

How to store the data

Compressed
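
As a rough illustration of why compression matters for stored data, the sketch below compresses a batch of repetitive records and checks that they round-trip. The choice of gzip and JSON Lines is an assumption made for the example; the slide does not name a codec or file format, and the record fields are hypothetical.

```python
import gzip
import json

# Hypothetical, highly repetitive event records (typical of log-style data).
records = [{"event_id": i, "kind": "page_view", "country": "us"} for i in range(1000)]

# Serialize as JSON Lines, then compress the bytes with gzip.
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)

# Decompressing must give back exactly the original bytes.
restored = gzip.decompress(compressed)

# Fraction of the original size that remains after compression.
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.2f}")
```

The exact ratio depends on the data and the codec, so it is worth measuring on a sample of your own data before committing to a format.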

Querying

Advantages

  • Battle Tested
  • Scales separately from storage
  • Works with multiple file types
  • Vast ecosystem
  • Plugs into many tools for monitoring
  • Plugs into tools for access control

Disadvantages

  • Not Cloud Native
  • Many moving parts
  • High Total Cost of Ownership

Visualization

Metabase

Discovery

Sharing

Advantages

  • Self-service
  • Extensible
  • Deployment Model
  • Ecosystem

Disadvantages

  • Self-service
  • Extensible
  • Deployment Model
  • Ecosystem

Streaming

Use Cases

Summary

Wins

  • Open Source
  • Built on cloud-native technology
  • Can horizontally scale
  • Meets the needs of the product

Losses

  • Complicated
  • Maintenance overhead
  • Lots of configuration

The Future

  • More self-service tools like Metabase
  • More flexible file formats for storing data
  • Better standards for data lake architectures
  • Better interoperability between streams and data storage
  • Simple access control systems

Resources

Questions
