Open Source Business Intelligence
Jowanza Joseph
@jowanza
Agenda
Introduce myself
Data Engineering
History of Business Intelligence
Elements of modern BI
Explore modern BI Architecture
Open source elements of BI
About Me
Disclaimer
Don't do this if it's not an element of your product
Don't do this if you consider BI a trivial company expense
Don't do this unless your life depends on it
Data Engineering
Ingesting data from varied sources
Extracting valuable bits from those sources
Converting data from one format to another
Enriching data
Storing data
Creating interfaces for other teams and engineers to use the data
Using those interfaces to enhance the product*
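The data engineering steps listed above can be sketched as a tiny pipeline. This is a minimal illustration using only the Python standard library, not the tooling from the talk. The field names (`user`, `amount`, `region`), the `regions` lookup table, and every function name are hypothetical.

```python
import csv
import io
import json


def ingest(csv_text, jsonl_text):
    """Ingest records from two differently formatted sources."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows += [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return rows


def extract(record):
    """Keep only the fields downstream consumers care about."""
    return {"user": record["user"], "amount": record["amount"]}


def convert(record):
    """Normalize types: CSV yields strings, JSON yields numbers."""
    return {"user": str(record["user"]), "amount": float(record["amount"])}


def enrich(record, regions):
    """Attach a region from a (hypothetical) reference table."""
    return {**record, "region": regions.get(record["user"], "unknown")}


def store(records):
    """Serialize to JSON Lines; a real pipeline would write to durable storage."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def run_pipeline(csv_text, jsonl_text, regions):
    """Chain the steps: ingest, extract, convert, enrich, store."""
    return store(
        enrich(convert(extract(r)), regions) for r in ingest(csv_text, jsonl_text)
    )
```

Calling `run_pipeline` with one CSV source and one JSON Lines source yields a single normalized JSON Lines string, which is the kind of interface other teams could then read from.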
Business Intelligence
Systems Side
BI Market
$16B Market
Growing to $22B by 2022*
Big Players & Upstarts
Lots of money on the table
Vendors
Traditional BI
Tightly coupled architecture
More costly to scale up
Non-extensible
Vendor 🔒
Limited product use*
Hadoop
Advantages
Commodity Hardware
Scales out to thousands of nodes
Fast (enough)
Extensible
Downsides
Transactions
Complexity
Failover
Scalability Model
Let's Build Something New
Wish List
All (mostly) open source
Scale storage and compute separately
Allow for some elements of self-service
Provide a SQL Interface for the data
Allow for data discovery
Allow for querying of real-time data*
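To make the "SQL interface for the data" wish concrete, here is a rough sketch that loads JSON Lines records into an in-memory SQLite database and runs a query over them. SQLite is only a stand-in chosen because it ships with Python; it is not the query engine from the talk and it does not scale storage and compute separately. The `events` table, its columns, and the `query_jsonl` function are hypothetical.

```python
import json
import sqlite3


def query_jsonl(jsonl_text, sql):
    """Load JSON Lines records into an in-memory table and run a SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user TEXT, amount REAL)")
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Named placeholders pull the matching keys out of each record dict.
    conn.executemany("INSERT INTO events VALUES (:user, :amount)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

The point of the sketch is the shape of the interface: analysts write SQL, and the details of how the files are laid out stay hidden behind the query layer.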
Primitives
Data Storage
Advantages
Has primitives for scalable storage
Large community
S3 Compliant API
Cloud Native
Disadvantages
Durability
Replication (Global)
Industry focused on S3
Ecosystem
Ingestion
Advantages
Proven at massive scale
Ecosystem
Extensibility
Disadvantages
Cumbersome to operate
Overhead of learning a new DSL
How to store the data
Compressed
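One way to read "compressed" is gzip-compressed JSON Lines. This is an assumption, since the slide names no file format or codec. The sketch below round-trips a batch of records through gzip using the standard library; `pack` and `unpack` are hypothetical helper names.

```python
import gzip
import json


def pack(records):
    """Serialize records as JSON Lines and gzip-compress the result."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)


def unpack(blob):
    """Decompress a gzip blob and parse it back into records."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Repetitive event data tends to compress well, so the packed blob is usually much smaller than the raw text, at the cost of having to decompress before querying.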
Querying
Advantages
Battle Tested
Scales separately from storage
Works with multiple file types
Vast ecosystem
Plugs into many tools for monitoring
Plugs into tools for access control
Disadvantages
Not Cloud Native
Many moving parts
High Total Cost of Ownership
Visualization
Metabase
Discovery
Sharing
Advantages
Self-service
Extensible
Deployment Model
Ecosystem
Disadvantages
Self-service
Extensible
Deployment Model
Ecosystem
Streaming
Use Cases
Summary
Wins
Open Source
Built on cloud-native technology
Can horizontally scale
Meets the needs of the product
L's
Complicated
Maintenance overhead
Lots of configuration
The Future
More self-service tools like Metabase
More flexible file formats for storing data
Better standards for data lake architectures
Better stream/data storage interoperability
Simple access control systems
Resources
Questions