Open Source Business Intelligence
Jowanza Joseph
@jowanza
Agenda
- Introduce myself
- Data Engineering
- History of Business Intelligence
- Elements of modern BI
- Explore modern BI Architecture
- Open source elements of BI
About Me
Disclaimer
- Don't do this if it's not an element of your product
- Don't do this if you consider BI a trivial company expense
- Don't do this unless your life depends on it
Data Engineering
- Ingesting data from varied sources
- Extracting valuable bits from those sources
- Converting data from one format to another
- Enriching data
- Storing data
- Creating interfaces for other teams and engineers to use the data
- Using those interfaces to enhance the product*
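A minimal sketch of the steps listed above (ingest, extract, convert, enrich, store), using only the Python standard library. The sample data, the `USER_REGIONS` lookup, and all function names are hypothetical and only illustrate the shape of such a pipeline; the talk does not prescribe an implementation.

```python
import csv
import io
import json

# Hypothetical raw inputs from two different sources (illustrative only).
CSV_SOURCE = "user_id,amount\n1,9.50\n2,20.00\n"
JSON_SOURCE = '[{"user_id": 3, "amount": 5.25, "debug": "ignore me"}]'

# Hypothetical lookup table used for enrichment.
USER_REGIONS = {1: "us-west", 2: "eu-central", 3: "us-east"}


def ingest():
    """Ingest rows from varied sources (CSV and JSON here)."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def extract(row):
    """Keep only the valuable fields and normalize their types."""
    return {"user_id": int(row["user_id"]), "amount": float(row["amount"])}


def enrich(row):
    """Add context that the source did not carry."""
    return {**row, "region": USER_REGIONS.get(row["user_id"], "unknown")}


def store(rows):
    """Convert to one output format (JSON Lines) and return it as a string."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)


def run_pipeline():
    """Chain the steps: ingest -> extract -> enrich -> store."""
    return store([enrich(extract(r)) for r in ingest()])


if __name__ == "__main__":
    print(run_pipeline())
```

In a real system each step would typically be a separate, independently scalable component; here they are plain functions so the flow is easy to follow.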
Business Intelligence
Systems Side
BI Market
- $16B Market
- Growing to $22B by 2022*
- Big Players & Upstarts
- Lots of money on the table
Vendors
Traditional BI
- Tightly coupled architecture
- More costly to scale up
- Non-extensible
- Vendor 🔒
- Limited product use*
Hadoop
Advantages
- Commodity Hardware
- Scales out to thousands of nodes
- Fast (enough)
- Extensible
Downsides
- Transactions
- Complexity
- Failover
- Scalability Model
Let's Build Something New
Wish List
- All (mostly) open source
- Scale storage and compute separately
- Allow for some elements of self-service
- Provide a SQL Interface for the data
- Allow for data discovery
- Allow for querying of real-time data*
Primitives
Data Storage
Advantages
- Has primitives for scalable storage
- Large community
- S3 Compliant API
- Cloud Native
Disadvantages
- Durability
- Replication (Global)
- Industry focused on S3
- Ecosystem
Ingestion
Advantages
- Proven at massive scale
- Ecosystem
- Extensibility
Disadvantages
- Cumbersome to operate
- Overhead of learning a new DSL
How to store the data
Compressed
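A small sketch of what storing data compressed can look like. The slide does not name a file format or codec, so gzip-compressed JSON Lines is an assumption chosen because it needs only the standard library; the records and function names are hypothetical.

```python
import gzip
import json

# Hypothetical event records; repetitive data like this compresses well.
records = [{"event": "page_view", "user_id": i % 10} for i in range(1000)]


def to_compressed_jsonl(rows):
    """Serialize rows as JSON Lines and return (raw bytes, gzip bytes)."""
    raw = "\n".join(json.dumps(r) for r in rows).encode("utf-8")
    return raw, gzip.compress(raw)


def from_compressed_jsonl(blob):
    """Decompress a gzip blob and parse it back into a list of rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


raw, packed = to_compressed_jsonl(records)

if __name__ == "__main__":
    print(f"raw bytes: {len(raw)}, compressed bytes: {len(packed)}")
```

Compression trades some CPU time on read and write for lower storage and transfer cost, which usually matters more when storage and compute scale separately.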
Querying
Advantages
- Battle Tested
- Scales separately from the storage
- Works with multiple file types
- Vast ecosystem
- Plugs into many tools for monitoring
- Plugs into tools for access control
Disadvantages
- Not Cloud Native
- Many moving parts
- High Total Cost of Ownership
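To illustrate the idea of one SQL interface over data that arrives in multiple file types, here is a toy sketch. It uses Python's built-in `sqlite3` purely as a stand-in for a query engine; it is not the engine discussed in the talk, and the sample files, table name, and function names are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical "files" in two formats, inlined so the example is self-contained.
CSV_FILE = "region,amount\nus-west,10\neu-central,5\n"
JSONL_FILE = '{"region": "us-west", "amount": 7}\n{"region": "us-east", "amount": 3}\n'


def read_rows():
    """Yield (region, amount) tuples from both the CSV and the JSON Lines data."""
    for row in csv.DictReader(io.StringIO(CSV_FILE)):
        yield row["region"], float(row["amount"])
    for line in JSONL_FILE.splitlines():
        rec = json.loads(line)
        yield rec["region"], float(rec["amount"])


def total_by_region():
    """Load all rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", read_rows())
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in total_by_region():
        print(region, total)
```

A distributed engine would read the files in place rather than copying them into a table, but the user-facing contract is the same: analysts write SQL and do not need to know how the data is stored.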
Visualization
Metabase
Discovery
Sharing
Advantages
- Self-service
- Extensible
- Deployment Model
- Ecosystem
Disadvantages
- Self-service
- Extensible
- Deployment Model
- Ecosystem
Streaming
Use Cases
Summary
Wins
- Open Source
- Built on cloud-native technology
- Can horizontally scale
- Meets the needs of the product
L's
- Complicated
- Maintenance overhead
- Lots of configuration
The Future
- More self-service tools like Metabase
- More flexible file formats for storing data
- Better standards for data lake architectures
- Better stream/data storage interoperability
- Simple access control systems
Resources
Questions