Tech Talk #11

Domains vs OLAP Shootout

Sherman Wood, Ernesto Ongaro

Session Overview

Explore the difference between Domains and OLAP in these dimensions:

  • Introduction of Jaspersoft OLAP and Domains
  • Volume - Size of data
  • Complexity of queries + calculations
  • Velocity - How quickly do users expect data to be returned?
  • Velocity - How quickly does the data change underneath?
  • Variety - Where does the data come from?

OLAP vs Domains





An introduction

Domains

  •  A Domain is like "view" of the database in relational terms
  • Require a JDBC, SQL datasource
  • Are defined in the web interface (by an XML file)
  • Take DomEL as a query language
  • Generate SQL or use in-memory to calculate aggregates
  • Have an in-memory, per-query cache

Jaspersoft OLAP

  • OLAP employs a technique called Multidimensional Analysis
  • Uses the open-source engine called Mondrian
  • Require a JDBC, SQL datasource stored in a Star or Snowflake schema
  • Metadata model is designed in a separate tool called OLAP Workbench, saved as XML and published to the server
  • MDX is the query language, which issues SQL to the underlying datasource, this is what makes Mondrian a "Relational OLAP solution"
  • There's an in-memory cache and it can become aware of aggregate tables

Philosophies


  • OLAP requires a Star Schema which typically requires ETL and a proper DataWarehouse or DataMart being built
  • ETL can lead to cleaner data
  • Cleaner data can lead to better analysis

  • Domains don't require much. They let you get in trouble much faster because of this
  • Tempting to use Operational data
  • Flexibility is good for some cases

Thinking Dimensionally

(not dementia!)

  • OLAP is all about dimensions and measures:

Location is an example Dimension:
Earth -> Country -> State -> City

Population is an example measure

Slicing measures over dimensions is what we're talking about today. Analysis.

The Importance of Time


  • MDX has special functions for time like LastPeriod, QTD
  • Domains, you have to define them



Velocity -

 How Quick do End Users Expect Queries to Return?

Caching


Domain Cache: 
  • Operates at  query level - only an exact hit will trigger cache
  • Stored using ehcache

OLAP cache:
  • Aggregation Cache: Aggregates measures over dimensions and can be utilized for several types of queries needing cache
  • Stored using ehcache

Aggregate Tables


  • Jaspersoft OLAP - can aggregate pre-selected measures over a dimension. Say Sales over Time. Queries against Sales/Time will try to look up there first. Very powerful!

  • Domains - Can't do this (today)

Winner:



OLAP



Complexity of queries + calculations

Calculations

Domain's Calculated Fields:
  • Let you define calculated fields using DomEL or Groovy in app-context.xml
  • Can push down work to underlying database or do it in memory
  • Not "aware" of dimensionality

OLAP's Calculated Members :
  • Not trivial to add custom calculations (user defined functions) - requires a custom class
  • Pushes work to underlying database
  • Is aware of dimensions, for example "PrevMember"    

Shared Dimensions

In some cases you might have two fact tables with shared dimensions. For example:

Time and Store are shared between Sales and Inventory. Today only OLAP can handle this gracefully.

Multi-Pass SQL


OLAP will issue multiple SQL queries to generate answers and glue them together. Domains are limited to one query per question


The winner: OLAP

OLAP: 2
Domains: 0



Velocity - How quickly does the data change underneath?

Caching



If data changes slowly, then OLAP's multiple cache + aggregate table mechanisms are clear winners. If the data changes almost constantly then they're not advantages at all!

ETL

ETL jobs add overhead between how quickly the data changed and how quickly you can access it

Domains: tend to need less ETL
OLAP: tend to need an ETL process


Winner:
Domains!
OLAP: 2
Domains: 1



Variety - Where does the data come from?

Structure of Data

Domains:
  • Domains offer lots of flexibility here, data does not need to be in any particular form (careful!)
  • No need for ETL (careful!)

OLAP:
  • Data must be in a star or snowflake schema
  • ETL is mandatory to put into this structure

Sources of Data


Domains: Data must come from a SQL 92 compliant datasource

OLAP: Data must come from a SQL 92 compliant datasource (in a star or snowflake schema)

Topics:  Data can come from ANYWHERE. MongoDB, Google Big Query, Hadoop HBASE, Flat files. Without requiring any ETL





Though topics are not Domains, they are somewhat related. We'll give the point to Domains :)

Winner: Domains

OLAP: 2
Domains: 2




Volume - Size of data

Considerations

  • Consider that both Domains and OLAP can employ "push down" technology.
  • Consider that most analysis is done over aggregate data (datasets are large, result sets are small)
  • Consider that Topics can load lots of records into memory if needed (ehcahe pluggable)
  • OLAP will do better than domains for a given size of relational database, Aggregate tables rule.
     

Winner: OLAP. Push down is king, let the database do the work.
OLAP: 4
Domains: 3

Conclusion...



No two Business Intelligence projects are the same. There is no silver bullet technology. Both are good in their own ways.

Future...expect Domains to overtake OLAP!

Thanks for tuning in!


  • March 26: Linux Installation Tips + Tricks with Francois Cerbelle

tt_domains_vs_olap

By ernestoo

tt_domains_vs_olap

  • 2,434