Reporting, Dashboards and Analytics for Big Data and NoSQL


Ernesto Ongaro - Dublin
November 2013

Agenda

  • What are Big Data and NoSQL about?
  • Explore 3 ways to get at your NoSQL data
    • Indirect batch analysis
    • Direct batch analysis
    • Interactive exploration
  • Example + Demonstration
  • Q&A

Relational Databases

  • A database is a set of tables, and each table has fields
  • Data is stored in rows and columns, and every field has a specific type: a fixed schema
  • Queried through SQL, which can restrict (filter rows), project (pick columns) and join tables
  • Use an RDBMS unless you run into its limitations
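
A minimal sketch of the three SQL operations named above, using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their contents are hypothetical, chosen only to show restrict (WHERE), project (the SELECT list) and join (JOIN ... ON) in a single query.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'IE'), (2, 'Bob', 'US');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Restrict: WHERE keeps only Irish customers.
# Project: SELECT keeps only the name and amount columns.
# Join: ON links each order to its customer row.
rows = conn.execute("""
    SELECT c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.country = 'IE'
    ORDER BY o.id
""").fetchall()

print(rows)  # [('Ann', 25.0), ('Ann', 40.0)]
```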

Buzzwords

NoSQL and Big Data are often treated as synonyms. Not true!

NoSQL: a database that is not relational. Data is stored more flexibly than in rows and columns, and is queried through means other than just SQL.

Big Data: data too large to fit comfortably into an RDBMS

NoSQL Databases

They store data differently: as documents, key-value pairs or graphs

Source: Rackspace

Document Database

  • Instead of rows/columns you have documents
  • Flexible: documents in the same collection need not share the same fields
  • Excellent example of NoSQL
  • Examples: MongoDB, CouchDB
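
To illustrate that flexibility, here are two hypothetical documents held as Python dicts standing in for a collection (this is not a MongoDB or CouchDB API call). They share some fields, and one nests a list of sub-documents, yet nothing forces them into a common schema.

```python
# Two hypothetical documents in the same "collection".
customers = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com"},
    {"_id": 2, "name": "Bob",
     "addresses": [{"city": "Dublin"}, {"city": "Cork"}]},
]

# No schema is enforced: each document carries its own field names,
# so the set of fields is only known by looking at every document.
all_fields = sorted({key for doc in customers for key in doc})
print(all_fields)  # ['_id', 'addresses', 'email', 'name']
```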

Graph Stores

Image: Wikipedia Graph Database article

  • Data is stored as nodes, properties and edges
  • Node: an item, in this case a person
  • Properties: describe a node
  • Edges: Relationships between nodes
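
A small in-memory sketch of the node/property/edge model, using plain Python dicts and tuples rather than any graph database's API. The people and the "knows" relationship are hypothetical; the traversal is an ordinary breadth-first search over labeled edges, the kind of question graph stores are built to answer.

```python
from collections import deque

# Nodes (people) with their properties.
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 35},
}
# Edges: (source node, relationship label, target node).
edges = [("alice", "knows", "bob"), ("bob", "knows", "carol")]

def reachable(start, relation):
    """Breadth-first traversal following only edges with the given label."""
    adjacency = {}
    for src, label, dst in edges:
        if label == relation:
            adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(sorted(reachable("alice", "knows")))  # ['bob', 'carol']
```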

Key-Value Stores

Key: 1 | Sex: Female | Score: 199 | Age: 27
Key: 2 | Fruit: Apple | Score: 222 | Age: 19 | Active: false

  • Data is stored in "bins": each bin has a name and a value, and a key maps to a set of bins
  • Good for very fast lookups by key, but not ideal for BI
  • Examples: Riak, Redis
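
The two records above can be modeled as a Python dict from key to bins. This is a stand-in, not a Riak or Redis client: it only shows why a single-key lookup is trivial while a BI-style aggregate means visiting every key and coping with bins that may be missing.

```python
# Each key maps to a set of named bins (bin name -> value).
# A plain dict stands in for a real key-value store.
store = {
    1: {"Sex": "Female", "Score": 199, "Age": 27},
    2: {"Fruit": "Apple", "Score": 222, "Age": 19, "Active": False},
}

# Fast path: fetch one record directly by its key.
record = store[2]

# BI-style question ("average score"): there is no query language here,
# so every key is visited and records lacking the bin are skipped by hand.
scores = [bins["Score"] for bins in store.values() if "Score" in bins]
average_score = sum(scores) / len(scores)

print(record["Fruit"], average_score)  # Apple 210.5
```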

Column Stores

  • Use tables but have no joins; wide rows are cheap and blank values cost little
  • Don't confuse them with column-oriented databases like Vertica or Infobright
  • Tend to be fast for reads
  • Examples: Hadoop HBase, Cassandra
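
A sketch of the wide-row idea, assuming a simple row key to {column: value} mapping in Python rather than the full HBase or Cassandra data model. Row keys and column names are hypothetical; only cells that hold a value are stored, so a blank is simply an absent cell.

```python
# Row key -> {column name -> value}; only non-blank cells are stored.
table = {
    "user:1": {"name": "Ann", "city": "Dublin"},
    "user:2": {"name": "Bob", "last_login": "2013-11-01", "plan": "pro"},
}

# Reading one row by key is a direct lookup.
print(table["user:2"]["plan"])      # pro

# A "blank" value is just a missing cell.
print(table["user:1"].get("plan"))  # None

stored_cells = sum(len(columns) for columns in table.values())
distinct_columns = {c for columns in table.values() for c in columns}
print(stored_cells, len(distinct_columns))  # 5 4
```

Here 2 rows and 4 distinct columns would fill 8 cells in a fixed-width table, but only 5 are stored.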

Why NoSQL?

  • Scaling
  • Simpler data model
  • Volume (streaming I/O)
  • No fixed schema

Reporting and Analytics?

  • Reporting is typically about filtering rows and columns and arranging them the way you want the data displayed

  • Analytics is typically about aggregating the data in those rows and visualizing it in a crosstab or chart

This is true for both NoSQL and SQL data.

Exceptions: graph, tree and other specialized visualizations.
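
A sketch of the difference on a handful of hypothetical, already-flat sales rows, in plain Python: the report filters rows and picks columns, while the analysis aggregates amounts into a region-by-product crosstab.

```python
from collections import defaultdict

# Hypothetical flat rows (illustrative data only).
rows = [
    {"region": "EMEA", "product": "A", "amount": 100},
    {"region": "EMEA", "product": "B", "amount": 50},
    {"region": "APAC", "product": "A", "amount": 70},
    {"region": "EMEA", "product": "A", "amount": 30},
]

# Reporting: filter rows (EMEA only) and choose the columns to display.
report = [(r["product"], r["amount"]) for r in rows if r["region"] == "EMEA"]

# Analytics: aggregate into a crosstab (region x product -> total amount).
crosstab = defaultdict(lambda: defaultdict(int))
for r in rows:
    crosstab[r["region"]][r["product"]] += r["amount"]
totals = {region: dict(cells) for region, cells in crosstab.items()}

print(report)  # [('A', 100), ('B', 50), ('A', 30)]
print(totals)  # {'EMEA': {'A': 130, 'B': 50}, 'APAC': {'A': 70}}
```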

So...

The work of reporting and analytics on NoSQL is "flattening" the data so that it fits into rows and columns
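
What "flattening" can look like in practice: a minimal Python sketch, assuming documents are nested dicts and lists. The flatten function and its policy (dotted column names for nested objects, one output row per element of a list of objects, repeating the parent fields) are one hypothetical choice, similar in spirit to an "unwind"; other policies are possible.

```python
def flatten(doc, prefix=""):
    """Turn one nested document into a list of flat rows (hypothetical policy).

    Nested objects become dotted column names; each element of a list of
    objects produces its own row, with the parent fields repeated.
    """
    rows = [{}]
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            children = flatten(value, name + ".")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            children = [row for item in value for row in flatten(item, name + ".")]
        else:
            children = [{name: value}]  # scalars (and other lists) stay as values
        # Combine every row built so far with every child row.
        rows = [{**row, **child} for row in rows for child in children]
    return rows

order = {
    "_id": 7,
    "customer": {"name": "Ann", "city": "Dublin"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
for row in flatten(order):
    print(row)
```

The single order document becomes two rows, one per item, which a crosstab or chart can then consume.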

3 Ways to get at your NoSQL data

 

Indirect Batch Analysis

  • Benefits:
    • Use your BI tool of choice
    • ETL lets you "clean" the data
  • Downfalls:
    • Latency, and maintenance of the ETL process
    • Makes a copy of the data
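
A minimal extract-transform-load sketch in Python, assuming the raw documents have already been read out of the NoSQL store (here a hard-coded list) and that an in-memory SQLite table stands in for the reporting database. The field names, the clean helper and its mapping of sex values are hypothetical; the point is that the transform step is where inconsistent or missing values get fixed before a BI tool sees them.

```python
import sqlite3

# Extract: raw documents as they might come out of a NoSQL store
# (hypothetical data with inconsistent and missing values).
raw = [
    {"_id": 1, "sex": "Female", "score": "199"},
    {"_id": 2, "sex": "F", "score": 222},
    {"_id": 3, "score": 150},
]

# Transform: "clean" each document into a fixed set of typed columns.
SEX_CODES = {"female": "F", "f": "F", "male": "M", "m": "M"}

def clean(doc):
    sex = SEX_CODES.get(str(doc.get("sex", "")).lower(), "unknown")
    return (doc["_id"], sex, int(doc["score"]))

# Load: copy the cleaned rows into a relational table any BI tool can query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, sex TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [clean(d) for d in raw])

summary = conn.execute(
    "SELECT sex, COUNT(*), AVG(score) FROM scores GROUP BY sex ORDER BY sex"
).fetchall()
print(summary)  # [('F', 2, 210.5), ('unknown', 1, 150.0)]
```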

Interactive Data Exploration


  • Benefits:
    • No latency
    • No development of ETL
  • Downfalls:
    • Data-quality issues
    • No metadata layer; queries are still written by developers
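
A sketch of exploring raw data directly, with no ETL step in between: hypothetical documents counted by one field in plain Python. Inconsistent spellings and a missing field come through as separate groups, which is the data-quality downfall listed above.

```python
from collections import Counter

# Hypothetical raw documents queried "live", without any cleaning step.
raw = [
    {"_id": 1, "sex": "Female"},
    {"_id": 2, "sex": "F"},
    {"_id": 3},
]

# Ad hoc "count by sex": "Female" and "F" land in different groups,
# and the document without the field needs a placeholder label.
counts = Counter(doc.get("sex", "(missing)") for doc in raw)
print(dict(counts))
```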

Direct Batch Reporting


  • Benefits:
    • Leverages the native query language
    • Low or no latency
  • Downfalls:
    • No data-quality filter
    • Queries on NoSQL can be hard (no joins, etc.)
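
When the store offers no join operator, a report that needs data from two collections often has to join them in application code. A minimal hash-join sketch in Python, with hypothetical customers and orders collections held as lists of dicts:

```python
# Two hypothetical collections.
customers = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]
orders = [
    {"_id": 10, "customer_id": 1, "amount": 25.0},
    {"_id": 11, "customer_id": 2, "amount": 15.0},
    {"_id": 12, "customer_id": 3, "amount": 5.0},  # refers to no customer
]

# Hash join: index one side by its key, then probe it for each order.
by_id = {c["_id"]: c for c in customers}
report = [
    (o["_id"], by_id[o["customer_id"]]["name"], o["amount"])
    for o in orders
    if o["customer_id"] in by_id  # inner-join semantics: drop unmatched orders
]
print(report)  # [(10, 'Ann', 25.0), (11, 'Bob', 15.0)]
```

Building the index once keeps the join to a single pass over each collection, but the whole customer side must fit in application memory.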

Jaspersoft:

Does all three!

Direct batch and live exploration connectors for:
MongoDB, Cassandra, Hadoop HBase, Hadoop Hive

ETL components (from Talend):

  • Demo: MongoDB ETL
  • Demo: MongoDB reporting
  • Demo: MongoDB data exploration

Questions?

Thank you!
@not_a_poet