Demystifying NoSQL + Big Data





October 2014

BUSINESS ANALYTICS FOR ALL 

Ernesto Ongaro

Agenda




  • Part 1: Relational and SQL
  • Part 2: NoSQL and Big Data
  • Part 3: Value in NoSQL and Big Data
  • Part 4: What you can do today




Part 1: Relational and SQL


1970: Elvis, peace, love, Nixon, and the birth of the relational database

A Relational Model of Data for Large Shared Data Banks

(Edgar F. Codd, IBM, 1970)

  • Defined need for relations
  • Defined need for query language

Full Name        Birth Date     Department
John Smith       Jan 1, 1970    Marketing
Jonas McKnight   Jun 7, 1965    Sales
Michael Jones    Dec 19, 1980   Marketing

Relational Databases

SEQUEL: A Structured English Query Language

(Donald D. Chamberlin; Raymond F. Boyce, IBM, 1974)


SELECT * FROM employees WHERE Department = 'Marketing'

Full Name        Birth Date     Department
John Smith       Jan 1, 1970    Marketing
Jonas McKnight   Jun 7, 1965    Sales
Michael Jones    Dec 19, 1980   Marketing
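
As a minimal sketch, the query above can be run with Python's built-in sqlite3 module against an in-memory table. The column names (full_name, birth_date, department) and the ISO date strings are assumptions made for this illustration; the slide only shows display headers.

```python
import sqlite3

# In-memory database; table layout and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (full_name TEXT, birth_date TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John Smith", "1970-01-01", "Marketing"),
        ("Jonas McKnight", "1965-06-07", "Sales"),
        ("Michael Jones", "1980-12-19", "Marketing"),
    ],
)

# The department is passed as a bound parameter, so no manual quoting is needed.
rows = conn.execute(
    "SELECT * FROM employees WHERE department = ? ORDER BY full_name",
    ("Marketing",),
).fetchall()

for row in rows:
    print(row)
```

Only the two Marketing rows come back; the Sales row is filtered out by the WHERE clause.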

Relational Databases + SQL...

a long and happy marriage

Database Vendor Shares 2012 (source: IDC, Annual Worldwide RDBMS Vendor Shares): Oracle 45%, Microsoft 20%, IBM 18%, Others 17%

Relational Databases

  • Still very relevant today to store operational data
  • Use an RDBMS if you need all of these:
    • Complete query language
    • Transactions
    • Predefined schemas
    • Consistency between replicas
  • Common limitations:
    • Scale
    • Replication
    • Unstructured data storage
    • Speed
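
To make the "Transactions" bullet concrete, here is a minimal sketch using Python's built-in sqlite3 module: two updates either both take effect or neither does. The accounts table and the transfer and balances helpers are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it rolls back
print(ok, failed, balances(conn))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards, which is the all-or-nothing behavior a transaction provides.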




Part 2: NoSQL and Big Data

Buzz words






NoSQL + Big Data are not synonymous. 









NoSQL: A database that is not relational. Data is stored more flexibly than in columns and rows, and is queried by means other than just SQL.





Big Data: Data that is probably bigger than what can comfortably fit into a conventional system




NoSQL - different flavors

  • Document databases pair each key with a complex data structure known as a document.

  • Graph stores are used to store information about networks, such as social connections. 

  • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. 

  • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

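
The four flavors differ mainly in the shape of the data they hold. As a rough sketch, plain Python structures can mimic each shape as described in the bullets above. The sample records are invented for illustration, and no real database client is involved.

```python
# Key-value store: a value looked up by a single key.
key_value = {"user:42": "John Smith"}

# Document database: each key maps to a nested, self-describing document.
documents = {
    "user:42": {
        "firstName": "John",
        "lastName": "Smith",
        "phoneNumbers": ["212 555-1234"],
    }
}

# Wide-column store: values are grouped by column, so one column can be
# scanned without reading whole rows.
wide_column = {
    "age": {"user:42": 25, "user:43": 31},
    "city": {"user:42": "New York", "user:43": "Boston"},
}

# Graph store: nodes plus edges that describe relationships between them.
graph = {
    "user:42": {"follows": ["user:43"]},
    "user:43": {"follows": []},
}

# A column-oriented question: average age across all users.
average_age = sum(wide_column["age"].values()) / len(wide_column["age"])

# A graph-oriented question: who follows user:43?
followers_of_43 = [
    node for node, edges in graph.items() if "user:43" in edges["follows"]
]
```

Each structure makes one kind of question cheap to answer, which is why the choice of flavor usually follows from the queries you expect to run.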

Example: Document Database

 {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
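
As a small sketch of how an application might read such a document, the snippet below parses a trimmed copy of it with Python's json module and walks the nested fields directly. No specific document database API is assumed, and the email field is a made-up example of a key that happens to be absent.

```python
import json

raw = """
{
    "firstName": "John",
    "lastName": "Smith",
    "address": {"city": "New York", "state": "NY"},
    "phoneNumbers": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"}
    ]
}
"""

person = json.loads(raw)

# Nested objects and arrays are reached by key and index, with no joins.
city = person["address"]["city"]
home_numbers = [p["number"] for p in person["phoneNumbers"] if p["type"] == "home"]

# Fields are optional: a missing key is handled in code rather than by a schema.
email = person.get("email", "unknown")

print(city, home_numbers, email)
```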

Instead of rows/columns you have documents
  • Flexible
  • Excellent example of NoSQL
  • Examples: MongoDB, CouchDB

Why NoSQL?




    • Scaling
    • Simpler data model
    • Volume (stream i/o)
    • No schema



    Why now?



    • Price of storage ▼
    • Speed of storage ▲
    • Amount of data ▲
    • Expectations from end users ▲
    • Internet of Things (digitization)
    • Availability of analytics tools




    Part 3: Value in NoSQL and Big Data

    Exploitative vs Explorative




    Exploitative innovation builds on top of existing knowledge for existing products, markets and customers.
    Explorative innovations are radical in their nature and aim to depart from the established way of thinking about a product, market or process.



    (Jansen, Van Den Bosch, & Volberda, 2006, p. 1662)

    Online vs Offline Processing





    Online: ingest, store, manage and sometimes analyze data in real time

    Offline: batch analytical jobs



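
The difference between the two modes can be sketched in a few lines of Python: the online version updates its answer as each event arrives, while the offline version runs one job over everything that was stored. The RunningMean class, the batch_mean function, and the sample readings are hypothetical and stand in for a real streaming or batch system.

```python
from statistics import mean


class RunningMean:
    """Online: update the result per event, without storing the events."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        # Incremental mean: shift the current value toward the new observation.
        self.value += (x - self.value) / self.count
        return self.value


def batch_mean(stored_events):
    """Offline: compute over the full stored data set in one job."""
    return mean(stored_events)


events = [10, 20, 30, 40]  # hypothetical sensor readings

online = RunningMean()
running = [online.update(x) for x in events]  # an answer after every event
offline = batch_mean(events)                  # one answer at the end

print(running, offline)
```

Both approaches agree once all events are in; the online one simply has a usable answer earlier, at the cost of only supporting computations that can be updated incrementally.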

    Opportunity Matrix

                    Explorative    Exploitative
    Online Data
    Offline Data

    © Ernesto Ongaro

    Super Bowl Power Outage

    Exploitative Online


    "Our analysis powers a daily Klout Score on a scale from 1-100 that shows how much influence social media users have and on what topics. We are using [Apache] Storm to develop a realtime scoring and moments generation pipeline"

    Explorative Offline


    Netflix Case Study:



    Netflix developed House of Cards by analyzing millions of users’ preferences for actors, themes, and delivery methods, creating what it calls its most successful series ever.


    Exploitative Offline


    Amazon Recommendations




    New Patterns of Innovation

    Pattern 1: Augmenting Products to Generate Data
    Pattern 2: Digitizing Assets
    Pattern 3: Combining Data Within and Across Industries
    Pattern 4: Trading Data
    Pattern 5: Codifying a Distinctive Service Capability



    Source: Rashik Parmar, Ian Mackenzie, David Cohn, and David Gann

    Reporting and Analytics?



    • Reporting is typically about filtering columns and rows and arranging them how you want the data to be displayed

    • Analytics is typically about aggregating the data in those rows and visualizing it in a crosstab or chart





    This is true for both NoSQL and SQL data.

    Exceptions: graph and tree visualizations and other specialized visualizations.
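
A minimal sketch of the distinction, reusing the employee rows from Part 1: reporting filters and arranges rows, analytics aggregates across them. The field names are assumptions chosen for the example.

```python
from collections import Counter

employees = [
    {"name": "John Smith", "department": "Marketing"},
    {"name": "Jonas McKnight", "department": "Sales"},
    {"name": "Michael Jones", "department": "Marketing"},
]

# Reporting: filter rows, pick a column, arrange it for display.
report = sorted(e["name"] for e in employees if e["department"] == "Marketing")

# Analytics: aggregate across rows, here a head count per department.
headcount = Counter(e["department"] for e in employees)

print(report)
print(dict(headcount))
```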

    So...





    Most of the labor in reporting and analytics on NoSQL data is "flattening" it so that it fits into rows and columns.
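
As a rough sketch of what flattening means in practice, the hypothetical flatten helper below turns one nested document into one flat row per phone number, using dotted column names. It handles only a single level of nesting and one list field, which is an assumption made to keep the example short.

```python
def flatten(document, list_field):
    """Return one flat row per element of document[list_field].

    Top-level scalar fields are repeated on every row; one level of nested
    dicts is expanded into dotted column names such as "address.city".
    """
    base = {}
    for key, value in document.items():
        if key == list_field:
            continue
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                base[f"{key}.{sub_key}"] = sub_value
        else:
            base[key] = value

    rows = []
    for item in document.get(list_field, []):
        row = dict(base)
        for sub_key, sub_value in item.items():
            row[f"{list_field}.{sub_key}"] = sub_value
        rows.append(row)

    # A document with no list elements still yields a single row.
    return rows or [base]


person = {
    "firstName": "John",
    "address": {"city": "New York"},
    "phoneNumbers": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}

rows = flatten(person, "phoneNumbers")
for row in rows:
    print(row)
```

The design choice to repeat the parent fields on every row mirrors what a SQL join would produce, so the result can be fed to tools that expect a table.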




    Part 4: What you can do today

    Take Inventory


    What data is on hand?
    What tools are available?
    What skills are available?



    What is your strategy?


    Remember: exploitative and explorative innovation using data is not accidental
    (source: Jaspersoft Big Data Survey, 2014)



    Be prepared to fail,

    but recover quickly and iterate.



    Questions?







    Next session: 4:20 PM
     Factors Preventing Successful Performance Management




    Thank you!
