Demystifying NoSQL + Big Data
October 2014
Agenda
- Part 1: Relational and SQL
- Part 2: NoSQL and Big Data
- Part 3: Value in NoSQL + Big Data
- Part 4: What you can do today
Part 1: Relational and SQL
1970: Elvis, peace, love, Nixon, and the birth of the relational database
A Relational Model of Data for Large Shared Data Banks
- Defined the need for relations
- Defined the need for a query language
Full Name | Birth Date | Department |
---|---|---|
John Smith | Jan 1, 1970 | Marketing |
Jonas McKnight | Jun 7, 1965 | Sales |
Michael Jones | Dec 19, 1980 | Marketing |
Relational Databases
SEQUEL: A Structured English Query Language
SELECT * FROM employees WHERE Department = 'Marketing'
Full Name | Birth Date | Department |
---|---|---|
John Smith | Jan 1, 1970 | Marketing |
Michael Jones | Dec 19, 1980 | Marketing |
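The query above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original slides: the column names (full_name, birth_date, department), the ISO date format, and the helper employees_in are assumptions made for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed column names and types; the slide only shows display headers.
conn.execute(
    "CREATE TABLE employees (full_name TEXT, birth_date TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John Smith", "1970-01-01", "Marketing"),
        ("Jonas McKnight", "1965-06-07", "Sales"),
        ("Michael Jones", "1980-12-19", "Marketing"),
    ],
)


def employees_in(department):
    """Return all rows for one department, sorted by name."""
    cursor = conn.execute(
        "SELECT full_name, birth_date, department FROM employees "
        "WHERE department = ? ORDER BY full_name",
        (department,),  # bound parameter instead of string concatenation
    )
    return cursor.fetchall()


for row in employees_in("Marketing"):
    print(row)
```

The string literal 'Marketing' has to be quoted (or passed as a bound parameter, as here); unquoted, most databases would read it as a column name.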
Relational Databases + SQL...
a long and happy marriage
Relational Databases
- Still very relevant today to store operational data
- Use an RDBMS if you need all of these:
  - Complete query language
  - Transactions
  - Predefined schemas
  - Consistency between replicas
- Common limitations:
  - Scale
  - Replication
  - Unstructured data storage
  - Speed
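To make "transactions" and "predefined schemas" from the list above concrete, here is a small sketch using Python's sqlite3 module. The accounts table, the names alice and bob, and the transfer and balances helpers are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Predefined schema: the database itself rejects negative balances.
# (Hypothetical table, for illustration only.)
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account leaves both balances untouched, even though the credit to the destination had already been executed inside the transaction.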
Part 2: NoSQL and Big Data
Buzz words
NoSQL: A database that is not relational. Data is stored more flexibly than in columns and rows, and is queried in other ways than just SQL.
Big Data: Data that is probably bigger than what can comfortably fit into a conventional system
NoSQL - different flavors
- Document databases pair each key with a complex data structure known as a document.
- Graph stores are used to store information about networks, such as social connections.
- Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value.
- Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
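The four flavors above can be approximated with plain Python data structures. This is only a rough mental model, not how any real product stores data; all keys, names, and helper functions below are made up for illustration.

```python
# Key-value store: one opaque value per key.
key_value = {
    "session:42": "john.smith",
    "theme:john.smith": "dark",
}

# Document database: each key maps to a nested, self-describing document.
documents = {
    "emp:1": {
        "name": "John Smith",
        "department": "Marketing",
        "skills": ["seo", "email"],
    },
}

# Graph store: nodes plus the connections between them.
follows = {
    "john": {"michael"},
    "michael": {"jonas"},
    "jonas": set(),
}

# Wide-column store: values of one column are kept together,
# so scanning a single column does not touch the others.
columns = {
    "full_name": ["John Smith", "Jonas McKnight", "Michael Jones"],
    "department": ["Marketing", "Sales", "Marketing"],
}


def two_hops(graph, start):
    """People followed by the people that `start` follows."""
    direct = graph.get(start, set())
    second = set()
    for person in direct:
        second |= graph.get(person, set())
    return second - direct - {start}


def count_in_column(store, column, value):
    """Aggregate over a single column without reading whole rows."""
    return store[column].count(value)
```

The graph helper answers a "friends of friends" question, and the column helper shows why column-oriented layouts suit aggregate queries over large datasets.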
Example: Document Database
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
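A document like this can be read and queried without defining any schema up front. The sketch below uses Python's standard json module; the variable names and the phone_numbers helper are assumptions for illustration, not part of any particular document database's API.

```python
import json

# The same document as on the slide, as a JSON string.
raw = """
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
"""

person = json.loads(raw)

# Nested fields are reached by walking the structure.
city = person["address"]["city"]


def phone_numbers(doc, kind):
    """Return all numbers of the given type from a person document."""
    return [p["number"] for p in doc.get("phoneNumbers", []) if p["type"] == kind]


print(city)
print(phone_numbers(person, "fax"))
```

Note that postalCode is stored as a number here, as on the slide. A postal code with a leading zero would lose it in that form, so a string is often the safer choice.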
Why NoSQL?
- Scaling
- Simpler data model
- Volume (stream i/o)
- No schema
Why now?
- Price of storage ▼
- Speed of storage ▲
- Amount of data ▲
- Expectations from end users ▲
- Internet of Things (digitization)
- Availability of analytics tools
Part 3: Value in NoSQL and Big Data
Exploitative vs Explorative
Online vs Offline Processing
Opportunity Matrix
 | Explorative | Exploitative |
---|---|---|
Online Data | | |
Offline Data | | |
© Ernesto Ongaro
Super Bowl Power Outage
Power out? No problem. pic.twitter.com/dnQ7pOgC
— Oreo Cookie (@Oreo) February 4, 2013
Exploitative Online
 | Explorative | Exploitative |
---|---|---|
Online Data | | |
Offline Data | | |
Explorative Offline
 | Explorative | Exploitative |
---|---|---|
Online Data | | |
Offline Data | | |
Netflix Case Study:
Netflix developed House of Cards by analyzing millions of users’ preferences for actors, themes, and delivery methods to create what it calls its most successful series ever.
Exploitative Offline
 | Explorative | Exploitative |
---|---|---|
Online Data | | |
Offline Data | | |
Amazon Recommendations
New Patterns of Innovation
Reporting and Analytics?
- Reporting is typically about filtering columns and rows and arranging them how you want the data to be displayed
- Analytics is typically about aggregating the data in those rows and visualizing it in a crosstab or chart

This is true for both NoSQL and SQL data.
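The difference between the two can be sketched in a few lines of Python. The employee records reuse the table from Part 1; the field names and the report and headcount_by_department helpers are illustrative assumptions, not a prescribed method.

```python
from collections import Counter

# Rows could equally come from a SQL table or a document store.
employees = [
    {"full_name": "John Smith", "birth_date": "1970-01-01", "department": "Marketing"},
    {"full_name": "Jonas McKnight", "birth_date": "1965-06-07", "department": "Sales"},
    {"full_name": "Michael Jones", "birth_date": "1980-12-19", "department": "Marketing"},
]


def report(rows, department, columns):
    """Reporting: filter rows, then keep only the requested columns."""
    return [
        {column: row[column] for column in columns}
        for row in rows
        if row["department"] == department
    ]


def headcount_by_department(rows):
    """Analytics: aggregate rows into one number per group."""
    return dict(Counter(row["department"] for row in rows))


print(report(employees, "Marketing", ["full_name"]))
print(headcount_by_department(employees))
```

The report keeps individual records visible, while the aggregate collapses them into counts that could feed a crosstab or chart.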
So....
Part 4: What you can do today
Take Inventory
What is your strategy?
Be prepared to fail,
Questions?
ba4all
By ernestoo