Graph Databases
or
Crash Course to Data Management Paradigms
Felix Cornelius -[:FOR]-> ENG-INF-EBW F2 -[:AT]->
Outline
- Relational databases
- The Bookshop: A case study
- Enter The Graph
- Use cases
- Bottom line
Relational database
- Strictly structured
- Relational Model
where
{
{
Attribute
Tuple
Relation
ID | Name | Surname | From | Knows |
---|---|---|---|---|
0001 | John | Snow | Winterfell | NULL |
Primary Key
Schema
SQL
- De facto standard query language
- Relational algebra
- Create, Read, Update and Delete
Problems
1. 'Many-to-many' Relations
Author
Book
*
*
ID | Title | Authors |
---|---|---|
Books Table
primitive types!
ID | Title | Author1 | Author2 | ... |
---|---|---|---|---|
ID | Title |
---|---|
0001 | English G 2000 |
Junction Table
Book ID | Author ID |
---|---|
#0001 | #0001 |
#0001 | #0002 |
Authors Table
ID | Name |
---|---|
0001 | Hellmut Schwarz |
0002 | Barbara Derkow |
SQL 'Join' to map Primary and Foreign Keys
- Joins are executed at query time
- Memory intesive
- Exponential Cost
2. Static schema for each table
with strict structure
"Consistency"
Case Study: Book Shop
- Shop stores its Inventory and User Data
- Goal: Data Insight and Recommendations
ID | Title | Type | ISBN | Price |
---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 |
Inventory Table
Inventory-Authors Table
Inventory ID | Author ID |
---|---|
#0001 | #0001 |
#0001 | #0002 |
... | ... |
Authors Table
ID | Name |
---|---|
0001 | Hellmut Schwarz |
0002 | Barbara Derkow |
... | ... |
ID | Title | Type | ISBN | Price |
---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 |
0002 | Wired | Magazine | 1339-4832 | 6.90 |
ID | Title | Type | ISBN | Price | IssueNr |
---|---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
ID | Title | Type | ISBN | Price | IssueNr |
---|---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
0003 | The Lord of the Rings: The Two Towers | Book | 1763928562 | 15,00 | N/A |
0004 | The Lord of the Rings: Return of the King | Book | 1792625340 | 15,00 | N/A |
Follow Up Table
Inventory ID 1 | Inventory ID 2 |
---|---|
#0003 | #0004 |
Series Table
ID | Name |
---|---|
0001 | The Lord of the Rings |
Inventory-Series Table
Inventory ID | Series ID |
---|---|
#0003 | #0001 |
#0004 | #0001 |
Medium Table
ID | Name |
---|---|
0001 | Paper Medium |
0002 | Digital Text Medium |
0003 | Digital Audio Medium |
Type Table
ID | Name |
---|---|
0001 | Book |
0002 | Book Series |
0003 | Magazine |
Type-Medium Table
Type ID | Medium ID |
---|---|
#0001 | #0001 |
#0002 | #0001 |
#0003 | #0001 |
G=(V,E) to the rescue!
The Two Towers
Paper
Return of the King
The Lord of the Rings
J.R.R Tolkien
Peter
Price: 15.00
ISBN: 1792625340
Avg. User Rating: 8.5
:Medium
:Book
:User
:Book
:BookSeries
:Author
[is_a]
[is_a]
Type: Hardcover
[wrote]
[wrote]
[is_part_of]
[is_part_of]
[preceeds]
[succedes]
[purchased]
(date: '11.01.2017')
[rated]
(rating: 8.5)
Query the thing!
Cypher
MATCH (a:Author)-[:WROTE]->(b:Book)-[:PART_OF]->(:BookSeries)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books
List all books by author
MATCH (a:Author)-[wrt:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN COUNT(wrt) as count
Count all books by author
List all Series by author
MATCH (a:Author)-[:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books
Basic Queries
Fun Queries
List all authors who wrote a book which is part of a series and received a higher rating than all its preceeding ones in that series
MATCH (a:Author)-[:WROTE]->(b1:Book)-[:FOLLOWS*]->(b2:Book)
WHERE b1.rating > b2.rating
RETURN a.name as authors
List all books not yet purchased by a user that are part of a series in which the user has rated a book with at least 8/10
START user=node:users(ID:1)
MATCH (user)-[rate:RATED]->(:Book)->[:PART_OF]->(:BookSeries)<-[:PART_OF]-(b:Book)
WHERE rate.rating >= 8 AND NOT (user)-[:PURCHASED]->(b)
RETURN b as book
START user1=node:users(ID:1)
MATCH (user1)-[rate1:RATED]->(b:Book)<-[:rate2:RATED]-(user2:User)
WHERE rate1.rating = rate2.rating
WITH collect(user2) as users
RETURN users order by count(b) desc
Show the users that a user shares the most book ratings with, sorted by rating count descending
Not convinced yet?
Scalability
Relational databases scale great
homogenous data with an infinite supply of hardware upgrades
on centralized,
- Not designed for distribution
- 'Innovative architectures' as shared storage, in-memory processing, distributed caching don't solve the core problem
The Query itself cannot scale
Query Complexity
Time RDBMS:
One table, n Rows:
Join two Tables; n,m Rows:
Indexes may reduce join to:
Gernerally m joint tables, n Rows:
Time Graph:
Node-to-node Traversal:
(Equivalent to single join)
Multiple Node Traversal:
(with m visited nodes, equivalent to m joins)
Query time is independent of graph size
Data variety
Data from IDC Digital Universe, April 2014
Todays data is dynamic, constantly changing.
WIFIMS: Would it fit into my spreadsheet?
Graph databases map our domain model objects to rich, optionally schemed nodes in the graph
Use Cases
Connected World where isolated pieces of information do not exist
- Graph-Based Search
- Fraud Detection
- Real-Time Recommendation Engines
- Social Networks
- Network & IT Operations
- GeoRouting
Bottom line
Nope.
- Structured data following a constant schema
- Simple queries, marginal interconnection
- Data is predictable
Relational Model:
Graphbased:
- Heterogenous, unstructured data
- Insight into relationships within changing data
- Complex real time queries
Appendix
Fun with Facebook Graph Search
(please try this at home)
Thank you!
Questions!
...please?
Copy of deck
By Maju Ansari
Copy of deck
- 826