
Graph Databases
or
Crash Course to Data Management Paradigms
Felix Cornelius -[:FOR]-> ENG-INF-EBW F2 -[:AT]->

Outline
- Relational databases
- The Bookshop: A case study
- Enter The Graph
- Use cases
- Bottom line

Relational database
- Strictly structured
- Relational Model
where

{
{
Attribute
Tuple
Relation
| ID | Name | Surname | From | Knows |
|---|---|---|---|---|
| 0001 | John | Snow | Winterfell | NULL |
Primary Key
Schema
SQL
- De facto standard query language
- Relational algebra
- Create, Read, Update and Delete


Problems
1. 'Many-to-many' Relations
Author
Book
*
*
| ID | Title | Authors |
|---|---|---|
Books Table
primitive types!
| ID | Title | Author1 | Author2 | ... |
|---|---|---|---|---|
| ID | Title |
|---|---|
| 0001 | English G 2000 |
Junction Table
| Book ID | Author ID |
|---|---|
| #0001 | #0001 |
| #0001 | #0002 |
Authors Table
| ID | Name |
|---|---|
| 0001 | Hellmut Schwarz |
| 0002 | Barbara Derkow |
SQL 'Join' to map Primary and Foreign Keys
- Joins are executed at query time
- Memory intesive
- Exponential Cost

2. Static schema for each table
with strict structure

"Consistency"
Case Study: Book Shop
- Shop stores its Inventory and User Data
- Goal: Data Insight and Recommendations
| ID | Title | Type | ISBN | Price |
|---|---|---|---|---|
| 0001 | English G 2000 | Book | 3464350452 | 25.50 |
Inventory Table
Inventory-Authors Table
| Inventory ID | Author ID |
|---|---|
| #0001 | #0001 |
| #0001 | #0002 |
| ... | ... |
Authors Table
| ID | Name |
|---|---|
| 0001 | Hellmut Schwarz |
| 0002 | Barbara Derkow |
| ... | ... |
| ID | Title | Type | ISBN | Price |
|---|---|---|---|---|
| 0001 | English G 2000 | Book | 3464350452 | 25.50 |
| 0002 | Wired | Magazine | 1339-4832 | 6.90 |
| ID | Title | Type | ISBN | Price | IssueNr |
|---|---|---|---|---|---|
| 0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
| 0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
| ID | Title | Type | ISBN | Price | IssueNr |
|---|---|---|---|---|---|
| 0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
| 0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
| 0003 | The Lord of the Rings: The Two Towers | Book | 1763928562 | 15,00 | N/A |
| 0004 | The Lord of the Rings: Return of the King | Book | 1792625340 | 15,00 | N/A |
Follow Up Table
| Inventory ID 1 | Inventory ID 2 |
|---|---|
| #0003 | #0004 |
Series Table
| ID | Name |
|---|---|
| 0001 | The Lord of the Rings |
Inventory-Series Table
| Inventory ID | Series ID |
|---|---|
| #0003 | #0001 |
| #0004 | #0001 |
Medium Table
| ID | Name |
|---|---|
| 0001 | Paper Medium |
| 0002 | Digital Text Medium |
| 0003 | Digital Audio Medium |
Type Table
| ID | Name |
|---|---|
| 0001 | Book |
| 0002 | Book Series |
| 0003 | Magazine |
Type-Medium Table
| Type ID | Medium ID |
|---|---|
| #0001 | #0001 |
| #0002 | #0001 |
| #0003 | #0001 |

G=(V,E) to the rescue!
The Two Towers

Paper
Return of the King
The Lord of the Rings
J.R.R Tolkien
Peter
Price: 15.00
ISBN: 1792625340
Avg. User Rating: 8.5
:Medium
:Book
:User
:Book
:BookSeries
:Author
[is_a]
[is_a]
Type: Hardcover
[wrote]
[wrote]
[is_part_of]
[is_part_of]
[preceeds]
[succedes]
[purchased]
(date: '11.01.2017')
[rated]
(rating: 8.5)
Query the thing!

Cypher

MATCH (a:Author)-[:WROTE]->(b:Book)-[:PART_OF]->(:BookSeries)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as booksList all books by author
MATCH (a:Author)-[wrt:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN COUNT(wrt) as countCount all books by author
List all Series by author
MATCH (a:Author)-[:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as booksBasic Queries
Fun Queries
List all authors who wrote a book which is part of a series and received a higher rating than all its preceeding ones in that series
MATCH (a:Author)-[:WROTE]->(b1:Book)-[:FOLLOWS*]->(b2:Book)
WHERE b1.rating > b2.rating
RETURN a.name as authorsList all books not yet purchased by a user that are part of a series in which the user has rated a book with at least 8/10
START user=node:users(ID:1)
MATCH (user)-[rate:RATED]->(:Book)->[:PART_OF]->(:BookSeries)<-[:PART_OF]-(b:Book)
WHERE rate.rating >= 8 AND NOT (user)-[:PURCHASED]->(b)
RETURN b as bookSTART user1=node:users(ID:1)
MATCH (user1)-[rate1:RATED]->(b:Book)<-[:rate2:RATED]-(user2:User)
WHERE rate1.rating = rate2.rating
WITH collect(user2) as users
RETURN users order by count(b) descShow the users that a user shares the most book ratings with, sorted by rating count descending

Not convinced yet?
Scalability

Relational databases scale great
homogenous data with an infinite supply of hardware upgrades
on centralized,
- Not designed for distribution
- 'Innovative architectures' as shared storage, in-memory processing, distributed caching don't solve the core problem
The Query itself cannot scale
Query Complexity
Time RDBMS:
One table, n Rows:
Join two Tables; n,m Rows:
Indexes may reduce join to:
Gernerally m joint tables, n Rows:
Time Graph:
Node-to-node Traversal:
(Equivalent to single join)
Multiple Node Traversal:
(with m visited nodes, equivalent to m joins)
Query time is independent of graph size
Data variety

Data from IDC Digital Universe, April 2014
Todays data is dynamic, constantly changing.

WIFIMS: Would it fit into my spreadsheet?
Graph databases map our domain model objects to rich, optionally schemed nodes in the graph

Use Cases
Connected World where isolated pieces of information do not exist
- Graph-Based Search
- Fraud Detection
- Real-Time Recommendation Engines
- Social Networks
- Network & IT Operations
- GeoRouting
Bottom line

Nope.
- Structured data following a constant schema
- Simple queries, marginal interconnection
- Data is predictable
Relational Model:
Graphbased:
- Heterogenous, unstructured data
- Insight into relationships within changing data
- Complex real time queries
Appendix
Fun with Facebook Graph Search
(please try this at home)




Thank you!
Questions!
...please?
Graph Databases Crashcourse
By Felix Cornelius
Graph Databases Crashcourse
- 1,023