Graph Databases
or
Crash Course to Data Management Paradigms
Felix Cornelius -[:FOR]-> ENG-INF-EBW F2 -[:AT]->
where
{
{
Attribute
Tuple
Relation
ID | Name | Surname | From | Knows |
---|---|---|---|---|
0001 | John | Snow | Winterfell | NULL |
Primary Key
Schema
SQL
Problems
1. 'Many-to-many' Relations
Author
Book
*
*
ID | Title | Authors |
---|---|---|
Books Table
primitive types!
ID | Title | Author1 | Author2 | ... |
---|---|---|---|---|
ID | Title |
---|---|
0001 | English G 2000 |
Junction Table
Book ID | Author ID |
---|---|
#0001 | #0001 |
#0001 | #0002 |
Authors Table
ID | Name |
---|---|
0001 | Hellmut Schwarz |
0002 | Barbara Derkow |
SQL 'Join' to map Primary and Foreign Keys
2. Static schema for each table
with strict structure
"Consistency"
ID | Title | Type | ISBN | Price |
---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 |
Inventory Table
Inventory-Authors Table
Inventory ID | Author ID |
---|---|
#0001 | #0001 |
#0001 | #0002 |
... | ... |
Authors Table
ID | Name |
---|---|
0001 | Hellmut Schwarz |
0002 | Barbara Derkow |
... | ... |
ID | Title | Type | ISBN | Price |
---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 |
0002 | Wired | Magazine | 1339-4832 | 6.90 |
ID | Title | Type | ISBN | Price | IssueNr |
---|---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
ID | Title | Type | ISBN | Price | IssueNr |
---|---|---|---|---|---|
0001 | English G 2000 | Book | 3464350452 | 25.50 | N/A |
0002 | Wired | Magazine | 1339-4832 | 6.90 | 331 |
0003 | The Lord of the Rings: The Two Towers | Book | 1763928562 | 15,00 | N/A |
0004 | The Lord of the Rings: Return of the King | Book | 1792625340 | 15,00 | N/A |
Follow Up Table
Inventory ID 1 | Inventory ID 2 |
---|---|
#0003 | #0004 |
Series Table
ID | Name |
---|---|
0001 | The Lord of the Rings |
Inventory-Series Table
Inventory ID | Series ID |
---|---|
#0003 | #0001 |
#0004 | #0001 |
Medium Table
ID | Name |
---|---|
0001 | Paper Medium |
0002 | Digital Text Medium |
0003 | Digital Audio Medium |
Type Table
ID | Name |
---|---|
0001 | Book |
0002 | Book Series |
0003 | Magazine |
Type-Medium Table
Type ID | Medium ID |
---|---|
#0001 | #0001 |
#0002 | #0001 |
#0003 | #0001 |
G=(V,E) to the rescue!
The Two Towers
Paper
Return of the King
The Lord of the Rings
J.R.R Tolkien
Peter
Price: 15.00
ISBN: 1792625340
Avg. User Rating: 8.5
:Medium
:Book
:User
:Book
:BookSeries
:Author
[is_a]
[is_a]
Type: Hardcover
[wrote]
[wrote]
[is_part_of]
[is_part_of]
[preceeds]
[succedes]
[purchased]
(date: '11.01.2017')
[rated]
(rating: 8.5)
Query the thing!
Cypher
MATCH (a:Author)-[:WROTE]->(b:Book)-[:PART_OF]->(:BookSeries)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books
List all books by author
MATCH (a:Author)-[wrt:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN COUNT(wrt) as count
Count all books by author
List all Series by author
MATCH (a:Author)-[:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books
Basic Queries
Fun Queries
List all authors who wrote a book which is part of a series and received a higher rating than all its preceeding ones in that series
MATCH (a:Author)-[:WROTE]->(b1:Book)-[:FOLLOWS*]->(b2:Book)
WHERE b1.rating > b2.rating
RETURN a.name as authors
List all books not yet purchased by a user that are part of a series in which the user has rated a book with at least 8/10
START user=node:users(ID:1)
MATCH (user)-[rate:RATED]->(:Book)->[:PART_OF]->(:BookSeries)<-[:PART_OF]-(b:Book)
WHERE rate.rating >= 8 AND NOT (user)-[:PURCHASED]->(b)
RETURN b as book
START user1=node:users(ID:1)
MATCH (user1)-[rate1:RATED]->(b:Book)<-[:rate2:RATED]-(user2:User)
WHERE rate1.rating = rate2.rating
WITH collect(user2) as users
RETURN users order by count(b) desc
Show the users that a user shares the most book ratings with, sorted by rating count descending
Not convinced yet?
Relational databases scale great
homogenous data with an infinite supply of hardware upgrades
on centralized,
The Query itself cannot scale
Time RDBMS:
One table, n Rows:
Join two Tables; n,m Rows:
Indexes may reduce join to:
Gernerally m joint tables, n Rows:
Time Graph:
Node-to-node Traversal:
(Equivalent to single join)
Multiple Node Traversal:
(with m visited nodes, equivalent to m joins)
Query time is independent of graph size
Data from IDC Digital Universe, April 2014
Todays data is dynamic, constantly changing.
WIFIMS: Would it fit into my spreadsheet?
Graph databases map our domain model objects to rich, optionally schemed nodes in the graph
Connected World where isolated pieces of information do not exist
Nope.
Relational Model:
Graphbased:
Fun with Facebook Graph Search
(please try this at home)
Questions!
...please?