Graph Databases

 

or

Crash Course to Data Management Paradigms

 

Felix Cornelius -[:FOR]-> ENG-INF-EBW F2 -[:AT]->

Outline

  • Relational databases
  • The Bookshop: A case study
  • Enter The Graph
  • Use cases
  • Bottom line

Relational database

  • Strictly structured
  • Relational Model
R \subseteq A_1 \times ... \times A_n
RA1×...×AnR \subseteq A_1 \times ... \times A_n

where

A_1 \times ... \times A_n = \{(a_1,...,a_n) \space | \space a_1\in A_1,...,a_n\in A_n\}
A1×...×An={(a1,...,an)  a1A1,...,anAn}A_1 \times ... \times A_n = \{(a_1,...,a_n) \space | \space a_1\in A_1,...,a_n\in A_n\}

{

{

Attribute

Tuple

Relation

ID Name Surname From Knows
0001 John Snow Winterfell NULL

Primary Key

Schema

SQL

  • De facto standard query language
  • Relational algebra
  • Create, Read, Update and Delete

Problems

1. 'Many-to-many' Relations

Author

Book

*

*

ID Title Authors

Books Table

primitive types!

ID Title Author1 Author2 ...
ID Title
0001 English G 2000

Junction Table

Book ID Author ID
#0001 #0001
#0001 #0002

Authors Table

ID Name
0001 Hellmut Schwarz 
0002 Barbara Derkow

SQL 'Join' to map Primary and Foreign Keys

  • Joins are executed at query time
  • Memory intesive
  • Exponential Cost

2. Static schema for each table

with strict structure

"Consistency"

Case Study: Book Shop

  • Shop stores its Inventory and User Data
  • Goal: Data Insight and Recommendations
ID Title Type ISBN Price
0001 English G 2000 Book 3464350452 25.50

Inventory Table

Inventory-Authors Table

Inventory ID Author ID
#0001 #0001
#0001 #0002
... ...

Authors Table

ID Name
0001 Hellmut Schwarz 
0002 Barbara Derkow
... ...
ID Title Type ISBN Price
0001 English G 2000 Book 3464350452 25.50
0002 Wired Magazine 1339-4832 6.90
ID Title Type ISBN Price IssueNr
0001 English G 2000 Book 3464350452 25.50 N/A
0002 Wired Magazine 1339-4832 6.90 331
ID Title Type ISBN Price IssueNr
0001 English G 2000 Book 3464350452 25.50 N/A
0002 Wired Magazine 1339-4832 6.90 331
0003 The Lord of the Rings: The Two Towers Book 1763928562 15,00 N/A
0004 The Lord of the Rings: Return of the King Book 1792625340 15,00 N/A

Follow Up Table

Inventory ID 1 Inventory ID 2
#0003 #0004

Series Table

ID Name
0001 The Lord of the Rings

Inventory-Series Table

Inventory ID Series ID
#0003 #0001
#0004 #0001

Medium Table

ID Name
0001 Paper Medium
0002 Digital Text Medium
0003 Digital Audio Medium

Type Table

ID Name
0001 Book
0002 Book Series
0003 Magazine

Type-Medium Table

Type ID Medium ID
#0001 #0001
#0002 #0001
#0003 #0001

G=(V,E) to the rescue!

The Two Towers

Paper

Return of the King

The Lord of the Rings

J.R.R Tolkien

Peter

Price: 15.00

ISBN: 1792625340

Avg. User Rating: 8.5

:Medium

:Book

:User

:Book

:BookSeries

:Author

[is_a]

[is_a]

Type: Hardcover

[wrote]

[wrote]

[is_part_of]

[is_part_of]

[preceeds]

[succedes]

[purchased]

(date: '11.01.2017')

[rated]

(rating: 8.5)

Query the thing!

Cypher
MATCH (a:Author)-[:WROTE]->(b:Book)-[:PART_OF]->(:BookSeries)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books

List all books by author 

MATCH (a:Author)-[wrt:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN COUNT(wrt) as count

Count all books by author 

List all Series by author 

MATCH (a:Author)-[:WROTE]->(b:Book)
WHERE a.name = "J.J.R. Tolkien"
RETURN b.name as books

Basic Queries

Fun Queries

List all authors who wrote a book which is part of a series and received a higher rating than all its preceeding ones in that series

MATCH (a:Author)-[:WROTE]->(b1:Book)-[:FOLLOWS*]->(b2:Book)
WHERE b1.rating > b2.rating
RETURN a.name as authors

List all books not yet purchased by a user that are part of a series in which the user has rated a book with at least 8/10 

START user=node:users(ID:1)
MATCH (user)-[rate:RATED]->(:Book)->[:PART_OF]->(:BookSeries)<-[:PART_OF]-(b:Book)
WHERE rate.rating >= 8 AND NOT (user)-[:PURCHASED]->(b)
RETURN b as book
START user1=node:users(ID:1)
MATCH (user1)-[rate1:RATED]->(b:Book)<-[:rate2:RATED]-(user2:User)
WHERE rate1.rating = rate2.rating
WITH collect(user2) as users
RETURN users order by count(b) desc

Show the users that a user shares the most book ratings with, sorted by rating count descending

Not convinced yet?

Scalability

Relational databases scale great

homogenous data with an infinite supply of hardware upgrades

on centralized,

  • Not designed for distribution
  • 'Innovative architectures' as shared storage, in-memory processing, distributed caching don't solve the core problem

The Query itself cannot scale

Query Complexity

Time RDBMS:

\mathcal{O}(n)
O(n)\mathcal{O}(n)

One table, n Rows:

Join two Tables; n,m Rows:

\mathcal{O}(nm)
O(nm)\mathcal{O}(nm)

Indexes may reduce join to:

\mathcal{O}(log(n)m)
O(log(n)m)\mathcal{O}(log(n)m)

Gernerally m joint tables, n Rows:

\mathcal{O}(n^m)
O(nm)\mathcal{O}(n^m)

Time Graph:

Node-to-node Traversal:

\mathcal{O}(1)
O(1)\mathcal{O}(1)

(Equivalent to single join)

Multiple Node Traversal:

\mathcal{O}(m)
O(m)\mathcal{O}(m)

(with m visited nodes, equivalent to m joins)

Query time is independent of graph size

Data variety

Todays data is dynamic, constantly changing.

WIFIMS: Would it fit into my spreadsheet?

Graph databases map our domain model objects to rich, optionally schemed nodes in the graph

Use Cases

Connected World where isolated pieces of information do not exist

  • Graph-Based Search
  • Fraud Detection
  • Real-Time Recommendation Engines
  • Social Networks
  • Network & IT Operations
  • GeoRouting

Bottom line

Nope.

  • Structured data following a constant schema
  • Simple queries, marginal interconnection
  • Data is predictable

Relational Model:

Graphbased:

  • Heterogenous, unstructured data
  • Insight into relationships within changing data
  • Complex real time queries

Appendix

Fun with Facebook Graph Search

(please try this at home)

Thank you!

Questions!

...please?

Copy of deck

By Maju Ansari

Copy of deck

  • 816