GraphDB

Introduction

Agenda

  • Why GraphDB?

  • What is GraphDB?

  • What makes GraphDB
    faster than SQL?

  • Why is GraphDB not popular?

  • Demo

Goal

Become interested in GraphDB
&
Know a bit about it

Why GraphDB?

SQL and NoSQL

SQL: relational model

MySQL

NoSQL: Not Only SQL

MongoDB

MySQL

  • Mature solution
  • Stable, reliable and powerful
  • Atomic transactions support
  • Round-the-clock uptime
  • JOIN support
  • Privilege and password security system

MongoDB

  • Offers auto-sharding
  • High scalability
  • Document validation
  • Easy environment setup
  • Easy schema change

Poor performance scaling

Poor support for transactions

High memory usage

MySQL

  • Complex queries
  • Transactional Applications
  • Stable solution

MongoDB

  • Data storage
  • Scalability
  • MVP

But what if relationships play the leading role in your system?

MySQL? MongoDB?

  • Plenty of data
  • Complex queries

Graph DB

What is GraphDB?

Graph

 A graph is a collection of vertices and edges
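
To make the definition concrete, here is a minimal sketch in Python of a graph kept as a set of vertices plus labeled edges stored per source vertex. The names `Graph`, `add_edge`, and `neighbors`, and the sample data, are illustrative assumptions and not the API of any graph database.

```python
from collections import defaultdict


class Graph:
    """A tiny directed, labeled graph: vertices plus edges kept per source vertex."""

    def __init__(self):
        self.vertices = set()
        # source vertex -> list of (edge label, target vertex)
        self.edges = defaultdict(list)

    def add_edge(self, source, label, target):
        """Register both endpoints as vertices and store the labeled edge."""
        self.vertices.update((source, target))
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        """Return the targets reachable from `source` over edges with `label`."""
        return [t for (edge_label, t) in self.edges.get(source, []) if edge_label == label]


# Hypothetical sample data
g = Graph()
g.add_edge("Alice", "FRIEND", "Bob")
g.add_edge("Bob", "FRIEND", "Zach")
g.add_edge("Alice", "WORKS_AT", "IT Department")
```

Calling `g.neighbors("Alice", "FRIEND")` only reads the edges stored on Alice, which is the property the later performance slides rely on.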

If SQL is

How it transforms to GraphDB

SQL Statement

SELECT Person.name FROM Person
LEFT JOIN Person_Department
  ON Person.Id = Person_Department.PersonId
LEFT JOIN Department
  ON Department.Id = Person_Department.DepartmentId
WHERE Department.name = 'IT Department'

Graph Statement

MATCH (p:Person)-[:WORKS_AT]->(d:Dept)
WHERE d.name = "IT Department"
RETURN p.name

Let's try a sample

Who are Bob’s friends?

SELECT p1.Person
FROM Person p1 JOIN PersonFriend
  ON PersonFriend.FriendID = p1.ID
JOIN Person p2
  ON PersonFriend.PersonID = p2.ID
WHERE p2.Person = 'Bob'

Who is friends with Bob?

SELECT p1.Person
FROM Person p1 JOIN PersonFriend
  ON PersonFriend.PersonID = p1.ID
JOIN Person p2
  ON PersonFriend.FriendID = p2.ID
WHERE p2.Person = 'Bob'

Who are friends of my friends?

SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
FROM PersonFriend pf1 JOIN Person p1
  ON pf1.PersonID = p1.ID
JOIN PersonFriend pf2
  ON pf2.PersonID = pf1.FriendID
JOIN Person p2
  ON pf2.FriendID = p2.ID
WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID
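
The three questions can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout follows the queries on this slide, the rows are hypothetical sample data, and a row `(PersonID, FriendID)` is read as "PersonID considers FriendID a friend". For "Who is friends with Bob?" the two join columns are swapped relative to "Who are Bob’s friends?", since the direction of the relationship is reversed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    -- Hypothetical sample data
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

# Who are Bob's friends? (rows where Bob is the PersonID side)
bobs_friends = conn.execute("""
    SELECT p1.Person
    FROM Person p1 JOIN PersonFriend
      ON PersonFriend.FriendID = p1.ID
    JOIN Person p2
      ON PersonFriend.PersonID = p2.ID
    WHERE p2.Person = 'Bob'
    ORDER BY p1.Person
""").fetchall()

# Who is friends with Bob? (rows where Bob is the FriendID side)
friends_with_bob = conn.execute("""
    SELECT p1.Person
    FROM Person p1 JOIN PersonFriend
      ON PersonFriend.PersonID = p1.ID
    JOIN Person p2
      ON PersonFriend.FriendID = p2.ID
    WHERE p2.Person = 'Bob'
    ORDER BY p1.Person
""").fetchall()

# Who are friends of Alice's friends? (two hops, so PersonFriend is joined twice)
friends_of_friends = conn.execute("""
    SELECT p1.Person AS PERSON, p2.Person AS FRIEND_OF_FRIEND
    FROM PersonFriend pf1 JOIN Person p1
      ON pf1.PersonID = p1.ID
    JOIN PersonFriend pf2
      ON pf2.PersonID = pf1.FriendID
    JOIN Person p2
      ON pf2.FriendID = p2.ID
    WHERE p1.Person = 'Alice' AND pf2.FriendID <> p1.ID
""").fetchall()
```

Every extra hop adds another self-join on `PersonFriend`, which is where the relational version starts to get heavy.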

DB Graph 

Performance

What makes GraphDB
faster than SQL?

Two Join Approaches: Nested Loop and Hash Join

Nested Loop

FOR erow IN (select * from employees where X=Y) LOOP
  FOR drow IN (select * from departments where erow is matched) LOOP
    output values from erow and drow
  END LOOP
END LOOP

Hash Join

-- Build phase: hash every row of the smaller table on its join key
FOR small_table_row IN (SELECT * FROM small_table)
LOOP
  slot_number := HASH(small_table_row.join_key);
  INSERT_HASH_TABLE(slot_number, small_table_row);
END LOOP;

-- Probe phase: look up each row of the larger table in the hash table
FOR large_table_row IN (SELECT * FROM large_table)
LOOP
  slot_number := HASH(large_table_row.join_key);
  small_table_row := LOOKUP_HASH_TABLE(slot_number, large_table_row.join_key);
  IF small_table_row FOUND
  THEN
    output small_table_row + large_table_row;
  END IF;
END LOOP;
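
The two pseudocode loops can be sketched as runnable Python over lists of dictionaries. This is an illustration under assumptions, not how any particular database engine implements its joins; the function names, column names, and rows are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row: |left| * |right| comparisons."""
    out = []
    for left_row in left:
        for right_row in right:
            if left_row[left_key] == right_row[right_key]:
                out.append({**left_row, **right_row})
    return out


def hash_join(small, large, small_key, large_key):
    """Build a hash table on the small input, then probe it once per large row."""
    table = defaultdict(list)
    for small_row in small:
        table[small_row[small_key]].append(small_row)
    out = []
    for large_row in large:
        for small_row in table.get(large_row[large_key], []):
            out.append({**small_row, **large_row})
    return out


# Hypothetical sample rows
departments = [{"dept_id": 1, "dept": "IT"}, {"dept_id": 2, "dept": "HR"}]
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Ben", "dept_id": 2},
    {"emp": "Cid", "dept_id": 1},
]

nested = nested_loop_join(employees, departments, "dept_id", "dept_id")
hashed = hash_join(departments, employees, "dept_id", "dept_id")
```

Both functions return the same joined rows here; the difference is only in how many comparisons it takes to get there.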

A simple example

By joining E (employees) with D (departments) and then E with P (payments), we can calculate, for example, the total payment for a given department.

Given departments d1 and d2, we want to find all the payments to their employees.

Nested Loop

Worst Case:

2 * |E| * |P|

Hash Join

Worst Case:

2 + |E| + 2 * |E| + |P|

Graph

Worst Case:

k

(k is the number of payments in the result)

Time complexity

  • Nested Loop: O(|E| * |P|)

  • Hash Join: O(|E| + |P|)

  • GraphDB: O(k)
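
The graph figure assumes that each vertex holds direct references to its neighbours, so answering the query means following edges, not scanning or joining whole tables. A minimal Python sketch of that idea, with hypothetical vertex names and amounts:

```python
# Hypothetical data: each vertex stores direct references to its neighbours.
employees_of = {"d1": ["e1", "e2"], "d2": ["e3"], "d3": ["e4"]}
payments_to = {"e1": [100, 50], "e2": [70], "e3": [30], "e4": [999]}


def payments_for(departments):
    """Follow department -> employee -> payment edges for the given departments."""
    result = []
    for department in departments:
        for employee in employees_of.get(department, []):
            result.extend(payments_to.get(employee, []))
    return result


payments = payments_for(["d1", "d2"])
total = sum(payments)
```

Department `d3` and its payment are never touched, so in this sketch the work grows with the number of edges followed from d1 and d2 and not with the total number of employees or payments stored.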

Why is GraphDB
not popular?

  • New solution
  • Hard to decentralize
  • Join-depth problem

Join Depth Problem

DGraph

  • Written in Go
  • Native GraphQL support
  • No dependencies on third-party libraries

type Task {
    id: ID!
    title: String!
    completed: Boolean!
    user: User!
}

type User {
    username: String! @id
    name: String
    tasks: [Task] @hasInverse(field: user)
}

Running

# Run Dgraph
docker run -it -p 8080:8080 dgraph/standalone:master

# Update schema
curl -X POST localhost:8080/admin/schema --data-binary '@schema.graphql'

Mutation Data

mutation {
  addUser(input: [
    {
      username: "amber.yan@graphdb.com",
      name: "Amber",
      tasks: [
        {
          title: "Avoid touching your face",
          completed: false,
        },
        {
          title: "Stay safe",
          completed: false
        },
        {
          title: "Avoid crowd",
          completed: true,
        },
        {
          title: "Wash your hands often",
          completed: true
        }
      ]
    }
  ]) {
    user {
      username
      name
      tasks {
        id
        title
      }
    }
  }
}
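
To read the data back, a query can be posted to the same server. The sketch below only uses the Python standard library. It assumes that Dgraph is running locally as started above, that it serves GraphQL at `/graphql` on port 8080, and that it generates a `getUser` query keyed by the `@id` field `username`; the helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: local Dgraph instance serving GraphQL at /graphql
GRAPHQL_URL = "http://localhost:8080/graphql"

# Assumption: getUser is generated for the User type and keyed by username
QUERY = """
query {
  getUser(username: "amber.yan@graphdb.com") {
    name
    tasks {
      title
      completed
    }
  }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL document in a JSON body and return a POST request object."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL):
    """Send the query and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)
```

With the server up, `run_query(QUERY)` should return the user together with the nested tasks in a single round trip, with no join written by hand.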

Q & A
