Neo4j

Graph Database

Graph Database?

  • Made up of nodes (aka vertices/points) and relationships (aka edges)

Node

Relationship

Graph Database.

  • Maps very well to many collections of data: social networks, hierarchies, etc.
  • The conceptual data model is the only data model
  • Relationships as a first-class concept
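To make "nodes plus first-class relationships" a bit more concrete, here is a minimal sketch in plain JavaScript. It says nothing about how Neo4j stores data internally, and the helper names (`makeGraph`, `addNode`, `relate`, `neighbours`) are invented for this illustration.

```javascript
// Minimal in-memory property graph: a sketch, not Neo4j's storage model.
function makeGraph() {
  return { nodes: [], relationships: [] };
}

// A node has an id, one or more labels and a bag of properties.
function addNode(graph, labels, properties) {
  const node = { id: graph.nodes.length, labels, properties };
  graph.nodes.push(node);
  return node;
}

// A relationship is directed, typed, and can carry its own properties.
function relate(graph, start, type, end, properties = {}) {
  const rel = { start: start.id, type, end: end.id, properties };
  graph.relationships.push(rel);
  return rel;
}

// Follow outgoing relationships of a given type from a node.
function neighbours(graph, node, type) {
  return graph.relationships
    .filter(rel => rel.start === node.id && rel.type === type)
    .map(rel => graph.nodes[rel.end]);
}

const graph = makeGraph();
const jon = addNode(graph, ['person'], { name: 'Jon Packer' });
const brik = addNode(graph, ['company'], { name: 'BRIK Videobase AS' });
relate(graph, jon, 'works_at', brik, { for: '4 years' });

const employers = neighbours(graph, jon, 'works_at').map(n => n.properties.name);
```

The point of the sketch is that the relationship is a stored record of its own, with a type, a direction and properties, rather than something derived from a foreign key at query time.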

Neo4j

  • Written in Java 😣 & Scala
    • But! You never need to deal with Java
    • Acts as a black-box server in practice, unless you are using a language on the JVM
    • You do need to deal with the JVM and its appetite for resources

Neo4j

  • Open source-ish (restricted enterprise features)
  • Has a comprehensive HTTP REST API
  • Awesome web-based console/data explorer
  • The most widely and actively used graph database, lots of support available.
  • Performance: don't ask me, I write JavaScript for a living
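As a rough idea of what talking to the HTTP API looks like, the sketch below builds (but does not send) a request for the transactional Cypher endpoint of Neo4j 2.x. The `buildCypherRequest` helper is hypothetical, the query is just an example, and authentication is left out even though it is enabled by default from Neo4j 2.2 onwards.

```javascript
// Builds a request description for Neo4j 2.x's transactional Cypher endpoint.
// Nothing is sent over the network here; auth headers are deliberately omitted.
function buildCypherRequest(baseUrl, statement, parameters = {}) {
  return {
    method: 'POST',
    url: baseUrl + '/db/data/transaction/commit',
    headers: {
      'Accept': 'application/json; charset=UTF-8',
      'Content-Type': 'application/json'
    },
    // One request can carry several statements; this sketch sends just one.
    body: JSON.stringify({ statements: [{ statement, parameters }] })
  };
}

const request = buildCypherRequest(
  'http://localhost:7474',
  'MATCH (b:beer { title: {title} }) RETURN b', // {title} is a 2.x-style parameter
  { title: 'Lervig Rye IPA' }
);
```

The resulting object can be handed to whatever HTTP client you prefer. Passing values as parameters instead of concatenating them into the statement avoids quoting problems.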

Neo4j: Negatives

  • Some of the most important features you would want in production are hidden behind the prohibitively expensive enterprise license.
    • Hot backups
      • Ways to get around this...
    • Clustering (Sharding & Replication)
  • JVM
  • No database segmentation: no good way to share a database (but a few bad ways).

Neo4j as a graph db

  • Nodes can be given one or more types ("labels").
(:beer:ipa { name: "Lervig Rye IPA"})
  • Has indexing, constraints
  • Supports various graph algorithms by default (shortest path, Dijkstra)
  • Has its own query language (Cypher) for graph traversal, which has become reasonably mature
  • Cypher is the star of the show

Setup:

wget http://neo4j.com/artifact.php?name=neo4j-co...
tar xf neo4j-community-2.3.1-unix.tar.gz
cd neo4j-community-2.3.1
bin/neo4j start

Done!

 

  • Caveat: doesn't work super well with upstart, but you weren't using that anyway right?

Usage example

Neo4j with Node.js (using the Seraph library, which I maintain, together with co and ES6)

let co = require('co');
let seraph = require('seraph/co');
let db = seraph('http://localhost:7474');

co(function *() {
  // Create two labelled nodes, then a relationship between them
  let jon = yield db.save({ name: 'Jon Packer' }, 'person');
  let brik = yield db.save({ name: 'BRIK Videobase AS' }, 'company');
  let rel = yield db.relate(jon, 'works_at', brik, { for: '4 years' });
  return { jon, brik, rel };
}).then(function(output) {
  console.log(output);
}).catch(function(err) {
  // Surface connection or query errors instead of swallowing them
  console.error(err);
});

Cypher

  • Sort of like an SQL for graph databases.
    • Except not completely insane
  • So far the only implementation is Neo4j's, but they're working to change that
  • Familiar if you've ever written SQL and code.

That example again...

This time in Cypher.

CREATE (jon:person { name: 'Jon Packer' })
         -[:works_at { for: '4 years' }]->
         (brik:company { name: 'BRIK Videobase AS' })

* looks even better when you don't need to split it over 3 lines!

Cypher

  • Declarative graph query language
  • Before it, all the popular graph traversal methods were using imperative languages
  • Reasonably simple language and syntax—borrows much from SQL for familiarity

Cypher vs. SQL

  • Query to get the brewery, beer and stock level of Lervig Rye IPA at the Bergen Storsenter Vinmonopol.

SQL:

SELECT *
FROM beers
INNER JOIN breweries ON breweries.brewery_id = beers.brewery_id
INNER JOIN beer_stock ON beer_stock.beer_id = beers.beer_id
INNER JOIN stores ON beer_stock.store_id = stores.store_id
WHERE beers.beer_title = 'Lervig Rye IPA'
AND stores.store_name = 'Bergen, Bergen Storsenter Vinmonopol'

Cypher:

MATCH (ipa:beer { title: 'Lervig Rye IPA' })<-[:brews]-(lervig:brewery),
      (ipa)-[stock:in_stock]->(store:store { name: 'Bergen, Bergen Storsenter Vinmonopol' })
RETURN *

Cypher: MATCH

  • The MATCH statement starts a query/traversal and specifies a subset of the graph to start with
MATCH (ipa:beer { title: 'Lervig Rye IPA' })

(ipa = identifier, beer = label, { title: 'Lervig Rye IPA' } = predicate)

  • Could also be written with a WHERE, which gives more flexibility but may perform worse
MATCH (ipa:beer)
WHERE ipa.title = 'Lervig Rye IPA'
OR ipa.title = 'Lervig Galaxy IPA'
RETURN ipa

Cypher: MATCH

  • MATCH can specify many different types of nodes and relationships.
MATCH (:brewery)-[:brews]->(:beer)-[:brewed_in]->(:country)

(-[:brews]-> = relationship, the arrow = directionality)

  • Like nodes, relationships can specify a predicate
MATCH (b:beer)-[stock:in_stock { quantity: 25 }]-(s:store)

Cypher: WHERE

  • WHERE must immediately follow a selector clause like MATCH, and further reduces that selection.
MATCH (beer:beer)-[:has_style]->(:style { name: 'India Pale Ale (IPA)' }),
      (beer)-[stockLevel:in_stock]->(store:store)
WHERE store.name =~ 'Bergen.*'
AND beer.ratebeerWeightedAverage > 3.9
RETURN *
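For readers who think in JavaScript, the filtering step can be pictured roughly as below. This is an analogy only: the rows and ratings are invented, and the real work happens inside Neo4j. One detail worth knowing is that Cypher's `=~` has to match the whole string, which the anchored regular expression mimics.

```javascript
// Rows as a MATCH might produce them; the data is invented for this sketch.
const rows = [
  { beer: { title: 'Lervig Rye IPA', ratebeerWeightedAverage: 4.0 },
    store: { name: 'Bergen, Bergen Storsenter Vinmonopol' } },
  { beer: { title: 'Made Up Pale', ratebeerWeightedAverage: 3.2 },
    store: { name: 'Bergen, Bergen Storsenter Vinmonopol' } },
  { beer: { title: 'Lervig Rye IPA', ratebeerWeightedAverage: 4.0 },
    store: { name: 'Oslo, Made Up Vinmonopol' } }
];

// Cypher's =~ must match the entire string, hence the ^...$ anchors here.
const bergen = /^Bergen.*$/;

// WHERE store.name =~ 'Bergen.*' AND beer.ratebeerWeightedAverage > 3.9
const filtered = rows.filter(row =>
  bergen.test(row.store.name) && row.beer.ratebeerWeightedAverage > 3.9
);
```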

Cypher: CREATE

  • CREATE will create a new pattern in the graph. Every node and relationship will be created, regardless of whether or not something similar already exists.
CREATE (beer:beer { title: 'I made this up' })
        <-[:brews]-(:brewery { name: 'Monadic Ale' })
  • If there was already a "Monadic Ale" brewery, now there are two...

Cypher: MERGE

  • MERGE is like MATCH | CREATE: it will create the entire pattern if it does not find it in the graph
MERGE (beer:beer { title: 'I made this up' })
        <-[:brews]-(:brewery { name: 'Monadic Ale' })

Cypher: MERGE

  • MERGE can be used in conjunction with MATCH, to create or match part of a graph
MATCH (monadic:brewery { name: 'Monadic Ale' })
MERGE (monadic)-[:brews]->(beer:beer { title: 'Katajanjoulu' })
  • The MERGE will only do something if the MATCH matched a :brewery
  • If there is already a beer "Katajanjoulu" brewed by "Monadic Ale", this will do nothing
  • If we didn't do the match first, and the brewery already existed, a duplicate brewery would be created: MERGE either matches the entire pattern or creates it.
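The all-or-nothing behaviour is easy to get wrong, so here is a toy simulation of it in plain JavaScript. It mimics the semantics described above on an in-memory store; the function names are hypothetical and none of this is how Neo4j implements MERGE.

```javascript
// Toy store holding (brewery)-[:brews]->(beer) patterns.
function makeStore() {
  return { breweries: [], beers: [], brews: [] };
}

// CREATE: always adds a new brewery, a new beer and a new relationship.
function createPattern(store, breweryName, beerTitle) {
  const brewery = { name: breweryName };
  const beer = { title: beerTitle };
  store.breweries.push(brewery);
  store.beers.push(beer);
  store.brews.push({ brewery, beer });
}

// MERGE on the whole pattern: if the complete pattern is not found,
// everything in it is created, even if the brewery already exists.
function mergePattern(store, breweryName, beerTitle) {
  const found = store.brews.some(r =>
    r.brewery.name === breweryName && r.beer.title === beerTitle);
  if (!found) createPattern(store, breweryName, beerTitle);
}

// MATCH the brewery first, then MERGE only the relationship and the beer.
function matchThenMerge(store, breweryName, beerTitle) {
  const matched = store.breweries.filter(b => b.name === breweryName);
  for (const brewery of matched) {
    const found = store.brews.some(r =>
      r.brewery === brewery && r.beer.title === beerTitle);
    if (!found) {
      const beer = { title: beerTitle };
      store.beers.push(beer);
      store.brews.push({ brewery, beer });
    }
  }
}

const store = makeStore();
createPattern(store, 'Monadic Ale', 'I made this up');
matchThenMerge(store, 'Monadic Ale', 'Katajanjoulu');
matchThenMerge(store, 'Monadic Ale', 'Katajanjoulu'); // second run does nothing
const breweriesAfterMatchMerge = store.breweries.length;

mergePattern(store, 'Monadic Ale', 'Brand new beer'); // whole pattern not found
const breweriesAfterFullMerge = store.breweries.length;
```

After the MATCH-then-MERGE calls there is still a single brewery; after the whole-pattern MERGE there are two, because the pattern as a whole did not exist.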

Cypher: CREATE UNIQUE

  • CREATE UNIQUE is the terser version of what we just did:
CREATE UNIQUE (:brewery { name: 'Monadic Ale' })
   -[:brews]->(beer:beer { title: 'Katajanjoulu' })
  • Assuming our Brewery already exists in the graph, a duplicate will not be created
  • Only the parts that do not already exist will be created
  • Will throw an error if there is ambiguity

Cypher: ..UD

  • Various other commands exist that work in predictable ways, such as:
    • SET - update a property
    • REMOVE - remove a property
    • DELETE - delete a node or relationship (errors if the node still has relationships)
    • DETACH DELETE - delete a node and relationships
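The difference between the two delete commands can be sketched on a toy in-memory graph. The data and the `deleteNode` / `detachDeleteNode` helpers are invented for this illustration and only mimic the behaviour listed above.

```javascript
// Toy graph: node ids plus relationships between them.
const toy = {
  nodes: new Set([1, 2, 3]),
  relationships: [{ start: 1, type: 'brews', end: 2 }]
};

function touches(rel, id) {
  return rel.start === id || rel.end === id;
}

// DELETE: refuses to remove a node that still has relationships.
function deleteNode(g, id) {
  if (g.relationships.some(rel => touches(rel, id))) {
    throw new Error('node ' + id + ' still has relationships');
  }
  g.nodes.delete(id);
}

// DETACH DELETE: removes the node's relationships first, then the node.
function detachDeleteNode(g, id) {
  g.relationships = g.relationships.filter(rel => !touches(rel, id));
  g.nodes.delete(id);
}

deleteNode(toy, 3);          // fine: node 3 has no relationships

let deleteFailed = false;
try {
  deleteNode(toy, 1);        // throws: node 1 still has a relationship to node 2
} catch (err) {
  deleteFailed = true;
}

detachDeleteNode(toy, 1);    // removes the relationship and node 1
```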

Cypher: RETURN

  • RETURN declares what will be output from your query
  • Various transformations can be done on the data selected by your query before returning it
  • Here are a few examples. Output is shown as JS objects read by Seraph in Node.

Cypher: COLLECT

  • COLLECT aggregates many rows into a collection. This works particularly well for something like a one-to-many relationship:
MATCH (veholt:brewery { name: 'Veholt Mikrobryggeri' })
        -[:brews]->(beer:beer) 
RETURN veholt, COLLECT(beer.title) as beers

Results in

[ { veholt: { name: 'Veholt Mikrobryggeri', id: 3385 },
    beers: [ 'Veholt Humlehelvete Double IPA Originalen',
             'Veholt Jimmy Red' ] } ]

If we didn't use COLLECT:

[ { veholt: { name: 'Veholt Mikrobryggeri', id: 3385 },
    beer: 'Veholt Humlehelvete Double IPA Originalen' },
  { veholt: { name: 'Veholt Mikrobryggeri', id: 3385 },
    beer: 'Veholt Jimmy Red' } ]
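If it helps to see the aggregation spelled out, the sketch below turns the uncollected rows into the collected shape in plain JavaScript. It is only an analogy for what COLLECT does, using the non-aggregated column as the grouping key; `collectBeers` is a made-up helper.

```javascript
// Rows as they look without COLLECT (same data as the output above).
const uncollected = [
  { veholt: { name: 'Veholt Mikrobryggeri', id: 3385 },
    beer: 'Veholt Humlehelvete Double IPA Originalen' },
  { veholt: { name: 'Veholt Mikrobryggeri', id: 3385 },
    beer: 'Veholt Jimmy Red' }
];

// Group rows by brewery id and gather the beer titles into one array each.
function collectBeers(rows) {
  const byBrewery = new Map();
  for (const row of rows) {
    if (!byBrewery.has(row.veholt.id)) {
      byBrewery.set(row.veholt.id, { veholt: row.veholt, beers: [] });
    }
    byBrewery.get(row.veholt.id).beers.push(row.beer);
  }
  return Array.from(byBrewery.values());
}

const collected = collectBeers(uncollected);
```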

Cypher: Lists

  • Various functions exist for working with lists:
    • EXTRACT (usually called map elsewhere)
    • REDUCE
    • FILTER
  • These functions (COLLECT included) do not have to be used as part of a RETURN clause; they can be used in various places in a query, for example in a WHERE.
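The three list functions map fairly directly onto JavaScript's array methods. In the sketch below, the Cypher in the comments uses the 2.x syntax and is an assumption about how you might phrase each call; the JavaScript underneath is the rough equivalent.

```javascript
const titles = [
  'Veholt Humlehelvete Double IPA Originalen',
  'Veholt Jimmy Red'
];

// EXTRACT(t IN titles | length(t))  ~  map
const lengths = titles.map(t => t.length);

// FILTER(t IN titles WHERE t =~ '.*IPA.*')  ~  filter
const ipas = titles.filter(t => /IPA/.test(t));

// REDUCE(total = 0, t IN titles | total + length(t))  ~  reduce
const totalLength = titles.reduce((total, t) => total + t.length, 0);
```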

Web Console

Finn.

Neo4j

By jonpacker
