December 2012 GraphDB Meetup

We-[:THANK {for:'meetup space!'}]->CustomInk
We-[:THANK {for:'pizza'}]->Ikanow
We-[:THANK {for:'beer'}]->NeoTechnology


Agenda
6:30 - Pizza/Beer + Networking, etc.
7:00 - Announcements
7:05 - Craig Vitter: Ikanow Infinit.e + Dev API Intro
7:30 - Short Break 
7:40 - Wes Freeman: Quick Cypher Review, 
       Demo: Quick Scala app to import Infinit.e data to Neo4j, 
       Example Cypher queries


Next Meetup
January ~10th! (even though we're aiming for bimonthly)
Probably somewhere in NOVA again

Why Neo4j?

  • Optimized for highly connected data
  • If you're doing nested self joins in your SQL, you probably need Neo4j
  • Find connections between records
  • Real-time queries for recommendations (as opposed to batch processing Hadoop-style)
  • Hierarchical (Tree, ACL, etc.) data
  • Graph ... data? Yeah, it's good at that, too.
  • Proven Lucene-based indexing as default for the pluggable index provider system, and full-text search features as a bonus

Neo4j: CYPHER

  • Declarative query language (can also do updates)
  • Easy to learn
  • Mostly unique to Neo4j
  • Still new, so not entirely optimized, but improving rapidly!
  • Try milestone releases for best Cypher experience (usually!)

Cypher QUERIES: BROAD STROKES

  • START at starting points (often with index lookups)
  • MATCH a graph pattern with a symbolic syntax
  • Use WHERE to filter the resulting matches
  • Use WITH to compute intermediate results for your next query part
  • RETURN the [aggregated] data you want, with aliases
  • ORDER BY works just like it does in SQL
  • SKIP and LIMIT page through the results
"It all starts with the START" --Michael Hunger
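
A sketch of how these clauses fit together in a single query, written against the demo's data model (documents reference entities, and indexName is auto-indexed). The threshold of 1 and the page size of 10 are arbitrary values chosen for illustration.

```cypher
// START: pick starting points with an index lookup
start entity=node:node_auto_index('indexName:*')
// MATCH: describe the graph pattern
match doc-[:references]->entity
// WITH: compute an intermediate aggregate for the next query part
with entity, count(doc) as docCount
// WHERE: filter on that intermediate result
where docCount > 1
// RETURN: the data you want, with aliases
return entity.name as name, docCount
// ORDER BY, then SKIP and LIMIT to take the second page of 10
order by docCount desc
skip 10
limit 10;
```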


CYPHER: INSERTING/UPDATING

  • CREATE lets you create with the human-friendly Geoff format (and other formats): 
    CREATE (me {name:'Wes'}), 
      me-[:is_friends_with {since:2012}]->(you {name:'Friend_01'});
  • DELETE lets you delete nodes and relationships
  • SET lets you update properties
  • You can build queries and use predicates while updating and deleting, similar to SQL
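
A minimal sketch of SET and DELETE combined with a pattern and a predicate, reusing the nodes from the CREATE example above. It assumes auto-indexing is enabled for the name property; met_at is a hypothetical property added only for illustration.

```cypher
// update a property on friends selected by a predicate
start me=node:node_auto_index(name = 'Wes')
match me-[r:is_friends_with]->friend
where r.since = 2012
set friend.met_at = 'meetup'
return friend;

// delete one relationship selected by a predicate (the nodes stay)
start me=node:node_auto_index(name = 'Wes')
match me-[r:is_friends_with]->friend
where friend.name = 'Friend_01'
delete r;
```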

CYPHER: There's more

Transferring Data from Infinit.e

  • https://github.com/wfreeman/infinit.e-neo4j-demo
  • Mostly written in a few hours at MoDevHack
  • In Scala using AnormCypher (shameless plug)
    http://anormcypher.org/
  • Uses the Document Query from Infinit.e's REST API
  • Entities are nodes, associations are relationships
  • Uses the indexName of the entities as unique identifiers (the same entity found in multiple documents usually has the same indexName)
  • Uses the verb and verb category together as the relationship type: "current_career", etc.

Some minor annoyances

  1. AnormCypher exceptions aren't descriptive (I'll fix that!)
  2. Cypher doesn't allow parameterized relationship types; I had to concatenate the query string (ugly!)
  3. No easy way to use CREATE UNIQUE with index lookups; I usually just broke it up into two Cypher calls
  4. JSON parsing in Scala requires specifying a schema or dealing with weird Map conversions; I haven't found a great way to do it yet.
  5. Index support in Cypher needs improving--coming soon!
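
One possible shape for the workarounds in items 2 and 3, offered as a guess rather than what the demo actually does. The parameter names ({indexName}, {name}, {from}, {to}) are hypothetical. The relationship type current_career is written literally because it cannot be passed as a parameter; an app would splice it into the query string.

```cypher
// call 1 (only when the entity is not in the index yet):
// create the node; auto-indexing picks up indexName
create (e {indexName: {indexName}, name: {name}});

// call 2: look up both ends through the index,
// then add the relationship only if it doesn't exist yet
start a=node:node_auto_index(indexName = {from}),
      b=node:node_auto_index(indexName = {to})
create unique a-[:current_career]->b;
```

Since the type is concatenated into the query, it is worth restricting it to known-safe characters before building the string.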

Finally, to the cool stuff: DATA



Cleaning Up...

// delete nodes only connected to documents
start n=node(*) 
match doc-[r:references]->n
where length(n--()) = 1
delete n,r;

// delete unconnected nodes. who cares about them?
start n=node(*) 
where not(n--()) 
delete n;

Some Example Queries

// get a feel for our data (what kind of verbs were extracted)
start entity=node(*)
match entity-[association]->entity2
where type(association) <> "references"
return entity.name, type(association), entity2.name;

// find the entities most referred to by documents
start entity=node:node_auto_index('indexName:*')
match doc-[:references]->entity
return count(doc) as docCount, entity
order by docCount desc
limit 10;

// find who endorsed which candidates...
start candidate_career=node:node_auto_index('name:candidate')
match candidate_career<-[:current_career]-candidate<-[:political_endorsement]-endorser
return distinct candidate, endorser;

More Example Queries

// find competitors of the competitors... if they exist...
start company=node:node_auto_index('name:*')
match company-[:company_competitor]->competitor-[:company_competitor]->competitors_of_competitor
with company, competitor.name as competitor, collect(competitors_of_competitor.name) as comps
return company.name, collect(competitor), collect(comps);

december meetup ikanow with neo4j

By Wes Freeman