From documents to graphs

#PyConIE

Hi!

  • Computer Engineer
  • Programming
  • Electronics
  • Math <3 <3
  • Physics
  • Lego
  • Meetups
  • Animals
  • Coffee
  • GIFs
  • Pokémon

Keynote at Python Brasil:

http://slides.com/hannelitavante-hannelita/rust-type-system-pybr12

Disclaimer

Views are my own

Project from late 2015

Mostly for Neo4j 2.x

Agenda

  • Quick note about document oriented databases
  • Graph databases can help your data model
  • Creating connectors for MongoDB
  • neo4j_doc_manager general architecture
  • Data mapping
  • Challenges

"We need to restructure our data"

"Relational databases are not enough"

Document Oriented DB

  • Flexible data model
  • Easy to get started
  • Easy to represent the data

Store data as Documents!

Imagine that we have talks of a conference

Our Documents

Sometimes we need to get some extra information

Possible questions

  • Which talks cover a specific topic (e.g. 'Databases')?
  • Which speakers will also talk about this topic?
  • Which sessions will be held in the Auditorium and are about this topic?

These are common questions

More questions

  • Assuming that I do not want to change rooms, which room should I stay in to see the highest number of sessions on a specific topic?
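A minimal sketch of that last question in plain Python, assuming the talks are already loaded as dicts. The sample talks and the `best_room` helper are made up for illustration and are not part of the project.

```python
from collections import Counter

# Hypothetical sample data, shaped like the talk documents in this deck.
talks = [
    {"session": {"title": "Graphs 101"}, "topics": ["databases", "graphs"], "room": "Auditorium"},
    {"session": {"title": "Mongo tips"}, "topics": ["databases"], "room": "Auditorium"},
    {"session": {"title": "Async IO"}, "topics": ["python"], "room": "Room 2"},
    {"session": {"title": "Indexing"}, "topics": ["databases"], "room": "Room 2"},
]


def best_room(talks, topic):
    """Return the room hosting the most sessions tagged with `topic`, or None."""
    counts = Counter(t["room"] for t in talks if topic in t.get("topics", []))
    if not counts:
        return None
    room, _ = counts.most_common(1)[0]
    return room
```

It works, but each new question needs another hand-written loop over every document.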

Further work

  • Recommendation system for the talks
  • Recommendation system for speakers
  • Build a tool to automatically build the sessions timetable based on topic distribution

Looks like we need some graphs!

Graphs are everywhere

TEAM, Neo4j

We can build graphs with information from Mongo

From Documents to Graphs

Mongo Connector

https://github.com/10gen-labs/mongo-connector

  • You call Mongo Connector (MC)
  • You point it to where your Mongo is
  • You point it to where the other database is
  • MC reaches the other database (Elasticsearch, Solr) through a Doc Manager (DM)
  • MC creates a thread to watch Mongo actions (replica)
  • MC calls those actions on a Doc Manager

We can translate these actions

into a Graph Structure
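A rough sketch of that flow, assuming a worker thread that reads actions from a queue and forwards them to a Doc Manager. This is not Mongo Connector's real code: `RecordingDocManager`, `watch`, `run_demo` and the tuple format of the actions are all hypothetical. Only the `upsert` and `remove` signatures follow the Doc Manager interface.

```python
import queue
import threading


class RecordingDocManager:
    """Hypothetical stand-in for a Doc Manager: it only records the calls."""

    def __init__(self):
        self.calls = []

    def upsert(self, doc, namespace, timestamp):
        self.calls.append(("upsert", namespace, doc["_id"]))

    def remove(self, document_id, namespace, timestamp):
        self.calls.append(("remove", namespace, document_id))


def watch(actions, doc_manager):
    """Forward each queued action to the Doc Manager until None arrives."""
    while True:
        action = actions.get()
        if action is None:  # sentinel: stop watching
            break
        op, namespace, payload, ts = action
        if op == "insert":
            doc_manager.upsert(payload, namespace, ts)
        elif op == "delete":
            doc_manager.remove(payload, namespace, ts)


def run_demo():
    """Push two fake Mongo actions through the watcher thread."""
    actions = queue.Queue()
    dm = RecordingDocManager()
    worker = threading.Thread(target=watch, args=(actions, dm))
    worker.start()
    actions.put(("insert", "test.talks", {"_id": 1, "room": "Auditorium"}, 100))
    actions.put(("delete", "test.talks", 1, 101))
    actions.put(None)
    worker.join()
    return dm.calls
```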

Neo4j Doc Manager

mongo-connector (pip)

py2neo (neo4j)

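from mongo_connector.constants import DEFAULT_COMMIT_INTERVAL, DEFAULT_MAX_BULK
from mongo_connector.doc_managers.doc_manager_base import DocManagerBase


class DocManager(DocManagerBase):

    def __init__(self, url, auto_commit_interval=DEFAULT_COMMIT_INTERVAL,
                 unique_key='_id', chunk_size=DEFAULT_MAX_BULK, **kwargs):
        pass  # body not shown on the slide

    def upsert(self, doc, namespace, timestamp):
        pass  # insert or replace one document

    def bulk_upsert(self, docs, namespace, timestamp):
        pass  # upsert many documents at once

    def update(self, document_id, update_spec, namespace, timestamp):
        pass  # apply a Mongo update spec to an existing document

    def remove(self, document_id, namespace, timestamp):
        pass  # delete one document

    def search(self, start_ts, end_ts):
        pass  # find documents modified between two timestamps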

We can retrieve Mongo commands with this interface class

We support Python 2 and Python 3

It will run like an auto importer. You just need to provide the database endpoints

Sync Mongo with Neo4j

db.talks.insert(  { "session": ...

Document:talks

Root node in Neo4j

{
  "session": {
    "title": "12 Years of Spring: An Open Source Journey"
  },
  "topics":  ["keynote", "spring"],
  "room": "Auditorium",
  "speaker": {
    "name": "Juergen Hoeller"
  }
}

Document:session

Document:speaker


JSON properties become node properties

All the nodes are connected to the root node

Nested documents

"session" : {
    "title" : "12 Years of Spring: An Open Source Journey",
    "abstract" : "Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.",
    "conference" : {
      "city" : "London"
    }
  }

Nested documents

"session" : {
    "title" : "12 Years of Spring: An Open Source Journey",
    "abstract" : "Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.",
    "conference" : {
      "city" : "Dublin"
    }
  }

Document:session (parent node)

Document:conference (child node)
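A small sketch of the parent/child idea, not the project's actual code. The `to_graph` helper, the tuple layout and the `parent_child` relationship naming are assumptions made for illustration.

```python
def to_graph(label, doc, parent=None, nodes=None, rels=None):
    """Split a document into (nodes, rels).

    Non-dict values become properties of the node; each nested dict
    becomes a child node linked to the node that contains it.
    """
    nodes = [] if nodes is None else nodes
    rels = [] if rels is None else rels
    props = {k: v for k, v in doc.items() if not isinstance(v, dict)}
    nodes.append((label, props))
    if parent is not None:
        # Assumed naming scheme: "<parent>_<child>"
        rels.append((parent, f"{parent}_{label}", label))
    for key, value in doc.items():
        if isinstance(value, dict):
            to_graph(key, value, parent=label, nodes=nodes, rels=rels)
    return nodes, rels


session = {
    "title": "12 Years of Spring: An Open Source Journey",
    "conference": {"city": "Dublin"},
}
nodes, rels = to_graph("session", session)
```

Here `nodes` holds one `session` node and one `conference` node, and `rels` holds the single link between them.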

JSON array

"session" : { 
  "tracks": [{ "main":"Python" },
            { "second":"Data" }]
... }

Document:session

Document:track0

talks_track0

talks_track1

Document:track1
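One way to picture the array mapping, as a sketch only. The labels `track0`, `track1`, `talks_track0` and `talks_track1` come from the slide. The `expand_array` helper, the rule that turns "tracks" into "track", and the choice of `session` as the starting node of each relationship are my assumptions.

```python
def expand_array(parent, key, items, rel_prefix):
    """Turn a list of sub-documents into indexed child nodes."""
    # Assumption: drop a trailing "s" so that "tracks" yields "track0", "track1"
    base = key[:-1] if key.endswith("s") else key
    nodes, rels = [], []
    for index, item in enumerate(items):
        label = f"{base}{index}"
        nodes.append((label, dict(item)))
        rels.append((parent, f"{rel_prefix}_{label}", label))
    return nodes, rels


tracks = [{"main": "Python"}, {"second": "Data"}]
track_nodes, track_rels = expand_array("session", "tracks", tracks, rel_prefix="talks")
```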

We also support explicit ids to create a relationship

Explicit ids

{
  "name": "Hanneli",
  "account_id": "32434ab2341192",
  "url": "medium.com/@hannelita"
}

session_account

Document:session

Document:account
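A sketch of how such a field could be spotted, assuming that any key ending in `_id` (other than the document's own `_id`) names the target node type. That detection rule and the `explicit_relationships` helper are assumptions for illustration; the slide only shows the resulting `session_account` relationship.

```python
def explicit_relationships(label, doc, unique_key="_id"):
    """List (source, relationship, target, target_id) for each '<name>_id' field."""
    rels = []
    for key, value in doc.items():
        if key != unique_key and key.endswith("_id"):
            target = key[: -len("_id")]  # "account_id" -> "account"
            rels.append((label, f"{label}_{target}", target, value))
    return rels


doc = {
    "name": "Hanneli",
    "account_id": "32434ab2341192",
    "url": "medium.com/@hannelita",
}
account_rels = explicit_relationships("session", doc)
```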

We also support a configuration file if you don't want to import all your data

We can specify the namespaces that we want to import:

"include": ["test.talks", "docs.info"] (config.json file)

It is also possible to specify the fields and collections via command line:

mongo-connector -m localhost:27017 -t http://localhost:7474/db/data -d neo4j_doc_manager -i room,timeslot,title
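To make the two filters concrete, here is a sketch of what they amount to. It does not reproduce Mongo Connector's own filtering code: `should_import` and `keep_fields` are hypothetical helpers, and always keeping `_id` is an assumption.

```python
def should_import(namespace, include):
    """True if the "db.collection" namespace is listed, or if there is no filter."""
    return not include or namespace in include


def keep_fields(doc, fields, unique_key="_id"):
    """Keep only the requested fields, plus the unique key (assumption)."""
    if not fields:
        return dict(doc)
    return {k: v for k, v in doc.items() if k in fields or k == unique_key}


include = ["test.talks", "docs.info"]
fields = ["room", "timeslot", "title"]
talk = {"_id": 1, "room": "Auditorium", "speaker": "Juergen Hoeller", "title": "Keynote"}
filtered = keep_fields(talk, fields)
```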

 

1. Data model is a challenge.

Different representations (Documents -> Graphs)

2. Avoiding orphan nodes

3. Batching - maximum of 10k per batch
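For the batching limit, a generic chunking sketch. Only the 10k figure comes from the slide; the `batches` helper is hypothetical and not taken from the project.

```python
from itertools import islice

MAX_BATCH = 10_000  # limit quoted on the slide


def batches(docs, size=MAX_BATCH):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


sizes = [len(batch) for batch in batches(range(25_000))]
```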

Projects

mongo-connector: 

https://github.com/10gen-labs/mongo-connector

neo4j-doc-manager: 

https://github.com/neo4j-contrib/neo4j_doc_manager

 

Next Projects

Neo4j Cassandra connector :) 

https://github.com/neo4j-contrib/neo4j-cassandra-connector

Lessons learned

  • Polyglot persistence is great
  • Graphs can be very useful for simplifying queries
  • Real applications: fraud detection
  • University (UK) is using it :)

Thank you :)

Questions?

 

hannelita@gmail.com

@hannelita

From Documents to Graphs - Pycon Ireland 2016

By Hanneli Tavante (hannelita)
