From documents
to graphs
#PyConIE
Hi!
- Computer Engineer
- Programming
- Electronics
- Math <3 <3
- Physics
- Lego
- Meetups
- Animals
- Coffee
- GIFs
- Pokémon
Keynote at Python Brasil:
http://slides.com/hannelitavante-hannelita/rust-type-system-pybr12
Disclaimer
Views are my own
Project from late 2015
Mostly for Neo4j 2.x
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
"We need to restructure our data"
"Relational databases are not enough"
Document Oriented DB
- Flexible data model
- Easy to get started
- Easy to represent the data
Store data as Documents!
Imagine that we have talks of a conference
Our Documents
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
Sometimes we need to get some extra information
Possible questions
- Which talks have a specific topic (e.g. 'Databases')?
- Which speakers will also talk about this topic?
- Which sessions will be held in the Auditorium and are about this topic?
These are common questions
More questions
- Assuming that I do not want to change rooms, which room should I stay in to attend the highest number of sessions on a specific topic?
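A minimal sketch of how two of these questions look when answered by scanning plain documents. The sample `talks` list and both helper names are made up for illustration; only the document shape follows the talk documents used in these slides.

```python
from collections import Counter

# Hypothetical sample data, shaped like the talk documents in these slides.
talks = [
    {"session": {"title": "Talk A"}, "topics": ["databases", "python"],
     "room": "Auditorium", "speaker": {"name": "Ann"}},
    {"session": {"title": "Talk B"}, "topics": ["databases"],
     "room": "Room 1", "speaker": {"name": "Bob"}},
    {"session": {"title": "Talk C"}, "topics": ["databases", "graphs"],
     "room": "Auditorium", "speaker": {"name": "Cat"}},
]


def talks_about(docs, topic):
    """Return the titles of all talks tagged with `topic`."""
    return [d["session"]["title"] for d in docs if topic in d["topics"]]


def best_room_for(docs, topic):
    """Return the room hosting the most sessions on `topic`, or None."""
    rooms = Counter(d["room"] for d in docs if topic in d["topics"])
    return rooms.most_common(1)[0][0] if rooms else None
```

Every new question needs another hand-written scan like these, which is the kind of work a graph query is meant to simplify.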
Further work
- Recommendation system for the talks
- Recommendation system for speakers
- Build a tool to automatically build the sessions timetable based on topic distribution
Looks like we need some graphs!
Graphs are everywhere
TEAM, Neo4j
We can build graphs with information from Mongo
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
From Documents to Graphs
Mongo Connector
https://github.com/10gen-labs/mongo-connector
Mongo Connector
- You call Mongo Connector (MC)
- MC: Hi!
- You point to where your Mongo is
- You point to where the other database is, behind a Doc Manager (DM): Elasticsearch, Solr
- MC creates a thread to watch Mongo actions (replica)
- MC calls actions on a Doc Manager
We can translate these actions
into a Graph Structure
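The watch-and-dispatch flow described above could be sketched roughly as follows. This is not Mongo Connector's code: the queue stands in for the stream of Mongo actions, the entry fields (`op`, `ns`, `ts`, `doc`) are a simplified, hypothetical shape, and `RecordingDocManager` is a made-up stub that only records what it was asked to do.

```python
import queue
import threading


class RecordingDocManager:
    """Hypothetical stand-in for a Doc Manager: it only records the calls."""

    def __init__(self):
        self.calls = []

    def upsert(self, doc, namespace, timestamp):
        self.calls.append(("upsert", doc["_id"], namespace))

    def remove(self, document_id, namespace, timestamp):
        self.calls.append(("remove", document_id, namespace))


def watch(actions, doc_manager):
    """Consume action entries until a None sentinel arrives."""
    while True:
        entry = actions.get()
        if entry is None:
            break
        if entry["op"] == "i":    # insert
            doc_manager.upsert(entry["doc"], entry["ns"], entry["ts"])
        elif entry["op"] == "d":  # delete
            doc_manager.remove(entry["doc"]["_id"], entry["ns"], entry["ts"])


def run_demo():
    """Feed two actions through a watcher thread and return the calls seen."""
    actions = queue.Queue()
    doc_manager = RecordingDocManager()
    worker = threading.Thread(target=watch, args=(actions, doc_manager))
    worker.start()
    actions.put({"op": "i", "ns": "test.talks", "ts": 1,
                 "doc": {"_id": 1, "room": "Auditorium"}})
    actions.put({"op": "d", "ns": "test.talks", "ts": 2, "doc": {"_id": 1}})
    actions.put(None)  # sentinel: stop the watcher
    worker.join()
    return doc_manager.calls
```

The point of the design is that the watcher knows nothing about the target database; swapping the Doc Manager is enough to sync somewhere else.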
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
Neo4j Doc Manager
mongo-connector (pip)
py2neo (neo4j)
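from mongo_connector.constants import DEFAULT_COMMIT_INTERVAL, DEFAULT_MAX_BULK
from mongo_connector.doc_managers.doc_manager_base import DocManagerBase

class DocManager(DocManagerBase):
    # Method bodies are omitted on the slide; "..." keeps the outline valid Python.
    def __init__(self, url, auto_commit_interval=DEFAULT_COMMIT_INTERVAL,
                 unique_key='_id', chunk_size=DEFAULT_MAX_BULK, **kwargs): ...
    def upsert(self, doc, namespace, timestamp): ...
    def bulk_upsert(self, docs, namespace, timestamp): ...
    def update(self, document_id, update_spec, namespace, timestamp): ...
    def remove(self, document_id, namespace, timestamp): ...
    def search(self, start_ts, end_ts): ...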
We can retrieve Mongo commands with this interface class
We support Python 2 and Python 3
It will run like an auto importer. You just need to provide the database endpoints
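To make the interface concrete, here is a toy doc manager that keeps documents in a dictionary instead of writing to Neo4j. It is only a sketch: `InMemoryDocManager` is a hypothetical class that does not inherit from mongo-connector, and it assumes `update_spec` carries MongoDB-style `$set` / `$unset` operators.

```python
class InMemoryDocManager:
    """Toy doc manager: keeps documents in a dict keyed by (namespace, id)."""

    def __init__(self, unique_key="_id"):
        self.unique_key = unique_key
        self.docs = {}

    def upsert(self, doc, namespace, timestamp):
        # Insert the document, or overwrite it if the id already exists.
        self.docs[(namespace, doc[self.unique_key])] = dict(doc)

    def bulk_upsert(self, docs, namespace, timestamp):
        for doc in docs:
            self.upsert(doc, namespace, timestamp)

    def update(self, document_id, update_spec, namespace, timestamp):
        doc = self.docs[(namespace, document_id)]
        # Assumption: update_spec uses MongoDB-style $set / $unset operators.
        doc.update(update_spec.get("$set", {}))
        for field in update_spec.get("$unset", {}):
            doc.pop(field, None)

    def remove(self, document_id, namespace, timestamp):
        self.docs.pop((namespace, document_id), None)
```

A graph-backed manager would keep the same method names and replace the dictionary operations with node and relationship writes.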
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
Sync Mongo with Neo4j
db.talks.insert( { "session": ...
Document:talks
Root node in Neo4j
{ "session": { "title": "12 Years of Spring: An Open Source Journey" }, "topics": ["keynote", "spring"], "room": "Auditorium", "speaker": { "name": "Juergen Hoeller" } }
Document:session
Document:speaker
JSON properties become node properties
All the nodes are connected to the root node
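The mapping rules above can be sketched as a small function that splits one document into a root node, child nodes and relationships. This is an illustration, not the neo4j_doc_manager code: `document_to_graph` is a hypothetical helper, the `parent_child` relationship naming is an assumption, and lists are simply left on the root node here.

```python
def document_to_graph(collection, doc):
    """Split one document into a root node, child nodes and relationships.

    Returns (nodes, relationships): nodes maps a label to its properties,
    relationships is a list of (parent_label, name, child_label) tuples.
    """
    nodes = {collection: {}}
    relationships = []
    for key, value in doc.items():
        if isinstance(value, dict):
            # Nested document: gets its own node, linked to the root node.
            nodes[key] = dict(value)
            relationships.append((collection, f"{collection}_{key}", key))
        else:
            # Scalars (and, in this sketch, lists) become root node properties.
            nodes[collection][key] = value
    return nodes, relationships
```

Applied to the talk document above with `collection="talks"`, this gives a `talks` root node holding `topics` and `room`, plus `session` and `speaker` nodes linked to it.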
Nested documents
"session" : { "title" : "12 Years of Spring: An Open Source Journey", "abstract" : "Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.", "conference" : { "city" : "London" } }
Nested documents
"session" : { "title" : "12 Years of Spring: An Open Source Journey", "abstract" : "Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.", "conference" : { "city" : "Dublin" } }
Nested documents
Document:session
Document:conference
Child node
Parent node
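For nested documents, one way to picture the parent/child structure is a recursive walk that emits a pair for each nesting level. `parent_child_pairs` is a hypothetical helper; it links each nested document to its direct parent, as the diagram labels suggest, and does not say anything about extra links back to the root.

```python
def parent_child_pairs(label, doc):
    """Yield (parent_label, child_label) for every nested document."""
    for key, value in doc.items():
        if isinstance(value, dict):
            yield (label, key)
            # Recurse so deeper levels hang off their direct parent.
            yield from parent_child_pairs(key, value)
```

With a document whose `session` contains a `conference`, the walk yields the `session` pair first and the `conference` pair under it.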
JSON array
"session" : { "tracks": [{ "main":"Python" }, { "second":"Data" }] ... }
JSON array
Document:session
Document:track0
talks_track0
talks_track1
Document:track1
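A sketch of how array items could turn into indexed nodes. The label and relationship names mirror the ones on the slide (`track0`, `talks_track0`), but how they are derived is an assumption: `array_to_nodes` is a hypothetical helper that drops a trailing "s" from the key and takes the relationship prefix as a parameter.

```python
def array_to_nodes(parent, key, items, prefix):
    """Create one node per array item, labelled with the item's index.

    Returns (nodes, relationships) in the same shape as a label -> properties
    dict and a list of (parent_label, name, child_label) tuples.
    """
    # Assumption: naive singular form, e.g. "tracks" -> "track".
    base = key[:-1] if key.endswith("s") else key
    nodes = {}
    relationships = []
    for index, item in enumerate(items):
        label = f"{base}{index}"
        nodes[label] = dict(item)
        relationships.append((parent, f"{prefix}_{label}", label))
    return nodes, relationships
```

Indexing the labels keeps array order visible in the graph, at the cost of one label per position.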
We also support explicit ids to create a relationship
Explicit ids
{ "name": "Hanneli", "account_id": "32434ab2341192", "url": "medium.com/@hannelita" }
session_account
Document:session
Document:account
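One possible reading of the explicit id rule, as a sketch: any key ending in `_id` (other than `_id` itself) is treated as a reference to another node. `explicit_id_links` is a hypothetical helper, and both the suffix convention and the `parent_target` relationship name are assumptions based on the slide labels.

```python
def explicit_id_links(label, doc, suffix="_id"):
    """Return (source_label, relationship, target_label, target_id) tuples."""
    links = []
    for key, value in doc.items():
        # Skip the document's own "_id"; only "<name>_id" keys are references.
        if key.endswith(suffix) and len(key) > len(suffix):
            target = key[: -len(suffix)]
            links.append((label, f"{label}_{target}", target, value))
    return links
```

For the document above, the only match is `account_id`, which yields a `session_account` link to an `account` node.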
We also support a configuration file if you don't want to import all your data
We can specify the namespaces that we want to import:
"include": ["test.talks", "docs.info"] (config.json file)
It is also possible to specify the fields and collections via command line:
mongo-connector -m localhost:27017 -t http://localhost:7474/db/data -d neo4j_doc_manager -i room,timeslot,title
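The effect of both options can be sketched as a filter applied before a document reaches the Doc Manager. `select` is a hypothetical helper, not part of mongo-connector, and keeping `_id` even when it is not listed is an assumption of this sketch.

```python
def select(namespace, doc, include, fields=None):
    """Return the part of `doc` to import, or None if it is filtered out."""
    if namespace not in include:
        return None  # namespace not listed in "include": skip the document
    if fields is None:
        return dict(doc)  # no field filter: import everything
    # Assumption: "_id" is always kept so the document stays identifiable.
    return {k: v for k, v in doc.items() if k in fields or k == "_id"}
```

For example, with `include=["test.talks", "docs.info"]` and `fields={"room", "timeslot", "title"}`, a talk from `test.talks` keeps only `_id` and the listed fields it actually has, and anything from another namespace is dropped.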
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
1. Data model is a challenge.
Different representations (Documents -> Graphs)
2. Avoiding orphan nodes
3. Batching - maximum of 10k per batch
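The batching limit in point 3 can be respected by chunking work before sending it. A minimal sketch, assuming the 10k figure from the slide as the default; `batches` is a hypothetical helper and says nothing about how neo4j_doc_manager itself does it.

```python
from itertools import islice


def batches(items, size=10_000):
    """Yield lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because it works on any iterable, the same helper fits a cursor over documents as well as an in-memory list.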
Projects
mongo-connector:
https://github.com/10gen-labs/mongo-connector
neo4j-doc-manager:
https://github.com/neo4j-contrib/neo4j_doc_manager
Next Projects
Neo4j Cassandra connector :)
https://github.com/neo4j-contrib/neo4j-cassandra-connector
Lessons learned
- Polyglot persistence is great
- Graphs can be very useful for simplifying queries
- Real applications: fraud detection
- University (UK) is using it :)
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
From Documents to Graphs - Pycon Ireland 2016
By Hanneli Tavante (hannelita)