From documents to graphs
#BuildStuffLT @hannelita
Slides at http://bit.ly/2fYqONz
This talk is about MongoDB and Neo4j :)
Code
http://bit.ly/2g15MPW
Hi!
- Computer Engineer
- Programming
- Electronics
- Math <3 <3
- Physics
- Lego
- Meetups
- Animals
- Coffee
- GIFs
- Pokémon
Disclaimer
Views are my own
Project from late 2015
Mostly for Neo4j 2.x
Project
https://github.com/neo4j-contrib/neo4j_doc_manager
Agenda
- Quick note about document oriented databases
- Graph databases can help your data model
- Creating connectors for MongoDB
- neo4j_doc_manager general architecture
- Data mapping
- Challenges
"We need to restructure our data"
"Relational databases are not enough"
"Polyglot Databases"
Document Oriented DB
- Flexible data model
- Easy to get started
- Easy to represent the data
Store data as Documents!
Imagine that we are storing the talks of a conference
Our Documents
[Figure: the talk documents]
Graph databases can help your data model
Sometimes we need to get some extra information
Possible questions
- Which talks cover a specific topic (e.g. 'Databases')?
- Which speakers will also talk about this topic?
- Which sessions will be held in the Auditorium and are about this topic?
These are common questions
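As a rough illustration, the questions above could be answered directly over the documents in plain Python. The field layout (session, topics, room, speaker) follows the example talk document used in this deck; the sample talks themselves and the function names are made up for this sketch.

```python
# Hypothetical talk documents; only the field layout follows the deck.
talks = [
    {"session": {"title": "Intro to Graphs"}, "topics": ["databases", "graphs"],
     "room": "Auditorium", "speaker": {"name": "Alice"}},
    {"session": {"title": "Scaling Documents"}, "topics": ["databases"],
     "room": "Room B", "speaker": {"name": "Bob"}},
    {"session": {"title": "Lego Robots"}, "topics": ["hardware"],
     "room": "Auditorium", "speaker": {"name": "Alice"}},
]


def talks_about(docs, topic):
    """Titles of the talks tagged with the given topic."""
    return [d["session"]["title"] for d in docs if topic in d["topics"]]


def speakers_about(docs, topic):
    """Sorted names of the speakers with at least one talk on the topic."""
    return sorted({d["speaker"]["name"] for d in docs if topic in d["topics"]})


def sessions_in_room_about(docs, room, topic):
    """Titles of the talks on the topic that take place in the given room."""
    return [d["session"]["title"] for d in docs
            if d["room"] == room and topic in d["topics"]]


print(talks_about(talks, "databases"))
print(speakers_about(talks, "databases"))
print(sessions_in_room_about(talks, "Auditorium", "databases"))
```

Each question is a full scan with its own filter, which works for a handful of talks but gets harder to maintain as the questions start to combine talks, speakers and rooms.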
More questions
- Assuming that I do not want to change rooms, which room should I stay in to see the most sessions on a specific topic?
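A minimal sketch of the "best room" question over plain documents, assuming each talk has a room field and a topics list as in the deck's example document. The sample data and the best_room helper are hypothetical.

```python
from collections import Counter


def best_room(docs, topic):
    """Return (room, count) for the room hosting the most talks on `topic`.

    Returns None when no talk covers the topic.
    """
    counts = Counter(d["room"] for d in docs if topic in d["topics"])
    return counts.most_common(1)[0] if counts else None


# Hypothetical sample data for illustration only.
talks = [
    {"room": "Auditorium", "topics": ["databases", "graphs"]},
    {"room": "Auditorium", "topics": ["databases"]},
    {"room": "Room B", "topics": ["databases"]},
    {"room": "Room B", "topics": ["hardware"]},
]

print(best_room(talks, "databases"))
```

This is an aggregation rather than a lookup, which is the kind of question that becomes more natural once rooms and topics are connected nodes.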
Further work
- Recommendation system for the talks
- Recommendation system for speakers
- A tool that automatically builds the session timetable based on topic distribution
Looks like we need some graphs!
Graphs are everywhere (Neo4j team)
We can build graphs with information from Mongo
Creating connectors for MongoDB
From Documents to Graphs
Neo4j super quick reference
- Graph oriented database
- Pure graph structure that you can persist
- Benefits of graph theory
- Large and active community
- Developed by Neo Technology
Mongo Connector
https://github.com/10gen-labs/mongo-connector
Mongo Connector
- You call Mongo Connector (MC)
- You point it at your Mongo
- You point it at the other database through a Doc Manager (DM), e.g. Elasticsearch or Solr
- MC creates a thread to watch Mongo actions (on a replica set)
- MC calls the matching actions on the Doc Manager
We can translate these actions into a Graph Structure
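The "watch actions, then call the Doc Manager" step can be pictured with a small dispatcher. This is only a sketch, not Mongo Connector's actual code: the entries are simplified oplog-style dictionaries (op "i", "u", "d" for insert, update, delete), and RecordingDocManager and dispatch are hypothetical names. The method signatures match the DocManager interface shown in this deck.

```python
class RecordingDocManager:
    """Hypothetical doc manager that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def upsert(self, doc, namespace, timestamp):
        self.calls.append(("upsert", doc["_id"], namespace))

    def update(self, document_id, update_spec, namespace, timestamp):
        self.calls.append(("update", document_id, namespace))

    def remove(self, document_id, namespace, timestamp):
        self.calls.append(("remove", document_id, namespace))


def dispatch(entry, doc_manager):
    """Route one simplified oplog-style entry to the matching method."""
    op, ns, ts = entry["op"], entry["ns"], entry["ts"]
    if op == "i":    # insert: "o" holds the full document
        doc_manager.upsert(entry["o"], ns, ts)
    elif op == "u":  # update: "o2" identifies the document, "o" is the change
        doc_manager.update(entry["o2"]["_id"], entry["o"], ns, ts)
    elif op == "d":  # delete: "o" holds the _id of the removed document
        doc_manager.remove(entry["o"]["_id"], ns, ts)
    # Any other operation type is ignored in this sketch.


manager = RecordingDocManager()
entries = [
    {"op": "i", "ns": "test.talks", "ts": 1, "o": {"_id": 1, "room": "Auditorium"}},
    {"op": "u", "ns": "test.talks", "ts": 2, "o2": {"_id": 1},
     "o": {"$set": {"room": "Room B"}}},
    {"op": "d", "ns": "test.talks", "ts": 3, "o": {"_id": 1}},
]
for entry in entries:
    dispatch(entry, manager)
print(manager.calls)
```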
neo4j_doc_manager general architecture
Neo4j Doc Manager
mongo-connector (pip)
py2neo (neo4j)
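from mongo_connector.constants import DEFAULT_COMMIT_INTERVAL, DEFAULT_MAX_BULK
from mongo_connector.doc_managers.doc_manager_base import DocManagerBase

class DocManager(DocManagerBase):
    # Method bodies are omitted here; see the project repository for the implementation.
    def __init__(self, url, auto_commit_interval=DEFAULT_COMMIT_INTERVAL,
                 unique_key='_id', chunk_size=DEFAULT_MAX_BULK, **kwargs): ...
    def upsert(self, doc, namespace, timestamp): ...  # insert or replace one document
    def bulk_upsert(self, docs, namespace, timestamp): ...  # insert or replace many documents
    def update(self, document_id, update_spec, namespace, timestamp): ...  # apply a partial update
    def remove(self, document_id, namespace, timestamp): ...  # delete one document
    def search(self, start_ts, end_ts): ...  # documents changed within a time range
Mongo Connector calls these methods, so this interface class is where we receive the Mongo commands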
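To make the interface more concrete, here is a toy doc manager that keeps documents in a dictionary instead of writing to Neo4j. It is a sketch under simplifying assumptions: it does not inherit from DocManagerBase, the class name InMemoryDocManager is made up, and the update handling only covers top-level fields in "$set" and "$unset" (no dotted paths, no full-document replacement).

```python
class InMemoryDocManager:
    """Toy target that stores documents by (namespace, id) in memory."""

    def __init__(self, unique_key="_id"):
        self.unique_key = unique_key
        self.docs = {}  # (namespace, document id) -> document

    def upsert(self, doc, namespace, timestamp):
        # Insert the document, or overwrite an existing one with the same id.
        self.docs[(namespace, doc[self.unique_key])] = dict(doc)

    def bulk_upsert(self, docs, namespace, timestamp):
        for doc in docs:
            self.upsert(doc, namespace, timestamp)

    def update(self, document_id, update_spec, namespace, timestamp):
        # Only top-level "$set" and "$unset" are handled in this sketch.
        doc = self.docs[(namespace, document_id)]
        doc.update(update_spec.get("$set", {}))
        for field in update_spec.get("$unset", {}):
            doc.pop(field, None)
        return doc

    def remove(self, document_id, namespace, timestamp):
        self.docs.pop((namespace, document_id), None)


dm = InMemoryDocManager()
dm.upsert({"_id": 1, "room": "Auditorium", "timeslot": "10:00"}, "test.talks", 1)
dm.update(1, {"$set": {"room": "Room B"}, "$unset": {"timeslot": ""}}, "test.talks", 2)
print(dm.docs)
```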
We support Python 2 and Python 3
It runs like an automatic importer: you only need to provide the database endpoints
We track the auto-generated nodes with the label :Document
Data mapping
Sync Mongo with Neo4j
db.talks.insert( { "session": ...
Document:talks (root node in Neo4j)
{
  "session": { "title": "12 Years of Spring: An Open Source Journey" },
  "topics": ["keynote", "spring"],
  "room": "Auditorium",
  "speaker": { "name": "Juergen Hoeller" }
}
Document:session
Document:speaker
JSON properties become node properties
All the nodes are connected to the root node
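The mapping described so far can be sketched in a few lines of plain Python. This is an illustration, not the neo4j_doc_manager implementation: the function name document_to_graph is hypothetical, nodes are represented as tuples instead of being written to Neo4j, and the "parent_child" relationship naming is an assumption based on the labels shown in this deck.

```python
def document_to_graph(collection, doc):
    """Split a document into nodes and relationships.

    nodes: list of (node_type, properties)
    relationships: list of (parent_type, relationship_name, child_type)
    """
    nodes, relationships = [], []

    def visit(node_type, data, parent):
        # Non-dict values (scalars and lists) stay on the node as properties.
        props = {k: v for k, v in data.items() if not isinstance(v, dict)}
        nodes.append((node_type, props))
        if parent is not None:
            # Assumed naming convention: "<parent>_<child>".
            relationships.append((parent, parent + "_" + node_type, node_type))
        # Each nested document becomes its own node, linked to its parent.
        for key, value in data.items():
            if isinstance(value, dict):
                visit(key, value, node_type)

    visit(collection, doc, None)
    return nodes, relationships


talk = {
    "session": {"title": "12 Years of Spring: An Open Source Journey"},
    "topics": ["keynote", "spring"],
    "room": "Auditorium",
    "speaker": {"name": "Juergen Hoeller"},
}
nodes, relationships = document_to_graph("talks", talk)
print(nodes)
print(relationships)
```

The collection name becomes the root node, and both session and speaker end up connected to it.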
Nested documents
"session" : { "title" : "12 Years of Spring: An Open Source Journey", "abstract" : "Spring emerged as a core open source project in early 2003 and evolved to a broad portfolio of open source projects up until 2015.", "conference" : { "city" : "London" } }
Document:session (parent node)
Document:conference (child node)
JSON array
"session" : { "tracks": [{ "main":"Python" }, { "second":"Data" }] ... }
Document:session
Document:track0 (relationship talks_track0)
Document:track1 (relationship talks_track1)
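A possible way to picture the array mapping: each element gets its own node, named after the array key plus its index. The helper below is hypothetical, and dropping a trailing "s" to turn "tracks" into "track" is a guess made only to reproduce the track0 and track1 labels above, not a documented rule.

```python
def array_to_nodes(root, key, items):
    """Turn a JSON array of objects into indexed child nodes.

    Returns a list of (relationship_name, node_type, properties).
    """
    # Assumption: "tracks" -> "track" by dropping a trailing "s".
    singular = key[:-1] if key.endswith("s") else key
    result = []
    for index, item in enumerate(items):
        node_type = singular + str(index)
        result.append((root + "_" + node_type, node_type, dict(item)))
    return result


tracks = [{"main": "Python"}, {"second": "Data"}]
for entry in array_to_nodes("talks", "tracks", tracks):
    print(entry)
```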
We also support explicit ids to create a relationship
Explicit ids
{ "name": "Hanneli", "account_id": "32434ab2341192", "url": "medium.com/@hannelita" }
Document:session
Document:account (relationship session_account)
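One way to read the explicit id rule: a property whose name ends in "_id" points at another node type. The sketch below assumes that convention and the "<source>_<target>" relationship name seen above; explicit_id_links is a hypothetical helper, not a function from the project.

```python
def explicit_id_links(node_type, props, suffix="_id"):
    """Return (relationship_name, target_type, target_id) per `<name>_id` key."""
    links = []
    for key, value in props.items():
        # Skip the document's own "_id"; only "<name>_id" keys are references.
        if key.endswith(suffix) and key != suffix:
            target = key[:-len(suffix)]
            links.append((node_type + "_" + target, target, value))
    return links


doc = {
    "name": "Hanneli",
    "account_id": "32434ab2341192",
    "url": "medium.com/@hannelita",
}
print(explicit_id_links("session", doc))
```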
We also support a configuration file if you don't want to import all your data
We can specify the namespaces that we want to import:
"include": ["test.talks", "docs.info"] (config.json file)
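Namespace filtering can be pictured as a simple membership check on "database.collection" names. This sketch assumes the "include" key sits at the top level of the JSON, as in the fragment above; the real config.json layout may nest it differently, and both helper names are made up.

```python
import json


def load_included_namespaces(config_text):
    """Read the (assumed top-level) "include" list from a JSON config string."""
    config = json.loads(config_text)
    return set(config.get("include", []))


def should_import(namespace, included):
    """Import everything when no include list is given, else only listed names."""
    return not included or namespace in included


config_text = '{"include": ["test.talks", "docs.info"]}'
included = load_included_namespaces(config_text)
for ns in ["test.talks", "test.speakers", "docs.info"]:
    print(ns, should_import(ns, included))
```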
It is also possible to specify the fields and collections via command line:
mongo-connector -m localhost:27017 -t http://localhost:7474/db/data -d neo4j_doc_manager -i room,timeslot,title
Challenges
1. Data model is a challenge.
Different representations (Documents -> Graphs)
2. Avoiding orphan nodes
Remove, set and unset commands can leave orphan nodes behind
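A small sketch of what "orphan" means here: a node that can no longer be reached from any root node after an update. The graph is modelled with plain sets and tuples, and find_orphans is a hypothetical helper, not part of neo4j_doc_manager.

```python
def find_orphans(nodes, relationships, roots):
    """Return the nodes that cannot be reached from any root.

    nodes: iterable of node ids
    relationships: iterable of (parent_id, child_id)
    roots: iterable of root node ids
    """
    nodes = set(nodes)
    children = {}
    for parent, child in relationships:
        children.setdefault(parent, []).append(child)

    # Depth-first walk from every root that still exists.
    seen, stack = set(), [r for r in roots if r in nodes]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return nodes - seen


# Before: talks -> session -> conference. Suppose an unset removes the session
# node and its relationships, but the nested conference node is left behind.
nodes_after = {"talks:1", "conference:1"}
relationships_after = []
print(find_orphans(nodes_after, relationships_after, roots={"talks:1"}))
```

A cleanup step would have to delete whatever this returns, or the removal logic has to cascade to nested nodes in the first place.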
3. Batching - maximum of 10k per batch
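Respecting a per-batch limit usually comes down to slicing the pending work into chunks. A minimal sketch, assuming the items are held in a list; the batches helper and its max_size default of 10,000 are illustrative, taken from the limit quoted above.

```python
def batches(items, max_size=10000):
    """Yield consecutive slices of `items` with at most `max_size` elements."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    for start in range(0, len(items), max_size):
        yield items[start:start + max_size]


docs = [{"_id": i} for i in range(25000)]
print([len(batch) for batch in batches(docs)])
```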
Projects
mongo-connector:
https://github.com/10gen-labs/mongo-connector
neo4j-doc-manager:
https://github.com/neo4j-contrib/neo4j_doc_manager
Next Projects
Neo4j Cassandra connector :)
https://github.com/neo4j-contrib/neo4j-cassandra-connector
Lessons learned
- Polyglot persistence is great; be responsible!
- Graphs can be very useful for simplifying queries
- Real applications: fraud detection
- University (UK) is using it :)
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
From Documents to Graphs - Buildstuff.lt
By Hanneli Tavante (hannelita)