Graphs and Neo4j - from Hydropower Plants to PCBs
#qconsp @hannelita
Hi!
- Computer Engineering
- Programming
- Electronics
- Mathematics
- Physics
- Lego
- Meetups
- Coffee
- GIFs
Disclaimer
This content represents the speaker's personal overview.
Feedback (positive or not) accepted here: hannelita@gmail.com
Not all numeric data or names are real, to avoid exposing company secrets.
The modelling cases are real; apologies for the flood of technical terms.
Structure - Cases
- Use case context
- Modelling with relational databases (and fails)
- Graph modelling
- Evolving the model
- Epic fails
Final Considerations
- Main benefits
- Support tools
Have you ever been left in the dark?
Running out of electrical energy
Candle lights <3
Brazil is a huge producer of electrical energy.
Especially from hydropower plants
Case 1 - Context
How do we distribute electrical energy? How are the power plants distributed?
http://sigel.aneel.gov.br/sigel.html
Accessed on 28/3/2016
Map information
- Power plant location
- Transmission lines
- Supply capacity
- Total capacity
- Nearby cities
- Distribution
- Boundaries / States
- Hydrographic basin
- Dealers
Electrical
Political
Environmental
Economy
Challenge
Build a system that stores all this information and how the data is related.
Case 1 - Modelling with relational databases
Electrical
Political
Environmental
Economy
CREATE TABLE power_plant;
CREATE TABLE city;
CREATE TABLE hydrographic_basin;
CREATE TABLE dealer;
Question 1:
How do you represent a power plant neighbourhood?
- Self-relationship;
- Denormalisation (neighbours_ids)
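To make the self-relationship option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name power_plant_neighbour, its columns, and the sample plant names are assumptions made for illustration; they do not come from the talk.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE power_plant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Self-relationship: each row links two rows of the same table.
CREATE TABLE power_plant_neighbour (
    plant_id     INTEGER NOT NULL REFERENCES power_plant(id),
    neighbour_id INTEGER NOT NULL REFERENCES power_plant(id),
    PRIMARY KEY (plant_id, neighbour_id)
);
INSERT INTO power_plant VALUES (1, 'Plant A'), (2, 'Plant B'), (3, 'Plant C');
INSERT INTO power_plant_neighbour VALUES (1, 2), (2, 3);
""")


def neighbours(plant_id):
    """Return the names of the direct neighbours of one plant, in either direction."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM power_plant_neighbour AS n
        JOIN power_plant AS p
          ON p.id = CASE WHEN n.plant_id = :id THEN n.neighbour_id ELSE n.plant_id END
        WHERE :id IN (n.plant_id, n.neighbour_id)
        ORDER BY p.name
        """,
        {"id": plant_id},
    ).fetchall()
    return [name for (name,) in rows]


print(neighbours(2))
```

Even for direct neighbours the query needs a join and a CASE expression; neighbours of neighbours would need one more join per hop, which is where this approach starts to hurt.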
Question 2:
Which is the best power plant to provide energy for a group of cities?
city
id | usage (month, in MWh) | population (million) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 | |
power_plant
id | capacity (MWh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 | |
1. Given a coordinate data set, sum the population inside the resultant polygon.
2. Match power plant coordinates based on supply capacity
3. Verify the properties in the transmission_lines table
It is not that difficult
It is not over!
4. Verify if there are industries nearby
5. Verify the HDI (Human Development Index)
6. Verify the dealers' interest.
7. Verify if the region has alternative energy sources
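As a rough illustration of how only the capacity-matching step might look in application code, here is a minimal Python sketch. The class names, the function candidate_plants, and the tie-break rule (least spare capacity first) are assumptions for illustration; the numbers are the sample values from the city and power plant tables. The geographic, economic, and political checks from the other steps are not covered.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    usage_mwh: float  # monthly usage, as in the city table


@dataclass
class PowerPlant:
    name: str
    capacity_mwh: float  # capacity, as in the power plant table


def candidate_plants(cities, plants):
    """Keep the plants whose capacity covers the combined usage of the cities."""
    demand = sum(city.usage_mwh for city in cities)
    fitting = [plant for plant in plants if plant.capacity_mwh >= demand]
    # Assumed tie-break: prefer the plant with the least spare capacity.
    return sorted(fitting, key=lambda plant: plant.capacity_mwh - demand)


# Sample values taken from the two tables above.
CITIES = [City("city 1", 40), City("city 2", 11)]
PLANTS = [PowerPlant("plant 1", 95), PowerPlant("plant 2", 11)]

print([plant.name for plant in candidate_plants(CITIES, PLANTS)])
```

Every further step in the list adds another table lookup or join on top of this filter.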
Question 3:
Assuming that hydropower plants work like a tug-of-war with multiple endpoints, how do you redistribute the electrical load if one plant shuts down?
Maybe tables are not the best structures to represent information about energy distribution.
Neo4j comes to the rescue!
Quick intro - Neo4j
- Graph-oriented database
- ACID
- Structures: Node, Relationship, Index and Label
- Maintained by Neo Technology
- Open Source
- Active community
Case 1 - Graph Modelling
Step 1 - Power plants become nodes
Powered by Arrows - http://www.apcjones.com/arrows/#
CREATE (n:PowerPlant:HydropowerPlant { name : 'Itaipu', capacity : 14000 })
Usina => Power Plant
Hidreletrica => Hydropower
capacidade => capacity
Step 2 - Cities become nodes
Step 3 - Transmission lines become relationships!
Itaipu - Ivaiporã
MATCH (a:HydropowerPlant),(b:City)
WHERE a.name = 'Itaipu' AND b.name = 'Ivaipora'
CREATE (a)-[r:PROVIDES { cable_capacity : 765, rl : 330 }]->(b)
Multiple relationships for several lines
MATCH (a:HydropowerPlant),(b:City)
WHERE a.name = 'Itaipu' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:PROVIDES { cable_capacity : 500 }]->(b);
MATCH (a:City),(b:City)
WHERE a.name = 'Ivaipora' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:MESH { cable_capacity : 500 }]->(b)
Step 4 - Dealers become nodes
CREATE (n:Dealer { name : 'Fake',
percentage : 85, margin : 72 });
MATCH (a:Dealer),(b:City)
WHERE a.name = 'Fake' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:ATTENDS]->(b);
MATCH (a:Dealer),(b:PowerPlant)
WHERE a.name = 'Fake' AND b.name = 'Ita'
CREATE (a)-[r:OWNS]->(b)
Step 5 - Powerful queries
MATCH (n:PowerPlant {capacity : 14000}),
(c:City {name : 'Sao Paulo'}),
p = shortestPath((n)-[*]-(c)) RETURN p
Queries determine optimal paths for energy supply
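For readers who want to see what a shortest-path search does under the hood, here is a minimal breadth-first search sketch in plain Python. It is not how Neo4j implements shortestPath; the GRID dictionary is a hand-written stand-in whose edges mirror the PROVIDES and MESH relationships created earlier, and the function name is an assumption.

```python
from collections import deque

# Undirected adjacency list mirroring the PROVIDES and MESH relationships above.
GRID = {
    "Itaipu": ["Ivaipora", "Cascavel Oeste"],
    "Ivaipora": ["Itaipu", "Cascavel Oeste"],
    "Cascavel Oeste": ["Itaipu", "Ivaipora"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the path with the fewest hops, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(GRID, "Itaipu", "Cascavel Oeste"))
```

This sketch counts hops only; it ignores relationship properties such as cable_capacity, which a real supply decision would have to weigh.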
Case 1 - Evolving the model
Important: add Indexes for the most frequently used properties
Capacity, population, coordinates
Important[2]: Labels
:City, :PowerPlant, :Region
Usually, elements that can be grouped deserve a label.
More evolving - turn other electrical elements into nodes
CREATE (n:Component:Transformer
{ tag : 'F. Iguacu', type : 'Tertiary', mva : 1650, total : 4 });
MATCH (a:Transformer),(b:PowerPlant)
WHERE a.tag = 'F. Iguacu' AND b.name = 'Itaipu'
CREATE (a)-[r:INSTALLED]->(b)
Neo4j is flexible for modelling.
Case 1 - Epic Fails
Problem: Too many nodes for cities! (There are too many cities)
Impact: Too much information being loaded on MATCH; performance problems
Solution: Remove some :City nodes and add a :Region label, grouping cities
Problem: Not saving all the CREATE operations to a file
Impact: Problems with backup / replication.
Solution: Do not run CREATE operations in the web interface! Add the queries to a Git repository - https://github.com/hannelita/qconsp
Case 1 - Extra - Insights
Find hidden information
Example: mapping the components made a big difference for a deeper model evaluation.
Mapping components...
Case 2 - Context
A-HA! We could use graphs for (...) [complete]
PCB routing / trail (trace) design
Yes! But we can go further.
Let's analyse the board layout and the component placement.
Case 2 - Modelling with relational databases
Component
Trail
Sensor
Layer
CREATE TABLE component;
CREATE TABLE trail;
CREATE TABLE sensor;
CREATE TABLE layer;
Question 1:
A sensor detects a temperature rise. How would you infer whether the problem comes from a component or from the trail?
Question 1
Usually you need extra information from the neighbouring sensors. How do you model that?
- Self-relationship;
- Denormalise (sensors_ids)
Déjà vu!
Question 2:
Which trails affect the most components at the same time? (e.g. if Trail A breaks, the entire system stops working)
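One way to reason about this question outside the database is to treat trails as edges and ask how many components lose their connection to a source when each trail is removed. The sketch below does that in plain Python. The board layout, the node name SRC, and both function names are hypothetical and not taken from the talk.

```python
def reachable(edges, start):
    """Return every node connected to `start` through the given undirected edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency.get(stack.pop(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def trail_impact(edges, source):
    """Map each trail to the number of nodes cut off from `source` if it breaks."""
    baseline = reachable(edges, source)
    impact = {}
    for trail in edges:
        remaining = [edge for edge in edges if edge != trail]
        impact[trail] = len(baseline - reachable(remaining, source))
    return impact


# Hypothetical board: one source feeding three components through four trails.
TRAILS = [("SRC", "R1"), ("R1", "CI1"), ("R1", "C1"), ("C1", "CI1")]

impact = trail_impact(TRAILS, "SRC")
print(max(impact, key=impact.get), impact)
```

On this toy board only the trail leaving the source is critical, because the other three trails form a loop that offers an alternative route.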
Question 3:
Is it possible to extract some hidden or unseen information from the circuit by modelling it as a graph?
Case 2 - Graph modelling
Step 1: Components become nodes
CREATE (n:Component:Primary { name : 'R1',
type : 'resistor', value : '10K' });
CREATE (n:Component:Primary { name : 'C1',
type : 'capacitor', group : 'polyester',
value : '100p' });
CREATE (n:Component:CI { name : 'CI1',
type : 'LM741', seller : 'Texas' })
Step 2: Map trails into relationships
MATCH (a:Primary),(c:CI)
WHERE a.name = 'R1' AND c.name = 'CI1'
CREATE (a)-[r:TRAILS { thickness : 2, dilation : 0.5 }]->(c)
Step 3: Map Layers into Labels
CREATE (n:Component:Primary:LAYER1
{ name : 'R1', type : 'resistor', value : '10K' });
CREATE (n:Component:Primary:LAYER2
{ name : 'C1', type : 'capacitor',
group : 'polyester', value : '100p' });
CREATE (n:Component:CI:LAYER1 { name : 'CI1',
type : 'LM741', seller : 'Texas' })
Easy to fetch all the components from a specific Layer
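The idea behind labels can be pictured as a set of tags on each node: fetching a layer is then a subset test. The Python sketch below only illustrates that idea with the three components created above; it says nothing about how Neo4j stores or indexes labels, and the function name with_labels is an assumption.

```python
# Each node carries a set of labels, mirroring the CREATE statements above.
NODES = [
    {"labels": {"Component", "Primary", "LAYER1"}, "name": "R1"},
    {"labels": {"Component", "Primary", "LAYER2"}, "name": "C1"},
    {"labels": {"Component", "CI", "LAYER1"}, "name": "CI1"},
]


def with_labels(nodes, *labels):
    """Return the names of the nodes that carry every requested label."""
    wanted = set(labels)
    return [node["name"] for node in nodes if wanted <= node["labels"]]


print(with_labels(NODES, "Component", "LAYER1"))
```

Because a node can carry several labels at once, the same component can be selected by role (Primary, CI) or by layer without duplicating data.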
Case 2 - Evolving the model
Step 4: Map sensors into nodes
CREATE (n:Sensor:LAYER1
{ name : 'SS1', type : 'light'});
CREATE (n:Sensor:LAYER2
{ name : 'SS2', type : 'temperature' });
MATCH (a:Primary),(s:Sensor)
WHERE a.name = 'R1' AND s.name = 'SS1'
CREATE (s)-[r:MONITORS { light : 2 }]->(a);
MATCH (a:Primary),(s:Sensor)
WHERE a.name = 'R1' AND s.name = 'SS2'
CREATE (s)-[r:MONITORS { temperature : 37 }]->(a)
Step 5: Run the following periodic query:
MATCH (s:Sensor)-[m:MONITORS]->(c:Component)-[t:TRAILS]-()
WHERE m.temperature > 60
RETURN c.name, t.dilation
Decide whether it is the component or the trail that is damaged.
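The talk does not spell out the decision rule, so the following Python sketch is only one possible reading: rows returned by the periodic query are flagged when the temperature exceeds the query's threshold, and the blame goes to the trail when its dilation is above a limit. The DILATION_LIMIT value, the row layout, and the function name diagnose are all assumptions.

```python
TEMPERATURE_LIMIT = 60  # threshold used in the periodic query
DILATION_LIMIT = 1.0    # assumed value, not from the talk


def diagnose(rows):
    """Classify hot components.

    rows: iterable of (component_name, temperature, trail_dilation) tuples,
    the shape assumed here for the periodic query's results.
    """
    findings = {}
    for name, temperature, dilation in rows:
        if temperature <= TEMPERATURE_LIMIT:
            continue  # within limits, nothing to report
        # Assumed rule: a strongly dilated trail points to the trail itself.
        findings[name] = "trail" if dilation > DILATION_LIMIT else "component"
    return findings


print(diagnose([("R1", 75, 0.5), ("C1", 80, 1.4), ("CI1", 37, 0.5)]))
```

A real rule would likely combine readings from neighbouring sensors, which is exactly the kind of traversal the graph model makes cheap.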
Case 2 - Epic Fails
Problem: Too many updates from the sensors; Neo4j has some write restrictions
Impact: Bad performance and high RAM consumption
Solution: Remove some sensor nodes or move to the Enterprise edition.
Final considerations
- Flexible models
- Find hidden relations
- Easy to get started
- Active tool and active community
- It can be useful in several scenarios, beyond social networks and recommendation systems.
Tools
- Data Import (Relational Databases, MongoDB, Cassandra, JSON, CSV)
- Visualization tools
- REST API
References
- Neo4j Meetup in São Paulo
- Neo4j Slack Users
- Neo4j Training (Free)
- Arrows (Sketching tool)
Special thanks
- Neo Technology, @lyonwj, @ryguyrg and @mesirii
- B.C., for the excellent feedback and review
- @Codeminer42
- Prof. Maurílio and Prof. Justino.
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
By Hanneli Tavante (hannelita)
Graphs and Neo4j - From Hydropower plants to PCBs - English version - QCON 2016