Graphs and Neo4j - from Hydropower Plants to PCBs
#froscon @hannelita
Hi!
- Computer Engineering
- Programming
- Electronics
- Mathematics
- Physics
- Lego
- Meetups
- Coffee
- GIFs
@hannelita
Disclaimer
This content represents the speaker's personal overview.
Feedback (positive or not) accepted here: hannelita@gmail.com
Not all numeric data or names are true, aiming not to harm company secrets.
Modelling cases are real; apologises for technical terms flood.
Disclaimer
This talk is based on Neo4j 2.x; Version 3.x was released recently.
Structure - Cases
- Use case context
- Modelling with relational databases (and fails)
- Graph modelling
- Evolving the model
- Epic fails
Final Considerations
- Main benefits
- Support tools
Title Text
Have you ever been into the darkness?
Running out of energy
Candle lights <3
Brazil is a huge producer of electrical energy.
Specially from hydropower plants
Case 1 - Context
How do we distribute electrical energy? How are the power plants distributed?
http://sigel.aneel.gov.br/sigel.html
Accessed in 28/3/2016
http://sigel.aneel.gov.br/sigel.html
Access in 28/3/2016
Map information
- Power plant location
- Transmission lines
- Supply capacity
- Total capacity
- Nearby cities
- Distribution
- Boundaries / States
- Hydrographic basin
- Dealers
Electrical
Political
Environmental
Economy
Challenge
Build a sytem that stores all these information and how the data is related.
Case 1 - Modelling with relational databases
Electrical
Political
Environmental
Economy
CREATE TABLE power_plant;
CREATE TABLE city;
CREATE TABLE hydrographic_basin;
CREATE TABLE dealer;
Question 1:
How do you represent a power plant neighbourhood?
- Self-relationship;
- Denormalisation (neighbours_ids)
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
- Given a coordinate data set, sum the population inside the resultant polygon.
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
city
power_plant
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
2. Match power plants coordinates based on supply capacity
city
power_plant
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
3. Verify properties into transmission_lines table
city
power_plant
It is not that difficult
It is not over!
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
4. Verify if there are industries nearby
city
power_plant
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
5. Verify HDI
city
power_plant
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
6. Verify dealers interest.
city
power_plant
Question 2:
Which is the best power plant to provide energy for a group of cities?
id | usage (month, in Mwh) | population (milion) | coordinate |
---|---|---|---|
1 | 40 | 13 | |
2 | 11 | 2 |
id | capacity ( Mwh) | transmission_line (PK) | coordinate |
---|---|---|---|
1 | 95 | 22 | |
2 | 11 | 1 |
7. Verify if the region has alternative energy sources
city
power_plant
Question 3:
Assuming that hydropower plants work as tug-of-war with multiple endpoints, how do you redistribute the electrical charges if one plant shuts down?
Question 3:
Assuming that hydropower plants work as tug-of-war with multiple endpoints, how do you redistribute the electrical charges if one plant shuts down?
Maybe tables are not the best structures to represent information about energy distribution.
Neo4j comes to rescue!
Quick intro - Neo4j
- Graph oriented database
- ACID
- Structures: Node, Relationship, Index and Label
- Maintained by Neotechnology
- Open Source
- Active community
Case 1 - Graph Modelling
Step 1 - Power plants become nodes
Powered by Arrows - http://www.apcjones.com/arrows/#
CREATE (n:PowerPlant:HydropowerPlant { name : 'Itaipu', capacity : '14000' })
Usina => Power Plant
Hidreletrica => Hydropower
capacidade => capacity
Step 2 - Cities become nodes
Step 3 - Transmission lines become relationships!
Itaipu - Ivaiporã
MATCH (a:HidropowerPlant),(b:City)
WHERE a.name = 'Itaipu' AND b.name = 'Ivaipora'
CREATE (a)-[r:PROVIDES { cable_capacity : 765, rl : 330 }]->(b)
Multiple relationships for several lines
MATCH (a:HidrepowerPlant),(b:City)
WHERE a.name = 'Itaipu' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:PROVIDES { cable_capacity : 500 }]->(b)
MATCH (a:City),(b:City)
WHERE a.name = 'Ivaipora' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:MESH { capacidade_cabo : 500 }]->(b)
Step 4 - Dealers become nodes
CREATE (n:Dealer { name : 'Fake',
percentage : 85, margin : 72 })
MATCH (a:Dealer),(b:City)
WHERE a.name = 'Ficticio' AND b.name = 'Cascavel Oeste'
CREATE (a)-[r:ATTENDS]->(b)
MATCH (a:Dealer),(b:PowerPlant)
WHERE a.name = 'Ficticio' AND b.name = 'Ita'
CREATE (a)-[r:OWNS]->(b)
Step 5 - Queries poderosas
MATCH (n:PowerPlant {capacity : 14000}),
(c:City {name : 'Sao Paulo'})
p = shortestPath((n)-[]-(c)) RETURN p
Queries determine optinal paths for energy supply
Case 1 - Evolving the model
Important: add Indexes for the most frequently used properties
Capacity, population, coordinates
Important[2]: Labels
:City, :PowerPlant, :Region
Usually, elements that can be grouped deserve a label.
More: turn other electrical elements into nodes
CREATE (n:Component:Transformer
{ tag : 'F. Iguacu', type : 'Terciario', mva : 1650, total : 4 })
MATCH (a:Transformer),(b:PowerPlant)
WHERE a.tag = 'F. Iguacu' AND b.name = 'Itaipu'
CREATE (a)-[r:INSTALLED]->(b)
Neo4j is flexible for modelling.
Case 1 - Epic Fails
Too many nodes for cities! (There are too many cities)
Problem
Too much information being loaded on MATCH; performance problems
Impact
Remove some :Cities and add :Region label, grouping cities
Solution
Not saving all the CREATE operations into a file
Problem
Problems with backup / replication.
Impact
Do not perform CREATE operations into Web interface!
Add queries into a Git repository - https://github.com/hannelita/qconsp
Solution
Case 1 - Extra - Insights
Find hidden information
Example: mapping the components made a big difference for a deeper model evaluation.
Mapping components...
Case 2 - Context
A-HA! We could use graphs for (...) [complete]
PCB Routing / Trail design
Yes! But we can go further.
Let's analyse the board layout and components display.
Case 2 - Modelling with relational databases
Component
Trail
Sensor
Layer
CREATE TABLE component;
CREATE TABLE trail;
CREATE TABLE sensor;
CREATE TABLE layer;
Question 1:
A sensor detects a temperature raise. How would you infer if it is a problem from a component or from the trail?
Question 1
Usually, you need extra information from the sensors nearby. How do you model that?
- Self-relationship;
- Denormalise (sensors_ids)
Déjà vu!
Question 2:
Which trails do affect more components at the same time? (ex: If Trail A breaks, the entire system stops working)
Question 3:
Is it possible to extract some hidden or unseen information from the circuit by modelling it within a graph?
Case 2 - Graph modelling
Step 1: Components become nodes
CREATE (n:Component:Primary { name : 'R1',
type : 'resistor', value : '10K' })
CREATE (n:Component:Primary { name : 'C1',
type : 'capacitor', group : 'polyester',
value : '100p' })
CREATE (n:Component:CI { name : 'CI1',
type : 'LM741', seller : 'Texas' })
Step 2: Map trails into relationships
MATCH (a:Primary),(c:CI)
WHERE a.name = 'R1' AND c.name = 'CI1'
CREATE (a)-[r:TRAILS { thickness : 2, dilation : 0.5 }]->(c)
Step 3: Map Layers into Labels
CREATE (n:Component:Primary:LAYER1
{ name : 'R1', type : 'resistor', value : '10K' })
CREATE (n:Component:Primary:LAYER2
{ name : 'C1', type : 'capacitor',
group : 'polyester', value : '100p' })
CREATE (n:Component:CI:LAYER1 { name : 'CI1',
type : 'LM741', seller : 'Texas' })
Easy to fetch all the components from a specific Layer
Case 2 - Evolving the model
Step 4: Map sensors into nodes
CREATE (n:Sensor:LAYER1
{ name : 'SS1', type : 'light'})
CREATE (n:Sensor:LAYER2
{ name : 'SS2', type : 'temperature' })
MATCH (aPrimary),(s:Sensor)
WHERE a.name = 'R1' AND c.name = 'SS1'
CREATE (s)-[MONITORS { light : 2 }]->(a)
MATCH (a:Primary),(s:Sensor)
WHERE a.name = 'R1' AND c.name = 'SS2'
CREATE (s)-[r:MONITORS { temperature : 37 }]->(a)
MATCH (n:Sensor)-[MONITORS]-(c:Component)
WHERE n.temperature > 60
RETURN c.name, r.dilation
Decide if it is the component of if it is the trail that is damaged.
Step 5: Run the following periodic query:
Case 2 - Epic Fails
Too many updates for the sensors; Neo4j has some writing restrictions
Problem
Bad performance and high RAM consumption
Impact
Remove some sensor nodes or jump to Enterprise version.
Solution
Final considerations
- Flexible models
- Find hidden relations
- Easy to get started
- Active tool and active community
- It can be useful in several scenarios, beyond social networks and recommendation systems.
Tools
- Data Import (Relational Databases, MongoDB, Cassandra, JSON, CSV)
- Visualization tools
- REST API
References
- Neo4j Meetup in São Paulo
- Neo4j Slack Users
- Neo4j Training (Free)
- Arrows (Sketching tool)
Special thanks
- Neo Technology, @lyonwj, @ryguyrg e @mesirii
- B.C., for the excellent feedback and review
- @Codeminer42
- Prof. Maurílio and Prof. Justino.
Thank you :)
Questions?
hannelita@gmail.com
@hannelita
Froscon - Graphs and Neo4j - From Hydropower plants to PCBs
By Hanneli Tavante (hannelita)
Froscon - Graphs and Neo4j - From Hydropower plants to PCBs
Graphs and Neo4j - From Hydropower plants to PCBs - English version
- 2,130