Graphs and Neo4j - from Hydropower Plants to PCBs

#froscon @hannelita

 

Hi!

 
  • Computer Engineering
  • Programming
  • Electronics
  • Mathematics
  • Physics
  • Lego
  • Meetups
  • Coffee
  • GIFs
 

@hannelita

 

Disclaimer

 

This content represents the speaker's personal overview.

 

Feedback (positive or not) accepted here: hannelita@gmail.com

 

Not all numeric data or names are true, aiming not to harm company secrets.

 

Modelling cases are real; apologises for technical terms flood.

 

Disclaimer

 

This talk is based on Neo4j 2.x; Version 3.x was released recently.

 

Structure - Cases

 
  • Use case context
  • Modelling with relational databases (and fails)
  • Graph modelling
  • Evolving the model
  • Epic fails
 

Final Considerations

 
  • Main benefits
  • Support tools
 

Title Text

 

Have you ever been into the darkness?

 

Running out of energy

 

Candle lights <3

 

Brazil is a huge producer of electrical energy.

 

Specially from hydropower plants

 

Case 1 - Context

 

How do we distribute electrical energy? How are the power plants distributed?

 

http://sigel.aneel.gov.br/sigel.html

 

Accessed in 28/3/2016

 

http://sigel.aneel.gov.br/sigel.html

 

Access in 28/3/2016

 

Map information

3
  • Power plant location
  • Transmission lines
  • Supply capacity
  • Total capacity
  • Nearby cities
  • Distribution
  • Boundaries / States
  • Hydrographic basin
  • Dealers
 

Electrical

 

Political

 

Environmental

 

Economy

 

Challenge

 

Build a sytem that stores all these information and how the data is related.

 

Case 1 - Modelling with relational databases

 

Electrical

 

Political

 

Environmental

 

Economy

 

CREATE TABLE power_plant;

1

CREATE TABLE city;

1

CREATE TABLE hydrographic_basin;

1

CREATE TABLE dealer;

1

Question 1:

 

How do you represent a power plant neighbourhood?

 
  • Self-relationship;
  • Denormalisation (neighbours_ids)
1

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1
  1. Given a coordinate data set, sum the population inside the resultant polygon.
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2

city

power_plant

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

2. Match power plants coordinates based on supply capacity

city

power_plant

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

3. Verify properties into transmission_lines table

city

power_plant

It is not that difficult

It is not over! 

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

4. Verify if there are industries nearby

city

power_plant

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

5. Verify HDI

city

power_plant

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

6. Verify dealers interest.

city

power_plant

Question 2:

 

Which is the best power plant to provide energy for a group of cities?

 
id usage (month, in Mwh) population (milion) coordinate
1 40 13
2 11 2
id capacity ( Mwh) transmission_line (PK) coordinate
1 95 22
2 11 1

7. Verify if the region has alternative energy sources

city

power_plant

Question 3:

 

Assuming that hydropower plants work as tug-of-war with multiple endpoints, how do you redistribute the electrical charges if one plant shuts down?

 

Question 3:

 

Assuming that hydropower plants work as tug-of-war with multiple endpoints, how do you redistribute the electrical charges if one plant shuts down?

 

Maybe tables are not the best structures to represent information about energy distribution.

Neo4j comes to rescue!

Quick intro - Neo4j

  • Graph oriented database
  • ACID
  • Structures: Node, Relationship, Index and Label
  • Maintained by Neotechnology
  • Open Source
  • Active community

Case 1 - Graph Modelling

 

Step 1 - Power plants become nodes

Powered by Arrows - http://www.apcjones.com/arrows/#

CREATE (n:PowerPlant:HydropowerPlant { name : 'Itaipu', capacity : '14000' })
 

Usina => Power Plant

Hidreletrica => Hydropower

capacidade => capacity

Step 2 - Cities become nodes

3

Step 3 - Transmission lines become relationships! 

3

Itaipu - Ivaiporã

MATCH (a:HidropowerPlant),(b:City)

WHERE a.name = 'Itaipu' AND b.name = 'Ivaipora'

CREATE (a)-[r:PROVIDES { cable_capacity : 765, rl : 330 }]->(b) 

 

Multiple relationships for several lines

MATCH (a:HidrepowerPlant),(b:City) 
WHERE a.name = 'Itaipu' AND b.name = 'Cascavel Oeste' 
CREATE (a)-[r:PROVIDES { cable_capacity : 500 }]->(b)

MATCH (a:City),(b:City) 
WHERE a.name = 'Ivaipora' AND b.name = 'Cascavel Oeste' 
CREATE (a)-[r:MESH { capacidade_cabo : 500 }]->(b)

Step 4 - Dealers become nodes

3
CREATE (n:Dealer { name : 'Fake', 
percentage : 85, margin : 72 })

MATCH (a:Dealer),(b:City) 
WHERE a.name = 'Ficticio' AND b.name = 'Cascavel Oeste' 
CREATE (a)-[r:ATTENDS]->(b)

MATCH (a:Dealer),(b:PowerPlant) 
WHERE a.name = 'Ficticio' AND b.name = 'Ita' 
CREATE (a)-[r:OWNS]->(b)

Step 5 - Queries poderosas 

3
MATCH (n:PowerPlant {capacity : 14000}),
      (c:City {name : 'Sao Paulo'})
p = shortestPath((n)-[]-(c)) RETURN p

Queries determine optinal paths for energy supply

Case 1 - Evolving the model

 

Important: add Indexes for the most frequently used properties

Capacity, population, coordinates

Important[2]: Labels

 

:City, :PowerPlant, :Region

Usually, elements that can be grouped deserve a label.

 

More: turn other electrical elements into nodes

CREATE (n:Component:Transformer 
{ tag : 'F. Iguacu', type : 'Terciario', mva : 1650, total : 4 })


MATCH (a:Transformer),(b:PowerPlant) 
WHERE a.tag = 'F. Iguacu' AND b.name = 'Itaipu' 
CREATE (a)-[r:INSTALLED]->(b)

Neo4j is flexible for modelling.

 

Case 1 - Epic Fails

 

Too many nodes for cities! (There are too many cities)

 

Problem

 

Too much information being loaded on MATCH; performance problems

 

Impact

 

Remove some :Cities and add :Region label, grouping cities

 

Solution

 

Not saving all the CREATE operations into a file

 

Problem

 

Problems with backup / replication. 

 

Impact

 

Do not perform CREATE operations into Web interface! 

Add queries into a Git repository - https://github.com/hannelita/qconsp

 

Solution

 

Case 1 - Extra - Insights

 

Find hidden information

Example: mapping the components made a big difference for a deeper model evaluation.

Mapping components...

Case 2 - Context

 

A-HA! We could use graphs for (...) [complete]

PCB Routing / Trail design 

Yes! But we can go further.

Let's analyse the board layout and  components display.

Case 2 - Modelling with relational databases

 

Component

 

Trail

 

Sensor

 

Layer

 

CREATE TABLE component;

1

CREATE TABLE trail;

1

CREATE TABLE sensor;

1

CREATE TABLE layer;

1

Question 1:

A sensor detects a temperature raise. How would you infer if it is a problem from a component or from the trail?

 

Question 1

 

Usually, you need extra information from the sensors nearby. How do you model that?

 
  • Self-relationship;
  • Denormalise (sensors_ids)
1

Déjà vu!

Question 2:

 

Which trails do affect more components at the same time? (ex: If Trail A breaks, the entire system stops working)

 

Question 3:

 

Is it possible to extract some hidden or unseen information from the circuit by modelling it within a graph?

 

Case 2 - Graph modelling

 

Step 1: Components become nodes

CREATE (n:Component:Primary { name : 'R1', 
type : 'resistor', value : '10K' })

CREATE (n:Component:Primary { name : 'C1', 
type : 'capacitor', group : 'polyester', 
 value : '100p' })

CREATE (n:Component:CI { name : 'CI1', 
type : 'LM741', seller : 'Texas' })

Step 2: Map trails into relationships

 
MATCH (a:Primary),(c:CI) 
WHERE a.name = 'R1' AND c.name = 'CI1' 
CREATE (a)-[r:TRAILS { thickness : 2, dilation : 0.5 }]->(c)

Step 3: Map Layers into Labels

 
CREATE (n:Component:Primary:LAYER1 
{ name : 'R1', type : 'resistor', value : '10K' })

CREATE (n:Component:Primary:LAYER2 
{ name : 'C1', type : 'capacitor', 
group : 'polyester',  value : '100p' })

CREATE (n:Component:CI:LAYER1 { name : 'CI1', 
type : 'LM741', seller : 'Texas' })

Easy to fetch all the components from a specific Layer

Case 2 - Evolving the model

 

Step 4: Map sensors into nodes

 
CREATE (n:Sensor:LAYER1 
{ name : 'SS1', type : 'light'})

CREATE (n:Sensor:LAYER2 
{ name : 'SS2', type : 'temperature' })

MATCH (aPrimary),(s:Sensor) 
WHERE a.name = 'R1' AND c.name = 'SS1' 
CREATE (s)-[MONITORS { light : 2 }]->(a)

MATCH (a:Primary),(s:Sensor) 
WHERE a.name = 'R1' AND c.name = 'SS2' 
CREATE (s)-[r:MONITORS { temperature : 37 }]->(a)
MATCH (n:Sensor)-[MONITORS]-(c:Component)
WHERE n.temperature > 60
RETURN c.name, r.dilation

Decide if it is the component of if it is the trail that is damaged.

Step 5: Run the following periodic query:

Case 2 - Epic Fails

 

Too many updates for the sensors; Neo4j has some writing restrictions

 

Problem

 

Bad performance and high RAM consumption

 

Impact

 

Remove some sensor nodes or jump to Enterprise version.

 

Solution

 

Final considerations

1
  • Flexible models
  • Find hidden relations
  • Easy to get started
  • Active tool and active community
  • It can be useful in several scenarios, beyond social networks and recommendation systems.
 

Tools

1
  • Data Import  (Relational Databases, MongoDB, Cassandra, JSON, CSV)
  • Visualization tools
  • REST API
 

References

 

Special thanks

 
  • Neo Technology, @lyonwj, @ryguyrg e @mesirii 
  • B.C., for the excellent feedback and review
  • @Codeminer42
2
  • Prof. Maurílio and  Prof. Justino.
2

Thank you :)

Questions?

 

 

hannelita@gmail.com

@hannelita