Columbus Graphistas
Meetup Group
1st Meeting - January 9th, 2014

Rick Otten
Director of Analytics
Wilson Relationship Marketing
http://www.wilsonrms.com
Using GEPHI for Graph Analysis and Visualization
an introduction
- A Brief Discussion of Graph Data Modeling
- Where Gephi fits in the current Graph Ecosystem
- Gephi Setup
- Getting Data into Gephi
- High Level Gephi Overview
- Drill down:
- Graph Statistics
- Filtering
- Partitions and Ranking
- Graph Layouts
- Clustering
- Exporting
Quick Look At Graph Data Modeling
Graphs are a way to organize "Hyper-Relational Data"
... first let's take a fast and loose look at other popular data models (for context) ...
Key:Value Data
last_name: Otten
car: Subaru Outback
Fast writing and fast queries for specific values when you know the key.
Easy to understand.
I like to think of these as properties of an object.
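A minimal sketch of the key:value idea, using a plain Python dict as a stand-in for a key:value store. The variable names are illustrative only; the keys and values come from the slide.

```python
# A key:value record modeled as a dict: properties of one object.
person = {
    "last_name": "Otten",
    "car": "Subaru Outback",
}

# Fast path: look up a value when you already know the key.
car = person["car"]

# Slow path: going from a value back to its key means scanning every entry.
keys_for_outback = [key for key, value in person.items() if value == "Subaru Outback"]

print(car, keys_for_outback)
```

The second lookup shows the trade-off: the structure is cheap to read by key, but any other question needs a full scan.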
Key Value Data Structures and Stores
- Webster Dictionary
- Dewey Decimal System
- JSON
- Berkeley DB
- MongoDB
- Redis
- Couchbase
- ArangoDB
- PostgreSQL HSTORE data type
- ... and many more...
Tables
first_name, last_name, car
Rick, Otten, Subaru Outback
Someone, Else, Tesla Model S
Compactly organizes many objects as rows.
Can be queried with the powerful SQL declarative tools.
Easier to do rollups and statistics and find patterns and select sets of rows.
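A small sketch of row selection and a rollup over the table above, assuming the rows arrive as CSV text. It uses only the Python standard library rather than SQL; the `TABLE` constant and variable names are illustrative.

```python
import csv
import io
from collections import Counter

# The table from the slide, as CSV text.
TABLE = """first_name,last_name,car
Rick,Otten,Subaru Outback
Someone,Else,Tesla Model S
"""

# Each row becomes a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(TABLE)))

# Select a set of rows: who drives a Tesla?
tesla_drivers = [row["first_name"] for row in rows if row["car"].startswith("Tesla")]

# Rollup: how many rows per car?
cars_count = Counter(row["car"] for row in rows)

print(tesla_drivers, dict(cars_count))
```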
Table Data Structures and Stores
- 2D Arrays
- CSV Files
- Microsoft Excel
- Hadoop
- Relational Databases (PostgreSQL, etc)
- ... and many more ...
Columnar Data
cars: Subaru Outback, Tesla Model S
first_names: Rick, Someone
last_names: Otten, Else
Re-organize the table to emphasize columns rather than rows.
Really great for aggregation and analysis of dimensions (columns with discrete value sets).
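A rough illustration of the row-to-column re-organization, in plain Python. Real columnar stores add compression and on-disk layout on top of this idea; the variable names here are illustrative.

```python
# Row-oriented records, one dict per person.
rows = [
    {"first_name": "Rick", "last_name": "Otten", "car": "Subaru Outback"},
    {"first_name": "Someone", "last_name": "Else", "car": "Tesla Model S"},
]

# Pivot into one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Analyzing one dimension now touches only that column's values.
distinct_cars = sorted(set(columns["car"]))

print(columns["car"], distinct_cars)
```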
Columnar Data Stores
- Vertica
- Teradata
- Accumulo
- MonetDB
- PostgreSQL IMCS extension
- ... and many more ...
Relational Data
Person Table: id, first_name, last_name
1, Rick, Otten
2, Someone, Else
Car Table: id, car
A, Tesla Model S
B, Subaru Outback
Relationship Table: person_id, car_id
1, B
2, A
Neatly organizes many-to-one relationships.
Does not scale well beyond a few relationships: every extra relationship adds another table and another join.
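The three tables above can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table names `person`, `car`, and `person_car` and the column types are assumptions, chosen to mirror the slide.

```python
import sqlite3

# In-memory database, nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE car (id TEXT PRIMARY KEY, car TEXT);
CREATE TABLE person_car (
    person_id INTEGER REFERENCES person(id),
    car_id TEXT REFERENCES car(id)
);
INSERT INTO person VALUES (1, 'Rick', 'Otten'), (2, 'Someone', 'Else');
INSERT INTO car VALUES ('A', 'Tesla Model S'), ('B', 'Subaru Outback');
INSERT INTO person_car VALUES (1, 'B'), (2, 'A');
""")

# Answering "who drives what?" takes two joins through the relationship table.
query = """
SELECT p.first_name, c.car
FROM person AS p
JOIN person_car AS pc ON pc.person_id = p.id
JOIN car AS c ON c.id = pc.car_id
ORDER BY p.id
"""
who_drives_what = conn.execute(query).fetchall()
print(who_drives_what)
```

Each new kind of relationship would need its own relationship table and its own pair of joins.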
Relational Data Stores
- PostgreSQL
- Sybase
- Oracle
- MySQL
- MariaDB
- SQL Server
- SQLite
- ... and many more ...
Graph Data
Person                                   Car
first_name: Rick     ----Drives--->      make: Subaru
last_name: Otten     frequency: daily    model: Outback
Person                                   Car
first_name: Someone  ----Drives--->      make: Tesla
last_name: Else      frequency: Sundays  model: Model S
Emphasizes the relationships (which can have properties too).
Can allow for great complexity and long paths
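One possible way to hold this property graph in plain Python, as a sketch. The node ids (`rick`, `outback`, and so on) and the `label`, `source`, `target`, and `type` keys are made up for illustration; this is not the storage format of any particular graph database.

```python
# Nodes: id -> properties. The ids are hypothetical.
nodes = {
    "rick": {"label": "Person", "first_name": "Rick", "last_name": "Otten"},
    "someone": {"label": "Person", "first_name": "Someone", "last_name": "Else"},
    "outback": {"label": "Car", "make": "Subaru", "model": "Outback"},
    "model_s": {"label": "Car", "make": "Tesla", "model": "Model S"},
}

# Edges carry their own properties, here the driving frequency.
edges = [
    {"source": "rick", "target": "outback", "type": "Drives", "frequency": "daily"},
    {"source": "someone", "target": "model_s", "type": "Drives", "frequency": "Sundays"},
]

# Query the relationship itself: who drives what every day?
daily = [
    (nodes[edge["source"]]["first_name"], nodes[edge["target"]]["model"])
    for edge in edges
    if edge["type"] == "Drives" and edge["frequency"] == "daily"
]
print(daily)
```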

Network of Flavors
http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
network of relationships at a sports club
Terminology
Mathematicians: Vertex
Computer & Social Scientists: Node
o
Mathematicians: Edge
Computer & Social Scientists: Edge OR Relationship OR Link
O ---- O
Common Edge Terminology
Strength of a Relationship:
weight
Undirected:
O ---- O
Directed:
O ----> O
Parallel Edges:
O ==== O
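The edge terms above can be sketched with a small weighted edge list in Python. The node names and weights are invented for the example.

```python
from collections import Counter

# Directed, weighted edges as (source, target, weight). Values are made up.
directed_edges = [
    ("a", "b", 2),
    ("b", "c", 1),
    ("a", "b", 5),  # same endpoints as the first edge: a parallel edge
]

# Directed view: direction matters, so count outgoing and incoming separately.
out_degree = Counter(source for source, _, _ in directed_edges)
in_degree = Counter(target for _, target, _ in directed_edges)

# Undirected view: an edge is just an unordered pair of endpoints.
pair_counts = Counter(frozenset((source, target)) for source, target, _ in directed_edges)

# Parallel edges are pairs that occur more than once.
parallel = {tuple(sorted(pair)): count for pair, count in pair_counts.items() if count > 1}

print(dict(out_degree), dict(in_degree), parallel)
```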
since 1736
Graph Data Models can usually be represented visually with simple diagrams. This can make them more intuitive and "whiteboard friendly" than other data models.
(click on the image for the graph data modeling article this particular image came from)
For more info on Graph Data Modeling, here is a nice slide show (Neo4j): http://www.slideshare.net/neo4j/data-modeling-with-neo4j-25767444
Graph (data models) Are Everywhere
- Search Engines
- Natural Language Processing & Lexical Analysis
- Social Networks
- Getting Directions (maps)
- Train Routes and Schedules
- Booking Airline Flights
- Routing Phone Calls
- Netflix Movie Recommendations
- Amazon Book Recommendations
- Twitter Feed Recommendations
- Org Charts
- BioInformatics
- Epidemic Modeling
- Electrical Circuits
- Electric Transmission Planning
- Oil & Gas Pipelines
- Manufacturing Parts Lists and Supplier Trees
- Conference Room Scheduling
- Genealogy
- Fraud Detection
Graph Data Stores (hyper relational databases)
- Neo4j
- OrientDB
- Titan
- Giraph
- YarcData
- Dex
- VelocityGraph
- InfiniteGraph
- imGraph
- ArangoDB
- iGraph over PostgreSQL
- ... and many more ...
What is Gephi?
- Desktop Tool
- Open Source: GPL 3
- In the wild since 2008
- Written in Java on the NetBeans Platform
- Primarily used for Graph Visualization
- Has some Analysis Capabilities (getting better all the time)
- Extensible via Plugins
Some other Graph Visualization & Analysis Tools
- Linkurious -- http://linkurio.us
- D3 -- http://d3js.org
- GraphViz -- http://www.graphviz.org
- Keylines -- http://www.keylines.com
- NodeXL -- http://www.smrfoundation.org/nodexl
- VivaGraphJS -- https://github.com/anvaka/VivaGraphJS
- iGraph -- http://igraph.sourceforge.net
- helios.js -- https://github.com/entrendipity/helios.js
- BioFabric -- http://www.biofabric.org
- Many graph databases come with some sort of visualization tool.
- I'm sure there are many others.
Max De Marzi - Graph Visualization Guru - blog: http://maxdemarzi.com
Some of The Limitations Of Gephi
Graph Size
My 1440x900 15" Mac Monitor has only 1.3M pixels TOTAL.
Needs Lots Of Memory
Java.
All Nodes have to have the same set of Properties
All edges have to have the same set of Properties
No Traversals or Recommendations
...although you could probably write a plugin to show shortest paths.
No Parallel Edges
Nor Edge-to-Edge Connections
Sometimes a little buggy or quirky.
Gephi Setup Notes
Java 1.6 not Java 1.7
gephi.conf: jdkhome, -J-d64 -J-Xms5120m -J-Xmx12288m
mouse vs. touchpad
tools/plugins
preferences
Getting Data Into Gephi
Some of the ways:
- auto-generated graphs
- Excel/CSV
- Neo4j Plugin
- Virtuoso Plugin
- HTTPGraph
- RDF
- GDF
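For the Excel/CSV route, a sketch of writing an edge list as CSV text in Python. It assumes the Gephi spreadsheet importer picks up edge columns named Source, Target, and Weight; check the import dialog for your Gephi version. The ids are placeholders and `edges_to_csv` is a hypothetical helper.

```python
import csv
import io

# Placeholder edges: (source id, target id, weight).
edges = [
    ("00352008", "00363969", 1),
    ("00352008", "00352928", 1),
]

def edges_to_csv(edge_rows):
    """Return CSV text with a header row followed by one line per edge."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names assumed to be recognized by the importer.
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edge_rows)
    return buffer.getvalue()

csv_text = edges_to_csv(edges)
print(csv_text)
```

Ids are kept as strings so leading zeros survive the round trip.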
GDF
CSV List of Nodes followed by a list of Edges
nodedef>name INTEGER,type VARCHAR,description VARCHAR,last VARCHAR,first VARCHAR,middle VARCHAR,suffix VARCHAR,phone VARCHAR,email VARCHAR, class VARCHAR
00216766,person,Someone_Else,Else,Someone,WhoKnew,,555-1212,,1973
00604955,person,Somebody_New,New,Somebody,IKnew,,(800) 111-7777,somebody@somewhere.com,1993
edgedef> node1 VARCHAR, node2 VARCHAR, weight INTEGER
00352008,00352008,2
00352008,00363969,1
00352008,00352928,1
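A sketch of generating a small GDF file body from Python. `to_gdf` is a hypothetical helper, not part of Gephi. It assumes no value contains a comma, since nothing is quoted or escaped, and it uses a reduced set of node columns. The single edge is invented for the example.

```python
def to_gdf(nodes, edges):
    """Build GDF text: a node section followed by an edge section."""
    lines = ["nodedef>name VARCHAR,type VARCHAR,description VARCHAR"]
    for node in nodes:
        # Each node is a tuple of strings matching the nodedef columns.
        lines.append(",".join(node))
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight INTEGER")
    for node1, node2, weight in edges:
        lines.append(f"{node1},{node2},{weight}")
    return "\n".join(lines) + "\n"

# Sample data; the edge between these two nodes is made up.
nodes = [
    ("00216766", "person", "Someone_Else"),
    ("00604955", "person", "Somebody_New"),
]
edges = [("00216766", "00604955", 1)]

gdf_text = to_gdf(nodes, edges)
print(gdf_text)
```

The text can be saved with a .gdf extension and opened from Gephi's File menu.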
Exploring Gephi
(demos)
Overview/Data Laboratory/Preview
Statistics
Filters
Layouts
Partitions and Ranking
Clustering
Exporting
Curated List of Tutorials
More Gephi Links
- Gephi Planet -- http://www.netvibes.com/gephi#General
- Gephi Forums -- https://forum.gephi.org
- Gephi Marketplace -- https://marketplace.gephi.org
My Gephi Wish List
- Better Memory Management (coming in 0.9!)
- Tree Layout Plugin
- More Clustering Algorithms
- A shortest Path finder (DFS & BFS)
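For reference, a minimal breadth-first search (BFS) path finder in Python. On an unweighted graph BFS returns a path with the fewest edges, while DFS finds a path that is not guaranteed to be the shortest. This is a standalone sketch, not a Gephi plugin (those are written in Java); the adjacency dict format and the sample graph are assumptions.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Return a fewest-edges path from start to goal as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # visited nodes mapped to their predecessor
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical directed graph as node -> list of neighbors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
path_a_to_e = shortest_path(graph, "a", "e")
path_e_to_a = shortest_path(graph, "e", "a")
print(path_a_to_e, path_e_to_a)
```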


