Columbus Graphistas

Meetup Group
1st Meeting - January 9th, 2014


Rick Otten

Director of Analytics

Wilson Relationship Marketing


http://www.wilsonrms.com





rotten@wilsonrms.com

Using GEPHI for Graph Analysis and Visualization

an introduction

  • A Brief Discussion of Graph Data Modeling
  • Where Gephi fits in the current Graph Ecosystem
  • Gephi Setup
  • Getting Data into Gephi
  • High Level Gephi Overview
  • Drill down:
    • Graph Statistics
    • Filtering
    • Partitions and Ranking
    • Graph Layouts
    • Clustering
    • Exporting

Quick Look At Graph Data Modeling



Graphs are a way to organize "Hyper-Relational Data"


... first, let's take a fast and loose look at other popular data models (for context) ...

Key:Value Data


first_name:  Rick
last_name:  Otten
car:  Subaru Outback

Fast writing and fast queries for specific values when you know the key.

Easy to understand.

I like to think of these as properties of an object.
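
A minimal sketch of the record above as a Python dictionary. Python and the variable name `person` are choices made for this illustration, not something from the original slides.

```python
# A key:value record modeled as a plain dictionary.
person = {
    "first_name": "Rick",
    "last_name": "Otten",
    "car": "Subaru Outback",
}

# Fast lookup when you know the key.
car = person["car"]

# Asking for a key that was never stored returns None instead of raising.
middle_name = person.get("middle_name")
```

The keys behave like the properties of one object, which is why this model is so easy to reason about.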

Key Value Data Structures and Stores

  • Webster Dictionary
  • Dewey Decimal System
  • JSON
  • Berkeley DB
  • MongoDB
  • Redis
  • CouchBase
  • ArangoDB
  • PostgreSQL HSTORE data type
  • ... and many more...

Tables


first_name, last_name, car
Rick, Otten, Subaru Outback
Someone, Else, Tesla Model S


Compactly organizes many objects as rows.
Can be queried with powerful declarative SQL tools.
Makes it easier to do rollups and statistics, find patterns, and select sets of rows.
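
As a rough illustration, the table above can be read with Python's standard `csv` module and filtered much like a SQL WHERE clause. The names `TABLE`, `rows`, and `tesla_drivers` are assumptions made for this sketch.

```python
import csv
import io

# The table from the slide, as CSV text (header row first).
TABLE = """first_name, last_name, car
Rick, Otten, Subaru Outback
Someone, Else, Tesla Model S
"""

# Each row becomes one object: a dict keyed by column name.
# skipinitialspace drops the blank that follows each comma.
rows = list(csv.DictReader(io.StringIO(TABLE), skipinitialspace=True))

# Select a set of rows by a condition on one column.
tesla_drivers = [r["first_name"] for r in rows if r["car"].startswith("Tesla")]
```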

Table Data Structures and Stores

  • 2D Arrays
  • CSV Files
  • Microsoft Excel
  • Hadoop
  • Relational Databases (PostgreSQL, etc)
  • ... and many more ...

Columnar Data


cars:  Subaru Outback, Tesla Model S
first_names:  Rick, Someone
last_names:  Otten, Else

Re-organize the table to emphasize columns rather than rows.

Really great for aggregation and analysis of dimensions (columns with discrete value sets).
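
One way to picture the re-organization, sketched in Python: pivot the rows into one list per column, then aggregate a single column. The names `rows`, `columns`, and `car_counts` are illustrative assumptions, and real columnar stores do far more (compression, on-disk layout) than this toy shows.

```python
from collections import Counter

# Row-oriented data: one dict per person.
rows = [
    {"first_name": "Rick", "last_name": "Otten", "car": "Subaru Outback"},
    {"first_name": "Someone", "last_name": "Else", "car": "Tesla Model S"},
]

# Column-oriented data: one list of values per column name.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating a dimension only touches that one column.
car_counts = Counter(columns["car"])
```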

Columnar Data Stores

  • Vertica
  • Teradata
  • Accumulo
  • MonetDB
  • PostgreSQL IMCS extension
  • ... and many more ...

Relational Data

Person Table:         id, first_name, last_name
                      1, Rick, Otten
                      2, Someone, Else

Car Table:            id, car
                      A, Tesla Model S
                      B, Subaru Outback

Relationship Table:   person_id, car_id
                      1, B
                      2, A

Neatly organizes many-to-one relationships.
Does not scale well beyond a few relationships: every additional hop is another join.
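
A small sketch of the three tables above using Python's built-in `sqlite3` module with an in-memory database. The table name `person_car` and the column types are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE car (id TEXT PRIMARY KEY, car TEXT);
    CREATE TABLE person_car (person_id INTEGER REFERENCES person(id),
                             car_id TEXT REFERENCES car(id));
    INSERT INTO person VALUES (1, 'Rick', 'Otten'), (2, 'Someone', 'Else');
    INSERT INTO car VALUES ('A', 'Tesla Model S'), ('B', 'Subaru Outback');
    INSERT INTO person_car VALUES (1, 'B'), (2, 'A');
""")

# Following a single relationship already takes two joins
# through the relationship table.
who_drives_what = conn.execute("""
    SELECT p.first_name, c.car
    FROM person p
    JOIN person_car pc ON pc.person_id = p.id
    JOIN car c ON c.id = pc.car_id
    ORDER BY p.id
""").fetchall()
```

Each further kind of relationship would need its own relationship table and its own pair of joins.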

Relational Data Stores

  • PostgreSQL
  • Sybase
  • Oracle
  • MySQL
  • MariaDB
  • SQL Server
  • SQLite
  • ... and many more ...

Graph Data


Person                                               Car
first_name: Rick         ----Drives--->              make: Subaru
last_name: Otten         frequency: daily            model: Outback

Person                                               Car
first_name: Someone      ----Drives--->              make: Tesla
last_name: Else          frequency: Sundays          model: Model S
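
The diagram above could be sketched in Python roughly as follows. The node ids (`p1`, `c1`, ...) and the helper `drives` are made up for this illustration; a real graph database would provide its own storage and query language.

```python
# Nodes: id -> (label, properties). The ids are hypothetical.
nodes = {
    "p1": ("Person", {"first_name": "Rick", "last_name": "Otten"}),
    "p2": ("Person", {"first_name": "Someone", "last_name": "Else"}),
    "c1": ("Car", {"make": "Subaru", "model": "Outback"}),
    "c2": ("Car", {"make": "Tesla", "model": "Model S"}),
}

# Directed edges: (source, relationship type, target, edge properties).
edges = [
    ("p1", "Drives", "c1", {"frequency": "daily"}),
    ("p2", "Drives", "c2", {"frequency": "Sundays"}),
]

def drives(person_id):
    """Return (car model, frequency) for every Drives edge leaving person_id."""
    return [
        (nodes[dst][1]["model"], props["frequency"])
        for src, rel, dst, props in edges
        if src == person_id and rel == "Drives"
    ]
```

Calling `drives("p1")` follows the edge from Rick to his car and returns the edge's own `frequency` property along with the car's model.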


Emphasizes the relationships (which can have properties too).
Can allow for great complexity and long paths.

Network of Flavors

http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html

 

            

network of relationships at a sports club

Terminology

Mathematicians:  Vertex

Computer & Social Scientists:  Node

o

Mathematicians:  Edge
Computer & Social Scientists:  Edge OR Relationship OR Link

O ---- O

Common Edge Terminology

Strength of a Relationship:   weight

Undirected:
O ---- O

Directed:
O ----> O

Parallel Edges:
O ==== O

 

Graph theory has been around since 1736 (Euler's Seven Bridges of Königsberg problem).

Graph Data Models can usually easily be represented visually with simple diagrams.  This can make them more intuitive and "whiteboard friendly" than other data models.

  

For more info on Graph Data Modeling, here is a nice slide show (from Neo4j):   http://www.slideshare.net/neo4j/data-modeling-with-neo4j-25767444

Graph (data models) Are Everywhere

  • Search Engines
  • Natural Language Processing & Lexical Analysis
  • Social Networks
  • Getting Directions (maps)
    • Train Routes and Schedules
    • Booking Airline Flights
  • Routing Phone Calls
  • Netflix Movie Recommendations
  • Amazon Book Recommendations
  • Twitter Feed Recommendations
  • Org Charts
  • BioInformatics
    • Epidemic Modeling
  • Electrical Circuits 
  • Electric Transmission Planning
  • Oil & Gas Pipelines
  • Manufacturing Parts Lists and Supplier Trees
  • Conference Room Scheduling
  • Genealogy
  • Fraud Detection

Graph Data Stores (hyper relational databases)

  • Neo4j
  • OrientDB
  • Titan
  • Giraph
  • YarcData
  • Dex
  • VelocityGraph
  • InfiniteGraph
  • imGraph
  • ArangoDB
  • iGraph over PostgreSQL
  • ... and many more ...

What is Gephi?

http://gephi.org

  • Desktop Tool
  • Open Source:  GPL 3
  • In the wild since 2008
  • Written in Java on the NetBeans Platform
  • Primarily used for Graph Visualization
  • Has some Analysis Capabilities (getting better all the time)
  • Extensible via Plugins

Some other Graph Visualization & Analysis Tools



Max De Marzi - Graph Visualization Guru - blog:  http://maxdemarzi.com

Some of The Limitations Of Gephi

Graph Size
My 1440x900 15" Mac monitor has only 1.3M pixels TOTAL.

Needs Lots Of Memory
Java.

All nodes have to have the same set of properties
All edges have to have the same set of properties

No Traversals or Recommendations
...although you could probably write a plugin to show shortest paths.

No Parallel Edges
Nor Edge-to-Edge Connections

Sometimes a little buggy or quirky.

Gephi Setup Notes


Java 1.6, not Java 1.7

gephi.conf:  set jdkhome, and raise the JVM memory options, e.g. -J-d64 -J-Xms5120m -J-Xmx12288m (64-bit JVM, 5 GB initial heap, 12 GB maximum heap)


mouse vs. touchpad


tools/plugins


preferences



Getting Data Into Gephi

Some of the ways:

  • auto-generated graphs
  • Excel/CSV
  • Neo4j Plugin
  • Virtuoso Plugin
  • HTTPGraph
  • RDF
  • GDF

GDF

https://gephi.org/users/supported-graph-formats/gdf-format

CSV list of nodes followed by a list of edges:

nodedef>name INTEGER,type VARCHAR,description VARCHAR,last VARCHAR,first VARCHAR,middle VARCHAR,suffix VARCHAR,phone VARCHAR,email VARCHAR, class VARCHAR
00216766,person,Someone_Else,Else,Someone,WhoKnew,,555-1212,,1973
00604955,person,Somebody_New,New,Somebody,IKnew,,(800) 111-7777,somebody@somewhere.com,1993
edgedef> node1 VARCHAR, node2 VARCHAR, weight INTEGER
00352008,00352008,2
00352008,00363969,1
00352008,00352928,1
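
A minimal sketch of generating GDF text from Python. The function `to_gdf` is hypothetical, it writes only two node columns to stay short, and it does no quoting, so it assumes values contain no commas. The node ids come from the edge list above; giving them the type `person` is an assumption.

```python
def to_gdf(nodes, edges):
    """Render nodes and weighted edges as GDF text.

    nodes: list of (name, type) tuples
    edges: list of (node1, node2, weight) tuples
    """
    lines = ["nodedef>name VARCHAR,type VARCHAR"]
    lines += [f"{name},{node_type}" for name, node_type in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight INTEGER")
    lines += [f"{a},{b},{w}" for a, b, w in edges]
    return "\n".join(lines) + "\n"

gdf_text = to_gdf(
    nodes=[("00352008", "person"), ("00363969", "person"), ("00352928", "person")],
    edges=[("00352008", "00363969", 1), ("00352008", "00352928", 1)],
)
```

Writing `gdf_text` to a file with a `.gdf` extension should give something Gephi can open, though that has not been verified here.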

Exploring Gephi

(demos)

Overview/Data Laboratory/Preview
Statistics
Filters
Layouts
Partitions and Ranking
Clustering
Exporting

Curated List of Tutorials
  http://exploreyourdata.wordpress.com/2013/07/29/gephi-curated-list-of-tutorials

More Gephi Links



My Gephi Wish List


  • Better Memory Management  (coming in 0.9!)
  • Tree Layout Plugin
  • More Clustering Algorithms
  • A shortest Path finder (DFS & BFS)
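
For context on the last wish, here is a rough sketch of a breadth-first shortest path search in plain Python. It is not a Gephi plugin (those are written in Java), and the function `shortest_path` and the sample `graph` are hypothetical. BFS finds a fewest-hops path on an unweighted graph; a plain DFS finds a path but does not guarantee the shortest one.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Breadth-first search: return a fewest-hops path from start to goal, or None."""
    previous = {start: None}           # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:    # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
path = shortest_path(graph, "a", "e")
```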