GRAPH & NETWORKS:
THEORY & ALGORITHMS

Data Science Retreat Masterclass

ABOUT ME: Amélie Anglade

  • DS and Music Information Retrieval Consultant
     
  • PhD in MIR from QMUL
     
  • Worked for large R&D labs: Sony CSL, Philips Research
     
  • Now working mostly with music startups: SoundCloud, MusicGraph

ME and GRAPHS

  • MSc. thesis on modelling and clustering P2P networks of listeners
     
  • Designed and implemented the DiscoRank at SoundCloud
     
  • Worked for MusicGraph on Big Data graph algorithms

HOW to reach me

  • @amelie on dsr07.slack.com 
     
  • @utstikkar on Twitter (and the web)
     
  • amelie.anglade@gmail.com

Outline

  • Introduction
     
  • Graph Theory and Algorithms
     
  • Network properties and models
     
  • PageRank Algorithm
     
  • Graph Computing Technologies
     
  • Programming project

intro

real-world graphs

World Wide Web

Facebook: Open Graph & Graph Search

graph theory and algorithms

network properties and models

one famous graph algorithm: PageRank

Graph computing tech

  • In-memory graph toolkits
    • Usually single-user systems
    • Graph analysis and visualisation
    • Implementation of many if not all graph/network algorithms
    • Limit: can only operate on graphs that fit in main memory, i.e. at most millions of edges
    • Examples: JUNG, NetworkX, iGraph
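
To make the in-memory idea concrete, here is a minimal sketch in plain Python rather than any of the toolkits listed above. The helper names `build_graph` and `connected_components` and the toy edge list are hypothetical, chosen only for illustration. The whole graph sits in one dictionary, which is what makes a global analysis such as connected components cheap, and also what caps the graph size at available RAM.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency-set graph held entirely in memory."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Global analysis: visit every node once and group nodes by component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Toy edge list (illustrative data only).
edges = [("a", "b"), ("b", "c"), ("d", "e")]
graph = build_graph(edges)
components = connected_components(graph)
print(len(components), "components")
```

A real toolkit adds many more algorithms and visualisation on top, but the storage model is essentially the same.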

Graph computing tech

  • Real-time graph databases

    • Designed to support multi-user concurrency

    • Persist the graph on disk: a couple of billion edges locally, hundreds of billions of edges on distributed systems

    • Limit: global graph algorithms/analytics not feasible

    • Focus on local algorithms and traversals

    • Examples: Neo4j, OrientDB, InfiniteGraph, DEX, Titan
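
A local traversal of the kind these databases focus on can be sketched as a hop-limited breadth-first search. This is plain Python over a dictionary, not the query API of any database listed above. The function name `neighbours_within` and the `follows` data are hypothetical. The point is that the query only touches the neighbourhood of the start node, so its cost does not depend on the size of the whole graph.

```python
from collections import deque


def neighbours_within(adj, start, max_hops):
    """Return {node: hop distance} for nodes reachable from start in at most max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy "who follows whom" graph (illustrative data only).
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}
result = neighbours_within(follows, "ann", 2)
print(result)
```

With a limit of two hops, "eve" is never visited, which is the behaviour that keeps such queries fast on very large graphs.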

Graph computing tech

  • Batch processing graph frameworks

    • Optimised for global graph analysis

    • Often Hadoop for storage (HDFS) and processing (MapReduce)

    • Iterative algorithms

    • Limit: Not real-time computation

    • But: Can leverage sequential reads from disk

    • Do not support concurrent users

    • Examples: Hama, Giraph, GraphLab, Faunus
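
As an illustration of an iterative global algorithm, here is a minimal single-machine sketch of PageRank by power iteration. It is not the implementation used by any framework above. The damping factor of 0.85, the fixed iteration count, the function name `pagerank` and the toy `web` graph are all assumptions made for the example. Each iteration reads every edge once, which corresponds to one batch pass over the graph.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    Assumes every link target also appears as a key of out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # One full pass over all edges: each node shares its rank among its targets.
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Toy link graph (illustrative data only).
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In a batch framework the inner pass would be distributed across machines and repeated as successive jobs or supersteps, but the per-iteration logic is the same idea.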

TIME to play

with graph algorithms
