Network Analysis with Grakn

Cheuk Ting Ho

@cheukting_ho

Cheukting

About me

Co-organizer of 

Open Source contribution

Creator of

Network Analysis

In some cases, information and data are better represented as graphs, and their relations can be analysed with network theory

Network theory

Network theory has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, finance, operations research, climatology, ecology and sociology. — Wikipedia

Here are some key concepts...

Directed Graph vs

Non-Directed Graph

Connected Component

Connected component:

nodes that are connected to one another by paths in a non-directed graph

 

Strongly connected component:

a set of nodes in a directed graph in which every node is reachable from every other node
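As a rough illustration of the connected-component idea, here is a minimal sketch in plain Python. The function name `connected_components` and the toy graph are made up for this example; it uses a breadth-first search and only covers the non-directed case.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return a list of sets, one per connected component of a non-directed graph."""
    # build an adjacency map; every edge is stored in both directions
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from `start`
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        components.append(component)
    return components

# hypothetical toy graph: two separate groups and one isolated node
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(nodes, edges)
print(components)
```

The toy graph splits into three components: A-B-C, D-E, and F on its own.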

Degree Centrality

Degree measures how many neighbours a node has:

e.g. 8 for this node

 

Directed graph - 2 versions:

  1. in-degree - the number of incoming links
    e.g. 6 for this node

  2. out-degree - the number of outgoing links
    e.g. 2 for this node
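To make the three degree measures concrete, here is a small sketch in plain Python. The link list and the helper `degree` are hypothetical and only serve this example.

```python
from collections import Counter

# hypothetical directed links, written as (source, target)
links = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("B", "E")]

# in-degree counts incoming links, out-degree counts outgoing links
out_degree = Counter(source for source, _ in links)
in_degree = Counter(target for _, target in links)

def degree(node):
    # number of distinct neighbours when the link direction is ignored
    neighbours = {t for s, t in links if s == node} | {s for s, t in links if t == node}
    return len(neighbours)

print(in_degree["B"], out_degree["B"], degree("B"))  # prints: 3 2 4
```

Node B has three incoming links and two outgoing links, but only four neighbours, because A is linked in both directions.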

Pagerank Centrality

There are three distinct factors that determine the PageRank (PR) of a node:

  1. the number of links it receives
  2. the link propensity of the linkers
  3. the centrality of the linkers

So a node that receives the only link from a node with high centrality will itself have a high PR (e.g. 21.21)

Steps:

  1. Every node starts with PR 1
  2. Evenly distribute the PR of each node to its successors
  3. Loop until equilibrium is reached
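The steps above can be sketched in a few lines of Python. This is an illustrative version, not the implementation used by any particular library. The damping factor (0.85 here) is not mentioned on the slide; it is the usual addition that makes the loop settle. The graph and all names are hypothetical.

```python
def pagerank(successors, damping=0.85, tolerance=1e-9, max_iterations=1000):
    """Iterative PageRank on a dict mapping each node to a list of its successors."""
    nodes = list(successors)
    # step 1: every node starts with PR 1
    rank = {node: 1.0 for node in nodes}
    for _ in range(max_iterations):
        new_rank = {node: 1.0 - damping for node in nodes}
        for node in nodes:
            # a node without successors shares its PR with every node
            targets = successors[node] or nodes
            share = damping * rank[node] / len(targets)
            # step 2: evenly distribute the PR to the successors
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        # step 3: stop once the values no longer move (equilibrium)
        if change < tolerance:
            break
    return rank

# hypothetical graph: B receives many links, and C receives B's only link
graph = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B"], "E": ["B", "D"]}
ranks = pagerank(graph)
for node, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```

In this toy graph C has only one incoming link, yet it ends up with the second-highest PR because that link comes from B, the most central node.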

More on network analysis and how to apply it to the travel industry:

See my presentation on YouTube

Why Grakn?

GRAKN.AI is an open-source, distributed knowledge graph for knowledge-oriented systems.

NetworkX

  • Single machine
  • In memory
  • Python only

Grakn

  • Create and query data with Graql
     
  • Can be deployed to the cloud
     
  • Python, Java and Node.js clients available
     
  • Data are stored in the knowledge graph
     
  • Provides automated reasoning
     
  • Visualization via Workbase

GraphX

  • Use with Apache Spark
  • Hadoop clusters only

Neo4j

  • Does not support reasoning

Let's do some analysis

Before we start...

Download Grakn Core

- it also includes a console for queries
 

Download Workbase

- for visualization, we will do a live demo at the end
 

Install Grakn Client in Python

- we are using Python in this tutorial
 

Build the knowledge graph

- following the previous tutorial

What do we want to do?

  • Our graph is a non-directed graph
  • Finding the biggest group of allies and families (connected components)
  • Allies are people in the same house
  • Finding the character(s) at the centre of the story, i.e. with the most connections (degree centrality)

To find all ally connections, we are going to use Grakn's reasoning rules

def forming_ally(session):

    # write a define query that adds the allies relation and a rule to infer it
    graql_insert_query = """
    define
    allies sub relation,
        relates ally1,
        relates ally2;
    join-allies sub rule,
    when {
        (member: $char1, organization: $house) isa membership;
        (member: $char2, organization: $house) isa membership;
        $char1 != $char2;
    }, then {
        (ally1: $char1, ally2: $char2) isa allies;
    };
    """

    with session.transaction().write() as transaction:
        # run the query in a write transaction
        transaction.query(graql_insert_query)
        # remember to commit at the end
        transaction.commit()

Joining the characters in the same house as allies
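To see roughly what the rule infers, the same pairing can be mimicked in plain Python without Grakn. The membership data below is hypothetical, and this only approximates the rule: every ordered pair of distinct characters in the same house becomes an allies relation.

```python
from itertools import permutations

# hypothetical membership data: (character, house)
memberships = [
    ("char_a", "house_x"),
    ("char_b", "house_x"),
    ("char_c", "house_x"),
    ("char_d", "house_y"),
]

# group characters by house
houses = {}
for character, house in memberships:
    houses.setdefault(house, []).append(character)

# every ordered pair of distinct members of a house matches the `when` block
inferred_allies = {
    pair
    for members in houses.values()
    for pair in permutations(members, 2)
}
print(len(inferred_allies))  # prints: 6
```

The three members of house_x give six ordered pairs, while char_d, alone in house_y, has no allies.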

def _convert_id_to_name(cluster, transaction):

    new_cluster = set()
    for element in cluster:
        graql_query = f'match $char id {element}, has name $name; get $name;'
        iterator = transaction.query(graql_query)
        answers = iterator.collect_concepts()
        for answer in answers:
            new_cluster.add(answer.value())
    return new_cluster

Create a helper function to convert the cluster with ids to cluster with names.

def getting_biggest_group(session):

    graql_query = f'compute cluster in [character, allies, marriage, parental], ' \
                  f'using connected-component;'
    with session.transaction().read() as transaction:
        # execute the query and get the clusters
        iterator = transaction.query(graql_query)
        result = [item.set() for item in iterator]

        # extract the names of the characters in each cluster
        new_result = []
        for cluster in result:
            new_cluster = _convert_id_to_name(cluster, transaction)
            new_result.append(new_cluster)

        # finding the biggest group of people
        biggest_group = None
        max_size = 0
        for group in new_result:
            if len(group) > max_size:
                max_size = len(group)
                biggest_group = group

    return max_size, biggest_group

Finding the biggest group of related characters
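The loop that picks the biggest group can also be written with the built-in `max`. This is a small sketch on hypothetical data, standing in for the clusters after the ids have been converted to names.

```python
# hypothetical clusters of names, standing in for the converted Grakn results
clusters = [{"a", "b"}, {"c", "d", "e", "f"}, {"g"}]

# default=set() covers the case where no cluster comes back
biggest_group = max(clusters, key=len, default=set())
max_size = len(biggest_group)
print(max_size, sorted(biggest_group))  # prints: 4 ['c', 'd', 'e', 'f']
```

Like the loop, `max` keeps the first cluster when several share the largest size. The one difference is that this version returns an empty set instead of `None` when there are no clusters.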

def getting_main_character(session):

    graql_query = f'compute centrality in [character, allies, marriage, parental], ' \
                  f'using degree;'
    with session.transaction().read() as transaction:
        # execute the query and collect (degree, ids) pairs
        iterator = transaction.query(graql_query)
        result = [(item.measurement(),item.set()) for item in iterator]

        # find the set of characters with the highest degree
        biggest_cluster = None
        max_measure = 0
        for (measure,group) in result:
            if measure > max_measure:
                max_measure = measure
                biggest_cluster = group

        # finding the name of the characters
        main_characters = _convert_id_to_name(biggest_cluster, transaction)

    return max_measure, main_characters

Finding the character(s) that relate(s) to most other characters

from grakn.client import GraknClient

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace = 'game_of_thrones') as session:
        # first forming allies if characters are in the same house
        forming_ally(session)

        # now we can answer some questions:
        print("What is the biggest group of friends and families?")
        max_size,biggest_group = getting_biggest_group(session)
        print(f'The biggest group is {biggest_group} with {max_size} members.')

        print() # extra line before next question

        print("Which character(s) relate(s) to most other characters?")
        max_measure, main_characters = getting_main_character(session)
        if len(main_characters) == 1:
            print(f'{list(main_characters)[0]} relates to the most, ' \
                  f'he/she related to {max_measure} characters')
        else:
            print(f'{main_characters} relate to the most, ' \
                  f'they all related to {max_measure} characters')

        # if there is only one most important character,
        # is that character in the biggest group?
        if len(main_characters) == 1:
            print() # extra line before next question
            print("Is he/she in the biggest group?")
            # main_characters is a set, so check that it is contained in the group
            print(main_characters.issubset(biggest_group))

Main part of the program:

Run the code in the terminal:

python analysis.py

This gives us the result:

What is the biggest group of friends and families?
The biggest group is {'Daeron I Targaryen', 'Jaehaera Targaryen', 'Aegon IV Targaryen', 'Alys Arryn', 'Daemon Targaryen', 'Daena Targaryen', 'Elys Waynwood', 'Aerys I Targaryen', 'Baelor I Targaryen', 'Rhaegel Targaryen', 'Rhea Royce', 'Naerys Targaryen', 'Aelinor Penrose', 'Laena Velaryon', 'Aegon III Targaryen', 'Daeron II Targaryen', 'Viserys II Targaryen'} with 17 members.

Which character(s) relate(s) to most other characters?
Walder Frey relates to the most, he/she related to 7 characters

Is he/she in the biggest group?
False

Complete Code on GitHub

Live demo with Workbase
