Network Analysis with Grakn

Cheuk Ting Ho

@cheukting_ho

Cheukting

About me

Co-organizer of

Open Source contribution

Creator of

Network Analysis

In some cases, information and data are better represented as graphs, and their relations can then be analysed with network theory

Network theory

Network theory has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, finance, operations research, climatology, ecology and sociology. — Wikipedia

Here are some key concepts...

Directed Graph vs Non-Directed Graph

https://www.e-education.psu.edu/geog597i_02/node/832

Connected Component

Connected component:

a set of nodes that are all connected to one another by paths in a non-directed graph

Strongly connected component:

a set of nodes in which every node is reachable from every other node in a directed graph
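As a minimal sketch of the connected component idea, the plain-Python function below groups the nodes of a non-directed graph with a breadth-first search. The function name `connected_components` and the toy edge list are assumptions made for illustration; they are not part of Grakn or of the analysis later in this talk. Nodes without any edge never appear in the edge list, so this sketch does not report them.

```python
from collections import deque

def connected_components(edges):
    """Return the connected components of a non-directed graph as a list of sets."""
    # Build an adjacency map; each edge is stored in both directions
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy graph: A-B-C form one component, D-E another
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(toy_edges)
print(components)
```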

Degree Centrality

Degree measures how many neighbours a node has:

e.g. 8 for this node

Directed graph - 2 versions:

in-degree - the number of incoming links
e.g. 6 for this node
out-degree - the number of out-going links
e.g. 2 for this node

https://www.sci.unich.it/~francesc/teaching/network/degree.html
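To make in-degree and out-degree concrete, here is a small sketch that counts both from a list of directed links. The function name `degrees` and the toy links are made up for this example and do not come from the slides.

```python
from collections import Counter

def degrees(directed_edges):
    """Return (in_degree, out_degree) counters for a list of (source, target) links."""
    edges = list(directed_edges)  # allow any iterable, since we pass over it twice
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return in_degree, out_degree

# Hypothetical toy graph: A -> C, B -> C, C -> D
toy_links = [("A", "C"), ("B", "C"), ("C", "D")]
in_degree, out_degree = degrees(toy_links)
print(in_degree["C"], out_degree["C"])  # prints: 2 1
```

A `Counter` returns 0 for nodes it has never seen, so a node with no incoming links simply has in-degree 0.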

Pagerank Centrality

There are three distinct factors that determine the PageRank (PR) of a node:

the number of links it receives
the link propensity of the linkers
the centrality of the linkers

So a node that is the only one linked to by a node with high centrality will itself have a high PR (e.g. 21.21)

Steps:

Every node starts with PR 1
Evenly distribute each node's PR among its successors
Loop until equilibrium is reached

https://www.sci.unich.it/~francesc/teaching/network/pagerank
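The three steps above can be sketched in plain Python as follows. This is a simplified illustration, not the algorithm Grakn or any particular library runs. Two things are assumptions that the slides do not mention: a damping factor (the `damping` parameter, which makes the loop settle) and the treatment of nodes without successors, whose PR is spread over every node. The graph is a hypothetical dict mapping each node to a list of its successors, and every successor must also appear as a key.

```python
def pagerank(successors, damping=0.85, tolerance=1e-9, max_iterations=1000):
    """Iterate PageRank on a dict {node: [successor, ...]} until it stops changing."""
    nodes = list(successors)
    # Step 1: every node starts with PR 1
    rank = {node: 1.0 for node in nodes}
    for _ in range(max_iterations):
        # Assumption: each node keeps a base amount of (1 - damping)
        new_rank = {node: 1.0 - damping for node in nodes}
        for node in nodes:
            # Assumption: a node without successors spreads its PR over all nodes
            targets = successors[node] or nodes
            # Step 2: evenly distribute the (damped) PR to the successors
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        # Step 3: loop until equilibrium is (approximately) reached
        if change < tolerance:
            break
    return rank

# Hypothetical toy graph: A and B both link to C, and C links back to A
toy_graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_graph)
print(ranks)
```

In this toy graph C receives the most links and ends up with the highest PR, while A, which receives C's only link, ranks well above B, which receives none.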

More on network analysis and how to apply it to the travel industry:

See my presentation on YouTube

Why Grakn?

GRAKN.AI is an open-source, distributed knowledge graph for knowledge-oriented systems.

NetworkX

Single machine
In memory
Python only

Grakn

Create and query data with Graql
Can be deployed to the cloud
Python, Java and Node.js clients available
Data are stored in the knowledge graph
Provides automated reasoning
Visualization via Workbase

GraphX

Use with Apache Spark
Hadoop clusters only

Neo4j

Does not support reasoning

Let's do some analysis

Before we start...

Download Grakn Core

- it also includes a console for queries

Download Workbase

- for visualization, we will do a live demo at the end

Install Grakn Client in Python

- we are using Python in this tutorial

Build the knowledge graph

- following the previous tutorial

What do we want to do?

Our graph is a non-directed graph
Finding the biggest group of allies and families (connected components)
Allies are people in the same house
Finding the character(s) at the centre of the story, i.e. with the most connections (degree centrality)

To find all the ally connections, we are going to use Grakn's reasoning rules

def forming_ally(session):

    # write a define query that adds the allies relation and a rule to infer it
    graql_define_query = """
    define
    allies sub relation,
        relates ally1,
        relates ally2;
    join-allies sub rule,
    when {
        (member: $char1, organization: $house) isa membership;
        (member: $char2, organization: $house) isa membership;
        $char1 != $char2;
    }, then {
        (ally1: $char1, ally2: $char2) isa allies;
    };
    """

    with session.transaction().write() as transaction:
        # make a write transaction with the query
        transaction.query(graql_define_query)
        # remember to commit at the end
        transaction.commit()

Joining the characters in the same house as allies
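To see what the rule is meant to infer, here is a plain-Python sketch of the same logic outside Grakn: any two different characters that are members of the same house become allies. The function name `infer_allies` and the placeholder data are hypothetical. Grakn's reasoner does this at query time over the knowledge graph, so this is only an illustration of the pattern, not of how Grakn evaluates it.

```python
from itertools import permutations

def infer_allies(memberships):
    """Return ordered (ally1, ally2) pairs from (character, house) membership pairs."""
    # Group the characters by the house they belong to
    members_by_house = {}
    for character, house in memberships:
        members_by_house.setdefault(house, set()).add(character)

    allies = set()
    for members in members_by_house.values():
        # permutations yields both (a, b) and (b, a) and never pairs a character
        # with itself, mirroring the $char1 != $char2 condition in the rule
        allies.update(permutations(sorted(members), 2))
    return allies

# Placeholder data: char_a and char_b share house_x, char_c is alone in house_y
toy_memberships = [("char_a", "house_x"), ("char_b", "house_x"), ("char_c", "house_y")]
toy_allies = infer_allies(toy_memberships)
print(toy_allies)
```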

def _convert_id_to_name(cluster, transaction):

    new_cluster = set()
    for element in cluster:
        graql_query = f'match $char id {element}, has name $name; get $name;'
        iterator = transaction.query(graql_query)
        answers = iterator.collect_concepts()
        for answer in answers:
            new_cluster.add(answer.value())
    return new_cluster

Create a helper function to convert a cluster of ids into a cluster of names.

def getting_biggest_group(session):

    graql_query = f'compute cluster in [character, allies, marriage, parental], ' \
                  f'using connected-component;'
    with session.transaction().read() as transaction:
        # execute the query and get the clusters
        iterator = transaction.query(graql_query)
        result = [item.set() for item in iterator]

        # extract the names of the characters in each cluster
        new_result = []
        for cluster in result:
            new_cluster = _convert_id_to_name(cluster, transaction)
            new_result.append(new_cluster)

        # finding the biggest group of people
        biggest_group = None
        max_size = 0
        for group in new_result:
            if len(group) > max_size:
                max_size = len(group)
                biggest_group = group

    return max_size, biggest_group

Finding the biggest group of related characters
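The loop that picks the biggest group can also be written with Python's built-in `max`. The sketch below works on plain sets, so it can be tried without a Grakn server. The helper name `biggest_cluster` and the toy clusters are assumptions made for illustration.

```python
def biggest_cluster(clusters):
    """Return (size, cluster) for the largest cluster, or (0, None) if there is none."""
    # key=len compares clusters by size; default=None covers an empty list
    biggest = max(clusters, key=len, default=None)
    if not biggest:
        return 0, None
    return len(biggest), biggest

# Hypothetical clusters of names, as _convert_id_to_name would return them
toy_clusters = [{"a", "b"}, {"c", "d", "e"}, {"f"}]
size, cluster = biggest_cluster(toy_clusters)
print(size, cluster)
```

As with the explicit loop, a tie is resolved in favour of the first cluster of that size.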

def getting_main_character(session):

    graql_query = f'compute centrality in [character, allies, marriage, parental], ' \
                  f'using degree;'
    with session.transaction().read() as transaction:
        # execute the query and collect (degree, set of ids) pairs
        iterator = transaction.query(graql_query)
        result = [(item.measurement(),item.set()) for item in iterator]

        # find the set of characters with the highest degree
        biggest_cluster = None
        max_measure = 0
        for (measure,group) in result:
            if measure > max_measure:
                max_measure = measure
                biggest_cluster = group

        # finding the name of the characters
        main_characters = _convert_id_to_name(biggest_cluster, transaction)

    return max_measure, main_characters

Finding the character(s) that relate(s) to most other characters
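The code above treats each answer as a pair of a degree value and the set of nodes sharing that degree. The plain-Python sketch below produces the same shape for a toy non-directed graph, which also shows why more than one character can come out on top. The function name `degree_groups` and the toy edges are made up for illustration; this is not how Grakn computes the result.

```python
from collections import defaultdict

def degree_groups(edges):
    """Map each degree value to the set of nodes that have that degree."""
    degree = defaultdict(int)
    for a, b in edges:
        # In a non-directed graph an edge counts once for each endpoint
        degree[a] += 1
        degree[b] += 1

    groups = defaultdict(set)
    for node, value in degree.items():
        groups[value].add(node)
    return dict(groups)

# Hypothetical toy graph: A is connected to B, C and D; B and C are also connected
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
groups = degree_groups(toy_edges)
top_degree = max(groups)
print(top_degree, groups[top_degree])
```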

from grakn.client import GraknClient

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace = 'game_of_thrones') as session:
        # first forming allies if characters are in the same house
        forming_ally(session)

        # now we can answer some questions:
        print("What is the biggest group of friends and families?")
        max_size,biggest_group = getting_biggest_group(session)
        print(f'The biggest group is {biggest_group} with {max_size} members.')

        print() # extra line before next question

        print("Which character(s) relate(s) to most other characters?")
        max_measure, main_characters = getting_main_character(session)
        if len(main_characters) == 1:
            print(f'{list(main_characters)[0]} relates to the most, ' \
                  f'he/she related to {max_measure} characters')
        else:
            print(f'{main_characters} relate to the most, ' \
                  f'they all related to {max_measure} characters')

        # if there is only one most important character,
        # is that character in the biggest group?
        if len(main_characters) == 1:
            print() # extra line before next question
            print("Is he/she in the biggest group?")
            # main_characters is a set, so test it as a subset, not as an element
            print(main_characters.issubset(biggest_group))

Main part of the program:

Run the code in the terminal:

python analysis.py

This gives us the result:

What is the biggest group of friends and families?
The biggest group is {'Daeron I Targaryen', 'Jaehaera Targaryen', 'Aegon IV Targaryen', 'Alys Arryn', 'Daemon Targaryen', 'Daena Targaryen', 'Elys Waynwood', 'Aerys I Targaryen', 'Baelor I Targaryen', 'Rhaegel Targaryen', 'Rhea Royce', 'Naerys Targaryen', 'Aelinor Penrose', 'Laena Velaryon', 'Aegon III Targaryen', 'Daeron II Targaryen', 'Viserys II Targaryen'} with 17 members.

Which character(s) relate(s) to most other characters?
Walder Frey relates to the most, he/she related to 7 characters

Is he/she in the biggest group?
False