# Network Analysis with Grakn

Cheuk Ting Ho

@cheukting_ho

Cheukting

# Network Analysis

In some cases, information and data are better represented as graphs, so that their relations can be analysed with network theory.

# Network theory

Network theory has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, finance, operations research, climatology, ecology and sociology. — Wikipedia

Here are some key concepts...

# Connected Component

Connected component:

a set of nodes that are connected to one another by paths in an undirected graph

Strongly connected component:

a set of nodes in a directed graph in which every node is reachable from every other node
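As an illustration of the first definition, here is a minimal sketch that finds the connected components of an undirected graph with a breadth-first search. It is plain Python and independent of Grakn; the function name `connected_components` and the edge-list input format are assumptions made for this example.

```python
from collections import deque


def connected_components(edges):
    """Return a list of sets, one per connected component (undirected graph)."""
    # build an adjacency map; every edge is stored in both directions
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from `start`
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# toy graph with placeholder node names
print(connected_components([("a", "b"), ("b", "c"), ("d", "e")]))
```

Each node is visited once, so the search is linear in the number of nodes and edges.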

# Degree Centrality

Degree measures how many neighbours a node has:

e.g. 8 for this node

In a directed graph there are 2 versions:

1. in-degree - the number of incoming links
   e.g. 6 for this node

2. out-degree - the number of outgoing links
   e.g. 2 for this node
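The two versions can be counted in a few lines of plain Python. This is only a sketch; the function name `degree_counts` and the `(source, target)` edge format are assumptions for the example, not part of any library.

```python
from collections import Counter


def degree_counts(directed_edges):
    """Count incoming and outgoing links per node for (source, target) edges."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in directed_edges:
        out_degree[source] += 1  # the link leaves `source`
        in_degree[target] += 1   # the link arrives at `target`
    return in_degree, out_degree


# toy graph with placeholder node names
in_deg, out_deg = degree_counts([("a", "c"), ("b", "c"), ("c", "a")])
print(in_deg["c"], out_deg["c"])
```

For an undirected graph the degree of a node is simply the number of edges that touch it, so a single counter is enough.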

# Pagerank Centrality

Three distinct factors determine the PageRank (PR) of a node:

1. the number of links it receives
2. the link propensity of the linkers
3. the centrality of the linkers

So a node that receives the only outgoing link of a node with high centrality will itself have a high PR (e.g. 21.21)

Steps:

1. Every node starts with PR 1
2. Evenly distribute the PR to the successors
3. Loop until equilibrium is reached
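The steps above can be sketched in plain Python as follows. Two things here are assumptions that go beyond the three steps: a damping factor of 0.85 (commonly used, and added so that the loop settles on any graph), and the handling of nodes without successors, whose PR is spread over all nodes. The function name `pagerank` and the input format are also made up for this example.

```python
def pagerank(successors, damping=0.85, tol=1e-9, max_iter=1000):
    """Iterative PageRank; `successors` maps every node to a list of its successors."""
    nodes = list(successors)
    # step 1: every node starts with PR 1
    pr = {node: 1.0 for node in nodes}
    for _ in range(max_iter):
        # assumption: each node keeps a base amount of (1 - damping)
        new_pr = {node: 1.0 - damping for node in nodes}
        for node in nodes:
            # assumption: a node without successors spreads its PR over all nodes
            targets = successors[node] or nodes
            # step 2: evenly distribute the PR to the successors
            share = damping * pr[node] / len(targets)
            for target in targets:
                new_pr[target] += share
        # step 3: loop until equilibrium is reached
        change = sum(abs(new_pr[node] - pr[node]) for node in nodes)
        pr = new_pr
        if change < tol:
            break
    return pr


# toy graph with placeholder node names: both a and b link to c, c links back to a
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))
```

With this normalisation the PR values always add up to the number of nodes, which makes it easy to check that nothing is lost between iterations.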

# Why Grakn?

GRAKN.AI is an open-source, distributed knowledge graph for knowledge-oriented systems.

## NetworkX

• Single machine
• In memory
• Python only

## Grakn

• Create and query data with Graql

• Can be deployed to the cloud

• Python, Java and Node.js clients available

• Data are stored in the knowledge graph

• Provide automated reasoning

• Visualization via Workbase

## GraphX

• Use with Apache Spark

## Neo4j

• Does not support reasoning

# Before we start...

Install Grakn

- it also includes a console for queries

- for visualization, we will do a live demo at the end

Install the Grakn client for Python

- we are using Python in this tutorial

Build the knowledge graph

- following the previous tutorial

## What do we want to do?

• Our graph is an undirected graph
• Finding the biggest group of allies and families (connected components)
• Allies are people in the same house
• Finding the character(s) at the centre of the story, i.e. with the most connections (degree centrality)

To find all ally connections, we are going to use Grakn's reasoning rules

    def forming_ally(session):
        # a define query that adds the allies relation and a rule to infer it
        graql_insert_query = """
        define
        allies sub relation,
            relates ally1,
            relates ally2;
        join-allies sub rule,
        when {
            (member: $char1, organization: $house) isa membership;
            (member: $char2, organization: $house) isa membership;
            $char1 != $char2;
        }, then {
            (ally1: $char1, ally2: $char2) isa allies;
        };
        """

        with session.transaction().write() as transaction:
            # make a write transaction with the query
            transaction.query(graql_insert_query)
            # remember to commit at the end
            transaction.commit()

Joining the characters in the same house as allies
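To make the effect of the rule more concrete, here is a rough plain-Python imitation: every two distinct characters that are members of the same house become allies. This is only a sketch of the idea, not how Grakn evaluates rules; the function name `infer_allies` and the placeholder memberships are hypothetical.

```python
from itertools import combinations


def infer_allies(memberships):
    """Pair up distinct characters that share a house.

    `memberships` is an iterable of (character, house) tuples.
    """
    # group the characters by house
    by_house = {}
    for character, house in memberships:
        by_house.setdefault(house, set()).add(character)

    # every unordered pair within a house is an ally pair
    allies = set()
    for members in by_house.values():
        for first, second in combinations(sorted(members), 2):
            allies.add((first, second))
    return allies


# hypothetical placeholder data, not taken from the keyspace
memberships = [
    ("char_1", "house_a"),
    ("char_2", "house_a"),
    ("char_3", "house_a"),
    ("char_4", "house_b"),
]
print(sorted(infer_allies(memberships)))
```

A house with a single member produces no pairs, which mirrors the `$char1 != $char2` condition in the rule.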

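    def _convert_id_to_name(cluster, transaction):
        new_cluster = set()
        for element in cluster:
            # look up the name attribute of the concept with this id
            graql_query = f'match $char id {element}, has name $name; get $name;'
            iterator = transaction.query(graql_query)
            # collect the name attributes and keep their values
            for answer in iterator.collect_concepts():
                new_cluster.add(answer.value())
        return new_cluster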

Create a helper function to convert a cluster of ids into a cluster of names.

    def getting_biggest_group(session):
        graql_query = ('compute cluster in [character, allies, marriage, parental], '
                       'using connected-component;')

        # open a read transaction for the compute query
        with session.transaction().read() as transaction:
            # execute the query and get the clusters
            iterator = transaction.query(graql_query)
            result = [item.set() for item in iterator]

            # extracting the names of the characters in each cluster
            new_result = []
            for cluster in result:
                new_cluster = _convert_id_to_name(cluster, transaction)
                new_result.append(new_cluster)

        # finding the biggest group of people
        biggest_group = None
        max_size = 0
        for group in new_result:
            if len(group) > max_size:
                max_size = len(group)
                biggest_group = group

        return max_size, biggest_group

Finding the biggest group of related characters
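The selection loop at the end of `getting_biggest_group` can also be written with the built-in `max`. A minimal sketch on hypothetical placeholder clusters (the helper name `pick_biggest` is made up for this example):

```python
def pick_biggest(clusters):
    """Return (size, cluster) for the largest cluster, or (0, None) if there are none."""
    if not clusters:
        return 0, None
    # max() with key=len returns the first cluster of maximal size
    biggest = max(clusters, key=len)
    return len(biggest), biggest


# hypothetical placeholder clusters
clusters = [{"char_1", "char_2"}, {"char_3", "char_4", "char_5"}, {"char_6"}]
print(pick_biggest(clusters))
```

On ties this keeps the first cluster of maximal size, which matches the behaviour of a loop that only replaces the current best on a strictly greater length.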

    def getting_main_character(session):
        graql_query = ('compute centrality in [character, allies, marriage, parental], '
                       'using degree;')

        # open a read transaction for the compute query
        with session.transaction().read() as transaction:
            # execute the query and collect (degree, set of ids) pairs
            iterator = transaction.query(graql_query)
            result = [(item.measurement(), item.set()) for item in iterator]

            # finding the set of ids with the highest degree
            biggest_cluster = None
            max_measure = 0
            for (measure, group) in result:
                if measure > max_measure:
                    max_measure = measure
                    biggest_cluster = group

            # finding the names of the characters
            main_characters = _convert_id_to_name(biggest_cluster, transaction)

        return max_measure, main_characters

Finding the character(s) that relate(s) to most other characters
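Because several characters can share the highest degree, the answer is a set rather than a single name. The following plain-Python sketch shows the same idea on an undirected edge list, independent of Grakn; the function name `most_connected` and the placeholder edges are hypothetical.

```python
from collections import defaultdict


def most_connected(edges):
    """Return (highest degree, set of nodes with that degree) for undirected edges."""
    degree = defaultdict(int)
    for a, b in edges:
        # an undirected edge adds one to the degree of both endpoints
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return 0, set()
    top = max(degree.values())
    # keep every node that reaches the top degree, so ties are preserved
    return top, {node for node, value in degree.items() if value == top}


# hypothetical placeholder edges: char_1 and char_2 both have degree 2
edges = [("char_1", "char_2"), ("char_1", "char_3"), ("char_2", "char_4")]
print(most_connected(edges))
```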

    from grakn.client import GraknClient

    with GraknClient(uri="localhost:48555") as client:
        with client.session(keyspace='game_of_thrones') as session:
            # first forming allies if characters are in the same house
            forming_ally(session)

            # now we can answer some questions:
            print("What is the biggest group of friends and families?")
            max_size, biggest_group = getting_biggest_group(session)
            print(f'The biggest group is {biggest_group} with {max_size} members.')

            print()  # extra line before next question

            print("Which character(s) relate(s) to most other characters?")
            max_measure, main_characters = getting_main_character(session)
            if len(main_characters) == 1:
                print(f'{list(main_characters)[0]} relates to the most, '
                      f'he/she related to {max_measure} characters')
            else:
                print(f'{main_characters} relate to the most, '
                      f'they all related to {max_measure} characters')

            # if there is only one most important character,
            # is that character in the biggest group?
            if len(main_characters) == 1:
                print()  # extra line before next question
                print("Is he/she in the biggest group?")
                # compare the members of the two sets, not the set object itself
                print(main_characters.issubset(biggest_group))

Main part of the program:
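When checking whether the main character belongs to the biggest group, it helps to remember that `in` between two sets asks whether the left set is itself an element of the right one, while `issubset` asks whether its members are. A small sketch with placeholder names (hypothetical, not taken from the keyspace):

```python
# hypothetical placeholder names
biggest_group = {"char_1", "char_2", "char_3"}
main_characters = {"char_1"}

# asks whether the set {"char_1"} is itself an element of biggest_group
as_element = main_characters in biggest_group

# asks whether every member of main_characters is in biggest_group
as_subset = main_characters.issubset(biggest_group)

print(as_element, as_subset)  # prints: False True
```

The first test is always `False` for a set of plain strings, whatever the names are, so the subset test is the one that answers the question.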

Run the code in the terminal:

    python analysis.py

This gives us the result:

    What is the biggest group of friends and families?
    The biggest group is {'Daeron I Targaryen', 'Jaehaera Targaryen', 'Aegon IV Targaryen', 'Alys Arryn', 'Daemon Targaryen', 'Daena Targaryen', 'Elys Waynwood', 'Aerys I Targaryen', 'Baelor I Targaryen', 'Rhaegel Targaryen', 'Rhea Royce', 'Naerys Targaryen', 'Aelinor Penrose', 'Laena Velaryon', 'Aegon III Targaryen', 'Daeron II Targaryen', 'Viserys II Targaryen'} with 17 members.

    Which character(s) relate(s) to most other characters?
    Walder Frey relates to the most, he/she related to 7 characters

    Is he/she in the biggest group?
    False