Cheuk Ting Ho
Developer advocate / Data Scientist - support open-source and building the community.
@cheukting_ho
Cheukting
In some cases, information and data are better represented as graphs, so that their relations can be analysed with network theory.
Network theory has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, finance, operations research, climatology, ecology and sociology. — Wikipedia
Here are some key concepts...
Connected component:
a set of nodes that are connected to one another by paths in an undirected graph
Strongly connected component:
a set of nodes in a directed graph in which every node is reachable from every other node
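To make the idea of a connected component concrete, here is a minimal pure-Python sketch that groups the nodes of an undirected graph with a breadth-first search. The function name `connected_components` and the toy edge list are made up for illustration; this is not how Grakn computes clusters internally.

```python
from collections import deque


def connected_components(edges):
    """Return the connected components of an undirected graph as a list of sets."""
    # Build an adjacency map; every edge is stored in both directions.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy graph: two separate groups of nodes.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(toy_edges)
```

Here `components` holds two sets, one for the nodes A, B, C and one for D, E.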
Degree measures how many neighbours a node has:
e.g. 8 for this node
In a directed graph, degree comes in two versions:
in-degree - the number of incoming links
e.g. 6 for this node
out-degree - the number of outgoing links
e.g. 2 for this node
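As a small illustration of these definitions, the sketch below counts in-degree and out-degree from a list of directed links. The helper name `degrees` and the toy links are assumptions for the example only.

```python
from collections import Counter


def degrees(directed_edges):
    """Return (in_degree, out_degree) counters for a list of (source, target) links."""
    # Each link adds one to the out-degree of its source...
    out_degree = Counter(source for source, _ in directed_edges)
    # ...and one to the in-degree of its target.
    in_degree = Counter(target for _, target in directed_edges)
    return in_degree, out_degree


# Hypothetical toy graph: A and B both link to C, and C links to D.
toy_links = [("A", "C"), ("B", "C"), ("C", "D")]
in_degree, out_degree = degrees(toy_links)

# Ignoring direction, the degree of a node is the sum of both counts.
all_nodes = set(in_degree) | set(out_degree)
degree = {node: in_degree[node] + out_degree[node] for node in all_nodes}
```

A `Counter` returns 0 for a node it has never seen, so `in_degree["A"]` is simply 0 rather than an error.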
There are three distinct factors that determine the PageRank (PR) of a node:
So a node that receives the only outgoing link of a node with high centrality will itself have a high PR (e.g. 21.21)
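A rough way to see this effect is to run a few rounds of the usual PageRank power iteration on a toy graph. The sketch below is only an illustration: the function name `pagerank`, the damping factor of 0.85 and the toy link structure are all assumptions, not taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    `links` maps each node to the list of nodes it links to.
    The damping factor 0.85 is a commonly used value, assumed here.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node shares its rank equally among the nodes it links to.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "hub" is linked by A, B and C,
# and its only outgoing link points to "only_child".
toy_graph = {
    "D": ["A", "B", "C"],
    "A": ["hub"],
    "B": ["hub"],
    "C": ["hub"],
    "hub": ["only_child"],
}
ranks = pagerank(toy_graph)
```

In this toy graph `only_child` has a single incoming link, yet it ends up ranked above A, B and C because that one link comes from the well-connected `hub`.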
Steps:
GRAKN.AI is an open-source, distributed knowledge graph for knowledge-oriented systems.
Download Grakn Core
- it also includes a console for queries
Download Workbase
- for visualization (we will do a live demo at the end)
Install Grakn Client in Python
- we are using Python in this tutorial
Build the knowledge graph
- following the previous tutorial
To find all ally connections, we are going to use Grakn's reasoning rules
def forming_ally(session):
    # write a define query that adds the allies relation and a rule to infer it
    graql_define_query = """
        define
        allies sub relation,
            relates ally1,
            relates ally2;
        join-allies sub rule,
        when {
            (member: $char1, organization: $house) isa membership;
            (member: $char2, organization: $house) isa membership;
            $char1 != $char2;
        }, then {
            (ally1: $char1, ally2: $char2) isa allies;
        };
    """
    with session.transaction().write() as transaction:
        # make a write transaction with the query
        transaction.query(graql_define_query)
        # remember to commit at the end
        transaction.commit()
Joining the characters in the same house as allies
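To see what the rule infers without a running Grakn server, the following plain-Python sketch applies the same condition to a small dictionary of memberships. The house and character names, and the helper `infer_allies`, are hypothetical sample data; the real inference is done by Grakn's reasoner.

```python
from itertools import permutations


def infer_allies(memberships):
    """Return every ordered pair of distinct characters that share a house."""
    allies = set()
    for members in memberships.values():
        # Mirrors the rule body: two members of the same house, with char1 != char2.
        for char1, char2 in permutations(members, 2):
            allies.add((char1, char2))
    return allies


# Hypothetical sample data, not taken from the actual keyspace.
sample_memberships = {
    "Stark": ["Eddard", "Arya", "Sansa"],
    "Lannister": ["Tywin", "Cersei"],
}
ally_pairs = infer_allies(sample_memberships)
```

Each pair appears in both orders here because the rule's pattern is symmetric in `$char1` and `$char2`.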
def _convert_id_to_name(cluster, transaction):
    new_cluster = set()
    for element in cluster:
        graql_query = f'match $char id {element}, has name $name; get $name;'
        iterator = transaction.query(graql_query)
        answers = iterator.collect_concepts()
        for answer in answers:
            new_cluster.add(answer.value())
    return new_cluster
A helper function to convert a cluster of ids into a cluster of names
def getting_biggest_group(session):
    graql_query = 'compute cluster in [character, allies, marriage, parental], ' \
                  'using connected-component;'
    with session.transaction().read() as transaction:
        # execute the query and get the clusters
        iterator = transaction.query(graql_query)
        result = [item.set() for item in iterator]
        # extract the names of the characters in each cluster
        new_result = []
        for cluster in result:
            new_cluster = _convert_id_to_name(cluster, transaction)
            new_result.append(new_cluster)
    # find the biggest group of people
    biggest_group = None
    max_size = 0
    for group in new_result:
        if len(group) > max_size:
            max_size = len(group)
            biggest_group = group
    return max_size, biggest_group
Finding the biggest group of related characters
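The loop that picks the largest cluster can also be written with Python's built-in `max`. The sketch below is an optional alternative; the helper name `pick_biggest_group` and the sample groups are made up for illustration.

```python
def pick_biggest_group(groups):
    """Return (size, group) for the largest group, or (0, None) if there are none."""
    # max() with key=len returns the first group of maximal size.
    biggest_group = max(groups, key=len, default=None)
    max_size = len(biggest_group) if biggest_group is not None else 0
    return max_size, biggest_group


# Hypothetical sample clusters of character names.
sample_groups = [{"Arya"}, {"Tywin", "Cersei"}, {"Eddard", "Sansa"}]
size, group = pick_biggest_group(sample_groups)
```

Like the explicit loop, this keeps the first group it meets when several groups share the largest size.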
def getting_main_character(session):
    graql_query = 'compute centrality in [character, allies, marriage, parental], ' \
                  'using degree;'
    with session.transaction().read() as transaction:
        # execute the query and collect (degree, set of ids) pairs
        iterator = transaction.query(graql_query)
        result = [(item.measurement(), item.set()) for item in iterator]
        # find the set of characters with the highest degree
        biggest_cluster = None
        max_measure = 0
        for (measure, group) in result:
            if measure > max_measure:
                max_measure = measure
                biggest_cluster = group
        # find the names of the characters
        main_characters = _convert_id_to_name(biggest_cluster, transaction)
    return max_measure, main_characters
Finding the character(s) related to the most other characters
from grakn.client import GraknClient

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace='game_of_thrones') as session:
        # first forming allies if characters are in the same house
        forming_ally(session)
        # now we can answer some questions:
        print("What is the biggest group of friends and families?")
        max_size, biggest_group = getting_biggest_group(session)
        print(f'The biggest group is {biggest_group} with {max_size} members.')
        print()  # extra line before next question
        print("Which character(s) relate(s) to most other characters?")
        max_measure, main_characters = getting_main_character(session)
        if len(main_characters) == 1:
            print(f'{list(main_characters)[0]} relates to the most, '
                  f'he/she related to {max_measure} characters')
        else:
            print(f'{main_characters} relate to the most, '
                  f'they all related to {max_measure} characters')
        # if there is only one most important character,
        # is that character in the biggest group?
        if len(main_characters) == 1:
            print()  # extra line before next question
            print("Is he/she in the biggest group?")
            # main_characters is a set, so test whether it is a subset of the group
            # (`set in set` would always be False here)
            print(main_characters.issubset(biggest_group))
Main part of the program:
Run the code in the terminal:
python analysis.py
This gives us the result:
What is the biggest group of friends and families?
The biggest group is {'Daeron I Targaryen', 'Jaehaera Targaryen', 'Aegon IV Targaryen', 'Alys Arryn', 'Daemon Targaryen', 'Daena Targaryen', 'Elys Waynwood', 'Aerys I Targaryen', 'Baelor I Targaryen', 'Rhaegel Targaryen', 'Rhea Royce', 'Naerys Targaryen', 'Aelinor Penrose', 'Laena Velaryon', 'Aegon III Targaryen', 'Daeron II Targaryen', 'Viserys II Targaryen'} with 17 members.
Which character(s) relate(s) to most other characters?
Walder Frey relates to the most, he/she related to 7 characters
Is he/she in the biggest group?
False