What makes a community (cohesive subgroup):
Wasserman and Faust
A clique is a complete (fully connected) subgraph, i.e. a set of vertices where each pair of vertices is connected.
Cliques can overlap
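Maximum cliques: cliques of the largest possible size in the graph.
Maximal cliques: cliques that cannot be extended by adding another vertex.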
| Clique size | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Number of cliques | 11 | 21 | 2 | 2 |
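Maximal cliques are commonly enumerated with the Bron-Kerbosch algorithm. The sketch below uses plain Python with pivoting: it returns every maximal clique and counts them by size. The function names are illustrative. The slide does not say whether the counts in the table are maximal cliques or all cliques, so treating them as maximal cliques is an assumption.

```python
from collections import Counter


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch (with pivoting).

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-processed vertices
        if not p and not x:
            cliques.append(r)  # nothing can extend r: it is maximal
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_size_histogram(adj):
    """Number of maximal cliques of each size."""
    return Counter(len(c) for c in maximal_cliques(adj))
```

For a triangle 1-2-3 with a pendant edge 3-4, the maximal cliques are {1, 2, 3} and {3, 4}. Pivoting only prunes branches; the reported cliques are the same as without it.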
Network communities are groups of vertices such that vertices inside a group are connected by many more edges than there are between groups.
Community detection is an assignment of vertices to communities. We will consider non-overlapping communities (graph cuts).
Consider only sparse graphs, \( m \ll n^2 \). Each community should be connected. Combinatorial optimization problem:
- optimization criterion (cut, conductance, modularity)
- optimization method
Finding the exact solution is NP-hard
(bi-partition: \( n = n_1 + n_2 \), \( \frac{n!}{n_1!\,n_2!} \) combinations)
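A quick computation shows why exhaustive search is hopeless even for a plain bi-partition. The helper below is a hypothetical name; it only evaluates \( \frac{n!}{n_1!\,n_2!} \) with the standard library's `math.comb`, counting the two groups as labelled.

```python
from math import comb


def bipartition_count(n1, n2):
    """Ways to split n = n1 + n2 labelled vertices into groups of sizes n1 and n2."""
    return comb(n1 + n2, n1)  # n! / (n1! n2!)


# Balanced splits: the count explodes with n
for n in (10, 20, 50, 100):
    print(f"n = {n:3d}: {bipartition_count(n // 2, n // 2):.3e} balanced bi-partitions")
```

Already for \( n = 20 \) there are 184756 balanced splits, and for \( n = 100 \) roughly \( 10^{29} \), which is why greedy and heuristic methods are used instead.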
Solved by greedy or approximate algorithms, or by heuristics: recursive top-down 2-way partitioning, multiway partitioning. Balanced class partitioning vs. communities.
recursive partitioning
Focus on edges that connect communities.
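Edge betweenness: the number of shortest paths \( \sigma_{st}(e) \) going through edge \( e \), as a fraction of all \( \sigma_{st} \) shortest paths between \( s \) and \( t \), summed over all vertex pairs: \( C_B(e) = \sum_{s < t} \frac{\sigma_{st}(e)}{\sigma_{st}} \)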
Construct communities by progressively removing edges
Newman-Girvan, 2004
Algorithm: Edge Betweenness
Input: graph G(V,E)
Output: Dendrogram/communities
Repeat
For all e ∈ E compute edge betweenness \( C_B(e) \);
remove edge \( e_i \) with the largest \( C_B(e_i) \);
until no edges are left;
For a bi-partition, stop when the graph splits into two components (check connectedness after each removal).
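The loop above can be sketched in plain Python for the bi-partition case. Computing edge betweenness with a Brandes-style BFS accumulation is our choice of technique for unweighted graphs, not something the slide prescribes, and all function names are illustrative. Betweenness is recomputed after every removal, as in the pseudocode.

```python
from collections import deque


def edge_betweenness(adj):
    """C_B(e) for an unweighted, undirected graph; each vertex pair counted once."""
    cb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path fractions from the farthest vertices back towards s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                cb[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair {s, t} was visited from both ends
    return {e: c / 2 for e, c in cb.items()}


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_bipartition(adj):
    """Remove the highest-betweenness edge until the graph splits in two."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    comps = connected_components(adj)
    while len(comps) < 2 and any(adj.values()):
        cb = edge_betweenness(adj)
        u, v = max(cb, key=cb.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = connected_components(adj)
    return comps
```

On two triangles joined by a bridge, the bridge carries all 9 cross-triangle shortest paths, so it is removed first and the two triangles are returned.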
Fortunato and Newman, 20 years of network community detection, Nature Physics, 2022
Example:
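Conductance measures internal vs. external connectivity: the fraction of edges pointing outside a community,
normalized by the degree sum (volume): \( \phi(S) = \frac{\mathrm{cut}(S, V \setminus S)}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))} \), where \( \mathrm{vol}(S) = \sum_{u \in S} k_u \).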
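A direct evaluation of conductance for one candidate community, as a sketch. We assume an undirected, unweighted graph and use the common \( \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S)) \) normalization; the function name is illustrative.

```python
def conductance(adj, community):
    """phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)), vol = degree sum."""
    s = set(community)
    cut = sum(1 for u in s for v in adj[u] if v not in s)  # edges leaving S
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when S or its complement has no edges")
    return cut / denom
```

For two triangles joined by a bridge, either triangle has one outgoing edge and volume 7, so \( \phi = 1/7 \); a single vertex of degree 2 has \( \phi = 1 \). Lower is better.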
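Modularity: \( Q = \frac{1}{2m} \sum_{u,v} \left( A_{uv} - \frac{k_u k_v}{2m} \right) \delta(c_u, c_v) \)
The sum goes through every pair of vertices \( u, v \); \( A \) is the adjacency matrix, \( k_u \) the degree of \( u \), and \( m \) the number of edges.
The Kronecker delta \( \delta(c_u, c_v) = 1 \) only if \( u \) and \( v \) are in the same community, and 0 otherwise.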
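| | v = 1 | v = 2 | v = 3 | v = 4 |
|---|---|---|---|---|
| u = 1 | (0 - (2*2)/8)*1 | (1 - (2*2)/8)*1 | (1 - (2*3)/8)*1 | (0 - (2*1)/8)*0 |
| u = 2 | (1 - (2*2)/8)*1 | (0 - (2*2)/8)*1 | (1 - (2*3)/8)*1 | (0 - (2*1)/8)*0 |
| u = 3 | (1 - (3*2)/8)*1 | (1 - (3*2)/8)*1 | (0 - (3*3)/8)*1 | (1 - (3*1)/8)*0 |
| u = 4 | (0 - (1*2)/8)*0 | (0 - (1*2)/8)*0 | (1 - (1*3)/8)*0 | (0 - (1*1)/8)*1 |

(Example graph: edges 1-2, 1-3, 2-3, 3-4, so \( m = 4 \) and \( 2m = 8 \); communities {1, 2, 3} and {4}.)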
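The same double sum can be evaluated mechanically as a check on the table. The graph used here (edges 1-2, 1-3, 2-3, 3-4; communities {1, 2, 3} and {4}) is our reading of the degrees and adjacency entries in the table; the function name is illustrative, and the graph is assumed undirected, unweighted, and free of self-loops.

```python
def modularity(edges, communities):
    """Q = 1/(2m) * sum_{u,v} (A_uv - k_u k_v / (2m)) * delta(c_u, c_v)."""
    m = len(edges)
    degree, arcs = {}, set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        arcs.add((u, v))  # store both orientations so A_uv = A_vu
        arcs.add((v, u))
    label = {v: i for i, c in enumerate(communities) for v in c}
    q = 0.0
    for u in degree:
        for v in degree:  # every ordered pair, including u == v
            if label[u] == label[v]:  # Kronecker delta
                a_uv = 1 if (u, v) in arcs else 0
                q += a_uv - degree[u] * degree[v] / (2 * m)
    return q / (2 * m)


print(modularity([(1, 2), (1, 3), (2, 3), (3, 4)], [{1, 2, 3}, {4}]))
```

The table cells sum to -0.25, so the call prints -0.03125 (that is, -0.25 / 8). A negative value means fewer edges fall inside the communities than expected at random for the same degrees.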
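Weighted: \( Q = \frac{1}{2W} \sum_{u,v} \left( w_{uv} - \frac{s_u s_v}{2W} \right) \delta(c_u, c_v) \), with edge weights \( w_{uv} \), vertex strengths \( s_u = \sum_v w_{uv} \), and total edge weight \( W \).
Directed: \( Q = \frac{1}{m} \sum_{u,v} \left( A_{uv} - \frac{k_u^{\mathrm{out}} k_v^{\mathrm{in}}}{m} \right) \delta(c_u, c_v) \), where \( A_{uv} = 1 \) for an arc \( u \to v \).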
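A sketch covering both variants at once, assuming arcs are given as `(u, v, w)` triples. It computes \( Q = \frac{1}{W} \sum_{u,v} \left( w_{uv} - \frac{s^{\mathrm{out}}_u s^{\mathrm{in}}_v}{W} \right) \delta(c_u, c_v) \), with \( W \) the total arc weight; for unit weights this is the directed formula with \( W = m \). The function name is illustrative.

```python
from collections import defaultdict


def directed_modularity(arcs, communities):
    """Modularity of a directed, optionally weighted graph given as (u, v, w) arcs."""
    label = {v: i for i, c in enumerate(communities) for v in c}
    total = sum(w for _, _, w in arcs)
    s_out, s_in = defaultdict(float), defaultdict(float)
    internal = 0.0
    for u, v, w in arcs:
        s_out[u] += w
        s_in[v] += w
        if label[u] == label[v]:
            internal += w  # weight of arcs that stay inside a community
    # sum over u, v in c of s_out(u) * s_in(v) factorizes into Out(c) * In(c)
    out_c, in_c = defaultdict(float), defaultdict(float)
    for v, s in s_out.items():
        out_c[label[v]] += s
    for v, s in s_in.items():
        in_c[label[v]] += s
    expected = sum(out_c[c] * in_c[c] for c in out_c)
    return internal / total - expected / total ** 2
```

Passing each undirected edge as two arcs of weight \( w_{uv} \) gives \( W \) equal to twice the total edge weight and \( s^{\mathrm{out}} = s^{\mathrm{in}} = s \), which recovers the undirected (weighted) modularity; for the four-vertex example this again gives -0.03125.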
(Figure: input graph and desired output)
clusters = 5, modularity = 0.437
Blondel, Fast unfolding of communities in large networks, 2008
Algorithm: Fast unfolding
Input: Graph \( G(V,E) \)
Output: Communities
Assign every node to its own community;
repeat (each pass is made of two phases)
repeat (phase 1: local moving)
For every node \( i \), evaluate the modularity gain \( \Delta Q \) of putting \( i \) into the community of each neighbour \( j \);
Move \( i \) to the community of the neighbour \( j \) that yields the largest positive gain \( \Delta Q \) (if no gain is positive, \( i \) stays where it is);
until no more improvement (local maximum of modularity);
Phase 2: merge the nodes of each community into a "super node"; the weights on the links between super nodes are added up;
until no more changes (maximum modularity);
(Figure: moving node \( i \) from community \( D \) to community \( C \). Before: \( i \) in \( D \). Intermediate: \( i \) removed from \( D \), leaving \( D - i \). After: \( i \) merged into \( C \), giving \( C + i \). The \( \Delta Q \) of the move is the change from removing \( i \) from \( D \) plus the change from merging \( i \) into \( C \).)
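Phase 1 of fast unfolding can be sketched in plain Python for an undirected, unweighted graph without self-loops and with at least one edge. The gain of inserting an isolated node \( i \) into community \( C \) is \( \frac{k_{i,C}}{m} - \frac{\Sigma_{\mathrm{tot}}(C)\, k_i}{2m^2} \), where \( k_{i,C} \) is the number of edges from \( i \) into \( C \) and \( \Sigma_{\mathrm{tot}}(C) \) is the degree sum of \( C \). Phase 2 (building super nodes and repeating) is omitted, and the function name is illustrative.

```python
from collections import defaultdict


def louvain_local_moving(adj, max_sweeps=100):
    """Phase 1 (local moving) on an undirected, unweighted graph.

    adj maps each node to a set of neighbours. Returns node -> community id.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    k = {i: len(adj[i]) for i in adj}                # degrees
    comm = {i: i for i in adj}                       # every node in its own community
    tot = {i: k[i] for i in adj}                     # Sigma_tot of each community
    for _ in range(max_sweeps):
        moved = False
        for i in adj:
            d = comm[i]
            tot[d] -= k[i]                           # remove i from D: intermediate state
            links = defaultdict(int)                 # edges from i into each community
            for j in adj[i]:
                links[comm[j]] += 1

            def gain(c):
                # Modularity gain of inserting the isolated node i into community c
                return links.get(c, 0) / m - tot[c] * k[i] / (2 * m * m)

            best = max(links, key=gain, default=d)
            if gain(best) <= gain(d):
                best = d                             # move only on a strictly larger gain
            comm[i] = best
            tot[best] += k[i]
            moved = moved or best != d
        if not moved:
            break
    return comm
```

Comparing against the gain of returning to \( D \) implements the "move \( i \) from \( D \) to \( C \)" step: the total \( \Delta Q \) of the move is the gain of joining \( C \) minus the gain of rejoining \( D - i \). On two triangles joined by a bridge, this phase alone ends with the two triangles as communities.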
best: clusters = 6, modularity = 0.345
(Figure: input graph and desired output)
https://github.com/saref/bayan
(Figure: input graph and desired output)
Algorithm: Label propagation
Input: Graph G(V,E)
Output: Communities
Initialize labels on all nodes (a unique label per node);
Randomize the node order;
repeat
For every node, replace its label with the label occurring with the highest frequency among its neighbors (ties are broken uniformly at random);
until every node has a label that the maximum number of its neighbors have;
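A sketch of asynchronous label propagation in plain Python, with illustrative names. One deviation to note: this version reshuffles the node order on every sweep, whereas the pseudocode randomizes it once before the loop; `max_sweeps` is a safety cap we added, since tie-breaking can in principle keep labels changing.

```python
import random
from collections import Counter


def _majority_labels(adj, labels, v):
    """Labels held by the largest number of v's neighbours."""
    counts = Counter(labels[u] for u in adj[v])
    top = max(counts.values())
    return [lab for lab, c in counts.items() if c == top]


def label_propagation(adj, seed=None, max_sweeps=100):
    """Asynchronous label propagation; nodes sharing a label form a community."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}        # a unique initial label per node
    order = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(order)              # randomized node order for this sweep
        for v in order:
            if adj[v]:                  # isolated nodes keep their own label
                labels[v] = rng.choice(_majority_labels(adj, labels, v))
        # Stop once every node carries a label held by a maximum number of its neighbours
        if all(not adj[v] or labels[v] in _majority_labels(adj, labels, v) for v in adj):
            break
    return labels
```

Because the outcome depends on the random order and tie-breaks, different seeds can return different partitions of the same graph, which is consistent with the different cluster counts reported below.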
clusters = 3, modularity = 0.435
clusters = 4, modularity = 0.445
Algorithm: Walktrap community detection
Input: Graph G(V,E)
Output: Dendrogram/communities
Assign each vertex to its own community;
Compute the random walk distance between adjacent vertices;
for n-1 steps do
choose the two "closest" communities and merge them;
update the distances between communities;
end
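The random walk distance can be sketched as follows, using our reading of Pons and Latapy's definition: with \( P = D^{-1}A \), \( r_{ij} = \sqrt{\sum_k (P^t_{ik} - P^t_{jk})^2 / d(k)} \). We assume an unweighted graph with no isolated vertices and a walk length of \( t = 4 \) (the default `steps` value in igraph's Walktrap); function names are illustrative, and the merging step is not shown.

```python
from math import sqrt


def walk_probabilities(adj, start, t):
    """Row P^t[start] of the random walk matrix P = D^-1 A.

    Assumes every vertex has at least one neighbour.
    """
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for u, mass in p.items():
            if mass:
                share = mass / len(adj[u])  # walker leaves u to a uniform neighbour
                for w in adj[u]:
                    nxt[w] += share
        p = nxt
    return p


def walk_distance(adj, i, j, t=4):
    """r_ij = sqrt(sum_k (P^t_ik - P^t_jk)^2 / d(k))."""
    pi = walk_probabilities(adj, i, t)
    pj = walk_probabilities(adj, j, t)
    return sqrt(sum((pi[k] - pj[k]) ** 2 / len(adj[k]) for k in adj))
```

Vertices in the same dense region reach similar places after \( t \) steps, so their distance is small: on two triangles joined by a bridge, two vertices of one triangle are closer than vertices on opposite sides. In our reading of the paper, community distances use the same formula with \( P^t_{C\cdot} \) averaged over the vertices of \( C \).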
P. Pons and M. Latapy, 2006
clusters = 4, modularity = 0.440
Community detection:
Graph partitioning (sparse cuts)
Vertex clustering (vertex similarity)
(image from W. Liu, 2014)
3 prizewinners:
Deadline:
- 10 October (AoE)
- the best solutions are to be discussed on 17 October