On Computing Betweenness Centrality in a Distributed Environment

Pierluigi Crescenzi

IRIF, UParis

Pierre Fraigniaud

IRIF, UParis

Ami Paz

UWien

Introduction

Betweenness Centrality

  • Measure of importance based on communication flow
    • Nodes with high betweenness centrality lie on communication paths and can control information flow
  • Formally, for each node \(v\)
    • \(\mathsf{bc}_v=\frac{1}{(n-1)(n-2)}\sum_{s\neq v, t\neq v}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}\) where
      • \(\sigma_{s,t}(v)\) = #shortest \(s\)-\(t\) paths passing through \(v\)
      • \(\sigma_{s,t}\) = #shortest \(s\)-\(t\) paths
  • Applies to a wide range of problems
    • Social networks
    • Biology
    • Transport
    • Scientific cooperation
    • ...

Example

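Columns: \(\sigma_{s,t}\), then \(\sigma_{s,t}(v)\) for \(v=1,\dots,5\) ("-" where \(v\in\{s,t\}\))

s,t   \(\sigma_{s,t}\)   v=1   v=2   v=3   v=4   v=5
1,2   1                  -     -     0     0     0
1,3   1                  -     0     -     0     0
1,4   2                  -     1     1     -     0
1,5   2                  -     1     1     2     -
2,3   2                  1     -     -     1     0
2,4   1                  0     -     0     -     0
2,5   1                  0     -     0     1     -
3,4   1                  0     0     -     -     0
3,5   1                  0     0     -     1     -
4,5   1                  0     0     0     -     -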

  • \(\mathsf{bc}_1=\frac{1}{2}\)
  • \(\mathsf{bc}_2=\frac{1}{2}+\frac{1}{2}=1\)
  • \(\mathsf{bc}_3=\frac{1}{2}+\frac{1}{2}=1\)
  • \(\mathsf{bc}_4=1+\frac{1}{2}+1+1=\frac{7}{2}\)
  • \(\mathsf{bc}_5=0\)
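The example can be checked mechanically. The sketch below evaluates the definition literally, by enumerating shortest paths. The edge list `EDGES` is an assumption: it is one graph consistent with the \(\sigma_{s,t}\) values listed above, not something stated on the slide, and all function names are illustrative. It prints, for each node, the unnormalised sum over unordered pairs (the convention used in the example) and the normalised value from the formal definition.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: edge list inferred from the sigma table, not given in the slides.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

def build_adj(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_dist(adj, s):
    """Hop distances from s in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def shortest_paths(adj, s, t):
    """Enumerate every shortest s-t path (only sensible for tiny, connected graphs)."""
    dist = bfs_dist(adj, s)
    paths = []

    def extend(path):
        x = path[-1]
        if x == t:
            paths.append(path)
            return
        for y in adj[x]:
            # Move one BFS layer away from s, never beyond the layer of t.
            if dist[y] == dist[x] + 1 and dist[y] <= dist[t]:
                extend(path + [y])

    extend([s])
    return paths

def pair_sums(adj):
    """Unnormalised sum over unordered pairs {s, t} of sigma_st(v) / sigma_st."""
    total = {v: Fraction(0) for v in adj}
    for s, t in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, s, t)
        for v in adj:
            if v not in (s, t):
                through = sum(1 for p in paths if v in p)
                total[v] += Fraction(through, len(paths))
    return total

adj = build_adj(EDGES)
sums = pair_sums(adj)
n = len(adj)
for v in sorted(sums):
    # Ordered pairs count each unordered pair twice, hence the factor 2.
    print(v, sums[v], 2 * sums[v] / ((n - 1) * (n - 2)))
```

Path enumeration is exponential in general; it is used here only because it mirrors the definition on a five-node graph.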

Communication networks

  • Applications
    • Wireless mesh network design
    • Security
    • Transmission rates optimisation
    • Topology control
    • Resource placement and allocation
    • Link-sensing
    • Routing
    • ...
  • Our motivation
    • Frequency of hello messages for link-sensing in wireless networks: \(f(v) \approx \sqrt{\frac{\deg_v}{\mathsf{bc}_v}}\)
    • Objective 
      • Integrate bc computation in routing protocols

Our result

  • Routing protocols
    • Link-state
      • Each node knows the entire graph
      • The computation of the betweenness centrality may require excessive computational resources
        • \(O(nm)\) sequential time
    • Distance-vector
      • Each node knows the next hop towards each target node
      • No efficient algorithm was previously known for computing betweenness centrality
      • We provide such an algorithm
        • Simple and fast
          • Assuming a polynomial number of shortest paths
            • Otherwise, an approximation

Simple and fast

  • Objective
    • Design an algorithm for betweenness centrality whose complexity is similar to that of distributed Bellman-Ford

Preliminaries

Simple facts

  • If \(\sigma_{s,t}(v)\neq0\) then \(\sigma_{s,t}(v) = \sigma_{s,v}\sigma_{v,t}\)
  • If the arc \((u,v)\) belongs to a shortest path from \(s\) to \(t\), then \(\sigma_{s,t}(u,v) = \sigma_{s,u}\sigma_{v,t}\)
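As a worked check of the first fact, take the rows of the Example, reading the first number after each pair as \(\sigma_{s,t}\). Rows 1,4 and 4,5 give \(\sigma_{1,4}=2\) and \(\sigma_{4,5}=1\), and node 4 lies on a shortest path from 1 to 5, so
\[\sigma_{1,5}(4)=\sigma_{1,4}\,\sigma_{4,5}=2\cdot 1=2,\]
which agrees with the last entry of row 1,5.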

Simple facts

  • \(NH_v(t)\): set of next hops of \(v\) towards \(t\)
  • \(PH_v(s)\): set of nodes for which \(v\) is a predecessor on a shortest path from \(s\)
  • For every \(t\neq v\), \(\sigma_{v,t}=\sum_{u\in NH_v(t)}\sigma_{u,t}\)
  • If \(v\) lies on a shortest path from \(s\) to \(t\), \(\sigma_{v,t}=\sum_{u\in PH_v(s)}\sigma_{v,t}(v,u)\)
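The next-hop identity above is what lets a distance-vector node count shortest paths from purely local information. A minimal sequential sketch for unweighted graphs follows; the adjacency list `ADJ` is an assumed graph chosen to be consistent with the Example, and the function names are illustrative.

```python
from collections import deque

# ASSUMPTION: small undirected graph, consistent with the Example's sigma values.
ADJ = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}

def distances_to(adj, t):
    """Hop distance from every node to the target t (BFS from t)."""
    dist = {t: 0}
    queue = deque([t])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def next_hops(adj, dist, v):
    """NH_v(t): neighbours of v that are one hop closer to t."""
    return [u for u in adj[v] if dist[u] == dist[v] - 1]

def count_paths_to(adj, t):
    """sigma[v] = number of shortest v-t paths, via sigma_v = sum of sigma_u over NH_v(t)."""
    dist = distances_to(adj, t)
    sigma = {t: 1}
    # Closer nodes first, so every next hop is already counted.
    for v in sorted(dist, key=dist.get):
        if v != t:
            sigma[v] = sum(sigma[u] for u in next_hops(adj, dist, v))
    return sigma

print(count_paths_to(ADJ, 5))
```

Each node only reads the counts held by its next hops, which is the information a distance-vector protocol already exchanges with neighbours.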

Less simple fact

  • From definition, \(\mathsf{bc}_v=\frac{1}{(n-1)(n-2)}\sum_{s\neq v}\mathsf{bc}_v(s)\) where
    • \(\mathsf{bc}_v(s)=\sum_{t\neq v} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}}\)
  • It is not difficult to prove that \[\mathsf{bc}_v(s)=\sigma_{s,v} \sum_{u\in PH_v(s)}\frac{\mathsf{bc}_u(s)+1}{\sigma_{s,u}}\]
    • From global definition to local definition
      • We need information only from a subset of neighbors
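To illustrate the local definition, the sketch below evaluates \(\mathsf{bc}_v(s)\) sequentially for one source at a time, visiting nodes from the farthest to the closest so that every node in \(PH_v(s)\) is handled before \(v\). It assumes an unweighted, connected, undirected graph given as an adjacency dictionary; the function names are hypothetical and this is a centralised illustration, not the distributed algorithm.

```python
from collections import deque
from fractions import Fraction

def source_dependencies(adj, s):
    """Return {v: bc_v(s)} for all v != s, using the local recurrence."""
    dist, sigma, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                sigma[u] = 0
                queue.append(u)
            if dist[u] == dist[v] + 1:
                sigma[u] += sigma[v]      # v is a predecessor of u
    bc_s = {}
    for v in reversed(order):             # farthest nodes first
        successors = [u for u in adj[v] if dist[u] == dist[v] + 1]   # PH_v(s)
        bc_s[v] = sigma[v] * sum(
            ((bc_s[u] + 1) / Fraction(sigma[u]) for u in successors), Fraction(0))
    del bc_s[s]                           # the definition only uses s != v
    return bc_s

def betweenness(adj):
    """Normalised betweenness: sum of bc_v(s) over sources, divided by (n-1)(n-2)."""
    n = len(adj)
    total = {v: Fraction(0) for v in adj}
    for s in adj:
        for v, value in source_dependencies(adj, s).items():
            total[v] += value
    return {v: total[v] / ((n - 1) * (n - 2)) for v in adj}

# ASSUMPTION: example graph consistent with the sigma values of the Example.
ADJ = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(source_dependencies(ADJ, 1))
print(betweenness(ADJ))
```

Exact fractions are used so that the result can be compared with hand computations without rounding noise.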

The distributed algorithm

A first simple version

\(\sigma_{v,t}=\sum_{u\in NH_v(t)}\sigma_{u,t}\)

\(\mathsf{bc}_v(s)=\sigma_{s,v} \sum_{u\in PH_v(s)}\frac{\mathsf{bc}_u(s)+1}{\sigma_{s,u}}\)

\(\mathsf{bc}_v=\frac{1}{(n-1)(n-2)}\sum_{s\neq v}\mathsf{bc}_v(s)\)
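The three formulas above can be combined in a round-based scheme. The following is a minimal simulation sketch under stated assumptions, not the authors' algorithm: rounds are synchronous, every node sends its whole table to all neighbours in each round, edge weights are positive integers, and the graph is connected with at least three nodes. The name `simulate_rounds` and the table layout are hypothetical.

```python
from fractions import Fraction

INF = float("inf")

def simulate_rounds(adj, rounds):
    """Synchronous simulation; adj[v][u] is the positive weight of edge {v, u}."""
    nodes = sorted(adj)
    # table[v][s] = (distance to s, number of shortest paths, bc_v(s)) as known by v
    table = {v: {s: (0, 1, Fraction(0)) if s == v else (INF, 0, Fraction(0))
                 for s in nodes} for v in nodes}
    for _ in range(rounds):
        received = table                  # tables sent at the end of the previous round
        table = {v: {} for v in nodes}
        for v in nodes:
            for s in nodes:
                if s == v:
                    table[v][s] = (0, 1, Fraction(0))
                    continue
                # Bellman-Ford step: best distance offered by any neighbour.
                d = min(received[u][s][0] + w for u, w in adj[v].items())
                if d == INF:
                    table[v][s] = (INF, 0, Fraction(0))
                    continue
                # Path counts are summed over the neighbours achieving d.
                sigma = sum(received[u][s][1] for u, w in adj[v].items()
                            if received[u][s][0] + w == d)
                # Neighbours u reached from s through v play the role of PH_v(s).
                dep = Fraction(0)
                for u, w in adj[v].items():
                    du, su, bu = received[u][s]
                    if du == d + w and su > 0:
                        dep += (bu + 1) / su
                table[v][s] = (d, sigma, sigma * dep)
    n = len(nodes)
    return {v: sum(table[v][s][2] for s in nodes if s != v) / ((n - 1) * (n - 2))
            for v in nodes}

# ASSUMPTION: unit-weight version of a graph consistent with the Example.
GRAPH = {1: {2: 1, 3: 1}, 2: {1: 1, 4: 1}, 3: {1: 1, 4: 1},
         4: {2: 1, 3: 1, 5: 1}, 5: {4: 1}}
print(simulate_rounds(GRAPH, rounds=10))
```

Every value is recomputed from scratch in each round, so estimates that were based on not-yet-converged distances are overwritten once the distances stabilise. The number of rounds is passed explicitly here; the sketch makes no claim about matching the phase bound of the theorem.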

A first simple version

  • Theorem
    • Algorithm 2 enables every node to compute its betweenness centrality in any network \(G\) after \(2D+1\) phases

A more efficient, quite simple version

Experimental results I

Global error

\(\frac{\|\mathsf{bc}-C\|_2}{\|\mathsf{bc}\|_2}=\frac{\sqrt{\sum_{v\in V}(\mathsf{bc}_v-C[v])^2}}{\sqrt{\sum_{v\in V}(\mathsf{bc}_v)^2}}\)

  • How far the current values \(C\) are from the final values \(\mathsf{bc}\)
  • To be computed at the end of each send-receive phase
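The global error is a relative Euclidean distance and can be evaluated directly from the formula. A minimal sketch, assuming both the final values and the current estimates are stored in dictionaries keyed by node (the function name is illustrative):

```python
import math

def global_error(bc, current):
    """Relative L2 distance between current estimates C and final values bc."""
    num = math.sqrt(sum((bc[v] - current[v]) ** 2 for v in bc))
    den = math.sqrt(sum(bc[v] ** 2 for v in bc))
    if den == 0:
        raise ValueError("all final betweenness values are zero")
    return num / den

# Before any update every estimate is 0, so the error is 1; it is 0 at convergence.
final = {"a": 3.0, "b": 4.0}
print(global_error(final, {"a": 0.0, "b": 0.0}))
print(global_error(final, final))
```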

Grids and hypercubes

  • \(7\times 6\) grid and hypercube of dimension 11

    • Hence, both have diameter 11

Erdős-Rényi

  • Erdős-Rényi graphs with 500 nodes and different diameters

    • 20 samples for each diameter

Real-world networks

  • E-mail network with 1133 nodes and diameter 8

  • Autonomous system network with 3011 nodes and diameter 9

Weighted Erdős-Rényi

  • Randomly weighted Erdős-Rényi graphs with 500 nodes

    • The reported diameter is that of the underlying unweighted graph

Weighted real-world networks

  • Road network with 3353 nodes

    • Rome, Italy, 1999

Experimental results II

Local error

  • \(T_D\) : time it takes for the Bellman-Ford algorithm to converge locally
    • I.e., until the distances are correctly computed
  • \(T_C\) : time it takes for the betweenness centrality value to converge locally
  • Local convergence time vs betweenness

[Figure: schematic plot with axes "betweenness" and "time", showing a point with betweenness \(b\) and time \(t\)]

  • A plotted point \((b,t)\) means that at least one node with betweenness centrality \(b\) converged in time \(t\)
    • The plot does not show how many such nodes there are

AS network

  • Black : \(T_D\)

  • Red : \(T_C\)

Road network

Conclusion

The open problem

  • Assuming a polynomial number of shortest paths
    • \(\mathrm{CONGEST}(B)\)
      • Variant of the CONGEST model in which at most \(B\) words of \(O(\log n)\) bits each can be sent through each link at each round
    • Our distributed algorithm for weighted graphs applies to the \(\mathrm{CONGEST}(n)\) model, and converges in \(O(D)\) rounds
    • The known distributed algorithms for unweighted graphs apply to the \(\mathrm{CONGEST}(1)\) model, and converge in \(O(n)\) rounds
    • Open problem
      • Compute exact betweenness centrality of weighted graphs in \(O(\frac{n}{B} + D)\) rounds in the \(\mathrm{CONGEST}(B)\) model, for \(1 \leq B \leq n\)
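At its two extremes the target bound coincides with the bounds listed above, assuming a connected graph so that \(D \leq n-1\):
\[B=1:\ O\!\left(\tfrac{n}{1}+D\right)=O(n), \qquad B=n:\ O\!\left(\tfrac{n}{n}+D\right)=O(D).\]
The open problem therefore asks for an exact algorithm on weighted graphs that interpolates between these two regimes.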

Thank you