Graph mining

2019/2020

Pierluigi Crescenzi

Université de Paris, IRIF

Inspired by Advanced Algorithms and Graph Mining by Andrea Marino (University of Florence)

Centrality measure computation
Betweenness and closeness

Centrality measure computation

Central/peripheral nodes

pierluigi.crescenzi@irif.fr

GM

#03

  • Graph \(G=(V,E)\)
    • Undirected
    • Unweighted
    • Connected
  • Distance \(d(u, v)\)
    • # of edges in shortest path from \(u\) to \(v\)
  • Eccentricity in subgraph \(S\)
    • \(e_S(v)=\max_{w\in S} d( v, w)\)
  • Central nodes in \(S\)
    • Min \(e_S\)
  • Peripheral nodes in \(S\)
    • Max distance from central
  • Hodology
    • Study of pathways

Node centrality

  • Degree centrality of node \(v\)
    • \(\delta(v)=\frac{\mathrm{deg}(v)}{|V|-1}\)
  • Betweenness centrality of node \(v\)
    • \(\sigma_{u,w}\): # shortest paths from \(u\) to \(w\)
    • \(\sigma_{u,w}(v)\): # shortest paths from \(u\) to \(w\) passing through \(v\)
    • \(\beta(v) = \frac{\sum_{\{u,w\}:\,u\neq v\neq w}\frac{\sigma_{u,w}(v)}{\sigma_{u,w}}}{\frac{(|V|-1)(|V|-2)}{2}}\)
  • Closeness centrality of node \(v\)
    • \(\varphi(v) = \sum_{w \in V} d(v,w)\)
      • Farness of node \(v\)
    • \(\kappa(v) = \frac{1}{\frac{\varphi(v)}{n-1}} = \frac{n-1}{\varphi(v)}\), where \(n=|V|\)

[Figure: example graph with five labelled nodes. One node has value \(1.0\); the other four nodes have degree centrality \(0.25\), betweenness centrality \(0.0\) and closeness centrality \(0.571\).]

  • Eccentricity centrality of node \(v\)
    • \(\varepsilon(v) = \frac{1}{\max_wd(v,w)}\)

[Figure labels: eccentricity centrality \(0.5\) for four of the nodes.]
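The definitions of degree, closeness and eccentricity centrality can be checked with a few lines of Python. This is only a sketch: the star graph below (centre `0`, leaves `1` to `4`) is an assumed example, chosen because it gives the values \(0.25\), \(0.571\) and \(0.5\) that appear as figure labels; the function names are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return a dict mapping every node reachable from source to its distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def centralities(adj):
    """Degree, closeness and eccentricity centrality of every node of an
    undirected, unweighted, connected graph with at least two nodes."""
    n = len(adj)
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        farness = sum(dist.values())          # phi(v)
        result[v] = {
            "degree": len(adj[v]) / (n - 1),
            "closeness": (n - 1) / farness,
            "eccentricity": 1 / max(dist.values()),
        }
    return result

# Assumed example: a star with centre 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
for v, values in centralities(star).items():
    print(v, {name: round(x, 3) for name, x in values.items()})
```

On this star the centre gets \(1.0\) for all three measures, while each leaf has farness \(1+2+2+2=7\) and hence closeness \(4/7\approx 0.571\).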

  • Closeness centrality in the case of directed weakly connected graphs
    • Very realistic case
  • Two approaches
    • Generalise "classical" definition\[\kappa(v) =\frac{\rho(v) - 1}{\varphi(v)}\frac{\rho(v)-1}{n-1}=\frac{(\rho(v) - 1)^2}{(n-1)\varphi( v)}\]where \(\rho(v) = |\Rho(v)|\) and \(\Rho(v)\) is set of nodes reachable from \(v\)
    • Observe that infinite distance should not contribute\[\eta(v)=\sum_{w \in V-\{v\}}\frac{1}{d(v,w)}\](called harmonic centrality)
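A minimal sketch of the two approaches on a small directed graph. The graph and the function names are assumptions made for illustration, as is the convention of returning \(0\) for a node that reaches no other node (the generalised formula is \(0/0\) there).

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every node reachable along arcs."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def generalised_closeness(adj, v):
    """(rho(v) - 1)^2 / ((n - 1) * phi(v)), restricted to reachable nodes."""
    n = len(adj)
    dist = bfs_distances(adj, v)
    rho = len(dist)                 # reachable nodes, v included
    farness = sum(dist.values())    # phi(v) over reachable nodes only
    if farness == 0:                # assumed convention: v reaches nobody
        return 0.0
    return (rho - 1) ** 2 / ((n - 1) * farness)

def harmonic(adj, v):
    """Sum of 1/d(v, w); unreachable nodes simply do not contribute."""
    dist = bfs_distances(adj, v)
    return sum(1 / d for w, d in dist.items() if w != v)

# Assumed example: directed cycle a -> b -> c -> a plus the arc c -> d.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
for v in graph:
    print(v, generalised_closeness(graph, v), harmonic(graph, v))
```

Node `d` has no outgoing arcs, so both measures give it the value \(0\); node `a` reaches everything with farness \(1+2+3=6\), giving \(\kappa(a)=9/18=0.5\).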

[Figure: periodic table of centrality indices.]

Interactive version available at http://schochastics.net/sna/periodic.html

  • Correlation between centrality measures
    • Many papers
      • Not always consistent
    • An example (Florentine families)
      • Pearson's correlation coefficient
        • 1 totally correlated, -1 totally anti-correlated
               Degree   Eccentricity   Closeness   Betweenness
Degree         1        0.49439        0.82451     0.84415
Eccentricity            1              0.8299      0.57968
Closeness                              1           0.80662

Axioms for centrality

  • Density axiom
    • \(k\)-clique and directed \(k\)-cycle connected by edge \(\{x,y\}\), with \(x\) in the clique and \(y\) in the cycle
    • Axiom: centrality of \(x\) must be strictly larger than centrality of \(y\)
  • Degree
    • \(\delta(x)=k>2=\delta(y)\)
  • Closeness
    • \(\varphi(x)=(k-1)+\frac{k(k+1)}{2}=(k-1)+k+\frac{k(k-1)}{2}\\=2(k-1)+1+\frac{k(k-1)}{2}=\varphi(y)\)
  • Betweenness
    • \(\beta(x)=2(k-1)k<2(k-1)k+\frac{(k-2)(k-1)}{2}=\beta(y)\)
  • Harmonic
    • \(\eta(x)=(k-1)+\sum_{i=1}^k\frac{1}{i}>1+\frac{k-1}{2}+\sum_{i=1}^{k-1}\frac{1}{i}=\eta(y)\)
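The closeness and harmonic computations above can be checked numerically. The sketch below builds the graph under the assumption that \(x\) and \(y\) are joined by arcs in both directions and that distances follow the direction of the arcs; node names and helper functions are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source along arcs."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def clique_plus_cycle(k):
    """k-clique on ('q', i), directed k-cycle on ('c', i), and arcs in both
    directions between x = ('q', 0) and y = ('c', 0)."""
    adj = {}
    for i in range(k):
        adj[("q", i)] = [("q", j) for j in range(k) if j != i]
        adj[("c", i)] = [("c", (i + 1) % k)]
    x, y = ("q", 0), ("c", 0)
    adj[x].append(y)
    adj[y].append(x)
    return adj, x, y

def farness(adj, v):
    return sum(bfs_distances(adj, v).values())

def harmonic(adj, v):
    return sum(1 / d for d in bfs_distances(adj, v).values() if d > 0)

k = 5
adj, x, y = clique_plus_cycle(k)
print(farness(adj, x), farness(adj, y))        # equal farness: closeness ties
print(harmonic(adj, x) > harmonic(adj, y))     # harmonic separates x from y
```

With equal farness, closeness cannot rank \(x\) strictly above \(y\), whereas harmonic centrality does for \(k\geq 3\).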

Some observations

  • A "jungle" of centrality measures

    • Sometimes strongly correlated, sometimes not

    • Different computational complexity

    • Different interpretation

  • Difficult to "validate" them

    • Not so many benchmarks

    • Why should Bacon be more important than Clooney?

  • Application-dependent notion

    • Bioinformatics seems to be an interesting application field

    • Importance of nodes is sometimes clear

Betweenness centrality

  • Naive algorithm

    • For every node \(v\), set \(\beta(v)=0\)

    • For every node \(s\), perform BFS to find all shortest paths from \(s\) to all other nodes \(t\): let \(\Sigma_{s,t}\) be the set of these paths

    • For every node \(v\) and for each pair \(s,t\), count number of times \(v\) appears in a path in \(\Sigma_{s,t}\), divide by \(|\Sigma_{s,t}|\) and add to \(\beta(v)\)

    • For every node \(v\), return \(\beta(v)\)
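The naive algorithm can be sketched as follows. This is an illustrative implementation, not the course's reference code: it stores every shortest path explicitly, counts ordered pairs \((s,t)\) and therefore divides by \((n-1)(n-2)\), which matches the normalisation over unordered pairs used in the definition. It assumes an undirected connected graph with at least three nodes.

```python
from collections import deque

def all_shortest_paths(adj, s):
    """Return {t: list of all shortest paths from s to t}, each path a tuple."""
    dist = {s: 0}
    paths = {s: [(s,)]}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # every shortest path to u extends to a shortest path to w
                paths[w].extend(p + (w,) for p in paths[u])
    return paths

def naive_betweenness(adj):
    beta = {v: 0.0 for v in adj}
    for s in adj:
        for t, paths in all_shortest_paths(adj, s).items():
            if t == s:
                continue
            for v in adj:
                if v == s or v == t:
                    continue
                through_v = sum(1 for p in paths if v in p)
                beta[v] += through_v / len(paths)
    n = len(adj)
    # ordered pairs were counted, so each unordered pair appears twice
    return {v: beta[v] / ((n - 1) * (n - 2)) for v in adj}

# Assumed example: a star with centre 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(naive_betweenness(star))
```

On the star every shortest path between two leaves passes through the centre, so the centre gets \(1.0\) and the leaves \(0.0\).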

[Figure: example graph with nodes \(a,b,\dots,k\); the sets of shortest paths from \(a\) are listed below.]

  • \(\Sigma_{a,x}=\{(a,x)\}\) for \(x\in\{b,c,d,e\}\)
  • \(\Sigma_{a,f}=\{(a,b,f),(a,c,f)\}\)
  • \(\Sigma_{a,g}=\{(a,d,g)\}\)
  • \(\Sigma_{a,h}=\{(a,d,h),(a,e,h)\}\)
  • \(\Sigma_{a,i}=\{(a,b,f,i),(a,c,f,i),(a,d,g,i)\}\)
  • \(\Sigma_{a,j}=\{(a,d,g,j),(a,d,h,j),(a,e,h,j)\}\)
  • \(\Sigma_{a,k}=\{(a,b,f,i,k),(a,c,f,i,k),(a,d,g,i,k),(a,d,g,j,k),(a,d,h,j,k),(a,e,h,j,k)\}\)

  • Naive algorithm very expensive in terms of space

    • Storing all shortest paths between all pairs of nodes

    • A small improvement

      • Storing only paths from one source

        • For every node \(v\), set \(\beta(v)=0\)

        • For every node \(s\), perform BFS to find all shortest paths from \(s\) to all other nodes \(t\): let \(\Sigma_{s,t}\) be the set of these paths

          • For every node \(v\) and for every target \(t\), count number of times \(v\) appears in a path in \(\Sigma_{s,t}\), divide by \(|\Sigma_{s,t}|\) and add to \(\beta(v)\)

        • For every node \(v\), return \(\beta(v)\)

    • Still infeasible in the worst case

      • Why?
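One possible answer is that the number of shortest paths from a single source can grow exponentially with the number of nodes. The sketch below uses an assumed family of graphs (a chain of \(k\) "diamonds", with \(3k+1\) nodes) and counts the shortest paths between the two ends without listing them; the construction and names are illustrative.

```python
from collections import deque

def diamond_chain(k):
    """Junction nodes ('j', 0..k); consecutive junctions are joined by two
    parallel paths of length 2 (through a 'top' and a 'bottom' node)."""
    adj = {("j", 0): []}
    for i in range(k):
        left, right = ("j", i), ("j", i + 1)
        adj[right] = []
        for side in ("top", "bottom"):
            mid = (side, i)
            adj[mid] = [left, right]
            adj[left].append(mid)
            adj[right].append(mid)
    return adj

def count_shortest_paths(adj, s):
    """sigma[t] = number of shortest paths from s to t, computed by BFS."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return sigma

for k in (1, 5, 20):
    adj = diamond_chain(k)
    print(len(adj), count_shortest_paths(adj, ("j", 0))[("j", k)])
```

With \(k=20\) the graph has only 61 nodes but \(2^{20}\) shortest paths between its ends, so storing the paths explicitly does not scale even for a single source.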

  • Brandes' algorithm

    • Integrating the computation of the betweenness centrality in the backward phase (construction of the shortest paths)

      • Without constructing the shortest paths

    • For every node \(v\), set \(\beta(v)=0\)
    • For every node \(s\)
      • For every node \(v\), set \(\delta_s(v)=0\)
      • Perform (as before) a BFS from \(s\) and compute, for every other node \(t\), the number \(\sigma_{s,t}\) of shortest paths from \(s\) to \(t\)
      • Going back to \(s\) (nodes in non-increasing order of distance from \(s\)), update \(\delta_s(v)\) by using the \(\sigma_{s,\bullet}\)-values when node \(v\) is reached

      • For every node \(v\), add \(\delta_s(v)\) to \(\beta(v)\)

    • For every node \(v\), return \(\beta(v)\)

  • Forward phase: computing number of shortest paths
    • When a new parent is discovered, add its number of shortest paths from the source
  • \(\sigma_{a,\bullet}\)-values (\(\sigma_{a,a}=1\)) on the example graph
    • \(\sigma_{a,b}=\sigma_{a,c}=\sigma_{a,d}=\sigma_{a,e}=1\)
    • \(\sigma_{a,f}=2\), \(\sigma_{a,g}=1\), \(\sigma_{a,h}=2\)
    • \(\sigma_{a,i}=\sigma_{a,j}=3\)
    • \(\sigma_{a,k}=6\)
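A minimal sketch of the forward phase. The edge list is an assumption: it contains exactly the edges that appear in the shortest paths from \(a\) listed earlier for the example graph (the original figure may contain further edges); function names are illustrative.

```python
from collections import deque

# Assumed edge list, reconstructed from the listed shortest paths from a.
EDGES = ["ab", "ac", "ad", "ae", "bf", "cf", "dg", "dh", "eh",
         "fi", "gi", "gj", "hj", "ik", "jk"]

def build_graph(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    return adj

def forward_phase(adj, s):
    """BFS from s; sigma[t] is the number of shortest paths from s to t."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # w discovered for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u is a parent of w
                sigma[w] += sigma[u]
    return dist, sigma

dist, sigma = forward_phase(build_graph(EDGES), "a")
print(sigma)
```

Only one integer per node is stored, instead of the full list of paths.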

  • Backward (accumulation) phase: computing contributions to betweenness centrality
    • Pair-wise dependency\[\delta_{s,t}(v)=\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}\]
      • Hence, \(\beta(v)=\sum_{s,t}\delta_{s,t}(v)\)
    • One-sided dependency\[\delta_s(v)=\sum_t\delta_{s,t}(v)\]
      • Hence, \(\beta(v)=\sum_{s}\delta_{s}(v)\)
    • Basic idea
      • \(\delta_s(v)\) can be computed recursively by using \(\delta_s(w)\) for \(w\) immediately following \(v\) on some shortest path from \(s\)
        • \(\mathrm{PH}_s(v)\): set of these nodes \(w\)

  • Fact 1. If \(\sigma_{s,t}(v)\neq0\) then \(\sigma_{s,t}(v) = \sigma_{s,v}\sigma_{v,t}\)
  • Fact 2. If the arc \((u,v)\) belongs to a shortest path from \(s\) to \(t\), then \(\sigma_{s,t}(u,v) = \sigma_{s,u}\sigma_{v,t}\)
    • \(\sigma_{s,t}(u,v)\): number of shortest paths from \(s\) to \(t\) passing through edge \((u,v)\)
  • Fact 3. If \(\sigma_{s,t}(v)\neq0\) then \(\sigma_{v,t} = \sum_{w\in\mathrm{PH}_s(v)}\sigma_{v,t}(v,w)\)

  • Brandes' lemma. For every \(s\neq v\)\[\delta_s(v)=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\frac{\delta_s(u)}{\sigma_{s,u}}\]
  • Proof
    • If \(t=v\), then the contribution of \(t\) to \(\delta_s(v)\) is \(1\)
    • Let \(T_u\) be set of \(t\neq v\) such that \(\sigma_{s,t}(v,u)\neq 0\) and let \[T=\bigcup_{u\in\mathrm{PH}_s(v)}T_u\]
    • Then\[\delta_s(v)=1+\sum_{t\in T}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}} = 1+\sum_{t\in T}\frac{\sigma_{s,v}\sigma_{v,t}}{\sigma_{s,t}} = 1+\sigma_{s,v}\sum_{t\in T}\sum_{u\in\mathrm{PH}_s(v)}\frac{\sigma_{v,t}(v,u)}{\sigma_{s,t}}\\=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\sum_{t\in T_u}\frac{\sigma_{v,t}(v,u)}{\sigma_{s,t}}=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\sum_{t\in T_u}\frac{\sigma_{u,t}}{\sigma_{s,t}}\\=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\frac{1}{\sigma_{s,u}}\sum_{t\in T_u}\frac{\sigma_{s,u}\sigma_{u,t}}{\sigma_{s,t}}=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\frac{\delta_s(u)}{\sigma_{s,u}}\]

  • Accumulation phase: computing contributions to betweenness centrality
    • \(\sigma_{a,\bullet}\)-values (\(\sigma_{a,a}=1\)): \(\sigma_{a,b}=\sigma_{a,c}=\sigma_{a,d}=\sigma_{a,e}=\sigma_{a,g}=1\), \(\sigma_{a,f}=\sigma_{a,h}=2\), \(\sigma_{a,i}=\sigma_{a,j}=3\), \(\sigma_{a,k}=6\)
    • Recurrence \[\delta_s(v)=1+\sigma_{s,v}\sum_{u\in\mathrm{PH}_s(v)}\frac{\delta_s(u)}{\sigma_{s,u}}\]
    • \(\delta_a(k)=1\)
    • \[\delta_a(i)=1+\sigma_{a,i}\frac{\delta_a(k)}{\sigma_{a,k}}=1+3\frac{1}{6}=\frac{3}{2}=\delta_a(j)\]
    • \[\delta_a(f)=1+\sigma_{a,f}\frac{\delta_a(i)}{\sigma_{a,i}}=1+2\frac{3/2}{3}=2=\delta_a(h)\]
    • \[\delta_a(g)=1+\sigma_{a,g}\left(\frac{\delta_a(i)}{\sigma_{a,i}}+\frac{\delta_a(j)}{\sigma_{a,j}}\right)=1+1\left(\frac{3/2}{3}+\frac{3/2}{3}\right)=2\]
    • \[\delta_a(b)=1+\sigma_{a,b}\frac{\delta_a(f)}{\sigma_{a,f}}=1+1\frac{2}{2}=2=\delta_a(c)=\delta_a(e)\]
    • \[\delta_a(d)=1+\sigma_{a,d}\left(\frac{\delta_a(g)}{\sigma_{a,g}}+\frac{\delta_a(h)}{\sigma_{a,h}}\right)=1+1\left(\frac{2}{1}+\frac{2}{2}\right)=4\]
    • Value obtained at the source \(a\): \(11\)
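The accumulation phase can be sketched as below, using the recurrence exactly as stated in these slides (so the target \(t=v\) contributes \(1\) to \(\delta_s(v)\)). The edge list is an assumption reconstructed from the shortest paths from \(a\) listed for the example graph, and exact fractions are used so that the values can be compared with the worked example.

```python
from collections import deque
from fractions import Fraction

# Assumed edge list, reconstructed from the listed shortest paths from a.
EDGES = ["ab", "ac", "ad", "ae", "bf", "cf", "dg", "dh", "eh",
         "fi", "gi", "gj", "hj", "ik", "jk"]

def build_graph(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    return adj

def dependencies(adj, s):
    """delta_s(v) for every v, in the convention of the recurrence above."""
    dist, sigma, followers = {s: 0}, {s: 1}, {s: []}
    order = []                              # nodes in BFS (non-decreasing distance) order
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                followers[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                followers[u].append(w)      # w belongs to PH_s(u)
    delta = {}
    for v in reversed(order):               # going back towards s
        delta[v] = 1 + sigma[v] * sum(
            Fraction(delta[u], sigma[u]) for u in followers[v])
    return delta

delta = dependencies(build_graph(EDGES), "a")
for v in sorted(delta):
    print(v, delta[v])
```

Subtracting \(1\) from each value gives the dependency that excludes the target \(t=v\), which is the quantity accumulated in the usual presentation of Brandes' algorithm.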

  • Complexity of Brandes' algorithm
    • Time \(O(nm)\)
    • Space \(O(n+m)\)
  • Unlikely that we can do better
  • Other results
    • Approximation
      • Through sampling
    • Dynamic version
    • Distributed version

Closeness centrality

  • Textbook algorithm
    • Time \(O(nm)\)
    • Space \(O(n+m)\)
  • Unlikely that we can do better
  • Approximation
    • Through sampling
  • What about looking for the top-\(k\) nodes?
  • \(k\)-Hitting Set (\(k\)-HS) problem
    • Domain \(X\)
    • Collection \(C\) of subsets of \(X\) with \(|X|\leq k\log|C|\)
    • Is there \(c\in C\) such that, for any \(y\in C\), \(c\cap y\neq \emptyset\)?
  • Conjecture: there is no \(\epsilon>0\) such that, for all \(k\geq 1\), there is an algorithm solving \(k\)-HS in time \(O(n^{2-\epsilon})\)
    • \(n\): size of input
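For reference, the straightforward way to solve the problem compares all pairs of sets in the collection, which is the quadratic behaviour the conjecture says cannot be substantially improved. The sketch below is illustrative; the function name and the small instance are assumptions.

```python
def find_hitting_set(collection):
    """Return a set c of the collection that intersects every set of the
    collection, or None if there is none. Compares all pairs of sets."""
    sets = [frozenset(c) for c in collection]
    for c in sets:
        if all(c & y for y in sets):
            return c
    return None

# Assumed instance over X = {a, b, c, d}.
C = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
print(find_hitting_set(C))
```

Here \(\{b,c\}\) intersects all three sets, while \(\{a,b\}\) and \(\{c,d\}\) are disjoint from each other.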

  • From HS to farness
    • Example: \(X = \{a,b,c,d\}\) and \(C = \{\{a,b\}, \{b,c\}, \{c,d\}\}\)

[Figure: construction for the example, with two parts labelled \(I_N\), two parts labelled \(K_4\) and a highlighted node \(c\in C\). Distance labels from \(c\): \(2\), \(2\), \(1\), \(1\), \(2\), \(2\); \(1\) if \(x\in c\), \(2\) otherwise; \(2\) if \(c\cap y \neq \emptyset\), \(3\) otherwise; \(2\) if \(x\in c\), \(1\) otherwise.]

\[\varphi(c) = 4N+2+2(|C|-1)+3|X|+(2|C|+|\{y\in C: c\cap y=\emptyset\}|) = 4N+4|C|+3|X|+\delta_{hs}(c)\]

\[\delta_{hs}(c)=\left\{\begin{array}{ll}0 & \mathrm{if}\ c\ \mathrm{is\ a\ hitting\ set}\\>0 & \mathrm{otherwise}\end{array}\right.\]

For \(N\) sufficiently large, all other nodes have farness greater than \(4N+4|C|+3|X|\).

Hence, the instance has a hitting set iff the minimum farness is \(4N+4|C|+3|X|\).

  • Computing closeness top node
    • iFUB: perform complete BFS starting from a (small) set of (cleverly) selected nodes
    • "Orthogonal" idea
      • Perform a partial BFS starting from all nodes \(v\)
        • In non-increasing order of degree
      • Each pBFS
        • Receives the starting node \(v\) and a lower bound \(x\) on the closeness centrality of top node
        • Returns \(0\) if \(v\) is not the top node, \(\kappa(v)\) otherwise
  • Generalises to
    • Top \(k\) nodes
    • Directed weakly connected graphs

  • pBFS(\(v\),\(x\))
    • \(\Gamma_d(v)\): set of nodes at distance \(d\) from \(v\), with \(\gamma_d(v) = |\Gamma_d(v)|\)
    • \(n_d(v)=n_{d-1}(v)+\gamma_d(v)\), starting from \(n_1(v)=1+\gamma_1(v)\)
    • \(\varphi_d(v)=\varphi_{d-1}(v)+d\cdot\gamma_d(v)\), starting from \(\varphi_1(v)=1\cdot\gamma_1(v)\)
    • After visiting level \(d\)
      • \(\varphi(v) \geq \varphi_d(v) + (d+1)\gamma_{d+1}(v)+(d+2)(n-n_{d+1}(v))\)
      • Since \(n_{d+1}(v)=n_d(v)+\gamma_{d+1}(v)\), \(\varphi(v) \geq \varphi_d(v) - \gamma_{d+1}(v)+(d+2)(n-n_d(v))\)
      • \(\gamma_{d+1}(v)\) is not yet known (?), but \(\gamma_{d+1}(v)\leq\tilde{\gamma}_{d+1}(v)=\sum_{u\in\Gamma_{d}(v)}(\mathrm{deg}(u)-1)\)
      • \(\varphi(v) \geq \tilde{\varphi}_d(v) = \varphi_d(v) - \tilde{\gamma}_{d+1}(v)+(d+2)(n-n_d(v))\): everything is known
    • Stop as soon as \((\kappa_{\mathrm{top}}\geq)\ x \geq \frac{n-1}{\tilde{\varphi}_d(v)}\ (\geq \kappa(v))\) and return \(0\). Otherwise return \(\kappa(v)\).

  • The algorithm
    • \(\kappa(v) = 0\) for each \(v\)
    • \(\kappa_\mathrm{top} = 0\)
    • For each \(v \in V\) in decreasing order of degree
      • \(\kappa(v) = \mathrm{pBFS}(v,\kappa_\mathrm{top})\)
      • If \(\kappa(v) \neq 0\), then \(\kappa_\mathrm{top} = \kappa(v)\)
    • Return \(\kappa_\mathrm{top}\)
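A minimal sketch of the algorithm with its pruned BFS, for undirected connected graphs with at least two nodes. Two details are assumptions of this sketch rather than statements from the slides: the bound is only used when it is positive, and a node whose BFS completes is returned only if its closeness is strictly larger than the current best. The example graph is also assumed.

```python
def pbfs(adj, v, x):
    """Partial BFS from v. Returns 0.0 as soon as the lower bound on the
    farness shows kappa(v) <= x; otherwise returns kappa(v)."""
    n = len(adj)
    visited = {v}
    level = [v]
    d = 0
    n_d = 1        # number of nodes at distance <= d
    phi_d = 0      # sum of the distances of these nodes
    while level:
        next_level = []
        for u in level:
            for w in adj[u]:
                if w not in visited:
                    visited.add(w)
                    next_level.append(w)
        d += 1
        n_d += len(next_level)
        phi_d += d * len(next_level)
        # upper bound on the number of nodes at distance d + 1
        gamma_tilde = sum(len(adj[u]) - 1 for u in next_level)
        lower = phi_d - gamma_tilde + (d + 2) * (n - n_d)
        if lower > 0 and (n - 1) / lower <= x:
            return 0.0
        level = next_level
    kappa = (n - 1) / phi_d
    return kappa if kappa > x else 0.0

def top_closeness(adj):
    """Node with maximum closeness and its closeness value."""
    best_node, kappa_top = None, 0.0
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        kappa = pbfs(adj, v, kappa_top)
        if kappa != 0:
            best_node, kappa_top = v, kappa
    return best_node, kappa_top

# Assumed example: path 0-1-2-3-4 with two extra leaves 5 and 6 attached to 1.
graph = {0: [1], 1: [0, 2, 5, 6], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1], 6: [1]}
print(top_closeness(graph))
```

On this graph node `1` is visited first (highest degree) and gets closeness \(6/9\); every later node is discarded after a single BFS level.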

  • The IMDB network
    • Nodes: actors
    • Edges: at least one co-presence

[Figure: names annotated with years: Semels ('40), Corrado ('45), Flowers ('50-'80), Welles ('85-'90), Lee ('95-'00), Hitler ('05-'10), Madsen ('14).]

  • The case of directed weakly connected graphs
    • Closeness definition: \[\kappa(v) =\frac{\rho(v) - 1}{\varphi(v)}\frac{\rho(v)-1}{n-1}=\frac{(\rho(v) - 1)^2}{(n-1)\varphi( v)}\] where \(\rho(v) = |\Rho(v)|\) and \(\Rho(v)\) is the set of nodes reachable from \(v\)
    • Lower bound on farness: \[\varphi(v)\geq\tilde{\varphi}_d(v) = \varphi_d(v) - \tilde{\gamma}_ {d+1}(v)+(d+2)(\rho(v)-n_d(v))\]
  • We do not know \(\rho(v)\)
    • Computing it exactly requires a BFS from each node
  • Idea
    • We can compute upper and lower bounds on \(\rho(v)\) and use them to compute lower bound for closeness centrality

  • Using the bounds
    • If \(\alpha(v)\leq\rho(v)\leq\omega(v)\), then \[\frac{1}{\kappa(v)} \geq(n-1)\min\left(\frac{\tilde{\varphi}_d(v,\alpha(v))}{(\alpha(v)-1)^2},\frac{\tilde{\varphi}_d(v,\omega(v))}{(\omega(v)-1)^2}\right)\] where \[\tilde{\varphi}_d(v,x) = \varphi_d(v) - \tilde{\gamma}_{d+1}(v)+(d+2)(x-n_d(v))\]
      • Let \(a=d+2\) and \(b=\tilde{\gamma}_{d+1}(v)+a(n_d(v)-1)-\varphi_d(v)\)
        • Note that \(a>0\) and \(b>0\) (since \(\varphi_d(v)<a(n_d(v)-1)\))
      • We know that \[\varphi(v)\geq\varphi_d(v) - \tilde{\gamma}_{d+1}(v)+a(\rho(v)-n_d(v))\\=a(\rho(v)-1)+\varphi_d(v)-\tilde{\gamma}_{d+1}(v)-a(n_d(v)-1)=a(\rho(v)-1)-b\]
      • Hence \[\frac{1}{\kappa(v)}=\frac{(n-1)\varphi( v)}{(\rho(v)-1)^2}\geq(n-1)\frac{a(\rho(v)-1)-b}{(\rho(v)-1)^2}\]
      • Function \(g(x)=\frac{a(x-1)-b}{(x-1)^2}\) in interval \([x_1,x_2]\) with \(x_1,x_2>1\) has minimum in \(x_1\) or \(x_2\): bound follows with \(x_1=\alpha(v)\) and \(x_2=\omega(v)\)

  • Computing the bounds \(\alpha(v)\) and \(\omega(v)\)
    • Compute graph of strongly connected components (DAG)
    • If \(u,v\in C\), then \(\rho(u)=\rho(v)=\sum_{D\in R(C)}|D|\)
      • \(R(C)\): set of components reachable from \(C\)
    • Hence, we need lower and upper bound \(\alpha(C)\) and \(\omega(C)\) on \(\sum_{D\in R(C)}|D|\)
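    • We process components in reverse topological order
      • \(N(C)\): set of components that are out-neighbours of \(C\) in the DAG
      • \(\alpha(C)=|C|+\max_{D\in N(C)}\alpha(D)\) (the maximum is \(0\) if \(N(C)=\emptyset\))
      • \(\omega(C)=|C|+\sum_{D\in N(C)}\omega(D)\)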
  • \(\alpha(C_6)=4=\omega(C_6)\)
  • \(\alpha(C_2)=6=\omega(C_2)\)
  • \(\alpha(C_5)=7=\omega(C_5)\)
  • \(\alpha(C_4)=9=\omega(C_4)\)
  • \(\alpha(C_3)=10<26=\omega(C_3)\)
  • \(\alpha(C_1)=11<34=\omega(C_1)\)

[Figure: DAG of the strongly connected components \(C_1,\dots,C_6\), with numeric labels \(4\), \(3\), \(5\), \(4\), \(2\), \(6\).]
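The two recurrences can be sketched as follows. The DAG used here is a small assumed example (it is not the one in the figure), and the recursion simply guarantees that every component is processed after its out-neighbours, that is, in reverse topological order.

```python
def reachability_bounds(sizes, successors):
    """sizes: {component: number of nodes};
    successors: {component: list of out-neighbour components in the DAG}.
    Returns (alpha, omega), lower and upper bounds on the number of nodes
    reachable from each component (the component itself included)."""
    alpha, omega = {}, {}

    def visit(c):
        if c in alpha:
            return
        succ = successors.get(c, [])
        for d in succ:
            visit(d)                       # out-neighbours first
        alpha[c] = sizes[c] + max((alpha[d] for d in succ), default=0)
        omega[c] = sizes[c] + sum(omega[d] for d in succ)

    for c in sizes:
        visit(c)
    return alpha, omega

# Assumed DAG: A -> B -> D and A -> C -> D.
sizes = {"A": 2, "B": 3, "C": 1, "D": 4}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
alpha, omega = reachability_bounds(sizes, successors)
print(alpha)
print(omega)
```

From `A` exactly \(2+3+1+4=10\) nodes are reachable: the lower bound \(9\) misses component `C`, while the upper bound \(14\) counts `D` twice.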

Improving our closeness centrality

  • The importance of being important
    • Fast spreading of information
    • Control of the information flow among vertices
    • Fast collection of information from other nodes
  • Increasing the value of centrality by adding links can increase the importance within the network
  • Example: \(\varphi(x_1)=1+1+3+2+3=10\)
    • Adding \((x_1,x_6)\): \(\varphi(x_1)=1+1+3+2+1=8\)
    • Adding \((x_1,x_5)\): \(\varphi(x_1)=1+1+2+1+2=7\)
    • Adding \((x_1,x_4)\): \(\varphi(x_1)=1+1+1+2+3=8\)

  • Maximum Closeness Improvement Problem (MCI)
    • Input: a graph \(G=(V,E)\), a vertex \(u \in V\) and an integer \(k \in \mathbb{N}\)
    • Solution: a set \(S\subseteq V\setminus N(u)\) such that \(|S|\leq k\)
    • Goal: maximize \(\kappa(u)\) in the graph obtained from \(G\) by adding the edges \((u,v)\) with \(v\in S\)
  • For each \(\gamma \geq 1-\frac{1}{15e}\), there is no \(\gamma\)-approximation algorithm for the MCI problem, unless \(P=NP\)
    • Reduction from the dominating set problem
  • A greedy algorithm approximates MCI within a factor \(1-\frac{1}{e}\)
    • A function \(f\) is submodular if for any pair of sets \(S\subseteq T \subseteq X\) and for any element \(e\in X\setminus T\), \(f(S\cup\{e\}) - f(S) \geq f(T\cup \{e\}) - f(T)\); it is monotone if \(f(S)\leq f(T)\) whenever \(S\subseteq T\)
    • For each vertex \(u\), \(\kappa(u)\) is monotone and submodular with respect to any feasible solution for MCI
  • The DBLP network
    • Nodes: researchers
    • Edges: at least one co-authorship
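A minimal sketch of a greedy strategy for MCI on an undirected connected graph, where maximising \(\kappa(u)\) amounts to minimising the farness \(\varphi(u)\). The six-node graph is an assumption: it is one graph consistent with the farness values \(10\), \(8\), \(7\), \(8\) of the \(x_1\) example, not necessarily the one drawn in the original figure. Ties are broken by node name.

```python
from collections import deque

def farness(adj, u):
    """Sum of the distances from u to all other nodes (connected graph)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return sum(dist.values())

def greedy_mci(adj, u, k):
    """Add at most k edges (u, v), each time choosing the non-neighbour v
    that gives the smallest farness of u. Returns (chosen nodes, farness)."""
    adj = {a: set(bs) for a, bs in adj.items()}    # work on a copy
    chosen = []
    for _ in range(k):
        best_v, best_phi = None, farness(adj, u)
        for v in sorted(adj):
            if v == u or v in adj[u]:
                continue
            adj[u].add(v)
            adj[v].add(u)
            phi = farness(adj, u)
            adj[u].remove(v)
            adj[v].remove(u)
            if phi < best_phi:
                best_v, best_phi = v, phi
        if best_v is None:                         # no edge improves farness
            break
        adj[u].add(best_v)
        adj[best_v].add(u)
        chosen.append(best_v)
    return chosen, farness(adj, u)

# Assumed graph consistent with the x_1 example.
G = {"x1": {"x2", "x3"}, "x2": {"x1"}, "x3": {"x1", "x5"},
     "x4": {"x5"}, "x5": {"x3", "x4", "x6"}, "x6": {"x5"}}
print(farness(G, "x1"))
print(greedy_mci(G, "x1", 1))
print(greedy_mci(G, "x1", 2))
```

With \(k=1\) the greedy choice is the edge \((x_1,x_5)\), which brings the farness of \(x_1\) from \(10\) down to \(7\).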
