2019/2020
Université de Paris, IRIF
Inspired by Advanced Algorithms and Graph Mining by Andrea Marino (University of Florence)
By Ageev Andrew - Own Work based on Image:Map of USA showing state names.png
CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=22945395
Given an unweighted, (strongly) connected graph \(G = (V,E)\)
Distance
The distance \(d(u,v)\) is the number of edges along a shortest path from \(u\) to \(v\)
Diameter
\(D = \max_{u,v \in V} d(u,v)\)
Eccentricity of a node \(u\): \(\mathrm{ecc}(u)=\max_{v\in V}d(u,v)\)
Diameter: \(D=\max_{u\in V}\mathrm{ecc}(u)\)
Frontier \(F_i(u)\) of a node \(u\): set of nodes at distance \(i\)
Nodes at level \(i\) of BFS tree
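The definitions above can be made concrete with a plain BFS. The following is a minimal sketch, assuming the graph is stored as a dictionary of adjacency lists; the function names and the 5-node path used as an example are illustrative and do not come from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {v: d(source, v)} for an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:          # first visit gives the shortest distance
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricity(adj, u):
    """ecc(u): the largest distance from u, i.e. the height of the BFS tree of u."""
    return max(bfs_distances(adj, u).values())

def frontiers(adj, u):
    """Return a list F with F[i] = set of nodes at distance i from u."""
    dist = bfs_distances(adj, u)
    levels = [set() for _ in range(max(dist.values()) + 1)]
    for v, d in dist.items():
        levels[d].add(v)
    return levels

# Hypothetical example graph: the undirected path 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(eccentricity(path, 2))   # eccentricity of the middle node
print(frontiers(path, 2))      # frontiers F_0(2), F_1(2), F_2(2)
```

Each level of the returned list corresponds to one level of the BFS tree rooted at the chosen node.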
Forward eccentricity of a node \(u\): \(\mathrm{ecc}_F(u)=\max_{v\in V}d(u,v)\)
Backward eccentricity of a node \(u\): \(\mathrm{ecc}_B(u)=\max_{v\in V}d(v,u)\)
Diameter: \(D=\max_{u\in V}\{\mathrm{ecc}_F(u),\mathrm{ecc}_B(u)\}\)
Backward frontier \(F^B_i(u)\) of a node \(u\): nodes at level \(i\) of backward BFS tree to \(u\)
Forward BFS tree
Backward BFS tree
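For directed graphs, the backward BFS can be run as an ordinary BFS on the reversed graph. The sketch below assumes a dictionary of out-neighbour lists; the helper names and the small digraph are hypothetical, chosen only to illustrate that forward and backward eccentricities can differ.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: distance from source to v} following the arcs of adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def reverse(adj):
    """Return the graph with every arc (x, y) replaced by (y, x)."""
    rev = {v: [] for v in adj}
    for x, out in adj.items():
        for y in out:
            rev[y].append(x)
    return rev

def ecc_forward(adj, u):
    """ecc_F(u) = max_v d(u, v): height of the forward BFS tree."""
    return max(bfs(adj, u).values())

def ecc_backward(adj, u):
    """ecc_B(u) = max_v d(v, u): height of the backward BFS tree."""
    return max(bfs(reverse(adj), u).values())

# Hypothetical strongly connected digraph: cycle 0->1->2->3->0 plus the arc 0->2
digraph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}

print(ecc_forward(digraph, 0), ecc_backward(digraph, 0))
```

In this example node 0 reaches every node quickly thanks to the extra arc, while node 1 needs the whole cycle to reach node 0, so the two eccentricities of node 0 differ.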
Lower bound: eccentricity (height of the BFS tree)
Example: 3
Upper bound: twice the eccentricity
Example: 6 (any node can reach any other node by going to \(v_1\) in \(\leq 3\) edges and then to the destination in \(\leq 3\) edges)
Fact: \(x\in F_i(u)\) and \(y\in F_j(u)\) implies \(d(x,y)\leq i+j\)
Sampling BFS sources yields such bounds, but very often they are not tight: \(L < D < U\)
In the example diameter is 4: \(d(v_{7},v_{8})=4\)
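The two bounds obtained from a single BFS can be checked numerically. This is a small sketch under the assumption of an undirected graph stored as adjacency lists; the path graph is a hypothetical stand-in for the figure of the slides, and the exact diameter is computed by brute force only for comparison.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: d(source, v)} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def single_bfs_bounds(adj, u):
    """Return (L, U) with L = ecc(u) <= D <= 2 * ecc(u) = U."""
    ecc_u = max(bfs(adj, u).values())
    return ecc_u, 2 * ecc_u

def exact_diameter(adj):
    """Brute force: one BFS per node (used here only to compare with the bounds)."""
    return max(max(bfs(adj, u).values()) for u in adj)

# Hypothetical example: the undirected path 0 - 1 - 2 - 3 - 4, BFS started at node 1
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

lower, upper = single_bfs_bounds(path, 1)
print(lower, exact_diameter(path), upper)
```

On this path, starting from node 1 gives bounds that are both strict, which is the typical situation described above.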
Lower bound: maximum between \(\mathrm{ecc}_F(u)\) (height of the fBFS tree) and \(\mathrm{ecc}_B(u)\) (height of the bBFS tree)
Example: 5
Upper bound: \(\mathrm{ecc}_F(u) + \mathrm{ecc}_B(u)\)
Example: 9 (any node can reach any other node by going to \(v_1\) in \(\leq 5\) edges and then to the destination in \(\leq 4\) edges)
Fact: \(x\in F_i^B(u)\) and \(y\in F_j^F(u)\) implies \(d(x,y)\leq i+j\)
Sampling BFS sources yields such bounds, but very often they are not tight: \(L < D < U\)
In the example diameter is 7: \(d(v_{10},v_{12})=7\)
Run a BFS from a (random) node \(r\): let \(a\) be the farthest node
Run a BFS from \(a\): let \(b\) be the farthest node
Return \(d(a,b)\)
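The three steps above can be sketched as follows. This is an illustrative version for undirected graphs stored as adjacency lists; the name `two_sweep` and the example path are assumptions, and ties between equally far nodes are broken arbitrarily (by BFS visiting order).

```python
from collections import deque

def bfs(adj, source):
    """Return {v: d(source, v)} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def two_sweep(adj, r):
    """Return (d(a, b), a, b): a lower bound on the diameter and its endpoints."""
    dist_r = bfs(adj, r)
    a = max(dist_r, key=dist_r.get)    # farthest node from r
    dist_a = bfs(adj, a)
    b = max(dist_a, key=dist_a.get)    # farthest node from a
    return dist_a[b], a, b

# Hypothetical example: the undirected path 0 - 1 - 2 - 3 - 4, starting from node 2
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(two_sweep(path, 2))
```

The returned value is always the eccentricity of `a`, hence a valid lower bound on \(D\); in general it is not guaranteed to equal \(D\).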
Run a fBFS and a bBFS from a (random) node \(r\): let \(a_1\) and \(a_2\) be the respective farthest nodes
Run a bBFS from \(a_1\) and a fBFS from \(a_2\): let \(b_1\) and \(b_2\) be the respective farthest nodes
Return \(\max\{d(b_1,a_1),d(a_2,b_2)\}\)
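A possible rendering of the directed 2-sweep is given below. It is a sketch that assumes a strongly connected digraph stored as out-neighbour lists and implements the backward BFS as a BFS on the reversed graph; the function names and the example digraph are hypothetical.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: distance from source to v} following the arcs of adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def reverse(adj):
    """Return the graph with all arcs reversed."""
    rev = {v: [] for v in adj}
    for x, out in adj.items():
        for y in out:
            rev[y].append(x)
    return rev

def two_sweep_directed(adj, r):
    """Return max{d(b1, a1), d(a2, b2)}, a lower bound on the diameter."""
    rev = reverse(adj)
    fwd = bfs(adj, r)                  # forward BFS from r
    a1 = max(fwd, key=fwd.get)         # farthest node reachable from r
    bwd = bfs(rev, r)                  # backward BFS from r
    a2 = max(bwd, key=bwd.get)         # node from which r is farthest
    ecc_b_a1 = max(bfs(rev, a1).values())   # d(b1, a1) = ecc_B(a1)
    ecc_f_a2 = max(bfs(adj, a2).values())   # d(a2, b2) = ecc_F(a2)
    return max(ecc_b_a1, ecc_f_a2)

# Hypothetical strongly connected digraph: cycle 0->1->2->3->0 plus the arc 0->2
digraph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}

print(two_sweep_directed(digraph, 0))
```

The procedure costs four BFSes in total, two from \(r\) and one each from \(a_1\) and \(a_2\).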
The textbook algorithm runs a BFS from every node and returns the maximum eccentricity found
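For reference, the textbook algorithm can be written in a few lines. The sketch assumes adjacency lists and a (strongly) connected graph; `textbook_diameter` and the 6-cycle are illustrative names and data.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: d(source, v)} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def textbook_diameter(adj):
    """Return (D, number of BFSes): one BFS per node, keep the largest eccentricity."""
    diameter, bfs_count = 0, 0
    for u in adj:
        diameter = max(diameter, max(bfs(adj, u).values()))
        bfs_count += 1
    return diameter, bfs_count

# Hypothetical example: the undirected cycle on 6 nodes
cycle = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}

print(textbook_diameter(cycle))
```

With \(n\) nodes and \(m\) edges this performs \(n\) BFSes, i.e. \(O(nm)\) time, which is the cost the ordering idea below tries to avoid.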
Idea
Perform the BFSes one after the other specifying the order in which they have to be executed
While doing this
Refine the lower bound (maximum eccentricity)
Upper-bound the eccentricities of the remaining nodes
Stop when the remaining nodes cannot have eccentricity higher than our lower bound
A good order can be inferred from some properties of BFS trees
Main observation
For any \(1\leq i< \mathrm{ecc}(u)\) and \(1 \leq k < i\), and for any \(x\in F_{i-k}(u)\) such that \(\mathrm{ecc}(x)>2(i-1)\), there exists \(y\in F_j(u)\) such that \(d(x,y)=\mathrm{ecc}(x)\) with \(j \geq i\)
\(\mathrm{ecc}(x)>2(i-1)\Rightarrow\exists y[\mathrm{ecc}(y)\geq\mathrm{ecc}(x)]\)
Proof
Since \(\mathrm{ecc}(x)>2(i-1)\), there exists a node \(y_x\) whose distance from \(x\) equals \(\mathrm{ecc}(x)\) and is, hence, greater than \(2(i-1)\)
If \(y_x\) were in \(F_j(u)\) with \(j < i\), then, by the triangle inequality through \(u\), \[d(x,y_x)\leq d(x,u)+d(u,y_x) = (i-k)+j \leq (i-k)+(i-1)\leq 2\max\{i-1,i-k\} = 2(i-1)\]
Contradiction: hence, \(y_x\in F_j(u)\) with \(j\geq i\)
Corollary: if \(lb\) is the maximum among the eccentricities of all nodes at or below level \(i\), then the eccentricities of all other nodes are bounded by \(\max\{lb,2(i-1)\}\)
Notation: \(B_{i}(u)=\max_{v\in F_i(u)}\mathrm{ecc}(v)\)
The algorithm (bottom-up)
Given a node \(u\) and its BFS tree
Set \(i=\mathrm{ecc}(u)\) and \(M=B_{i}(u)\)
If \(M > 2(i-1)\), then return \(M\); otherwise set \(i=i-1\) and \(M=\max\{M,B_{i}(u)\}\), and repeat this step
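The bottom-up procedure above can be sketched as follows. This is an illustrative implementation for connected undirected graphs stored as adjacency lists, not a reference one: the name `ifub`, the BFS counter, and the example path are assumptions added for demonstration.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: d(source, v)} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def ifub(adj, u):
    """Return (diameter, number of BFSes performed), starting from node u."""
    dist_u = bfs(adj, u)
    bfs_count = 1
    ecc_u = max(dist_u.values())

    # frontier[i] = F_i(u): nodes at level i of the BFS tree of u
    frontier = [[] for _ in range(ecc_u + 1)]
    for v, d in dist_u.items():
        frontier[d].append(v)

    i = ecc_u
    m = 0                                   # M: largest eccentricity seen so far
    while True:
        for v in frontier[i]:               # computes B_i(u) and folds it into M
            m = max(m, max(bfs(adj, v).values()))
            bfs_count += 1
        if m > 2 * (i - 1):                 # remaining nodes cannot exceed M
            return m, bfs_count
        i -= 1                              # move one level up and repeat

# Hypothetical example: the undirected path 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(ifub(path, 2))   # start from the central node
print(ifub(path, 0))   # start from an endpoint
```

Starting from the central node of the path, the loop stops after visiting only the deepest frontier, whereas starting from an endpoint requires more levels; this is the effect of the choice of \(u\) discussed later in the slides.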
Notation: \((X,Y,a,b)\in\{(B,F,x,y),(F,B,t,z)\}\)
Main observation
For any \(1\leq i< \mathrm{ecc}_X(u)\) and \(1 \leq k < i\), and for any \(a\in F^X_{i-k}(u)\) such that \(\mathrm{ecc}_Y(a)>2(i-1)\), there exists \(b\in F^Y_j(u)\) such that \(\mathrm{ecc}_X(b)\geq\mathrm{ecc}_Y(a)\) with \(j \geq i\)
Corollary: if \(lb\) is the maximum among all the \(\mathrm{ecc}_B\) of nodes at or below level \(i\) of the fBFS tree and all the \(\mathrm{ecc}_F\) of nodes at or below level \(i\) of the bBFS tree, then the eccentricities of all other nodes are bounded by \(\max\{lb,2(i-1)\}\)
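Further notation: \(B^F_i(u)=\max_{v\in F^F_i(u)}\mathrm{ecc}_B(v)\) and \(B^B_i(u)=\max_{v\in F^B_i(u)}\mathrm{ecc}_F(v)\), with value 0 when the frontier is empty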
\(i=lb=\max\{\mathrm{ecc}_F(v_1), \mathrm{ecc}_B(v_1)\} = \max\{4, 5\} = 5\), and \(ub=2i = 10\)
Since \(ub>lb\), the algorithm enters the while loop with \(i=5\)
\(B_5^F(u)=0\) (since \(5>\mathrm{ecc}_F(u)\)) and \(B_5^B(u)=\mathrm{ecc}_F(v_{10})=7\): since \(7 < 8 = 2(i-1)\), the algorithm enters the else branch and sets \(lb\) equal to 7 and \(ub\) equal to 8
Since \(ub>lb\), the algorithm enters the while loop again, now with \(i=4\)
\(B_4^F(u)=\mathrm{ecc}_B(v_{10})=6\) and \(B_4^B(u)=\mathrm{ecc}_F(v_{8})=6\): since \(\max\{lb, B_4^B(u), B_4^F(u)\} = 7 > 6 = 2(i-1)\), the algorithm enters the if branch and returns 7, which is the correct diameter
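The loop traced in the example above can be sketched in code. This is a reconstruction from the worked example (initial \(i=lb\), \(ub=2i\), a while loop on \(ub>lb\), and an if/else on \(2(i-1)\)), not the authors' implementation; it assumes a strongly connected digraph stored as out-neighbour lists, and the function names and the small digraph are hypothetical.

```python
from collections import deque

def bfs(adj, source):
    """Return {v: distance from source to v} following the arcs of adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def reverse(adj):
    """Return the graph with all arcs reversed (used for backward BFSes)."""
    rev = {v: [] for v in adj}
    for x, out in adj.items():
        for y in out:
            rev[y].append(x)
    return rev

def levels(dist):
    """Group nodes by distance: {i: [nodes at level i]}."""
    out = {}
    for v, d in dist.items():
        out.setdefault(d, []).append(v)
    return out

def difub(adj, u):
    """Return the diameter of a strongly connected digraph, starting from node u."""
    rev = reverse(adj)
    fwd = levels(bfs(adj, u))      # forward frontiers F^F_i(u)
    bwd = levels(bfs(rev, u))      # backward frontiers F^B_i(u)
    i = max(max(fwd), max(bwd))    # max{ecc_F(u), ecc_B(u)}
    lb, ub = i, 2 * i
    while ub > lb:
        # B^F_i(u): largest backward eccentricity in the forward frontier (0 if empty)
        b_f = max((max(bfs(rev, v).values()) for v in fwd.get(i, [])), default=0)
        # B^B_i(u): largest forward eccentricity in the backward frontier (0 if empty)
        b_b = max((max(bfs(adj, v).values()) for v in bwd.get(i, [])), default=0)
        best = max(lb, b_f, b_b)
        if best > 2 * (i - 1):
            return best
        lb, ub = best, 2 * (i - 1)
        i -= 1
    return lb

# Hypothetical strongly connected digraph: cycle 0->1->2->3->0 plus the arc 0->2
digraph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}

print(difub(digraph, 0))
```

As in the trace above, each iteration either certifies the current lower bound as the diameter or tightens both bounds and moves one level closer to \(u\).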
Suitable properties of the starting node \(u\)
(1) \(u\) should be a node with minimum eccentricity; this minimum is called the radius \(R\)
(2) Constant number of nodes in \(F_{\mathrm{ecc}(u)}(u)\)
If you can find a node \(u\) satisfying (1) and \(R=D/2\), the algorithm stops after one iteration
A high-degree node is very often a good choice
If the lower bound path returned by 2-sweep is tight and \(R=D/2\), starting from the node in the middle of this path makes the algorithm stop after one iteration
In real-world graphs, \(R=D/2\) almost always holds (the minimum possible value of \(R\), maximum heterogeneity), and (2) is true if \(u\) is central
diFUB can be generalized to weighted graphs
Use Dijkstra's algorithm instead of BFS and sort the nodes according to their distance from \(u\)
It works well, but not for road networks
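The weighted variant replaces each BFS by a shortest-path computation. The sketch below shows only that building block: Dijkstra's algorithm with a binary heap, followed by the ordering of nodes by decreasing distance from \(u\). It assumes non-negative weights and adjacency lists of `(neighbour, weight)` pairs; the names and the tiny weighted digraph are illustrative.

```python
import heapq

def dijkstra(adj, source):
    """Return {v: weighted distance from source to v}; weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue                      # stale heap entry, a shorter path was found
        for y, w in adj[x]:
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

def nodes_by_decreasing_distance(adj, u):
    """Order in which a weighted variant would process the nodes: farthest from u first."""
    dist = dijkstra(adj, u)
    return sorted(dist, key=dist.get, reverse=True)

# Hypothetical weighted digraph: arcs 0->1 (2), 0->2 (5), 1->2 (1), 2->0 (4)
weighted = {0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: [(0, 4)]}

print(dijkstra(weighted, 0))
print(nodes_by_decreasing_distance(weighted, 0))
```

Since weighted distances are no longer integers in \(0,\dots,\mathrm{ecc}(u)\), the levels of the BFS tree are replaced by this sorted order of distances.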
Further optimizations allow us to do better than this and also to compute the diameter of weakly connected graphs
It is possible to prove that for some random graph generation models (with a fixed power-law distribution) the number of BFSes is almost constant
...