The beginning of the story

Graph G=(V,E)
- Undirected
- Connected
Distance d(u,v): number of edges in a shortest path from u to v
Diameter: maximum distance
- Maximum eccentricity of all nodes
  - e(v)=max_w d(v,w)

Our "toy" network

IMDB graph: edge between two actors if they played in the same movie

Algorithms for diameter

O(|V||E|): breadth-first search from each node
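A minimal sketch of this textbook approach, assuming the graph is connected, undirected, and stored as an adjacency dictionary (the names `bfs_distances`, `eccentricity` and `diameter_textbook` are illustrative, not taken from the slides):

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {node: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    """e(v) = max over w of d(v, w), computed with one BFS from v."""
    return max(bfs_distances(adj, v).values())

def diameter_textbook(adj):
    """One BFS per node, hence O(|V||E|) overall on a connected graph."""
    return max(eccentricity(adj, v) for v in adj)

# Path 0 - 1 - 2 - 3: the two endpoints are three edges apart.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter_textbook(path))
```

Each BFS costs O(|E|) on a connected graph, which is where the O(|V||E|) total comes from.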

Can we do better?

Into the square

Quadratic algorithms are not feasible on large graphs
Look for the "hardest" problems solvable in quadratic time
- Approach similar to NP-completeness
- Definition of specific reducibility
  - Preserving subquadratic solvability
Hardness relative to complexity hypothesis
- Similar to P vs NP
- SETH: for every ε>0 there is a k such that k-SAT cannot be solved in time O((2-ε)^n)
  - Quadratic time solvable version (k-SAT*)

Quasi-linear reducibility

\mathcal{P} \leq_{ql}\mathcal{Q}

I \mathrm{\ instance\ of\ }\mathcal{P} \rightarrow \Phi(I)\mathrm{\ instance\ of\ }\mathcal{Q}

\mathrm{Computable\ in\ time\ }\tilde{O}(|I|)

I \mathrm{\ and\ } \Phi(I) \mathrm{\ same\ output}

\mathrm{Linear\ time\ computable\ output\ mapping}

\mathcal{P} \leq_{ql}\mathcal{Q} \mathrm{\ and \ } \mathcal{Q} \mathrm{\ is\ solvable\ in\ time \ } \tilde{O}(n^{2-\epsilon})

\Rightarrow \mathcal{P} \mathrm{\ is\ solvable\ in\ time \ } \tilde{O}(n^{2-\epsilon})

k-SAT*

\mathrm{\ Input\ }

\mathrm{Two\ sets \ of\ } n \mathrm{\ variables\ }\{x_i\},\{y_i\}

\mathrm{Set\ of\ clauses\ } C

\mathrm{Possible\ assignments\ to \ } x_i

\mathrm{Possible\ assignments\ to \ } y_i

\mathrm{\ Output:\ true\ if\ }C\mathrm{\ satisfiable}

O(n^{2-\epsilon})\mathrm{\ algorithm\ for\ } k-\mathrm{SAT}^*

\Rightarrow O(2^{\frac{n}{2}(2-\epsilon)})=O((2^{\frac{2-\epsilon}{2}})^n)\mathrm{\ algorithm\ for\ } \mathrm{SAT}
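To see why k-SAT* is solvable in time quadratic in its input size, here is a brute-force sketch that tries every pair of listed assignments. The encoding is an assumption made for illustration: a clause is a list of `(variable_name, wanted_value)` literals, and each assignment is a dictionary from variable names to booleans.

```python
def ksat_star(clauses, x_assignments, y_assignments):
    """Return True if some listed x-assignment combined with some listed
    y-assignment satisfies every clause (hypothetical encoding, see above)."""
    for ax in x_assignments:
        for ay in y_assignments:
            full = {**ax, **ay}  # merged assignment over both variable sets
            if all(any(full[var] == wanted for var, wanted in clause)
                   for clause in clauses):
                return True
    return False

xs = [{"x1": False}, {"x1": True}]
ys = [{"y1": False}, {"y1": True}]

# (x1 or y1) and (not x1 or not y1): satisfied by x1=True, y1=False.
satisfiable = [[("x1", True), ("y1", True)], [("x1", False), ("y1", False)]]
# (x1) and (not x1): no assignment works.
unsatisfiable = [[("x1", True)], [("x1", False)]]

print(ksat_star(satisfiable, xs, ys), ksat_star(unsatisfiable, xs, ys))
```

The double loop visits every pair of listed assignments, so the number of pairs checked is the product of the two list lengths.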

The reduction web

From disjoint sets to diameter

\mathrm{\ Input\ }

\mathrm{Set\ of\ items\ } X

\mathrm{Collection\ } C \mathrm{\ of\ subsets\ of\ } X

\mathrm{\ Output:\ true\ if\ }C\mathrm{\ has\ two\ disjoint\ sets}

\mathrm{Reduction}

\mathrm{Clique\ of\ }|X|\mathrm{\ nodes}

\mathrm{Independent\ set\ of\ }|C|\mathrm{\ nodes}

\mathrm{Edge\ between\ item\ } x \mathrm{\ and\ set\ } S \mathrm{\ if\ } x \in S

\mathrm{Two\ sets\ that\ do\ not\ intersect:\ distance\ }3

\mathrm{Two\ sets\ that\ intersect:\ distance\ }2

\mathrm{Disjoint\ sets\ } \leftrightarrow \mathrm{\ diameter\ is\ } 3
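The reduction can be sketched as follows, assuming every set in the collection is non-empty and joining each set node to the nodes of the items it contains. The function names and the `("item", x)` / `("set", i)` node labels are illustrative only, and the diameter is computed here by plain BFS from every node just to check the equivalence.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def disjoint_sets_to_graph(items, collection):
    """Clique on the items, independent set on the sets, item-set membership edges."""
    adj = {("item", x): set() for x in items}
    adj.update({("set", i): set() for i in range(len(collection))})
    for x, y in combinations(items, 2):          # clique of |X| nodes
        adj[("item", x)].add(("item", y))
        adj[("item", y)].add(("item", x))
    for i, subset in enumerate(collection):      # set node -- item node if item in set
        for x in subset:
            adj[("set", i)].add(("item", x))
            adj[("item", x)].add(("set", i))
    return adj

def diameter(adj):
    return max(max(bfs_distances(adj, v).values()) for v in adj)

def has_two_disjoint_sets(items, collection):
    # Intersecting sets are at distance 2, disjoint ones at distance 3.
    return diameter(disjoint_sets_to_graph(items, collection)) == 3

print(has_two_disjoint_sets([1, 2, 3, 4], [{1, 2}, {2, 3}, {3, 4}]))
print(has_two_disjoint_sets([1, 2, 3, 4], [{1, 2}, {2, 3}, {2, 4}]))
```

In the first call the sets {1, 2} and {3, 4} are disjoint; in the second call every pair of sets shares item 2.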

Lower bound on the diameter

The 2-sweep heuristic

v_1

v_2 \mathrm{ \ maximizes\ } d(v_2,v_1)

\mathrm{Max\ eccentricity \ of\ } v_1, v_2: \mathrm{good\ lower\ bound\ on\ diameter}
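A small sketch of the 2-sweep idea on an adjacency dictionary, assuming a connected undirected graph (the function names are illustrative). It costs two BFSes and always returns a valid lower bound, since both values are true eccentricities.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def two_sweep(adj, v1):
    """Lower bound on the diameter: max eccentricity of v1 and of a node v2 farthest from v1."""
    d1 = bfs_distances(adj, v1)
    v2 = max(d1, key=d1.get)          # any node maximizing d(v2, v1)
    d2 = bfs_distances(adj, v2)
    return max(max(d1.values()), max(d2.values()))

# Path 0 - 1 - 2 - 3 - 4, starting from the middle node.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(two_sweep(path, 2))
```

On this path the first BFS only sees eccentricity 2, while the second one starts from an endpoint and reaches the other endpoint.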

Lower bound on the diameter

The sum-sweep heuristic

v_1

v_2 \mathrm{ \ maximizes\ } d(v_2,v_1)

v_3 \mathrm{ \ maximizes\ } d(v_3,v_1)+d(v_3,v_2)

v_4 \mathrm{ \ maximizes\ } d(v_4,v_1)+d(v_4,v_2)+d(v_4,v_3)

\mathrm{Max\ eccentricity \ of\ } v_1, v_2, v_3, v_4: \mathrm{better\ lower\ bound\ on\ diameter}
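The same idea generalized to k BFSes can be sketched as below. This is an illustration under stated assumptions: the graph is connected and undirected, already visited nodes are excluded when choosing the next start (the slide does not say this explicitly), and the names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def sum_sweep(adj, v1, k=4):
    """Lower bound on the diameter from k BFSes; each new start maximizes
    the sum of distances to the starts used so far."""
    total = {v: 0 for v in adj}       # S(w): sum of distances to visited starts
    visited = set()
    lower = 0
    v = v1
    for _ in range(min(k, len(adj))):
        dist = bfs_distances(adj, v)
        visited.add(v)
        lower = max(lower, max(dist.values()))
        for w, d in dist.items():
            total[w] += d
        candidates = [w for w in adj if w not in visited]
        if not candidates:
            break
        v = max(candidates, key=total.get)
    return lower

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sum_sweep(path, 2))
```

With `k=2` this coincides with the 2-sweep choice, because maximizing the sum of distances to a single start is the same as maximizing the distance from it.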

Bounds on node eccentricities

\mathrm{If\ BFS\ from\ } v \mathrm{\ done}

\underbrace{d(v,w)}_{L_v(w)} \leq ecc(w) \leq \underbrace{d(v,w)+ecc(v)}_{U_v(w)}

U_v(w) \mathrm{\ can\ be\ improved}
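As a quick illustration, the two bounds obtained from a single BFS can be computed and compared with the true eccentricities. The sketch assumes a connected undirected graph; `eccentricity_bounds` is a hypothetical helper name.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity_bounds(adj, v):
    """After one BFS from v, return {w: (L_v(w), U_v(w))} with
    L_v(w) = d(v, w) and U_v(w) = d(v, w) + ecc(v)."""
    dist = bfs_distances(adj, v)
    ecc_v = max(dist.values())
    return {w: (d, d + ecc_v) for w, d in dist.items()}

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
bounds = eccentricity_bounds(path, 1)
for w, (low, up) in sorted(bounds.items()):
    true_ecc = max(bfs_distances(path, w).values())
    print(w, low, true_ecc, up)
```

The upper bound is the triangle inequality through v: any node is reachable from w by first going to v and then following a shortest path from v.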

Exact value of diameter

\mathrm{Vectors\ } e_L,e_U: \mathrm{\ lower\ and\ upper\ bounds}

\mathrm{At\ each\ BFS\ from\ } v:

e_L(w)=\max(e_L(w),L_v(w))

e_U(w)=\min(e_U(w),U_v(w))

\mathrm{Vector\ } S: \mathrm{\ sum\ of\ distances\ to\ already\ explored\ nodes}

S(w)=d(v,w)+S(w)

\mathrm{Start\ with\ } e_L=0, e_U=\infty \mathrm{\ and\ sumsweep\ of\ } k \mathrm{\ nodes}

\mathrm{At\ each\ step,\ \mathbf{cleverly}\ choose\ next\ } v

\mathrm{Update\ } e_L, e_U : e_L(w)=e_U(w)\Rightarrow e(w) = e_L(w)

\mathrm{Terminate\ when\ } \max e(v) \geq \max (e_U(w))

Choosing the next vertex u

\mathrm{Alternate}

\mathrm{Minimize\ } e_L(u)

\mathrm{Ties\ solved\ by\ minimizing\ } S(u)

\mathrm{Should\ improve\ upper\ bounds}

\mathrm{Maximize\ } e_U(u)

\mathrm{Ties\ solved\ by\ maximizing\ } S(u)

\mathrm{Should\ improve\ lower\ bounds}
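Putting the bound vectors, the initial sum-sweep and the alternating choice together gives roughly the following loop. This is a simplified sketch, not the authors' implementation: it assumes a connected undirected graph, treats a node as settled once its two bounds coincide, and uses hypothetical names throughout.

```python
from collections import deque
import math

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def exact_diameter(adj, start, k=2):
    """Return (diameter, number of BFSes performed)."""
    e_low = {v: 0 for v in adj}
    e_up = {v: math.inf for v in adj}
    total = {v: 0 for v in adj}           # S(w)
    state = {"best": 0, "bfs": 0}         # best = largest exact eccentricity seen

    def visit(v):
        dist = bfs_distances(adj, v)
        ecc_v = max(dist.values())
        state["bfs"] += 1
        state["best"] = max(state["best"], ecc_v)
        for w, d in dist.items():
            e_low[w] = max(e_low[w], d)            # L_v(w)
            e_up[w] = min(e_up[w], d + ecc_v)      # U_v(w)
            total[w] += d
        e_low[v] = e_up[v] = ecc_v                 # e(v) is now known exactly

    def open_nodes():
        return [w for w in adj if e_low[w] < e_up[w]]

    v = start                                      # initial sum-sweep of k nodes
    for _ in range(min(k, len(adj))):
        visit(v)
        remaining = open_nodes()
        if not remaining:
            break
        v = max(remaining, key=total.get)

    pick_upper = True
    while True:
        remaining = open_nodes()
        if not remaining or state["best"] >= max(e_up[w] for w in remaining):
            return state["best"], state["bfs"]
        if pick_upper:   # peripheral candidate: should improve lower bounds
            v = max(remaining, key=lambda w: (e_up[w], total[w]))
        else:            # central candidate: should improve upper bounds
            v = min(remaining, key=lambda w: (e_low[w], total[w]))
        pick_upper = not pick_upper
        visit(v)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(exact_diameter(path, 2))
```

Every BFS settles at least its own start node, so the loop performs at most |V| BFSes; the hope expressed in the slides is that on real-world graphs the bounds meet after far fewer.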

Performance

In theory, as bad as the worst case, but...

Why?

Average case complexity

Very hard and technical
Many models
Are models realistic?
Which properties are used?

Axiomatic framework
- Define axioms
- Deduce probabilistic analyses from the axioms
- Prove that random graphs satisfy the axioms
- Show empirically that real-world graphs satisfy the axioms

The models

Erdős-Rényi model
- Not realistic (all nodes are "equal")
- Heuristics are not efficient on this model
Random graph with prescribed degree distribution
- Configuration model
- Chung-Lu model
- Norros-Reittu model
Power law degree distribution

|\{v\in V:\mathrm{deg}(v)=d\}| \approx nd^{-\beta}

The axioms

Some definitions

\tau_s(n^x) = \min\{l:\gamma^l(s)\geq n^x\}

\gamma^l(s)=|\{v\in V:d(s,v)=l\}|

T(d \rightarrow n^x) = \mathrm{\ avg}_{\mathrm{deg}(s)=d}\tau_s(n^x)
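These three quantities can be computed directly from BFS layers. The sketch below assumes a connected undirected graph stored as an adjacency dictionary; the function names are hypothetical, and `tau` returns `None` when no layer ever reaches size n^x.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def layer_sizes(adj, s):
    """Entry l of the returned list is gamma^l(s), the number of nodes at distance exactly l."""
    dist = bfs_distances(adj, s)
    sizes = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        sizes[d] += 1
    return sizes

def tau(adj, s, x):
    """Smallest l such that gamma^l(s) >= n^x, or None if there is none."""
    threshold = len(adj) ** x
    for l, size in enumerate(layer_sizes(adj, s)):
        if size >= threshold:
            return l
    return None

def average_tau(adj, degree, x):
    """T(d -> n^x): average of tau_s(n^x) over nodes s of the given degree.
    Assumes such nodes exist and that tau is defined for all of them."""
    values = [tau(adj, s, x) for s in adj if len(adj[s]) == degree]
    return sum(values) / len(values)

# Star with center 0 and 8 leaves: n = 9, so n^0.5 = 3.
star = {0: list(range(1, 9)), **{i: [0] for i in range(1, 9)}}
print(tau(star, 0, 0.5), tau(star, 1, 0.5), average_tau(star, 1, 0.5))
```

From the center the first layer already has 8 nodes, while from a leaf the first layer is only the center and the second layer holds the other 7 leaves.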

(Figure: BFS layers \gamma^1(s), \gamma^2(s), ... around s, up to the first layer of size n^x, at distance \tau_s(n^x))

Axiom 1

|\{s\in V:\tau_s(n^x)\geq T(\mathrm{deg}(s)\rightarrow n^x)+l\}|\approx\frac{n}{c^l}

Axiom 2

d(s,t) \approx \tau_s(n^x)+\tau_t(n^{1-x})-1

The sum-sweep heuristic

\beta>3: \leq n^{1+\frac{C}{C+\frac{\beta-1}{\beta-3}}}

2<\beta<3: n^{1+o(1)}

1<\beta<2: \leq mn^{1-\frac{2-\beta}{\beta-1}\left(\left\lfloor\frac{\beta-1}{2-\beta}-\frac{3}{2}\right\rfloor-\frac{1}{2}\right)}

C=\frac{2d_{\mathrm{avg}}(n)}{D-d_{\mathrm{avg}}(n)}\mathrm{\ is\ constant}

Computing closeness top-k

\mathrm{Definition:\ }c(v)=\frac{n-1}{\sum_{w \in V-\{v\}}d(v,w)}
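For reference, the definition translates into one BFS per node. The sketch below assumes a connected undirected graph with at least two nodes; `closeness` and `top_k_naive` are illustrative names.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """c(v) = (n - 1) / sum of distances from v to every other node."""
    dist = bfs_distances(adj, v)
    return (len(adj) - 1) / sum(dist.values())

def top_k_naive(adj, k):
    """Full BFS from every node, then keep the k largest closeness values."""
    scores = {v: closeness(adj, v) for v in adj}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Star with center 0 and 4 leaves: the center is adjacent to everyone.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(closeness(star, 0), top_k_naive(star, 1))
```

This baseline runs a complete BFS from every node, which is the cost that cutting the BFS early tries to avoid.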

\mathrm{In\ theory\ complexity\ }\Theta(n^2)

\mathrm{In\ practice}

\mathrm{BFSCut\ returns\ 0\ if\ } v \mathrm{\ is\ not\ among\ top\ } k, c(v)\mathrm{\ otherwise}

How to cut the BFS

\mathrm{If\ } \gamma_d(v)=|\Gamma_d(v)|: f(v) \geq f_d(v) + (d+1)\gamma_ {d+1}(v)+(d+2)(r(v)-n_{d+1}(v))

\mathrm{Since\ } n_{d+1}(v)=\gamma_{d+1}(v)+n_{d}(v): f(v) \geq f_d(v) - \gamma_ {d+1}(v)+(d+2)(r(v)-n_{d}(v))

\mathrm{Since\ } \gamma_{d+1}(v)\leq\tilde{\gamma}_{d+1}(v): f(v) \geq f_d(v) - \tilde{\gamma}_ {d+1}(v)+(d+2)(r(v)-n_{d}(v))

Everything is known
- If graph not connected, work on components
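The last inequality can be turned into a cut-off test inside a level-by-level BFS. The sketch below is a simplified illustration, not the authors' implementation. It assumes a connected undirected graph with comparable node labels, so r(v) = n; it reads f(v) as the sum of distances from v, f_d(v) as that sum restricted to nodes within distance d, and n_d(v) as the number of such nodes; and it bounds the size of level d+1 by the sum of the degrees of level d. The function returns `None` rather than 0 when the BFS is cut, and all names are hypothetical.

```python
from collections import deque
import math

def bfs_cut(adj, v, farness_threshold):
    """Return the farness f(v), or None as soon as a lower bound on f(v)
    exceeds farness_threshold."""
    n = len(adj)
    seen = {v}
    level = [v]
    d, f_d, n_d = 0, 0, 1
    while level:
        gamma_tilde = sum(len(adj[u]) for u in level)   # upper bound on gamma_{d+1}(v)
        lower = f_d - gamma_tilde + (d + 2) * (n - n_d)
        if lower > farness_threshold:
            return None                                  # v cannot be among the top k
        next_level = []
        for u in level:
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    next_level.append(w)
        d += 1
        f_d += d * len(next_level)
        n_d += len(next_level)
        level = next_level
    return f_d

def top_k_closeness(adj, k):
    """Return [(node, closeness)] for the k most central nodes, plus the number of cut BFSes."""
    n = len(adj)
    best = []                                            # k smallest (farness, node) pairs so far
    cuts = 0
    for v in sorted(adj, key=lambda u: -len(adj[u])):    # high degree first (a heuristic choice)
        threshold = best[-1][0] if len(best) == k else math.inf
        f = bfs_cut(adj, v, threshold)
        if f is None:
            cuts += 1
            continue
        best = sorted(best + [(f, v)])[:k]
    return [(v, (n - 1) / f) for f, v in best], cuts

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(top_k_closeness(path, 2))
```

On this path the BFSes from the two endpoints are cut after one level, because their lower bound already exceeds the farness of the current second-best node.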

To conclude the story...

Total running time: 37 minutes!

Semels ('40)

Corrado ('45)

Flowers ('50-'80)

Welles ('85-'90)

Lee ('95-'00)

Hitler ('05-'10)

Madsen ('14)

Thanks to...

This is my story: dozens of researchers have similar stories (references)

Michele Borassi, Pierluigi Crescenzi, Michel Habib: Into the Square: On the Complexity of Some Quadratic-time Solvable Problems. Electr. Notes Theor. Comput. Sci. 322: 51-67 (2016)
Elisabetta Bergamini, Michele Borassi, Pierluigi Crescenzi, Andrea Marino, Henning Meyerhenke: Computing Top-k Closeness Centrality Faster in Unweighted Graphs. ALENEX 2016: 68-80
Michele Borassi, Pierluigi Crescenzi, Luca Trevisan: An Axiomatic and an Average-Case Analysis of Algorithms and Heuristics for Metric Properties of Graphs. CoRR abs/1604.01445 (2016)
Michele Borassi, Pierluigi Crescenzi, Michel Habib, Walter A. Kosters, Andrea Marino, Frank W. Takes: Fast diameter and radius BFS-based computation in (weakly connected) real-world graphs: With an application to the six degrees of separation games. Theor. Comput. Sci. 586: 59-80 (2015)
Michele Borassi, David Coudert, Pierluigi Crescenzi, Andrea Marino: On Computing the Hyperbolicity of Real-World Graphs. ESA 2015: 215-226
Pilu Crescenzi, Roberto Grossi, Michel Habib, Leonardo Lanzi, Andrea Marino: On computing the diameter of real-world undirected graphs. Theor. Comput. Sci. 514: 84-95 (2013)
Pierluigi Crescenzi, Roberto Grossi, Leonardo Lanzi, Andrea Marino: On Computing the Diameter of Real-World Directed (Weighted) Graphs. SEA 2012: 99-110
Pierluigi Crescenzi, Roberto Grossi, Leonardo Lanzi, Andrea Marino: A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs. TAPAS 2011: 92-103
Pierluigi Crescenzi, Roberto Grossi, Claudio Imbrenda, Leonardo Lanzi, Andrea Marino: Finding the Diameter in Real-World Graphs - Experimentally Turning a Lower Bound into an Upper Bound. ESA (1) 2010: 302-313

A (large) graph mining roundtrip

From practice to theory and back

Pierluigi Crescenzi
