Social Network Analysis
week 02
Basic Concepts
- A network is a collection of objects where some pairs of objects are connected by links
| Objects: nodes, vertices | N |
| Interactions: links, edges | E |
| System: network, graph | G(N,E) |
- A graph G = (V, E) is an ordered pair of sets: a set of vertices V and a set of edges E, where n = |V|, m = |E|
- An edge eij = (vi, vj) is a pair of vertices (an ordered pair for a directed graph)

Basic Concepts
- Network often refers to real systems such as the Web, social networks, and metabolic networks
- Language: Network, node, link

- Graph is a mathematical representation of a network, e.g. the Web graph or the social graph (a Facebook term)
- Language: Graph, vertex, edge

How to define a network
How to build a graph:
- What are nodes?
- What are edges?
Choice of the proper network representation of a given domain/problem determines our ability to use networks successfully:
- In some cases there is a unique, unambiguous representation
- In other cases, the representation is by no means unique
- The way you assign links will determine the nature of the question you can study
Directed vs Undirected networks
Undirected Graphs
Links: undirected (symmetrical, reciprocal)
Examples: collaborations, friendship on Facebook
Directed Graphs
Links: directed (arcs)
Examples: phone calls, following on Twitter


A graph is called directed (undirected) iff its edges are ordered (unordered) pairs of vertices.
Directed vs Undirected networks
An edge whose two endpoints coincide is called a loop.
A set of edges joining the same pair of vertices is called multiple edges (or a multi-edge).
Two nodes/vertices are adjacent if they share a common edge. An edge and a node on that edge are called incident.
Example:
(1, 1) — loop,
((1, 2),(1, 2),(1, 2)) — multiple edges.
A directed graph is called an oriented graph if no two vertices are joined by edges in both directions.
Graph Isomorphism
Two graphs are called isomorphic if one can renumber the vertices of one graph to obtain the other.
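The renumbering idea can be tried directly on very small graphs. The sketch below is a brute-force check for undirected simple graphs with vertices assumed to be labelled 0..n-1; the function name `are_isomorphic` is hypothetical, and trying all n! renumberings is only feasible for a handful of vertices.

```python
from itertools import permutations


def are_isomorphic(n, edges_a, edges_b):
    """Brute-force isomorphism test for undirected simple graphs on vertices 0..n-1."""
    target = {frozenset(e) for e in edges_b}
    if len({frozenset(e) for e in edges_a}) != len(target):
        return False  # different numbers of edges: no renumbering can work
    for perm in permutations(range(n)):
        # Renumber vertex u of the first graph as perm[u] and compare edge sets.
        renumbered = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        if renumbered == target:
            return True
    return False


path = [(0, 1), (1, 2), (2, 3)]          # path 0-1-2-3
relabelled = [(2, 0), (0, 3), (3, 1)]    # the same path written as 2-0-3-1
star = [(0, 1), (0, 2), (0, 3)]          # star centred at 0
```

Here `path` and `relabelled` describe the same structure, while `star` has the same number of edges but a different shape.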
Adjacency matrix
Properties of adjacency matrix:
- The adjacency matrix is symmetric for an undirected graph
- The diagonal elements of the adjacency matrix equal 0 if the graph has no loops
- The adjacency matrix is always a square matrix
- For an unweighted graph the adjacency matrix consists of 0 and 1, and for an undirected graph deg vi = {the sum of row i elements}
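A minimal sketch of these properties, assuming the usual definition in which entry (i, j) of the adjacency matrix is 1 when vi and vj are joined by an edge and 0 otherwise. The helper `adjacency_matrix` and the example edge list are hypothetical, and vertices are assumed to be labelled 0..n-1.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return the n x n adjacency matrix (list of lists) of an unweighted graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1  # an undirected edge is stored in both directions
    return A


# Hypothetical example: the path 0-1-2 plus the edge 1-3.
edges = [(0, 1), (1, 2), (1, 3)]
A = adjacency_matrix(4, edges)

is_symmetric = all(A[i][j] == A[j][i] for i in range(4) for j in range(4))
zero_diagonal = all(A[i][i] == 0 for i in range(4))
degrees = [sum(row) for row in A]  # row sums give the vertex degrees
```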
Incidence matrix
Properties of incidence matrix:
- Each column of the incidence matrix has exactly two non-zero elements (for a graph without loops)
- The incidence matrix is a rectangular matrix with dimensions |V| × |E|
- The sum of the elements in each column of the incidence matrix equals 0 for directed graphs
- For an unweighted undirected graph the incidence matrix consists of 0 and 1 and deg vi = {the sum of row i elements}
Incidence matrix
For an unweighted undirected graph the incidence matrix is defined analogously, with 1 in place of -1.
To generalize the definition to weighted graphs (the weighted incidence matrix), multiply each column by the weight of the corresponding edge.
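The incidence matrix properties can be checked on a small example. This is a sketch under an assumed sign convention for directed graphs (-1 at the vertex an edge leaves, +1 at the vertex it enters); the opposite convention is equally common. The helper `incidence_matrix` is hypothetical and does not handle loops.

```python
def incidence_matrix(n, edges, directed=True, weights=None):
    """Return the |V| x |E| incidence matrix; column c describes edges[c]."""
    m = len(edges)
    B = [[0] * m for _ in range(n)]
    for col, (u, v) in enumerate(edges):
        w = 1 if weights is None else weights[col]  # weighted case: scale the column
        B[u][col] = -w if directed else w           # assumed convention: -w at the tail
        B[v][col] = w
    return B


edges = [(0, 1), (1, 2), (0, 2)]  # a triangle
B_dir = incidence_matrix(3, edges, directed=True)
B_und = incidence_matrix(3, edges, directed=False)

column_sums = [sum(B_dir[i][c] for i in range(3)) for c in range(3)]
degrees = [sum(row) for row in B_und]  # row sums of the undirected matrix
```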
Graph Connectivity
- A strongly connected directed graph has a path from each node to every other node and vice versa (e.g., an A-B path and a B-A path)
- A weakly connected directed graph is connected if we disregard the edge directions

The graph on the left is connected but not strongly connected (e.g., there is no way to get from F to G by following the edge directions).
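One way to test both notions is with breadth-first search. The sketch below assumes a directed graph given as a node list and an edge list; the function names are hypothetical. Strong connectivity is checked by searching from one node along the edges and again along the reversed edges.

```python
from collections import deque


def reachable(adj, start):
    """Set of nodes reachable from `start` by following adjacency lists `adj`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(nodes, edges):
    fwd = {u: [] for u in nodes}
    rev = {u: [] for u in nodes}
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    start = nodes[0]
    # Every node is reachable from `start`, and `start` is reachable from every node.
    return (len(reachable(fwd, start)) == len(nodes)
            and len(reachable(rev, start)) == len(nodes))


def is_weakly_connected(nodes, edges):
    und = {u: [] for u in nodes}
    for u, v in edges:  # disregard edge directions
        und[u].append(v)
        und[v].append(u)
    return len(reachable(und, nodes[0])) == len(nodes)


nodes = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]   # A -> B -> C
cycle = chain + [("C", "A")]       # A -> B -> C -> A
```

The chain is weakly but not strongly connected; adding the edge C -> A makes it strongly connected.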
K-connectivity
A set of k vertices (edges) is called a k-vertex (k-edge) cut if the graph becomes disconnected after this set is deleted.
A graph is called k-vertex (k-edge) connected iff it has no (k − 1)-vertex ((k − 1)-edge) cut.
k-vertex connectivity is also called k-connectivity for short. Paths between two vertices are called k-vertex (k-edge) independent iff there are k such paths that share no vertices other than their endpoints (no edges).
Node Degree
Undirected Graphs
Node degree \(k_i\): the number of edges incident to node i
Directed Graphs
Node degree = in-degree + out-degree.


Complete Graph
The maximum number of edges in an undirected graph on N nodes is
\(E_{max} = \binom{N}{2} = \frac{N(N-1)}{2}\)
An undirected graph with the number of edges \(E = E_{max}\) is called a complete graph, and its average degree is N-1
Degree distribution
Degree distribution \( P(k) \):
Probability that a randomly chosen node has degree \( k \)
\( N_k \) - # nodes with degree \( k \)
Normalized histogram: \( P(k) = N_k / N \), where \( N \) is the number of nodes
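As a small illustration, the degree distribution can be computed by counting how many nodes have each degree and dividing by the number of nodes. The sketch assumes an undirected graph on nodes 0..n-1; the helper `degree_distribution` is hypothetical.

```python
from collections import Counter


def degree_distribution(n, edges):
    """Return {k: P(k)} for an undirected graph on nodes 0..n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)  # counts[k] is N_k, the number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}


star_edges = [(0, 1), (0, 2), (0, 3)]  # star with centre 0 and three leaves
P = degree_distribution(4, star_edges)
```

For the star, three of the four nodes have degree 1 and one node has degree 3.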


Power law distribution

Bipartite Graph
A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets
Examples:
- Authors-to-Papers (they authored)
- Actors-to-Movies (they appeared in)
- Users-to-Movies (they rated)
- Recipes-to-Ingredients (they contain)

“Folded” networks:
- Author collaboration networks
- Movie co-rating networks
Bipartite Graph
A connected undirected graph is called bipartite iff one can divide the vertices into two groups such that no two vertices from the same group are adjacent.
Consider the bipartite property testing algorithm:
- Choose any vertex.
- Run a depth-first or breadth-first search and divide the vertices into groups 0 and 1 by marking them accordingly:
  - For depth-first: alternate the marks along the depth-first walk.
  - For breadth-first: give the same mark to vertices on the same level and switch the mark from one level to the next.
- If no two vertices from the same group are adjacent, the graph is bipartite; otherwise it is not.
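The breadth-first variant of the testing algorithm can be sketched as follows. It assumes a connected undirected graph stored as a dictionary of adjacency lists; the function name `two_color` and the example graphs are hypothetical. The conflict check is done during the search rather than in a separate final pass, which gives the same answer.

```python
from collections import deque


def two_color(adj):
    """Return {node: 0 or 1} if the connected graph `adj` is bipartite, else None."""
    start = next(iter(adj))        # choose any vertex
    mark = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in mark:
                mark[v] = 1 - mark[u]   # the next level gets the opposite mark
                queue.append(v)
            elif mark[v] == mark[u]:
                return None             # two adjacent vertices in the same group
    return mark


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # 4-cycle (even)
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}            # 3-cycle (odd)
```

The 4-cycle receives a valid marking, while the triangle, an odd cycle, is rejected.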
Bipartite Graph
A connected undirected graph is bipartite iff it contains no odd-length cycles.
1. Necessity. Assume the contrary: the graph contains an odd-length cycle. Let’s divide the vertices of this cycle into two groups by the rule that two vertices from the same group must not be adjacent. By this rule the groups alternate along the cycle. Since the length of the cycle is odd, the first and the last vertices fall into the same group, yet they are adjacent. This contradicts the bipartite condition.
2. Sufficiency. Consider the bipartite property testing algorithm together with the construction of a spanning tree T. It is easy to see that any tree is a bipartite graph. Let’s start adding the remaining edges. Denote the first such edge by (v, w). Assume that the testing algorithm put the vertices v and w into the same group. By Lemma 5 there exists a unique path from v to w in T. Since the marks alternate along this path, it has even length, so together with the edge (v, w) it forms an odd-length cycle. This is a contradiction. Hence, all remaining edges connect vertices from different groups.
Local and global characteristics of graph
Path in Graphs
A path is a sequence of nodes in which each node is linked to the next one
A path can intersect itself and pass through the same edge multiple times
E.g.: ACBDCDEG

Path in Graphs
A cycle is a path consisting of distinct edges in which the first and last vertices coincide.
A simple path is a path in which all edges and vertices are distinct, except possibly the first and last vertices.
A simple cycle is a simple path in which the first and last vertices coincide.
Node centrality
Centrality is a function defined on the vertices of a graph that captures some information about the graph structure.
Let’s denote by N(v) the set of vertices adjacent to the vertex v. The simplest example is the degree centrality deg v = |N(v)|.
Let’s denote by G(N(v)) the subgraph induced by the vertices N(v), and let MC(v) be the largest connected component of G(N(v)). The maximum neighborhood component MNC(v) is the number of vertices in MC(v).
Density of maximum neighborhood component: \(DMNC(v) = \frac{|E(MC(v))|}{|V(MC(v))|^{\epsilon}}\), for some \(\epsilon \in [1, 2]\).
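A sketch of how MNC(v) and DMNC(v) could be computed from these definitions, assuming an undirected graph stored as a dictionary of adjacency lists. The function name `mnc_and_dmnc` is hypothetical, and the default `eps=1.5` is an arbitrary value picked from the stated range [1, 2].

```python
def mnc_and_dmnc(adj, v, eps=1.5):
    """Return (MNC(v), DMNC(v)) for the undirected graph `adj`."""
    nbrs = set(adj[v])
    best, seen = set(), set()
    for s in nbrs:
        if s in seen:
            continue
        # Depth-first search restricted to the neighbourhood N(v).
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in nbrs and w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp                 # largest component found so far: MC(v)
    if not best:
        return 0, 0.0                   # isolated vertex: empty neighbourhood
    # Each edge inside MC(v) is seen from both endpoints, hence the division by 2.
    edges_in = sum(1 for u in best for w in adj[u] if w in best) // 2
    return len(best), edges_in / len(best) ** eps


# Hypothetical example: vertex 0 has neighbours 1, 2, 3, 4 and only 1-2 are linked.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
```

For vertex 0 the neighbourhood splits into the components {1, 2}, {3} and {4}, so MC(0) = {1, 2}.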
Global characteristics of a graph
1. The simplest example is the diameter:
2. Density:
3. Global efficiency:
4. Average shortest path length:
5. Average clustering coefficient:
6. Small world coefficient:
where \(G_{rand}\) is a random graph with \(|V(G)|\) vertices and \(|E(G)|\) edges
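A minimal sketch of the first four characteristics for a connected undirected graph. It assumes the usual definitions with n = |V| and m = |E|: density 2m / (n(n-1)), diameter as the largest shortest-path distance, average shortest path length as the mean distance over ordered pairs of distinct vertices, and global efficiency as the mean of inverse distances over the same pairs. All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_characteristics(adj):
    """Assumes `adj` describes a connected undirected graph with at least 2 nodes."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Distances over all ordered pairs (u, v) with u != v.
    pair_dists = [d for u in adj
                  for v, d in bfs_distances(adj, u).items() if v != u]
    pairs = n * (n - 1)
    return {
        "density": 2 * m / pairs,
        "diameter": max(pair_dists),
        "avg_path_length": sum(pair_dists) / pairs,
        "global_efficiency": sum(1 / d for d in pair_dists) / pairs,
    }


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path 0-1-2-3
stats = global_characteristics(path)
```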
Distance in Graphs
Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
In directed graphs, paths need to follow the direction of the arrows. Consequence: distance is not symmetric, i.e. in general \(d(u, v) \ne d(v, u)\)
If the two nodes are not connected, the distance is usually defined as infinite (or zero)
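The asymmetry of distance in directed graphs can be seen with a short breadth-first search. The sketch below follows the convention of returning infinity when no directed path exists; the function `distance` and the example graph are hypothetical.

```python
import math
from collections import deque


def distance(adj, source, target):
    """Number of edges on a shortest directed path, or math.inf if none exists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:              # only follow edges in their direction
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf


# Directed cycle A -> B -> C -> A, plus an edge D -> A.
directed = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
```

Going from A to C takes two steps, while going from C to A takes one; D can reach A, but A cannot reach D.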


Network Diameter
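- The distance \(d(v_i, v_j)\) between two vertices is the number of edges in the shortest path from vi to vj
- Graph diameter is the largest shortest-path distance: \(\max_{i, j} d(v_i, v_j)\)
- Average path length: \(\frac{1}{n(n-1)} \sum_{i \ne j} d(v_i, v_j)\), where n = |V|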

Global CC (Transitivity)
Global clustering coefficient:

Local CC
Local clustering coefficient (per vertex)
How connected are i’s neighbors to each other?

\(C_i = \frac{2 e_i}{k_i (k_i - 1)}\)

where \(e_i\) is the number of edges between the neighbors of node i and \(k_i\) is the degree of node i
Average CC
Average clustering coefficient
How connected are i’s neighbors to each other?
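The three clustering measures can be compared on a small graph. The sketch assumes the common definitions: the local coefficient is 2e / (k(k-1)) for a node with k neighbors and e edges among them (taken as 0 when k < 2), the average coefficient is the mean of the local ones, and the global coefficient (transitivity) is the fraction of connected triples that are closed. Function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbours that are themselves linked."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # convention assumed for degree 0 or 1
    e_i = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * e_i / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = 0    # each triangle is counted once per centre node, i.e. 3 times
    triples = 0
    for i in adj:
        k = len(adj[i])
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return closed / triples if triples else 0.0


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

On this graph the average of the local coefficients (7/12) differs from the transitivity (3/5), which illustrates that the two global summaries are not the same quantity.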

Graph Laplacian
Laplacian Operator
- Laplacian operator in physics is an "average difference between a point and a small sphere around that point"
- In the discrete case (the graph case), it is the difference between a node's value and its neighbors' values
M. Bronstein, Geometric deep learning: going beyond Euclidean data, 2017
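Properties of the Laplacian
- Every row sums to 0: \(L \mathbf{1} = 0\)
- I.e. \(\mathbf{1}\) is an eigen-vector of L with eigen-value = 0
where \(L = D - A\), \(D\) is the diagonal matrix of node degrees and \(A\) is the adjacency matrix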
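A small sketch of the graph Laplacian, assuming the standard combinatorial definition L = D - A (degree matrix minus adjacency matrix) for an undirected graph on nodes 0..n-1. The helper names are hypothetical. Applying L to a vector of node values gives, at each node, the sum of differences between its value and its neighbors' values.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A as a list of lists."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1      # degree contribution (matrix D)
        L[j][j] += 1
        L[i][j] -= 1      # minus the adjacency matrix A
        L[j][i] -= 1
    return L


def apply(L, x):
    """Matrix-vector product L x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
L = laplacian(4, edges)

row_sums = [sum(row) for row in L]
L_ones = apply(L, [1, 1, 1, 1])     # the constant vector is mapped to zero
L_spike = apply(L, [3, 0, 0, 0])    # value 3 on node 0, 0 elsewhere
```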
Normalized Laplacian
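The normalized Laplacian matrix \(\Delta\) is defined as \(D^{-\frac{1}{2}}LD^{-\frac{1}{2}}\), where \(L\) is the Laplacian matrix of \(G\), and \(D = \operatorname{diag}(\deg v_1, \dots, \deg v_n)\) is the diagonal matrix consisting of degrees of \(G\). Then the \((i, j)\)-entry of \(\Delta\) is
\[ \Delta_{ij} = \begin{cases} 1, & i = j \text{ and } d_i \ne 0, \\ -\frac{1}{\sqrt{d_i d_j}}, & i \ne j \text{ and } v_i \text{ is adjacent to } v_j, \\ 0, & \text{otherwise,} \end{cases} \]
in which we simply write \(d_i = \deg v_i\) for convenience.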
Properties of normalized Laplacian matrix
1. By viewing \(\Delta\) as a linear operator \(\Delta : \mathbb{R}^n \to \mathbb{R}^n\), \(\Delta\) is self-adjoint with respect to the scalar product \((\cdot, \cdot)\), i.e.,
\((x,\Delta y)\) = \((\Delta x, y)\)
for all \(x, y \in \mathbb{R}^n\). Here the scalar product is defined for pairs of vectors; formally, for any \(x, y \in \mathbb{R}^n\), \((x, y) := \sum_{i=1}^{n} x_i y_i\).
2. \(\Delta\) is non-negative:
\((\Delta x, x) \ge 0\), for all \(x\).
3. For a connected graph, \(\Delta x = 0\) precisely when \(x\) is a vector collinear to \((\sqrt{d_1}, \dots, \sqrt{d_n})\).
4. The trace of \(\Delta\) is \(n\).
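These properties can be checked numerically on a small graph. The sketch assumes a simple undirected graph without isolated vertices, so that every diagonal entry is 1 and each off-diagonal entry for adjacent vertices is -1 / sqrt(d_i d_j). The helper `normalized_laplacian` and the test vector are hypothetical.

```python
import math


def normalized_laplacian(n, edges):
    """Return (matrix, degrees) for a simple graph with no isolated vertices."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        N[i][i] = 1.0                         # diagonal entries (d_i != 0 assumed)
    for i, j in edges:
        w = -1.0 / math.sqrt(deg[i] * deg[j])
        N[i][j] = w
        N[j][i] = w
    return N, deg


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
N, deg = normalized_laplacian(4, edges)

trace = sum(N[i][i] for i in range(4))                       # property 4
sqrt_d = [math.sqrt(d) for d in deg]
N_sqrt_d = [sum(N[i][j] * sqrt_d[j] for j in range(4))       # property 3
            for i in range(4)]
x = [1.0, -2.0, 0.5, 3.0]
quad = sum(x[i] * N[i][j] * x[j] for i in range(4) for j in range(4))  # property 2
```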
Graph Partitioning


Positive semi-definite
Node clustering
- Let x be a vector with \( x_i = +1 \) if node i is part of cluster A and \( x_i = -1 \) if it is part of cluster B
- \( |x_i - x_j| \) is 0 if nodes i and j are in the same cluster and 2 if they're in different clusters
If we find the \( x \) that minimizes \( x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2 \), we will find cluster assignments that minimize cross-cluster edges
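A short check of this idea, assuming the objective is the Laplacian quadratic form written as a sum of (x_i - x_j)^2 over edges, with L = D - A. For a +1/-1 assignment each cross-cluster edge contributes 4 and each within-cluster edge contributes 0, so the objective is 4 times the number of cut edges. The function names and the example graph are hypothetical.

```python
def quadratic_form(edges, x):
    """x^T L x written as a sum over edges of (x_i - x_j)^2."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


def cut_size(edges, x):
    """Number of edges whose endpoints carry different labels."""
    return sum(1 for i, j in edges if x[i] != x[j])


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [1, 1, 1, -1, -1, -1]     # splits the graph at the bridge
bad = [1, -1, 1, -1, 1, -1]      # cuts through both triangles
```

The assignment that separates the two triangles cuts one edge, while the alternating assignment cuts five.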
Node clustering
- If x is real-valued rather than +/- 1, we can more easily optimize and get a 1-dimensional embedding of nodes where we are minimizing the distance between connected nodes
- Need additional constraints because constant vector (c, c, …, c) is eigen-vector with eigen-value = 0, but that is trivial solution
Node clustering
- Additional Constraints:
  - Center of mass about the origin: \( \sum_i x_i = 0 \)
  - \( x^T x = 1 \), so that not all points are mapped to 0
- Rayleigh-Ritz theorem says the solution is equal to the eigen-vector with the smallest eigen-value. (Makes intuitive sense)
  - Smallest doesn’t fit our constraints
  - Second smallest does (orthogonal to 1 and normalized)
  - “Fiedler Vector”
Node clustering
- Turn into cluster assignment by taking sign of value in Fiedler vector
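The whole procedure can be sketched without any linear-algebra library. In practice an eigen-solver would be used; as a stand-in, the code below runs power iteration on cI - L (with c twice the maximum degree, an upper bound on the largest eigen-value of L = D - A) while repeatedly projecting out the constant vector, which converges to the eigen-vector of the second smallest eigen-value on a connected graph. The function name, starting vector, and iteration count are illustrative assumptions.

```python
import math


def fiedler_vector(n, edges, iterations=500):
    """Approximate Fiedler vector of a connected graph on nodes 0..n-1."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
        adj[i].append(j)
        adj[j].append(i)
    c = 2 * max(deg)                      # shift so that c*I - L has no negative eigen-values
    x = [float(i) for i in range(n)]      # arbitrary non-constant starting vector
    for _ in range(iterations):
        # Multiply by (c*I - L), using (L x)_i = deg_i * x_i - sum of neighbour values.
        x = [c * x[i] - deg[i] * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]       # constraint: sum of x_i = 0
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]       # constraint: x^T x = 1
    return x


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
x = fiedler_vector(6, edges)
labels = [1 if xi > 0 else -1 for xi in x]               # cluster = sign of the entry
fiedler_value = sum((x[i] - x[j]) ** 2 for i, j in edges)  # x^T L x for unit-norm x
```

Taking the sign of each entry separates the two triangles, which is the partition that cuts only the bridge.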
Karate club network
Fiedler Vector
Laplacian Operator Example

Javer, An open-source platform for analyzing and sharing worm-behavior data, Nature, 2018

OpenWorm project
OpenWorm project visualization
The first comprehensive computational model of Caenorhabditis elegans (C. elegans), a microscopic roundworm. With only a thousand cells, it solves basic problems such as feeding, mate-finding and predator avoidance.
By karpovilia