Luisa Cutillo, l.cutillo@leeds.ac.uk, University of Leeds
in collaboration with
Andrew Bailey, and David Westhead, UoL
Precision matrix
p = number of features
Set of vertices V = {1, …, p} (one vertex per feature)
Set of edges E: the non-zero off-diagonal entries of the precision matrix
Markov Property:
Nodes 1 and 5 are conditionally independent given node 2
Edges correspond to direct interactions between nodes (non-zero partial correlations)
Conditional Independence = SPARSITY!
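Here is a small numerical illustration of the points above. It uses a hypothetical 3-node chain 1 - 2 - 3, which is not the graph in the figure, and the matrix values are made up for the example. The zero entry of the precision matrix makes nodes 1 and 3 conditionally independent given node 2, even though their marginal covariance is non-zero. Partial correlations are computed with the standard formula ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj).

```python
import numpy as np

# Hypothetical precision matrix of a 3-node chain 1 - 2 - 3 (values made up).
# The zero in theta[0, 2] encodes: nodes 1 and 3 are conditionally
# independent given node 2.
theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

sigma = np.linalg.inv(theta)  # covariance matrix of the same Gaussian

# Partial correlations: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
d = np.sqrt(np.diag(theta))
partial_corr = -theta / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(sigma[0, 2])         # marginal covariance of nodes 1 and 3: non-zero
print(partial_corr[0, 2])  # partial correlation of nodes 1 and 3: zero
print(partial_corr[0, 1])  # partial correlation of nodes 1 and 2: non-zero
```

Reading the edges off the zeros of `theta`, rather than off `sigma`, is what makes sparsity in the precision matrix equivalent to missing edges in the graph.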
A hypothetical example of a GGM on psychological variables (Epskamp et al., 2018).
(Figure: GGM with nodes including Fatigue, Insomnia and Concentration)
The precision matrix encodes the conditional independence structure (Dempster, 1972)
It expresses the dependency graph
Example: Random Walk
What is the precision matrix?
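Consider a Gaussian random walk x_t = x_{t-1} + e_t with x_0 = 0 and independent e_t ~ N(0, 1). Unit noise variance is assumed here for simplicity. Under this model Cov(x_i, x_j) = min(i, j). A quick NumPy check answers the question: the precision matrix is tridiagonal, because each x_t interacts directly only with its neighbours x_{t-1} and x_{t+1}.

```python
import numpy as np

n = 6  # number of time steps (arbitrary choice for the example)
idx = np.arange(1, n + 1)

# Covariance of the random walk: Cov(x_i, x_j) = min(i, j)
sigma = np.minimum.outer(idx, idx).astype(float)

# Precision matrix = inverse covariance
theta = np.linalg.inv(sigma)
print(np.round(theta, 6))
```

The printed matrix has 2 on the diagonal (1 in the last entry) and -1 on the first off-diagonals. All other entries are 0, so the graph is a chain with maximum degree 2, however long the walk is.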
Sparsity assumption: a maximum node degree d ≪ p is reasonable in many contexts!
Example: Gene interaction networks
Fig. 6. A network of strong cross-cancer interactions. Cancer Genetic Network Inference Using Gaussian Graphical Models. (2019) Zhao and Duan.
However, the "Bet on Sparsity" principle (Tibshirani 2001, "In praise of sparsity and convexity"):
(...no procedure does well in dense problems!)
Graphical Lasso (Friedman, Hastie, Tibshirani 2008):
imposes an ℓ1 penalty for the estimation of a sparse precision matrix
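For reference, the graphical lasso estimate is usually written as the penalized maximum-likelihood problem below. The notation is introduced here: S is the sample covariance matrix, Θ the precision matrix, λ ≥ 0 the penalty weight, and ‖Θ‖₁ the sum of absolute values of the entries of Θ. Larger λ drives more entries of Θ to exactly zero, which removes more edges from the graph.

```latex
\hat{\Theta} \;=\; \operatorname*{arg\,max}_{\Theta \succ 0}
  \Big\{ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \,\lVert \Theta \rVert_1 \Big\},
\qquad
\lVert \Theta \rVert_1 = \sum_{i,j} \lvert \Theta_{ij} \rvert
```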
(Figure: Data matrix with Samples and Features as its two axes)
Goal: estimate a sparse graph that interrelates both features and data points.
Video data example (Kalaitzis et al., 2013): both the frames (images over time) and the image variables (pixels) are correlated.
Single-cell data: extract the conditional independence structure among genes and among cells, inferring a network at both the gene level and the cell level.
(Figure: count matrix with Cells and Genes as its two axes)
Preserves the matrix structure by using a Kronecker sum (KS) of the precision matrices
KS of adjacency matrices => adjacency matrix of the Cartesian product of the graphs
(e.g. Frames × Pixels)
(Kalaitzis et al., 2013)
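This is a toy check of the KS / Cartesian-product correspondence. Two path graphs stand in for the frame graph (3 frames in time) and the pixel graph (4 pixels in a row), which is an assumption made only for illustration. The Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B of their adjacency matrices is the adjacency matrix of the 3 × 4 grid. The helper names `kron_sum` and `path_adjacency` are hypothetical and are not taken from the Scalable Bigraphical Lasso code.

```python
import numpy as np

def kron_sum(a, b):
    """Kronecker sum A ⊕ B = A ⊗ I_b + I_a ⊗ B (I_k = k x k identity)."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)

def path_adjacency(k):
    """Adjacency matrix of the path graph on k nodes."""
    return np.eye(k, k=1) + np.eye(k, k=-1)

frames = path_adjacency(3)  # toy "frames" graph: 3 consecutive frames
pixels = path_adjacency(4)  # toy "pixels" graph: 4 pixels in a row

# Node (f, q) gets index f * 4 + q; it is joined to (f±1, q) and (f, q±1),
# i.e. the Cartesian product of the two paths is the 3 x 4 grid graph.
grid = kron_sum(frames, pixels)

print(grid.shape)            # (12, 12)
print(int(grid.sum()) // 2)  # 17 edges: 3 rows x 3 + 4 columns x 2
```

The same construction, applied to two precision matrices instead of two adjacency matrices, gives one np × np matrix that couples the row graph and the column graph.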
Limitations:
We exploit eigenvalue decompositions of the Cartesian product graph to present a more efficient version of the algorithm, which reduces memory requirements from O(n^2 p^2) to O(n^2 + p^2).
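The following minimal sketch shows why eigendecompositions help. Small random positive-definite matrices stand in for the two precision matrices: `psi` for the n samples and `theta` for the p features. Both names and sizes are assumptions. The eigenvalues of psi ⊕ theta are all sums λ_i + μ_j of the factors' eigenvalues, and its eigenvectors are Kronecker products of theirs. Quantities such as the log-determinant can therefore be computed from the two small eigendecompositions. This illustrates the identity only and is not the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(k):
    """Random symmetric positive-definite k x k matrix (toy precision)."""
    m = rng.standard_normal((k, k))
    return m @ m.T + k * np.eye(k)

n, p = 4, 5
psi = random_spd(n)    # hypothetical n x n precision (e.g. cells / frames)
theta = random_spd(p)  # hypothetical p x p precision (e.g. genes / pixels)

# Eigendecompose only the two small factors: O(n^2 + p^2) memory.
lam_psi, u_psi = np.linalg.eigh(psi)
lam_theta, u_theta = np.linalg.eigh(theta)

# Eigenvalues of psi ⊕ theta: every pairwise sum lam_psi[i] + lam_theta[j]
# (eigenvector kron(u_psi[:, i], u_theta[:, j])). This is n*p numbers,
# which never exceeds (n^2 + p^2) / 2.
ks_eigvals = np.add.outer(lam_psi, lam_theta).ravel()
logdet = np.sum(np.log(ks_eigvals))  # log det(psi ⊕ theta)

# Dense check, only feasible for tiny n, p: O(n^2 p^2) memory.
dense = np.kron(psi, np.eye(p)) + np.kron(np.eye(n), theta)
print(np.allclose(np.sort(ks_eigvals), np.linalg.eigvalsh(dense)))
print(np.isclose(logdet, np.linalg.slogdet(dense)[1]))
```

The dense np × np matrix is built here only to verify the result. Dropping it is what keeps memory at the level of the two factors.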
mESC dataset: 182 mouse embryonic stem cells (mESCs) with known cell-cycle phase. We selected a subset of 167 genes involved in mitotic nuclear division (M phase), as annotated in the DAVID database (Buettner et al., 2015).
https://github.com/luisacutillo78/Scalable_Bigraphical_Lasso.git
The triumphs and limitations of computational methods for scRNA-seq (Kharchenko, 2021)
Beating Moore’s law.
- number of cells measured by landmark scRNA-seq datasets over the years (red),
- increase in the CPU transistor counts (black).
- estimated number of cells in a human body (green dashed line).
We may be interested in graph representations of the people, species, and metabolites.
GmGM addresses this problem!
Tensors (i.e. modalities) that share an axis are drawn independently from Kronecker-sum normal distributions, parameterized by the same precision matrix for the shared axis
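Here is a toy sketch of that generative model. It assumes each modality is a matrix (a 2-way tensor) with zero mean. Two modalities measured on the same 5 people share the "people" precision matrix but have their own feature precision matrices. All names and sizes are hypothetical. Each sample y satisfies vec(y) ~ N(0, (psi ⊕ theta)^{-1}), and it is generated through the eigendecompositions of the two factors, so the np × np covariance is never formed.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(k):
    """Random symmetric positive-definite k x k matrix (toy precision)."""
    m = rng.standard_normal((k, k))
    return m @ m.T + k * np.eye(k)

def ks_transform(psi, theta, z):
    """Map an n x p standard-normal matrix z to y with
    vec(y) ~ N(0, (psi ⊕ theta)^{-1}), using row-major vec (numpy ravel)."""
    lam, u = np.linalg.eigh(psi)
    mu, v = np.linalg.eigh(theta)
    scale = 1.0 / np.sqrt(np.add.outer(lam, mu))  # eigenvalues of psi ⊕ theta
    return u @ (scale * z) @ v.T

def sample_ks_normal(psi, theta):
    """Draw one n x p sample from the Kronecker-sum normal distribution."""
    n, p = psi.shape[0], theta.shape[0]
    return ks_transform(psi, theta, rng.standard_normal((n, p)))

# Hypothetical example: two modalities measured on the same 5 people.
psi_people = random_spd(5)         # shared precision for the "people" axis
theta_species = random_spd(8)      # precision for the species axis
theta_metabolites = random_spd(3)  # precision for the metabolite axis

x_species = sample_ks_normal(psi_people, theta_species)          # 5 x 8
x_metabolites = sample_ks_normal(psi_people, theta_metabolites)  # 5 x 3
```

The two draws are independent, but both depend on `psi_people`. Information about the people graph can therefore be pooled across modalities.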
If we remove the regularization, we need only 1 eigendecomposition!
In place of regularization, use thresholding
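The slide does not say which thresholding rule GmGM uses. The following is therefore only a sketch of two common options applied to an estimated precision matrix. The first zeroes out off-diagonal entries below a hypothetical cutoff `tau`. The second keeps each node's `k` strongest connections. Function names, parameters and the example matrix are all hypothetical.

```python
import numpy as np

def threshold_precision(theta, tau):
    """Hypothetical rule: zero off-diagonal entries with |theta_ij| < tau."""
    out = np.where(np.abs(theta) >= tau, theta, 0.0)
    np.fill_diagonal(out, np.diag(theta))  # always keep the diagonal
    return out

def keep_top_k_per_row(theta, k):
    """Hypothetical rule: keep each node's k strongest off-diagonal
    connections, then symmetrize (keep an edge if either endpoint picks it)."""
    p = theta.shape[0]
    strength = np.abs(theta).astype(float)
    np.fill_diagonal(strength, -np.inf)           # never select the diagonal
    top = np.argsort(-strength, axis=1)[:, :k]    # k largest per row
    mask = np.zeros((p, p), dtype=bool)
    mask[np.arange(p)[:, None], top] = True
    mask |= mask.T
    np.fill_diagonal(mask, True)
    return np.where(mask, theta, 0.0)

# Made-up "estimated" precision matrix for 3 nodes.
theta_hat = np.array([[ 3.0, -1.2,  0.1 ],
                      [-1.2,  3.0, -0.05],
                      [ 0.1, -0.05, 3.0 ]])

sparse_tau = threshold_precision(theta_hat, tau=0.5)  # keeps edge 1-2 only
sparse_k = keep_top_k_per_row(theta_hat, k=1)         # keeps edges 1-2 and 1-3
```

Since no penalty is fitted, changing `tau` or `k` only requires re-thresholding the same estimate, not re-running the estimation.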
Work in progress, unpublished
(software available: GmGM 0.4.0)!
Dataset:
157,689 cells
25,184 genes
GO terms with long names have been replaced with their GO ID; these were all major-histocompatibility-complex-associated GO terms. Immune-related GO terms have been colored blue. These were identified using the Python API of the GProfiler (Kolberg et al., 2023) functional enrichment tool.
Memory Use (assuming double-precision)
Runtimes
Too much memory to fit into the 8 GB of RAM on Bailey's computer, even when highly sparse
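Here is a back-of-the-envelope calculation of what dense matrices would need for the 157,689-cell × 25,184-gene dataset above. It assumes 8 bytes per double and 1 GB = 10^9 bytes. It ignores sparse formats and any intermediate workspace, so it does not reproduce the measured memory use in the plots.

```python
# Dense float64 storage cost, using the dataset sizes quoted above.
BYTES_PER_DOUBLE = 8
n_cells, n_genes = 157_689, 25_184

def gb(n_entries):
    """Size in gigabytes (10**9 bytes) of n_entries doubles."""
    return n_entries * BYTES_PER_DOUBLE / 1e9

cells_precision = gb(n_cells ** 2)              # n x n cell-level precision
genes_precision = gb(n_genes ** 2)              # p x p gene-level precision
data_matrix = gb(n_cells * n_genes)             # dense n x p data matrix
joint_precision = gb((n_cells * n_genes) ** 2)  # np x np KS matrix, if formed

for name, size in [("cell precision (n x n)", cells_precision),
                   ("gene precision (p x p)", genes_precision),
                   ("data matrix (n x p)", data_matrix),
                   ("joint KS precision (np x np)", joint_precision)]:
    print(f"{name}: {size:,.1f} GB")
```

The gene-level precision matrix alone takes about 5 GB. A dense cell-level one takes about 199 GB, far beyond an 8 GB machine.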
Memory use:
Runtime: