Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models
Subhabrata Majumdar & George Michailidis
Presenter: Aiying Zhang
April 25th, 2018
Content
- Introduction
- Statistical Model
- Algorithm
- Hypothesis testing
- Performance evaluation
- Discussion
Goal:
Build a framework based on Gaussian graphical models (GGMs) for horizontal and vertical integration of information across multi-omics data.
Horizontal: multiple conditions/subtypes
Vertical: different omics
Omics: genomic, proteomic, metabolomic
Contribution:
Borrow information across multiple similar multi-layer networks to simultaneously perform inference on all model parameters.
Introduction
- Joint Multiple Multi-Layer Estimation (JMMLE)
- Hypothesis testing in multi-layer models
- Dataset D, K groups, M layers
- Each layer m has p_m variables (nodes)
- Model: for each group k=1,...,K
- Parameters of interest:
- the precision matrices
- the coefficient matrices
JMMLE
- Special case -- a two-layer model: M=2
- Goal: estimate from
- Focus: joint estimation of
- Notes:
- For M > 2, the within-layer undirected edges of any m-th layer (m > 1) and the between-layer directed edges from the (m-1)-th layer into it can be estimated by the same method.
- Joint estimation of can use other existing methods.
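For reference, a two-layer model of this kind is commonly written as below. The symbols (X^k, Y^k, B^k, E^k, \Omega_x^k, \Omega_y^k, p, q) are assumptions introduced here for illustration, since the slide's own formulas did not survive, and they may differ from the presenter's notation.

```latex
% Assumed notation: group k = 1, ..., K; layer 1 has p nodes, layer 2 has q nodes.
\begin{aligned}
  X^k &\sim \mathcal{N}_p\bigl(0, \Sigma_x^k\bigr), &
  \Omega_x^k &= \bigl(\Sigma_x^k\bigr)^{-1}, \\
  Y^k &= X^k B^k + E^k, &
  E^k &\sim \mathcal{N}_q\bigl(0, \Sigma_y^k\bigr), \quad
  \Omega_y^k = \bigl(\Sigma_y^k\bigr)^{-1}.
\end{aligned}
```

Under this reading, the precision matrices encode the undirected within-layer edges and the coefficient matrices B^k encode the directed edges from layer 1 to layer 2.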
Algorithm
Estimation of
- Joint Structural Estimation Method (Ma and Michailidis, 2016)
- Use penalized nodewise regressions to get the graph structure
- Obtain neighborhood matrix
- Fit a graphical lasso model to obtain the sparse estimates of the precision matrix
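The nodewise-regression step can be illustrated with a deliberately simplified sketch. It handles a single group with a plain lasso penalty (Meinshausen-Bühlmann style neighborhood selection), not the group-penalized joint method of Ma and Michailidis (2016). The function names, the AND rule for symmetrizing, the toy data, and the value lam=0.2 are all assumptions made for this example.

```python
import random

def soft_threshold(value, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def neighborhood_matrix(data, lam):
    """Regress each node on all others; keep edges selected in both directions."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    selected = [[False] * p for _ in range(p)]
    for j in range(p):
        others = [c for c in range(p) if c != j]
        X = [[row[c] for c in others] for row in centered]
        y = [row[j] for row in centered]
        for c, b in zip(others, lasso_cd(X, y, lam)):
            selected[j][c] = b != 0.0
    # AND rule: an edge needs support from both nodewise regressions.
    return [[selected[a][b] and selected[b][a] for b in range(p)] for a in range(p)]

# Toy data (hypothetical): node 2 depends on node 0, node 1 is independent.
rng = random.Random(0)
data = []
for _ in range(400):
    x0 = rng.gauss(0, 1)
    x1 = rng.gauss(0, 1)
    x2 = 0.8 * x0 + 0.3 * rng.gauss(0, 1)
    data.append([x0, x1, x2])

adjacency = neighborhood_matrix(data, lam=0.2)
```

In the method described above, the resulting neighborhood matrix would then constrain the support of the precision matrix in a subsequent graphical lasso fit; that refitting step is not shown here.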
Joint estimation of
Alternating block algorithm:
Tuning parameter selection:
- BIC (Bayesian Information Criterion) for
- HBIC (High-dimensional BIC) for
Hypothesis testing
Debiased estimator and asymptotic normality
- Proposed by Zhang and Zhang (2014)
- A debiasing procedure for lasso estimates of individual coefficients in high-dimensional linear regression
- Method:
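As a reminder of the general idea, the debiased (low-dimensional projection) estimator of Zhang and Zhang (2014) for a single coefficient in the linear model y = X\beta + \varepsilon is usually written as follows. The notation (\hat\beta for the lasso fit, x_j for the j-th column of X, z_j for a score vector, \hat\gamma_j for the nodewise lasso coefficients) is assumed here and may not match the slide's.

```latex
% z_j: residual from a lasso regression of column x_j on the remaining columns X_{-j}
z_j = x_j - X_{-j}\,\hat{\gamma}_j,
\qquad
\hat{\beta}_j^{\mathrm{deb}}
  = \hat{\beta}_j + \frac{z_j^{\top}\bigl(y - X\hat{\beta}\bigr)}{z_j^{\top} x_j}.
```

The correction term removes most of the shrinkage bias of the lasso in the j-th coordinate, which is what makes a normal approximation for the corrected estimate plausible.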
Debiased estimator
- Define debiased estimates for individual rows of
- Under mild conditions, the centered and scaled debiased estimates are asymptotically normal.
Pairwise testing
- Global differences between two groups
Entrywise differences
- Test statistics:
- FDR control: Benjamini-Hochberg (BH) procedure
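The Benjamini-Hochberg step can be sketched generically as follows. The helper names and the example p-values are hypothetical; in the setting above, the p-values would come from the entrywise test statistics, which are assumed here to be approximately standard normal under the null.

```python
import math

def two_sided_pvalue(z):
    """Two-sided p-value for a statistic assumed N(0, 1) under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= k * alpha / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values, deliberately unsorted.
example_pvalues = [0.041, 0.001, 0.205, 0.008, 0.039,
                   0.060, 0.074, 0.042, 0.212, 0.216]
example_rejections = benjamini_hochberg(example_pvalues, alpha=0.05)
```

Note that BH rejects every hypothesis up to the largest qualifying rank, even if some smaller-ranked p-values individually miss their own thresholds.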
Performance evaluation
- K=5, M=2
- Within-layer: non-zero probability
- Between-layer: non-zero probability
- Non-zero elements independently from the uniform distribution
- 50 replications in each setting
MCC: Matthews Correlation Coefficient
RF: Relative error in Frobenius norm
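Both metrics are simple to compute. The sketch below assumes that MCC is evaluated on the support (zero versus non-zero pattern) of the estimate against the truth, and that RF is the Frobenius norm of the estimation error divided by the Frobenius norm of the true matrix. Function names and example matrices are hypothetical.

```python
import math

def mcc(true_matrix, est_matrix):
    """Matthews correlation coefficient between two non-zero patterns."""
    tp = tn = fp = fn = 0
    for t_row, e_row in zip(true_matrix, est_matrix):
        for t, e in zip(t_row, e_row):
            t_on, e_on = t != 0, e != 0
            if t_on and e_on:
                tp += 1
            elif not t_on and not e_on:
                tn += 1
            elif e_on:
                fp += 1
            else:
                fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def relative_frobenius_error(true_matrix, est_matrix):
    """||estimate - truth||_F / ||truth||_F."""
    num = sum((e - t) ** 2
              for t_row, e_row in zip(true_matrix, est_matrix)
              for t, e in zip(t_row, e_row))
    den = sum(t ** 2 for t_row in true_matrix for t in t_row)
    return math.sqrt(num) / math.sqrt(den)

# Hypothetical truth and estimate: one false positive, small numeric errors.
B_true = [[1.0, 0.0], [0.0, 2.0]]
B_hat = [[0.9, 0.0], [0.1, 2.0]]
example_mcc = mcc(B_true, B_hat)
example_rf = relative_frobenius_error(B_true, B_hat)
```

MCC ranges from -1 to 1 and stays informative when the true support is very sparse, which is why it is often preferred over plain accuracy for structure recovery.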
Simulation 2: Testing
- K=2
- Generate the by randomly assigning each element to be non-zero with probability , then drawing values of those elements from Unif{ }.
- Generate a matrix of differences D, where takes values -1, 1, 0 w.p. 0.1, 0.1, 0.8, respectively.
- Finally, set
Type-1 error set , FDR controlled at
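The data-generating steps for the testing simulation can be sketched as below. The non-zero probability and the uniform range for the first matrix are not recoverable from the slide, so `nonzero_prob`, `low`, and `high` are placeholder arguments and the values passed in the example are purely hypothetical. The sketch also stops at generating D, since the way D is combined with the first matrix is not shown above.

```python
import random

def sparse_matrix(rows, cols, nonzero_prob, low, high, rng):
    """Each entry is non-zero with probability nonzero_prob, drawn from Unif(low, high)."""
    return [[rng.uniform(low, high) if rng.random() < nonzero_prob else 0.0
             for _ in range(cols)]
            for _ in range(rows)]

def difference_matrix(rows, cols, rng):
    """Entries take values -1, 1, 0 with probabilities 0.1, 0.1, 0.8."""
    out = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            u = rng.random()
            if u < 0.1:
                row.append(-1)
            elif u < 0.2:
                row.append(1)
            else:
                row.append(0)
        out.append(row)
    return out

rng = random.Random(1)
# Placeholder settings, NOT the values used in the presented study.
B1 = sparse_matrix(50, 50, nonzero_prob=0.1, low=0.5, high=1.0, rng=rng)
D = difference_matrix(50, 50, rng)
zero_fraction = sum(v == 0 for row in D for v in row) / (50 * 50)
```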
Discussion
Conclusions:
- This work introduces an integrative framework for knowledge discovery in multiple multi-layer Gaussian Graphical Models.
- Exploit a priori known structural similarities across parameters of the multiple models
- Perform global and simultaneous testing for pairwise differences
Improvements:
- Beyond pairwise testing, an overall test across multiple groups is needed
- Non-Gaussian data and graphical models with non-linear interactions