nl-causal

Inference of Nonlinear Causal Effects with Application to TWAS with GWAS Summary Data

(Joint work with C. Li, H. Xue, X. Shen and W. Pan)

Ben Dai (CUHK)

Causal diagram for IV

Goal. Infer the causal effect from exposure to outcome

Issues. Simple regression?

yields biased estimator of β (unobserved confounders)
The promise of instrumental variables (IVs):
- unbiased estimation of the causal effect is possible without explicitly enumerating all confounders.

Source: Howell et al. (2018)

Random allocation of alleles suggests that SNPs can serve as IVs for gene testing

TWAS data types

Controlled access to individual-level data, e.g., the GTEx dataset
The sample size is much smaller than that of GWAS

TWAS accepts various forms of input data types:

Individual-level gene expression data + GWAS

SNPs -> Gene expression

SNPs -> Outcome

GWAS boasts a large sample size: ukb-b (~400K)

Recall: 2SLS (with invalid IVs)

$ x = \mathbf{z}^T \mathbf{\theta} + w, \qquad y = \beta x + \mathbf{z}^T \mathbf{\alpha} + \varepsilon. $ (1)

$\beta\in\mathbb{R}$, $\pmb\alpha\in\mathbb{R}^p$, $\pmb\theta\in\mathbb{R}^p$ are unknown parameters
$(w, \varepsilon)$ are correlated (confounder), and $(w, \varepsilon) \perp \mathbf{z}$ (IVs)
$\pmb\alpha\neq \mathbf{0}$ indicates the violation of IV assumptions

Goal: estimation and statistical inference on $\beta$

application: potential causal genes for AD
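
To make the bias concrete, here is a minimal simulation of model (1) with valid IVs ($\pmb\alpha = \mathbf{0}$). It is only a sketch: the sample size, the value of $\pmb\theta$, and the confounder `u` shared by $w$ and $\varepsilon$ are illustrative assumptions, not settings from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50_000, 5
beta = 0.5                                   # true causal effect (assumed value)
theta = np.array([1.0, 0.5, -0.5, 0.8, 0.3])  # assumed IV strengths

Z = rng.normal(size=(n, p))                  # instruments, independent of (w, eps)
u = rng.normal(size=n)                       # unobserved confounder (hypothetical)
w = u + rng.normal(size=n)                   # Stage 1 error
eps = u + rng.normal(size=n)                 # Stage 2 error, correlated with w via u

x = Z @ theta + w
y = beta * x + eps                           # alpha = 0: all IVs are valid

# Simple regression of y on x: biased because x and eps share u.
beta_ols = (x @ y) / (x @ x)

# IV / 2SLS: regress x on Z, then regress y on the imputed exposure.
theta_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = Z @ theta_hat
beta_iv = (x_hat @ y) / (x_hat @ x_hat)

print(f"OLS estimate: {beta_ols:.3f}, IV estimate: {beta_iv:.3f}, truth: {beta}")
```

In this setup the simple regression slope is pulled away from $\beta$ by the shared confounder, while the IV estimate stays close to it.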

Recall: 2SLS

$ x = \mathbf{z}^T \mathbf{\theta} + w, \qquad y = \beta x + \mathbf{z}^T \mathbf{\alpha} + \varepsilon. $ (1)

2SLS solves for $ \pmb{\theta} $ and $(\beta, \pmb{\alpha})$ separately, based on two independent datasets.

$ D_1 = (\mathbf{Z}_1, \mathbf{x}_1) $ with $n_1$; and $D_2 = (\mathbf{Z}^T_2 \mathbf{Z}_2, \mathbf{Z}^T_2 \mathbf{y}_2)$ with $n_2$

$\hat{\pmb \theta} = (\mathbf{Z}^T_1 \mathbf{Z}_1)^{-1} \mathbf{Z}_1^T\mathbf{x}_1$, and impute $ \hat{\mathbf{x}} = \mathbf{z}^T \hat{\pmb{\theta}}$

Plugging Stage 1 into Stage 2, we obtain

$ y = \mathbf{z}^T \mathbf{\theta}\beta + \mathbf{z}^T\mathbf{\alpha} + e, \quad e = w\beta + \varepsilon,\ E(e) = 0, \ E (e^2) = \sigma_e^2.$

(now, $ \mathbf{z} $ is uncorrelated with $e$)

2SLS

Obs

$ \min_{\beta, \pmb{\alpha}} \ (\hat{\pmb{\theta}}\beta + \pmb{\alpha})^T \mathbf{Z}_2^T \mathbf{Z}_2 (\hat{\pmb{\theta}}\beta + \pmb{\alpha}) - 2\mathbf{y}_2^T\mathbf{Z}_2 (\hat{\pmb{\theta}}\beta + \pmb{\alpha}), \quad \text{s.t. } \|\pmb{\alpha}\|_0 \leq K $

The $ \|\cdot\|_0 $ constraint can be replaced by a SCAD or MCP penalty
2SLS only requires summary statistics:
- $(\mathbf{Z}^T_1 \mathbf{Z}_1, \mathbf{Z}_1^T \mathbf{x}_1)$ and $(\mathbf{Z}^T_2 \mathbf{Z}_2, \mathbf{Z}^T_2 \mathbf{y}_2)$
Identifiability conditions: majority / plurality rules

Haavelmo (1943), Theil (1953), Kang et al. (2016a, 2016b) and Guo et al. (2018)

Ref
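
The summary-statistics form of 2SLS can be sketched in a few lines of NumPy. This is an illustration under a simplifying assumption: the support $A$ of $\pmb\alpha$ (the invalid IVs) is treated as known, whereas the talk selects it through the $\|\cdot\|_0$ constraint (or SCAD/MCP). The function name `two_sls_summary` and all simulation settings are hypothetical.

```python
import numpy as np

def two_sls_summary(ZtZ1, Ztx1, ZtZ2, Zty2, support=()):
    """2SLS from summary statistics, for an assumed-known support A of alpha."""
    p = ZtZ1.shape[0]
    A = np.asarray(support, dtype=int)
    theta_hat = np.linalg.solve(ZtZ1, Ztx1)            # Stage 1: OLS of x on z
    # Stage 2 coefficient on z is theta_hat * beta + alpha = B @ (beta, alpha_A).
    B = np.column_stack([theta_hat, np.eye(p)[:, A]])
    # Normal equations of the quadratic objective in (beta, alpha_A).
    coef = np.linalg.solve(B.T @ ZtZ2 @ B, B.T @ Zty2)
    alpha_hat = np.zeros(p)
    alpha_hat[A] = coef[1:]
    return theta_hat, coef[0], alpha_hat

# Toy data from model (1); all numbers below are assumptions for illustration.
rng = np.random.default_rng(0)
p, n1, n2 = 5, 2_000, 20_000
theta = np.array([1.0, 0.5, -0.5, 0.8, 0.3])
alpha = np.array([0.3, 0.0, 0.0, 0.0, 0.0])            # first IV is invalid
beta = 0.5

def simulate(n):
    Z = rng.normal(size=(n, p))
    u = rng.normal(size=n)                             # shared confounder
    w = u + rng.normal(size=n)
    eps = u + rng.normal(size=n)
    x = Z @ theta + w
    y = beta * x + Z @ alpha + eps
    return Z, x, y

Z1, x1, _ = simulate(n1)                               # Stage 1 sample (x observed)
Z2, _, y2 = simulate(n2)                               # Stage 2 sample (y observed)

theta_hat, beta_hat, alpha_hat = two_sls_summary(
    Z1.T @ Z1, Z1.T @ x1, Z2.T @ Z2, Z2.T @ y2, support=[0]
)
print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {np.round(alpha_hat, 3)}")
```

Only the four cross-product matrices enter the estimator, which is why individual-level outcome data are not needed in Stage 2.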

Nonlinear effect?

Common Lab Tests Normal Ranges. (Source: Healthline)

U-shaped (nonlinear) causal effect
Other examples:
- body weight / cholesterol levels -> longevity
- alcohol consumption -> CAD
- exercise -> immune responsive disease resistance

Component	Normal range
White blood cells	3,500 to 10,500 cells/mcL
Platelets	150,000 to 450,000/mcL
Glucose	70-99 mg/dL
CO2	23-29 mEq/L
Ca+	8.6-10.2 mg/dL

Difficulty

The sample size of individual data is relatively small
GWAS summary data cannot be used to learn a nonlinear pattern, but we still need to use them (for their large sample size)
- It may be too "expensive" to accurately estimate the non-parametric nonlinear causal effect

Nonlinear causal model

Suppose $ (\mathbf{z}, x, y) $ satisfy a nonlinear causal model:

$ \phi(x) = \mathbf{z}^T \mathbf{\theta} + w, \quad y = \beta \phi(x) + \mathbf{z}^T \pmb{\alpha} + \varepsilon.$

$\beta$ and $\phi$ are identifiable only up to a multiplicative scalar. Thus, we fix $\|\pmb\theta\|_2 = 1$ and $\beta \geq 0$
$\phi(\cdot)$ is an arbitrary nonlinear transformation
Includes the classical 2SLS and PT-2SLS as special cases
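
A short check of why the scale has to be fixed, using only the model above (the constant $c$ is introduced here for illustration). For any $c > 0$,

$ c\,\phi(x) = \mathbf{z}^T (c\,\pmb{\theta}) + c\,w, \qquad y = (\beta / c)\,\{c\,\phi(x)\} + \mathbf{z}^T \pmb{\alpha} + \varepsilon, $

so $(\phi, \pmb\theta, w, \beta)$ and $(c\phi, c\pmb\theta, cw, \beta/c)$ describe the same relationship among $(\mathbf{z}, x, y)$. Fixing $\|\pmb\theta\|_2 = 1$ removes the scale $c$, and requiring $\beta \geq 0$ removes the remaining sign flip $(\phi, \pmb\theta, w, \beta) \mapsto (-\phi, -\pmb\theta, -w, -\beta)$. Taking $\phi(x) = x$ gives back the linear model (1), apart from this normalization.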

Interpretation

$ \beta $ is called the marginal causal effect

$ \phi(\cdot) $ is called the nonlinear causal transformation

$ \beta \phi(\cdot) $ is called the nonlinear causal effect

$\beta > 0$ indicates the presence of a causal relation; a hypothesis test and confidence interval (CI) for $\beta$ are developed
$\phi(\cdot)$ can also be estimated
If the model is well-specified, $\beta \phi(\cdot) \to$ ATE

Method

Observation

Very similar to the single index model: $ x \perp \mathbf{z} \mid \mathbf{z}^T\mathbf{\theta} $
$ \pmb{\theta} $ can be estimated via sliced inverse regression (SIR; Li (1991)) or sufficient dimension reduction (SDR; Cook (2009))
WITHOUT estimating $ \phi(\cdot) $ !!!

Once $ \hat{\mathbf{\theta}} $ is obtained ...

Impute $ \phi(x) $ as $ \mathbf{z}^T \hat{\pmb{\theta}} $
Plug into Stage 2 and solve for $ \beta $ via a sparse regression, as in 2SLS
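
The first-stage idea can be sketched with a textbook version of sliced inverse regression. This is not the nl-causal implementation; the function name `sir_direction`, the number of slices, and the simulated cube-root example are assumptions made for illustration. Note that the direction is recovered only up to sign, and $\phi$ is never estimated.

```python
import numpy as np

def sir_direction(Z, x, n_slices=10):
    """Leading SIR direction for a single-index model, normalized to unit length."""
    n, p = Z.shape
    mu = Z.mean(axis=0)
    Sigma = np.cov(Z, rowvar=False)
    L = np.linalg.cholesky(Sigma)                  # Sigma = L @ L.T
    Zw = np.linalg.solve(L, (Z - mu).T).T          # whitened IVs, covariance ~ I
    order = np.argsort(x)                          # slice on the exposure x
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Zw[idx].mean(axis=0)                   # slice mean of whitened IVs
        M += (len(idx) / n) * np.outer(m, m)
    _, eigvecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
    eta = eigvecs[:, -1]                           # leading eigenvector
    theta = np.linalg.solve(L.T, eta)              # back to the original scale
    return theta / np.linalg.norm(theta)

# Toy example with phi(x) = x^{1/3}, i.e. x = (z^T theta + w)^3 (assumed settings).
rng = np.random.default_rng(1)
n, p = 5_000, 5
theta = np.array([0.6, 0.3, -0.3, 0.6, 0.3])
theta = theta / np.linalg.norm(theta)
Z = rng.normal(size=(n, p))
w = 0.5 * rng.normal(size=n)
x = (Z @ theta + w) ** 3

theta_hat = sir_direction(Z, x)
cosine = abs(theta_hat @ theta)                    # sign of theta_hat is arbitrary
print(f"|cos(theta_hat, theta)| = {cosine:.3f}")
```

Because slicing depends only on the ordering of $x$, any monotone $\phi$ yields the same slices, which is what allows $\pmb\theta$ to be estimated without knowing $\phi$.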

Inference

Consider the hypotheses:

$ H_0: \beta = 0, \qquad H_1: \beta > 0 $

where rejecting the null hypothesis $H_0$ provides evidence of a causal influence of the exposure $x$ on the outcome $y$.

Define the pivotal test statistic:

$\widehat{T} = \frac{n_2^{1/2}\widehat\beta}{\widehat\sigma_e (\widehat{\mathbf \theta}^T\widehat{\mathbf \Sigma}\widehat{\mathbf \theta} - \widehat{\mathbf \theta}^T \widehat{\mathbf \Sigma}_{*A}(\widehat{\mathbf\Sigma}_{AA})^{-1}\widehat{\mathbf \Sigma}_{A*}\widehat{\mathbf{\theta}} )^{1/2}}$

where $A = \{ j : \alpha_j \neq 0 \}$, and $\mathbf{\Sigma}_{*A}$, $\mathbf{\Sigma}_{A*}$ denote the columns and rows of $\pmb{\Sigma}$ indexed by $A$, respectively.

Does NOT require an estimation of $ \phi(\cdot) $

Misspecified nonlinearity

The nonlinear transformation $\phi(\cdot)$ may be misspecified in practice, especially when the two structural equations do not share the same transformation of the exposure:

$ \phi(x) = \mathbf{z}^T \mathbf{\theta} + w, \quad y = \beta \psi(x) + \mathbf{z}^T \mathbf{\alpha} + \varepsilon,$

where $\phi\neq \psi$ are two different nonlinear functions. Hypothesis testing remains valid in this setting.

Corollary 1. In the above model, under the same conditions and with the same test, the Type I error is controlled at level $\alpha$ under the null hypothesis.

$ \phi(x) = \mathbf{z}^T \mathbf{\theta} + w, \quad y = \beta \phi(x) + \mathbf{z}^T \pmb{\alpha} + \varepsilon.$

Estimation of the nonlinear transformation

$\phi$ can be estimated by a two-stage procedure.
- Estimate $\mathbb{E}(\mathbf{z}^T \mathbf{\theta}\mid x)$ by a non-parametric regression
- $\widehat\rho$ is estimated via the uncorrelatedness between $\mathbf{z}^T \mathbf{\theta}$ and $w$
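
One way to read the two steps above is sketched below. Several things here are assumptions, not statements from the talk: $\rho$ is taken to be the constant in $\mathbb{E}(\mathbf{z}^T\pmb\theta \mid x) = \rho\,\phi(x)$ (which holds when $\mathbf{z}^T\pmb\theta$ and $w$ are independent, zero-mean Gaussians), the non-parametric regression is a simple binned mean, and the true $\pmb\theta$ stands in for $\hat{\pmb\theta}$ to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20_000, 5
theta = np.array([0.6, 0.3, -0.3, 0.6, 0.3])
theta = theta / np.linalg.norm(theta)
Z = rng.normal(size=(n, p))
index = Z @ theta                         # z^T theta (true theta used for brevity)
w = 0.5 * rng.normal(size=n)
x = (index + w) ** 3                      # phi(x) = x^{1/3} in this toy example

def binned_regression(x, target, n_bins=50):
    """Crude non-parametric regression: mean of target within quantile bins of x."""
    order = np.argsort(x)
    fitted = np.empty_like(target)
    for idx in np.array_split(order, n_bins):
        fitted[idx] = target[idx].mean()
    return fitted

# Step 1: estimate g(x) = E(z^T theta | x).
g_hat = binned_regression(x, index)

# Step 2 (assumed reading): with phi = g / rho and w = phi(x) - z^T theta,
# Cov(z^T theta, w) = 0 gives rho = Cov(z^T theta, g(x)) / Var(z^T theta).
rho_hat = np.cov(index, g_hat)[0, 1] / np.var(index, ddof=1)
phi_hat = g_hat / rho_hat

corr = np.corrcoef(phi_hat, np.cbrt(x))[0, 1]
print(f"rho_hat = {rho_hat:.3f}, corr(phi_hat, phi) = {corr:.3f}")
```

In this Gaussian toy setting the population value is $\rho = 1 / (1 + 0.25) = 0.8$, so `rho_hat` can be compared against it directly.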

Simulation

Performance for both $\beta$ and $\phi(\cdot)$ is considered.
For the proposed method (2SIR), we combine tests based on different slices using the Cauchy combination method (Liu et al. 2020); the combined test is denoted Comb-2SIR
The results are compared against 2SLS and against 2SLS with the Yeo-Johnson power transformation (a generalization of the Box-Cox transformation; Yeo 2000), denoted 2SLS and PT-2SLS, respectively
Six transformations are considered in the simulation:

linear: $\phi(x) = x$;
logarithm function: $\phi(x) = \log(x)$;
inverse function: $\phi(x) = 1/x$;
piecewise linear function: $\phi(x) = xI(x\leq 0) + 0.5 x I(x > 0)$;
cube root function: $\phi(x) = x^{1/3}$;
quadratic function: $\phi(x) = x^2$.
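
For reference, the Cauchy combination of several p-values can be written in a few lines. This is the standard form of the method with equal weights by default; how the tests are weighted inside Comb-2SIR is not stated here, so the weights and the function name `cauchy_combine` are assumptions.

```python
import numpy as np

def cauchy_combine(pvals, weights=None):
    """Cauchy combination test: combine p-values into a single p-value."""
    p = np.asarray(pvals, dtype=float)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)        # equal weights (assumed default)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                           # weights must sum to one
    # Each p-value is mapped to a standard Cauchy variate under the null.
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Tail probability of the standard Cauchy distribution at t.
    return 0.5 - np.arctan(t) / np.pi

# Hypothetical p-values from tests run with different slicing choices.
p_slices = [0.01, 0.02, 0.5]
p_comb = cauchy_combine(p_slices)
print(f"combined p-value: {p_comb:.4f}")
```

A single p-value is returned unchanged, and the combined p-value is driven mainly by the smallest inputs, which is why the combination keeps most of the power of the best slicing choice.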

Simulation

Empirical Type I error ($\beta_0 = 0$) and power ($\beta_0 = 0.05, 0.10, 0.15$) of the proposed nonlinear causal test for the simulated example (marginal effect inference).

Application

The bar-plot of $p$-values of significant genes for AD by at least one method, where the $y$-axis represents $-\log_{10}(p)$. The results are based on ADNI + IGAP GWAS datasets.

12 genes were significant by 2SLS and/or PT-2SLS; 18 were significant by 2SIR and/or Comb-2SIR.
7 genes, including TOMM40, are identified only by Comb-2SIR. We searched for these genes in GWAS results and found that ALL of them have been reported to be significantly associated with AD.

More results

APOC1: a significant gene over all methods.

BCL3: a significant gene only identified by 2SIR/Comb-2SIR.

Negative control derived from ADNI, where outcomes are permuted.

More simulated examples based on different sample sizes and dimensions with:

Standard setting
Invalid IVs
Categorical IVs
Weak IVs
Non-additive and epistatic effects
Misspecified models

Software

Compared with 2SLS

Strength

2SIR relaxes the linear assumption underlying the relationships between $(\mathbf{z}, x, y)$.
Compatibility: The method exhibits minimal power loss when the underlying true model is linear and the same datasets are used.
Easy to use, well-documented software, more power

Weakness

Additional assumption on the instrumental variables (IVs): $\mathbf{z}$ should follow an elliptically symmetric distribution; however, this issue appears to be relatively minor in TWAS, see Example 3.
Cannot use summary statistics (e.g., $ \mathbf{Z}^T \mathbf{x} $) in Stage 1; individual-level data are required

Thank you!

If you like nl-causal, please star 🌟 our GitHub repository. Thank you!