• Outline

  • Bias & Selection bias
  • Consequences of selection bias
  • Causal representation of different biases
  • eQTL
  • PCA adjustment in eQTL
  • Bias due to PCA adjustment

Detecting, Quantifying and Correcting Selection Bias due to PCA adjustment in eQTL discovery

Alex Couto Alves

Bias vs Noise

 

  • Systematic Error (Bias): a systematic difference between the population parameter and the estimated statistic.

  • Random Error (Noise): a random difference between the population parameter and the estimated statistic.
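A small simulation makes the distinction concrete. This is a sketch under assumed values (true mean 10, unit-variance noise, a hypothetical +0.5 systematic offset on every measurement): when a study is repeated many times, bias is the average deviation of the estimates from the true parameter and noise is their spread. Only the noise shrinks as the sample size grows.

```python
import numpy as np

def study_estimates(true_mean, offset, n, n_studies, rng):
    """Sample means from `n_studies` repeated studies of size `n`.

    `offset` is a hypothetical systematic measurement error added to every
    observation; the unit-variance Gaussian draws are the random error.
    """
    data = rng.normal(true_mean + offset, 1.0, size=(n_studies, n))
    return data.mean(axis=1)

rng = np.random.default_rng(0)
true_mean = 10.0
results = {}
for n in (20, 2000):
    for offset in (0.0, 0.5):
        est = study_estimates(true_mean, offset, n, 2000, rng)
        bias = est.mean() - true_mean   # systematic error: does not shrink with n
        noise = est.std()               # random error: shrinks like 1/sqrt(n)
        results[(n, offset)] = (bias, noise)
        print(f"n={n:5d} offset={offset:.1f} bias={bias:+.3f} noise={noise:.3f}")
```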

Nomenclature

  • Selection Bias

  • Collider bias

  • Berkson bias

Do gallstones cause diabetes?

The ratio of multiple diagnoses to single diagnoses is always greater in the hospital than in the general population
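The claim can be checked exactly under the independent-admission model commonly used to explain Berkson bias. Assumed here: D1 and D2 occur independently in the population, each disease on its own leads to admission with its own probability, and the two admission causes act independently. The prevalences and admission probabilities below are hypothetical.

```python
def diagnosis_ratios(p1, p2, a1, a2):
    """Return (population ratio, hospital ratio) of double to single diagnoses.

    Assumed model (hypothetical parameters): D1 and D2 occur independently with
    prevalences p1 and p2; D1 alone leads to admission with probability a1, D2
    alone with probability a2, and the two admission causes act independently,
    so a patient with both diseases is admitted with probability
    1 - (1 - a1) * (1 - a2).
    """
    both = p1 * p2
    only1 = p1 * (1 - p2)
    only2 = p2 * (1 - p1)
    population = both / (only1 + only2)
    admit_both = 1 - (1 - a1) * (1 - a2)
    hospital = both * admit_both / (only1 * a1 + only2 * a2)
    return population, hospital

pop, hosp = diagnosis_ratios(p1=0.10, p2=0.05, a1=0.2, a2=0.3)
print(f"population: {pop:.4f}  hospital: {hosp:.4f}")
```

Because 1 - (1 - a1)(1 - a2) is at least max(a1, a2), the hospital ratio can never fall below the population ratio under this model; the two coincide only when a1 = a2 = 1, i.e. total hospitalization.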

Selection bias

[Figure: A causal model with nodes D1, H, D2, E and G]

Suppression & Selection Bias

PCA of gene expression

(Matthias Scholz, Ph.D. thesis)

$$\mathrm{PC}_1 = w_{11}\,x_1 + w_{12}\,x_2 + \cdots + w_{1p}\,x_p$$

$$\mathrm{PC}_2 = w_{21}\,x_1 + w_{22}\,x_2 + \cdots + w_{2p}\,x_p$$

where $x_j$ is the expression of gene $j$ and $w_{kj}$ is the weight of gene $j$ on PC $k$.
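The weighted sums can be checked numerically. A minimal NumPy sketch on a hypothetical 100 × 5 expression matrix: the rows of `vt` from the SVD of the mean-centred matrix hold the weights of each PC, and each PC score is the weighted sum of the centred expression values (centring is assumed here, as is conventional for PCA).

```python
import numpy as np

rng = np.random.default_rng(1)
expression = rng.normal(size=(100, 5))   # hypothetical: 100 samples x 5 genes

centered = expression - expression.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
weights = vt   # row k holds the weights of PC k+1 on genes 1..5

# PC scores written out as explicit weighted sums of gene expression
n_genes = centered.shape[1]
pc1 = sum(weights[0, j] * centered[:, j] for j in range(n_genes))
pc2 = sum(weights[1, j] * centered[:, j] for j in range(n_genes))

# The same scores as a single matrix product
scores = centered @ weights.T
```

PC1 has the largest variance of all PCs, and PC scores are mutually uncorrelated.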

Expression quantitative trait loci mapping

Adjustment for expression PCs increases the yield of cis-eQTLs
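Why PC adjustment raises the yield can be illustrated with a toy simulation. Assumed set-up (all values hypothetical): three hidden factors, standing in for H, shift the expression of 200 genes, and half of the genes carry a cis-eQTL of effect 0.3. The top three expression PCs absorb the hidden-factor variance, which shrinks the residual variance and raises the t-statistics of the true eQTLs.

```python
import numpy as np

def snp_t(y, snp, covariates):
    """t-statistic of `snp` in an OLS fit of y ~ 1 + snp + covariates."""
    X = np.column_stack([np.ones_like(snp), snp, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(2)
n, n_genes, k = 300, 200, 3
hidden = rng.normal(size=(n, k))                                 # hypothetical hidden factors H
loadings = rng.normal(size=(k, n_genes))
snps = rng.binomial(2, 0.5, size=(n, n_genes)).astype(float)     # one cis-SNP per gene
effect = np.where(np.arange(n_genes) < n_genes // 2, 0.3, 0.0)   # first half are true eQTLs
expression = hidden @ loadings + snps * effect + rng.normal(size=(n, n_genes))

centered = expression - expression.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :k] * s[:k]                                           # top-k expression PC scores
no_covariates = np.empty((n, 0))

t_crude = np.array([snp_t(expression[:, g], snps[:, g], no_covariates) for g in range(n_genes)])
t_pc = np.array([snp_t(expression[:, g], snps[:, g], pcs) for g in range(n_genes)])

threshold = 3.3   # roughly the two-sided p < 0.001 cut-off for |t| at this sample size
true_hits_crude = int(np.sum(np.abs(t_crude[: n_genes // 2]) > threshold))
true_hits_pc = int(np.sum(np.abs(t_pc[: n_genes // 2]) > threshold))
print("true eQTLs found, crude:", true_hits_crude, " PC-adjusted:", true_hits_pc)
```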

[Figure: causal diagram with nodes H, SNP, SNPt, Gt, G, S, B and PCA]

Causation vs bias and confounding in genetic association studies

Detection of selection bias

  1. Increasing the number of PCs increases the number of replicable eQTLs
  2. PCs are functionally defined from expression
  3. eQTL associations adjusted for PCs recapitulate the causal structure of selection bias
  4. Numerical simulations illustrate two types of selection bias
  5. Empirical assessments suggest that the effect sizes of some eQTLs are biased
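A toy version of the causal structure in point 3. Assumed set-up (hypothetical effect sizes): genes G1 and G2 share a hidden factor, SNP2 affects G2 only, and PC1 is computed from G1 and G2. PC1 is then a collider on the path SNP2 → G2 → PC1 ← G1, so adjusting the G1 model for PC1 induces a SNP2 effect on G1 that does not exist. This illustrates one mechanism and is not the simulation from the slides.

```python
import numpy as np

def snp_beta(y, snp, covariates=None):
    """OLS coefficient of `snp` in y ~ 1 + snp (+ covariates)."""
    cols = [np.ones_like(snp), snp]
    if covariates is not None:
        cols.append(covariates)
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]

rng = np.random.default_rng(3)
n = 5000
h = rng.normal(size=n)                               # shared hidden factor
snp2 = rng.binomial(2, 0.5, size=n).astype(float)    # affects gene 2 only
g1 = h + rng.normal(size=n)                          # no genetic effect of SNP2
g2 = h + 1.0 * snp2 + rng.normal(size=n)

expr = np.column_stack([g1, g2])
expr -= expr.mean(axis=0)
u, s, _ = np.linalg.svd(expr, full_matrices=False)
pc1 = u[:, 0] * s[0]                                 # PC1 = w1*G1 + w2*G2 (centred)

crude = snp_beta(g1, snp2)
adjusted = snp_beta(g1, snp2, pc1)
print(f"SNP2 -> G1 crude: {crude:+.3f}, adjusted for PC1: {adjusted:+.3f}")
```

With these settings the crude coefficient is close to zero while the PC1-adjusted coefficient is clearly negative: conditioning on a sum in which both genes have positive weights makes G1 and G2 negatively related.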

Functional definition of Expression PC

Selection Bias & Endogeneity Bias

PCA selection bias

[Figure: A causal model; Berkson diagram (D1, H, D2) alongside the PCA selection bias diagram (SNP1, SNP2, E1, E2, G1, G2, Gg, PC1)]

Simulation of PCA Selection Bias

Empirical assessment


Quantification of Selection Bias

  1. GWAS assumptions exclude confounding bias
  2. Mathematical analysis of the regression coefficient estimates reveals that the bias factor is independent of the genetic effect
  3. Construction of a null model of no genetic effect shows that false positives (FP) can be induced by bias alone
  4. Construction of a test statistic corrected for the biases in effect size and SE estimates
  5. Principled detection of putative true and false positives
  6. Quantification of the FP and false negatives (FN) due to selection bias

Selection bias in the regression coefficient estimates

A null model of no genetic effect

 (Betas induced by Selection Bias)
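The null model of no genetic effect can also be approximated by Monte Carlo simulation rather than by analysis. Hypothetical set-up: the SNP has no effect on gene 1; in one scenario it affects gene 2, which enters PC1, and in the other it affects nothing. Counting replicates with |t| > 1.96 gives the false-positive rate of the crude and PC1-adjusted models. Sample size, effect size and cut-off are illustrative assumptions.

```python
import numpy as np

def snp_t(y, snp, covariates=None):
    """t-statistic of `snp` in an OLS fit of y ~ 1 + snp (+ covariates)."""
    cols = [np.ones_like(snp), snp]
    if covariates is not None:
        cols.append(covariates)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def null_replicate(rng, effect_on_g2, n=500):
    """One simulated data set in which the SNP has NO effect on gene 1.

    Hypothetical set-up: a shared hidden factor drives both genes, the SNP may
    affect gene 2, and PC1 is computed from the two genes.
    Returns (crude t, PC1-adjusted t) for the SNP in the gene 1 model.
    """
    h = rng.normal(size=n)
    snp = rng.binomial(2, 0.5, size=n).astype(float)
    g1 = h + rng.normal(size=n)
    g2 = h + effect_on_g2 * snp + rng.normal(size=n)
    expr = np.column_stack([g1, g2])
    expr -= expr.mean(axis=0)
    u, s, _ = np.linalg.svd(expr, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    return snp_t(g1, snp), snp_t(g1, snp, pc1)

rng = np.random.default_rng(4)
with_bias = np.array([null_replicate(rng, effect_on_g2=1.0) for _ in range(200)])
no_bias = np.array([null_replicate(rng, effect_on_g2=0.0) for _ in range(200)])

# Fraction of replicates called significant although gene 1 has no eQTL:
# each array is [crude FP rate, PC1-adjusted FP rate]
fp_with = np.mean(np.abs(with_bias) > 1.96, axis=0)
fp_without = np.mean(np.abs(no_bias) > 1.96, axis=0)
print("SNP affects gene 2:", fp_with, " SNP affects nothing:", fp_without)
```

Only the scenario in which the SNP affects another gene in the PC produces excess false positives, and only in the PC1-adjusted model.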

A test for regression coefficients adjusting for selection bias and suppression

Principled definition of TP/FP

Significant associations with null genetic effect

Bias-induced FP/FN

Fat expression

Bias-induced FP/FN

LCL expression

Correction of Selection Bias

  • Randomization of real data assesses bias correction validity
  • P-values corrected for bias are validated against the null of no genetic effect with and without PC-induced selection bias
  • Betas adjusted for bias are validated against the null of no genetic effect
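Randomization can be sketched as a permutation test. Shuffling genotypes across samples removes every genetic effect, and with it any PC-induced selection bias, so the permuted t-statistics form a reference null. The data, the effect size (0.5) and the number of PCs (2) below are hypothetical; this is a generic permutation scheme, not necessarily the experimental design used here.

```python
import numpy as np

def snp_t(y, snp, covariates):
    """t-statistic of `snp` in an OLS fit of y ~ 1 + snp + covariates."""
    X = np.column_stack([np.ones_like(snp), snp, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def permutation_null(y, snp, covariates, n_perm, rng):
    """t-statistics after shuffling genotypes across samples.

    Shuffling breaks every SNP-expression link, so these statistics follow the
    null of no genetic effect, with no PC-induced selection bias either.
    """
    return np.array([snp_t(y, rng.permutation(snp), covariates) for _ in range(n_perm)])

rng = np.random.default_rng(6)
n, n_genes = 400, 20
hidden = rng.normal(size=(n, 1))
snp = rng.binomial(2, 0.5, size=n).astype(float)
expression = hidden @ rng.normal(size=(1, n_genes)) + rng.normal(size=(n, n_genes))
expression[:, 0] += 0.5 * snp                    # hypothetical true eQTL on gene 0

centered = expression - expression.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]                           # PCs are kept fixed across permutations

t_obs = snp_t(expression[:, 0], snp, pcs)
t_null = permutation_null(expression[:, 0], snp, pcs, 1000, rng)
p_emp = (1 + np.sum(np.abs(t_null) >= abs(t_obs))) / (len(t_null) + 1)
fp_rate = np.mean(np.abs(t_null) > 1.96)
print(f"observed t = {t_obs:.2f}, empirical p = {p_emp:.4f}, null FP rate = {fp_rate:.3f}")
```

Under permutation roughly 5% of the statistics exceed |t| = 1.96, as expected for a valid null.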

Design of the experiment

Correction of selection bias

P-values

Confounding in genetic associations

Crude and PC adjusted models

Confounding

[Figure: LCL genotype and expression batches (labels 1M, 317K, 610K); causal diagram with nodes SNP, B1, B2, G and C]

Expression batch effects

Confounding analysis

Crude model

Confounding analysis

PC adjusted model

Artificial confounding

Emulating missing data imputation to ref

[Figure: causal diagram with nodes SNP, B1, B2, G and C; panels Genotypes and Expression]

Artificial confounding

Crude model

Artificial confounding

PC adjusted model
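The artificial-confounding experiment can be mimicked with a single confounder C that shifts both allele frequency and expression. Assumptions (hypothetical): a binary batch C, allele frequency 0.3 versus 0.6 across batches, a batch shift of 1 on each of 30 genes, and no true SNP effect. In this toy setting the crude model picks up a spurious SNP effect, and adjusting for PC1, which tracks the batch, removes most of it.

```python
import numpy as np

def snp_beta(y, snp, covariates=None):
    """OLS coefficient of `snp` in y ~ 1 + snp (+ covariates)."""
    cols = [np.ones_like(snp), snp]
    if covariates is not None:
        cols.append(covariates)
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]

rng = np.random.default_rng(7)
n, n_genes = 2000, 30
batch = rng.binomial(1, 0.5, size=n).astype(float)      # hypothetical confounder C
# Allele frequency differs by batch (0.3 vs 0.6): a batch-dependent genotype artefact
snp = rng.binomial(2, 0.3 + 0.3 * batch).astype(float)
# The batch shifts every gene's expression; the SNP has no effect on any gene
expression = batch[:, None] + rng.normal(size=(n, n_genes))

centered = expression - expression.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]

crude = snp_beta(expression[:, 0], snp)
pc_adjusted = snp_beta(expression[:, 0], snp, pc1)
print(f"crude beta: {crude:+.3f}, PC1-adjusted beta: {pc_adjusted:+.3f}")
```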

Conclusion

  1. Models adjusted for expression PCs explain variation in Y using variation in Y, since the PCs are computed from Y
  2. This causes endogeneity biases, including selection bias and suppression
  3. Correction of selection bias in genetic studies is possible because the theoretical causal model is known and no confounding is assumed
  4. It is possible to mitigate confounding in properly designed genetic association studies
  5. Violations of the assumption of no confounding can severely affect both PC-adjusted and crude model estimates, although PC-adjusted models are less affected

Correction of selection bias

Beta values


Empirical assessment

[Figure panels: FP, TP, All]

Basic causal structures

  • Confounder: X ← C → Y
  • Collider: X → C ← Y
  • Mediator: X → M → Y
  • Exogenous variable: E, a variable whose causes lie outside the model of X, Y and E
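The first three structures can be simulated directly. A sketch with linear Gaussian variables and unit effects (assumed values): in each case the coefficient of X on Y is estimated with and without adjustment for the third variable. Adjustment removes confounding, creates an association through a collider, and removes the mediated part of a total effect.

```python
import numpy as np

def coef_x(y, x, z=None):
    """OLS coefficient of x in y ~ 1 + x (+ z)."""
    cols = [np.ones_like(x), x]
    if z is not None:
        cols.append(z)
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]

rng = np.random.default_rng(5)
n = 20_000

def noise():
    return rng.normal(size=n)

# Confounder X <- C -> Y, with no X -> Y effect
c = noise()
x = c + noise()
y = c + noise()
confounder = (coef_x(y, x), coef_x(y, x, c))

# Collider X -> C <- Y, with X and Y independent
x = noise()
y = noise()
c = x + y + noise()
collider = (coef_x(y, x), coef_x(y, x, c))

# Mediator X -> M -> Y, with no direct X -> Y effect
x = noise()
m = x + noise()
y = m + noise()
mediator = (coef_x(y, x), coef_x(y, x, m))

for name, (crude, adjusted) in [("confounder", confounder), ("collider", collider), ("mediator", mediator)]:
    print(f"{name:10s} crude={crude:+.2f} adjusted={adjusted:+.2f}")
```

In expectation: confounder 0.5 crude and 0 adjusted; collider 0 crude and -0.5 adjusted; mediator 1 crude and 0 adjusted.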

  • Do the causes of gallstones cause diabetes?

[Figure: causal diagrams with nodes E, D1, D2, H and G]

  • Joseph Berkson (1899 – 1982)

  • 1922 M.A. Physics, Columbia
  • 1927 M.D., Johns Hopkins
  • 1928 Dr.Sc., Johns Hopkins
  • 1933 Head of Biometry and Medical Statistics, Mayo Clinic
  • 1946 Legion of Honour, US War Dept
  • From Biometrics Vol. 39, No. 4 (Dec., 1983), pp. 1107-1111

[Figure: panels A and B, causal diagrams with nodes D1, H and D2]

  • Identities

  • Selection bias

  • Only total hospitalization avoids Berkson bias:
