A unified model for calling variants in tumors while controlling the false discovery rate


Johannes Köster

University of Duisburg-Essen

https://koesterlab.github.io

A unified model capturing all sources of uncertainty

Calling variants from tumor/normal pairs

Results

https://prosic.github.io


tumor      CATCATTGAAATA----GGCACATGCTGCTCGAA
healthy    CAGCATTGAAATATATAGGCACATGCTGCTCGAA
reference  CAGCATTGAAATATATAGGCACAT------CGAA

Variant types shown: somatic SNV, somatic deletion, germline insertion

\[
\Pr(Z_i^t \mid \theta_h, \theta_c) = \overbrace{\pi_i^t}^{\text{correctly mapped}} \Big( \overbrace{\alpha \big(\overbrace{\theta_c \tau_t p_i}^{\text{variant}} + \overbrace{(1 - \theta_c \tau_t) a_i}^{\text{reference}} \big)}^{\text{from cancer cell}} + \overbrace{(1 - \alpha) \big(\overbrace{\theta_h \tau_h p_i}^{\text{variant}} + \overbrace{(1 - \theta_h \tau_h) a_i}^{\text{reference}} \big)}^{\text{from healthy cell}} \Big) + \overbrace{(1 - \pi_i^t) o_i}^{\text{wrongly mapped}}
\]
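The likelihood above can be evaluated directly for a single read. The following is a minimal sketch, not the PROSIC implementation: the function and parameter names are hypothetical, the sampling biases default to 1 (no bias), the same mapping probability is used in the "correctly mapped" and "wrongly mapped" terms, and reads are assumed to be independent when they are combined into a log-likelihood.

```python
import math


def read_likelihood(theta_h, theta_c, *, p_i, a_i, o_i, pi_i, alpha,
                    tau_t=1.0, tau_h=1.0):
    """Likelihood of one tumor-sample read given the allele frequencies.

    theta_h, theta_c: allele frequencies in healthy and cancer cells
    p_i, a_i, o_i:    probability of the read under the variant allele,
                      the reference allele, and a wrong mapping
    pi_i:             probability that the read is correctly mapped
    alpha:            tumor purity
    tau_t, tau_h:     sampling biases (assumed to be 1.0 here)
    """
    from_cancer = theta_c * tau_t * p_i + (1.0 - theta_c * tau_t) * a_i
    from_healthy = theta_h * tau_h * p_i + (1.0 - theta_h * tau_h) * a_i
    correctly_mapped = alpha * from_cancer + (1.0 - alpha) * from_healthy
    return pi_i * correctly_mapped + (1.0 - pi_i) * o_i


def log_likelihood(reads, theta_h, theta_c, alpha):
    """Sum of per-read log-likelihoods, assuming independent reads.

    Each read is a dict with the keys p_i, a_i, o_i and pi_i.
    """
    return sum(
        math.log(read_likelihood(theta_h, theta_c, alpha=alpha, **read))
        for read in reads
    )


if __name__ == "__main__":
    # Two made-up reads: one supporting the variant, one the reference.
    reads = [
        {"p_i": 0.9, "a_i": 0.1, "o_i": 0.05, "pi_i": 0.99},
        {"p_i": 0.1, "a_i": 0.9, "o_i": 0.05, "pi_i": 0.99},
    ]
    for theta_c in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(theta_c, log_likelihood(reads, 0.0, theta_c, alpha=0.8))
```

Scanning `theta_c` over a grid, as in the usage example, gives a rough picture of which somatic allele frequencies the reads support.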

Mapping uncertainty: \(\pi_i^t\)

Typing uncertainty: \(p_i\), \(a_i\), each computed as \(\Pr(\text{left read}) \cdot \Pr(\text{right read}) \cdot \Pr(\text{insert size})\)

Purity: \(\alpha\)

Sampling bias: \(\tau_t\), \(\tau_h\)

Allele frequencies: \(\theta_h\), \(\theta_c\)

Allele frequency estimation

Precision/Recall

False-discovery rate (FDR) control
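The poster does not spell out how the FDR is controlled. As an illustration only, one common Bayesian approach works on posterior probabilities: rank the calls by their posterior probability of being a true somatic variant and accept the largest set whose mean posterior error probability stays at or below the target FDR. The sketch below assumes such posteriors are available; the function name `control_fdr` is hypothetical.

```python
def control_fdr(posteriors, fdr):
    """Return indices of accepted calls at the given target FDR.

    posteriors: posterior probability of each call being a true variant
    fdr:        target false discovery rate, e.g. 0.05

    The expected FDR of a set of calls is estimated as the mean of their
    posterior error probabilities (1 - posterior).
    """
    # Most confident calls first.
    order = sorted(range(len(posteriors)),
                   key=lambda i: posteriors[i], reverse=True)
    accepted = []
    error_sum = 0.0
    for rank, i in enumerate(order, start=1):
        error_sum += 1.0 - posteriors[i]
        # The running mean never decreases, so stop at the first violation.
        if error_sum / rank > fdr:
            break
        accepted.append(i)
    return accepted


if __name__ == "__main__":
    print(control_fdr([0.99, 0.5, 0.95, 0.9], fdr=0.05))
```

Because the calls are sorted by decreasing posterior, the running mean error is non-decreasing, so the first violation marks the largest acceptable set.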

Candidate variants + Mapped NGS reads

simulated data

Poster PROSIC

By Johannes Köster


Poster at ETOS 2018
