A unified model for calling variants in tumors while controlling the false discovery rate

Johannes Köster

University of Duisburg-Essen
https://koesterlab.github.io

A unified model capturing all sources of uncertainty

Calling variants from tumor/normal pairs

Results

https://prosic.github.io

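tumor      CATCATTGAAATA----GGCACATGCTGCTCGAA
healthy    CAGCATTGAAATATATAGGCACATGCTGCTCGAA
reference  CAGCATTGAAATATATAGGCACAT------CGAA

somatic SNV: T in the tumor where healthy and reference have G (third base)
somatic deletion: TATA missing in the tumor only
germline insertion: GCTGCT present in tumor and healthy, absent from the reference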

P(Z_i^t\mid\theta_h,\theta_c) = \overbrace{\pi_i^t}^{\text{correctly mapped}} \Big( \overbrace{\alpha \big(\overbrace{\theta_c \tau_t p_i}^{\text{variant}} + \overbrace{(1 - \theta_c \tau_t) a_i}^{\text{reference}} \big)}^{\text{from cancer cell}} + \overbrace{(1 - \alpha) \big(\overbrace{\theta_h \tau_h p_i}^{\text{variant}} + \overbrace{(1 - \theta_h \tau_h) a_i}^{\text{reference}} \big)}^{\text{from healthy cell}} \Big) + \overbrace{(1 - \pi_i^t) o_i}^{\text{wrongly mapped}}
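As a minimal sketch of how the likelihood above could be evaluated for a single tumor read, the Python function below transcribes the formula term by term. The function and parameter names are illustrative assumptions and not the PROSIC API. The weight of the wrongly-mapped term is assumed to be the complement of `pi`, so that the two mixture weights sum to one, and the numbers in the usage lines are made up.

```python
def tumor_read_likelihood(theta_h, theta_c, alpha, pi, p, a, o,
                          tau_t=1.0, tau_h=1.0):
    """Likelihood of one tumor read given the allele frequencies.

    theta_h, theta_c: allele frequencies in healthy and cancer cells
    alpha:            purity (fraction of reads from cancer cells)
    pi:               probability that the read is correctly mapped
    p, a:             likelihood of the read under the variant / reference allele
    o:                likelihood of the read if it is wrongly mapped
    tau_t, tau_h:     sampling bias factors (assumed 1.0, i.e. no bias, by default)
    """
    # Read originates from a cancer cell: variant or reference allele.
    from_cancer = theta_c * tau_t * p + (1.0 - theta_c * tau_t) * a
    # Read originates from a healthy cell contaminating the tumor sample.
    from_healthy = theta_h * tau_h * p + (1.0 - theta_h * tau_h) * a
    # Mix both origins according to the purity.
    correctly_mapped = alpha * from_cancer + (1.0 - alpha) * from_healthy
    # Mix correctly and wrongly mapped cases.
    return pi * correctly_mapped + (1.0 - pi) * o


# Hypothetical values: heterozygous somatic variant, 80 % purity.
likelihood = tumor_read_likelihood(
    theta_h=0.0, theta_c=0.5, alpha=0.8, pi=0.99, p=0.9, a=0.1, o=0.5
)
print(likelihood)
```

Keeping each overbraced term of the formula as its own variable makes it easy to check the code against the equation.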

Mapping uncertainty

Typing uncertainty

Purity

p_i / a_i = \Pr(\text{left read}) \cdot \Pr(\text{right read}) \cdot \Pr(\text{insert size})
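A minimal sketch of the three-factor product above, computed in log space for numerical stability. It assumes the three factors are independent probabilities for one read pair, evaluated once under the variant allele (giving p_i) and once under the reference allele (giving a_i). The function name and the input values are hypothetical.

```python
import math


def read_pair_log_likelihood(pr_left, pr_right, pr_insert_size):
    """Log of Pr(left read) * Pr(right read) * Pr(insert size)."""
    # Summing logs avoids underflow when the factors are very small.
    return math.log(pr_left) + math.log(pr_right) + math.log(pr_insert_size)


# Hypothetical factors for one read pair under one allele.
log_p = read_pair_log_likelihood(0.9, 0.8, 0.5)
print(math.exp(log_p))
```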

Sampling bias

Allele frequencies

Allele frequency estimation

Precision/Recall

False-discovery rate (FDR) control
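The poster does not spell out the FDR procedure. One common Bayesian approach, shown here only as an assumed sketch, sorts calls by their posterior error probability (one minus the posterior probability of the event) and keeps the longest prefix whose mean posterior error probability stays at or below the target FDR. The function name and the example posteriors are hypothetical.

```python
def control_fdr(posteriors, target_fdr):
    """Return indices of the calls kept at the given target FDR.

    posteriors: posterior probability of the event (e.g. "somatic") per call
    target_fdr: maximum tolerated expected fraction of false discoveries
    """
    # Most confident calls first.
    order = sorted(range(len(posteriors)),
                   key=lambda i: posteriors[i], reverse=True)
    kept = []
    error_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        error_sum += 1.0 - posteriors[idx]
        # The running mean error is the expected FDR of the calls so far.
        # It never decreases along the sorted order, so stop at the first excess.
        if error_sum / rank > target_fdr:
            break
        kept.append(idx)
    return kept


# Hypothetical posteriors for four candidate variants.
print(control_fdr([0.99, 0.6, 0.95, 0.999], target_fdr=0.05))
```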

Candidate variants + mapped NGS reads

simulated data


Poster PROSIC

By Johannes Köster

Poster at ETOS 2018
