A unified model for calling variants in tumors while controlling the false discovery rate
Johannes Köster
University of Duisburg-Essen https://koesterlab.github.io
A unified model capturing all sources of uncertainty
Calling variants from tumor/normal pairs
Results
https://prosic.github.io
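tumor:     CATCATTGAAATA----GGCACATGCTGCTCGAA
healthy:   CAGCATTGAAATATATAGGCACATGCTGCTCGAA
reference: CAGCATTGAAATATATAGGCACAT------CGAA
somatic SNV: G>T at position 3, tumor only
somatic deletion: TATA missing in tumor only
germline insertion: GCTGCT present in tumor and healthy, absent from reference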
P(Z_i^t\mid\theta_h,\theta_c) = \overbrace{\pi_i^t}^{\text{correctly mapped}} \Big( \overbrace{\alpha \big(\overbrace{\theta_c \tau_t p_i}^{\text{variant}} + \overbrace{(1 - \theta_c \tau_t) a_i}^{\text{reference}} \big)}^{\text{from cancer cell}} + \overbrace{(1 - \alpha) \big(\overbrace{\theta_h \tau_h p_i}^{\text{variant}} + \overbrace{(1 - \theta_h \tau_h) a_i}^{\text{reference}} \big)}^{\text{from healthy cell}} \Big) + \overbrace{(1 - \pi_i^t) o_i}^{\text{wrongly mapped}}
Mapping uncertainty
Typing uncertainty
Purity

p_i / a_i = Pr(left read) ⋅ Pr(right read) ⋅ Pr(insert size)
Sampling bias
Allele frequencies
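As a minimal sketch of the read likelihood above, the following Python evaluates the probability of one tumor-sample read for given allele frequencies and sums it over reads in log space. The function and parameter names are illustrative and not taken from PROSIC. The sketch assumes one mapping probability `pi` per read, whose complement weights the wrongly mapped term. It also assumes that `p`, `a` and `o` are already available as probabilities of the read under the variant allele, the reference allele and a wrong mapping, and that reads are independent.

```python
import math


def read_likelihood(theta_h, theta_c, *, pi, p, a, o, alpha, tau_t=1.0, tau_h=1.0):
    """Likelihood of one tumor-sample read given healthy/cancer allele frequencies.

    pi: probability that the read is correctly mapped (mapping uncertainty)
    p, a: probability of the read given the variant / reference allele (typing uncertainty)
    o: probability of the read given that it is wrongly mapped
    alpha: tumor purity; tau_t, tau_h: sampling bias terms
    """
    from_cancer = theta_c * tau_t * p + (1.0 - theta_c * tau_t) * a
    from_healthy = theta_h * tau_h * p + (1.0 - theta_h * tau_h) * a
    correctly_mapped = alpha * from_cancer + (1.0 - alpha) * from_healthy
    return pi * correctly_mapped + (1.0 - pi) * o


def log_likelihood(reads, theta_h, theta_c, alpha):
    """Sum of per-read log-likelihoods (assumes independent reads)."""
    return sum(
        math.log(read_likelihood(theta_h, theta_c, alpha=alpha, **read))
        for read in reads
    )


# Hypothetical reads: two support the variant, two support the reference.
reads = [
    dict(pi=1.0, p=0.9, a=0.1, o=0.5),
    dict(pi=1.0, p=0.9, a=0.1, o=0.5),
    dict(pi=1.0, p=0.1, a=0.9, o=0.5),
    dict(pi=1.0, p=0.1, a=0.9, o=0.5),
]

# Coarse grid search for the cancer allele frequency, with theta_h fixed at 0.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best_theta_c = max(grid, key=lambda t: log_likelihood(reads, 0.0, t, alpha=1.0))
```

A grid search is used here only to keep the example short; it is not meant to represent how allele frequencies are estimated in the actual method.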
Allele frequency estimation
Precision/Recall
False-discovery rate (FDR) control
Candidate variants + Mapped NGS reads
simulated data
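The poster does not spell out how the false discovery rate is controlled. One common approach when posterior probabilities are available, shown here purely as an assumption, is to rank calls by their posterior error probability and keep the largest set whose mean error probability stays below the desired FDR. The function name `control_fdr` and the example posteriors are hypothetical.

```python
def control_fdr(posteriors, max_fdr):
    """Return indices of the calls to keep so that the expected FDR is <= max_fdr.

    posteriors: posterior probability of each call being a true variant.
    The expected FDR of a set of calls is the mean of their posterior
    error probabilities (1 - posterior).
    """
    # Most confident calls first (smallest posterior error probability).
    order = sorted(range(len(posteriors)), key=lambda i: 1.0 - posteriors[i])
    kept = []
    error_sum = 0.0
    for i in order:
        error_sum += 1.0 - posteriors[i]
        # The running mean never decreases, so we can stop at the first violation.
        if error_sum / (len(kept) + 1) > max_fdr:
            break
        kept.append(i)
    return kept


# Hypothetical posteriors for four candidate variants.
selected = control_fdr([0.99, 0.5, 0.95, 0.2], max_fdr=0.05)
```

With these inputs the two most confident calls are kept, because adding the third would raise the mean error probability above 5%.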
Poster PROSIC
By Johannes Köster
Poster at ETOS 2018