Genotyping somatic insertions and deletions
Louis Dijkstra, Johannes Köster, Tobias Marschall, Alexander Schönhuth
HiTSeq 2016
Somatic indels
Deletions:
tumor      CAGCATTGAAATA----GGCACAT------CGAA
healthy    CAGCATTGAAATATATAGGCACAT------CGAA
reference  CAGCATTGAAATATATAGGCACATGCTGCTCGAA
Insertions:
tumor      CAGCATTGAAATATATAGGCACATGCTGCTCGAA
healthy    CAGCATTGAAATA----GGCACATGCTGCTCGAA
reference  CAGCATTGAAATA----GGCACAT------CGAA
In both examples, the TATA indel is somatic (tumor only) and the GCTGCT indel is germline (tumor and healthy).
Problem
Given:
aligned NGS reads from a tumor and a healthy sample
Find:
- somatic indels
- germline indels
and assess their significance while considering uncertainties
Allele frequency
Probability that a genome copy in the sample harbors the variant
homozygous: 1.0
heterozygous: 0.5
absent: 0.0
heterozygous in all tumor cells (tumor cells: 75% of the sample):
0.5 × 0.75 = 0.375
heterozygous in the red subclone (18% of the sample):
0.5 × 0.18 = 0.09
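The arithmetic above follows one rule: the expected allele frequency is the fraction of variant copies within an affected cell times the fraction of cells in the sample that carry the variant. A minimal Python sketch, assuming a diploid locus; the function name is hypothetical.

```python
def expected_allele_frequency(copy_fraction: float, cell_fraction: float) -> float:
    """Expected fraction of genome copies in a sample that carry a variant.

    copy_fraction: fraction of copies carrying the variant in an affected cell
        (diploid: 1.0 homozygous, 0.5 heterozygous, 0.0 absent).
    cell_fraction: fraction of cells in the sample that carry the variant,
        e.g. all tumor cells (tumor purity) or a single subclone.
    """
    for value in (copy_fraction, cell_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return copy_fraction * cell_fraction


print(expected_allele_frequency(0.5, 0.75))  # heterozygous in all tumor cells: 0.375
print(expected_allele_frequency(0.5, 0.18))  # heterozygous in the subclone: 0.09
```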
Types of evidence
Maximum likelihood allele frequency:
healthy: 1/2
tumor: 4/7
Internal segment:
Split read:
Uncertainties:
- alignment: correct locus?
- typing: supports variant?
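The maximum likelihood frequencies above can be read as simple read-count ratios if every read is assumed to be correctly placed and unambiguously typed, which is exactly the assumption the listed uncertainties call into question. A minimal Python sketch of that naive estimate; the counts (4 of 7 tumor reads supporting the variant) are a hypothetical reading of the example, and the function names are illustrative.

```python
import math
from fractions import Fraction


def ml_allele_frequency(supporting: int, total: int) -> Fraction:
    """Naive ML allele frequency from read counts, ignoring all uncertainty.

    If each read comes from the variant allele independently with probability
    theta, the likelihood theta^k * (1 - theta)^(n - k) peaks at theta = k / n.
    """
    if total <= 0 or not 0 <= supporting <= total:
        raise ValueError("need total > 0 and 0 <= supporting <= total")
    return Fraction(supporting, total)


def binomial_log_likelihood(theta: float, supporting: int, total: int) -> float:
    """log(theta^k * (1 - theta)^(n - k)), using 0 * log(0) = 0."""
    result = 0.0
    for count, prob in ((supporting, theta), (total - supporting, 1.0 - theta)):
        if count > 0:
            result += count * math.log(prob) if prob > 0 else -math.inf
    return result


# Hypothetical counts: 4 of 7 tumor reads support the variant.
print(ml_allele_frequency(4, 7))  # 4/7
# A grid search over the binomial likelihood lands next to the same value.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=lambda t: binomial_log_likelihood(t, 4, 7))
print(best)  # close to 4/7 (about 0.571)
```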
Calculate:
likelihood of allele frequency while considering uncertainties
Available for each read:
- probability that the alignment is correct (from MAPQ)
- probability that the alignment is associated with the variant
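The first per-read probability comes from the mapping quality. The SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, with 255 reserved for "not available", so it converts back as sketched below. The function name is illustrative.

```python
import math


def mapq_to_prob_correct(mapq: int) -> float:
    """Probability that an alignment is at the correct locus, given its MAPQ.

    SAM defines MAPQ = -10 * log10(P(mapping position is wrong)), rounded to
    the nearest integer; the value 255 means the quality is unavailable.
    """
    if mapq == 255:
        raise ValueError("MAPQ 255 means the mapping quality is unavailable")
    if mapq < 0:
        raise ValueError("MAPQ must be non-negative")
    return 1.0 - math.pow(10.0, -mapq / 10.0)


for q in (0, 10, 20, 30, 60):
    print(q, mapq_to_prob_correct(q))  # roughly 0.0, 0.9, 0.99, 0.999, 0.999999
```

Because MAPQ is rounded, the recovered probability is only approximate, which matters little for high-quality alignments but can matter near MAPQ 0.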
Naive solution:
sum over all possible combinations
(exponentially many summands)
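To see why the naive sum is expensive, here is a brute-force sketch under a hypothetical per-read model, not necessarily the one used in the talk. Each read is misaligned with probability 1 - p (p being the probability that the alignment is correct), or correctly aligned and drawn from the variant allele with probability p·θ, or from the reference allele with probability p·(1 - θ), where θ is the allele frequency. The per-read observation likelihoods `lik_alt`, `lik_ref` and `lik_wrong` are assumed inputs. Enumerating every assignment of these three states to n reads gives 3^n summands.

```python
from itertools import product
from typing import Sequence


def naive_likelihood(
    theta: float,
    p_correct: Sequence[float],
    lik_alt: Sequence[float],
    lik_ref: Sequence[float],
    lik_wrong: Sequence[float],
) -> tuple[float, int]:
    """Sum the joint probability over every latent state assignment.

    Returns the likelihood of theta and the number of summands (3^n).
    """
    n = len(p_correct)
    total, summands = 0.0, 0
    for states in product(("alt", "ref", "wrong"), repeat=n):
        term = 1.0
        for i, state in enumerate(states):
            if state == "alt":      # correct locus, read from the variant allele
                term *= p_correct[i] * theta * lik_alt[i]
            elif state == "ref":    # correct locus, read from the reference allele
                term *= p_correct[i] * (1.0 - theta) * lik_ref[i]
            else:                   # wrong locus: this factor does not depend on theta
                term *= (1.0 - p_correct[i]) * lik_wrong[i]
        total += term
        summands += 1
    return total, summands


likelihood, summands = naive_likelihood(
    0.5, [0.99, 0.9, 0.5], [0.8, 0.7, 0.9], [0.1, 0.2, 0.1], [0.5, 0.5, 0.5]
)
print(summands)  # 27 = 3^3
```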
Idea
Latent variable model
[Figure: latent variable model with nodes for typing, alignment, and observation]
Likelihood in linear time
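Under the same kind of hypothetical per-read model (each read misaligned with probability 1 - p, otherwise drawn from the variant allele with probability θ), independence of the reads given θ lets the exponential sum factor into a product of per-read sums, by the distributive law. The sketch below computes that product in O(n), in log space, and picks θ on a grid. The inputs `lik_alt`, `lik_ref` and `lik_wrong` are assumed (and must be positive); the code illustrates the linear-time idea, not PROSIC's actual likelihood.

```python
import math
from typing import Sequence


def log_likelihood(
    theta: float,
    p_correct: Sequence[float],
    lik_alt: Sequence[float],
    lik_ref: Sequence[float],
    lik_wrong: Sequence[float],
) -> float:
    """O(n) log-likelihood of allele frequency theta.

    Each factor marginalizes one read's latent alignment and typing states:
    P(obs_i | theta) = p_i * (theta * alt_i + (1 - theta) * ref_i) + (1 - p_i) * wrong_i
    """
    total = 0.0
    for p, alt, ref, wrong in zip(p_correct, lik_alt, lik_ref, lik_wrong):
        per_read = p * (theta * alt + (1.0 - theta) * ref) + (1.0 - p) * wrong
        total += math.log(per_read)  # summing logs avoids underflow for many reads
    return total


def ml_theta(p_correct, lik_alt, lik_ref, lik_wrong, steps: int = 1000) -> float:
    """Grid-search maximum likelihood estimate of theta in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, p_correct, lik_alt, lik_ref, lik_wrong))


# Hypothetical reads: four are confidently placed (three favor the variant,
# one the reference); the fifth favors the variant but its alignment is uncertain.
p = [0.99, 0.99, 0.99, 0.99, 0.5]
alt = [0.9, 0.9, 0.9, 0.1, 0.9]
ref = [0.1, 0.1, 0.1, 0.9, 0.1]
wrong = [0.5] * 5
print(ml_theta(p, alt, ref, wrong))
```

Down-weighting the uncertain fifth read through p, instead of discarding or fully counting it, is what distinguishes this estimate from the plain read-count ratio.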
Joint model
[Figure: joint model, including tumor purity]
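A joint model ties the two samples together through tumor purity. The sketch below is a simplified, hypothetical version: it replaces the per-read latent variable likelihood with a plain binomial on read counts, assumes a diploid healthy genome, and treats the tumor sample as a mixture in which a fraction `purity` of the cells are tumor cells. The names and counts are illustrative.

```python
import math


def binom_loglik(af: float, supporting: int, total: int) -> float:
    """log(af^k * (1 - af)^(n - k)), using 0 * log(0) = 0."""
    result = 0.0
    for count, prob in ((supporting, af), (total - supporting, 1.0 - af)):
        if count > 0:
            result += count * math.log(prob) if prob > 0 else -math.inf
    return result


def tumor_sample_af(purity: float, af_tumor_cells: float, af_healthy: float) -> float:
    """Allele frequency in the tumor sample, a mixture of tumor and healthy cells."""
    return purity * af_tumor_cells + (1.0 - purity) * af_healthy


def joint_ml(healthy_counts, tumor_counts, purity, steps=100):
    """Pick (healthy AF, tumor-cell AF) maximizing the joint likelihood.

    Healthy AF is restricted to diploid genotypes {0, 0.5, 1}; tumor-cell AF is
    searched on a grid, since subclones can yield any value in [0, 1].
    Returns (None, None) if every combination has zero likelihood.
    """
    best = (-math.inf, None, None)
    for af_h in (0.0, 0.5, 1.0):
        for j in range(steps + 1):
            af_t = j / steps
            ll = binom_loglik(af_h, *healthy_counts) + binom_loglik(
                tumor_sample_af(purity, af_t, af_h), *tumor_counts
            )
            if ll > best[0]:
                best = (ll, af_h, af_t)
    return best[1], best[2]


# Hypothetical (supporting, total) read counts with tumor purity 0.75.
print(joint_ml((0, 30), (15, 40), 0.75))   # (0.0, 0.5)
print(joint_ml((15, 30), (20, 40), 0.75))  # (0.5, 0.5)
```

A result with healthy AF 0 and tumor-cell AF above 0 corresponds to a somatic variant (first example); healthy AF 0.5 or 1 corresponds to a germline variant (second example).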
Simulation study
Healthy:
Venter's genome (30x)
Tumor:
Venter's genome + somatic variants (40x)
Results
Conclusion
A latent variable model for calling somatic insertions and deletions that considers
- segment and split read evidence,
- alignment uncertainty,
- typing uncertainty,
- tumor purity.
Benefits:
- Assess significance of somatic variants.
- Better recall and precision.
- Estimate allele frequency.
https://prosic.github.io
https://bioconda.github.io
Acknowledgements
Louis Dijkstra
Tobias Marschall
Alexander Schönhuth
By Johannes Köster
PROSIC talk for HiTSeq 2016