Louis Dijkstra, Johannes Köster,
Tobias Marschall, Alexander Schönhuth
HiTSeq 2016
Deletions:
tumor      CAGCATTGAAATA----GGCACAT------CGAA
healthy    CAGCATTGAAATATATAGGCACAT------CGAA
reference  CAGCATTGAAATATATAGGCACATGCTGCTCGAA

Insertions:
tumor      CAGCATTGAAATATATAGGCACATGCTGCTCGAA
healthy    CAGCATTGAAATA----GGCACATGCTGCTCGAA
reference  CAGCATTGAAATA----GGCACAT------CGAA

somatic: variant present in the tumor only (TATA)
germline: variant present in both tumor and healthy sample (GCTGCT)
Given:
aligned NGS reads from tumor and healthy sample
Find:
somatic insertions and deletions, and assess their significance while considering uncertainties
Probability that genome copy in sample harbors variant
homozygous: 1.0
heterozygous: 0.5
absent: 0.0
heterozygous in all tumor cells (tumor purity 0.75):
0.5 × 0.75 = 0.375
heterozygous in the red subclone (18% of the cells in the sample):
0.5 × 0.18 = 0.09
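The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and constants are hypothetical, and the sketch assumes a diploid genome without copy number changes, so the allele frequency is the zygosity value times the fraction of cells in the sample that carry the variant.

```python
# Zygosity values as listed on the slide.
HOMOZYGOUS, HETEROZYGOUS, ABSENT = 1.0, 0.5, 0.0


def expected_allele_frequency(zygosity, cell_fraction):
    """Probability that a random genome copy in the sample harbors the variant.

    zygosity:      1.0 (homozygous), 0.5 (heterozygous) or 0.0 (absent)
    cell_fraction: fraction of cells in the sample carrying the variant
                   (assumption: diploid cells, no copy number changes)
    """
    if not 0.0 <= zygosity <= 1.0:
        raise ValueError("zygosity must lie in [0, 1]")
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie in [0, 1]")
    return zygosity * cell_fraction


# Heterozygous in all tumor cells, with tumor cells making up 75% of the sample.
all_tumor_cells = expected_allele_frequency(HETEROZYGOUS, 0.75)

# Heterozygous in a subclone that makes up 18% of the sample.
red_subclone = expected_allele_frequency(HETEROZYGOUS, 0.18)

print(all_tumor_cells, red_subclone)
```

The two calls reproduce the slide's values, 0.375 and 0.09 (up to floating-point rounding).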
Maximum likelihood allele frequency:
healthy: 1/2
tumor: 4/7
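As a minimal sketch of where such fractions come from: if every read could be assigned to the variant or to the reference without any uncertainty, the maximum likelihood allele frequency under a binomial model would simply be the share of variant-supporting reads. The read counts below (1 of 2 and 4 of 7) are an assumption chosen to match the fractions on the slide, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def ml_allele_frequency(variant_reads, total_reads):
    """Binomial maximum likelihood estimate: variant reads / all reads."""
    if total_reads <= 0 or not 0 <= variant_reads <= total_reads:
        raise ValueError("need 0 <= variant_reads <= total_reads and total_reads > 0")
    return Fraction(variant_reads, total_reads)


def binomial_likelihood(theta, variant_reads, total_reads):
    """Likelihood of allele frequency theta given perfectly typed reads."""
    k, n = variant_reads, total_reads
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


healthy = ml_allele_frequency(1, 2)  # assumed counts: 1 of 2 reads
tumor = ml_allele_frequency(4, 7)    # assumed counts: 4 of 7 reads

# Sanity check on a grid: no grid point beats the closed-form estimate.
grid = [i / 1000 for i in range(1001)]
best_on_grid = max(grid, key=lambda t: binomial_likelihood(t, 4, 7))

print(healthy, tumor, best_on_grid)
```

The grid search is only there to make the estimate tangible; it lands next to 4/7 because the binomial likelihood has a single peak at k/n.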
[Figure: read evidence from an internal segment and from a split read]
Uncertainties:
[Figure: uncertainties for an internal segment and for a split read]
Calculate:
likelihood of allele frequency while considering uncertainties
Available for each read:
Naive solution:
sum over all possible combinations
( summands)
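The cost of the naive summation can be illustrated with a generic latent variable sketch. The slide does not give the exact form of the model, so everything here is an assumption: each read gets one binary latent variable (from a variant copy or not), `p_variant[i]` and `p_reference[i]` are hypothetical per-read observation probabilities, and `theta` is the allele frequency. In this toy setting the naive sum runs over every 0/1 assignment of the reads, while a standard identity for independent reads collapses it into a product with one factor per read. Whether the authors use exactly this factorization is not stated in the source.

```python
from itertools import product
from math import isclose, prod


def naive_likelihood(theta, p_variant, p_reference):
    """Sum over all 0/1 assignments of reads to 'variant' or 'reference'."""
    n = len(p_variant)
    total = 0.0
    for assignment in product((0, 1), repeat=n):  # 2**n combinations
        term = 1.0
        for z, pv, pr in zip(assignment, p_variant, p_reference):
            # z == 1: read stems from a variant copy, z == 0: from a reference copy
            term *= theta * pv if z else (1.0 - theta) * pr
        total += term
    return total


def factorized_likelihood(theta, p_variant, p_reference):
    """Same quantity, computed as a product of per-read mixtures."""
    return prod(theta * pv + (1.0 - theta) * pr
                for pv, pr in zip(p_variant, p_reference))


# Hypothetical observation probabilities for five reads.
p_variant = [0.9, 0.8, 0.2, 0.7, 0.1]
p_reference = [0.1, 0.3, 0.9, 0.2, 0.8]

naive = naive_likelihood(0.375, p_variant, p_reference)
fast = factorized_likelihood(0.375, p_variant, p_reference)
print(naive, fast, isclose(naive, fast))
```

Both functions return the same value, but the first loops over 2**n terms and the second over n factors, which is why the naive route becomes impractical at realistic coverage.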
typing
alignment
observation
tumor purity
Healthy:
Venter's genome (30x)
Tumor:
Venter's genome + somatic variants (40x)
A latent variable model for calling somatic insertions and deletions that considers typing, alignment and observation uncertainty as well as tumor purity
Benefit:
https://prosic.github.io
https://bioconda.github.io
Louis Dijkstra
Tobias Marschall
Alexander Schönhuth