Johannes Köster
https://koesterlab.github.io
small
structural
SNVs, Indels
Inversions, Duplications, Indels, ...
?
?
?
?
somatic
germline
small
structural
germline
somatic
Delly
GATK
Freebayes
Platypus
Pindel
Mutect
Strelka
Manta
Octopus
Lancet
small
structural
germline
somatic
Delly
GATK
Freebayes
Platypus
Pindel
Mutect
Strelka
Lancet
Manta
Octopus
arbitrary
\(\xi_i \sim \text{Bernoulli}(\alpha \theta_c \tau + (1-\alpha) \theta_h \tau)\)
\(\omega_i \sim Bernoulli(\pi_i)\)
\(Z_i \mid \xi_i, \omega_i=1 \sim\)
\(\beta\)
\(\Pr(Z_i \mid \theta_c,\theta_h,\beta) =\)
⊕ Bayes' theorem
= posterior probability for events of interest (combinations of allele frequencies)
samples:
normal:
resolution: 5
universe: "0.0 | 0.5 | 1.0 | ]0.0,0.5["
tumor:
resolution: 100
universe: "[0.0,1.0]"
contamination:
by: normal
fraction: 0.25
events:
germline: "normal:0.5 | normal:1.0"
somatic_tumor: "normal:0.0 & tumor:]0.0,1.0]"
Application: tumor/normal
##fileformat=VCFv4.3
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4
##fileformat=VCFv4.3
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/2:17:2 1/1:40:3
⊖
Problem:
accumulation of errors
samples:
normal:
resolution: 5
universe: "0.0 | 0.5 | 1.0 | ]0.0,0.5["
tumor:
resolution: 100
universe: "[0.0,1.0]"
contamination:
by: normal
fraction: 0.25
relapse:
resolution: 100
universe: "[0.0,1.0]"
contamination:
by: normal
fraction: 0.53
events:
germline: "normal:0.5 | normal:1.0"
somatic_normal: "normal:]0.0,0.5["
somatic_tumor: "normal:0.0 & tumor:]0.0,1.0]"
somatic_relapse: "normal:0.0 & tumor:0.0 & relapse:]0.0,1.0]"
Application: tumor/normal/relapse
samples:
romeo:
universe:
all: "0.0 | 0.5 | 1.0"
X: "0.0 | 1.0"
Y: "0.0 | 1.0"
juliet:
universe:
all: "0.0 | 0.5 | 1.0"
Y: "0.0"
events:
both_het: "juliet:0.5 & romeo:0.5"
both_hom: "juliet:1.0 & romeo:1.0"
juliet_only: "(juliet:0.5 | juliet:1.0) & romeo:0.0"
romeo_only: "(romeo:0.5 | romeo:1.0) & juliet:0.0"
Application: chromosome ploidy
samples:
tumor:
universe: "[0.0,1.0]"
resolution: 100
events:
ffpe_artifact: "(C>T | G>A) & tumor:]0.0,0.05["
present: "((C>T | G>A) & tumor:]0.05,1.0]) | (!(C>T | G>A) & tumor:]0.0,1.0[)"
Application: FFPE artifacts
Upcoming application: pan-calling
events:
present_in_any: "count(present) > 0"
recurrent: "count(present) >= 2"
Upcoming application: pan-calling
events:
somatic: "somatic_tumor | somatic_relapse"
recurrent: "count(somatic_tumor | somatic_relapse) >= 2"
https://varlociraptor.github.io
Köster lab
David Lähnemann
Jan Forster
Schönhuth lab
Louis Dijkstra (alumn.)
Tobias Marschall (alumn.)