Single-molecule DNA methylation analysis by SMRT sequencing of short inserts
DELEVOYE Guillaume - 1st-year PhD student
Supervisors: Eric Meyer, Mathieu Bahin
ANR Meeting 08/04/19
Thymine = 5-methyluracil?
<-- 3 known frequent DNA methylations that we can detect with PacBio
Figure labels: coverage 25X, 25X, 250X; accuracy 99%, max 75%
Unusual Slowing: Modification Score (Qv)
Kinetic signature: Identification score (IdQv)
A trained (ML) model detects suspicious polymerase slowdowns as a function of the -3/+8 nt sequence context --> IPDs are captured
~ PHRED scores
Classical PacBio approach: higher coverage by overlaying the holes
Our approach: shorter inserts, true single-hole analysis, many more passes
We ~always have either 0X or >>> 25X
Majority of MDS come from the MAC
Murphy's Law statistically strikes often when you try 500,000 times
0 - Quality filter (Z-score)
1 - Create the consensus (=CCS)
2 - Map the CCS on MAC / MIC / MAC+IES (BLASR)
--> Only the best alignment reported (forced)
3 - Filter at >99% identity on at least one genome
4 - Compare the mappings
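Steps 3 and 4 can be sketched as follows. This is only an illustration, not the pipeline's actual code: the function name `classify_ccs`, the dictionary layout, and the handling of ties are assumptions. Only the >99% identity filter and the three references (MAC, MIC, MAC+IES) come from the slides.

```python
def classify_ccs(identities, min_identity=99.0):
    """Assign a CCS to the reference(s) on which it maps best.

    identities: dict mapping a reference name ("MAC", "MIC", "MAC+IES")
    to the percent identity of its best alignment, or None if unmapped.
    Returns a sorted list of the best references, or [] if the CCS
    does not pass the identity filter on any genome.
    (Hypothetical helper; names and layout are assumptions.)
    """
    # Step 3: keep only alignments strictly above the identity threshold.
    passing = {ref: pid for ref, pid in identities.items()
               if pid is not None and pid > min_identity}
    if not passing:
        return []
    # Step 4: compare the mappings and keep the best-scoring reference(s).
    best = max(passing.values())
    return sorted(ref for ref, pid in passing.items() if pid == best)
```

A CCS that maps equally well on MAC and MAC+IES stays ambiguous in this sketch; one that maps better on MIC or MAC+IES than on MAC would be a candidate MIC sequence.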
Reminder: CCS are expected to be around 99% accurate
Sequences that come from MIC
N changes
"=" becomes "I"
The polymerase makes random pauses that are not linked to DNA modification
--> Values are "capped". For every position in the reference:
cappingValue = max(99th percentile of the chunk, 4 * model, 75th local percentile)
It doesn't change much
But strictly speaking it is not the same
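The capping rule can be illustrated with a short sketch. It assumes that "99th chunk" means the 99th percentile of the IPDs in the current chunk, that "model" is the IPD predicted by the model at that position, and that the local percentile is computed over the IPDs observed at that position. The helper names `percentile` and `cap_ipds` are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    rank = (len(data) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(data) - 1)
    return data[low] + (data[high] - data[low]) * (rank - low)


def cap_ipds(position_ipds, chunk_ipds, model_ipd):
    """Cap the IPDs of one reference position.

    cap = max(99th percentile of the chunk,
              4 * model prediction,
              75th percentile of the IPDs at this position)
    Returns (capped IPDs, cap). Interpretation of each term is an
    assumption, see the text above.
    """
    cap = max(percentile(chunk_ipds, 99),
              4.0 * model_ipd,
              percentile(position_ipds, 75))
    return [min(v, cap) for v in position_ipds], cap
```

For example, with IPDs `[1.0, 2.0, 3.0, 50.0]` at a position, the long pause of 50.0 is pulled down to the cap while the other values are left untouched.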
Which score thresholds should we use?
The modification/identification scores are presented by PacBio as PHRED scores
If this were true, about 30% of the bases would be modified in our experiments
--> Modification scores are NOT PHRED scores (at least in our case)
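For reference, a true PHRED score Q corresponds to an error probability p = 10^(-Q/10). The sketch below shows what that interpretation would imply for a given score. The function names are illustrative and not part of any PacBio tool.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a PHRED score q."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p):
    """PHRED score corresponding to an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)
```

Under a genuine PHRED interpretation, a call at score 20 would be wrong about 1% of the time, and a call at score 30 about 0.1% of the time.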
HT2
HT6
HTVEG
MAB
MT1A-1B
MT1A-1B-2
MT2
NM9_10
NM4_9_10
WT
Silenced
out the AT
HTVEG
~ Same for every silencing experiment
The lack of sequences (~50 vs. ~2,000) does not really allow a comparison between MIC and MAC
Qv20/IdQv20
Logo from HTVEG MAC
Identical everywhere
Number of MAC_IES sequences: will it be enough?
Capping ok
In silico control --> Experimental
Single-hole coverage threshold?
The threshold is still a more or less open question
Decrease of m6A in some silencings, but which one disappears?
2.5% m6A in MAC <-> Bad calibration? MDS missing? Lack of sensitivity?
NM4_9_10 --> 10%? Recalibration by optimization?
m4C signal: HT2 = HT6 < HTVEG