PhD Defense
DELEVOYE Guillaume
09/06/2022
Supervisor: Dr MEYER Eric
Jury members: Dr DUHARCOURT Sandra, Dr CHEN Chunlong, Dr DURET Laurent
Barbara McClintock discovers the Ac/Ds elements in maize
Nobel Prize (1983)
Ds (non-autonomous) jumps only in the presence of Ac
(Finnegan)
"Copy and paste" versus "Cut and paste"
TEs were first considered as:
TEs persist not because they increase the fitness of their host, but despite the fact that they don't. This is non-phenotypic selection.
Doolittle & Sapienza and Orgel & Crick (1980)
Published the same day in Nature
Wicker et al. 2007
Based on:
The cut-and-paste versus copy-and-paste distinction turned out to be less relevant over time
The paradigm is currently shifting from "Junk DNA" to "Major actors of evolution".
$$\sim2^N$$
> Any particularly vulnerable organism has been wiped out in the past
> Logical conclusion: all remaining life forms have some kind of resilience towards TEs.
> All virulent TEs have wiped out their hosts (and disappeared with them)
Many epigenetic regulation mechanisms exist:
P. tetraurelia has an original way of dealing with TEs
A. van Leeuwenhoek (1668)
"Animalcules"
Pasteur (1862)
Spontaneous generation
H. S. Jennings (~1900)
Paramecium as a model
T. Sonneborn (1937)
Non-Mendelian inheritance of sexual (mating) type in Paramecium
Carol Greider & Elizabeth Blackburn
Telomerase (1985), Nobel Prize
Meyer and Duharcourt (2014)
In Paramecium, sexual (mating) type is inherited via maternal RNAs
> The genome-wide programmed rearrangements <
Unicellular eukaryote with 3 nuclei (2 micronuclei, MIC, and 1 macronucleus, MAC):
DNA ratio: 1 MIC for 200 MAC
A new MAC develops from a MIC, through extensive genome rearrangements
The resulting MAC DNA is made almost entirely of coding sequences
Coyne et al. 2012
49,260 unique sequences
O. Arnaiz et al. 2012
PiggyMac (Pgm)
O. Arnaiz et al. 2012
This is not sufficient for the cell to distinguish IESs from the rest of the genome
E. Allen and M. Nowacki (2017)
6mA is abundant in Paramecium:
2.5% in the MAC and MIC of P. aurelia (Cummings et al. 1975)
1) Constant pattern in the MIC
2) In the newly forming MAC
Transient?
And many other possibilities...
| Silencing | Objective | Target of interest | Location of interest |
| --- | --- | --- | --- |
| None | WT methylation (MIC and MAC) | 6mA + ? | MIC and MAC |
| Control gene | Control | 6mA + ? | MIC and MAC |
| None | Pattern right before excision? | 6mA + ? | Newly forming MAC |
| None | Pattern right before excision? | 6mA + ? | Newly forming MAC |
| NM4 | Bulk of 6mA | 6mA | MAC ++ |
| NM9 + NM10 | Bulk of 6mA | 6mA | MAC ++ |
| NM4 + NM9 + NM10 | Bulk of 6mA | 6mA | MAC ++ |
| MT1A | Permanent pattern erased? | ? | MIC |
| MT1A + MT1B | Permanent pattern erased? | ? | MIC |
| MT1A + MT1B + MT2 | Permanent pattern erased? | ? | MIC |
> Sequenced with PacBio SMSN sequencing
$$ipdRatio = \frac{MeanIPD_{experiment}}{MeanIPD_{unmethylated\ control}}$$
~ 85% accuracy
~ 100% accuracy
The polymerase slows down around modified nucleotides (time ~×100)
An analysis for each nucleotide, on each strand, of each molecule (SMSN = Single-Molecule Single Nucleotide)
Detectable modifications: 4mC, 5mC, 6mA, "other"
$$ipdRatio = \frac{MeanIPD_{experiment}}{MeanIPD_{control}}$$
A) Control = Whole Genome Amplified (WGA) DNA
B) Control = Machine-learning (nucleotide context)
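For option A, the ratio can be sketched per genomic position as below. This is a minimal sketch, assuming IPDs have already been aligned and grouped by (position, strand); the dictionary layout and the function name `ipd_ratios` are hypothetical and do not reproduce the output format of PacBio's tools.

```python
from statistics import mean


def ipd_ratios(sample_ipds, control_ipds):
    """Per-position ipdRatio = mean IPD in the sample / mean IPD in the control.

    Both arguments map a hypothetical (position, strand) key to the list of
    IPDs observed there. Positions without control data are skipped.
    """
    ratios = {}
    for key, ipds in sample_ipds.items():
        control = control_ipds.get(key)
        if not ipds or not control:
            continue  # nothing to compare at this position
        ratios[key] = mean(ipds) / mean(control)
    return ratios


# Toy values: position 10 on the + strand is slowed down in the sample.
sample = {(10, "+"): [2.0, 2.4, 2.2], (11, "+"): [1.0, 0.9, 1.1]}
wga = {(10, "+"): [0.5, 0.6, 0.5], (11, "+"): [1.0, 1.0, 1.0]}
print(ipd_ratios(sample, wga))
```

A ratio well above 1 flags a slowed-down, possibly modified, position; a ratio near 1 means the sample behaves like the unmethylated control.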
E. coli is used to feed Paramecium (source of contaminating reads)
Separability and coverage are correlated
Either a nucleotide is methylated, or it is not:
Our pragmatic solution: an arbitrary linear threshold
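One way to picture such a threshold is as a line in the (coverage, score) plane: a site is called methylated when its score lies above `intercept + slope * coverage`. This is a sketch under assumptions: the choice of score, and the values of `intercept` and `slope`, are hypothetical placeholders, not the thresholds used in this work.

```python
from dataclasses import dataclass


@dataclass
class Site:
    coverage: int  # number of reads covering the site
    score: float   # modification score at the site (hypothetical choice of score)


def is_methylated(site, intercept=20.0, slope=0.5):
    """Call a site methylated if it lies above score = intercept + slope * coverage.

    intercept and slope are hypothetical, hand-picked values."""
    return site.score > intercept + slope * site.coverage


sites = [Site(30, 50.0), Site(30, 30.0), Site(100, 60.0)]
print([is_methylated(s) for s in sites])  # → [True, False, False]
```

Making the threshold depend on coverage is one way to account for separability changing with coverage; a single fixed cut-off would treat low- and high-coverage sites identically.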
If we assume that all GATC/EcoK sites are methylated and that 6mA is only present there:
$$Sensitivity = P(D|M)$$
$$Se = 92\%$$
But :
$$Specificity = P(\overline{D}|\overline{M})$$
$$Sp = 99.8\%$$
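Under this simplification, both quantities reduce to counts over E. coli adenines: detected GATC/EcoK adenines over all GATC/EcoK adenines (Se), and undetected other adenines over all other adenines (Sp). The counts below are made up, chosen only so that they reproduce the reported Se and Sp.

```python
def sensitivity(tp, fn):
    """Se = P(D | M): fraction of methylated sites that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = P(not D | not M): fraction of unmethylated sites left undetected."""
    return tn / (tn + fp)


# Hypothetical counts: 1,000 GATC/EcoK adenines (M) and 100,000 other adenines.
tp, fn = 920, 80
tn, fp = 99_800, 200
print(f"Se = {sensitivity(tp, fn):.1%}, Sp = {specificity(tn, fp):.1%}")
print(f"False positives: {fp}, true positives: {tp}")
```

The false-positive count scales with the number of unmethylated adenines: with 10,000,000 of them, the same Sp of 99.8% would give about 20,000 false positives.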
PacBio sequencing was already known for its propensity to generate false positives for 4mC (K. O’Brown et al. 2014)
[Figure: IES, other MIC sequences, MAC-Destined Sequences (MDS), MAC, TA junction]
That is, ~100 to 300 IES+ sequences per experiment
Orders of magnitude:
Let p be the number of positive detections among N tests:
p = FP + TP
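To see how p relates to the true proportion of methylated sites, assume the tests are independent and Se and Sp take the values estimated on E. coli. With π the (unknown) fraction of truly methylated sites, the expected counts are TP = N·π·Se and FP = N·(1 - π)·(1 - Sp). Solving p = FP + TP for π gives π = (p/N + Sp - 1) / (Se + Sp - 1), the Rogan-Gladen correction. This is a standard way to invert the relation, not necessarily the derivation used on the original slides; the function names are hypothetical.

```python
def expected_fp_tp(n, pi, se, sp):
    """Expected false and true positives for n tests at prevalence pi."""
    fp = n * (1 - pi) * (1 - sp)
    tp = n * pi * se
    return fp, tp


def estimate_methylated_fraction(p, n, se, sp):
    """Invert p = n*pi*se + n*(1 - pi)*(1 - sp) for pi (Rogan-Gladen correction).

    Clipped to [0, 1], since sampling noise can push the raw estimate outside."""
    pi = (p / n + sp - 1) / (se + sp - 1)
    return min(max(pi, 0.0), 1.0)


n, pi, se, sp = 1_000_000, 0.001, 0.92, 0.998  # hypothetical values
fp, tp = expected_fp_tp(n, pi, se, sp)
print(f"FP = {fp:.0f}, TP = {tp:.0f}")
print(f"Recovered pi = {estimate_methylated_fraction(fp + tp, n, se, sp):.4f}")
```

In this toy case false positives (about 2,000) outnumber true positives (920) even with Sp = 99.8%, because methylated sites are rare.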
So,
Which means
And:
Let FD1 and FD2 be resp:
Then:
With
We can also estimate the number of hemi-methylated sites detected as such, and the proportion of sites detected as hemi-methylated that are truly hemi-methylated. This is possible because P(Z=0), P(Z=1) and P(Z=2) are now approximately known, and P(D|Z) is easy to determine:
Then P(Z|D) can be determined through Bayes' theorem from P(D|Z), P(Z) and P(D), which are all known.
P(Z=1|D=1) is our case of interest
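A sketch of this Bayes step follows. The assumptions are ours, not the slides': Z is the number of methylated strands at a site (0, 1 or 2, with Z = 1 meaning hemi-methylated), each strand is called independently with the Se and Sp above, and D = 1 means exactly one strand is called methylated. The prior values for P(Z) are placeholders.

```python
def p_hemi_call_given_z(se, sp):
    """P(D=1 | Z=z): probability that exactly one strand is called methylated."""
    return {
        0: 2 * (1 - sp) * sp,              # one false call, one correct non-call
        1: se * sp + (1 - se) * (1 - sp),  # right strand called, or missed plus a false call
        2: 2 * se * (1 - se),              # one strand detected, the other missed
    }


def posterior_z_given_hemi_call(prior, se, sp):
    """P(Z=z | D=1) = P(D=1 | Z=z) * P(Z=z) / P(D=1), by Bayes' theorem."""
    likelihood = p_hemi_call_given_z(se, sp)
    joint = {z: likelihood[z] * prior[z] for z in prior}
    p_d = sum(joint.values())  # P(D=1), by the law of total probability
    return {z: joint[z] / p_d for z in joint}


prior = {0: 0.990, 1: 0.002, 2: 0.008}  # hypothetical P(Z=0), P(Z=1), P(Z=2)
post = posterior_z_given_hemi_call(prior, se=0.92, sp=0.998)
print(f"P(Z=1 | D=1) = {post[1]:.2f}")
```

With these made-up priors, only about a quarter of the hemi-methylation calls would be truly hemi-methylated sites.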
modelPrediction is the IPD value predicted by the model for the nucleotide context at this position.
globalIPD is the mean of all IPD values of the read.
localIPD represents all IPDs mapped at a given position in the genome, including those from other sequences.
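The two aggregates defined above can be computed as follows, assuming each aligned read is given as a list of (genomic position, IPD) pairs on a single strand. The layout and function names are hypothetical, and how these quantities are combined with modelPrediction is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def global_ipd(read):
    """globalIPD: mean of all IPD values of one read."""
    return mean(ipd for _, ipd in read)


def local_ipds(reads):
    """localIPD: all IPDs mapped at each genomic position, pooled across reads."""
    pooled = defaultdict(list)
    for read in reads:
        for position, ipd in read:
            pooled[position].append(ipd)
    return dict(pooled)


reads = [
    [(100, 1.0), (101, 3.0)],
    [(101, 2.0), (102, 0.5)],
]
print([global_ipd(r) for r in reads])  # → [2.0, 1.25]
print(local_ipds(reads))               # → {100: [1.0], 101: [3.0, 2.0], 102: [0.5]}
```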
Conclusion on the capping
Laura Landweber 2020
Oxytricha trifallax
A outAT score20 idQv20 (812 seq)
A outAT score20 idQv20 + Strong BH correction (176 seq)
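The Benjamini-Hochberg (BH) step in the last filter can be sketched as follows. It assumes one p-value per candidate site, recovered from a Phred-scaled score (QV = -10·log10 p). How the p-values were actually derived, and what "strong" means here (for instance a lower FDR level), is not stated, so `alpha` and the scores are placeholders.

```python
def phred_to_p(qv):
    """Convert a Phred-scaled score QV = -10*log10(p) back to p (assumes Phred scaling)."""
    return 10 ** (-qv / 10)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans, True where the hypothesis is rejected at FDR level alpha.

    Sort p-values ascending, find the largest rank k with p_(k) <= k/m * alpha,
    and reject the k smallest p-values (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


qvs = [20, 40, 25, 13, 30]  # hypothetical idQv-like scores
pvals = [phred_to_p(q) for q in qvs]
print(benjamini_hochberg(pvals, alpha=0.01))  # → [False, True, True, False, True]
```

Lowering `alpha` makes the correction stricter, which is one plausible reading of how the stronger filter reduced the set from 812 to 176 sequences.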