PhD Defense
DELEVOYE Guillaume
09/06/2022
Supervisor : Dr MEYER Eric
Jury members : Dr DUHARCOURT Sandra, Dr CHEN Chunlong, Dr DURET Laurent
A. van Leeuwenhoek (1668)
"Animalcules"
Pasteur (1862)
Spontaneous generation
HS Jennings (~ 1900)
Paramecium as a model
T. Sonneborn (1937)
Non-Mendelian inheritance of mating type in Paramecium
Carol Greider & Elizabeth Blackburn (1985)
Nobel prize (telomeres in Tetrahymena)
Teams of E. Meyer and S. Duharcourt (2014)
Mating type determined not by DNA but by maternal RNAs, in Paramecium
> The genome-wide programmed rearrangements <
Unicellular eukaryote with 3 nuclei:
DNA ratio: 1 MIC for 200 MAC
After sexual processes, a new MAC is formed, with extensive genome rearrangements
Results in a MAC DNA almost purely made of coding sequences
Coyne et al. 2012
49.260 unique sequences
O. Arnaiz et al 2012
PiggyMac (Pgm)
"Invade Bloom Abdicate Fade" model
Adapted from Glen Arthur Herrick 1997
100%
O. Arnaiz et al. 2012
Not sufficient to distinguish IESs from the rest of the genome
S. E. Allen and M. Nowacki - 2017
If not in the maternal MAC : Recognized and excised
Inactivation of scnRNA and iesRNA pathways:
All IESs may be recognized through the small RNAs
... but is there a redundant system for the oldest/shortest ones ?
Research question ~ Self vs non-self recognition
Hypothesis : DNA methylation
5th base <-> PERMANENT
2.1-2.5% in the MAC and MIC of P. aurelia (Cummings et al. 1975)
N6-methyladenine (6mA) abundant in Paramecium:
?
Other:
If maintained in the whole cell cycle in the MAC
Could explain :
% tetrahymena oxytricha
Tetrahymena: no 6mA in the MIC
Point out that it is a palindrome
Link to a maintenance methylase (for discussion)
Transient?
DNA modifications could also play a role in the new MAC in formation (transiently)
Part of the scnRNA pathway
[ EVEN IF ... ] Do scnRNAs generate the methylation to guide more precisely?
Not scnRNA
And many other possibilities...
Maintenance methylase (if palindromic)
> Sequenced with PacBio SMSN sequencing
WT Veg
Control
silencing
T=2h
T=6h
RNA interference
Candidate methylases
Reduction 6mA
Southwestern blot
Total DNA
1:200 MIC !!!
LIST THE SAMPLES
First methylase != first 6mA
Southwestern: 90%. Why not us?
MTA1 -- ortholog 4-9-10
MTA9 -- Not catalytic in Tetrahymena [..] --> MT1A1B2
$$ipdRatio = \frac{MeanIPD_{experiment}}{MeanIPD_{unmethylated\ control}}$$
~ 85% accuracy
~ 100% accuracy
Kinetic signatures
Nucleotide context
DNA modifications
depending on
6mA
12 nucleotides in the channel
1 sequence of 12 = one speed
Only SMSN + in-silico are compatible with our strategy
No analysis pipeline existed
> 25 measures
SMSN: Same molecule, measured multiple times
AggSN: Aggregation of distinct molecules
$$ipdRatio = \frac{MeanIPD_{experiment}}{MeanIPD_{unmethylated\ control}}$$
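The ratio above can be sketched for a single genomic position as follows. This is a minimal illustration, not the pipeline used in the thesis: the function name `ipd_ratio` and the IPD values are hypothetical, and the control is assumed to be an unmethylated sample such as WGA DNA.

```python
from statistics import mean


def ipd_ratio(sample_ipds, control_ipds):
    """Return mean IPD of the sample divided by mean IPD of the control.

    Both arguments are lists of inter-pulse durations measured at the
    same position (hypothetical inputs, for illustration only).
    """
    if not sample_ipds or not control_ipds:
        raise ValueError("both IPD lists must be non-empty")
    control_mean = mean(control_ipds)
    if control_mean <= 0:
        raise ValueError("control mean IPD must be positive")
    return mean(sample_ipds) / control_mean


# Hypothetical IPD values (seconds) for one adenine
native_ipds = [1.8, 2.4, 2.1, 1.9]   # native DNA, possibly methylated
control_ipds = [0.5, 0.7, 0.6, 0.6]  # unmethylated control (e.g. WGA)
ratio = ipd_ratio(native_ipds, control_ipds)
print(f"ipdRatio = {ratio:.2f}")
```

A ratio well above 1 means the polymerase paused longer on the native molecule than on the control at that position.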
Whole Genome Amplified DNA (WGA)
Machine-learning
"in-silico"
IES
Other MIC
Other MIC
IES
MAC-Destined Sequences (MDS)
MAC
TA Junction
Talk about small inserts
That is,
Orders of magnitude :
This is not much, but if we are right, 100% of the scnRNA-independent IESs could be methylated
P(R)
1 - P(R)
5 reads IES -
1 read IES +
2 reads IES +
$$IRS_L = \frac{2}{2+5} \approx 29\%$$
$$IRS_R = \frac{1}{1+5} \approx 17\%$$
The higher the IRS, the higher the retention.
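The IRS computation above can be written as a short function. This is a sketch using the read counts from the slide (5 IES- reads, 2 and 1 IES+ reads); the function name `irs` is hypothetical.

```python
def irs(ies_plus_reads, ies_minus_reads):
    """IES Retention Score at one boundary: IES+ / (IES+ + IES-)."""
    total = ies_plus_reads + ies_minus_reads
    if total == 0:
        raise ValueError("no reads cover this boundary")
    return ies_plus_reads / total


irs_left = irs(2, 5)   # 2 / (2 + 5)
irs_right = irs(1, 5)  # 1 / (1 + 5)
print(f"IRS_L = {irs_left:.3f}, IRS_R = {irs_right:.3f}")
```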
e.g.
MIC = 4n, MAC = 800n, R = 0.005 , N = 100 NGS reads
$$\mathbb{E}(IRS)= 0$$
$$P(MIC|IES+) = 50\%$$
No !
Even a low IRS can be problematic for us !
When N is small (~100), small retention levels simply cannot be seen
Due to the MAC ploidy, even the slightest retention leads to $$P(MIC|IES^+) < P(MAC|IES^+)$$
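The 50% figure in the example can be checked with Bayes' rule. The sketch below assumes that reads are sampled in proportion to ploidy, that every MIC copy carries the IES, and that a fraction R of MAC copies retains it; the function name is hypothetical.

```python
def p_mic_given_ies_plus(mic_ploidy, mac_ploidy, retention):
    """Return (P(MIC | IES+), expected IRS).

    Assumptions (not stated as code in the slides):
      P(IES+ | MIC) = 1 and P(IES+ | MAC) = retention,
      reads drawn proportionally to the ploidy of each nucleus.
    """
    p_mic = mic_ploidy / (mic_ploidy + mac_ploidy)
    p_mac = 1.0 - p_mic
    expected_irs = p_mic + retention * p_mac
    return p_mic / expected_irs, expected_irs


# Values from the slide: MIC = 4n, MAC = 800n, R = 0.005
posterior, expected_irs = p_mic_given_ies_plus(4, 800, 0.005)
print(f"P(MIC|IES+) = {posterior:.2f}, expected IRS = {expected_irs:.4f}")

# A retention ten times higher makes the MAC the dominant source of IES+ reads
posterior_high, _ = p_mic_given_ies_plus(4, 800, 0.05)
```

With N = 100 reads and an expected IRS of about 1%, roughly one IES+ read is expected, which is why such retention levels are hard to see.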
Let's just keep all IESs with an IRS = 0 ?
??
Do it for each IES
Implicitly (pooled: depends on the IES more than on the replicate)
Four options to estimate R:
Hamiltonian Monte Carlo
Inverse transform (calculus)
Rejection sampling Monte Carlo
Bayesian approaches (credible intervals)
Frequentist approach (confidence interval)
Computation time
Hard to implement
Expected to give similar results
$$\mathbb{E}(IRS) = P(MIC) + P(R) \cdot P(MAC) $$
$$P(MIC|IES+) \in [9.5\% - 93.7\%], \alpha = 5\%$$
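To illustrate why such intervals are wide, the sketch below propagates a confidence interval on the observed IRS to P(MIC|IES+) = P(MIC) / E(IRS). The Wilson score interval is an assumption chosen for simplicity; the slides do not state which interval was used, so the numbers are not expected to match the ones above exactly. Function names and default ploidies are illustrative.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def p_mic_bounds(k, n, mic_ploidy=4, mac_ploidy=800):
    """Bounds on P(MIC | IES+) given k IES+ reads among n reads."""
    p_mic = mic_ploidy / (mic_ploidy + mac_ploidy)
    irs_low, irs_high = wilson_interval(k, n)
    # P(MIC|IES+) decreases as the IRS grows; the IRS cannot be below P(MIC)
    upper = 1.0 if irs_low <= p_mic else p_mic / irs_low
    lower = min(1.0, p_mic / irs_high)
    return lower, upper


low, high = p_mic_bounds(1, 100)  # 1 IES+ read among 100 reads
print(f"P(MIC|IES+) in [{low:.1%}, {high:.1%}]")
```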
Problem : The confidence intervals are very wide
For most IESs, we will simply not be able to tell whether an IES+ read comes from the MAC or the MIC
Workaround : Pooling samples
Rare picture of Eric, doing some archaeology to find more samples to pool and gain coverage (circa 2022, colourized)
MITO --> Use it to conclude: 50% retained when IES
Implicitly (pooled: depends on the IES more than on the replicate)
State explicitly that these are sequencing runs of total DNA from vegetative cells
REMOVE THE HISTOGRAM
If MAC ploidy = 800n, then without retention :
$$E(IRS) = \frac{4}{800+4} \approx 0.005$$
If retention :
$$E(IRS) \gg \frac{4}{800+4}$$
0.002-0.003
Pooling samples is not sufficient !
On average, we will have only very vague estimates of P(MIC|IES+) !
Computed with Kmac = 1600n
E. coli is used to feed Paramecium (contaminants)
So we can use it to test the pipeline
Separability and coverage are correlated
Either a nucleotide is methylated, or it is not :
Our pragmatic solution : An arbitrary linear threshold
If we make the simplification that all GATC/EcoK sites are methylated and that 6mA is only present there :
$$Sensitivity = P(D|M)$$
$$Se = 92\%$$
But :
$$Specificity = P(\overline{D}|\overline{M})$$
$$Sp = 99.8\%$$
PacBio sequencing was already known for its propensity to generate false positives for 4mC (K. O’Brown et al. 2014)
Qv30
Is it the same for PCR-amplified DNA?
• Between 1.25% and 1.45% of 6mA in the MAC
• Between 97.39 and 100% of them are located in AT sites
Taking into account the uncertainty on Se and Sp :
Problem : Some results will vary greatly depending on Se and Sp !
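The dependence on Se and Sp can be illustrated with a standard prevalence correction: if a fraction d of adenines is detected as methylated, then d = Se·m + (1 - Sp)·(1 - m), which can be solved for the true fraction m. This is a sketch under that assumption, not necessarily the exact correction used here; the detected fraction of 1.4% and the Sp values other than 99.8% are hypothetical.

```python
def corrected_fraction(detected_fraction, se, sp):
    """Solve d = Se*m + (1 - Sp)*(1 - m) for the true fraction m."""
    youden = se + sp - 1.0
    if youden <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to work")
    m = (detected_fraction + sp - 1.0) / youden
    return min(1.0, max(0.0, m))  # clip to a valid proportion


def predicted_fdr(true_fraction, se, sp):
    """Expected share of false positives among detections: FP / (FP + TP)."""
    tp = se * true_fraction
    fp = (1.0 - sp) * (1.0 - true_fraction)
    if tp + fp == 0:
        return 0.0
    return fp / (tp + fp)


detected = 0.014  # hypothetical detected fraction of adenines
results = []
for sp in (0.997, 0.998, 0.999):  # small changes in Sp around 99.8%
    m = corrected_fraction(detected, se=0.92, sp=sp)
    results.append((sp, m, predicted_fdr(m, 0.92, sp)))
    print(f"Sp = {sp:.3f}: corrected 6mA = {m:.2%}, FDR = {results[-1][2]:.1%}")
```

Because methylated sites are rare, a change of 0.1 point in Sp shifts both the corrected fraction and the predicted FDR noticeably. Where no site is truly methylated, every detection is a false positive and the predicted FDR is 100%.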
After applying the corrections
Add colors
Perfect
Underestimation
Overestimation
A few confusions
-50% NM4+9+10
>> Bulk of 6mA in the MAC
Other candidates too
Same as in Tetrahymena
Symmetrical methylation of hemi-methylated AT sites
Rise in hemi-methylation, whose magnitude depends strongly on how well Se and Sp are estimated
De novo methylation of unmethylated AT sites
The capacity to make symmetrical methylation is never abolished completely
Predicted FDR : 100%
But likely detection outside of AT sites too
AGAA and GAGG motifs are documented as methylated sites (6mA) in C. elegans too (Greer et al. 2015)
Number of molecules with at least one exploitable adenine
- several IESs
- variable MAC regions
- extremity outliers
VARIABLE REGIONS !!!!!!!!
The vast majority of IES+ molecules actually come... from the MAC !!!
If p is the number of positive detections among N tests:
p = FP + TP
So,
Which means
And:
Let FD1 and FD2 be, respectively:
Then:
With
We can also find the number of hemi-methylated sites being detected as such, and the proportion of sites detected as hemi-methylated that are really hemi-methylated. This is possible because we now approximately know PZ0, PZ1 and PZ2, and P(D|Z) is easy to determine:
Then, P(Z|D) can be determined through Bayes' theorem using P(D|Z), P(Z) and P(D) (which are all known).
P(Z=1|D=1) is our case of interest
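A minimal sketch of this Bayes step is given below. It assumes that Z is the number of truly methylated strands at an AT site (0, 1 or 2), that D is the number of strands detected as methylated, and that detection is independent on each strand with the same Se and Sp. The prior values for P(Z) are hypothetical placeholders, not results.

```python
def likelihood(d, z, se, sp):
    """P(D = d | Z = z) for a two-strand site, strands treated independently."""
    fp = 1.0 - sp  # false-positive rate on an unmethylated strand
    if z == 0:
        probs = [(1 - fp) ** 2, 2 * fp * (1 - fp), fp ** 2]
    elif z == 1:
        probs = [(1 - se) * (1 - fp), se * (1 - fp) + (1 - se) * fp, se * fp]
    elif z == 2:
        probs = [(1 - se) ** 2, 2 * se * (1 - se), se ** 2]
    else:
        raise ValueError("z must be 0, 1 or 2")
    return probs[d]


def posterior(z, d, prior, se, sp):
    """P(Z = z | D = d) through Bayes' theorem."""
    evidence = sum(likelihood(d, k, se, sp) * prior[k] for k in range(3))
    return likelihood(d, z, se, sp) * prior[z] / evidence


# Hypothetical priors P(Z=0), P(Z=1), P(Z=2); Se and Sp from the slides
prior = [0.10, 0.05, 0.85]
p_hemi = posterior(1, 1, prior, se=0.92, sp=0.998)
print(f"P(Z=1 | D=1) = {p_hemi:.3f}")
```

With these placeholder priors, most sites detected as hemi-methylated are in fact fully methylated sites where one strand was missed, which is why the estimate is so sensitive to Se.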
modelPrediction is the IPD value predicted by the model for the nucleotide context at this position.
globalIPD is the mean of all the IPD values of the read.
localIPD represents all IPDs that have been mapped at a given position in the genome, including those from other sequences.
Conclusion on the capping
Laura Landweber 2020
Oxytricha trifallax
A outAT score 20 isQv20 (812 seq)
A outAT score20 idQv20 + Strong BH correction (176 seq)