If we neglect the differences in genome size:
Ratio MIC/MAC ~ 1:200
P(MIC | Paramecium DNA ) = 4 / 804
In theory:
Note:
\(\ ^{(*)} \) Quite low
Illumina sequencing
(total DNA)
Count the ratio of IES+ reads over the total number of reads for each boundary (left / right) of each IES
Expected 1:200 when there is no retention
Expected > 1:200 if retention
The higher the coverage, the better
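A minimal sketch of the counting step above, assuming the IES+ and total read counts per boundary are already available. The function name and the read counts are hypothetical; only the 4/804 reference value comes from the slides.

```python
# Reference value from the slides: MIC share of total DNA (about 1:200).
EXPECTED_NO_RETENTION = 4 / 804


def ies_plus_ratio(ies_plus_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a boundary that carry the IES."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return ies_plus_reads / total_reads


# Hypothetical counts for one IES: (IES+ reads, total reads) per boundary.
counts = {"left": (4, 1000), "right": (12, 1000)}

for boundary, (ies_plus, total) in counts.items():
    ratio = ies_plus_ratio(ies_plus, total)
    # Naive comparison only: no statistical test is applied here.
    verdict = "above" if ratio > EXPECTED_NO_RETENTION else "not above"
    print(f"{boundary}: {ratio:.4f} ({verdict} the 1:200 expectation)")
```

This only compares point estimates; at low coverage the ratio is too noisy to interpret on its own.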
Thanks to O. Arnaiz and Eric's archeological efforts :
[ ... ]
CTL WT (Manip 1) + CTL WT (Manip 2) + CTL WT (Manip 3) + CTL WT (Manip 4) + CTL WT (Manip)
Pooled together: > 1,000X coverage
(Total DNA from WT cells)
(L)eft and (R)ight boundaries are considered separately
The "Support MAC" reads overlapping a TA are common to L and R
~ Variable MAC
(ploidy ratio unknown)
artifacts
(heavy tail truncated)
\(\ ^{(*)} \) The number of support_MAC reads is used as a proxy to have an approximate coverage, as it represents the vast majority of reads
Area of interest because the MIC/MAC ploidy ratio can be well estimated
Do we have \( \frac{nbIES+}{total\ reads} > \frac{1}{200}\ ^{(*)} \ (0.005) \) here?
No : \( \frac{nbIES+}{total\ reads} \approx 0.00368 \)
Reminder: Expected 1:200 when there is no retention
Expected > 1:200 if retention
\( \ ^{(*)} \) 4/804 \( \approx \) 1/200
If we consider
... Then we should expect :
Theory : 0.005 if no retention, more if retention
Need to zoom to see the details...
mean : 0.0025 (4/1604)
Monte-Carlo (4/1604)
No retention
Experimental
Predicted
Predicted
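A minimal sketch of how a Monte-Carlo prediction of this kind could be produced, assuming each read at a boundary is independently IES+ with probability 4/1604 (no retention, 4:1600 ratio). The coverage, the number of boundaries and the function name are hypothetical, and this is not necessarily the simulation used for the figure.

```python
import random


def simulate_ies_plus_fractions(p_ies_plus, coverage, n_boundaries, seed=0):
    """Simulate the IES+/total fraction for many boundaries.

    Each of the `coverage` reads at a boundary is drawn as IES+ with
    probability `p_ies_plus`, independently of the others.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_boundaries):
        ies_plus = sum(rng.random() < p_ies_plus for _ in range(coverage))
        fractions.append(ies_plus / coverage)
    return fractions


# No retention, MIC/MAC = 4:1600 -> P(IES+) = 4/1604 (value from the slides).
fractions = simulate_ies_plus_fractions(4 / 1604, coverage=1000, n_boundaries=500)
mean_fraction = sum(fractions) / len(fractions)
print(f"simulated mean IES+/total: {mean_fraction:.5f}")
```

The histogram of `fractions` is what would be compared with the experimental distribution.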
Do we really have 4n in the MIC against 800n in the MAC, or rather against 1600n?
The distributions of IES+/total that we observe are still a function of the retention level, even for very low fractions of IES+/total.
For instance, here are two expected distributions that are identical despite having different MIC/MAC ratios:
P(MIC) = 4/804
Retention = 0/800
P(MIC) = 4/1604
Retention = 4/1600
Goodness of fit to the expected distribution is not sufficient
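The two cases above can be checked with a short sketch, assuming the expected IES+ fraction is simply (MIC copies + retaining MAC copies) / (all copies). The function name is hypothetical.

```python
def expected_ies_plus_fraction(n_mic, n_mac, retained_mac):
    """Expected share of IES+ reads in total DNA.

    Assumes every MIC copy carries the IES and `retained_mac` of the
    `n_mac` MAC copies retain it.
    """
    return (n_mic + retained_mac) / (n_mic + n_mac)


case_a = expected_ies_plus_fraction(4, 800, 0)    # P(MIC) = 4/804, retention 0/800
case_b = expected_ies_plus_fraction(4, 1600, 4)   # P(MIC) = 4/1604, retention 4/1600

print(f"4:800, no retention       -> {case_a:.5f}")
print(f"4:1600, retention 4/1600  -> {case_b:.5f}")
```

Both values are close to 0.005, so the two scenarios cannot be told apart from the IES+ fraction alone.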
Observed
N Mac = 1600
Retention = 0/1600
N Mac = 2000
Retention = 1/2000
N Mac = 1800
Retention = 0.5/1800
N Mac = 100000
Retention = 245/100000
(absurd)
N Mac = 2400
Retention = 2/2400
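A minimal sketch of why these combinations are indistinguishable, assuming the same model as above: expected IES+ fraction = (4 + retaining MAC copies) / (4 + N Mac). For a fixed observed fraction, the number of retaining copies can be solved for any assumed N Mac. The function name is hypothetical.

```python
def retained_copies_needed(observed_fraction, n_mac, n_mic=4):
    """Number of retaining MAC copies that reproduces `observed_fraction`.

    Solves (n_mic + k) / (n_mic + n_mac) = observed_fraction for k.
    """
    return observed_fraction * (n_mic + n_mac) - n_mic


observed = 4 / 1604  # about 0.0025, the mean quoted in the slides

for n_mac in (1600, 1800, 2000, 2400, 100000):
    k = retained_copies_needed(observed, n_mac)
    print(f"N Mac = {n_mac:>6}: retention of about {k:.1f}/{n_mac}")
```

Every assumed ploidy admits a retention level that matches the observation, including the absurd one.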
Despite the considerations of the previous slide,
no retention level (ranging from 0/800 to 800/800) can lead anywhere close to this figure (obtained from experimental data) with a MIC/MAC ratio of 4:800
4:800? 4:1600?
Anything else?
I am naked in the darkness of ignorance
Basic rationale:
If N = reads spanning the MAC TA junction + reads carrying the IES:
A priori: simple, effective, cheap.
= Comparison of proportions: theoretical \(p\) vs. observed \(p_0\)
$$ H_0 : p = p_0 $$
$$ H_1 : p > p_0 $$ (one-sided, right-tailed)
$$ pvalue = P(p_0 | H_0) $$
Only feasible if:
$$ N \geq 30,\ N \cdot p_0 \geq 5\ \text{and}\ N(1-p_0) \geq 5 $$
Very often not satisfied in our case
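A minimal sketch of this applicability check, with hypothetical read counts. The function name is an assumption, and the proportion passed in is whichever one the conditions are evaluated at (here the 4/804 value from the slides).

```python
def normal_approximation_ok(n, p):
    """Rule-of-thumb conditions for the normal approximation of a proportion."""
    return n >= 30 and n * p >= 5 and n * (1 - p) >= 5


p_mic = 4 / 804

# Hypothetical coverages at one boundary.
for n in (100, 500, 2000):
    ok = normal_approximation_ok(n, p_mic)
    print(f"N = {n:>4}: N*p = {n * p_mic:.2f} -> {'ok' if ok else 'not applicable'}")
```

With a proportion near 1/200, about 1,000 reads per boundary are needed before N*p reaches 5, which is why the conditions usually fail.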
Option 1: Normal approximation.
Option 2: Fisher's exact test
$$ pvalue \leq 5\% \iff Retention \geq 0.00848 $$
The molecule with the highest retention score for which \(H_0\) is not rejected at \( \alpha = 5\% \):
Reads MAC: 70
Reads IES+: 2
\(p_0= 0.02857... \)
\( Retention = \frac{2}{72} = 0.0277... \)
> If we rely hastily on the test, we will conclude that it physically comes from the MIC
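As an illustration, here is a sketch of an exact one-sided test on these counts. It uses a one-sample exact binomial tail, which is a stand-in and not necessarily the Fisher test named in the slides, and it assumes N = 70 + 2 = 72 reads and a theoretical proportion of 4/804. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_mic = 4 / 804          # theoretical IES+ proportion without retention
n_reads = 70 + 2         # MAC reads + IES+ reads
n_ies_plus = 2

p_value = binomial_upper_tail(n_ies_plus, n_reads, p_mic)
print(f"one-sided exact p-value: {p_value:.4f}")
```

Under these assumptions the p-value lands just above 5%, so \(H_0\) is narrowly not rejected even though 2 IES+ reads out of 72 is several times the 1:200 expectation.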
If the true probability of retention of this IES in the MAC, denoted \( P(R|MAC) \), were, for example, 1%:
> Use a larger \( \alpha \)!
This raises other problems:
Yet we want to work on the IESs with \(pvalue > \alpha\)...
Conclusion: The retention score is not an appropriate metric for our problem.
What interests us is only, for each IES:
We do not know the percentage of MAC molecules that retain the IES. However:
From there: we do have a lead for estimating the proportion of IES+ reads that physically come from the MIC:
Example:
--> We can roughly estimate \( P(R|MAC) \approx 50\% \)
\( P(MIC|IES^+) \approx \frac{1}{200} \)
Guiding example:
N = 100
\( nb_{IES^+} = 3 \)
\( P(MIC) = \frac{4}{804} \)
P(retention) = ??
\( P(MIC|IES^+) = ?? \)
Computing the probabilities:
> For a given IES, we can therefore estimate the probability that it comes from the MIC or from the MAC
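One possible way to carry out this estimate is Bayes' rule, sketched below. Two assumptions are not stated in the slides: every MIC copy carries the IES, so P(IES+|MIC) = 1, and the observed IES+ fraction is used as a plug-in estimate of P(IES+). The function names are hypothetical.

```python
def p_mic_given_ies_plus(p_mic, p_ret_mac):
    """Bayes' rule: probability that an IES+ read comes from the MIC.

    Assumes P(IES+ | MIC) = 1 and P(IES+ | MAC) = p_ret_mac.
    """
    p_ies_plus = p_mic + (1 - p_mic) * p_ret_mac
    return p_mic / p_ies_plus


def estimate_p_ret_mac(n_ies_plus, n_total, p_mic):
    """Plug-in estimate of P(R|MAC) from the observed IES+ fraction."""
    p_ies_plus_hat = n_ies_plus / n_total
    # Clamp at 0: the observed fraction can fall below P(MIC) by chance.
    return max(0.0, (p_ies_plus_hat - p_mic) / (1 - p_mic))


p_mic = 4 / 804

# Guiding example from the slides: N = 100, 3 IES+ reads.
p_ret = estimate_p_ret_mac(3, 100, p_mic)
posterior = p_mic_given_ies_plus(p_mic, p_ret)
print(f"estimated P(R|MAC)    = {p_ret:.4f}")
print(f"estimated P(MIC|IES+) = {posterior:.4f}")

# If P(R|MAC) were 1%, a third of the IES+ reads would come from the MIC.
print(f"P(MIC|IES+) at 1% retention = {p_mic_given_ies_plus(p_mic, 0.01):.4f}")
```

With only 3 IES+ reads out of 100 the plug-in estimate is very noisy, so these numbers should be read as orders of magnitude.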
We can use this estimator to deduce, for a set of X \(IES^+\) sequences, what the average proportion of molecules coming from the MIC will be.
\( \frac{nbMIC}{NbMacRet} \)
0/3
1/3
2/3
3/3
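A minimal sketch of how the four cases above could be weighted, assuming the 3 IES+ reads are independent and each comes from the MIC with the same probability q. The value q = 1/3 is hypothetical: it is what Bayes' rule gives for P(MIC) = 4/804 and P(R|MAC) = 1% if every MIC copy carries the IES. The function name is also hypothetical.

```python
from math import comb


def mic_count_distribution(n_ies_plus, q_mic):
    """Binomial probabilities that k of the IES+ reads come from the MIC."""
    return [
        comb(n_ies_plus, k) * q_mic**k * (1 - q_mic) ** (n_ies_plus - k)
        for k in range(n_ies_plus + 1)
    ]


q_mic = 1 / 3  # hypothetical per-read probability of MIC origin
distribution = mic_count_distribution(3, q_mic)

for k, prob in enumerate(distribution):
    print(f"{k}/3 reads from the MIC: {prob:.3f}")

expected_share = sum(k * prob for k, prob in enumerate(distribution)) / 3
print(f"average proportion from the MIC: {expected_share:.3f}")
```

The average proportion equals q by construction; the distribution shows how far a single IES with 3 reads can deviate from it.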