DELEVOYE Guillaume - M2 bioinformatics
Supervisors: Eric Meyer, Mathieu Bahin, Auguste Genovesio
A 340-year-old story
Robert Hooke
1665 - "Micrographia"
30x microscope
First description of a cell
Antoni Van Leeuwenhoek
1678 - "Animalcules"
First observation of single-cell organisms
Infusoria, Paramecium
Ciliates: Main discoveries
1862 - Pasteur: Refutation of the spontaneous generation theory with infusoria
1937 - Sonneborn: Non-Mendelian inheritance of mating type in Paramecium
1985 - Elizabeth Blackburn & Carol Greider: Telomeres and telomerase in Tetrahymena (Nobel Prize 2009)
Eric Meyer, Sandra Duharcourt
(IBENS, I. Jacques Monod 2014)
Mating types in Paramecium are transmitted by maternal RNAs, not by DNA
3 important events
Survival
Transposable elements are removed from the MAC, where transcription occurs
Avoids the negative effects of TEs
PiggyMac
Genetic drift...
The old MAC (maternal) drives everything during the formation of the new one:
= Non-Mendelian / cytoplasmic heredity
~99% TA-boundary
~100% PGM-dependent
"Domesticated enzyme to fight transposons"
Clean cut-and-paste mechanism
60% scanRNA pathway
So, recently in Eric's lab...
Modified bases play an important role in:
Lots of "orphan-MTases"
Remains poorly understood
Seeking 6-mA recognition domain
Protein identification
RNA silencing
Death
--> Role in IES excision ?
Currently being studied
Please cut me !
Don't cut me !
Thymine = 5-methyluracil?
And many others in RNA...
Pseudouridine (ψ) Dihydrouridine (D) m1A m2A m6A m62A i6A t6A Am Ar(p) DHT m3U mo5U s2U s4U iG m1G m2G m22G Gm k2C iC ψC ψiC m3C m4C m5C m42C Cm ac4C s2C m1I Im...
<-- The 3 most frequent known methylations in DNA
Sequencing : Sorting MIC and MAC
98% of IES are < 200 bp: "Long read" needed (to sort)
99% accuracy
Max
75% accuracy
Methylation analysis needs local coverage > 25X
A trained model (ML) detects suspicious slowdowns of the polymerase (as a function of the -3/+8 nt context) --> IPDs are captured
A very user-friendly browser interface
Advanced analyses are impossible
--> Command-line tools required
Classical PacBio approach: higher depth by overlapping the holes
Our approach: shorter, true single-hole analysis, many more passes
We ~always have either 0X or >>> 25X
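A minimal sketch of the single-hole criterion, assuming subreads are available as (hole number, sequence) pairs. The function names and the input layout are hypothetical; only the 25X threshold comes from the slides.

```python
from collections import Counter

MIN_PASSES = 25  # local coverage threshold quoted in the slides


def passes_per_hole(subreads):
    """Count how many subreads (passes) each hole produced.

    `subreads` is an iterable of (hole_number, sequence) pairs
    (assumed layout, not a PacBio file format).
    """
    return Counter(hole for hole, _seq in subreads)


def holes_with_enough_passes(subreads, min_passes=MIN_PASSES):
    """Return the sorted hole numbers whose pass count exceeds `min_passes`."""
    counts = passes_per_hole(subreads)
    return sorted(hole for hole, n in counts.items() if n > min_passes)
```

Each hole is then analysed on its own instead of being pooled with the others.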
Murphy's law statistically strikes often when you try 500,000 times
Other cases:
0 - Quality filter (Z-score)
1 - Create the consensus (=CCS)
2 - Map the CCS to MAC / MAC+IES (BLASR)
--> Only the best alignment reported (forced)
3 - Filter 90% identity for either MAC or MAC+IES
4 - Compare the mapping
Reminder: CCS are expected to reach roughly 99% accuracy
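Step 3 can be sketched as follows, assuming each alignment is summarised by an extended CIGAR string ("=" match, "X" mismatch, "I"/"D" indels) and that "either" means a CCS is kept when at least one of its two mappings reaches 90% identity. The function names are hypothetical and this is not BLASR's own output handling.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([=XIDMSH])")


def identity(cigar):
    """Fraction of alignment columns that are exact matches ('=').

    Soft/hard clips (S/H) are ignored. The ambiguous 'M' operation is
    rejected because it does not separate matches from mismatches.
    """
    matches = columns = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "SH":
            continue
        if op == "M":
            raise ValueError("extended CIGAR ('=' / 'X') required")
        columns += n
        if op == "=":
            matches += n
    return matches / columns if columns else 0.0


def passes_identity_filter(cigar_mac, cigar_mac_ies, threshold=0.90):
    """Keep a CCS if it maps with >= threshold identity on MAC or on MAC+IES."""
    return (identity(cigar_mac) >= threshold
            or identity(cigar_mac_ies) >= threshold)
```

With CCS at roughly 99% accuracy, a 90% threshold leaves room for genuine differences such as an IES that is absent from the MAC reference.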
Sequences that come from MIC
N changes
"=" becomes "I"
Threshold N = 27
At least N = 27 positions map on IES
At least 2 positions have to be on a junction
N > 27 changes
"D doesn't match I on expected IES"
Everything else:
"Trash"
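A rough sketch of the MIC rule only, under loud assumptions: each mapping is given as one operation per read base ("=", "X" or "I"), a "change" is a base that is "=" on MAC+IES but "I" on MAC, and a junction position counts when it matches on MAC+IES. The input layout, the function names and this reading of the junction rule are hypothetical; only N = 27 and the 2 junction positions come from the slides.

```python
def changed_positions(ops_mac_ies, ops_mac):
    """Read positions that are '=' on MAC+IES but 'I' on MAC."""
    if len(ops_mac_ies) != len(ops_mac):
        raise ValueError("both mappings must describe the same read bases")
    return [i for i, (a, b) in enumerate(zip(ops_mac_ies, ops_mac))
            if a == "=" and b == "I"]


def looks_like_mic(ops_mac_ies, ops_mac, junction_positions,
                   n_min=27, junction_min=2):
    """True when the read carries IES sequence and spans an IES junction.

    `junction_positions` holds the read indices flanking an IES boundary
    (assumed to be computed beforehand from the MAC+IES annotation).
    """
    changed = changed_positions(ops_mac_ies, ops_mac)
    junction_hits = sum(
        1 for i in junction_positions
        if 0 <= i < len(ops_mac_ies) and ops_mac_ies[i] == "="
    )
    return len(changed) >= n_min and junction_hits >= junction_min
```

Reads that satisfy neither this rule nor the MAC rule would fall into the "Trash" category.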
Everyone else's PacBio usage:
Eric, a weird scientist:
Sometimes the polymerase is "Lazy" and stops for a VERY long time --> Really annoying (methylation analysis)
Outliers:
Modification detection, modification identification, p-value, PHRED-score...
--->> Values are "capped":
cappingValue = max(99th percentile of the chunk, 4 * model, 75th local percentile)
For every position in the reference:
cappingValue = max(99th percentile of the chunk, 4 * model, 75th local percentile)
Probably it doesn't change much
But strictly speaking it is not the same
Unfortunately
>>> This is also how the recent publication by Beaulaurier works (with HDF5)
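The capping rule can be sketched in plain Python as below. This is an illustration, not the kineticsTools code: the function names, the linear-interpolation percentile and the reading of "model" as the IPD predicted for the position are assumptions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of an empty sequence")
    rank = (len(data) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(data) - 1)
    return data[low] + (data[high] - data[low]) * (rank - low)


def capping_value(chunk_ipds, model_ipd, local_ipds):
    """max(99th percentile of the chunk, 4 * model, 75th local percentile)."""
    return max(percentile(chunk_ipds, 99),
               4.0 * model_ipd,
               percentile(local_ipds, 75))


def cap_ipds(local_ipds, cap):
    """Clip every observed IPD at `cap` so long polymerase pauses stop dominating."""
    return [min(ipd, cap) for ipd in local_ipds]
```

Taking the maximum of the three terms keeps the cap permissive: an IPD is only clipped when it is extreme for the chunk, for the model and for the position at once.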
PacBio "kineticsTools" could work only:
Now it's properly hacked:
Strand issue (BLASR) to fix: Only strand 0 is available
Quality threshold