Studying Paramecium's epigenetics with PacBio sequencing

Introduction

Transposable elements

=

Present = Ruins that remained after the apocalypse(s)

Many strategies exist against TEs

piRNA / siRNA

RNA-guided DNA methylation

Histone modification

CRISPR / Restriction enzymes

Hypermutation (Neurospora crassa)

Random excision

RNA decay against nonsense ORFs

...

The ciliates: a specific case

P. tetraurelia: Genomic architecture

  • Eukaryote, ciliate
  • 3 nuclei:
    • 2xMIC nuclei (2n)
      • Reproduction
      • Contains:
        • TE & IES
    • 1xMAC nucleus (up to 800n)
      • Transcription
      • Partial MIC

Sexual processes

2 sexual processes:

A) Autogamy

B) Conjugation

 

Every time: karyogamy of 2 haploid MIC nuclei

 

A new MAC is formed from the new MIC, using the proteins and the guidance of the old MAC

Transposable elements and IES are removed during the process

IES excision in the new MAC

TEs/IESs: removed from the MAC (transcription OK)

--> Avoids the negative effects of TEs and IESs

~99% of IESs have TA boundaries

No other consensus sequence, unlike other ciliates

Lots of IES inside coding sequences

~100% PGM-dependent excision

PiggyMac

IES

Excision of IESs: scanRNA pathway

  • scanRNA --> Excision of at most 60% of IESs (shown by Dicer-like silencing)

 

  • Piwi shuttle

 

  • What about the remaining 40%?

DNA methylation?

So, recently in Eric's lab...

Modified bases play an important role in:

  • Prokaryotic DNA/RNA
  • Eukaryotic RNA/DNA

Lots of "orphan-MTases"

Remains poorly understood

 

Seeking 6-mA recognition domain

Protein identification

RNA silencing

Death of progeny after autogamy/conjugation

--> Role in IES excision?

Currently being studied

Hypothesis

Goals

Detect DNA methylation by sequencing

Sort MIC and MAC DNA data

Compare, especially on the IES boundaries

Materials & Methods

PacBio sequencing meets all expectations

99% accuracy

Max

75% accuracy

Methylation analysis needs local coverage > 25X

A trained (machine learning) model predicts the expected polymerase kinetics from the -3/+8 nt sequence context, so suspicious polymerase slowdowns can be detected --> inter-pulse durations (IPDs) are captured
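To make the IPD idea concrete, here is a minimal sketch of an IPD ratio at one position: the mean observed inter-pulse duration divided by the value a context model would predict. The function name `ipd_ratio`, the example values, and the way the 25X threshold is applied are assumptions for illustration, not the actual SMRT analysis code.

```python
from statistics import mean

MIN_COVERAGE = 25  # local coverage threshold quoted on the slide (> 25X)

def ipd_ratio(observed_ipds, predicted_ipd, min_coverage=MIN_COVERAGE):
    """Return mean observed IPD / predicted IPD, or None if coverage is too low.

    observed_ipds: IPD values of all reads covering one position (hypothetical input).
    predicted_ipd: IPD expected for the -3/+8 nt context (assumed to come from a model).
    """
    if len(observed_ipds) <= min_coverage or predicted_ipd <= 0:
        return None
    return mean(observed_ipds) / predicted_ipd

# Hypothetical example: 30 reads over one adenine, polymerase about 3x slower than expected.
ratio = ipd_ratio([1.5] * 30, predicted_ipd=0.5)
# Same signal with only 10 reads: not enough coverage to call anything.
low_cov = ipd_ratio([1.5] * 10, predicted_ipd=0.5)
```

A ratio well above 1 marks a slowdown that the sequence context alone does not explain, which is the signal used to suspect a modified base.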

SMRT tracks 3 DNA methylations

<-- the 3 most frequent known DNA methylations

6mA is the most likely one in Paramecium

Sorting the sequences

Unknown = MAC-Destined Sequence (MDS)
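One possible sorting rule, as a sketch only: a read that fully covers an IES and still contains it is taken as MIC, a read that spans the excision junction is taken as MAC, and a read covering no informative IES stays unknown (MDS). The function `classify_read`, its inputs, and the "ambiguous" label are hypothetical; the real pipeline may sort reads differently.

```python
def classify_read(read_start, read_end, read_gaps, ies_list):
    """Classify one aligned read as 'MIC', 'MAC', 'MDS' or 'ambiguous'.

    read_start, read_end: alignment span on a MIC-coordinate reference.
    read_gaps: set of (start, end) reference intervals missing from the read.
    ies_list: list of (start, end) annotated IES intervals.
    """
    retained = excised = 0
    for ies in ies_list:
        start, end = ies
        if start < read_start or end > read_end:
            continue  # IES not fully covered by the read: uninformative
        if ies in read_gaps:
            excised += 1   # read skips the IES: MAC-like junction
        else:
            retained += 1  # read still contains the IES: MIC-like
    if retained and excised:
        return "ambiguous"  # conflicting evidence within one read
    if retained:
        return "MIC"
    if excised:
        return "MAC"
    return "MDS"  # no informative IES: origin unknown

# Hypothetical annotation with two IESs.
ies = [(100, 150), (400, 430)]
labels = [
    classify_read(0, 300, set(), ies),          # IES retained
    classify_read(0, 300, {(100, 150)}, ies),   # IES excised
    classify_read(160, 390, set(), ies),        # no IES fully covered
]
```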

Available data

PacBio sequencing

 

Wild type:

  • Vegetative Cell (HTVEG)
  • Post-autogamous cells, 2 h (HT2)
  • Post-autogamous cells, 6 h (HT6)
     

Silencing of methylase candidates:

  • Si/MAB
  • Si/MT2
  • Si/MT1A-1B
  • Si/MT1A-1B-2
  • Si/NM4-9-10
  • Si/NM9-10
  • Si/NM4

Other:

  • SiPGM

 

Previously ...

Sorting DONE

Hacking for Single-molecule (IPD, capping) DONE

Methylation analysis (production of outputs) DONE

Work on cutoffs, scores, GMM... DONE

AT and TA, various score stats DONE

Motif analysis DONE

MDS To-do

Transcription start site To-do

IES: scanRNA-dependent vs. independent To-do

 

A few flashbacks on the results

Differential mapping

Sequences that come from MAC

Sorting stats

rDNA ??

mito ??

Score distributions: The case of the adenines

 

Modifications in AT/TA

  • 95% AT
  • Same MIC/MAC in AT
  • No difference between experiments
  • ~75% methylated symmetrically
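How a symmetric-methylation percentage could be computed, as a sketch: AT is its own reverse complement, so each AT site has one adenine per strand, and a site counts as symmetric when both are called 6mA. The function `symmetric_at_fraction`, the coordinate convention, and the choice of denominator (AT sites with at least one methylated adenine) are assumptions.

```python
def symmetric_at_fraction(sequence, meth_plus, meth_minus):
    """Fraction of methylated AT sites carrying 6mA on both strands.

    sequence: plus-strand sequence (0-based coordinates).
    meth_plus: positions of methylated A on the plus strand.
    meth_minus: plus-strand coordinates of the T whose paired
                minus-strand A is methylated.
    """
    symmetric = methylated = 0
    for i in range(len(sequence) - 1):
        if sequence[i:i + 2] != "AT":
            continue
        plus = i in meth_plus          # A of the AT on the plus strand
        minus = (i + 1) in meth_minus  # A opposite the T, on the minus strand
        if plus or minus:
            methylated += 1
            if plus and minus:
                symmetric += 1
    return symmetric / methylated if methylated else 0.0

# Hypothetical toy sequence with AT sites at positions 0, 3, 6 and 9:
# three are methylated on both strands, one on the plus strand only.
fraction = symmetric_at_fraction("ATGATCATGAT", {0, 3, 6, 9}, {1, 4, 7})
```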

 

Modest variations in the silencing

Modifications outside AT

What interests us most

We didn't find any difference between experiments

Findings in cytosines

Logo from HTVEG MAC

Identical everywhere

I fixed some problems in the meantime

 

Sorting was wrong --> Only MAC data can be trusted FIXED

rDNA and mito are back <3

Missing rDNA and mito --> Problem for the Southwestern blot FIXED

Mis-excised IES in the MAC --> False IESs FIXED

Mis-estimation of % because of missing MDS --> FIXED

Scan-RNA handling --> SOON

GMM --> Unsatisfactory ONGOING
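For reference, a minimal sketch of what a two-component Gaussian mixture on modification scores does: one component for background scores, one for modified bases, fitted by EM. This pure-Python version, its name `fit_two_gaussians`, and the toy scores are illustrative assumptions; it is not the GMM setup actually used in the analysis.

```python
import math

def fit_two_gaussians(scores, iterations=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances), low-score component first.
    """
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # deterministic initialisation at the extremes
    var0 = max(((hi - lo) / 4) ** 2, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each score
        resp = []
        for x in scores:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # empty component: keep previous parameters
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)  # variance floor
    return weights, means, variances

# Hypothetical scores: a background population near 20, a modified one near 60.
toy_scores = [18, 19, 20, 21, 22] * 4 + [58, 59, 60, 61, 62] * 2
weights, means, variances = fit_two_gaussians(toy_scores)
```

On real score distributions the two populations overlap far more than in this toy case, which is one plausible reason a mixture fit can be unsatisfactory.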

Slow computation ??

Dataset size ??

All stats and motifs must be done again now that the sorting is fixed

Reassuring findings

  • Our results are consistent with those of the Gif lab
  • Consistency with the new literature on batched SMRT
  • Consistency with new evidence in Oxytricha and Tetrahymena
  • We found 6mA in the MIC, and so far we are the only ones to report it
  • No one has our resolution for the moment
  • Homologs of our silencing targets were found to be relevant in other ciliates this year
  • Experience acquired from errors this year...
  • m4C in MIC (and MAC!) absent at T2 and T6 would be a real scoop for eukaryotes

Tasks for the near future

Percentages, motif analysis, sub-percentages, testing of multiple score thresholds... Redo what I did before

And to conclude...

SMRT - Sept 2019

By biocompibens

Lab meeting - 19/06/18
