Google meeting

MRC CSC objectives

  • Platform for bioinformatics training
  • Project to analyse and report QC metrics across all ChIP-seq data.

Training

  • MRC asking for all PhD students to receive bioinformatics training.
    • Introduction to R
    • Statistical inference in Biology.
    • Applied bioinformatics
  • Postdocs and senior scientists wish to acquire skills.
    • R
    • High-throughput sequencing analysis.
    • Simple statistics

Training from Bioinformatics team

  • General training in genomics data
  • Visualisation of high-throughput sequencing data
  • Training in the analysis of high-throughput sequencing data in R

Analysis of HTS in R

  • All courses are presented in RStudio.
  • Rpres format used to allow display of material within RStudio itself.
  • All material public and version controlled on GitHub.
  • Topics include
    • Introduction to R
    • Reproducible R
    • Introduction to Bioconductor
    • Introduction to ChIP-seq and RNA-seq

Analysis of HTS in R - Issues

  • Very popular course.
    • Requires large room.
    • Requires computer systems set up in advance.
    • Any change requires an update to all computers.
  • Analysis of HTS is computationally expensive.
    • Most courses use dummy data.
    • Use of subsetted data limits the usefulness of the course for real-world data.
  • Data for course must be downloaded to every machine.
    • Incorrect selection of data often leads to problems.

Analysis of HTS in R - Solution

  • RStudio Server.
    • Bring a low-power laptop simply to log in to the server.
    • Environment already set up and easy to update.
    • All computation runs on the server, which allows examples using multicore processing and more memory.
    • Central/shared directory for data.

An RStudio Server?

  • Where to get an RStudio Server?
    • Invest in MRC CSC server?
      • Not so scalable
      • Not useful for most of the year
    • Use cloud platform? Tried at BioC 2014, and a grant was received for MRC CSC courses.

HTS pipelines

  • Rapid and reproducible results from high-throughput sequencing data are essential for any modern bioinformatics core.
  • To achieve this we use version controlled pipelines optimised for our local compute resources.

ChIP-seq pipeline

  • Previous versions of ChIP-seq pipeline were developed at Cambridge University by Thomas Carroll.
  • Widely used in CRUK, Sanger and Cambridge University and now adapted for MRC CSC in London.
  • Many of the tools within the pipeline have now been wrapped up in R/Bioconductor packages maintained by MRC CSC.
    • ChIPQC.
    • Triform.
    • soGGi.
    • tracktables.

Updated ChIP-seq pipeline

  • The first pipeline used multiple tool sets and so was hard to version control and install.
  • An R-centric ChIP-seq pipeline has now been developed within the MRC Clinical Sciences Centre.
    • This new pipeline runs as a single R markdown script and generates HTML reports linked to IGV.
    • Dependencies are easily installed from the Bioconductor and CRAN repositories.

Testing the ChIP-seq pipeline

  • To test the first version of the ChIP-seq pipeline we analysed 1400 datasets and investigated how the QC metrics relate both to each other and to related processing steps.
  • This provided us with essential knowledge of which QC flags to use for quality control of ChIP-seq data.
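
As a rough illustration of what relating QC metrics to each other and turning them into flags can look like, the sketch below correlates two metrics across datasets and flags datasets that fall below a cut-off. It is written in Python purely for illustration; the pipeline itself is R-based. The sample names, the metric names (`rip_percent`, `ssd`), the values and the cut-offs are all made-up assumptions, not results or settings from the 1400-dataset study.

```python
from math import sqrt

# Toy, made-up QC values for five hypothetical datasets (not real results).
datasets = {
    "sample_A": {"rip_percent": 12.0, "ssd": 2.0},
    "sample_B": {"rip_percent": 6.0, "ssd": 1.5},
    "sample_C": {"rip_percent": 1.0, "ssd": 0.5},
    "sample_D": {"rip_percent": 9.0, "ssd": 1.8},
    "sample_E": {"rip_percent": 2.0, "ssd": 0.7},
}

# Hypothetical cut-offs, chosen only for illustration.
THRESHOLDS = {"rip_percent": 5.0, "ssd": 1.0}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flag_datasets(metrics, thresholds):
    """Map each dataset to the sorted list of metrics below their cut-off."""
    return {
        name: sorted(m for m, cutoff in thresholds.items() if values[m] < cutoff)
        for name, values in metrics.items()
    }


rip = [d["rip_percent"] for d in datasets.values()]
ssd = [d["ssd"] for d in datasets.values()]
correlation = pearson(rip, ssd)
flags = flag_datasets(datasets, THRESHOLDS)

print(f"Correlation between rip_percent and ssd: {correlation:.3f}")
for name, failed in flags.items():
    print(name, "flagged for:", ", ".join(failed) if failed else "nothing")
```

In practice the cut-offs would come from the kind of large-scale analysis described above rather than being fixed by hand.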

Testing the R-centric ChIP-seq pipeline

  • More recently, new forms of ChIP-seq have emerged (MNase-seq, ChIP-exo), and with new technologies the sequencing output has grown rapidly.
  • A large-scale reanalysis of ChIP-seq data is required to gain a better understanding of how metrics relate to new ChIP methods as well as to the increased length, depth and complexity of sequencing.

Testing the R-centric ChIP-seq pipeline

  • Re-run the study with new tools and the new pipeline on all available ChIP-seq, ChIP-exo and MNase-seq data.
  • Analysis of results will be reviewed within the Bioinformatics team and with Shamith Samarajiwa (MRC Cancer Unit) and Ines De Santiago (CRUK Cambridge).
  • The full pipeline and all metric results will be freely available and published on GitHub. The analysis of QC flags will be published in a peer-reviewed journal.