Google meeting
MRC CSC objectives
- Platform for bioinformatics training
- Project to analyse and report QC metrics across all ChIP-seq data.
Training
- MRC asking for all PhD students to receive bioinformatics training.
- Introduction to R
- Statistical inference in Biology.
- Applied bioinformatics
- Post docs and senior scientists wish to acquire skills.
- R
- High-throughput sequencing analysis.
- Simple statistics
Training from Bioinformatics team
- General training in genomics data
- Visualisation of high throughput sequencing data
- Training in the analysis of high throughput sequencing data in R
Analysis of HTS in R
- All courses are presented in RStudio.
- Rpres format used to allow display of material within RStudio itself.
- All material is public and version controlled on GitHub.
- Topics include
- Introduction to R
- Reproducible R
- Introduction to Bioconductor
- Introduction to ChIP-seq and RNA-seq
Analysis of HTS in R - Issues
- Very popular course.
- Requires large room.
- Requires computer systems set-up in advance.
- Any changes require update to all computers.
- Analysis of HTS is computationally expensive.
- Most courses use dummy data.
- Use of subsetted data limits the usefulness of the course for real-world data.
- Data for course must be downloaded to every machine.
- Often incorrect selection of data leads to problems.
Analysis of HTS in R - Solution
- RStudio Server.
- Bring a low-power laptop simply to log in to the server.
- Environment already set-up and easy to update.
- All computation run on server. Allows for examples using multicore processing and more memory.
- Central/shared directory for data.
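As a minimal sketch of the kind of multicore example a shared server makes practical: the function `summarise_sample` and its simulated read depths are hypothetical stand-ins for a real HTS step, not part of the course material. It uses only the `parallel` package that ships with R.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical stand-in for a per-sample analysis step.
summarise_sample <- function(sample_id) {
  set.seed(sample_id)                 # reproducible toy data per sample
  depths <- rpois(1e5, lambda = 20)   # simulated per-position read depth
  c(sample = sample_id, mean_depth = mean(depths))
}

sample_ids <- 1:8

# mclapply forks worker processes. Forking is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L
results <- mclapply(sample_ids, summarise_sample, mc.cores = n_cores)

# One row per sample.
summary_table <- do.call(rbind, results)
print(summary_table)
```

On a server, `mc.cores` could be raised to match the cores available to each attendee, which is not an option on a low-power laptop.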
An RStudio Server?
- Where to get an RStudio Server?
- Invest in MRC CSC server?
- Not so scalable
- Not useful for most of the year
- Use cloud platform? Tried at BioC 2014, and grant received for MRC CSC courses.
HTS pipelines
- Rapid and reproducible results from high-throughput sequencing data are essential for any modern bioinformatics core.
- To achieve this we use version controlled pipelines optimised for our local compute resources.
ChIP-seq pipeline
- Previous versions of ChIP-seq pipeline were developed at Cambridge University by Thomas Carroll.
- Widely used in CRUK, Sanger and Cambridge University and now adapted for MRC CSC in London.
- Many of the tools within the pipeline have now been wrapped up in R/Bioconductor packages maintained by MRC CSC.
- ChIPQC.
- Triform.
- soGGi.
- tracktables.
Updated ChIP-seq pipeline
- The first pipeline used multiple tool sets and so was hard to version control and install.
- An R-centric ChIP-seq pipeline has now been developed within the MRC Clinical Sciences Centre.
- This new pipeline runs as a single R markdown script and generates HTML reports linked to IGV.
- Easily installed dependencies from Bioconductor and CRAN repositories.
Testing the ChIP-seq pipeline
- To test the first version of the ChIP-seq pipeline we analysed 1400 datasets and investigated QC metrics and their relation both to each other and to related processing steps.
- This provided us with essential knowledge of which QC flags to use in controlling ChIP-seq data.
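The comparison of QC metrics against each other can be sketched as a rank correlation matrix. The metric names and values below are simulated placeholders chosen for illustration; they are not results from the 1400-dataset study, and the real analysis may have used different metrics and methods.

```r
set.seed(1)
n <- 50  # number of simulated datasets

# Simulated placeholder metrics (illustrative names only).
read_depth <- runif(n, 5e6, 4e7)
# Duplication rate is made to rise with depth, plus noise, capped at 0.9.
duplication_rate <- pmin(0.9, 0.05 + read_depth / 1e8 + rnorm(n, sd = 0.03))
# Fraction of reads in peaks, simulated independently of the other metrics.
frip <- runif(n, 0.01, 0.3)

qc <- data.frame(read_depth, duplication_rate, frip)

# Spearman correlation is rank-based, so it does not assume linear relationships.
metric_cor <- cor(qc, method = "spearman")
print(round(metric_cor, 2))
```

Strongly correlated pairs in such a matrix point to metrics that carry overlapping information, which is one way to decide which flags are worth keeping.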

Testing the R-centric ChIP-seq pipeline
- More recently, new forms of ChIP-seq have emerged (MNase-seq, ChIP-exo), and with new technologies the sequencing output has rapidly grown.
- A large scale reanalysis of ChIP-seq data is required to gain a better understanding of how metrics relate to new ChIP methods as well as the increased length/depth and complexity of sequencing.
Testing the R-centric ChIP-seq pipeline
- Re-run the study with new tools and the new pipeline on all available ChIP-seq, ChIP-exo and MNase-seq data.
- Analysis of results will be reviewed within the Bioinformatics team and with Shamith Samarajiwa (MRC Cancer Unit) and Ines De Santiago (CRUK Cambridge).
- The full pipeline and all metric results will be freely available and published on GitHub. The analysis of QC flags will be published in a peer-reviewed journal.
By Tom Carroll