Analysis of factors influencing transcript quantification from paired-end RNA-Seq experiments

Factors that may influence transcript quantification

  • Multiple splice forms

  • Polymorphisms

  • Intronic signal (reads from intronic sequence)

  • Sequencing errors

  • Alignment errors

  • Annotation errors

  • Differential GC content across exons

  • Random exon priming

  • Positional bias (5'->3' degradation)

  • Fragment length?

  • Read length

Multiple splice forms

  • A gene with a single isoform is the trivial case: every read assigned to the gene can be assigned to that isoform.
  • Concurrent expression of multiple isoforms increases the complexity and uncertainty of read assignment, particularly when the isoforms are expressed at different levels.
  • False negatives may occur for splice-regulatory variants, as the quantification error scales with the number of isoforms and the uniformity of their expression levels.
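
To make the read-assignment problem concrete, here is a minimal sketch of an expectation-maximization (EM) estimate of relative isoform abundance from reads that are compatible with one or more isoforms. It is a toy illustration, not the method of any particular quantification tool. The function name `em_isoform_abundance` and the input format (one set of compatible isoform names per read) are assumptions made for this example, and the sketch ignores transcript length, fragment length, and sequence bias.

```python
def em_isoform_abundance(compat, n_iter=200):
    """Estimate relative isoform abundances with a simple EM loop.

    compat: list of non-empty sets; each set holds the names of the
            isoforms that one read is compatible with (hypothetical
            input format chosen for this sketch).
    Returns a dict mapping isoform name -> estimated read fraction.
    Assumption: transcript length is ignored, so the result is the
    fraction of reads per isoform, not a length-normalised abundance.
    """
    isoforms = sorted(set().union(*compat))
    # Start from a uniform guess.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    n_reads = len(compat)
    for _ in range(n_iter):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read across its compatible isoforms
        # in proportion to the current abundance estimates.
        for read_isoforms in compat:
            total = sum(theta[iso] for iso in read_isoforms)
            for iso in read_isoforms:
                counts[iso] += theta[iso] / total
        # M-step: new abundances are the normalised expected counts.
        theta = {iso: counts[iso] / n_reads for iso in isoforms}
    return theta


# Toy data: 60 reads fit both isoforms, 30 fit only A, 10 fit only B.
reads = [{"A", "B"}] * 60 + [{"A"}] * 30 + [{"B"}] * 10
print(em_isoform_abundance(reads))  # approaches A = 0.75, B = 0.25

# Single-isoform gene: the trivial case, every read goes to A.
print(em_isoform_abundance([{"A"}] * 5))
```

In the toy data only 40 of 100 reads are informative, so the split of the 60 shared reads rests entirely on the uniquely assigned ones. With more isoforms or fewer unique reads the estimate becomes correspondingly less certain.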

Read length

  • Increasing read length decreases the number of ambiguously mapped reads, which increases the accuracy of quantification.
  • Transcript identification accuracy depends jointly on coverage and read length. For coverage > 10 million reads, read length becomes less important, as there are plenty of reads to identify transcripts unambiguously.
  • Transcript quantification accuracy may increase with read length because more splice junctions are detected correctly. For read lengths > 50 bp, the identification and quantification accuracy of junctions does not improve significantly.
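
The effect of read length on ambiguity can be sketched with a toy exon-skipping gene. The example below counts, for every possible read start position, whether a single-end read could have come from both isoforms. The function name `ambiguous_fraction` and the exon lengths are hypothetical values chosen for illustration. The sketch assumes uniform read sampling and error-free reads, and treats any read that touches the skipped exon or spans the skip junction as uniquely assignable. It shows only the geometric effect, not the junction-detection results reported in the references.

```python
def ambiguous_fraction(read_len, e1=300, e2=100, e3=300):
    """Fraction of possible reads compatible with both isoforms.

    Isoform 1 is E1-E2-E3 and isoform 2 is E1-E3 (E2 skipped).
    Exon lengths e1, e2, e3 are hypothetical values in bp.
    A read is counted as ambiguous when it lies entirely inside
    E1 or entirely inside E3, because both isoforms contain it.
    """
    total = 0
    ambiguous = 0

    # Isoform 1: reads that touch E2 are unique to this isoform.
    len1 = e1 + e2 + e3
    for start in range(len1 - read_len + 1):
        end = start + read_len
        total += 1
        if end <= e1 or start >= e1 + e2:
            ambiguous += 1

    # Isoform 2: reads spanning the E1-E3 junction are unique.
    len2 = e1 + e3
    for start in range(len2 - read_len + 1):
        end = start + read_len
        total += 1
        if end <= e1 or start >= e1:
            ambiguous += 1

    return ambiguous / total


for read_len in (25, 50, 100):
    print(read_len, round(ambiguous_fraction(read_len), 3))
```

Longer reads are more likely to overlap the skipped exon or the skip junction, so the ambiguous fraction falls as read length grows in this toy model.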

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0695-9

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/

http://bioinformatics.oxfordjournals.org/content/31/24/3938.full
