aalexmmaldonado
Computational Biology
(BIOSC 1540)
Feb 11, 2025
Lecture 06A
Read mapping
Foundations
I have to:
(1) Prepare computationalists for future classes, and
(2) Introduce non-computationalists to the field
Programming is a crucial skill for anything computational and a helpful skill for science.
What? What can we do with comp bio?
Use? How do we use the user-friendly tools?
How? How do the tools work?
Changes I can make
A. Simpler and fewer Python problems
B. Offer an optional Python recitation on an evening or weekend
C. "Flipped" classroom where I record lectures/assign readings and use classtime for Python
D. No changes
E. No Python
Assignments
Quizzes
CBytes
ATP until the next reward: 653
DNA
DNA sequences are largely stable across an organism’s lifetime
Questions genomics can answer:
Genomics tells us what’s possible for an organism to do but not when or how it does it.
Examples:
DNA is like a book of instructions—just because a gene exists doesn’t mean it’s being used.
Key insight: To understand cellular function, we need to know which genes are active and when.
RNA
Transcriptomics shows which genes are being expressed at a given moment
We can see gene expression changes over time
Allows us to see which annotated genes are actually being used
The transcriptome is constantly changing and captures the cell's response to its environment and internal signals
The transcriptome includes:
mRNA: instructions for protein synthesis
rRNA: forms part of the ribosome structure
tRNA: helps translate the genetic code into proteins
Non-coding RNAs: play regulatory roles in the cell
(And more)
A single gene can produce multiple mRNA transcripts, called isoforms, through alternative splicing
Alternative splicing is one of the main ways organisms increase protein diversity without increasing the number of genes
It's estimated that over 90% of human genes undergo alternative splicing
Example: Dscam (Down syndrome cell adhesion molecule), which is involved in neural development, can produce over 38,000 isoforms in Drosophila melanogaster from this one gene
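The "over 38,000" figure comes from combinatorics: Dscam has four clusters of mutually exclusive alternative exons, and each mature transcript keeps exactly one exon from each cluster. A minimal sketch of that arithmetic, assuming the commonly cited cluster sizes of 12, 48, 33, and 2 alternatives (exons 4, 6, 9, and 17):

```python
from math import prod

# Assumed number of mutually exclusive alternative exons in each variable
# cluster of Drosophila Dscam (commonly cited values, used here as inputs).
alternative_exons = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}

# Each transcript picks exactly one exon per cluster, so the number of
# possible isoforms is the product of the cluster sizes.
possible_isoforms = prod(alternative_exons.values())
print(possible_isoforms)  # 38016
```

This counts possible isoforms; how many are actually expressed in a given cell is a separate, experimental question.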
Genomics
Transcriptomics
Sample collection
Great! We have our cells, but how can we extract our RNA?
The first step is typically to centrifuge the sample to separate our cells from the media
Keep the part that has our component of interest (RNA)
Chemical lysis destabilizes the lipid bilayer and denatures proteins
Surfactants have a hydrophilic head and a hydrophobic tail
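Figure: phenol-chloroform extraction. Nucleic acids (DNA, RNA), whose phosphate backbones are negatively charged, stay in the aqueous (water) phase; proteins denature and aggregate at the interface; lipids partition into the nonpolar phenol-chloroform phase.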
Collecting our aqueous phase selects only DNA and RNA
RNA is converted to cDNA using reverse transcriptase
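Reverse transcriptase builds a DNA strand that is complementary and antiparallel to the RNA template, so the first cDNA strand, read 5'→3', is the reverse complement of the RNA with U paired to A. A minimal sketch of that base-pairing logic (the function name is hypothetical, and it ignores primers, second-strand synthesis, and enzyme errors):

```python
# Watson-Crick pairing used when DNA is synthesized on an RNA template
RNA_TO_CDNA_BASE = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA written 5'->3'."""
    rna = rna.upper()
    # Complement each base, then reverse so the result also reads 5'->3'
    return "".join(RNA_TO_CDNA_BASE[base] for base in reversed(rna))


print(first_strand_cdna("AUGGCCUAA"))  # TTAGGCCAT
```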
Enrichment method affects
Poly(A) selection captures mature mRNAs
How could we filter our sample for only mRNA?
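Poly(A) selection works because mature mRNAs carry a 3' poly(A) tail that oligo(dT) probes can hybridize to, while rRNA and tRNA do not. An in-silico sketch of that idea, assuming transcripts are stored as 5'→3' strings; the function names and the 20-nt minimum tail length are illustrative assumptions, not a lab protocol:

```python
def poly_a_tail_length(rna: str) -> int:
    """Count consecutive A's at the 3' end of an RNA written 5'->3'."""
    rna = rna.upper()
    return len(rna) - len(rna.rstrip("A"))


def poly_a_select(transcripts: dict[str, str], min_tail: int = 20) -> dict[str, str]:
    """Keep transcripts whose 3' poly(A) tail is at least `min_tail` nt long.

    Mimics oligo(dT) capture; `min_tail` is a hypothetical cutoff.
    """
    return {
        name: seq
        for name, seq in transcripts.items()
        if poly_a_tail_length(seq) >= min_tail
    }


# Toy sequences for illustration only
sample = {
    "mRNA_1": "AUGGCUUAA" + "A" * 50,  # mature mRNA with a poly(A) tail
    "rRNA_fragment": "GGCUAGCUAGGC",   # no poly(A) tail
    "tRNA_like": "GCCGAGAUCCA",        # ends in CCA, not a poly(A) tail
}
print(sorted(poly_a_select(sample)))  # ['mRNA_1']
```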
Sample quality
Assess RNA integrity with the RNA Integrity Number (RIN), which ranges from 1 (highly degraded) to 10 (intact)
Figure: low-RIN vs. high-RIN sample
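In practice, RIN is used as a quality-control gate before library preparation: samples below some cutoff are excluded or re-extracted. A minimal sketch, assuming RIN values on the 1 to 10 scale have already been measured; the cutoff of 7.0 and the sample values are hypothetical, and the right threshold depends on the tissue and protocol:

```python
RIN_MIN, RIN_MAX = 1.0, 10.0  # RIN scale: 1 = highly degraded, 10 = intact


def passes_rin_qc(rin: float, cutoff: float = 7.0) -> bool:
    """Return True if a sample's RIN meets the (hypothetical) cutoff."""
    if not RIN_MIN <= rin <= RIN_MAX:
        raise ValueError(f"RIN must be between {RIN_MIN} and {RIN_MAX}, got {rin}")
    return rin >= cutoff


# Hypothetical RIN values for three samples
samples = {"sample_A": 9.1, "sample_B": 4.3, "sample_C": 7.0}
kept = [name for name, rin in samples.items() if passes_rin_qc(rin)]
print(kept)  # ['sample_A', 'sample_C']
```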
Microarrays
(Now largely obsolete)
RNA-seq
Now we just sequence the cDNA
Advantages
What is the primary advantage of RNA-seq over microarray technology?
Which sample has a higher RIN?
The human genome is ~3 billion bases, but RNA-seq reads are only ~100 bases long.
A naïve approach would compare every read against every position in those billions of bases, which is computationally impractical at scale.
Why is this a problem?
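To make the problem concrete, here is a brute-force exact matcher that slides each read along the genome, followed by a back-of-the-envelope upper bound on its cost. This is a sketch for intuition only: it ignores mismatches and both strands, and the 30 million reads figure is a hypothetical sequencing depth:

```python
def naive_map(read: str, genome: str) -> list[int]:
    """Return every 0-based position where `read` exactly matches `genome`.

    Brute force: slide the read along the genome and compare the window,
    costing up to len(genome) * len(read) base comparisons per read.
    """
    hits = []
    for start in range(len(genome) - len(read) + 1):
        if genome[start:start + len(read)] == read:
            hits.append(start)
    return hits


print(naive_map("ACG", "TTACGAACGT"))  # [2, 6]

# Worst-case cost for a whole experiment
genome_length = 3_000_000_000  # ~3 billion bases (human genome)
read_length = 100              # ~100-base RNA-seq reads
num_reads = 30_000_000         # hypothetical number of reads
comparisons = genome_length * read_length * num_reads
print(f"{comparisons:.1e}")    # 9.0e+18
```

Even at an assumed 10^9 base comparisons per second, 9 × 10^18 comparisons would take roughly 9 × 10^9 seconds, or close to three centuries, which is why read mappers index the genome instead of scanning it for every read.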
Lecture 06A: Read mapping - Foundations (Today)
Lecture 06B: Read mapping - Methodology (Thursday)