
Computational Biology

(BIOSC 1540)
Jan 14, 2025
Lecture 02A
DNA sequencing
Foundations
Announcements
After today, you should have a better understanding of
Importance and applications of DNA sequencing
DNA sequencing revolutionizes biology and medicine through diverse applications
- Medicine: Enables precision medicine, genetic disease diagnosis, and cancer genomics.
- Agriculture: Enhances crop improvement, pest resistance, and livestock genetics.
- Evolution: Deciphers evolutionary relationships and molecular phylogenies.
- Microbiology: Identifies pathogens and studies microbial communities (e.g., metagenomics).
- Ecology: Monitors biodiversity and tracks species in ecosystems.
After today, you should have a better understanding of
Techniques for extracting and purifying high-quality DNA
DNA extraction
How do we acquire our DNA sample?
Computationalists need to understand the underlying source of our data for quality control

Let's start with a bacterial culture

Fun fact: Pitt has a beer brewing class (ENGR 1933)
We let our bacterial culture produce our products of interest

Biotechnology frequently uses massive E. coli cultures to produce bioproducts
Separate cells from media

Great! We have our cells, but how can we get DNA out of our cells?
The first step is always to centrifuge and separate our cells and media
Keep the part that has our component of interest (DNA)
We break open our cells by lysing them
Chemical lysis destabilizes the lipid bilayer and denatures proteins

Surfactants have a hydrophilic head and hydrophobic tail


Wait, surfactants sound a lot like phospholipids?


What's the primary difference, and how does this change its behavior?
Surfactants possess a single hydrophobic tail. Why does the incorporation of these surfactants destabilize the phospholipid membrane?
Please note: TopHat questions are ungraded. Engaging honestly with the question will benefit you far more than any shortcuts.
DNA purification
At this stage, we need to separate DNA from other biomolecules ... how?


We need to exploit physicochemical property differences (such as solubility, charge, and hydrophobicity) to separate DNA from other biomolecules
Phenol-chloroform extraction exploits solubility and density differences
- Aqueous phase (water): DNA and RNA, whose phosphate backbone is negatively charged
- Interface: protein, which denatures and aggregates
- Organic phase (phenol and chloroform): lipids, which are nonpolar
Collecting our aqueous phase selects only DNA and RNA
Silica column-based purification relies on ionic interactions
Under high-salt conditions, cations shield the negative charges on both DNA and the silica membrane, allowing DNA to adsorb to the silica
Contaminants like proteins and salts do not bind or are washed away
DNA is then eluted with a low-salt buffer or water
Magnetic beads rely on selective adsorption and surface chemistry

Magnetic beads coated with DNA-binding agents (e.g., silica or polymer) selectively adsorb DNA in the presence of binding buffers
Magnetic fields are used to separate beads with bound DNA from the solution, allowing for washing away impurities like proteins, RNA, and salts
Note: Nowadays, most labs use highly effective kits

DNA quality quantification
Before sequencing our sample, we should check the quality

Likely contaminants and why they are a problem
- RNA: inflates DNA quantification readings because it also absorbs UV light at 260 nm
- Protein: can inhibit enzymatic reactions in library preparation and distort DNA quantification
UV radiation is selectively absorbed based on molecular structure


Molecules with aromatic rings absorb UV light strongly due to their conjugated π-electron systems
UV light excites electrons in the π-bonds of aromatic systems to higher energy states
Proteins absorb UV light primarily at 280 nm, mainly due to aromatic amino acids
DNA and RNA absorb UV light at 260 nm because their bases contain highly conjugated double bonds




A260/A280 ratio relates to sample purity
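As a rough illustration of how the A260/A280 ratio is used, the sketch below computes the ratio and an estimated concentration from two absorbance readings. The function names, the cutoffs of 1.7 and 1.9, and the conversion factor of 50 ng/µL per A260 unit for double-stranded DNA are assumptions for this example (common rules of thumb), not values stated on the slides.

```python
def assess_purity(a260, a280, dilution_factor=1.0):
    """Return (A260/A280 ratio, estimated dsDNA concentration in ng/uL)."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    # Assumed conversion: 1.0 A260 unit ~ 50 ng/uL double-stranded DNA (1 cm path).
    concentration = a260 * 50.0 * dilution_factor
    return ratio, concentration


def interpret_ratio(ratio):
    """Map a ratio to a rough verdict using assumed rule-of-thumb cutoffs."""
    # ~1.8 is typical for pure DNA and ~2.0 for pure RNA;
    # lower values suggest protein, which absorbs at 280 nm.
    if ratio < 1.7:
        return "possible protein contamination"
    if ratio > 1.9:
        return "possible RNA contamination"
    return "consistent with pure DNA"


ratio, conc = assess_purity(a260=0.90, a280=0.50)
print(round(ratio, 2), conc, interpret_ratio(ratio))
```

The ratio works as a purity check only because nucleic acids and proteins absorb at different wavelengths; it cannot by itself separate DNA from RNA with confidence.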
After today, you should have a better understanding of
Steps in preparing DNA libraries for sequencing
A DNA library is a collection of DNA fragments ready for sequencing
Fragmentation breaks DNA into smaller, manageable pieces
Methods include
- Mechanical shearing (e.g., sonication)
- Enzymatic digestion using restriction enzymes
Long DNA molecules cannot be sequenced by most platforms due to size constraints
DNA is fragmented to an optimal size range (e.g., 200–500 bp) for efficient sequencing and alignment
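To make the size range concrete, here is a minimal sketch that cuts a random sequence into consecutive pieces of 200 to 500 bp. The `shear` function and its parameters are hypothetical, and real mechanical or enzymatic fragmentation produces a broader, less tidy size distribution than this toy model.

```python
import random


def shear(sequence, min_len=200, max_len=500, seed=0):
    """Cut a sequence into consecutive fragments of random length."""
    rng = random.Random(seed)
    fragments = []
    start = 0
    while start < len(sequence):
        size = rng.randint(min_len, max_len)  # inclusive on both ends
        fragments.append(sequence[start:start + size])
        start += size
    # The final fragment may be shorter than min_len (whatever is left over).
    return fragments


base_rng = random.Random(1)
genome = "".join(base_rng.choice("ACGT") for _ in range(5000))
fragments = shear(genome)
print(len(fragments), [len(f) for f in fragments[:5]])
```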

Adapter ligation enables amplification and sequencing
Adapters are short, synthetic DNA sequences that are ligated to the ends of DNA fragments during library preparation



PCR amplification ensures sufficient DNA for sequencing
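The reason a few PCR cycles can yield enough material is exponential growth: in the ideal case every cycle doubles the number of copies. The sketch below assumes a simple model with a per-cycle `efficiency` between 0 and 1; the function name and the example numbers are illustrative only.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate copy number after PCR; efficiency=1.0 means perfect doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1000, 10))       # ideal doubling for 10 cycles
print(pcr_copies(1000, 10, 0.9))  # slightly less efficient reaction
```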

During next-generation sequencing library preparation, short “adapter” sequences are added to the ends of DNA fragments. Which of the following best describes the primary reason for adding these adapters?
A. To link multiple fragments into a single chain for more efficient sequencing.
B. To selectively remove unwanted DNA fragments before sequencing for a better distribution.
C. To incorporate chemical modifications that prevent secondary structure formation.
D. To provide binding sites for PCR and enable recognition by the sequencing instrument.
After today, you should have a better understanding of
Principles and innovations of DNA sequencing technologies
Our main problem: Determine the precise ordering of nucleotides

All DNA sequencing technologies are designed to produce a distinct signal corresponding to nucleotides in a specific sequence
Common signals
- Optical: Generated by the interaction of light with nucleotides, often through fluorescence or absorbance.
- Electrical: Variations in current or voltage as nucleotides interact with a sensing element.
- Chemical: Produced by enzymatic or chemical reactions.
Chain termination (Sanger)
DNA elongation happens rapidly and continuously
We use DNA polymerase + excess nucleotides to make copies of DNA
Fluorescent tags enable nucleotide detection but require precise signal localization

When excited by light, fluorescent tags emit distinct signals, providing a mechanism to detect nucleotide identity
Issue: How can we determine where the signal is coming from in the sequence?
The length of a DNA fragment can be used to specify a nucleotide location (i.e., the last nucleotide)
3' OH is required for DNA elongation

What happens if we don't have the 3' OH?
We cannot add another nucleotide
Dideoxynucleotides (ddNTPs) stop replication




ddNTP will randomly stop DNA elongation
We will be left with DNA strands of variable length with an optical-based signal at the end
When DNA polymerase adds a ddNTP, it cannot add any other nucleotide
Ratio of ddNTP to dNTP is usually 1:100
By sorting DNA fragments by length, we can identify the last nucleotide at each position














Figure: variable-length fragments, then fragments sorted by length, then last nucleotide order
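The chain-termination idea can be sketched as a small simulation: copy a template many times, stop each copy at a random position with a small probability, then sort the fragments by length and read the last base of each. Everything here (function names, the example template, 5000 strands) is hypothetical, strand orientation is ignored for simplicity, and the termination probability of 1/101 stands in for a 1:100 ddNTP-to-dNTP ratio.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, n_strands=5000, dd_fraction=1 / 101, seed=0):
    """Simulate chain termination and return the terminated fragments."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_strands):
        strand = []
        for base in template:
            strand.append(COMPLEMENT[base])
            # With small probability the base just added was a ddNTP,
            # so this strand cannot be extended any further.
            if rng.random() < dd_fraction:
                fragments.append("".join(strand))
                break
    return fragments


def read_sequence(fragments):
    """Sort fragments by length and read the last base at each length."""
    last_base_by_length = {}
    for frag in fragments:
        last_base_by_length[len(frag)] = frag[-1]
    return "".join(last_base_by_length[n] for n in sorted(last_base_by_length))


template = "TACGGATCCATTGACGTAAC"
fragments = sanger_fragments(template)
called = read_sequence(fragments)
expected = "".join(COMPLEMENT[b] for b in template)
print(called == expected)
```

Note that the read corresponds to the newly synthesized strand, which is the complement of the template. Strands that never incorporate a terminator carry no label in this model and are simply dropped.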

Original setup
- Split DNA sample into four beakers
- Add all four dNTPs to each beaker
- Add a small amount of one radioactive ddNTP to each beaker, using a different ddNTP (ddATP, ddCTP, ddGTP, or ddTTP) in each
- Add DNA polymerase and let the extension reactions run
Why would we need separate beakers?
Once we have fragments, how can we separate them by length?

Gel electrophoresis!
Cannot differentiate between radioactive nucleotides
We can build our sequence based on what (radioactive) ddNTP is at that position

Now we use fluorescence to distinguish ddNTPs

Only need one reaction!
We also can automate fragment separation

Capillary gel electrophoresis can accelerate fragment length sorting and detection
Unique fluorescence signal per ddNTP produces a chromatogram
Ideal chromatogram
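In an idealized chromatogram, each peak is dominated by one fluorescence channel, so a naive base call is just the strongest channel at each peak. The sketch below uses made-up intensities and a hypothetical `call_bases` helper; production base callers also model peak shape, spacing, and quality, none of which is attempted here.

```python
CHANNELS = ("A", "C", "G", "T")


def call_bases(peaks):
    """Call one base per peak by picking the channel with the highest intensity.

    peaks: list of 4-tuples of intensities ordered as CHANNELS.
    """
    called = []
    for intensities in peaks:
        strongest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        called.append(CHANNELS[strongest])
    return "".join(called)


# Made-up intensities for an idealized trace, one tuple per peak.
peaks = [
    (0.95, 0.02, 0.01, 0.02),
    (0.03, 0.01, 0.90, 0.06),
    (0.02, 0.04, 0.03, 0.91),
    (0.01, 0.93, 0.04, 0.02),
]
print(call_bases(peaks))  # AGTC
```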

Sequencing by synthesis (Illumina)
Sanger sequencing is highly accurate but lacks scalability and speed for large-scale sequencing
What if we could identify nucleotides as they are being added, allowing us to sequence faster and at a larger scale?
Sequencing by synthesis identifies nucleotides as DNA strands are being synthesized


Immobilizing DNA fragments on a flow cell enables stable signal detection

Bridge amplification generates clusters of identical DNA fragments, amplifying the signal for detection

Bridge amplification creates double-stranded bridges
Clusters will give off a stronger signal compared to a single fragment

Double-stranded clonal bridges are denatured, and the reverse strands are cleaved away
Even with immobilization, the signal from a single fragment is often too weak to detect

Forward
Reverse
More on this in later lectures
Single molecule sequencing (Nanopore)
Illumina sequencing is cost-effective, scalable, and highly parallel, but limited by short read lengths
Short DNA reads make genome assembly difficult, especially in repetitive regions
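A small example shows why repeats are the hard part: a read that lies entirely inside a repeated region matches the genome in more than one place, while a read long enough to include unique flanking sequence matches only once. The toy genome, the repeat, and the `count_placements` helper are all invented for illustration.

```python
def count_placements(genome, read):
    """Count every position where the read matches the genome exactly."""
    return sum(
        1
        for i in range(len(genome) - len(read) + 1)
        if genome[i:i + len(read)] == read
    )


repeat = "GATTACAGGC"
genome = "TTGCA" + repeat + "CCATG" + repeat + "AATCG"

short_read = repeat[2:8]   # lies entirely inside the repeat
long_read = genome[3:18]   # spans the first repeat plus unique flanks

print(count_placements(genome, short_read))  # 2
print(count_placements(genome, long_read))   # 1
```

With two equally good placements for the short read, an assembler cannot tell which copy of the repeat it came from.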

Single-molecule sequencing enables long-read sequencing by reading DNA molecules directly
Nanopore sequencing detects nucleotide sequences by measuring changes in ionic current as DNA passes through a pore
- DNA passes through a nanopore driven by an electric field.
- Each nucleotide disrupts ionic current in a unique, measurable way.
- Real-time signal capture translates into nucleotide sequence.


Match each modern sequencing technology with the correct combination of features or characteristics.
Before the next class, you should
Today: Lecture 02A, DNA sequencing - Foundations
Thursday: Lecture 02B, DNA sequencing - Methodology
BIOSC 1540: L02A (Sequencing)
By aalexmmaldonado