Computational Biology

(BIOSC 1540)

Oct 22, 2024

Lecture 14:
Molecular system representations

Announcements

  • A05 is due Thursday by 11:59 pm
  • A06 will be posted on Friday

After today, you should be able to

Explain why DHFR is a promising drug target.

THF production is crucial for cellular growth

THF is needed for

  • Producing red blood cells,
  • Synthesizing purines,
  • Interconverting amino acids,
  • Methylating tRNA,
  • Generating and using formate.

5,6,7,8-tetrahydrofolate (THF) is essential for all organisms

Disrupting THF production has a cascading effect on essential cellular processes, primarily affecting DNA and RNA synthesis and amino acid metabolism

This makes THF production an attractive target for drug design

DHFR is responsible for synthesizing THF

Dihydrofolate reductase (DHFR) is a crucial enzyme that produces THF from dihydrofolate (DHF)

DHF + NADPH + H⁺ → THF + NADP⁺

(Figure: chemical structures of DHF and NADPH)

(We will use this protein for our project)

DHFR has been studied extensively as a drug target, both for antibiotics (e.g., trimethoprim) and for cancer therapy (e.g., methotrexate)

DHFR conservation complicates drug design

What would happen if a patient with a bacterial infection were prescribed a drug that targets DHFR nonselectively?

The drug could also inhibit the patient's own DHFR, causing harmful side effects

Bacterial and human DHFR have high structural similarity, even around the active site


Bacterial and human DHFR have similar structures, but their dynamics differ

Outcome: We need to design drugs that bind only to the bacterial protein by exploiting these differences in dynamics

Simulating DHFR provides insight into druggable conformations

Molecular dynamics (MD) simulations explore many low-energy conformations that, ideally, resemble those that occur in reality

Knowing which conformations are unique to bacterial DHFR allows us to design a small molecule that competitively inhibits it

After today, you should be able to

Select and prepare a protein structure for molecular simulations.

We need a structure before starting any molecular simulation

If our starting structure is very far from the desired equilibrium, our simulations will take much longer

For example, starting from an unfolded chain, we would have to wait for the protein to fold before studying its dynamics

Common problems with starting structures:

  • Low-quality experimental structures
  • Inaccurate computational predictions
  • High-energy conformations
  • Missing or incorrect cofactors

We can obtain starting structures from experimental databases

Experimental structures are usually the best option because of their accuracy

The Protein Data Bank (PDB) contains experimentally determined structures for many thousands of proteins

General preference (typically in order of resolution): X-ray crystallography, cryo-EM, NMR

Not all structures in the PDB are equally suitable for simulations

Key factors to check:

  • Functional state: Proteins can exist in different functional conformations (active vs. inactive, ligand-bound vs. unbound)
  • B-factors: Higher B-factors indicate more uncertainty in atom positions, so those parts of the structure may be less reliable
  • Completeness: Flexible loops or disordered regions are often missing from the structure
  • Resolution: How precisely the atomic positions are determined; smaller values (in Å) indicate finer detail

Tip: A resolution below 2.0 Å is generally preferred for high-quality simulations.
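The resolution and B-factor checks above can be scripted. The sketch below assumes a plain-text PDB-format file (not mmCIF), reads the resolution from the `REMARK   2 RESOLUTION.` record, and averages the B-factor column (columns 61-66) for each residue. The 60 Å² cutoff is an arbitrary illustrative threshold, not a standard value.

```python
from collections import defaultdict


def parse_resolution(pdb_text):
    """Return the resolution in Å from REMARK 2, or None (e.g., NMR entries)."""
    for line in pdb_text.splitlines():
        if line.startswith("REMARK   2 RESOLUTION."):
            try:
                return float(line.split()[3])
            except (IndexError, ValueError):
                return None  # e.g., "RESOLUTION. NOT APPLICABLE."
    return None


def mean_bfactor_by_residue(pdb_text):
    """Average B-factor (PDB columns 61-66) over the atoms of each residue."""
    sums = defaultdict(lambda: [0.0, 0])
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Key: (chain ID, residue number, residue name) from fixed PDB columns
            key = (line[21], int(line[22:26]), line[17:20].strip())
            sums[key][0] += float(line[60:66])
            sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def flag_uncertain_residues(pdb_text, cutoff=60.0):
    """Residues whose mean B-factor exceeds `cutoff` (hypothetical threshold, Å^2)."""
    means = mean_bfactor_by_residue(pdb_text)
    return sorted(key for key, b in means.items() if b > cutoff)
```

For example, `flag_uncertain_residues(open("7D4L.pdb").read())` would list residues worth inspecting before simulation, assuming a local copy of the file has been downloaded.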


Example quality metrics for four candidate DHFR structures (for every factor except temperature, lower values are better):

Factor                  7D4L   4NX6   4KJK   4NX7
Resolution (Å)          1.60   1.35   1.35   1.15
Temperature (K)         298    298    298    100
R-free                  0.196  0.190  0.166  0.170
Clashscore              2      5      8      12
Ramachandran outliers   0      0      0      0
Rotamer outliers        1      2      1      5

7D4L is a good choice: its resolution and R-free are comparable to the other structures, and it has the fewest clashes, which is highly desirable
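The comparison in the table can be written as a simple filter-then-rank rule. The sketch below encodes the table's values; the thresholds (resolution under 2.0 Å, from the tip above, and R-free under 0.20, an assumed cutoff) and the tie-breaking order (clashscore, then rotamer outliers) are illustrative choices, not a standard scoring scheme.

```python
# Quality metrics copied from the table above
candidates = {
    "7D4L": {"resolution": 1.60, "r_free": 0.196, "clashscore": 2, "rotamer_outliers": 1},
    "4NX6": {"resolution": 1.35, "r_free": 0.190, "clashscore": 5, "rotamer_outliers": 2},
    "4KJK": {"resolution": 1.35, "r_free": 0.166, "clashscore": 8, "rotamer_outliers": 1},
    "4NX7": {"resolution": 1.15, "r_free": 0.170, "clashscore": 12, "rotamer_outliers": 5},
}


def pick_structure(candidates, max_resolution=2.0, max_r_free=0.20):
    """Keep structures passing the cutoffs, then prefer fewest clashes, then fewest rotamer outliers."""
    acceptable = {
        pdb_id: m
        for pdb_id, m in candidates.items()
        if m["resolution"] < max_resolution and m["r_free"] < max_r_free
    }
    return min(
        acceptable,
        key=lambda pdb_id: (acceptable[pdb_id]["clashscore"], acceptable[pdb_id]["rotamer_outliers"]),
    )


print(pick_structure(candidates))  # 7D4L
```

Tightening the R-free cutoff to 0.18 would instead select 4KJK, consistent with the point that several reasonable structures are viable.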

Reasonable structures will likely provide similar results

Factor                  7D4L   4KJK
Resolution (Å)          1.60   1.35
Temperature (K)         298    298
R-free                  0.196  0.166
Clashscore              2      8
Ramachandran outliers   0      0
Rotamer outliers        1      1

(Figure: 7D4L (blue) superimposed on 4KJK (grey))

The alpha-carbon RMSD is 0.141 Å, indicating high similarity

Either structure would provide comparable results if simulation protocols are appropriate
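An RMSD like this is computed after optimally superimposing the two structures. Below is a minimal NumPy sketch of that calculation using the Kabsch algorithm; it assumes the two (N, 3) arrays already contain matching alpha-carbon coordinates in the same residue order (extracting and pairing those atoms is not shown).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q  # rotate every row of P, then compare with Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the rotation is restricted to proper rotations, a structure and its mirror image do not superimpose to zero RMSD.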

Simulations cannot have missing residues

It’s essential to fix chain breaks and missing loops before simulation

(Figure: 7D4L (blue) overlaid with 8UCX (pink))

8UCX is missing residues 17 and 18
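A quick way to spot gaps like this one is to look for jumps in residue numbering. The sketch below assumes a single chain with consecutive numbering and no insertion codes, and it detects only internal gaps, not missing termini; the REMARK 465 records in a PDB header list missing residues more reliably. The input list is a made-up example shaped like the 8UCX gap.

```python
def find_missing_residues(resseqs):
    """Residue numbers absent between the first and last residue present in a chain."""
    present = sorted(set(resseqs))
    missing = []
    for prev, curr in zip(present, present[1:]):
        missing.extend(range(prev + 1, curr))  # empty when residues are consecutive
    return missing


print(find_missing_residues([14, 15, 16, 19, 20]))  # [17, 18]
```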

Protein structure predictions are used to add missing residues

Missing atoms or residues can be added using modeling software like Modeller

(Figure: 8UCX (pink) and the Modeller-completed model (grey); dashed lines often indicate missing atoms)

Unwanted components like ligands or non-essential ions should be removed

Many PDB structures contain ligands, ions, or crystallization agents that are not physiologically relevant

These can distort the protein's behavior in a simulated biological environment if not removed

Examples of components to remove: manganese(II) ions, mercaptoethanol, water molecules, and unwanted ligands
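Removing such components usually amounts to dropping records by residue name. The sketch below assumes PDB-format text and uses the PDB residue codes HOH (water), MN (manganese ion), and BME (beta-mercaptoethanol) as a hypothetical removal list; which components to keep, such as cofactors, is a case-by-case decision. It ignores CONECT records.

```python
# Hypothetical removal list using PDB residue codes:
# HOH = water, MN = manganese ion, BME = beta-mercaptoethanol
UNWANTED = {"HOH", "MN", "BME"}


def strip_components(pdb_text, unwanted=UNWANTED):
    """Drop ATOM/HETATM records whose residue name (columns 18-20) is in `unwanted`."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() in unwanted:
            continue  # skip this atom
        kept.append(line)  # keep protein atoms, other ligands, and non-coordinate records
    return "\n".join(kept)
```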

Correct protonation states are essential for accurate simulations

Experimental structures often cannot resolve hydrogens, so we need to add them ourselves

pH-sensitive residues:

  • Histidine (His, H), pKa ~6.0: protonation state switches around pH 6-7
  • Cysteine (Cys, C), pKa ~8.3: can form disulfide bonds in oxidizing environments
  • Aspartic acid (Asp, D), pKa ~3.9: protonation state affects salt bridges and hydrogen bonds
  • Glutamic acid (Glu, E), pKa ~4.2: protonation state affects electrostatic interactions
  • Lysine (Lys, K), pKa ~10.5: can form ionic bonds with negatively charged residues
  • Tyrosine (Tyr, Y), pKa ~10.1: participates in hydrogen bonding and often in enzyme active sites

Protonation states of amino acids affect the charge distribution, which influences electrostatic interactions during the simulation
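A first estimate of protonation comes from the Henderson-Hasselbalch equation, which gives the fraction of a group in its protonated form as 1 / (1 + 10^(pH - pKa)). The sketch below uses the approximate pKa values listed above and assumes a cytosolic pH of 7.4. Real pKa values shift with the local protein environment, which is why tools such as PROPKA are used to estimate them for a specific structure.

```python
# Approximate side-chain pKa values from the list above
PKA = {"HIS": 6.0, "CYS": 8.3, "ASP": 3.9, "GLU": 4.2, "LYS": 10.5, "TYR": 10.1}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ph = 7.4  # assumed cytosolic pH
for resname, pka in PKA.items():
    print(f"{resname}: {fraction_protonated(pka, ph):.3f} protonated at pH {ph}")
```

At pH 7.4 this predicts that histidine is only about 4% protonated, while aspartate and glutamate are almost fully deprotonated and lysine almost fully protonated, which is why histidine usually deserves the closest look.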

We now have a fully prepared protein

After today, you should be able to

Explain the importance of approximating molecular environments.

DHFR is localized in the cytoplasm, which contains a multitude of chemical species

Ions

Potassium, Sodium, Calcium, Magnesium, Iron, Zinc, Copper, Manganese, Phosphate, Chloride, Bicarbonate, Sulfate, Citrate, ATP, ADP, AMP, . . .

Molecules

Glucose, pyruvate, lactate, amino acids, fatty acids, nucleotides, NADH, FADH₂, citrate, oxaloacetate, biotin, riboflavin, coenzyme A, ubiquinone, . . .

Proteins

Glycolytic enzymes, TCA cycle enzymes, DNA/RNA polymerases, kinases, phosphatases, G-proteins, heat shock proteins, molecular motors, transcription factors, transcription regulators, ribosomes, proteasomes, . . .

Organelles

Mitochondria, endoplasmic reticulum, Golgi apparatus, lysosomes, peroxisomes, vacuoles, endosomes, ribosomes, centrosomes, . . .

Cytoskeleton

Actin, profilin, cofilin, myosin, keratins, vimentin, neurofilaments, tubulin, . . .

Membranes

Phospholipid bilayer with embedded proteins, cholesterol, glycoproteins, glycolipids, . . .

and more

Simulations should accurately represent reality

What biological or chemical components are crucial for modeling the dynamics of a protein in the cytosol?

We must balance computational feasibility with biological realism

  • Protein of interest (already prepared)
  • Water molecules at the appropriate temperature (310 K) and pressure (1 atm)
  • Cations (Na+ or K+) and anions (Cl-) at an ionic strength of 150 mM
  • Any cofactors (e.g., NADPH and folate for DHFR)
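The number of salt ions needed for 150 mM follows from n = c · N_A · V. The sketch below assumes a hypothetical cubic box, uses the full box volume (ignoring the volume occupied by the protein), and adds extra monovalent counterions so that the system is neutral. In practice, tools such as GROMACS `gmx genion` perform this step.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def ion_pairs_for_concentration(box_length_nm, conc_molar=0.150):
    """Number of salt pairs giving `conc_molar` in a cubic box (full box volume assumed)."""
    volume_liters = (box_length_nm * 1e-9) ** 3 * 1e3  # nm -> m, then m^3 -> L
    return round(conc_molar * AVOGADRO * volume_liters)


def ion_counts(box_length_nm, net_charge, conc_molar=0.150):
    """Return (n_cations, n_anions): salt pairs plus monovalent counterions to neutralize."""
    pairs = ion_pairs_for_concentration(box_length_nm, conc_molar)
    n_cations = pairs + max(0, -net_charge)  # negative protein -> extra Na+/K+
    n_anions = pairs + max(0, net_charge)    # positive protein -> extra Cl-
    return n_cations, n_anions


print(ion_counts(10.0, net_charge=-5))  # hypothetical 10 nm box, protein charge -5 -> (95, 90)
```

For a 10 nm cubic box this gives about 90 Na+/Cl- pairs, and a protein with a net charge of -5 needs five additional Na+.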

Example system: roGFP2

Starting structure for simulating Cu(I) binding to Cys147 and Cys204 in roGFP2 with Na+ and Cl- counterions

(Actually used in my research.)

After today, you should be able to

Describe periodic boundary conditions and their role in MD simulations.

Realistic systems do not have walls

Without periodic boundaries, we would have to apply a force to keep the molecules inside the simulation box

Water molecules and proteins would bounce off these walls in an unphysical manner (i.e., edge effects)

A protein in vivo or in vitro has plenty of space to move around

We could make the box very large, but this would dramatically increase the computational cost

Periodic boundary conditions (PBC) are how we solve this issue

PBC simulate an effectively infinite system using a finite box

Think Pac-Man: if he exits the right side of the map, he reappears on the left

We (virtually) place exact copies of our system in all directions

Atoms that cross a box edge reappear on the opposite side, so there are no edge effects
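Wrapping coordinates back into the box is a modulo operation along each axis. The sketch below assumes a rectangular (orthorhombic) box with its origin at 0; triclinic boxes need a more general treatment.

```python
import numpy as np


def wrap_positions(positions, box):
    """Wrap (N, 3) coordinates into a rectangular box spanning [0, L) on each axis."""
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    # np.mod takes the sign of the divisor, so a coordinate of -0.5 wraps to L - 0.5
    return np.mod(positions, box)


print(wrap_positions([[10.5, -0.5, 3.0]], [10.0, 10.0, 10.0]))
```

An atom that leaves through the face at x = 10 re-enters near x = 0, and one that leaves through x = 0 re-enters near x = 10.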

The minimum image convention ensures correct interactions

Image atoms in adjacent boxes are used to calculate interactions across the boundaries

The minimum image convention (MIC) ensures that an atom in the primary box only interacts with the closest image of another atom
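Under the minimum image convention, the displacement between two atoms is shifted by whole box lengths so that it points to the nearest image. This sketch assumes a rectangular box; the convention is only consistent when the interaction cutoff is less than half the shortest box length.

```python
import numpy as np


def minimum_image_displacement(r_i, r_j, box):
    """Vector from atom i to the nearest periodic image of atom j (rectangular box)."""
    box = np.asarray(box, dtype=float)
    d = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    # Subtract whole box lengths so each component lies within [-L/2, L/2]
    return d - box * np.round(d / box)


def minimum_image_distance(r_i, r_j, box):
    return float(np.linalg.norm(minimum_image_displacement(r_i, r_j, box)))


# Two atoms near opposite faces of a 10 nm box are actually 1 nm apart
print(minimum_image_distance([0.5, 5.0, 5.0], [9.5, 5.0, 5.0], [10.0, 10.0, 10.0]))
```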

After today, you should be able to

Explain the role of force field selection and topology generation.

We now have a fully prepared system; next, we set up the simulation itself

Force fields are parameterized to reproduce quantum chemical and experimental data

1. Generate structures and use quantum chemistry to compute energy and forces

2. Optimize force field parameters until they reproduce the quantum chemistry dataset

3. Run MD simulations and predict experimental data (e.g., NMR observables, Raman spectra, solvation energies)

4. Continue optimizing force field parameters to minimize errors against both the quantum chemistry data and the experimental predictions
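Step 2 can be illustrated with a single bond term. The sketch below fits a harmonic bond, E(r) = ½k(r - r₀)² + c, to synthetic reference energies by least squares. The "reference" data are generated from made-up parameters with added noise and only stand in for a real quantum chemistry scan; real parameterization fits many coupled terms at once.

```python
import numpy as np


def fit_harmonic_bond(r, energies):
    """Least-squares fit of E(r) = 0.5*k*(r - r0)**2 + c; returns (k, r0)."""
    a, b, _ = np.polyfit(r, energies, deg=2)  # E = a*r**2 + b*r + c
    k = 2.0 * a          # since a = k/2
    r0 = -b / (2.0 * a)  # since b = -k*r0
    return k, r0


# Synthetic "reference" data standing in for a quantum chemistry scan (made-up values)
true_k, true_r0 = 250_000.0, 0.1053   # kJ/mol/nm^2 and nm, illustrative only
r = np.linspace(0.09, 0.12, 15)       # bond lengths, nm
rng = np.random.default_rng(0)
energies = 0.5 * true_k * (r - true_r0) ** 2 + rng.normal(0.0, 0.05, r.size)

k_fit, r0_fit = fit_harmonic_bond(r, energies)
print(f"k = {k_fit:.0f} kJ/mol/nm^2, r0 = {r0_fit:.4f} nm")
```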

Force fields depend on their fitting data and simulation setup

Force fields are not inherently compatible with each other

Example: Simulating a DNA-binding protein

Suppose my protein force field was fit to:

  • Membrane proteins
  • Proteins and RNA

Suppose my DNA force field was fit to:

  • Single-stranded DNA
  • Protein binding with a different type of force field

Simulations would be unreliable because the force fields are incompatible with each other

Force fields are compatible only if they were designed to work together or have been validated together against experimental data

Key factors for selecting a force field

  • System type: Different force fields are optimized for specific kinds of systems
  • Accuracy vs. speed: Higher-accuracy force fields may require more computational resources
  • Compatibility: Choose a force field that works with the available topology generators and the types of molecules in your simulation

Examples:

  • AMBER: Widely used for proteins and nucleic acids; optimized for biomolecular interactions
  • CHARMM: Known for its extensive parameter set; suitable for complex systems including proteins, lipids, and membranes
  • OPLS: Optimized for small molecules, organic compounds, and polymers, with emphasis on accurate non-bonded interactions

Topology files define the molecular structure and interactions in a simulation

A topology file contains information on atom types, bonds, angles, dihedrals, and non-bonded interactions based on the chosen force field

Essentially, it tells the simulation program which force field parameters to apply to each atom, bond, angle, and dihedral
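Conceptually, the topology maps each atom to a force-field atom type, and the simulation program looks up parameters by those types. The sketch below shows that lookup for harmonic bond terms only; the atom types, bond list, and parameter values are hypothetical and chosen for illustration (units assumed to be nm and kJ/mol).

```python
import math

# Hypothetical topology: atom index -> force-field atom type, plus a bond list
atom_types = {0: "CT", 1: "HC", 2: "HC"}
bonds = [(0, 1), (0, 2)]

# Hypothetical parameter table keyed by (sorted) atom-type pair: (k, r0)
bond_params = {("CT", "HC"): (280_000.0, 0.109)}  # kJ/mol/nm^2, nm (illustrative)


def bond_energy(coords, atom_types, bonds, bond_params):
    """Sum harmonic bond energies 0.5*k*(r - r0)^2, looking parameters up by atom type."""
    total = 0.0
    for i, j in bonds:
        key = tuple(sorted((atom_types[i], atom_types[j])))
        k, r0 = bond_params[key]
        r = math.dist(coords[i], coords[j])
        total += 0.5 * k * (r - r0) ** 2
    return total


coords = {0: (0.0, 0.0, 0.0), 1: (0.110, 0.0, 0.0), 2: (0.0, 0.109, 0.0)}
print(bond_energy(coords, atom_types, bonds, bond_params))  # one bond stretched by 0.001 nm
```

The same type-based lookup extends to angles, dihedrals, and non-bonded terms, which is why a single atom-type assignment in the topology determines all of an atom's parameters.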

Example AMBER topology

%VERSION  VERSION_STAMP = V0001.000  DATE = 01/20/24  21:37:10                  
%FLAG TITLE                                                                     
%FORMAT(20a4)                                                                   
default_name                                                                    
%FLAG POINTERS                                                                  
%FORMAT(10I8)                                                                   
   33582      19   31714    1852    4022    2505    8139    7890       0       0
   59724   10270    1852    2505    7890      89     205     205      47       0
       0       0       0       0       0       0       0       1      36       0
       0
%FLAG ATOM_NAME                                                                 
%FORMAT(20a4)                                                                   
N   H1  H2  H3  CA  HA  CB  HB2 HB3 CG  HG2 HG3 SD  CE  HE1 HE2 HE3 C   O   N   
H   CA  HA  CB  HB2 HB3 OG  HG  C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 
CD  HD2 HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   O   N   H   CA  HA2 HA3 C   O   N   
H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  OE1 OE2 C   O   N   H   CA  HA  CB  HB2 
HB3 CG  HG2 HG3 CD  OE1 OE2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11
HD12HD13CD2 HD21HD22HD23C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 
CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 
C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 
HG21HG22HG23C   O   N   H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 HG21HG22HG23C   
O   N   CD  HD2 HD3 CG  HG2 HG3 CB  HB2 HB3 CA  HA  C   O   N   H   CA  HA  CB  
HB  CG2 HG21HG22HG23CG1 HG12HG13CD1 HD11HD12HD13C   O   N   H   CA  HA  CB  HB2 
HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C   O   N   H   CA  HA  CB  HB  CG1 
HG11HG12HG13CG2 HG21HG22HG23C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  
OE1 OE2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22
HD23C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 OD2 C   O   N   H   CA  HA2 HA3 
C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 OD2 C   O   N   H   CA  HA  CB  HB  
CG1 HG11HG12HG13CG2 HG21HG22HG23C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 ND2 
HD21HD22C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  ND1 
HD1 CE1 HE1 NE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 
HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 
CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB2 HB3 OG  HG  C   
O   N   H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 HG21HG22HG23C   O   N   H   CA  
HA  CB  HB2 HB3 OG  HG  C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  HA  CB  
HB2 HB3 CG  HG2 HG3 CD  OE1 OE2 C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  
HA  CB  HB2 HB3 CG  HG2 HG3 CD  OE1 OE2 C   O   N   H   CA  HA2 HA3 C   O   N   
H   CA  HA  CB  HB2 HB3 CG  OD1 OD2 C   O   N   H   CA  HA  CB  HB1 HB2 HB3 C   
O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   N   H   CA  HA  CB  
HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  OH  HH  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA2 
HA3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 NZ  
HZ1 HZ2 HZ3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21
HD22HD23C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   N   H   
CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C   O   N   H   CA  
HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   O   N   
H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   
H   CA  HA  CB  HB  CG2 HG21HG22HG23CG1 HG12HG13CD1 HD11HD12HD13C   O   N   H   
CA  HA  CB  HB2 HB3 OG  HG  C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 
HG1 C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   N   H   CA  
HA2 HA3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 
NZ  HZ1 HZ2 HZ3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 
HD21HD22HD23C   O   N   CD  HD2 HD3 CG  HG2 HG3 CB  HB2 HB3 CA  HA  C   O   N   
H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 HG21HG22HG23C   O   N   CD  HD2 HD3 CG  
HG2 HG3 CB  HB2 HB3 CA  HA  C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 NE1 
HE1 CE2 CZ2 HZ2 CH2 HH2 CZ3 HZ3 CE3 HE3 CD2 C   O   N   CD  HD2 HD3 CG  HG2 HG3 
CB  HB2 HB3 CA  HA  C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   
O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C   O   
N   H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 HG21HG22HG23C   O   N   H   CA  HA  
CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23
OG1 HG1 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22
HD23C   O   CD2 CE2 CZ  CG2 CD1 CE1 CB2 CA2 C2  H10 H12 H11 H9  H8  OH  O2  N2  
N3  C1  CA3 CA1 H13 H14 C3  O3  N1  H1  H2  CB1 CG1 H5  H6  H7  H3  OG1 H4  N   
H   CA  HA  CB  HB  CG1 HG11HG12HG13CG2 HG21HG22HG23C   O   N   H   CA  HA  CB  
HB2 HB3 CG  HG2 HG3 CD  OE1 NE2 HE21HE22C   O   N   H   CA  HA  CB  HB2 HB3 SG  
HG  C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 
HD2 C   O   N   H   CA  HA  CB  HB2 HB3 OG  HG  C   O   N   H   CA  HA  CB  HB2 
HB3 CG  HG2 HG3 CD  HD2 HD3 NE  HE  CZ  NH1 HH11HH12NH2 HH21HH22C   O   N   H   
CA  HA  CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  OH  HH  CE2 HE2 CD2 HD2 C   O   N   
CD  HD2 HD3 CG  HG2 HG3 CB  HB2 HB3 CA  HA  C   O   N   H   CA  HA  CB  HB2 HB3 
CG  OD1 OD2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  ND1 HD1 CE1 HE1 NE2 CD2 HD2 
C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 SD  CE  HE1 HE2 HE3 C   O   N   
H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   
O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 NE  HE  CZ  NH1 HH11HH12
NH2 HH21HH22C   O   N   H   CA  HA  CB  HB2 HB3 CG  ND1 CE1 HE1 NE2 HE2 CD2 HD2 
C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 OD2 C   O   N   H   CA  HA  CB  HB2 
HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB2 
HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB2 
HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   O   N   H   CA  HA  
CB  HB2 HB3 OG  HG  C   O   N   H   CA  HA  CB  HB1 HB2 HB3 C   O   N   H   CA  
HA  CB  HB2 HB3 CG  HG2 HG3 SD  CE  HE1 HE2 HE3 C   O   N   CD  HD2 HD3 CG  HG2 
HG3 CB  HB2 HB3 CA  HA  C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  OE1 
OE2 C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 
CE1 HE1 CZ  OH  HH  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  CB  HB  CG1 HG11HG12
HG13CG2 HG21HG22HG23C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  OE1 NE2 
HE21HE22C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  OE1 OE2 C   O   N   
H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 NE  HE  CZ  NH1 HH11HH12NH2 HH21
HH22C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   N   H   CA  
HA  CB  HB  CG2 HG21HG22HG23CG1 HG12HG13CD1 HD11HD12HD13C   O   N   H   CA  HA  
CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  
CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  HZ  CE2 HE2 CD2 HD2 C   O   N   H   CA  HA  
CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 NZ  HZ1 HZ2 HZ3 C   O   N   H   
CA  HA  CB  HB2 HB3 CG  OD1 OD2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 OD2 
C   O   N   H   CA  HA2 HA3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  OD1 ND2 HD21
HD22C   O   N   H   CA  HA  CB  HB2 HB3 CG  CD1 HD1 CE1 HE1 CZ  OH  HH  CE2 HE2 
CD2 HD2 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG2 HG3 CD  HD2 HD3 CE  HE2 HE3 
NZ  HZ1 HZ2 HZ3 C   O   N   H   CA  HA  CB  HB  CG2 HG21HG22HG23OG1 HG1 C   O   

In practice, we rarely need to read these files directly
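Although we rarely read topologies by hand, the AMBER example above has a simple layout: each `%FLAG` line names a section, and the following `%FORMAT` line gives a Fortran-style fixed-width format such as `20a4` (20 strings of width 4) or `10I8` (10 integers of width 8). The sketch below parses that layout; it is a simplified reader that assumes only the `a`, `I`, and `E` formats appear, not a replacement for established tools such as ParmEd.

```python
import re

# e.g. "%FORMAT(20a4)" -> kind "a", width 4; "%FORMAT(10I8)" -> kind "I", width 8
FORMAT_RE = re.compile(r"%FORMAT\((\d+)([aIE])(\d+)")


def parse_prmtop(text):
    """Split an AMBER prmtop into {FLAG_NAME: [values]} using each section's %FORMAT."""
    sections, flag, kind, width = {}, None, None, None
    convert = {"a": str.strip, "I": int, "E": float}
    for line in text.splitlines():
        if line.startswith("%FLAG"):
            flag = line.split()[1]
            sections[flag] = []
        elif line.startswith("%FORMAT"):
            match = FORMAT_RE.match(line)
            kind, width = match.group(2), int(match.group(3))
        elif line.startswith("%") or flag is None:
            continue  # %VERSION, %COMMENT, or text before the first %FLAG
        else:
            body = line.rstrip()
            # Fixed-width slicing: names such as "HD12HD13" have no separating space
            fields = [body[i:i + width] for i in range(0, len(body), width)]
            sections[flag].extend(convert[kind](f) for f in fields)
    return sections
```

With the full file, `parse_prmtop(text)["POINTERS"][0]` gives the atom count (NATOM), which is 33582 in the example above.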

Complex molecules and ligands require parameterization and careful integration

Non-standard residues or ligands are not always included in standard force field parameter sets

These require additional parameterization to ensure proper interactions in the simulation

Example: GFP chromophore
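Before building a topology, it helps to list the residue names that a standard protein force field will not recognize. The sketch below assumes a set of standard names (the 20 canonical amino acids plus AMBER's histidine variants HID, HIE, and HIP) and treats water and common ions as handled elsewhere. A GFP chromophore appears under its own residue code (such as CRO in many PDB entries) and would therefore be flagged.

```python
# Assumed set of residue names a standard protein force field covers:
# the 20 canonical amino acids plus AMBER's histidine variants HID, HIE, HIP
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HID", "HIE", "HIP",
}
# Water and simple ions are assumed to be handled by separate water/ion parameters
HANDLED_ELSEWHERE = {"HOH", "NA", "CL"}


def residues_needing_parameters(resnames):
    """Return residue names that are neither standard nor handled elsewhere."""
    return sorted(set(resnames) - STANDARD_RESIDUES - HANDLED_ELSEWHERE)


print(residues_needing_parameters(["MET", "SER", "CRO", "HOH", "NA", "LYS"]))  # ['CRO']
```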

After today, you should be able to

Outline the process of energy minimization and its significance.

Energy minimization is necessary before running molecular dynamics simulations

Energy minimization adjusts the initial structure to remove unfavorable atom positions and steric clashes that could cause instability during simulations

Without minimization, high-energy configurations may lead to unrealistic results or early failures in the molecular dynamics simulation

Energy minimization removes steric clashes and optimizes the initial geometry

Steric clashes occur when atoms are too close together, resulting in excessively high energy

Energy minimization gently adjusts the structure to lower the system’s energy

(Figure: unphysical starting geometry vs. physical geometry after minimization)
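The idea behind minimization can be seen with two atoms that start too close together. The sketch below applies steepest descent to a single Lennard-Jones pair in reduced units (σ = ε = 1), moving the separation downhill with an adaptive step that grows after accepted moves and shrinks after rejected ones. Production codes apply the same idea to all atoms at once and often use more efficient algorithms such as conjugate gradients or L-BFGS.

```python
import math


def lj_energy(r, sigma=1.0, epsilon=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, sigma=1.0, epsilon=1.0):
    """Force along r, F = -dU/dr (positive pushes the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r, step=0.01, f_tol=1e-4, max_iter=10_000):
    """Minimize the pair energy over the separation r; returns (r, energy)."""
    energy = lj_energy(r)
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < f_tol:
            break  # converged: force is essentially zero
        r_new = r + math.copysign(step, force)  # move in the direction of the force
        if r_new > 0 and lj_energy(r_new) < energy:
            r, energy = r_new, lj_energy(r_new)
            step *= 1.2  # accepted: try a slightly larger step next time
        else:
            step *= 0.2  # rejected: keep r and take a much smaller step
    return r, energy


r_final, e_final = steepest_descent(0.8)  # start with a steric clash at 0.8 sigma
print(r_final, e_final)
```

Starting from r = 0.8σ, the separation relaxes to the Lennard-Jones minimum at 2^(1/6)σ ≈ 1.122σ, where the energy is -ε.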

Before the next class, you should

  • Work on A05

Today: Lecture 14: Molecular system representations

Thursday: Lecture 15: Atomistic insights