Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete
BERGER AND LEIGHTON, 1998
Overview
- Introduction
- Proof
- Next Steps
Introduction
- What are proteins?
- Proteins can be conceptualised as strings of amino acids.
- There are 23 amino acids that are proteinogenic.
- Protein Folding
- In the HP model, a fold of a protein sequence is a self-avoiding walk in a 3D lattice: each amino acid occupies a distinct lattice point, and consecutive amino acids occupy adjacent points.
- The HP model abstracts the hydrophobic
interaction in protein folding by labeling
the amino acids as:
- hydrophobic (H)
- hydrophilic, i.e. polar (P)
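A minimal Python sketch of these definitions, assuming a fold is stored as a list of integer (x, y, z) lattice points in sequence order (a representation chosen for illustration, not taken from the paper). It checks the self-avoiding-walk condition and counts H-H bonds, here taken as pairs of H's on adjacent lattice points that are not consecutive in the sequence; the helper names are hypothetical.

```python
# Sketch of the HP model on the cubic lattice Z^3.
# Assumed representation: a fold is a list of (x, y, z) integer points,
# one per residue, in sequence order.
Point = tuple[int, int, int]


def is_adjacent(a: Point, b: Point) -> bool:
    """True if a and b are lattice neighbours (Manhattan distance 1)."""
    return sum(abs(p - q) for p, q in zip(a, b)) == 1


def is_valid_fold(seq: str, fold: list[Point]) -> bool:
    """Self-avoiding walk: one distinct point per residue, and
    consecutive residues on adjacent lattice points."""
    if len(fold) != len(seq) or len(set(fold)) != len(fold):
        return False
    return all(is_adjacent(fold[i], fold[i + 1]) for i in range(len(fold) - 1))


def hh_contacts(seq: str, fold: list[Point]) -> int:
    """Count pairs of H's on adjacent lattice points that are not
    consecutive in the sequence (assumed bond convention)."""
    h_index = {p: i for i, p in enumerate(fold) if seq[i] == "H"}
    count = 0
    for (x, y, z), i in h_index.items():
        # Look only in the +x, +y, +z directions so each pair is counted once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = h_index.get((x + dx, y + dy, z + dz))
            if j is not None and abs(i - j) > 1:
                count += 1
    return count


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(is_valid_fold("HPPH", square), hh_contacts("HPPH", square))  # True 1
```

Bending HPPH into a unit square brings its two H's next to each other, giving one H-H bond.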
Proof
The main result is that HP STRING-FOLD (defined below) is NP-complete when G is Z^3. The proof shows that a restricted folding problem, PERFECT HP STRING-FOLD, is NP-hard.
- HP STRING-FOLD
Instance: A finite sequence S over the alphabet {H, P},
an integer m, and a graph G.
Question: Is there a fold of S in G where the number of H-H bonds is at least m?
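For very short strings, HP STRING-FOLD on Z^3 can be decided by brute force. The sketch below (hypothetical function names, not the paper's method) enumerates every self-avoiding walk starting at the origin and keeps the best H-H count, using the same non-consecutive bond convention as above. The number of walks grows exponentially with the length of S, so this does not scale; checking a single proposed fold, by contrast, takes polynomial time.

```python
# Exhaustive search for HP STRING-FOLD with G = Z^3 (tiny inputs only).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def best_hh_score(seq: str) -> int:
    """Maximum number of non-consecutive H-H lattice contacts over all folds."""
    if not seq:
        return 0
    best = 0
    path = [(0, 0, 0)]
    occupied = {(0, 0, 0): 0}  # lattice point -> residue index

    def new_contacts(i: int, p: tuple) -> int:
        # Bonds formed when residue i is placed at p, with earlier non-neighbour H's.
        if seq[i] != "H":
            return 0
        c = 0
        for dx, dy, dz in DIRECTIONS:
            j = occupied.get((p[0] + dx, p[1] + dy, p[2] + dz))
            if j is not None and j < i - 1 and seq[j] == "H":
                c += 1
        return c

    def extend(score: int) -> None:
        nonlocal best
        i = len(path)
        if i == len(seq):
            best = max(best, score)
            return
        x, y, z = path[-1]
        for dx, dy, dz in DIRECTIONS:
            p = (x + dx, y + dy, z + dz)
            if p in occupied:  # self-avoidance
                continue
            gain = new_contacts(i, p)
            path.append(p)
            occupied[p] = i
            extend(score + gain)
            path.pop()
            del occupied[p]

    extend(0)
    return best


def hp_string_fold(seq: str, m: int) -> bool:
    """Is there a fold of seq in Z^3 with at least m H-H bonds?"""
    return best_hh_score(seq) >= m


print(hp_string_fold("HPPH", 1), hp_string_fold("HPPH", 2))  # True False
```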
Proof
The proof that PERFECT HP STRING-FOLD is NP-hard involves a transformation from the (strongly) NP-complete problem of BIN PACKING ...
- PERFECT HP STRING-FOLD
Instance: An integer n and a finite sequence S over the alphabet {H, P} that contains n^3 H's.
Question: Is there a fold of S in Z^3 for which the H's are perfectly packed into an n × n × n cube?
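The perfect-packing condition can be tested directly on a given fold. This sketch (hypothetical helper, assuming the fold has already been validated as a self-avoiding walk of (x, y, z) points) relies on the fact that n^3 distinct points inside an n × n × n bounding box must fill it exactly.

```python
Point = tuple[int, int, int]


def is_perfect_packing(seq: str, fold: list[Point], n: int) -> bool:
    """True if the H's of a (valid) fold occupy exactly an n x n x n cube."""
    hs = [p for p, c in zip(fold, seq) if c == "H"]
    if n < 1 or len(hs) != n ** 3 or len(set(hs)) != n ** 3:
        return False
    # n^3 distinct points inside an n x n x n bounding box fill it exactly.
    for k in range(3):
        coords = [p[k] for p in hs]
        if max(coords) - min(coords) != n - 1:
            return False
    return True


# A Hamiltonian path through the 2 x 2 x 2 cube.
cube_path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
             (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
print(is_perfect_packing("H" * 8, cube_path, 2))  # True
```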
Proof
To simplify matters, the paper uses a variation of BIN PACKING which is more easily shown to be strongly NP-complete.
- BIN PACKING
Instance: A finite set U of items, a size s(u) ∈ Z+ for each u ∈ U, a positive integer bin capacity B, and a positive integer K.
Question: Is there a partition of U into disjoint sets U_1, U_2, …, U_K such that the sum of the sizes of the items in each U_i is B or less?
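To make the source problem concrete, here is a small backtracking decision procedure for BIN PACKING. It is a generic illustration rather than anything from the paper, and its worst-case running time is exponential. Bins with equal current load are interchangeable, so only one of them is tried for each item.

```python
def bin_packing(sizes: list[int], B: int, K: int) -> bool:
    """Can the items be split into K disjoint sets, each of total size <= B?"""
    loads = [0] * K
    items = sorted(sizes, reverse=True)  # placing large items first prunes sooner

    def place(i: int) -> bool:
        if i == len(items):
            return True
        tried = set()
        for b in range(K):
            if loads[b] + items[i] <= B and loads[b] not in tried:
                tried.add(loads[b])  # skip bins with an identical load
                loads[b] += items[i]
                if place(i + 1):
                    return True
                loads[b] -= items[i]
        return False

    return place(0)


print(bin_packing([4, 4, 2, 2], 6, 2), bin_packing([4, 4, 4], 6, 2))  # True False
```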
Proof
- MODIFIED BIN PACKING
Instance: A finite set U of items, a size s(u) that is a positive even integer for each u ∈ U, a positive integer bin capacity B, and a positive integer K, where ∑_{u ∈ U} s(u) = BK.
Question: Is there a partition of U into disjoint sets U_1, U_2, …, U_K such that the sum of the sizes of the items in each U_i is precisely B?
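A sketch of MODIFIED BIN PACKING along the same lines. Because the sizes sum to exactly BK, any packing into K bins of capacity B must fill every bin to precisely B, so once the instance constraints are checked the search is the same as for BIN PACKING. The function name and error handling are illustrative only.

```python
def modified_bin_packing(sizes: list[int], B: int, K: int) -> bool:
    """Decide MODIFIED BIN PACKING by backtracking (exponential worst case)."""
    # Instance constraints from the problem definition.
    if B <= 0 or K <= 0 or any(s <= 0 or s % 2 != 0 for s in sizes):
        raise ValueError("sizes must be positive even integers; B, K positive")
    if sum(sizes) != B * K:
        raise ValueError("total size must equal B * K")

    # With total exactly B*K, fitting K bins of capacity B forces every bin to B.
    loads = [0] * K
    items = sorted(sizes, reverse=True)

    def place(i: int) -> bool:
        if i == len(items):
            return all(load == B for load in loads)
        tried = set()
        for b in range(K):
            if loads[b] + items[i] <= B and loads[b] not in tried:
                tried.add(loads[b])
                loads[b] += items[i]
                if place(i + 1):
                    return True
                loads[b] -= items[i]
        return False

    return place(0)


print(modified_bin_packing([4, 2, 2, 4], 6, 2))  # True: {4, 2} and {4, 2}
```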
Proof
- INTUITION
- If the H's in a string S are to be perfectly packed into a cube, then any H in S that is adjacent to a P in S must be placed on the surface of the cube.
- This is because an H mapped to an internal node of the cube has only H's as lattice neighbours, and adjacent items in S must be mapped to adjacent lattice locations, so its P neighbour would have nowhere to go.
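Counting the nodes of an n × n × n cube (n ≥ 2) shows how much room the surface offers for such H's. This arithmetic follows from the cube geometry and is not stated in the slides; every interior node has all six of its lattice neighbours inside the cube.

```latex
\#\{\text{interior nodes}\} = (n-2)^3,
\qquad
\#\{\text{surface nodes}\} = n^3 - (n-2)^3 = 6n^2 - 12n + 8 .
```

For example, n = 3 gives a single interior node and 26 surface nodes.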
Proof
- REDUCTION
- Consider any MODIFIED BIN PACKING problem β. Let:
q = 2 · max(K, ⌈B^(1/3)⌉)
n = q(2q + 1) + 2
T = (2q − 1)(n − 2)
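The three parameters can be computed directly from an instance. The sketch below assumes the cube root in q is taken of the bin capacity B and rounded up (the slide's bracket notation is ambiguous on this point), and uses an exact integer cube root to avoid floating-point error; the function names are hypothetical.

```python
def icbrt_ceil(x: int) -> int:
    """Smallest integer r >= 0 with r**3 >= x, computed exactly."""
    r = round(x ** (1 / 3)) if x > 0 else 0  # float guess, corrected below
    while r ** 3 < x:
        r += 1
    while r > 0 and (r - 1) ** 3 >= x:
        r -= 1
    return r


def reduction_parameters(B: int, K: int) -> tuple[int, int, int]:
    """Return (q, n, T) for capacity B and K bins (assumed reading of q)."""
    q = 2 * max(K, icbrt_ceil(B))
    n = q * (2 * q + 1) + 2
    T = (2 * q - 1) * (n - 2)
    return q, n, T


print(reduction_parameters(8, 2))  # (4, 38, 252)
```

Because q is always even under this reading, n − 2 = q(2q + 1) is even as well.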
Proof
- REDUCTION
- Arrows indicate the order in which the first nine edges are filled:
Proof
- REDUCTION
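- The back face is filled with (n − 2)^2 / 2 copies of PHHP; each copy contributes two H's, so together they cover the (n − 2)^2 back-face positions that do not lie on the cube's edges.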
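As an arithmetic check on the back-face count, the hypothetical helper below builds (n − 2)^2 / 2 copies of PHHP as one string, assuming the copies are joined end to end (the slide does not say how they are connected), and confirms that they supply exactly (n − 2)^2 H's.

```python
def back_face_segment(n: int) -> str:
    """Hypothetical helper: (n - 2)^2 / 2 copies of PHHP, joined end to end."""
    positions = (n - 2) ** 2  # back-face positions off the cube's edges
    if positions % 2 != 0:
        raise ValueError("(n - 2)^2 must be even")
    return "PHHP" * (positions // 2)


segment = back_face_segment(38)  # n = 38 is an arbitrary example value
print(segment.count("H") == 36 ** 2)  # True: two H's per copy of PHHP
```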
Proof
- REDUCTION
- The n × n × n cube, with paths indicating the order in which the left, bottom, right, and top faces are filled:
Proof
- REDUCTION
- The front face of the cube partitioned into q "bins." The top and bottom edges of the face have previously been filled in.
Proof
- In a perfect fold of the string S constructed from β, any single H surrounded by P's (e.g. PHP) must be placed on an edge of the cube (corners included): both P's lie outside the cube, and only edge nodes have two lattice neighbours outside it.
- Any H adjacent to a P in S (e.g. HP) must be embedded on a face of the cube (including its edges).
- If two H's are separated by two P's in S (e.g. HPPH), then the H's are embedded in adjacent positions on a face of the cube.
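These placement rules rest on one observation: in a perfect fold every node of the cube holds an H, so every P lies outside the cube, and an H whose string neighbours include k P's needs at least k lattice neighbours outside the cube. The sketch below (hypothetical helpers, cube taken as {0, …, n−1}^3 with n ≥ 2) counts those outside neighbours; node_kind reports the most specific location of a node, so "face" means a face node that is not on an edge.

```python
Point = tuple[int, int, int]


def outside_neighbours(p: Point, n: int) -> int:
    """Lattice neighbours of p (a node of {0,...,n-1}^3) lying outside the cube."""
    count = 0
    for c in p:
        count += (c == 0) + (c == n - 1)  # one outside neighbour per boundary side
    return count


def node_kind(p: Point, n: int) -> str:
    """Classify a node of an n x n x n cube (n >= 2)."""
    return ["interior", "face", "edge", "corner"][outside_neighbours(p, n)]


def can_host_h(p: Point, n: int, p_neighbours_in_s: int) -> bool:
    """Each P adjacent in S to the H at p must sit on an outside neighbour of p."""
    return outside_neighbours(p, n) >= p_neighbours_in_s


print(node_kind((0, 1, 1), 3), can_host_h((0, 1, 1), 3, 2))  # face False
```

A face node has only one outside neighbour, so the H of a PHP pattern cannot sit there, matching the first rule above.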
Next Steps
- Real protein folding is far more complex than the HP model, and new directions are being explored using metaheuristic algorithms:
- An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem – 2005
- Using Modified Bat Algorithm to Solve Toy Model of Protein Folding – 2014
HP Protein Folding is NP-Complete
By Jonathon Hope