Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete

BERGER AND LEIGHTON, 1998

Overview

  • Introduction
  • Proof
  • Next Steps

Introduction

  • What are proteins?
  • They can be conceptualised as
    strings of amino acids.
  • There are 22 proteinogenic
    amino acids, 20 of them in the standard genetic code.

Introduction

  • H
  • P

Introduction

  • Protein Folding

Introduction

  • Protein Folding
  • The HP model abstracts the hydrophobic
    interaction in protein folding by labelling
    each amino acid as either:
      • hydrophobic (H)
      • hydrophilic (P)
  • In this model, a fold of a protein
    sequence is defined as a self-avoiding walk
    in a 3D lattice.
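
As a minimal sketch of the definition above, the following Python checks that a list of lattice points is a self-avoiding walk and counts H-H contacts. The function names are hypothetical, and counting only pairs of H's that are not consecutive in the sequence is an assumption (it is the usual HP-model convention, not something stated on the slide).

```python
from typing import List, Tuple

Point = Tuple[int, int, int]


def lattice_distance(p: Point, q: Point) -> int:
    """Manhattan distance; two nodes of Z^3 are neighbours when it equals 1."""
    return sum(abs(a - b) for a, b in zip(p, q))


def is_self_avoiding_walk(path: List[Point]) -> bool:
    """True if no node repeats and consecutive nodes are lattice neighbours."""
    if len(set(path)) != len(path):
        return False
    return all(lattice_distance(p, q) == 1 for p, q in zip(path, path[1:]))


def count_hh_contacts(seq: str, path: List[Point]) -> int:
    """Count H-H pairs that are lattice neighbours but not consecutive in seq."""
    if len(seq) != len(path) or not is_self_avoiding_walk(path):
        raise ValueError("path is not a valid fold of seq")
    contacts = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbours (assumption)
            if seq[i] == seq[j] == "H" and lattice_distance(path[i], path[j]) == 1:
                contacts += 1
    return contacts


# A four-residue chain folded into a unit square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(count_hh_contacts("HPPH", square))  # 1
```

The two H's of HPPH sit at opposite ends of the chain but end up on neighbouring lattice nodes, which gives the single contact.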

Proof

  

The main result is that HP STRING-FOLD, defined below, is NP-complete when G is Z^3. The proof follows by showing that a related folding problem, PERFECT HP STRING-FOLD, is NP-hard.

  • HP STRING-FOLD

Instance: A finite sequence S over the alphabet {H, P},
an integer m, and a graph G.

Question: Is there a fold of S in G where the number of H-H bonds is at least m?
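
To make the decision problem concrete, here is a hedged brute-force sketch for the case G = Z^3. It enumerates every self-avoiding walk, so it is only usable for very short sequences; the names `max_hh_contacts` and `hp_string_fold` are hypothetical, and H-H bonds are assumed to mean lattice-adjacent H's that are not consecutive in S.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int, int]
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def max_hh_contacts(seq: str) -> int:
    """Exhaustively search all folds of seq in Z^3 (exponential time)."""
    best = 0

    def gained(i: int, pos: Point, path: List[Point]) -> int:
        # New contacts created by placing residue i at pos; residue i - 1
        # is excluded because it is the chain neighbour (assumption).
        if seq[i] != "H":
            return 0
        return sum(
            1
            for j in range(i - 1)
            if seq[j] == "H"
            and sum(abs(a - b) for a, b in zip(pos, path[j])) == 1
        )

    def extend(path: List[Point], used: Set[Point], score: int) -> None:
        nonlocal best
        i = len(path)
        if i == len(seq):
            best = max(best, score)
            return
        x, y, z = path[-1]
        for dx, dy, dz in STEPS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in used:
                continue  # keep the walk self-avoiding
            extra = gained(i, nxt, path)
            path.append(nxt)
            used.add(nxt)
            extend(path, used, score + extra)
            path.pop()
            used.remove(nxt)

    if seq:
        extend([(0, 0, 0)], {(0, 0, 0)}, 0)
    return best


def hp_string_fold(seq: str, m: int) -> bool:
    """Decide HP STRING-FOLD on Z^3: is there a fold with at least m H-H bonds?"""
    return max_hh_contacts(seq) >= m


print(hp_string_fold("HPPH", 1))  # True
print(hp_string_fold("HPPH", 2))  # False
```

The exponential running time of this search is what the NP-completeness result suggests cannot be avoided in general, unless P = NP.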

Proof

  

The proof that PERFECT HP STRING-FOLD is NP-hard involves a transformation from the (strongly) NP-complete problem of BIN PACKING ...

  • PERFECT HP STRING-FOLD

Instance: An integer n and a finite sequence S over the alphabet {H, P} which contains n^3 H's.

Question: Is there a fold of S in Z^3 for which the H's are perfectly packed into an n x n x n cube?
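
A candidate fold can be checked against this definition in a few lines. The sketch below is illustrative only: the function name is hypothetical, and it assumes the path passed in is already a valid self-avoiding walk.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]


def is_perfect_cube_packing(seq: str, path: List[Point], n: int) -> bool:
    """True if the H's of seq occupy exactly an n x n x n block of nodes.

    Assumes path is already a valid fold (one distinct node per residue).
    """
    h_nodes = {p for residue, p in zip(seq, path) if residue == "H"}
    if n < 1 or len(h_nodes) != n ** 3:
        return False
    corner = tuple(min(p[k] for p in h_nodes) for k in range(3))
    # n^3 distinct nodes inside an n x n x n box must fill the box completely.
    return all(0 <= p[k] - corner[k] < n for p in h_nodes for k in range(3))


# A Hamiltonian path through the eight nodes of a 2 x 2 x 2 cube.
cube_walk = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1),
]
print(is_perfect_cube_packing("H" * 8, cube_walk, 2))  # True
```

Verifying a given fold is cheap; the hardness lies in deciding whether any such fold exists.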

Proof

To simplify matters, the paper uses a variation of BIN PACKING which is more easily shown to be strongly NP-complete.

  • BIN PACKING

Instance: A finite set U of items, a size s(u) ∈ Z+ for each u ∈ U, a positive integer bin capacity B, and a positive integer K.

Question: Is there a partition of U into disjoint sets U1, U2, ..., UK such that the sum of the sizes of the items in each Ui is B or less?

Proof

  • MODIFIED BIN PACKING

Instance: A finite set U of items, a size s(u) that is a positive even integer for each u ∈ U, a positive integer bin capacity B, and a positive integer K, where Σ_{u ∈ U} s(u) = BK.

Question: Is there a partition of U into disjoint sets U1, U2, ..., UK such that the sum of the sizes of the items in each Ui is precisely B?
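
For small instances the question can be answered by backtracking. The following is a sketch, not the paper's method: `modified_bin_packing` is a hypothetical name, and the search is exponential in the worst case, as expected for a strongly NP-complete problem.

```python
from typing import List, Optional


def modified_bin_packing(sizes: List[int], B: int, K: int) -> Optional[List[List[int]]]:
    """Return K bins each summing to exactly B, or None if no partition exists."""
    if any(s <= 0 or s % 2 for s in sizes) or sum(sizes) != B * K:
        raise ValueError("not a MODIFIED BIN PACKING instance")
    loads = [0] * K
    bins: List[List[int]] = [[] for _ in range(K)]
    order = sorted(sizes, reverse=True)  # placing large items first prunes early

    def place(i: int) -> bool:
        if i == len(order):
            # The total is B*K and no bin exceeds B, so every bin holds exactly B.
            return True
        tried = set()
        for b in range(K):
            if loads[b] + order[i] > B or loads[b] in tried:
                continue  # bins with equal load are interchangeable
            tried.add(loads[b])
            loads[b] += order[i]
            bins[b].append(order[i])
            if place(i + 1):
                return True
            loads[b] -= order[i]
            bins[b].pop()
        return False

    return bins if place(0) else None


print(modified_bin_packing([4, 2, 2, 4], B=6, K=2))  # [[4, 2], [4, 2]]
print(modified_bin_packing([4, 4, 4], B=6, K=2))     # None
```

Because the sizes sum to exactly BK, "no bin overflows" and "every bin is exactly full" coincide, which is why the modified problem asks for sums of precisely B.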

Proof

  • INTUITION
  • If the H's in a string S are to be perfectly packed into a cube, then any H in S that is adjacent to a P must be placed on the surface of the cube.
  • This is because any H mapped to an internal node of the cube must have only H's for neighbours in S: adjacent items in S must be mapped to adjacent lattice locations, and every lattice neighbour of an internal node lies inside the cube.

Proof

  • REDUCTION
  • Consider any MODIFIED BIN PACKING problem β. Let:

q = 2 max( K, [ B^(1/3) ] )

n = q (2q + 1) + 2

T = (2q - 1)(n - 2)
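
As a worked example, the parameters q, n and T above can be computed for a small instance. This sketch makes two assumptions that the slide does not settle: the bracketed term is read as the ceiling of the cube root, and its argument is taken to be the bin capacity B. The helper names are hypothetical.

```python
from typing import Tuple


def ceil_cbrt(x: int) -> int:
    """Smallest integer r with r^3 >= x, corrected for floating-point error."""
    r = round(x ** (1 / 3))
    while r ** 3 < x:
        r += 1
    while r > 0 and (r - 1) ** 3 >= x:
        r -= 1
    return r


def reduction_parameters(B: int, K: int) -> Tuple[int, int, int]:
    """Return (q, n, T) for a MODIFIED BIN PACKING instance with capacity B and K bins."""
    q = 2 * max(K, ceil_cbrt(B))   # assumption: ceiling of the cube root of B
    n = q * (2 * q + 1) + 2        # side length of the target cube
    T = (2 * q - 1) * (n - 2)
    return q, n, T


print(reduction_parameters(B=6, K=2))  # (4, 38, 252)
```

Under these assumptions, q is always even, so n is even as well, and even a two-bin instance with capacity 6 maps to a cube of side 38, holding 38^3 = 54,872 H's.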

Proof

  • REDUCTION
  • Arrows indicate the order in which the first nine edges are filled:

Proof

  • REDUCTION
  • The back face is filled with
    (n - 2)^2 / 2 copies of PHHP.

Proof

  • REDUCTION
  • The n x n x n cube, with paths indicating the order in which the left, bottom, right, and top faces are filled:

Proof

  • REDUCTION
  • The front face of the cube partitioned into q "bins." The top and bottom edges of the face have previously been filled in.

Proof

  • Any single H surrounded by P's in the string S built from β (e.g. P H P) must be in a node that is on an edge of the cube.
  • Any H that is adjacent to a P in S (e.g. H P) must be embedded on a face (including its edges) of the cube.
  • If a pair of H's is separated by two P's in S (e.g. H P P H), then the H's are embedded in adjacent positions on a face of the cube.
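
These facts follow from counting how many lattice neighbours of a cube node lie outside the cube, since every P must be placed outside it. The sketch below (hypothetical helper names, cube taken as the nodes with coordinates 0..n-1, n at least 2) classifies nodes by that count: an H next to one P needs at least one outside neighbour, and an H between two P's needs at least two.

```python
from collections import Counter
from typing import Tuple

Point = Tuple[int, int, int]


def outside_neighbours(node: Point, n: int) -> int:
    """Number of lattice neighbours of node lying outside the cube [0, n)^3."""
    return sum(1 for c in node for v in (c - 1, c + 1) if not 0 <= v < n)


def classify_nodes(n: int) -> Counter:
    """Count interior, face, edge and corner nodes of an n x n x n cube (n >= 2)."""
    names = {0: "interior", 1: "face", 2: "edge", 3: "corner"}
    return Counter(
        names[outside_neighbours((x, y, z), n)]
        for x in range(n)
        for y in range(n)
        for z in range(n)
    )


print(classify_nodes(4))
```

For n = 4 this gives 8 interior nodes, 24 face nodes, 24 edge nodes and 8 corners. Interior nodes have no outside neighbour, so they can only host H's whose chain neighbours are all H's.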

Next Steps

  • Protein folding is far more complex in reality, and new directions are being explored using metaheuristic algorithms:
      • "An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem" (2005)
      • "Using Modified Bat Algorithm to Solve Toy Model of Protein Folding" (2014)

HP Protein Folding is NP-Complete

By Jonathon Hope
