21AIE212 

RNA Secondary Structure using Dynamic Programming

Design and Analysis of Algorithms

Anirudh Edpuganti - CB.EN.U4AIE20005
Onteddu Chaitanya Reddy - CB.EN.U4AIE20045
Pillalamarri Akshaya - CB.EN.U4AIE20049
Pingali Sathvika - CB.EN.U4AIE20050

Team-2

Contents

  • RNA Secondary Structure
  • Conditions
  • Formulation
  • Time Complexity
  • Implementation

RNA Secondary Structure

Before that

Double-Stranded DNA

Complementary Base-Pairing

A pairs with T

C pairs with G

Single-Stranded RNA

There is no second strand to pair with, so the bases pair with other bases on the same strand: the RNA folds back on itself.

This self-pairing is the formation of the RNA secondary structure.

Let us consider a sample RNA sequence

RNA = ACAUGAUGGCCAUGU

Now our sequence folds as [figure not reproduced: folded structure of the sample sequence]

Conditions

\text{Let } B = b_1b_2 \dots b_n, \quad b_i \in \{A,C,G,U\}

Now we can describe a secondary structure on B as a set of pairs

S = \{(i,j)\} \text{ where } i, j \in \{1,2,\dots,n\}

1. No sharp turns

\text{if } (i,j) \in S \text{ then } i < j-4

2. Complementary base pairs

A pairs with U, C pairs with G

3. Single pair

\text{No base appears in more than one pair}

4. Non-crossing condition

\text{If } (i,j) \text{ and } (k,l) \in S \text{ then we cannot have } i < k < j < l
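
The four conditions above can be checked mechanically. The sketch below is only an illustration: the names `COMPLEMENT` and `is_valid_structure` are hypothetical (they do not come from the slides), and pairs are assumed to be 1-based `(i, j)` tuples with `i < j`.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def is_valid_structure(seq, pairs):
    """Return True if the 1-based pairs satisfy all four conditions on seq."""
    pairs = list(pairs)
    used = set()
    for i, j in pairs:
        # Positions must lie inside the sequence, with i before j
        if not (1 <= i < j <= len(seq)):
            return False
        # Condition 1: no sharp turns
        if not i < j - 4:
            return False
        # Condition 2: the two bases must be complementary
        if COMPLEMENT.get(seq[i - 1]) != seq[j - 1]:
            return False
        # Condition 3: no base appears in more than one pair
        if i in used or j in used:
            return False
        used.update((i, j))
    # Condition 4: no two pairs may cross (i < k < j < l)
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For the sequence GGGAAACCC, the nested pairs (1,9) and (2,8) pass all four checks, while (3,7) fails Condition 1 and the pairs (1,7), (2,8) together fail Condition 4.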

What is the problem?

Molecule stability \propto \# of base pairs

So we look for the secondary structure with the maximum number of base pairs.

Algorithm

Input: B

Output: Max. # of base pairs
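
To make the input and output concrete, here is a deliberately naive sketch that tries subsets of candidate pairs directly. It is not the method of the slides, only a slow reference for very short sequences; the function name `max_pairs_brute_force` is a hypothetical one chosen for illustration.

```python
from itertools import combinations

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def max_pairs_brute_force(seq):
    """Max. # of base pairs by exhaustive search (exponential time)."""
    n = len(seq)
    # Candidate pairs already satisfy "no sharp turns" and complementarity
    candidates = [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i + 5, n + 1)
        if COMPLEMENT.get(seq[i - 1]) == seq[j - 1]
    ]

    def compatible(p, q):
        (i, j), (k, l) = sorted((p, q))
        if len({i, j, k, l}) < 4:  # a base is shared by both pairs
            return False
        return not (i < k < j < l)  # the pairs must not cross

    best = 0
    for size in range(1, len(candidates) + 1):
        found = any(
            all(compatible(p, q) for p, q in combinations(subset, 2))
            for subset in combinations(candidates, size)
        )
        # Every subset of a valid structure is valid, so once no structure
        # of this size exists, no larger one exists either.
        if not found:
            break
        best = size
    return best
```

On GGGAAACCC this returns 2. The running time grows exponentially with the number of candidate pairs, which is the motivation for a dynamic programming formulation.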

Formulation

OPT(j) = \text{Max. \# of base pairs in } b_1b_2 \dots b_j

Recall Condition 1 (no sharp turns):

OPT(j) = 0 \text{ for } j \leq 5

Final Solution

OPT(n) \text{ for } B = b_1b_2 \dots b_n

Try for a recurrence

Subproblems: OPT(j) on (b_1b_2 \dots b_j)

Look at the last base b_j. There are two cases.

b_j is not paired: the answer is OPT(j-1).

b_j pairs with some b_t: by the non-crossing condition no pair can cross (t,j), so the rest splits into two independent parts:

b_1 \dots b_{t-1}: OPT(t-1)
b_{t+1} \dots b_{j-1}: ???

The second part does not start at b_1, so OPT with 1 variable cannot describe it. We need 2 variables.

OPT(i,j) = \text{Max. \# of base pairs in } b_ib_{i+1} \dots b_j, \quad i \leq j

OPT(i,j) = 0 \text{ whenever } i \geq j-4

If b_j pairs with b_t, both parts are now subproblems of the same kind:

b_i \dots b_{t-1}: OPT(i,t-1)
b_{t+1} \dots b_{j-1}: OPT(t+1,j-1)

\max_{t} \left( 1 + OPT(i,t-1) + OPT(t+1,j-1) \right), \quad i \leq t < j-4

Here t only ranges over positions where b_t and b_j are complementary (Condition 2).

Putting both cases for b_j together:

b_j not paired: OPT(i,j-1)
b_j paired with b_t: 1 + OPT(i,t-1) + OPT(t+1,j-1)

OPT(i,j) = \max \left[ OPT(i,j-1), \; \max_{t} \left( 1 + OPT(i,t-1) + OPT(t+1,j-1) \right) \right]

where t ranges over i \leq t < j-4 with b_t and b_j complementary.

\text{Let } k = j-i

Initialize OPT(i,j) = 0 whenever i >= j-4
for k = 5,6,..,n-1
   for i = 1,2,...,n-k
      Set j = i + k
      Compute OPT(i,j)
   end
end
Return OPT(1,n)
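
The recurrence for OPT(i,j) can be transcribed almost literally as a memoized recursion. This is a minimal sketch rather than the implementation used later in the deck: the name `max_pairs_memo` is hypothetical, indices are 1-based to match the formulas, and the complementarity test on (b_t, b_j) is written out explicitly.

```python
from functools import lru_cache

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def max_pairs_memo(seq):
    """Top-down version of OPT(i, j) with 1-based, inclusive indices."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def opt(i, j):
        # Base case: no pair fits without a sharp turn
        if i >= j - 4:
            return 0
        # Case 1: b_j is not paired
        best = opt(i, j - 1)
        # Case 2: b_j pairs with b_t, for i <= t < j-4
        for t in range(i, j - 4):
            if COMPLEMENT.get(seq[t - 1]) == seq[j - 1]:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, n)
```

The memoization plays the role of the table: each OPT(i,j) is computed once. For long sequences the recursion depth grows with n, so the bottom-up loop over k is the safer choice in practice.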

Attempt for another formulation

For the subsequence b_i \dots b_j there are four cases:

i,j pair: OPT(i+1, j-1) + Pair(i,j)
i unpaired: OPT(i+1, j)
j unpaired: OPT(i, j-1)
non-crossing condition (two independent parts): \max_{i \leq k < j} \left( OPT(i,k) + OPT(k+1,j) \right)

OPT(i,j) = \max \left( OPT(i+1,j-1) + Pair(i,j), \; OPT(i+1,j), \; OPT(i,j-1), \; \max_{i \leq k < j} \left( OPT(i,k) + OPT(k+1,j) \right) \right)

Here Pair(i,j) = 1 if b_i and b_j are complementary, and 0 otherwise.

The table is filled in the same order as before: by increasing j - i, with OPT(i,j) = 0 whenever i >= j-4.
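
Both formulations should give the same value for every sequence. The sketch below checks this empirically on short random sequences. The function names (`pair`, `opt_first`, `opt_second`, `agree_on_random_sequences`) are hypothetical, and a passing check is evidence of agreement, not a proof.

```python
import random
from functools import lru_cache

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def pair(seq, i, j):
    """Pair(i, j): 1 if b_i and b_j are complementary, else 0 (1-based)."""
    return 1 if COMPLEMENT.get(seq[i - 1]) == seq[j - 1] else 0


def opt_first(seq):
    """First formulation: b_j is unpaired, or pairs with some b_t."""
    @lru_cache(maxsize=None)
    def opt(i, j):
        if i >= j - 4:
            return 0
        best = opt(i, j - 1)
        for t in range(i, j - 4):
            if pair(seq, t, j):
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best
    return opt(1, len(seq))


def opt_second(seq):
    """Second formulation: the four cases of the other recurrence."""
    @lru_cache(maxsize=None)
    def opt(i, j):
        if i >= j - 4:
            return 0
        split = max(opt(i, k) + opt(k + 1, j) for k in range(i, j))
        return max(opt(i + 1, j - 1) + pair(seq, i, j),
                   opt(i + 1, j), opt(i, j - 1), split)
    return opt(1, len(seq))


def agree_on_random_sequences(trials=200, max_len=12, seed=0):
    """Return True if both formulations agree on every random sequence."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, max_len)
        seq = "".join(rng.choice("ACGU") for _ in range(length))
        if opt_first(seq) != opt_second(seq):
            return False
    return True
```

Note that the guard `i >= j - 4` is what enforces the no-sharp-turns condition in the second formulation, since Pair(i,j) itself only looks at the two bases.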

Time Complexity

Initialize OPT(i,j) = 0 whenever i >= j-4
for k = 5,6,..,n-1            <- at most n values of k
   for i = 1,2,...,n-k        <- at most n values of i
      Set j = i + k
      Compute OPT(i,j)        <- max over at most n values of t
   end
end
Return OPT(1,n)

n \times n \times n, so the running time is O(n^3)
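
The cubic bound can be checked by counting how many values of t the loops examine in the first formulation, where t runs over i <= t < j-4. This is a small sketch; `count_inner_steps` is a hypothetical helper, not part of the original code.

```python
def count_inner_steps(n):
    """Count the values of t examined by the loops for a sequence of length n."""
    steps = 0
    for k in range(5, n):              # k = 5, 6, ..., n-1
        for i in range(1, n - k + 1):  # i = 1, 2, ..., n-k
            j = i + k
            # Computing OPT(i, j) tries every t with i <= t < j-4
            steps += len(range(i, j - 4))
    return steps
```

Each (i, j) with j = i + k contributes k - 4 values of t, so for n >= 5 the total is the sum over k of (n-k)(k-4), which equals (n-3)(n-4)(n-5)/6. For n = 15 that is 220 steps, well below 15^3, and the leading term n^3/6 confirms the O(n^3) bound.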

Implementation

1st Approach

import numpy as np


def find_index(opt, i, j):
    """Map the sequence positions (i, j) to a (row, column) index of opt.

    The first column of opt stores the row labels (values of i) and the
    last row stores the column labels (values of j). If i or j is not
    found in the matrix, return the index of the last row, first column,
    whose entry is zero.
    """
    last_row = np.shape(opt)[0] - 1
    if i == 0 or j == 0:
        return last_row, 0
    i_ = np.argwhere(opt[:, 0] == i)
    j_ = np.argwhere(opt[last_row, :] == j)
    try:
        return i_[0][0], j_[0][0]
    except IndexError:
        return last_row, 0

def RNA(sequence):
    n = len(sequence)
    if n <= 5:
        return 0  # no pair can satisfy i < j - 4

    complement = {"A": "U", "U": "A", "C": "G", "G": "C"}

    # Only cells with j >= i + 5 can be non-zero, so i runs over 1..n-5
    # and j over 6..n. One extra row and column hold the labels that
    # find_index searches.
    size = n - 4
    opt = np.zeros((size, size))

    # Row labels in the first column: i = n-5, ..., 1, then 0 in the last row
    opt[0, 0] = size - 1
    for a in range(1, size):
        opt[a, 0] = opt[a - 1, 0] - 1
    # Column labels in the last row: j = 6, ..., n
    opt[size - 1, 1] = 6
    for b in range(2, size):
        opt[size - 1, b] = opt[size - 1, b - 1] + 1

    for k in range(5, n):
        for i in range(1, n - k + 1):
            j = i + k
            # b_j pairs with b_t (i <= t < j-4), only if the bases are complementary
            second = [
                1 + opt[find_index(opt, i, t - 1)] + opt[find_index(opt, t + 1, j - 1)]
                for t in range(i, j - 4)
                if complement.get(sequence[t - 1]) == sequence[j - 1]
            ]
            second_max = max(second, default=0)
            # Compare with the case where b_j is unpaired: OPT(i, j-1)
            opt[find_index(opt, i, j)] = max(opt[find_index(opt, i, j - 1)], second_max)

    return int(opt[find_index(opt, 1, n)])

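2nd Approach

import numpy as np


def init_matrix(seq):
    """Create the M x M table, with NaN marking cells not yet computed."""
    M = len(seq)

    matrix = np.empty([M, M])
    matrix[:] = np.nan

    # Base cases: the diagonal and the first sub-diagonal are 0
    matrix[range(M), range(M)] = 0
    matrix[range(1, M), range(M - 1)] = 0

    return matrix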

def Pair(pair):
    """Return True if the tuple of two bases is a complementary pair."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return pair in pairs.items()

def fill(OPT, sequence):
    """
    Fill the matrix with the given conditions
    """
    for k in range(1, len(sequence)):
        for i in range(len(sequence) - k):
            j = i + k

            if j - i > 4:  # no sharp turns: i < j - 4
                i_unpaired = OPT[i + 1][j]  # i unpaired
                j_unpaired = OPT[i][j - 1]  # j unpaired
                ij_pair = OPT[i + 1][j - 1] + Pair((sequence[i], sequence[j]))  # i,j paired
                non_crossing = max(OPT[i][t] + OPT[t + 1][j] for t in range(i, j))  # non crossing condition

                OPT[i][j] = max(i_unpaired, j_unpaired, ij_pair, non_crossing)  # max of all

            else:
                OPT[i][j] = 0

    return OPT

import pandas as pd

sequence = "ACAUGAUGGCCAUGU"

initial_matrix = init_matrix(sequence)
filled_matrix = fill(initial_matrix, sequence)

names = list(sequence)
df = pd.DataFrame(filled_matrix, index=names, columns=names)
print(df)
# The answer for the whole sequence is the top-right cell, OPT[0][n-1]
print(f"Max # of base pairs : {int(filled_matrix[0, -1])}")

Output

ACAUGAUGGCCAUGU [figure not reproduced: output for this sequence]

ACAUAAUGGCCAUGU [figure not reproduced: output for this sequence]

Thank you Sir

DAA

By Incredible us