19BIO201

Intelligence in Biological Systems - 3

Linear Neighbourhood Propagation method for predicting Long Non-Coding RNA-Protein Interactions

Aadharsh Aadhithya                  -         CB.EN.U4AIE20001
Anirudh Edpuganti                     -         CB.EN.U4AIE20005

Madhav Kishor                            -         CB.EN.U4AIE20033

Onteddu Chaitanya Reddy        -         CB.EN.U4AIE20045
Pillalamarri Akshaya                   -         CB.EN.U4AIE20049

Team-1

Non Coding DNA

  • Non-coding regions of DNA are regions that do not code for proteins
  • Only about 1% of DNA is estimated to be protein-coding
  • However, non-coding DNA is important because it contains sequences that can act as regulatory elements

Regulatory elements found in non-coding DNA include:

  • Promoters
  • Enhancers
  • Silencers
  • Insulators

Other non-coding regions code for RNAs such as tRNAs and rRNAs

  • Recent studies have shown that only about one-fifth of transcription across the human genome is associated with protein-coding genes
  • Non-coding transcripts are often expressed in tissue-specific contexts
  • Long non-coding RNAs (lncRNAs), which consist of more than 200 nucleotides, have gained wide attention because of their large number and their essential functions

  • lncRNAs play an important role in regulating many biological processes, such as transcription, splicing and gene expression
  • Because experimental identification of lncRNA–protein interactions is costly, a large number of computational prediction methods have been developed

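Features

Known interactions link lncRNAs l_1, l_2, l_3, \ldots, l_n to proteins p_1, p_2, \ldots, p_m:

e_{ij} = 1 \text{ if } p_i \text{ interacts with } l_j, \text{ and } 0 \text{ otherwise}

[Figure: bipartite graph of lncRNAs and proteins; the interaction profile of l_1 collects its interactions with every protein, and the interaction profile of p_1 collects its interactions with every lncRNA]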
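
To make the interaction profiles concrete, here is a minimal sketch, assuming NumPy and a small made-up interaction matrix (the values and the function names `lncrna_profile` and `protein_profile` are hypothetical, not from the paper). It treats rows as lncRNAs and columns as proteins, so a profile is simply one row or one column:

```python
import numpy as np

# Hypothetical toy data: rows are lncRNAs l_1..l_3, columns are proteins p_1..p_2.
# interactions[j, i] = 1 means protein p_(i+1) is known to interact with lncRNA l_(j+1).
interactions = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
])

def lncrna_profile(matrix, j):
    """Interaction profile of lncRNA j: its interactions with every protein."""
    return matrix[j, :]

def protein_profile(matrix, i):
    """Interaction profile of protein i: its interactions with every lncRNA."""
    return matrix[:, i]

profile_l1 = lncrna_profile(interactions, 0)
profile_p1 = protein_profile(interactions, 0)
print(profile_l1)  # [1 0]
print(profile_p1)  # [1 1 0]
```

Two lncRNAs with similar rows interact with similar sets of proteins, which is what makes the profile usable as a feature vector.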

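Expression Profiles

Expression profile vector of l_i: its expression levels in 24 types of human tissues

l_i \rightarrow (1, 2, \cdots, 24)

Long Noncoding RNA Sequence Composition

l_i \,\,\,\, ACGAATCGAAGT \rightarrow (A\%, C\%, G\%, T\%, AA\%, AG\%, \cdots)

The composition vector has 20 entries (1, 2, 3, \cdots, 20)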
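
The sequence composition vector can be sketched in a few lines. This is an illustration under an assumption: the 20 entries are taken to be the 4 single-nucleotide and 16 dinucleotide frequencies, which matches the A%, C%, G%, T%, AA%, AG% pattern on the slide but is not stated explicitly. The function name `sequence_composition` is hypothetical:

```python
from itertools import product

def sequence_composition(seq, alphabet="ACGT"):
    """Frequencies of all 1-mers and 2-mers of `alphabet` in `seq` (4 + 16 = 20 values)."""
    composition = {}
    for k in (1, 2):
        # All overlapping windows of length k in the sequence.
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for letters in product(alphabet, repeat=k):
            kmer = "".join(letters)
            composition[kmer] = windows.count(kmer) / len(windows) if windows else 0.0
    return composition

# The example sequence shown on the slide.
composition = sequence_composition("ACGAATCGAAGT")
print(len(composition))  # 20
```

For this sequence, A occurs 5 times in 12 positions and AA occurs twice among the 11 overlapping pairs, so `composition["A"]` is 5/12 and `composition["AA"]` is 2/11.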

Features for Proteins

  • Interaction Profile
  • Composition, Transition, Distribution (CTD) Vectors

lncRNA

  • Interaction Profile
  • Expression Profile
  • Sequence Composition

Proteins

  • Interaction Profile
  • CTD

Linear Neighbourhood Similarity

  • The local neighbourhood of a manifold in feature space can be approximated linearly
  • Hence, a data point can be written as a linear combination of its neighbouring points

[Figure: data point X_i connected to its neighbours X_{i1}, X_{i2}, \ldots, X_{ij}; the edge weight w_{ij} is the reconstructive contribution of neighbour j]

The weights are chosen to minimise the reconstruction error:

\min \left\| X_i - \sum_{j\in N(X_i)} w_{ij} X_{ij} \right\|^2
\text{s.t. } \sum_{j\in N(X_i)} w_{ij} = 1, \quad w_{ij} \geq 0

The number of nearest neighbors is a hyperparameter

In Matrix Representation,

G_{jk} = (X_i - X_{ij})^T (X_i - X_{ik})
\epsilon = \left\| X_i - \sum_{j\in N(X_i)} w_{ij} X_{ij} \right\|^2 = \sum_{j,k \in N(X_i)} w_{ij} G_{jk} w_{ik}

The second equality holds because the weights sum to 1

To avoid over-fitting, a penalty term with parameter \lambda can be added

\epsilon = \sum_{j,k \in N(X_i)} w_{ij} G_{jk} w_{ik} + \lambda \| w_i \|^2
w_i = \begin{bmatrix} w_{i1} & w_{i2} & \cdots & w_{iK} \end{bmatrix}^T
w_{ij} \rightarrow \text{Similarity between } i \text{ and } j
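
The constrained minimisation can be sketched numerically. The slides do not say which solver is used, so the projected gradient descent below is a swapped-in, generic technique rather than the authors' method. It assumes NumPy; the names `project_to_simplex` and `lns_weights`, the value of `lam`, and the toy points are all hypothetical:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def lns_weights(x, neighbours, lam=1.0, n_iter=500):
    """Minimise w^T G w + lam * |w|^2 subject to sum(w) = 1 and w >= 0."""
    diffs = x - neighbours                 # row j is X_i - X_ij
    G = diffs @ diffs.T                    # G_jk = (X_i - X_ij)^T (X_i - X_ik)
    w = np.full(len(neighbours), 1.0 / len(neighbours))
    # Step size 1/L, where L bounds the curvature of the quadratic objective.
    step = 1.0 / (2.0 * (np.linalg.eigvalsh(G)[-1] + lam))
    for _ in range(n_iter):
        grad = 2.0 * (G @ w + lam * w)
        w = project_to_simplex(w - step * grad)
    return w

# Toy example: two close neighbours on either side of x, one far away.
x = np.array([0.0, 0.0])
neighbours = np.array([[1.0, 0.0], [-1.0, 0.0], [5.0, 5.0]])
w = lns_weights(x, neighbours, lam=0.1)
print(w)
```

The far neighbour receives a much smaller weight than the two close ones, which is why w_{ij} can be read as a similarity between i and j.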

Label Propagation

[Figure: graph of three lncRNA nodes 1, 2, 3 joined by directed edges with similarity weights w_{12}, w_{21}, w_{13}, w_{31}, w_{23}, w_{32}, shown with protein p_i]

If a protein p_i is known to interact with lncRNA l1, and l2 is similar to l1, it is more likely that p_i interacts with l2 as well

How likely depends on the similarity score

How to use what we know, to infer about what we don't?

What do we know?

Protein p_i interacts with lncRNA l1, so the initial label vector is

Y^0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}

What can we infer?

Multiplying the similarity matrix by the known labels:

\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = 1 \begin{bmatrix} w_{11} \\ w_{21} \\ w_{31} \end{bmatrix} + 0 \begin{bmatrix} w_{12} \\ w_{22} \\ w_{32} \end{bmatrix} + 0 \begin{bmatrix} w_{13} \\ w_{23} \\ w_{33} \end{bmatrix}

The product selects information about the lncRNAs similar to l1

With w_{21} = 0.8 and w_{31} = 0.2 (and w_{11} = 0), the product becomes

1 \begin{bmatrix} w_{11} \\ w_{21} \\ w_{31} \end{bmatrix} = \begin{bmatrix} 0 \\ 0.8 \\ 0.2 \end{bmatrix}

Information about the closeness of l1 to l2 is incorporated

If l2 is similar to l1, it is more likely to interact with p_i, so the labels we expect are

\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
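
The single propagation step can be checked numerically. A minimal sketch, assuming NumPy: only w_{21} = 0.8 and w_{31} = 0.2 come from the slides, and the remaining entries of `W` are hypothetical values chosen so that each row sums to 1:

```python
import numpy as np

# Row i of W holds the weights w_i1, w_i2, w_i3 that reconstruct lncRNA i.
# w_21 = 0.8 and w_31 = 0.2 are from the slides; the other entries are hypothetical.
W = np.array([
    [0.0, 0.5, 0.5],
    [0.8, 0.0, 0.2],
    [0.2, 0.8, 0.0],
])

# Known label: protein p_i interacts with lncRNA l1 only.
y0 = np.array([1.0, 0.0, 0.0])

# One propagation step: because y0 is a unit vector, W @ y0 is the first column of W.
inferred = W @ y0
print(inferred)  # entries 0, 0.8, 0.2
```

The score for l2 (0.8) is high because l1 contributes strongly to reconstructing l2, while the score for l1 itself drops to 0.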

However, our initial knowledge that p_i interacts with l1 is lost

We can restore this knowledge by adding the ground truth

\begin{bmatrix} 0 \\ 0.8 \\ 0.2 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}

\alpha can be used to balance inference against ground truth

\alpha \begin{bmatrix} 0 \\ 0.8 \\ 0.2 \end{bmatrix} + (1-\alpha) \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}

Y^{t+1} = \alpha (WY^t) + (1-\alpha)Y^0
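
The update rule translates directly into a loop. A minimal sketch, assuming NumPy; the function name `propagate`, the choice \alpha = 0.5, and all entries of `W` other than w_{21} = 0.8 and w_{31} = 0.2 are hypothetical:

```python
import numpy as np

def propagate(W, y0, alpha=0.5, n_iter=100):
    """Iterate Y^{t+1} = alpha * (W Y^t) + (1 - alpha) * Y^0 starting from Y^0."""
    y = y0.copy()
    for _ in range(n_iter):
        y = alpha * (W @ y) + (1 - alpha) * y0
    return y

# w_21 = 0.8 and w_31 = 0.2 are from the slides; the other entries are hypothetical.
W = np.array([
    [0.0, 0.5, 0.5],
    [0.8, 0.0, 0.2],
    [0.2, 0.8, 0.0],
])
y0 = np.array([1.0, 0.0, 0.0])

one_step = propagate(W, y0, alpha=0.5, n_iter=1)
print(one_step)  # entries 0.5, 0.4, 0.1
```

After one step with \alpha = 0.5 the score for l1 stays at 0.5 instead of dropping to 0, so the ground truth is no longer lost.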

In Matrix Form

Iterating the update until it converges gives a closed-form solution:

\lim_{t\rightarrow \infty} Y^t = Y
Y = (1-\alpha)(I-\alpha W)^{-1} Y^0
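
The closed form follows from setting Y^{t+1} = Y^t = Y in the update, which gives (I - \alpha W) Y = (1-\alpha) Y^0. The sketch below checks this against the iteration. It assumes NumPy; the function name `closed_form`, the value \alpha = 0.5, and the entries of `W` other than w_{21} = 0.8 and w_{31} = 0.2 are hypothetical:

```python
import numpy as np

def closed_form(W, y0, alpha):
    """Solve (I - alpha * W) Y = (1 - alpha) * Y^0 without forming the inverse."""
    identity = np.eye(W.shape[0])
    return (1 - alpha) * np.linalg.solve(identity - alpha * W, y0)

# w_21 = 0.8 and w_31 = 0.2 are from the slides; the other entries are hypothetical.
W = np.array([
    [0.0, 0.5, 0.5],
    [0.8, 0.0, 0.2],
    [0.2, 0.8, 0.0],
])
y0 = np.array([1.0, 0.0, 0.0])
alpha = 0.5

# Reference: run the iterative update for many steps.
y_iter = y0.copy()
for _ in range(200):
    y_iter = alpha * (W @ y_iter) + (1 - alpha) * y0

y_closed = closed_form(W, y0, alpha)
print(np.allclose(y_iter, y_closed))  # True
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice for numerical stability; both express the same formula.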

A second example: protein p_i is known to interact with lncRNA l2, with w_{12} = 0.8 and w_{32} = 0.2

What do we know?

Y^0 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}

0 \begin{bmatrix} w_{11} \\ w_{21} \\ w_{31} \end{bmatrix} + 1 \begin{bmatrix} w_{12} \\ w_{22} \\ w_{32} \end{bmatrix} + 0 \begin{bmatrix} w_{13} \\ w_{23} \\ w_{33} \end{bmatrix} = \begin{bmatrix} 0.8 \\ 0 \\ 0.2 \end{bmatrix}

The product selects information about the lncRNAs similar to l2

The propagation results obtained from the individual features are combined as a weighted sum:

Z_{ij} = \sum_k \omega_k Y^k_{ij}
k \rightarrow \text{index over the features, each with weight } \omega_k
Z_{ij} \rightarrow \text{combined score for the pair } l_i , p_j
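
The weighted combination can be sketched as follows. This is an illustration only: the slides do not say how the weights \omega_k are chosen, so the weights, the two toy score matrices, and the function name `integrate` are hypothetical. NumPy is assumed:

```python
import numpy as np

def integrate(predictions, weights):
    """Z = sum_k omega_k * Y^k, with predictions stacked along the first axis."""
    predictions = np.asarray(predictions, dtype=float)  # shape (n_features, n_lncRNA, n_protein)
    weights = np.asarray(weights, dtype=float)          # shape (n_features,)
    return np.tensordot(weights, predictions, axes=1)

# Hypothetical score matrices from two different features.
Y1 = np.array([[1.0, 0.0], [0.0, 1.0]])
Y2 = np.array([[0.0, 1.0], [1.0, 0.0]])

Z = integrate([Y1, Y2], weights=[0.25, 0.75])
print(Z)
```

Each entry of `Z` is the weighted sum of the corresponding entries of `Y1` and `Y2`, so here the diagonal entries are 0.25 and the off-diagonal entries are 0.75.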

Thank you sir!

 

References

[1] Zhang, Wen; Qu, Qianlong; Zhang, Yunqiu; Wang, Wei (2017). The linear neighborhood propagation method for predicting long non-coding RNA–protein interactions. Neurocomputing, S0925231217313899. doi:10.1016/j.neucom.2017.07.065
