ML for Biological Applications

Luisa Cutillo

Lecturer at the University Parthenope of Naples, IT

Visiting academic at the Dep. of Computer Science, Sheffield, UK

Twitter: @luisa_cutillo78

Why is Biology a growing application area of Machine Learning?

  • Biology is an evidence-driven discipline
  • Large quantities of highly heterogeneous data
  • Data kinds: numerical, text, image, sound, and video

 Machine Learning and Bioinformatics

 Bioinformatics applies techniques from mathematics, computer science, and statistics to understand and organise the information associated with biological data

Main Limitation:

Bioinformatics is sometimes driven by the need to make the most of small sample sizes!

Why ML in Bioinformatics?

The size and number of available biological datasets have skyrocketed: Bioinformatics screams out for ML!

Applications to big biological data

  • Genomics
  • Proteomics
  • Microarray and RNA-Sequencing data
  • Systems Biology
  • Text Mining

ML in Genomics

Aim:

study the complete DNA of an organism

 

ML:

  1. Automatic Gene Prediction: determine the location of protein-encoding genes within a given DNA sequence. Approaches: extrinsic and intrinsic searches based on hidden Markov models (HMMs).
  2. Multiple Sequence Alignment: an NP-hard optimization problem. Approaches: consensus and iterative methods, HMMs, genetic algorithms, and simulated annealing. Summary paper link.
  3. Detect and visualize genome rearrangements. Nature 2015.
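
The HMM idea behind intrinsic gene prediction can be sketched with a toy two-state model decoded by the Viterbi algorithm. This is a minimal illustration only: the state names, the probabilities, and the example sequence are all hypothetical and not fitted to any real genome, and real gene finders use far richer models.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters, chosen only for illustration (not fitted to data).
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for base in seq[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            col[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Made-up sequence: AT-rich flanks around a GC-rich stretch.
path = viterbi("ATATATGCGCGCGCATATAT")
```

With these toy parameters the GC-rich middle stretch is labelled "coding" and the AT-rich flanks "noncoding"; the labels only reflect the invented emission probabilities.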

 

ML in Proteomics

A protein folds into a three-dimensional structure

 

Aim:

Protein secondary structure prediction


ML: 

The amino acids of a protein sequence are classified as helix, sheet, or coil (the structural classes) using DeepCNF (deep convolutional neural fields), which relies on artificial neural networks to achieve high accuracy (~84%) [paper link]
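
Assuming the accuracy quoted above is a per-residue three-class accuracy (commonly called Q3), it can be computed as the fraction of residues whose predicted class matches the observed one. The helper name `q3_accuracy` and the two label strings below are made up for illustration (H = helix, E = sheet, C = coil).

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted class (H, E, or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels for a 17-residue protein fragment.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
q3 = q3_accuracy(predicted, observed)  # 15 of 17 residues agree
```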

 

ML in Microarray and RNA-Seq data analysis

 

Aim:

monitor the expression of genes within a genome (microarray)

reveal the presence and quantity of RNA in a biological sample (RNA-seq)

ML:

radial basis function networks, deep learning, Bayesian classification, decision trees, and random forest
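
As one concrete instance of Bayesian classification on expression data, here is a minimal Gaussian naive Bayes sketch in plain Python. The expression values, the three genes, and the "tumour"/"normal" labels are invented for illustration; in practice a tested library implementation would be the usual choice.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a class prior plus per-gene mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for expression vector x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Made-up log-expression values for three hypothetical genes.
samples = [
    [8.1, 2.0, 5.1], [7.9, 2.3, 4.8], [8.4, 1.8, 5.0],  # labelled "tumour"
    [3.2, 6.9, 5.2], [2.8, 7.2, 4.9], [3.0, 7.0, 5.1],  # labelled "normal"
]
labels = ["tumour"] * 3 + ["normal"] * 3
model = fit_gaussian_nb(samples, labels)
prediction = predict(model, [7.5, 2.5, 5.0])
```

The naive independence assumption between genes is rarely true for expression data, which is one reason the other methods listed above are also used.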

 

Microarrays VS RNA-SEQ

Bulk VS Single Cell RNA-SEQ

Single Cell RNA-seq

Bulk RNA-seq

ML in Systems Biology

Aim: study the complex interactions among simple biological components (e.g. DNA, RNA, proteins, and metabolites) within a biological system.

 

ML: aids in modelling these complex interactions in domains such as genetic networks, signal transduction networks, and metabolic pathways. Methods: probabilistic graphical models, Markov chain optimization for transcription factor binding sites, and genetic algorithms.

ML in Text Mining

Aim: knowledge extraction -> searching through and compiling all the relevant available information on a given topic across all sources.

 

ML: performs the knowledge extraction task using techniques such as natural language processing and Text Nailing

 

Examples of applications: automatic annotation of the function of genes and proteins, determination of the subcellular localization of a protein, analysis of DNA-expression arrays, large-scale protein interaction analysis, and molecule interaction analysis.
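
A very simple baseline for interaction analysis from text is sentence-level co-occurrence: count how often two known entity names appear in the same sentence. The sketch below assumes a predefined list of entity names and naive sentence splitting; the function name and the example sentences are written for illustration only, and real pipelines rely on proper named-entity recognition.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, entity_names):
    """Count sentences in which each pair of known entity names co-occurs."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, entity_names)) + r")\b")
    counts = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(set(pattern.findall(sentence)))
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Example sentences written for illustration.
text = (
    "TP53 is regulated by MDM2. MDM2 binds TP53 and promotes its degradation. "
    "BRCA1 is involved in DNA repair."
)
counts = cooccurrence_counts(text, ["TP53", "MDM2", "BRCA1"])
```

Pairs are stored as alphabetically sorted tuples, so `("MDM2", "TP53")` and `("TP53", "MDM2")` are counted as the same pair.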

ML in Bioinformatics

Example of pipelines:

Bioinformatics Awareness Days

link to BAD days

Luisa Cutillo and Research Software Engineer (RSE) team, University of Sheffield

Discussion

  • What is your field of interest?
  • Can you share your experience?
  • Questions or doubts?

WIML Biological Applications Round Table

By Luisa Cutillo

Overview of Machine Learning for Biological Application
