EVOLVE

 



Single and Multi-Objective Genetic Algorithm for Molecular Design





Nicholas Browning




 Outline 

Introduction to Evolutionary Algorithms (EAs)

Introduction to Single Objective Genetic Algorithms (SOGAs)

  • Genetic Representation
  • Algorithm Overview
  • Selection

  • Crossover

  • Mutation

Introduction to Multi-Objective Genetic Algorithms (MOGAs)

  • Algorithm Overview

Applications

Future Work








 Evolutionary Algorithms 


Population-based meta-heuristic optimization algorithms (AI)

Use mechanisms inspired by biological evolution, i.e.

  • Reproduction
  • Mutation
  • Recombination
  • Natural Selection


Examples: Particle Swarm Optimization, Genetic Algorithm, Gaussian Adaptation

 EA Application 

Travelling Salesman Problem










Good combinatorial / permutational problem solvers!

 Biology 
















"One general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die."
- Charles Darwin

 Single Objective Genetic Algorithm 


Population of parent and child candidate solutions

Each solution contains a "chromosome" which fully defines it in terms of the property to be optimized

3 stochastic operators used:
  • Crossover Operator - Reproduction, Recombination
  • Mutation Operator - Mutation 
  • Selection Operator - Natural Selection 
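
The three operators can be tied together in one generational loop. The following is a minimal sketch only, not the EVOLVE implementation: `run_ga` and all of its parameters are hypothetical names, fitness is assumed to be minimized, and binary tournament selection, one-point crossover, single-gene uniform mutation and elitist replacement are arbitrary stand-ins for whichever operators are configured.

```python
import random

def run_ga(fitness, gene_ranges, pop_size=20, generations=30,
           crossover_prob=0.6, mutation_prob=0.3, seed=0):
    """Minimize `fitness` over real-valued chromosomes (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initial population inside the allowed gene ranges.
    pop = [[rng.uniform(lo, hi) for lo, hi in gene_ranges]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: binary tournaments fill the mating pool.
        pool = [min(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(pool[0::2], pool[1::2]):
            a, b = list(a), list(b)  # copy so parents stay untouched
            # Crossover: one-point exchange with probability crossover_prob.
            if len(a) > 1 and rng.random() < crossover_prob:
                cut = rng.randint(1, len(a) - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Mutation: resample one gene with probability mutation_prob.
            for child in (a, b):
                if rng.random() < mutation_prob:
                    i = rng.randrange(len(child))
                    child[i] = rng.uniform(*gene_ranges[i])
                children.append(child)
        # Replacement: keep the best pop_size of parents plus children.
        pop = sorted(pop + children, key=fitness)[:pop_size]
    return min(pop, key=fitness)
```

For example, `run_ga(lambda c: sum(g * g for g in c), [(-5.0, 5.0)] * 3)` searches for the minimum of a simple sphere function. Because the replacement step is elitist, the best fitness in the population never gets worse from one generation to the next.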


 Algorithm Overview 

 Natural Selection 



 Selection 


  • Operator designed to select the parents of the next generation of candidate solutions
  • Chance of being selected is proportional to fitness
  • Creates a "mating pool"
  • Essential for reaching a near-global minimum
  • Many operators available

 Roulette 


Wheel is spun [population size] times, with one parent selected per spin
Likely to introduce bias
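
A minimal sketch of roulette wheel selection, assuming non-negative fitness values where larger is better (an energy to be minimized would first need to be transformed). The function name `roulette_select` and its signature are hypothetical.

```python
import random

def roulette_select(fitnesses, n_picks, rng=random):
    """Return indices chosen by fitness-proportionate selection."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitnesses must be non-negative with a positive sum")
    picks = []
    for _ in range(n_picks):
        # One independent spin of the wheel per selected parent.
        spin = rng.random() * total
        running = 0.0
        chosen = len(fitnesses) - 1  # fallback guards against round-off
        for i, f in enumerate(fitnesses):
            running += f
            if spin < running:
                chosen = i
                break
        picks.append(chosen)
    return picks
```

Because every spin is independent, the number of copies an individual actually receives can deviate noticeably from its expected share, especially in small populations.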



 Universal Stochastic 


  • Same as roulette wheel, but the wheel is spun only once
  • Wheel divided into n equally spaced portions
  • Random starting position generated
  • Move around the wheel in equidistant steps, sampling at every point visited
  • Less bias
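
A minimal sketch of this scheme, commonly known as stochastic universal sampling, under the same assumption of non-negative, larger-is-better fitness values. The name `sus_select` is hypothetical.

```python
import random

def sus_select(fitnesses, n_picks, rng=random):
    """Return indices chosen with a single spin and equally spaced pointers."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitnesses must be non-negative with a positive sum")
    step = total / n_picks          # distance between neighbouring pointers
    start = rng.random() * step     # the single random spin
    picks = []
    i = 0
    running = fitnesses[0]          # cumulative fitness up to individual i
    for k in range(n_picks):
        pointer = start + k * step
        # Advance until the pointer falls inside individual i's slice.
        while running <= pointer and i < len(fitnesses) - 1:
            i += 1
            running += fitnesses[i]
        picks.append(i)
    return picks
```

With fitnesses `[1.0, 1.0, 2.0]` and four picks the result is always `[0, 1, 2, 2]`, whatever the starting position: each individual receives exactly its expected number of copies, which is the sense in which the method is less biased than repeated spins.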


 Tournament 


  • Individual solutions compete in tournaments to enter the mating pool
  • Repeated until the mating pool is filled



  • User defines the number of individuals in each fight
  • Encourages diversity
  • May not give true representation of fitness ranking in mating pool
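
A minimal sketch of tournament selection, assuming lower fitness is better (as for an energy). The name `tournament_select` and the parameter `tournament_size` are hypothetical.

```python
import random

def tournament_select(fitnesses, pool_size, tournament_size=2, rng=random):
    """Fill a mating pool with the winners of random tournaments."""
    n = len(fitnesses)
    pool = []
    for _ in range(pool_size):
        # Draw distinct contestants; the one with the lowest fitness wins.
        contestants = rng.sample(range(n), tournament_size)
        pool.append(min(contestants, key=lambda i: fitnesses[i]))
    return pool
```

Only the ordering of the contestants matters, not the size of the fitness gap between them, which is why the mating pool may not reflect the true fitness ranking. Larger tournaments increase selection pressure.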

 Truncation 


  • Solutions in the population are sorted according to fitness (or some other performance criterion)
  • Allocate S copies to each of the top N/S individuals (N = population size)






  • Fast convergence
  • Problems with diversity
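
A minimal sketch of truncation selection, assuming lower fitness is better and that N is the population size. The name `truncation_select` and the parameter `copies` (S above) are hypothetical.

```python
def truncation_select(fitnesses, copies):
    """Give `copies` slots in the mating pool to each of the top N/copies."""
    n = len(fitnesses)
    n_survivors = n // copies
    # Rank indices from best (lowest fitness) to worst.
    ranked = sorted(range(n), key=lambda i: fitnesses[i])
    pool = []
    for i in ranked[:n_survivors]:
        pool.extend([i] * copies)
    return pool
```

For fitnesses `[5, 1, 4, 2, 3, 0]` and `copies=2`, the pool is `[5, 5, 1, 1, 3, 3]`: the bottom half of the population is discarded outright, which explains both the fast convergence and the loss of diversity.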


 Reproduction - Recombination 



 Crossover 


  • Solutions in the mating pool perform pairwise recombination/reproduction
  • Essential for reaching a near-global minimum
  • A number of operators available
  • Recommended crossover probability = 50-70%

 Single / Two Point 


One point


Two point

Crossing points randomly selected
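
A minimal sketch of both variants on list chromosomes of equal length. The function names are hypothetical; the one-point version assumes at least two genes and the two-point version at least three.

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """Swap the tails of two parents after one random cut."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point_crossover(p1, p2, rng=random):
    """Swap the segment between two distinct random cuts."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]
```

In both cases every gene position in the two children holds exactly one gene from each parent, so no genetic material is created or lost.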

 Uniform 



Coin flipped on each gene

Per-gene exchange probability set in the input file
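
A minimal sketch of uniform crossover. The name `uniform_crossover` and the parameter `swap_prob` (standing in for the per-gene exchange probability) are hypothetical.

```python
import random

def uniform_crossover(p1, p2, swap_prob=0.5, rng=random):
    """Exchange each gene independently with probability swap_prob."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        # The "coin flip" for gene i.
        if rng.random() < swap_prob:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2
```

A `swap_prob` of 0.5 corresponds to a fair coin; values closer to 0 or 1 keep children more similar to one of their parents.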

 Simulated Binary 


  • Polynomial probability distribution used to place the children
  • "Width" of envelope defines how close children are to parents
  • Distribution index "eta" set in input file
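
A minimal sketch of simulated binary crossover for a single real-valued gene, using the commonly published unbounded form of the operator. Whether EVOLVE uses this exact variant is an assumption, and the name `sbx_crossover` is hypothetical.

```python
import random

def sbx_crossover(x1, x2, eta, rng=random):
    """Return two children of real-valued parents x1 and x2."""
    u = rng.random()
    # The spread factor beta follows a polynomial distribution shaped by eta.
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)
    c2 = 0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)
    return c1, c2
```

The children are placed symmetrically about the parents' midpoint, so their mean equals the parents' mean. A larger `eta` concentrates `beta` near 1 and therefore keeps the children close to the parents; a smaller `eta` widens the envelope.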



 Self-adapting simulated binary 


  • Same as simulated binary crossover
  • eta modified each generation depending on the children's performance
  • Found to be very good in a number of complex fitness landscapes
  • Work in progress!

 Mutation 


  • Operator that produces genetic mutations in the chromosome to modify a solution
  • Necessary to escape local minima
  • Affects convergence
  • Recommended (total) probability = 20-50%
  • A number of operators available

 Selective 


Randomly select one of the genes and replace it with a random value sampled uniformly from that gene's specified range
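
A minimal sketch of this operator for real-valued genes. The name `selective_mutation` and the `gene_ranges` layout (one `(low, high)` pair per gene) are hypothetical.

```python
import random

def selective_mutation(chromosome, gene_ranges, rng=random):
    """Resample one randomly chosen gene uniformly within its range."""
    mutant = list(chromosome)            # leave the parent untouched
    i = rng.randrange(len(mutant))       # pick the gene to mutate
    low, high = gene_ranges[i]
    mutant[i] = rng.uniform(low, high)
    return mutant
```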



 Genewise 


To each gene, add a value sampled from a normal distribution with a user-specified standard deviation
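
A minimal sketch of this operator, assuming a zero-mean normal distribution and a single standard deviation `sigma` shared by all genes. The name `genewise_mutation` is hypothetical, and no clipping to gene ranges is applied here.

```python
import random

def genewise_mutation(chromosome, sigma, rng=random):
    """Add independent zero-mean Gaussian noise to every gene."""
    return [g + rng.gauss(0.0, sigma) for g in chromosome]
```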



 Polynomial 


Based on simulated binary crossover
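
A minimal sketch of polynomial mutation for one bounded real-valued gene, using the basic published form of the operator. The exact variant used in EVOLVE is not stated here, so treat this as an assumption; the name `polynomial_mutation` and the clamping to `[low, high]` are illustrative choices.

```python
import random

def polynomial_mutation(x, low, high, eta, rng=random):
    """Perturb x with a polynomially distributed step scaled by the range."""
    u = rng.random()
    # delta lies in [-1, 1) and is concentrated near 0 for large eta.
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    mutated = x + delta * (high - low)
    # Clamp so the mutated gene stays inside its allowed range.
    return min(max(mutated, low), high)
```

As with simulated binary crossover, a larger `eta` produces smaller perturbations.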


 Multi-Objective Genetic Algorithm 


  • With more than one objective, it can happen that neither of two solutions is better than the other in every objective
  • Generates "Pareto fronts" of solutions, with each front having an associated "rank"
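
The comparison underlying Pareto fronts is Pareto dominance. A minimal sketch, assuming every objective is to be minimized; the name `dominates` is hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better
```

For objective vectors `(1, 3)` and `(2, 2)`, neither dominates the other: each is better in one objective and worse in the other, so both can sit on the same front.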

 MOGA - Pareto Front 


 MOGA Algorithm 


  • Problems arise when deciding which solutions should reproduce, and which should enter the next generation
  • Requires a different approach


     MOGA Algorithm 


    Solution: use non-dominated sorting and crowding comparison


    Procedure:

    • Sort the organisms into Pareto fronts
    • Crowding distance is a metric of how crowded the region around each solution is within a particular front
    • Solutions are compared by rank first, then by crowding distance
    • Algorithm prefers less crowded solutions to ensure diversity
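
The procedure above can be sketched as follows, in the style popularized by NSGA-II and assuming all objectives are minimized. This is an illustration rather than the EVOLVE code: all function names are hypothetical, and the front construction uses a simple repeated scan instead of the faster bookkeeping usually employed.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_fronts(objectives):
    """Split solution indices into fronts; front 0 is non-dominated."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(objectives, front):
    """Sum, over objectives, of the normalized gap between neighbours."""
    distance = {i: 0.0 for i in front}
    for m in range(len(objectives[front[0]])):
        ordered = sorted(front, key=lambda i: objectives[i][m])
        lo, hi = objectives[ordered[0]][m], objectives[ordered[-1]][m]
        # Boundary solutions are always kept: infinite distance.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objectives[ordered[k + 1]][m] - objectives[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance

def crowded_better(i, j, rank, distance):
    """Prefer the lower rank; within a rank, the less crowded solution."""
    return rank[i] < rank[j] or (rank[i] == rank[j]
                                 and distance[i] > distance[j])
```

Here `rank[i]` is the index of the front that contains solution `i`. A larger crowding distance means fewer neighbours nearby, so `crowded_better` favours solutions in sparsely populated parts of a front.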

     Comparison against Existing Code 


    • Unified and generalized molecular editing through OpenBabel
    • Vastly improved computation time (3-7 days vs 5 months, project dependent) 
    • Modular, bounded fitness functions
    • Greater control over mutation and crossover operators
    • A number of bug fixes
    • Better memory management
    • Now possible to link in other optimization techniques, e.g. particle swarm optimization, in "one pot"

     Test Case / Applications 


    • Optimize a small poly-ALA protein (20 amino acids) against a complete rotamer library (100+ structures) for all 20 amino acids at different dielectrics

    ACE - AAAAAA - XXXXXXXX - AAAAAA - NME

    • Fitness function: energy after classical molecular mechanics (MM) minimization


     Results 


    • Lowest energy structure found thus far: pure poly-TRP protein
    • Fitness: -37.6 kcal/mol
    • Sequence: W04 W04 W04 W03 W04 W04 W05 W04
    • Many other low energy structures found



     Future Applications / Developments 


    Code
    • Scripting interface for user-defined fitness functions
    • Structures for approximate fitness functions
    • Parallelisation for highly distributed / GPU architectures

    Near Future Applications
    • Optimize the molecular structure of an organic dye for use in solar cells




