Parallelization of a Bioinformatics program in Python

Participant:

```
Zaika Vladyslav 
```

Supervisors:

```
Denis Pallez
Claude Pasquier
```

miRAI beat cancer

microRNAs (miRNAs): short RNA molecules that regulate thousands of human genes

miRNA dysregulation is linked to the development of various diseases, including cancer

miRAI predicts associations between miRNA and diseases.

Problem

miRAI uses many parameters (37) to perform predictions

All combinations of parameters represent:

2^{37} = 137438953472 cases

Computing one case takes between ~4 minutes and ~4 hours
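To get a feel for the size of this search space, the arithmetic can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes each of the 37 parameters is an on/off switch, that every case takes the optimistic 4 minutes, and that everything runs sequentially on one machine. The function name is illustrative.

```python
# Back-of-the-envelope cost of an exhaustive search (illustrative only).
N_PARAMS = 37          # parameters, each assumed to be on/off
MINUTES_PER_CASE = 4   # optimistic lower bound per case


def brute_force_years(n_params, minutes_per_case):
    """Years needed to evaluate every combination one after another."""
    cases = 2 ** n_params
    total_minutes = cases * minutes_per_case
    return total_minutes / (60 * 24 * 365)


cases = 2 ** N_PARAMS
years = brute_force_years(N_PARAMS, MINUTES_PER_CASE)
print(f"{cases} cases, roughly {years:,.0f} years on a single machine")
```

Even with the shortest run time, an exhaustive search is out of reach, which motivates both distribution and a smarter search strategy.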

Solution

Distribute miRAI computations on a cluster

Use genetic algorithms to accelerate the search
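The idea of searching the parameter space with a genetic algorithm can be sketched in plain Python. This is a minimal illustration, not the miRAI setup: `toy_fitness` is a hypothetical stand-in for a real miRAI run, and the selection, crossover, and mutation choices are assumptions made for brevity.

```python
import random

N_BITS = 37  # one bit per parameter (assumed on/off)


def toy_fitness(bits):
    # Hypothetical stand-in for a miRAI run: counts enabled parameters.
    return sum(bits)


def evolve(fitness, n_bits=N_BITS, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (truncation selection with elitism).
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_bits)        # flip one random bit
            child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve(toy_fitness)
print(toy_fitness(best), "of", N_BITS, "bits set")
```

Each generation evaluates only `pop_size` candidates, and those evaluations are independent of each other, which is what makes them easy to distribute.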

Assigned tasks

1. Select a parallel evolutionary Python framework

2. Perform computations on one node

3. Configure cluster nodes

4. Distribute computation to cluster

5. Compare frameworks

6. Propose improvements

Inspyred

Framework for evolutionary computation

Parallelism through the Parallel Python (pp) module

Inspyred

Adapted for local networks

Quick bootstrap

Scheduler issue

on the nodes:
node-1> ./ppserver.py -a
node-2> ./ppserver.py -a

final_pop = ea.evolve(generator=generate,
                      evaluator=inspyred.ec.evaluators.parallel_evaluation_pp,
                      pp_evaluator=evaluate,
                      pp_servers=("*",),
                      pp_dependencies=(my_squaring_function,),
                      pp_modules=("math",),
                      pop_size=8,
                      bounder=inspyred.ec.Bounder(-5.12, 5.12),
                      maximize=False,
                      max_evaluations=256,
                      num_inputs=3)

DEAP

Framework designed for parallel evaluation

Uses SCOOP for parallelism

Developed at Université Laval, Quebec

DEAP

Connects to any machine

No tuning on nodes

Hard to configure

Manual config of hosts

In the program:

from scoop import futures

toolbox.register("map", futures.map)

Launch:

python -m scoop --hostfile hosts program.py

hosts file (hostname, optionally followed by the number of workers):

hostname_or_ip 4
other_hostname
third_hostname 2
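Registering a different `map` works because the evaluation loop only needs something that behaves like the built-in `map`. The sketch below illustrates that pattern without DEAP or SCOOP: `Toolbox` is a tiny hypothetical stand-in, `evaluate` is a toy fitness function, and a thread pool from the standard library plays the role of `futures.map`.

```python
from concurrent.futures import ThreadPoolExecutor


class Toolbox:
    """Tiny stand-in for a toolbox: stores callables under a name."""

    def register(self, name, func):
        setattr(self, name, func)


def evaluate(individual):
    # Hypothetical fitness: number of bits set.
    return sum(individual)


toolbox = Toolbox()
toolbox.register("map", map)  # sequential default

population = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
sequential = list(toolbox.map(evaluate, population))

with ThreadPoolExecutor(max_workers=2) as pool:
    toolbox.register("map", pool.map)  # same call site, parallel backend
    parallel = list(toolbox.map(evaluate, population))

print(sequential, parallel)
```

The call site `toolbox.map(evaluate, population)` stays identical; only the registered backend changes.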

Cluster

Inspyred:

Pros: fast start, minimum code, auto configuration
Cons: single cluster, scheduling

DEAP:

Pros: no scripts on nodes, scalability, still active
Cons: manual config, hard to set up

PP vs SCOOP

SCOOP

Scheduling

The PP scheduler assigns all tasks at the beginning

The SCOOP scheduler waits until the current task finishes before handing out the next one
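Why this difference matters when task durations range from minutes to hours can be seen in a toy simulation. It is a simplified model, not the actual PP or SCOOP scheduler: up-front assignment is modelled as round-robin, on-demand assignment as "next task goes to the first free worker", and the durations are invented for illustration.

```python
import heapq


def static_makespan(durations, n_workers):
    """All tasks are dealt out round-robin before anything runs."""
    load = [0] * n_workers
    for i, d in enumerate(durations):
        load[i % n_workers] += d
    return max(load)


def dynamic_makespan(durations, n_workers):
    """Each task goes to whichever worker becomes free first."""
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


durations = [240, 4, 240, 4, 4, 4]  # minutes, illustrative values
print(static_makespan(durations, 2))   # 484
print(dynamic_makespan(durations, 2))  # 248
```

With up-front assignment one worker ends up with both long tasks while the other sits idle; on-demand assignment roughly halves the total time in this example.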

SCOOP + Inspyred

Develop SCOOP-based parallelism for Inspyred

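from functools import partial

from scoop import futures

import miRAI  # project module; name taken from the miRAI.evaluate call below

def generate(random, args):
    # Random bit string, one bit per miRAI parameter
    size = args.get('dimension_bits', 10)
    return [random.choice((0, 1)) for _ in range(size)]

def evaluate(candidates, args):
    # Returns one fitness value per candidate
    fitness = []
    for cs in candidates:
        fit = miRAI.evaluate(cs)  # assumed: takes the candidate's bit list
        fitness.append(fit)
    return fitness

def parallel_evaluator_scoop(candidates, args):
    evaluator = args['scoop_evaluator']
    # One SCOOP task per candidate. args is not forwarded, since it may hold
    # objects that cannot be pickled (assumption: the evaluator does not need it).
    tasks = futures.map(partial(evaluator, args={}), [[c] for c in candidates])
    return [fits[0] for fits in tasks]

# my_ec, _NumberOfElite and _NumberOfBits are defined elsewhere in the script
if __name__ == '__main__':
    final_pop = my_ec.evolve(generator=generate,
                             evaluator=parallel_evaluator_scoop,
                             scoop_evaluator=evaluate,
                             pop_size=1,
                             maximize=True,
                             max_generations=5,
                             num_elites=_NumberOfElite,
                             seeds=None,
                             dimension_bits=_NumberOfBits)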

Benchmarking

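Maximum execution time per node
Mean execution time per node
Check against Amdahl's law:

T(p) = T_s + \frac{T_p}{p}

where T_s is the serial time, T_p the parallelizable time, and p the number of processors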
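The formula can be turned into a small helper to compare measured times with the prediction. The numbers below (10 s serial, 90 s parallelizable) are made up for illustration, and the function names are hypothetical.

```python
def amdahl_time(t_serial, t_parallel, p):
    """Predicted run time on p processors: T(p) = Ts + Tp / p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_serial + t_parallel / p


def speedup(t_serial, t_parallel, p):
    """Predicted speedup relative to a single processor."""
    return amdahl_time(t_serial, t_parallel, 1) / amdahl_time(t_serial, t_parallel, p)


# Illustrative values: 10 s serial part, 90 s parallelizable part.
for p in (1, 3, 9):
    print(p, amdahl_time(10, 90, p), speedup(10, 90, p))
```

However many processors are added, the run time never drops below the serial part, so the speedup in this example is bounded by (10 + 90) / 10 = 10.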

Future steps

Test SCOOP + Inspyred
Deploy everything to the cluster
Run the miRAI computations
Process the results

OAR

OAR: resource and task manager for clusters

jdoe@idpot:~$ oarsub -I -l /nodes=3/core=1

jdoe@idpot5:~$ cat $OAR_NODEFILE
idpot5.grenoble.grid5000.fr
idpot8.grenoble.grid5000.fr
idpot9.grenoble.grid5000.fr

#!/bin/bash

python3 insp_script.py
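One way to connect an OAR reservation to SCOOP is to turn the node list into a hosts file. The helper below is a sketch under two assumptions: the node file lists one hostname per line, possibly repeated once per reserved core, and the target format is "hostname workers". The function name is hypothetical.

```python
from collections import Counter


def nodefile_to_hostfile(nodefile_text):
    """Collapse repeated hostnames into 'hostname count' lines."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    counts = Counter(hosts)  # keeps first-seen order
    return "\n".join(f"{host} {n}" for host, n in counts.items())


sample = """idpot5.grenoble.grid5000.fr
idpot8.grenoble.grid5000.fr
idpot9.grenoble.grid5000.fr
"""
print(nodefile_to_hostfile(sample))
```

Inside a job, the input text would come from the file named by the `OAR_NODEFILE` environment variable.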

Conclusion

Test Inspyred vs DEAP
OAR configuration
Identify weaknesses
Ad hoc SCOOP + Inspyred integration
Benchmarks

Thank you

vlad@nowinfinity.com.au

Publications

Scientific paper: "Prediction of miRNA-disease associations with vector space model."

PFE presentation
