Parallelization of a Bioinformatics Program in Python
Participant:
Zaika Vladyslav
Supervisors:
Denis Pallez
Claude Pasquier
microRNAs (miRNAs): short nucleotide sequences that regulate thousands of human genes
miRNA deregulation is linked to the development of various diseases (e.g., cancer)
miRAI predicts associations between miRNA and diseases.
miRAI uses many parameters (37) to perform predictions
All combinations of parameters represent:
Computing one case takes roughly 4 minutes to 4 hours
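To give a feel for why an exhaustive search is out of reach, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the 37 parameters is a binary on/off switch (the slides do not state the parameter domains) and that every case takes the optimistic 4 minutes.

```python
# Rough size of an exhaustive search, under two labeled assumptions:
#   1. each of the 37 parameters is binary (assumption, not from the slides)
#   2. every case takes the optimistic lower bound of 4 minutes
NUM_PARAMETERS = 37
MINUTES_PER_CASE = 4

combinations = 2 ** NUM_PARAMETERS
total_minutes = combinations * MINUTES_PER_CASE
total_years = total_minutes / (60 * 24 * 365)

print(f"combinations: {combinations:,}")
print(f"serial time at {MINUTES_PER_CASE} min/case: about {total_years:,.0f} years")
```

Even under these optimistic assumptions the serial cost is far beyond what distribution alone can absorb, which is why the search is guided by a genetic algorithm rather than enumerated.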
Distribute miRAI computations on cluster
Genetic algorithms to accelerate computations
1. Select a parallel evolutionary Python framework
2. Perform computations on one node
3. Configure cluster nodes
4. Distribute computation to cluster
5. Compare frameworks
6. Propose improvements
Framework for evolutionary computation
Integrates with the Parallel Python (PP) module
Adapted for local networks
Quick bootstrap
Scheduler issue
On the nodes:
node-1> ./ppserver.py -a
node-2> ./ppserver.py -a
final_pop = ea.evolve(generator=generate,
                      evaluator=inspyred.ec.evaluators.parallel_evaluation_pp,
                      pp_evaluator=evaluate,
                      pp_servers=("*",),
                      pp_dependencies=(my_squaring_function,),
                      pp_modules=("math",),
                      pop_size=8,
                      bounder=inspyred.ec.Bounder(-5.12, 5.12),
                      maximize=False,
                      max_evaluations=256,
                      num_inputs=3)
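The call above relies on three user-supplied functions that the slide does not show. A minimal sketch of what they could look like follows. The sum-of-squares objective is an assumption chosen to match the `my_squaring_function` name and the `maximize=False` setting, not the actual benchmark used. The signatures follow the Inspyred convention: a generator takes `(random, args)` and an evaluator takes `(candidates, args)`.

```python
import random


def my_squaring_function(x):
    """Helper that would be shipped to the workers via pp_dependencies."""
    return x ** 2


def generate(random, args):
    """Create one candidate: num_inputs reals inside the bounder's range."""
    size = args.get('num_inputs', 3)
    return [random.uniform(-5.12, 5.12) for _ in range(size)]


def evaluate(candidates, args):
    """Return one fitness per candidate (sum of squares, to be minimized)."""
    return [sum(my_squaring_function(x) for x in c) for c in candidates]


# Serial smoke test on one machine, no cluster needed.
rng = random.Random(42)
population = [generate(rng, {'num_inputs': 3}) for _ in range(8)]
fitness = evaluate(population, {})
```

Checking the functions serially like this before starting any `ppserver.py` process helps separate bugs in the objective from problems in the cluster setup.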
Framework created specifically for parallel evaluation
Uses SCOOP for parallelism
Project from Université Laval, Quebec
Connects to any machine
No tuning on nodes
Hard to configure
Manual config of hosts
from scoop import futures
toolbox.register("map", futures.map)
python -m scoop --hostfile hosts program.py
hostname_or_ip 4
other_hostname
third_hostname 2
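Registering `futures.map` as the toolbox `map` works because the evaluation step only needs something with the call shape of the built-in `map`. The sketch below illustrates that pattern with the standard library only: `ThreadPoolExecutor.map` is a local stand-in for `scoop.futures.map`, and the `evaluate` and `evaluate_population` names and the toy fitness are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(individual):
    """Toy fitness: number of ones in a bit list (stand-in for a real evaluation)."""
    return sum(individual)


def evaluate_population(population, map_func=map):
    """Evaluate every individual through an interchangeable map function."""
    return list(map_func(evaluate, population))


population = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]

# Default: the built-in, serial map.
serial = evaluate_population(population)

# Stand-in for scoop.futures.map: same call shape, but local threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = evaluate_population(population, map_func=pool.map)

print(serial, parallel)
```

Because only the map function changes, the same program can be run serially for debugging and then launched across the hosts listed in the hostfile.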
Inspyred: PP
DEAP: SCOOP
PP scheduler assigns all tasks at the beginning
SCOOP scheduler waits until the current task finishes
Develop SCOOP parallelism for Inspyred
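The difference between the two scheduling strategies matters when task durations vary as widely as 4 minutes to 4 hours. The following sketch is a simplified model, not the actual PP or SCOOP implementation: it assumes that up-front assignment deals tasks round-robin and that on-demand assignment hands the next task to whichever worker becomes free first. The durations (in minutes) are made up for illustration.

```python
import heapq


def makespan_static(durations, workers):
    """All tasks are dealt out round-robin before any of them starts."""
    load = [0] * workers
    for i, d in enumerate(durations):
        load[i % workers] += d
    return max(load)


def makespan_dynamic(durations, workers):
    """Each task goes to whichever worker becomes free first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until the task ends
    return max(free_at)


# Hypothetical mix of 4-hour and 4-minute cases on two workers.
durations = [240, 4, 4, 4, 240, 4, 4, 4]
static_time = makespan_static(durations, workers=2)
dynamic_time = makespan_dynamic(durations, workers=2)

print(f"static: {static_time} min, dynamic: {dynamic_time} min")
# prints "static: 488 min, dynamic: 252 min"
```

In this example the static schedule puts both long cases on the same worker while the other sits idle, which is the kind of imbalance the on-demand strategy avoids.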
final_pop = my_ec.evolve(generator=generate,
                         evaluator=parallel_evaluator_scoop,
                         scoop_evaluator=evaluate,
                         pop_size=1,
                         maximize=True,
                         max_generations=5,
                         num_elites=_NumberOfElite,
                         seeds=None,
                         dimension_bits=_NumberOfBits)
def evaluate(candidates, args):
    fitness = []
    for cs in candidates:
        # Assumption: the candidate bit vector is what miRAI expects
        # as its parameter selection.
        fit = miRAI.evaluate(cs)
        fitness.append(fit)
    return fitness
def generate(random, args):
    size = args.get('dimension_bits', 10)
    return [random.choice((0, 1)) for _ in range(size)]
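def parallel_evaluator_scoop(candidates, args):
    evaluator = args['scoop_evaluator']
    # One task per candidate: the evaluator expects a list of candidates,
    # so each candidate is wrapped in its own one-element list.
    # args is sent to the workers, so everything in it must be picklable.
    batches = [[c] for c in candidates]
    results = futures.map(evaluator, batches, [args] * len(batches))
    return [fit[0] for fit in results]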
OAR - task manager
jdoe@idpot:~$ oarsub -I -l /nodes=3/core=1
jdoe@idpot5:~$ cat $OAR_NODEFILE
idpot5.grenoble.grid5000.fr
idpot8.grenoble.grid5000.fr
idpot9.grenoble.grid5000.fr
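The node list that OAR provides can be turned into the hostfile that SCOOP expects. The sketch below is one possible way to do it. It assumes the node file contains one hostname per line, repeated once per reserved core, and the `nodefile_to_hostfile` helper is hypothetical.

```python
from collections import Counter


def nodefile_to_hostfile(nodefile_text):
    """Collapse repeated hostnames into 'hostname worker_count' lines."""
    hosts = (line.strip() for line in nodefile_text.splitlines())
    counts = Counter(h for h in hosts if h)  # skip blank lines
    # Counter keeps first-seen order, so the host order is preserved.
    return "".join(f"{host} {n}\n" for host, n in counts.items())


# Node list from the reservation shown above (one core per node).
nodefile_text = (
    "idpot5.grenoble.grid5000.fr\n"
    "idpot8.grenoble.grid5000.fr\n"
    "idpot9.grenoble.grid5000.fr\n"
)
hostfile_text = nodefile_to_hostfile(nodefile_text)
print(hostfile_text, end="")
```

Inside a job, the input would be read from the path stored in the `OAR_NODEFILE` environment variable and the result written to the file passed to `python -m scoop --hostfile`.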
#!/bin/bash
python3 insp_script.py
vlad@nowinfinity.com.au
Scientific paper: "Prediction of miRNA-disease associations with vector space model."