background in mathematics
post-doc researcher at biomedical genomics lab
focus: mutational processes and tumor evolution
interests: modelling, statistics, programming
long term goal: accomplish AoC in Haskell
2 yrs ago
Today
A. Paint corridor
B. Fix Front of Building
C. Pipe Maintenance
Neighbors' preferences and the corresponding ballots:
Either pipe maintenance or fix the front: B=C>A
Fix the front: B>A=C
Paint corridor or fix the front: A=B>C
Paint the corridor: A>B=C
I don't really care, I hate these meetings: A=B=C
...
voting
?
update
voting rights
+3%
+2%
-2%
-1%
-2%
impact
Living cells run on operating systems known as genomes
Genomes are written in a suitable extension of the ACGT-language
time
cell population
Specific changes in specific genes
known as cancer driver genes
www.intogen.org
print('hello, world')
Translocation
Copies
print('hella, world')
print('worlo, helld')
print('hello, world, world')
print('hell, world')
Substitution
Deletion
statistical model
ranking genes:
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. ...
Cohort
what the model expects
what we observe
TP53
TP53
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. MAP2K4
...
1. TP53
2. MLL3
3. CDH1
4. FOXA1
5. MAP2K4
...
1. TP53
2. MLL3
3. CDH1
4. FOXA1
5. MAP2K4
...
1. TP53
2. PIK3CA
3. CDH1
...
1. PIK3CA
2. MAP2K4
3. TP53
4. SETD2
5. MLL3
6. CDH1
...
1. TP53
2. PIK3CA
3. CDH1
4. MAP3K1
5. ARID1A
...
Fisher
Stouffer-Liptak
Brown
...
Inconsistent rankings
Use of different scales of embarrassment
Many false positives as the number of methods increases
Real data does not follow assumptions
Markus Schulze
Social Choice and Welfare, 2011, 36 (2), 267–303
TP53 = PIK3CA > PTEN > GATA3 > ...
PIK3CA > MAP2K4 > TP53 > SETD2 > ...
TP53 > MLL3 > CDH1 = FOXA1 > MAP2K4 > ...
...
TP53 > PIK3CA > MAP2K4 > PTEN > ...
step 1
voters = {v1, v2, v3, v4}
candidates = {c1, c2, c3, c4, c5}
Valid Ballots
step 2
weight matrix $M$
$M_{ij}$ = how many voters prefer $c_i$ over $c_j$?
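The counting in step 2 can be sketched in a few lines of plain Python. This is only an illustration, not the package's code: the helper names `parse_ballot` and `weight_matrix` are hypothetical, ballots are written in the `B>A=C` notation used earlier in the talk, and every ballot is assumed to rank every candidate.

```python
from itertools import permutations

def parse_ballot(ballot):
    """Turn 'B>A=C' into {'B': 0, 'A': 1, 'C': 1}; a lower level is more preferred."""
    levels = {}
    for level, group in enumerate(ballot.split(">")):
        for name in group.split("="):
            levels[name.strip()] = level
    return levels

def weight_matrix(ballots, candidates):
    """M[x][y] = number of voters who strictly prefer x over y (ties count for neither)."""
    M = {x: {y: 0 for y in candidates if y != x} for x in candidates}
    for ballot in ballots:
        levels = parse_ballot(ballot)  # assumes every candidate appears on the ballot
        for x, y in permutations(candidates, 2):
            if levels[x] < levels[y]:
                M[x][y] += 1
    return M

# The five neighbor ballots from the building-meeting example.
ballots = ["B>A=C", "B=C>A", "A>B=C", "A=B>C", "A=B=C"]
M = weight_matrix(ballots, ["A", "B", "C"])
print(M["B"]["A"], M["A"]["B"])  # prints: 2 1
```

Two neighbors strictly prefer fixing the front (B) over painting (A), and only one prefers the reverse, so the edge B to A carries the larger weight.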
step 3
M defines a directed weighted graph G
Max
Min
We want to give higher voting rights to methods that contribute more to a better outcome (!)
+3%
+2%
-2%
-1%
-2%
https://cancer.sanger.ac.uk/census
Cancer Gene Census (CGC): manually curated dataset of bona fide known cancer genes
Given a single ranking, define an enrichment score built from:
the proportion of CGC genes up to each rank
a weighting for each rank
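The exact formula is not recoverable from the slide text, so the following is only one plausible shape for such a score. It assumes the score is a weighted average of the proportion of known drivers among the top i genes, here called `p_i`, with a hypothetical weighting `w_i = 1 / log2(i + 1)` that emphasizes the top positions. The function name, both symbols, and the choice of weighting are assumptions made for illustration.

```python
import math

def enrichment_score(ranking, known_drivers, top=None):
    """Weighted average of p_i = (known drivers among the top i genes) / i.

    ASSUMPTION: w_i = 1 / log2(i + 1) is a stand-in weighting, not the talk's.
    """
    top = len(ranking) if top is None else min(top, len(ranking))
    hits = 0
    weighted_sum = 0.0
    weight_total = 0.0
    for i, gene in enumerate(ranking[:top], start=1):
        hits += gene in known_drivers        # running count of known drivers
        p_i = hits / i                       # proportion of known drivers up to rank i
        w_i = 1.0 / math.log2(i + 1)         # hypothetical rank weighting
        weighted_sum += w_i * p_i
        weight_total += w_i
    return weighted_sum / weight_total if weight_total else 0.0

# Toy data: GENE_X and GENE_Y are made-up names standing in for non-drivers.
cgc = {"TP53", "PIK3CA"}
good = ["TP53", "PIK3CA", "GENE_X", "GENE_Y"]
bad = ["GENE_X", "GENE_Y", "TP53", "PIK3CA"]
print(enrichment_score(good, cgc) > enrichment_score(bad, cgc))  # prints: True
```

Under these assumptions a ranking that places known drivers first scores higher than one that places them last, which is the behavior the enrichment criterion is meant to reward.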
Enrichment of bona fide known drivers in the top positions of the consensus ranking
The preferences of each voter can be scaled by a factor (its voting rights)
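One way to read this scaling: each voter adds its factor, rather than 1, to every pairwise preference it expresses. The sketch below assumes that reading. The function name `weighted_matrix`, the `rights` argument, and the representation of a ballot as a dict of preference levels are all hypothetical.

```python
def weighted_matrix(ballots, rights, candidates):
    """M[x][y] = sum of the voting rights of the voters who strictly prefer x over y.

    ballots: list of dicts mapping candidate -> level (0 = most preferred).
    rights:  one scaling factor per voter (ASSUMPTION: plain multiplicative weights).
    """
    M = {x: {y: 0.0 for y in candidates if y != x} for x in candidates}
    for levels, alpha in zip(ballots, rights):
        for x in candidates:
            for y in candidates:
                if x != y and levels[x] < levels[y]:
                    M[x][y] += alpha
    return M

# Two methods that disagree on the order of two genes.
ballots = [{"TP53": 0, "PIK3CA": 1}, {"PIK3CA": 0, "TP53": 1}]
genes = ["TP53", "PIK3CA"]

equal = weighted_matrix(ballots, [0.5, 0.5], genes)
skewed = weighted_matrix(ballots, [0.7, 0.3], genes)
print(equal["TP53"]["PIK3CA"] == equal["PIK3CA"]["TP53"])    # prints: True
print(skewed["TP53"]["PIK3CA"] > skewed["PIK3CA"]["TP53"])   # prints: True
```

With equal rights the two methods cancel out; giving the first method more weight tips the pairwise comparison in favor of its preferred gene.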
...
...
...
step 1: Schulze
step 2: enrichment score
step 1 + step 2 together define a function:
formulated as an
Composite rule based on:
Schulze voting:
numpy: http://www.numpy.org/
cython: http://cython.org/
Graph representation:
networkx: https://networkx.github.io/
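Cython code that computes the strongest (maximum-bottleneck) paths of the weighted directed graph, a variant of Floyd's algorithm:
def strongest_path(long size, double [:] pref, double [:] spath):
    # pref and spath are flattened size x size matrices (row-major);
    # spath is expected to be zero-initialized by the caller.
    # Initialization: keep only the edges won by a strict majority.
    for i in range(size):
        for j in range(size):
            if i != j:
                if pref[i*size + j] > pref[j*size + i]:
                    spath[i*size + j] = pref[i*size + j]
    # Floyd-style relaxation: route j -> k through i when the bottleneck improves.
    for i in range(size):
        for j in range(size):
            if i != j:
                for k in range(size):
                    if (i != k) and (j != k):
                        spath[j*size + k] = max(spath[j*size + k],
                                                min(spath[j*size + i], spath[i*size + k]))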
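For readers without a Cython toolchain, here is a plain-Python transcription of the routine above using nested lists, followed by one possible way to turn path strengths into a ranking: order candidates by how many others they beat. The function `schulze_ranking` and that ordering rule are illustrative assumptions, not the package's API. The input matrix is the pairwise count for the five neighbor ballots B>A=C, B=C>A, A>B=C, A=B>C, A=B=C.

```python
def strongest_paths(pref):
    """pref[i][j] = voters preferring i over j; returns the matrix of path strengths."""
    n = len(pref)
    spath = [[0.0] * n for _ in range(n)]
    # Keep only the edges won by a strict majority.
    for i in range(n):
        for j in range(n):
            if i != j and pref[i][j] > pref[j][i]:
                spath[i][j] = pref[i][j]
    # Floyd-style relaxation with (max, min) in place of (min, +).
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k != i and k != j:
                    spath[j][k] = max(spath[j][k], min(spath[j][i], spath[i][k]))
    return spath

def schulze_ranking(pref, names):
    """Order candidates by the number of others they beat (ties keep input order)."""
    spath = strongest_paths(pref)
    n = len(names)
    wins = [sum(spath[i][j] > spath[j][i] for j in range(n) if j != i)
            for i in range(n)]
    order = sorted(range(n), key=lambda i: -wins[i])
    return [names[i] for i in order]

# Rows and columns are A (paint corridor), B (fix front), C (pipe maintenance).
pref = [[0, 1, 2],
        [2, 0, 2],
        [1, 0, 0]]
print(schulze_ranking(pref, ["A", "B", "C"]))  # prints: ['B', 'A', 'C']
```

Because "beats" is a transitive relation in the Schulze method, a candidate that beats another also beats everything the other beats, so sorting by number of wins never contradicts a pairwise result.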
Optimization with constraints:
scipy: https://www.scipy.org/
scipy.optimize
...array of different optimization methods
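As a minimal sketch of constrained optimization with `scipy.optimize`, the snippet below searches for voting rights that are non-negative and sum to 1. The objective is a smooth toy stand-in, not the enrichment-based objective of the talk: the `quality` vector and the quadratic penalty are invented for illustration. The real objective (Schulze ranking followed by an enrichment score) changes only when the ranking changes, so a gradient-based method such as SLSQP may stall on it, and derivative-free methods may be a better fit there.

```python
import numpy as np
from scipy.optimize import minimize

# ASSUMPTION: made-up per-method "quality" values, used only to build a toy objective.
quality = np.array([0.9, 0.6, 0.3])

def objective(w):
    """Toy objective: reward weight on good methods, penalize concentration."""
    return -(quality @ w) + np.sum(w ** 2)

n = len(quality)
x0 = np.full(n, 1.0 / n)  # start from equal voting rights

result = minimize(
    objective,
    x0,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,                                         # 0 <= w_i <= 1
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],  # sum(w) == 1
)
voting_rights = result.x
print(np.round(voting_rights, 3))
```

For this toy problem the optimum can be worked out by hand with a Lagrange multiplier: w_i = (q_i + 1/15) / 2, which is roughly (0.483, 0.333, 0.183), so methods with higher quality receive more voting rights.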
Overkill attempts:
pyopt: http://www.pyopt.org/
ALPSO (Augmented Lagrangian Particle Swarm Optimizer)
scikit-optimize: https://scikit-optimize.github.io/
Bayesian optimization
Python package to experiment with these ideas
https://bitbucket.org/ferran_muinos/
features:
random ballot generator
computes consensus ranking with Schulze
with customizable voting rights
computation of weights and strength
graph plots
enrichment-based voting rights optimization
requires:
cython, networkx, scipy
TO BE RELEASED
SOON!
www.intogen.org
Schulze
*
update voting rights
optimization strategy
CGC enrichment
from neighbor politics to driver discovery
+3%
+2%
-2%
-1%
-2%
Joint work in close collaboration with: Francisco Martínez-Jiménez
IntOGen working group: Loris Mularoni, Carlota Rubio-Perez, Jordi Deu-Pons, Inés Sentís, Iker Reyes-Salazar, David Tamborero, Abel Gonzalez-Perez, Núria López-Bigas
Loris Mularoni
Robert W. Floyd. Algorithm 97: Shortest Path. Communications of the ACM, 1962, 5 (6), 345
Markus Schulze. A new monotonic, clone-independent, reversal symmetric, and Condorcet-consistent single-winner election method. Social Choice and Welfare, 2011, 36 (2), 267–303
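A path of strength $s$ from candidate $a$ to candidate $b$ is any sequence of candidates $x_1 = a, x_2, \dots, x_n = b$ satisfying, for every $k < n$: $M_{x_k x_{k+1}} > M_{x_{k+1} x_k}$ and $M_{x_k x_{k+1}} \ge s$
The strength between two candidates is the maximum strength over all paths joining them: $S(a, b) = \max \{\, s : \text{there is a path of strength } s \text{ from } a \text{ to } b \,\}$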