using votes to combine rankings
about me

background in mathematics

postdoc researcher at biomedical genomics lab

focus: mutational processes and tumor evolution

interests: modelling, statistics, programming

long-term goal: complete AoC (Advent of Code) in Haskell
the story:
2 yrs ago
Today
neighbors
A. Paint corridor
B. Fix Front of Building
C. Pipe Maintenance
Either pipe maintenance or fix the front
Fix the front
Paint corridor or fix the front
paint the corridor
I don't really care, I hate these meetings
B>A=C
B=C>A
A>B=C
A=B>C
A=B=C
B=C>A
A>B=C
B>A=C
A=B>C
A=B=C
reward the good!
A=B=C
A=B>C
B>A=C
B=C>A
A>B=C
...
voting
?
update
voting rights
+3%
+2%
-2%
-1%
-2%
impact
 definition of choices
 feasibility of impact evaluation
 ethics of mutable voting rights
many caveats in real-life social choice
does it make sense in other contexts?
cancer genomics problem:
discovery of genes that drive tumor evolution
genomics
Living cells run on operating systems known as genomes
Genomes are written in a suitable extension of the ACGT language
cancer genomics
 In healthy multicellular organisms, genomes have evolved to cooperate
 Cancer arises when genome modifications lead to unhealthy growth and expansion of a cell population
Genomes change
tumor evolution: massive trial and error
time
cell population
Specific changes in specific genes
known as cancer driver genes
cancer genes
www.intogen.org
genome modifications
print('hello, world')
Substitution: print('hella, world')
Translocation: print('worlo, helld')
Copies: print('hello, world, world')
Deletion: print('hell, world')
statistical methods guess which genes drive
statistical model
ranking genes:
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. ...
Cohort
p-values
what the model expects
what we observe
 p-values
 how embarrassed is the model after observing the data
 the lower the p-value, the higher the embarrassment!
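To make "embarrassment" concrete, here is a minimal sketch of a one-sided p-value. It assumes a binomial model, which is only an illustration and not necessarily what any driver discovery method uses; the function name and the numbers are hypothetical.

```python
from math import comb

def binomial_pvalue(observed: int, trials: int, expected_rate: float) -> float:
    """One-sided p-value: P(X >= observed) for X ~ Binomial(trials, expected_rate)."""
    return sum(
        comb(trials, k) * expected_rate**k * (1 - expected_rate) ** (trials - k)
        for k in range(observed, trials + 1)
    )

# Hypothetical cohort: 100 tumors, the model expects the gene mutated in 2% of them.
print(binomial_pvalue(2, 100, 0.02))   # observation close to expectation: large p-value
print(binomial_pvalue(15, 100, 0.02))  # observation far above expectation: tiny p-value
```

The further the observation sits above what the model expects, the smaller the tail probability, and the more embarrassed the model.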
TP53
TP53
statistical methods guess which genes drive
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. MAP2K4
...
1. TP53
2. MLL3
3. CDH1
4. FOXA1
5. MAP2K4
...
1. TP53
2. MLL3
3. CDH1
4. FOXA1
5. MAP2K4
...
1. TP53
2. PIK3CA
3. CDH1
...
1. PIK3CA
2. MAP2K4
3. TP53
4. SETD2
5. MLL3
6. CDH1
...
1. TP53
2. PIK3CA
3. CDH1
4. MAP3K1
5. ARID1A
...
combining p-values
Fisher
Stouffer-Liptak
Brown
...
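As a reminder of what two of these combinations do, a minimal standard-library sketch follows. It assumes independent p-values strictly between 0 and 1 and an unweighted Stouffer combination; the function names are hypothetical, and Brown's method (which corrects for dependence) is not covered.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def fisher_combine(pvalues):
    """Fisher: -2 * sum(ln p) follows a chi-square with 2k degrees of freedom."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function, valid for an even number of d.o.f.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total

def stouffer_combine(pvalues):
    """Stouffer: sum the z-scores, rescale by sqrt(k), map back to a p-value."""
    std_normal = NormalDist()
    z = sum(std_normal.inv_cdf(1.0 - p) for p in pvalues) / sqrt(len(pvalues))
    return 1.0 - std_normal.cdf(z)

pvalues = [0.01, 0.20, 0.03]  # hypothetical p-values of one gene across three methods
print(fisher_combine(pvalues), stouffer_combine(pvalues))
```

Both functions return a single p-value per gene, so the output can be sorted to obtain a combined ranking.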
combining p-values:
a few caveats

Inconsistent rankings

Use of different scales of embarrassment

Many false positives as the number of methods increases

Real data does not follow assumptions
 consistent ranking
 systematic allocation of credibility
 interpretable and statistically sound
we want a consensus of driver discovery...
ranking consistency: Schulze voting
Markus Schulze
Social Choice and Welfare, 2011, 36 (2), 267–303
how it works
 Ranking consistency essentially means "Condorcet"
 ...yet it remains fast to compute
TP53 = PIK3CA > PTEN > GATA3 > ...
PIK3CA > MAP2K4 > TP53 > SETD2 > ...
TP53 > MLL3 > CDH1 = FOXA1 > MAP2K4 > ...
...
TP53 > PIK3CA > MAP2K4 > PTEN > ...
how it works
step 1
voters = {v1, v2, v3, v4}
candidates = {c1, c2, c3, c4, c5}
 candidates are given ranks by voters
 not every rank assignment is valid
Valid Ballots
 some candidate gets 1st
 rank(c) = #{s : rank(s) < rank(c)} + 1
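The ballot rule above can be sketched as follows. This is an illustration only: the text format with ">" and "=" mirrors the rankings shown earlier, and both function names are hypothetical.

```python
def parse_ballot(text):
    """Turn 'A = B > C' into ranks {'A': 1, 'B': 1, 'C': 3}."""
    ranks, n_better = {}, 0
    for group in text.split(">"):
        tied = [name.strip() for name in group.split("=")]
        for name in tied:
            ranks[name] = n_better + 1  # 1 + number of strictly better candidates
        n_better += len(tied)
    return ranks

def is_valid_ballot(ranks):
    """Check rank(c) == #{s : rank(s) < rank(c)} + 1 for every candidate c."""
    values = list(ranks.values())
    return all(r == sum(v < r for v in values) + 1 for r in values)

ballot = parse_ballot("TP53 = PIK3CA > PTEN > GATA3")
print(ballot, is_valid_ballot(ballot))
```

Note that the rule already forces some candidate to be ranked 1st: the best-ranked candidate has nobody strictly above it.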
how it works
weight matrix
M[i, j] = how many voters prefer ci over cj?
step 2
how it works
step 2
How many voters prefer over ?
how it works
step 2
How many voters prefer over ?
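Step 2 can be sketched in a few lines. The ballots below are the five neighbor ballots from the story, written as rank dictionaries; the function name and the list-of-lists layout are assumptions for illustration, not the package's API.

```python
def weight_matrix(ballots, candidates):
    """M[i][j] = number of ballots ranking candidates[i] strictly above candidates[j]."""
    n = len(candidates)
    M = [[0] * n for _ in range(n)]
    for ranks in ballots:
        for i, a in enumerate(candidates):
            for j, b in enumerate(candidates):
                if ranks[a] < ranks[b]:  # lower rank means more preferred
                    M[i][j] += 1
    return M

ballots = [
    {"A": 2, "B": 1, "C": 2},  # B > A = C
    {"A": 3, "B": 1, "C": 1},  # B = C > A
    {"A": 1, "B": 2, "C": 2},  # A > B = C
    {"A": 1, "B": 1, "C": 3},  # A = B > C
    {"A": 1, "B": 1, "C": 1},  # A = B = C
]
M = weight_matrix(ballots, ["A", "B", "C"])
print(M)  # [[0, 1, 2], [2, 0, 2], [1, 0, 0]]
```

Tied candidates contribute to neither direction, which is why the indifferent neighbor adds nothing to the matrix.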
how it works
step 3
M defines a directed weighted graph G
Max
Min
allocation of credibility:
We want to give higher voting rights to methods that contribute more to a better outcome (!)
+3%
+2%
-2%
-1%
-2%
https://cancer.sanger.ac.uk/census
manually curated dataset of bona fide known cancer genes
enrichment score
Given a single ranking R, define an enrichment score E(R) from:
 p_i: proportion of CGC genes up to rank i
 w_i: weighting for rank i
Enrichment of bona fide known drivers in the top positions of the consensus ranking
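One plausible reading of the enrichment score is a weighted sum over ranks of the proportion of known drivers seen so far. The sketch below assumes that form and a default weight of 1/log(i + 1) that favors the top positions; both are assumptions, since the exact formula is not reproduced here, and the gene names "X" and "Y" are placeholders.

```python
from math import log

def enrichment_score(ranking, known_drivers, weight=lambda i: 1.0 / log(i + 1)):
    """Weighted sum over ranks i of the fraction of known drivers in the top i."""
    hits, score = 0, 0.0
    for i, gene in enumerate(ranking, start=1):
        hits += gene in known_drivers        # running count of known drivers
        score += weight(i) * hits / i        # weight for rank i times proportion up to i
    return score

cgc = {"TP53", "PIK3CA"}  # stand-in for the curated gene list
top_heavy = ["TP53", "PIK3CA", "X", "Y"]
bottom_heavy = ["X", "Y", "TP53", "PIK3CA"]
print(enrichment_score(top_heavy, cgc) > enrichment_score(bottom_heavy, cgc))  # True
```

Any decreasing weight gives the same qualitative behavior: rankings that place known drivers early score higher.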
voting rights
the preferences of each voter can be scaled by a per-voter factor (its voting right)
...
...
...
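A minimal sketch of scaled preferences: each ballot adds its voting right, instead of 1, to the pairwise tally. The function name and the example rights are hypothetical.

```python
def weighted_matrix(ballots, rights, candidates):
    """M[i][j] = total voting rights of ballots ranking candidates[i] above candidates[j]."""
    n = len(candidates)
    M = [[0.0] * n for _ in range(n)]
    for ranks, right in zip(ballots, rights):
        for i, a in enumerate(candidates):
            for j, b in enumerate(candidates):
                if ranks[a] < ranks[b]:
                    M[i][j] += right  # a trusted voter moves the tally further
    return M

# Two voters who disagree: with unequal rights the tie is broken.
ballots = [{"A": 1, "B": 2}, {"A": 2, "B": 1}]
print(weighted_matrix(ballots, [0.7, 0.3], ["A", "B"]))  # [[0.0, 0.7], [0.3, 0.0]]
```

With equal rights this reduces to the plain count of voters, up to a constant factor.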
step 1: Schulze
step 2: enrichment score
step 1 + step 2 together define a function: voting rights -> enrichment score of the consensus ranking
allocation of credibility
formulated as an
optimization problem
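The shape of the optimization problem can be sketched with a plain random search over voting rights that are non-negative and sum to 1. This is a stand-in, not the optimizer used in practice; the objective below is a toy function, where the real one would run Schulze and return the enrichment score.

```python
import random

def random_search(objective, n_voters, n_iter=2000, seed=0):
    """Maximize objective(rights) over non-negative rights summing to 1."""
    rng = random.Random(seed)
    best = [1.0 / n_voters] * n_voters  # start from equal voting rights
    best_score = objective(best)
    for _ in range(n_iter):
        draw = [rng.expovariate(1.0) for _ in range(n_voters)]
        total = sum(draw)
        rights = [x / total for x in draw]  # uniform draw on the simplex
        score = objective(rights)
        if score > best_score:
            best, best_score = rights, score
    return best, best_score

# Toy objective (hypothetical): peaks at rights (0.5, 0.3, 0.2).
def toy(rights):
    target = (0.5, 0.3, 0.2)
    return -sum((r - t) ** 2 for r, t in zip(rights, target))

best, best_score = random_search(toy, 3)
print(best, best_score)
```

The consensus ranking changes only when a pairwise comparison flips, so the real objective is piecewise constant in the voting rights; that is one reason to consider derivative-free strategies.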
in practice:
what is left?
gene selection
Composite rule based on:
 Each gene ranked by ranking combination
 Credibility leads to more accurate p-value combination
the implementation:
Schulze voting:
numpy: http://www.numpy.org/
cython: http://cython.org/
Graph representation:
networkx: https://networkx.github.io/
key chunk of code
code that computes the strongest (widest) path between every pair of nodes of the weighted directed graph: a variant of Floyd's algorithm
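def strongest_path(long size, double [:] pref, double [:] spath):
    # pref and spath are flattened size x size matrices (row-major)
    # initialisation: keep only the winning direction of each pairwise contest
    for i in range(size):
        for j in range(size):
            if i != j:
                if pref[i*size + j] > pref[j*size + i]:
                    spath[i*size + j] = pref[i*size + j]
                else:
                    # no-op if spath is passed in zero-initialised
                    spath[i*size + j] = 0
    # Floyd-style update: a path is only as strong as its weakest link
    for i in range(size):
        for j in range(size):
            if i != j:
                for k in range(size):
                    if (i != k) and (j != k):
                        spath[j*size + k] = max(spath[j*size + k],
                                                min(spath[j*size + i], spath[i*size + k]))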
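For readers without Cython at hand, the same computation can be sketched in pure Python, followed by one way of reading a consensus ranking off the strongest paths: order candidates by how many opponents they beat. The function names are hypothetical, and sorting by number of wins is an assumption about the final step, not necessarily what the package does.

```python
def strongest_paths(M):
    """Widest-path variant of Floyd's algorithm on the pairwise weight matrix M."""
    n = len(M)
    # Keep only the winning direction of each pairwise contest.
    P = [[M[i][j] if M[i][j] > M[j][i] else 0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k != i and k != j:
                    P[j][k] = max(P[j][k], min(P[j][i], P[i][k]))
    return P

def consensus_ranking(M, candidates):
    """Order candidates by the number of opponents they beat on path strength."""
    P = strongest_paths(M)
    n = len(candidates)
    wins = [sum(P[i][j] > P[j][i] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: -wins[i])  # stable: ties keep input order
    return [candidates[i] for i in order]

# Pairwise tallies for three options A, B, C (hypothetical five-voter example).
M = [[0, 1, 2], [2, 0, 2], [1, 0, 0]]
print(consensus_ranking(M, ["A", "B", "C"]))  # ['B', 'A', 'C']
```

Because the "beats on path strength" relation is transitive, a candidate that beats another also beats everything the other beats, so counting wins respects the Schulze order.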
the implementation:
Optimization with constraints:
scipy: https://www.scipy.org/
scipy.optimize
...array of different optimization methods
Overkill attempts:
pyopt: http://www.pyopt.org/
ALPSO (Augmented Lagrangian Particle Swarm Optimizer)
scikit-optimize: https://scikit-optimize.github.io/
Bayesian optimization
package
Python package to experiment with these ideas
https://bitbucket.org/ferran_muinos/
features:

random ballot generator

computes consensus ranking with Schulze

with customizable voting rights

computation of weights and strength

graph plots

enrichment-based voting rights optimization
requires:

cython, networkx, scipy
TO BE RELEASED
SOON!
IntOGen
www.intogen.org
summary:
Schulze
*
update voting rights
optimization strategy
CGC enrichment
from neighbor politics to driver discovery
+3%
+2%
-2%
-1%
-2%
credit and thanks
Joint work in close collaboration with: Francisco Martínez-Jiménez
IntOGen working group: Loris Mularoni, Carlota Rubio-Perez, Jordi Deu-Pons, Inés Sentís, Iker Reyes-Salazar, David Tamborero, Abel Gonzalez-Perez, Núria López-Bigas
Iker
Inés
Jordi
Núria
Carlota
Loris
Fran
Abel
credit and thanks
Loris Mularoni
references
Robert W. Floyd. Algorithm 97: Shortest Path. Communications of the ACM, 1962, 5 (6), 345
Markus Schulze. A new monotonic, clone-independent, reversal symmetric, and Condorcet-consistent single-winner election method. Social Choice and Welfare, 2011, 36 (2), 267–303
how it works: backup
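A path of strength s from candidate x to candidate y is any sequence of candidates x = c(1), c(2), ..., c(n) = y satisfying: M[c(k), c(k+1)] > M[c(k+1), c(k)] and M[c(k), c(k+1)] >= s for every k = 1, ..., n-1
The strength between two candidates is the maximum strength over all paths joining them: strength(x, y) = max over paths from x to y of min over k of M[c(k), c(k+1)]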
ranking combination
By Ferran Muiños
ranking combination
Presenting a ranking combination method that makes use of a voting system alongside optimization. Schulze voting, p-value combination statistics and cancer genomics featuring in the same talk. Presented at the PyCon Nove meeting (April 2018).