background in mathematics
post-doc researcher at biomedical genomics lab (IRB)
focus: mutational processes and tumor evolution
interests: math models, statistics, programming
long term goal in life: Advent of Code in Haskell
2 yrs ago
Today
A. Paint corridor
B. Fix Front of Building
C. Pipe Maintenance
Either pipe maintenance or fix the front
Fix the front
I don't really care, I hate these meetings
B>A=C
B=C>A
A>B=C
A=B>C
A=B=C
...
voting
?
update
voting rights
+3%
+2%
-2%
-1%
-2%
impact
Living cells run on operating systems known as genomes
Genomes are written in a suitable extension of the ACGT-language
Specific genes are hacked
cancer driver genes
www.intogen.org
print('hello, world')
Substitution: print('hella, world')
Translocation: print('worlo, helld')
Copies: print('hello, world, world')
Deletion: print('hell, world')
statistical model
ranking genes:
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. ...
Cohort
what the model expects
(background model)
what we observe
TP53
TP53
1. TP53
2. PIK3CA
3. PTEN
4. GATA3
5. RUNX1
6. MAP2K4
...
1. TP53
2. MLL3
3. CDH1
4. FOXA1
5. MAP2K4
...
1. TP53
2. PIK3CA
3. CDH1
...
1. PIK3CA
2. MAP2K4
3. TP53
4. SETD2
5. MLL3
6. CDH1
...
1. TP53
2. PIK3CA
3. CDH1
4. MAP3K1
5. ARID1A
...
Fisher
Stouffer-Liptak
Brown
...
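The slides only name these p-value combination methods. As a rough illustration of what two of them compute, here is a minimal sketch of Fisher's and Stouffer-Liptak's methods using only the standard library. The function names `fisher_combine` and `stouffer_combine` are hypothetical, both assume independent p-values strictly between 0 and 1, and Brown's method (which adjusts Fisher's for correlated p-values) is not shown.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher: X = -2 * sum(ln p) follows a chi-square with 2k degrees of freedom."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function, valid for even degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(pvalues, weights=None):
    """Stouffer-Liptak: weighted sum of z-scores, rescaled to unit variance."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    z = [nd.inv_cdf(1.0 - p) for p in pvalues]
    num = sum(w * zi for w, zi in zip(weights, z))
    den = math.sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(num / den)


combined_fisher = fisher_combine([0.5, 0.5])
combined_stouffer = stouffer_combine([0.5, 0.5])
```

Both functions return a single combined p-value per gene. The two methods combine evidence on different scales, so gene rankings built from their outputs need not agree.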
Inconsistent rankings
Use of different scales of embarrassment
Many false positives as the number of methods increases
Real data does not follow assumptions
Markus Schulze
Social Choice and Welfare, 2011, 36 (2), 267–303
TP53 = PIK3CA > PTEN > GATA3 > ...
PIK3CA > MAP2K4 > TP53 > SETD2 > ...
TP53 > MLL3 > CDH1 = FOXA1 > MAP2K4 > ...
...
TP53 > PIK3CA > MAP2K4 > PTEN > ...
step 1
voters = {v1, v2, v3, v4}
candidates = {c1, c2, c3, c4, c5}
Valid Ballots
weight matrix
$M(x, y)$ = how many voters prefer $x$ over $y$?
step 2
How many voters prefer $x$ over $y$?
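The pairwise count described above can be sketched as follows, using the five neighbour ballots from the opening slides. This is an illustration under assumptions, not the package's code: the name `pairwise_matrix`, the ballot encoding (a list of tiers, best first) and the rule that unranked candidates are placed last are all hypothetical. The optional `rights` argument scales each ballot by a voting right, with every voter counting as 1 by default.

```python
def pairwise_matrix(ballots, candidates, rights=None):
    """M[x][y] = total voting rights of the voters who rank x strictly above y."""
    if rights is None:
        rights = [1.0] * len(ballots)
    M = {x: {y: 0.0 for y in candidates} for x in candidates}
    for ballot, w in zip(ballots, rights):
        # Tier index of each candidate; assumption: unranked candidates go last.
        tier = {c: t for t, group in enumerate(ballot) for c in group}
        last = len(ballot)
        for x in candidates:
            for y in candidates:
                if x != y and tier.get(x, last) < tier.get(y, last):
                    M[x][y] += w
    return M


# The five neighbour ballots, each written as tiers from best to worst.
ballots = [
    [["B"], ["A", "C"]],   # B>A=C
    [["B", "C"], ["A"]],   # B=C>A
    [["A"], ["B", "C"]],   # A>B=C
    [["A", "B"], ["C"]],   # A=B>C
    [["A", "B", "C"]],     # A=B=C
]
M = pairwise_matrix(ballots, ["A", "B", "C"])
```

Ties within a tier contribute to neither direction, so the "I don't really care" ballot leaves the matrix unchanged.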
step 3
M defines a directed weighted graph G
Max
Min
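A path in the weighted graph is a sequence of nodes $c_1, \dots, c_n$
It has strength $s$ if $s$ is the maximum satisfying: $\mathrm{weight}(c_i, c_{i+1}) \ge s$ for every edge $(c_i, c_{i+1})$ of the path
The strength between candidates x, y is the max strength among all paths joining them: $S(x, y) = \max \{ s : \text{some path from } x \text{ to } y \text{ has strength } s \}$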
Theorem:
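The set of candidates equipped with the relation $x \succ y$ (the strength from $x$ to $y$ exceeds the strength from $y$ to $x$, i.e. $S(x, y) > S(y, x)$) gives a partially ordered set.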
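Once the strengths are known, the ordering in the theorem can be read off by comparing the strength in both directions. The sketch below groups candidates into tiers by how many rivals they beat. The name `ranking_from_strengths` and the strength values are illustrative assumptions, and grouping by win count is one convenient way to linearise the order rather than part of the theorem.

```python
def ranking_from_strengths(S, candidates):
    """Tiers of candidates, best first, where x beats y iff S[x][y] > S[y][x]."""
    wins = {
        x: sum(1 for y in candidates if y != x and S[x][y] > S[y][x])
        for x in candidates
    }
    tiers = {}
    for c in candidates:
        tiers.setdefault(wins[c], []).append(c)
    # Because the relation is transitive, beating a rival implies more wins than it.
    return [tiers[w] for w in sorted(tiers, reverse=True)]


# Illustrative strengths for the three neighbourhood options A, B, C.
S = {
    "A": {"A": 0, "B": 0, "C": 2},
    "B": {"A": 2, "B": 0, "C": 2},
    "C": {"A": 0, "B": 0, "C": 0},
}
ranking = ranking_from_strengths(S, ["A", "B", "C"])
```

With these values B beats both A and C, and A beats C, so the tiers come out as B, then A, then C.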
We want to give higher voting rights to methods that contribute more to a good outcome (!)
https://cancer.sanger.ac.uk/census
manually curated dataset of bona fide known cancer genes
Given a single ranking $R$, define an enrichment score $E(R)$:
$p_i$: proportion of CGC genes up to rank $i$
$w_i$: weighting for rank $i$
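One plausible reading of the two ingredients above is a weighted sum over ranks. The sketch below assumes that form, and the default weight `1 / log(i + 1)` is also an assumption chosen only because it decays with rank. The name `enrichment_score`, the toy driver set and the placeholder gene `GENE_X` are hypothetical.

```python
import math


def enrichment_score(ranking, known_drivers, weight=None):
    """Sum over ranks i of weight(i) * (proportion of known drivers in the top i)."""
    if weight is None:
        # Assumption: a weight that decays with rank, favouring the top positions.
        weight = lambda i: 1.0 / math.log(i + 1)
    hits, score = 0, 0.0
    for i, gene in enumerate(ranking, start=1):
        hits += gene in known_drivers
        score += weight(i) * hits / i
    return score


toy_cgc = {"TP53", "PIK3CA"}  # toy subset standing in for the CGC list
good = enrichment_score(["TP53", "PIK3CA", "GENE_X"], toy_cgc)
bad = enrichment_score(["GENE_X", "TP53", "PIK3CA"], toy_cgc)
```

A ranking that places known drivers at the top scores higher than one that buries them, which is the property the consensus is judged on.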
Enrichment of bona fide known drivers in the top positions of the consensus ranking
scale rankings with weights
(voting rights or credibility)
...
...
...
step 1: Schulze
step 2: enrichment score
step 1 + step 2 together define a function from the voting rights to the enrichment score of the consensus ranking
Optimize (with constraints) to find most credible voting rights
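The slides do not say which optimizer is used. As a stand-in, the sketch below runs a plain random search over voting rights that sum to 1 and each stay above a floor. The names `random_rights` and `optimize_rights`, the floor of 0.05 and the toy objective are all assumptions. In practice `objective` would run the Schulze consensus with the given rights and return its enrichment score.

```python
import random


def random_rights(n, lower, rng):
    """Sample n voting rights that sum to 1, each at least `lower`."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    parts = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]  # point on the simplex
    free = 1.0 - n * lower  # mass left after giving every method its floor
    return [lower + free * p for p in parts]


def optimize_rights(objective, n, lower=0.05, trials=2000, seed=0):
    """Random search: keep the best-scoring rights, starting from equal rights."""
    rng = random.Random(seed)
    best = [1.0 / n] * n
    best_score = objective(best)
    for _ in range(trials):
        candidate = random_rights(n, lower, rng)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy objective (hypothetical): pretend the first method is the most reliable one.
toy_objective = lambda rights: rights[0]
best, best_score = optimize_rights(toy_objective, n=3)
```

The floor keeps any single method from being silenced entirely, which is one way to read the "with constraints" part of the slide.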
Composite rule based on:
Schulze voting:
numpy: http://www.numpy.org/
cython: http://cython.org/
Graph representation:
networkx: https://networkx.github.io/
code that computes the strength of the strongest path between every pair of nodes of the weighted directed graph: a variant of Floyd's algorithm
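def strongest_path(long size, double [:] pref, double [:] spath):
    # pref and spath are flattened size x size matrices (row-major);
    # spath is filled in place and is expected to start at zero.
    # Initialisation: keep only the pairwise wins.
    for i in range(size):
        for j in range(size):
            if i != j:
                if pref[i*size + j] > pref[j*size + i]:
                    spath[i*size + j] = pref[i*size + j]
    # Floyd-style relaxation with (max, min) in place of (min, +):
    # going j -> i -> k is as strong as its weakest leg.
    for i in range(size):
        for j in range(size):
            if i != j:
                for k in range(size):
                    if (i != k) and (j != k):
                        spath[j*size + k] = max(spath[j*size + k],
                                                min(spath[j*size + i], spath[i*size + k]))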
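To try the routine without compiling Cython, a pure-Python mirror can be run on a small example. The name `strongest_path_py` and the preference counts below are hypothetical. The counts form a cycle (0 beats 1, 1 beats 2, 2 beats 0) to show how the path strengths resolve it.

```python
def strongest_path_py(size, pref, spath):
    """Pure-Python mirror of the Cython routine; pref and spath are flat lists."""
    for i in range(size):
        for j in range(size):
            if i != j and pref[i*size + j] > pref[j*size + i]:
                spath[i*size + j] = pref[i*size + j]
    for i in range(size):
        for j in range(size):
            if i != j:
                for k in range(size):
                    if i != k and j != k:
                        spath[j*size + k] = max(spath[j*size + k],
                                                min(spath[j*size + i], spath[i*size + k]))
    return spath


# Row-major 3 x 3 preference counts for five voters (illustrative values).
pref = [0.0, 4.0, 2.0,
        1.0, 0.0, 3.0,
        3.0, 2.0, 0.0]
spath = strongest_path_py(3, pref, [0.0] * 9)
strength = [spath[r*3:(r + 1)*3] for r in range(3)]  # reshape for reading
```

Here candidate 0 reaches candidate 1 with strength 4, while the way back (through candidate 2) has strength 3, so 0 beats 1. The pairs (0, 2) and (1, 2) end up with equal strength in both directions and are tied.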
Python package to experiment with Schulze's voting algorithm
https://bitbucket.org/ferran_muinos/consensus
features:
random ballot generator
computes consensus ranking with Schulze
with customizable voting rights
computation of weights and strength
graph plots
special dependencies:
cython, networkx
Schulze
*
update voting rights
optimization strategy
CGC enrichment
from neighbor politics to driver discovery
Iker
Inés
Jordi
Núria
Carlota
Loris
Fran
Abel
Oriol
www.intogen.org
Robert W. Floyd. Algorithm 97: Shortest Path. Communications of the ACM, 1962, 5 (6), 345
Markus Schulze. A new monotonic, clone-independent, reversal symmetric, and Condorcet-consistent single-winner election method. Social Choice and Welfare, 2011, 36 (2), 267–303