Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
AI Therapeutics
Guilford, Connecticut
December 21, 2018
Slides at slides.com/dhimmel/ai-thera
http://www.greenelab.com/
too simple
single node type
single relationship type
networks with multiple node or relationship types
A 2012 study identified 26 different names for this type of network:
multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogeneous information network
hetnet
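To make the contrast with single-type networks concrete, here is a minimal Python sketch of a hetnet as nodes and relationships that each carry a type. The node names and relationships are illustrative placeholders, not records from Hetionet, and the tuple layout is an assumption chosen for brevity.

```python
from collections import Counter

# Each node is (type, name); each relationship is (source, kind, target).
# All entries are illustrative placeholders, not records from Hetionet.
nodes = {
    ("Compound", "compound A"),
    ("Gene", "gene X"),
    ("Gene", "gene Y"),
    ("Disease", "disease D"),
}
edges = [
    (("Compound", "compound A"), "binds", ("Gene", "gene X")),
    (("Gene", "gene X"), "interacts", ("Gene", "gene Y")),
    (("Disease", "disease D"), "associates", ("Gene", "gene Y")),
]

# Count how many nodes and relationships exist of each type.
node_types = Counter(node_type for node_type, _ in nodes)
edge_types = Counter(kind for _, kind, _ in edges)

# A network qualifies as a hetnet when it has more than one
# node type or more than one relationship type.
is_hetnet = len(node_types) > 1 or len(edge_types) > 1
```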
Visualizing Hetionet v1.0
What's the best software for storing and querying hetnets?
| dhimmel/hetio | neo4j/neo4j |
|---|---|
| 136 | 53,793 |
| 18 | 4,727 |
| 6 | 1,283 |
GitHub stats from 2018-02-21
Details at doi.org/brsc
MATCH path =
  // Specify the type of path to match
  (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1:Gene)-[:INTERACTS_GiG]-
  (n2:Gene)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE
  // Specify the source and target nodes
  n0.name = 'multiple sclerosis' AND
  n3.name = 'retina layer formation'
  // Require GWAS support for the
  // Disease-associates-Gene relationship
  AND 'GWAS Catalog' in e1.sources
  // Require the interacting gene to be
  // upregulated in a relevant tissue
  AND exists(
    (n0)-[:LOCALIZES_DlA]-(:Anatomy)-[:UPREGULATES_AuG]-(n2))
RETURN path
More queries at thinklab.com/d/220
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/10.7554/eLife.26726
features = metapaths
observations = compound–disease pairs
positives = treatments
negatives = non-treatments
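As a rough illustration of this setup, the Python sketch below lays out a feature matrix with one row per compound–disease pair and one column per metapath. The metapath abbreviations, pair names, feature values, and labels are all placeholders invented for the example; they are not taken from the study.

```python
# One feature column per metapath (abbreviations are illustrative).
metapaths = ["CbGaD", "CbGbCtD"]

# One observation per compound-disease pair.
# Feature values and labels are placeholders, not results from the study.
observations = [
    {"compound": "compound A", "disease": "disease D",
     "features": [1.2, 0.7], "treatment": True},
    {"compound": "compound A", "disease": "disease E",
     "features": [0.0, 0.1], "treatment": False},
    {"compound": "compound B", "disease": "disease D",
     "features": [0.3, 0.0], "treatment": False},
]

# X holds the metapath features; y marks positives (treatments) as 1
# and negatives (non-treatments) as 0.
X = [obs["features"] for obs in observations]
y = [int(obs["treatment"]) for obs in observations]
```

Any binary classifier could then be fit on `X` and `y`; the choice of model is left open here.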
Browse at het.io/repurpose/metapaths.html
Compound–causes–SideEffect–causes–Compound–treats–Disease
Compound–binds–Gene–binds–Compound–treats–Disease
Compound–binds–Gene–associates–Disease
Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  path,
  reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10
Cypher query to find the top CbGpPWpGaD paths
Try at https://neo4j.het.io
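The `reduce` expression in the query multiplies every degree along a path after raising it to the power −0.4, so paths that run through high-degree nodes receive less weight. Below is a minimal Python sketch of the same arithmetic. The degree lists are hypothetical values chosen for illustration, and `path_weight` is a helper name introduced here, not part of any library.

```python
import math

DAMPING = 0.4  # exponent taken from the Cypher query (d ^ -0.4)


def path_weight(degrees, damping=DAMPING):
    """Multiply degree ** -damping over every degree along one path."""
    return math.prod(d ** -damping for d in degrees)


# Hypothetical degree lists for two paths of the same type.
low_degree_path = [2, 3, 2, 3]
high_degree_path = [200, 300, 200, 300]

weights = [path_weight(low_degree_path), path_weight(high_degree_path)]

# Summing the weights over all paths of one metapath gives a
# degree-weighted path count for that source-target pair.
dwpc = sum(weights)
```

The path through low-degree nodes contributes far more to the sum than the path through hubs, which is the intent of the damping exponent.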
Browse all predictions at het.io/repurpose. Discuss at thinklab.com/d/224
Discuss at thinklab.com/d/224#5
Discuss at thinklab.com/d/230#14
Kyle Kloster
@kkloste
Michael Zietz
@zietzm
https://github.com/greenelab/hetmech
the hetnet search engine
supported by
https://zietzm.github.io/Vagelos2017/
days to seconds
metapath_id | path_count | dwpc | p_value | source_degree | target_degree | n_dwpcs | n_nonzero_dwpcs | nonzero_mean | nonzero_sd |
---|---|---|---|---|---|---|---|---|---|
DaGpBPpG | 435 | 2.8 | 0.0000% | 373 | 32 | 29,000 | 29,000 | 2.1 | 0.12 |
DaGeAeG | 6,204 | 2.0 | 0.0000% | 373 | 28 | 53,000 | 53,000 | 1.9 | 0.02 |
DpSpDaG | 25 | 4.4 | 0.0134% | 17 | 6 | 101,000 | 100,994 | 2.4 | 0.45 |
DrDaG | 3 | 5.1 | 0.2442% | 5 | 6 | 181,800 | 32,414 | 3.9 | 0.51 |
DlAlDaG | 42 | 3.7 | 0.7010% | 33 | 6 | 20,200 | 20,200 | 2.7 | 0.38 |
DpSpDdG | 5 | 3.4 | 2.7443% | 17 | 2 | 1,065,000 | 1,009,309 | 1.9 | 0.67 |
DrDuGiG | 1 | 1.1 | 3.2758% | 5 | 2 | 2,885,400 | 124,913 | 2.0 | 1.16 |
DdGuDdG | 4 | 3.8 | 4.3339% | 45 | 2 | 213,000 | 142,687 | 2.9 | 0.56 |
DdGeAeG | 739 | 1.6 | 6.1179% | 45 | 28 | 53,000 | 53,000 | 1.5 | 0.05 |
DdGcGr>G | 1 | 2.9 | 6.8995% | 45 | 7 | 69,600 | 6,278 | 3.7 | 1.00 |
DaG | 1 | 5.3 | 8.8886% | 373 | 6 | 20,200 | 3,591 | 5.3 | 0.00 |
DlAdGcG | 11 | 2.0 | 9.7265% | 33 | 6 | 115,400 | 105,130 | 1.1 | 0.63 |
Most enriched types of paths connecting FTO and obesity
https://github.com/greenelab/snorkeling
David Nicholson
@danich1
@dhimmel
0000-0002-3012-7446
By Daniel Himmelstein
Presentation on 2018-12-20 at AI Therapeutics in Guilford, Connecticut. This presentation is released under a CC BY 4.0 License.