Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
Seminar Group on Big and Scientific Data
University of Pennsylvania
February 10, 2017
DSL Conference Room
Moore 102
11:00 am – 12:00 pm
By Daniel Himmelstein
@dhimmel
Slides at slides.com/dhimmel/big-data-seminar
http://www.greenelab.com/
There are many graph databases. I'm most familiar with Neo4j, which is:
Graphs are composed of:
Nodes / relationships have type:
Source: From Relational to Neo4j
Limitations:
From Meet openCypher: The SQL for Graphs. Neo4j Blog
networks with multiple node or relationship types
A 2012 study identified 26 different names for this type of network:
multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogeneous information network
hetnet
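As a rough illustration of "multiple node or relationship types", a hetnet can be sketched in plain Python as typed nodes plus typed edges. The node identifiers below (`disease_1`, `gene_1`, and so on) and the helper `metaedge_counts` are hypothetical placeholders, not part of hetio or Hetionet.

```python
from collections import Counter

# Hypothetical toy hetnet: every node has a type ...
nodes = {
    "disease_1": "Disease",
    "gene_1": "Gene",
    "gene_2": "Gene",
    "process_1": "BiologicalProcess",
}

# ... and every relationship has a type as well.
edges = [
    ("disease_1", "associates", "gene_1"),
    ("gene_1", "interacts", "gene_2"),
    ("gene_2", "participates", "process_1"),
]


def metaedge_counts(nodes, edges):
    """Count edges per (source type, relationship type, target type)."""
    return Counter((nodes[s], kind, nodes[t]) for s, kind, t in edges)


counts = metaedge_counts(nodes, edges)
for metaedge, n in counts.items():
    print(metaedge, n)
```

A homogeneous network would collapse to a single entry in `counts`; a hetnet has several.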
What's the best software for storing and querying hetnets?
| dhimmel/hetio | neo4j/neo4j |
|---|---|
| 86 | 42,498 |
| 5 | 3,071 |
| 2 | 1,007 |
GitHub stats from 2016-10-09
Visualizing Hetionet v1.0
https://github.com/greenelab/snorkeling
Details at doi.org/brsc
MATCH path =
  // Specify the type of path to match
  (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1:Gene)-[:INTERACTS_GiG]-
  (n2:Gene)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE
  // Specify the source and target nodes
  n0.name = 'multiple sclerosis' AND
  n3.name = 'retina layer formation'
  // Require GWAS support for the
  // Disease-associates-Gene relationship
  AND 'GWAS Catalog' in e1.sources
  // Require the interacting gene to be
  // upregulated in a relevant tissue
  AND exists(
    (n0)-[:LOCALIZES_DlA]-(:Anatomy)-[:UPREGULATES_AuG]-(n2))
RETURN path
More queries at thinklab.com/d/220
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
bioRxiv. 2016. DOI: 10.1101/087619
Compound–causes–SideEffect–causes–Compound–treats–Disease
Compound–binds–Gene–binds–Compound–treats–Disease
Compound–binds–Gene–associates–Disease
Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n2)),
    size((n2)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n3)),
    size((n3)-[:ASSOCIATES_DaG]-()),
    size(()-[:ASSOCIATES_DaG]-(n4))
  ] AS degrees, path
RETURN
  path,
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10
Cypher query to find the top CbGpPWpGaD paths
(browse all predictions at het.io/repurpose)
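The `reduce(...)` expression in the query above multiplies together each degree raised to the power -0.4, so paths through high-degree nodes receive lower weights. A minimal Python sketch of the same arithmetic follows; the function name `path_degree_product` and the example degree lists are illustrative assumptions, not values from Hetionet.

```python
import math


def path_degree_product(degrees, damping=0.4):
    """Multiply each degree raised to -damping, mirroring the Cypher reduce()."""
    return math.prod(d ** -damping for d in degrees)


# Hypothetical degree lists for two paths of the same type.
specific_path = [2, 3, 1, 4, 4, 1, 2, 5]
hub_path = [20, 300, 10, 400, 400, 10, 20, 500]

# The path through low-degree nodes gets the larger weight.
print(path_degree_product(specific_path) > path_degree_product(hub_path))
```

Setting `damping=0` would weight every path equally, which is why the exponent controls how strongly hub nodes are penalized.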
Discuss at thinklab.com/d/224
Discuss at thinklab.com/d/224#5
Discuss at thinklab.com/d/230#14
Methotrexate treats 19 diseases and hypertension is treated by 68 compounds. Methotrexate received a 79.6% prior probability of treating hypertension, whereas a compound and disease that both had only one treatment received a prior of 0.12%.
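One way a degree-based prior like this can be estimated is by repeatedly shuffling the treatment edges while preserving every compound's and disease's degree, then counting how often each compound–disease pair ends up connected. The sketch below assumes that approach; the slide states only the resulting numbers, and the toy network, `xswap`, and `estimate_priors` are hypothetical names that do not reproduce the actual analysis.

```python
import random
from collections import Counter


def xswap(edges, n_swaps, rng):
    """Degree-preserving shuffle of bipartite (compound, disease) edges."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (c1, d1), (c2, d2) = edges[i], edges[j]
        new1, new2 = (c1, d2), (c2, d1)
        # Skip swaps that would duplicate an existing edge.
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def estimate_priors(edges, n_perms=200, seed=0):
    """Fraction of permuted networks in which each pair is connected."""
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_perms):
        current = xswap(current, n_swaps=10 * len(current), rng=rng)
        counts.update(current)
    return {pair: n / n_perms for pair, n in counts.items()}


# Hypothetical toy network: compound A and disease d1 each have degree 3,
# while every other compound and disease has degree 1.
toy_treatments = [("A", "d1"), ("A", "d2"), ("A", "d3"),
                  ("B", "d1"), ("C", "d1"), ("D", "d4")]
priors = estimate_priors(toy_treatments)
print(priors.get(("A", "d1"), 0.0), priors.get(("D", "d2"), 0.0))
```

In this toy network the high-degree pair is connected in most permutations and the degree-one pair only rarely, which is the same qualitative contrast as the 79.6% versus 0.12% priors above.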
https://github.com/cognoma/cognoma
Advertisement: Cognoma Meetup with DataPhilly & Code for Philly
Presentation for the Seminar/Reading Group on Big and Scientific Data at Penn (http://www.cis.upenn.edu/~zives/datascience/) on February 10, 2017.