Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
DataPhilly Meetup
Papadakis Building, Room 120
June 6, 2017
Slides at slides.com/dhimmel/dataphilly
http://www.greenelab.com/
DataPhilly Talk Abstract:
Hetnets are networks with multiple node and relationship types. Daniel Himmelstein will discuss when hetnets are the right tool for integrating and analyzing diverse types of data. Specifically, he'll showcase Project Rephetio, which predicts new uses for existing drugs. This project created Hetionet, a network with 2.25 million relationships of 24 types, allowing researchers to ask questions that span the many realms of biomedical knowledge.
About Daniel Himmelstein:
Daniel Himmelstein is a "digital craftsman of the biodata revolution" who works in the Greene Lab at Penn. In 2016, he received his PhD in Biological & Medical Informatics from UCSF. Daniel leads the Cognoma (DataPhilly) Datathon and was a finalist for "Scientist of the Year" in the 2016 Philly Geek Awards. His research focuses on integrating open data to uncover the secrets of human health.
https://www.meetup.com/DataPhilly/events/240213100/
Graphs are composed of:
Nodes / relationships have type:
networks with multiple node or relationship types
A 2012 study identified 26 different names for this type of network:
multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogeneous information network
hetnet
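As a rough illustration of the definition above, a hetnet can be sketched in plain Python as a graph where every node and every relationship carries a type. The `Hetnet` class, the gene identifiers, and the lowercase relationship names below are hypothetical and chosen only for this example; this is not the hetio API.

```python
from collections import defaultdict


class Hetnet:
    """Toy in-memory hetnet: every node and relationship has a type."""

    def __init__(self):
        self.node_types = {}               # node id -> node type
        self.adjacency = defaultdict(set)  # (node id, relationship type) -> neighbor ids

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_relationship(self, source, rel_type, target):
        # Relationships are stored in both directions (undirected) for simplicity.
        self.adjacency[(source, rel_type)].add(target)
        self.adjacency[(target, rel_type)].add(source)

    def neighbors(self, node_id, rel_type):
        # .get avoids creating empty entries in the defaultdict.
        return sorted(self.adjacency.get((node_id, rel_type), set()))

    def relationship_types(self):
        return sorted({rel_type for (_, rel_type) in self.adjacency})


hetnet = Hetnet()
hetnet.add_node("multiple sclerosis", "Disease")
hetnet.add_node("GENE_A", "Gene")  # hypothetical gene identifiers
hetnet.add_node("GENE_B", "Gene")
hetnet.add_relationship("multiple sclerosis", "associates", "GENE_A")
hetnet.add_relationship("GENE_A", "interacts", "GENE_B")
print(hetnet.neighbors("GENE_A", "interacts"))
```

Keying the adjacency map on both node and relationship type keeps lookups restricted to one relationship type at a time, which is the operation typed path queries rely on.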
Source: From Relational to Neo4j
Limitations:
Source: From Relational to Neo4j
What's the best software for storing and querying hetnets?
dhimmel/hetio | neo4j/neo4j |
---|---|
86 | 42,498 |
5 | 3,071 |
2 | 1,007 |
GitHub stats from 2016-10-09
Visualizing Hetionet v1.0
>>> import phd
Details at doi.org/brsc
MATCH path =
  // Specify the type of path to match
  (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1:Gene)-[:INTERACTS_GiG]-
  (n2:Gene)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE
  // Specify the source and target nodes
  n0.name = 'multiple sclerosis' AND
  n3.name = 'retina layer formation'
  // Require GWAS support for the
  // Disease-associates-Gene relationship
  AND 'GWAS Catalog' IN e1.sources
  // Require the interacting gene to be
  // upregulated in a relevant tissue
  AND exists(
    (n0)-[:LOCALIZES_DlA]-(:Anatomy)-[:UPREGULATES_AuG]-(n2))
RETURN path
More queries at thinklab.com/d/220
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
bioRxiv. 2016. DOI: 10.1101/087619
features = metapaths
observations = compound–disease pairs
positives = treatments
negatives = non-treatments
Compound–causes–SideEffect–causes–Compound–treats–Disease
Compound–binds–Gene–binds–Compound–treats–Disease
Compound–binds–Gene–associates–Disease
Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
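The relationship types in the queries carry suffixes such as `CbG` and `GpPW`, which abbreviate the node and relationship types they connect. A minimal sketch of that naming scheme follows, assuming the abbreviation tables below (inferred from those suffixes, not taken from an official list) and a hypothetical helper named `abbreviate_metapath`.

```python
# Abbreviations are assumptions inferred from relationship-type suffixes
# such as BINDS_CbG, ASSOCIATES_DaG, and PARTICIPATES_GpPW.
NODE_ABBREVIATIONS = {
    "Compound": "C",
    "Disease": "D",
    "Gene": "G",
    "Pathway": "PW",
    "SideEffect": "SE",
}
RELATIONSHIP_ABBREVIATIONS = {
    "binds": "b",
    "associates": "a",
    "participates": "p",
    "causes": "c",
    "treats": "t",
}


def abbreviate_metapath(metapath):
    """Turn 'Compound-binds-Gene-associates-Disease' into 'CbGaD'."""
    # Accept en dashes (as used on the slide) or plain hyphens as separators.
    parts = metapath.replace("\u2013", "-").split("-")
    if len(parts) < 3 or len(parts) % 2 == 0:
        raise ValueError(f"expected node-relationship-node, got {metapath!r}")
    pieces = []
    for position, part in enumerate(parts):
        # Node types sit at even positions, relationship types at odd positions.
        if position % 2 == 0:
            pieces.append(NODE_ABBREVIATIONS[part])
        else:
            pieces.append(RELATIONSHIP_ABBREVIATIONS[part])
    return "".join(pieces)


print(abbreviate_metapath("Compound-binds-Gene-associates-Disease"))
```

Each abbreviated metapath then serves as one feature name in the setup described above.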
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n2)),
    size((n2)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n3)),
    size((n3)-[:ASSOCIATES_DaG]-()),
    size(()-[:ASSOCIATES_DaG]-(n4))
  ] AS degrees, path
RETURN
  path,
  reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10
Cypher query to find the top CbGpPWpGaD paths
Try at https://neo4j.het.io
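The `reduce` expression in the query multiplies each degree along a path, raised to the power -0.4, so paths through highly connected nodes receive lower weights. The same arithmetic can be sketched in Python; the function names `path_degree_product` and `rank_paths` and the degree values below are made up for illustration.

```python
from math import prod  # available in Python 3.8+


def path_degree_product(degrees, damping=0.4):
    """Multiply degree ** -damping over every degree along a path.

    Degrees must be at least 1, which holds for any path that exists,
    since each node on it has the relationship being traversed.
    """
    return prod(d ** -damping for d in degrees)


def rank_paths(paths, damping=0.4, limit=10):
    """Return (name, weight) pairs ordered by descending weight."""
    scored = [
        (name, path_degree_product(degrees, damping))
        for name, degrees in paths.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical degree lists for two paths of the same type.
example_paths = {
    "specific path": [2, 3, 5, 4, 4, 5, 3, 2],
    "hub-heavy path": [2, 300, 150, 400, 400, 150, 300, 2],
}
for name, weight in rank_paths(example_paths):
    print(name, weight)
```

With a damping exponent of 0, every path would weigh 1.0 regardless of degree; larger exponents penalize hub nodes more strongly.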
Browse all predictions at het.io/repurpose. Discuss at thinklab.com/d/224
Discuss at thinklab.com/d/224#5
Discuss at thinklab.com/d/230#14
Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. …
I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.
One network to rule them all
We have completed an initial version of our network. …
Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
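Anyone holding a copy of the file can recompute its checksum and compare it with the published one. Below is a minimal sketch using Python's standard `hashlib`; the helper name `sha256_of_file` is hypothetical, and the sketch only computes the digest. It does not check the Bitcoin timestamp itself.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a local copy yields a 64-character hex string to compare against the recorded checksum. Reading in chunks keeps memory use flat for large files.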
Recommendations:
https://github.com/cognoma/cognoma
Advertisement: Cognoma Meetup with DataPhilly & Code for Philly
Next meetup: June 27
By Daniel Himmelstein
Presentation at the 2017-06-06 DataPhilly meetup. Details at https://meetu.ps/39DxjV. This presentation is released under a CC BY 4.0 License.