Cite as: eLife 2020;9:e52614 DOI: 10.7554/eLife.52614

Wikidata

  • unlimited scope
  • structured data
  • 750 million statements
  • 61 million items
  • 12,000 active users
  • 100 active computational bots importing large structured databases
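Wikidata stores "structured" knowledge in a specific way. An item is an entity with a Q-identifier. Each of its statements pairs a property (a P-identifier) with a value. A statement can be refined by qualifiers and backed by references.

The Python sketch below models that shape with plain dataclasses. It is our own simplification, not the Wikibase JSON format or any client library's API. All Q-identifiers used in it are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reference:
    """Provenance of a statement (hypothetical minimal fields)."""
    stated_in: str  # ID of the source the value was taken from
    retrieved: str  # ISO 8601 date of retrieval


@dataclass
class Statement:
    """One property-value claim, optionally refined by qualifiers and backed by references."""
    property_id: str  # a Wikidata property ID such as "P351"
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[Reference] = field(default_factory=list)


@dataclass
class Item:
    """An entity (gene, disease, drug, ...) identified by a Q-number and described by statements."""
    item_id: str
    label: str
    statements: list[Statement] = field(default_factory=list)

    def values(self, property_id: str) -> list[str]:
        """All values this item holds for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Usage with placeholder IDs (hypothetical, not real Wikidata items):
gene = Item("Q_EXAMPLE_GENE", "example gene")
gene.statements.append(
    Statement("P351", "12345", references=[Reference("Q_EXAMPLE_SOURCE", "2020-01-01")])
)
print(gene.values("P351"))  # ['12345']
```

Keeping references on each statement, not on the item, mirrors the idea that every imported fact can carry its own provenance.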

Wikipedia

  • unlimited scope
  • mostly free text

FAIR

  • Findable
  • Accessible
  • Interoperable
  • Reusable

Data integration

Knowledge integration

query individual databases (stability?)

harmonizing formats and licensing

😊

😟

anyone can publish data
(following guidelines)

centralized resources

difficult

expensive

common data model

good queries

Data integration

Knowledge integration

Wikidata
community of contributors

  • domain experts
  • bot developers
  • queryable via SPARQL
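Querying Wikidata with SPARQL only requires an HTTP request to the public Wikidata Query Service endpoint, https://query.wikidata.org/sparql.

The standard-library sketch below builds such a request and flattens the JSON result rows. The user-agent string is a placeholder. The live call in `run_query` is left commented out so the example works offline.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str, user_agent: str = "example-bot/0.1 (https://example.org)") -> urllib.request.Request:
    """Build a GET request asking the Wikidata Query Service for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to identify themselves; this value is a placeholder.
            "User-Agent": user_agent,
        },
    )


def run_query(query: str) -> list:
    """Send the query (needs network access) and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        data = json.load(resp)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Example: every item carrying the Entrez Gene ID (P351) "1017".
query = 'SELECT ?item WHERE { ?item wdt:P351 "1017" . }'
request = build_request(query)
# rows = run_query(query)  # uncomment to run against the live endpoint
```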

👏

👍

😊

heterogeneous knowledge graph

  • genes and proteins
  • genetic variants
  • chemical compounds (e.g. drugs)
  • pathways
  • diseases
  • references
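A heterogeneous knowledge graph is one whose nodes come in several types, connected by edges of several types.

The toy graph below is entirely hypothetical:

  • node IDs are made up, with the node type as a prefix
  • edge labels are illustrative, not Wikidata properties

It only shows how node types and edge types can be tallied for such a graph.

```python
from collections import Counter

# Hypothetical miniature graph. Node IDs carry their type as a prefix ("gene:G1");
# edge labels are illustrative, not Wikidata property names.
TRIPLES = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "participates_in", "pathway:W1"),
    ("variant:V1", "located_in", "gene:G1"),
    ("gene:G1", "genetically_associated_with", "disease:D1"),
    ("compound:C1", "treats", "disease:D1"),
    ("disease:D1", "described_by", "reference:R1"),
]


def node_type(node: str) -> str:
    """Return the type prefix of a node ID."""
    return node.split(":", 1)[0]


def summarize(triples):
    """Count distinct nodes per node type and triples per edge type."""
    nodes = {n for s, _, o in triples for n in (s, o)}
    node_types = Counter(node_type(n) for n in nodes)
    edge_types = Counter(edge for _, edge, _ in triples)
    return node_types, edge_types


node_types, edge_types = summarize(TRIPLES)
print(len(node_types), "node types,", len(edge_types), "edge types")  # 7 node types, 6 edge types
```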

bot automation

retrieve, transform, normalize, upload
Wikidata Integrator (WDI) Python library; bots run on Jenkins
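The four bot steps (retrieve, transform, normalize, upload) can be pictured as a small pipeline. This is a hypothetical dry-run sketch:

  • the records, the `Claim` type and every function name are invented for illustration
  • it neither uses nor imitates the WikidataIntegrator API
  • the upload step only returns what a real bot would write
  • P351 (Entrez Gene ID) and P352 (UniProt protein ID) are the properties we assume for the two identifier fields

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical field-to-property mapping: P351 = Entrez Gene ID, P352 = UniProt protein ID.
FIELD_TO_PROPERTY = {"entrez": "P351", "uniprot": "P352"}


@dataclass(frozen=True)
class Claim:
    """One value to write: target item key, property, value, and the source it came from."""
    item_key: str
    property_id: str
    value: str
    source: str


def retrieve() -> list[dict]:
    """Stand-in for downloading a source database; these records are made up."""
    return [
        {"symbol": "GENEA", "entrez": " 12345 ", "uniprot": "p00001"},
        {"symbol": "GENEB", "entrez": "67890", "uniprot": None},
        {"symbol": "GENEA", "entrez": "12345", "uniprot": None},  # duplicate row
    ]


def transform(records: list[dict], source: str) -> list[Claim]:
    """Map each populated source field onto its Wikidata property."""
    claims = []
    for rec in records:
        for name, prop in FIELD_TO_PROPERTY.items():
            if rec.get(name):
                claims.append(Claim(rec["symbol"], prop, rec[name], source))
    return claims


def normalize(claims: list[Claim]) -> list[Claim]:
    """Trim whitespace, upper-case UniProt accessions, and drop duplicates (keeping order)."""
    seen: set[Claim] = set()
    out = []
    for c in claims:
        value = c.value.strip()
        if c.property_id == "P352":
            value = value.upper()
        clean = Claim(c.item_key, c.property_id, value, c.source)
        if clean not in seen:
            seen.add(clean)
            out.append(clean)
    return out


def upload(claims: list[Claim], dry_run: bool = True) -> list[Claim]:
    """Dry run: return what would be written. A real bot would perform authenticated writes here."""
    if not dry_run:
        raise NotImplementedError("live writes are outside the scope of this sketch")
    return claims


pending = upload(normalize(transform(retrieve(), source="EXAMPLE_DB")))
for claim in pending:
    print(claim.item_key, claim.property_id, claim.value)
```

Keeping normalization separate from transformation means duplicate or inconsistently formatted source rows are caught before anything is written.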

identifier translation with SPARQL

  • limitless scope
  • high performance
  • high availability
  • real-time community editing
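Identifier translation works because a single Wikidata item can carry several external identifiers. A SPARQL query can therefore enter through one ID and read out another.

The helper below assembles such a query. It is a hypothetical convenience function, not part of any library. It assumes these property IDs:

  • P351: Entrez Gene ID
  • P352: UniProt protein ID
  • P688: encodes
  • P699: Disease Ontology ID
  • P486: MeSH descriptor ID

```python
from __future__ import annotations


def id_translation_query(source_prop: str, target_prop: str, ids: list[str],
                         link_prop: str | None = None) -> str:
    """Build a SPARQL query mapping identifiers of one kind to another via Wikidata.

    source_prop / target_prop: property IDs holding the external identifiers.
    link_prop: optional property to hop from the matched item to a second item
    before reading the target identifier (e.g. gene -> protein).
    """
    for value in ids:
        # Refuse values that could break out of the SPARQL string literal.
        if '"' in value or "\\" in value:
            raise ValueError(f"unsafe identifier: {value!r}")
    values = " ".join(f'"{v}"' for v in ids)
    lines = [
        "SELECT ?source ?target WHERE {",
        f"  VALUES ?source {{ {values} }}",
        f"  ?item wdt:{source_prop} ?source .",
    ]
    holder = "?item"
    if link_prop:
        lines.append(f"  ?item wdt:{link_prop} ?linked .")
        holder = "?linked"
    lines.append(f"  {holder} wdt:{target_prop} ?target .")
    lines.append("}")
    return "\n".join(lines)


# Entrez Gene ID on the gene item -> "encodes" -> UniProt ID on the protein item.
print(id_translation_query("P351", "P352", ["1017", "1018"], link_prop="P688"))
```

For Disease Ontology to MeSH no hop is needed, because both identifiers sit on the same disease item: `id_translation_query("P699", "P486", do_ids)`.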

integrative queries
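An integrative query chains relations that originally came from different databases, here gene to disease to drug.

The in-memory version below uses made-up triples to show the two-hop join. The SPARQL string next to it sketches the same pattern on Wikidata. It assumes two property IDs:

  • P2293: genetic association
  • P2175: medical condition treated

```python
# Hypothetical triples linking data types that originate in separate source databases.
TRIPLES = [
    ("GENE_A", "genetic_association", "DISEASE_X"),
    ("GENE_A", "genetic_association", "DISEASE_Y"),
    ("GENE_B", "genetic_association", "DISEASE_Y"),
    ("DRUG_1", "treats", "DISEASE_X"),
    ("DRUG_2", "treats", "DISEASE_Y"),
    ("DRUG_3", "treats", "DISEASE_Z"),
]

# Sketch of the same pattern on Wikidata (assumed property IDs, see above);
# the alternation with ^ accepts the association stated in either direction.
SPARQL_SKETCH = """
SELECT DISTINCT ?gene ?disease ?drug WHERE {
  ?gene wdt:P2293|^wdt:P2293 ?disease .
  ?drug wdt:P2175 ?disease .
}
"""


def objects(triples, subject, predicate):
    """All objects linked from subject by predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def drugs_for_gene(triples, gene):
    """Two-hop join: gene -> genetically associated diseases -> drugs that treat any of them."""
    diseases = objects(triples, gene, "genetic_association")
    return {s for s, p, o in triples if p == "treats" and o in diseases}


print(sorted(drugs_for_gene(TRIPLES, "GENE_A")))  # ['DRUG_1', 'DRUG_2']
```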

crowdsourced curation

Wikidata vs Disease Ontology

  • Late 2015: seeded Wikidata with items from Disease Ontology
  • 2018: compared Wikidata with Disease Ontology
    • 2030 new cross-references to GARD and MeSH
      • 98.9% correct
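The Disease Ontology comparison comes down to two steps:

  • collect cross-references present on the Wikidata side but missing from the ontology
  • score a curator-reviewed set of them for correctness

The sketch below uses invented identifiers. It does not reproduce the actual 2018 analysis or its numbers.

```python
from __future__ import annotations


def new_xrefs(wikidata: dict[str, set[str]], ontology: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per ontology term, cross-references found in Wikidata but absent from the ontology."""
    extra = {}
    for term, refs in wikidata.items():
        missing = refs - ontology.get(term, set())
        if missing:
            extra[term] = missing
    return extra


def precision(judgments: list[bool]) -> float:
    """Fraction of reviewed cross-references that a curator marked correct."""
    if not judgments:
        raise ValueError("no reviewed cross-references")
    return sum(judgments) / len(judgments)


# Invented identifiers; not taken from Disease Ontology, GARD or MeSH.
WIKIDATA_XREFS = {
    "DOID:EX1": {"MESH:EX_A", "GARD:EX_1"},
    "DOID:EX2": {"MESH:EX_B"},
    "DOID:EX3": {"GARD:EX_3"},
}
ONTOLOGY_XREFS = {
    "DOID:EX1": {"MESH:EX_A"},
    "DOID:EX2": {"MESH:EX_B"},
}

candidates = new_xrefs(WIKIDATA_XREFS, ONTOLOGY_XREFS)
print(sum(len(refs) for refs in candidates.values()))  # 2
print(precision([True, True, True, False]))            # 0.75
```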

BOQA (Bayesian Ontology Query Algorithm) analysis of suspected cases of Congenital Disorder of Deglycosylation (CDDG)

Human Phenotype Ontology

-derived annotations (of 273 extra)
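BOQA scores candidate diseases with a Bayesian network over the Human Phenotype Ontology, which is too involved for a short example. The sketch below therefore uses a deliberately simpler stand-in: Jaccard overlap, not BOQA.

It illustrates the general point behind the analysis. Extra phenotype annotations for a disease can move it up the ranking for a patient's phenotype profile. All phenotype and disease names are placeholders.

```python
from __future__ import annotations


def rank_diseases(patient: set[str], annotations: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank diseases by Jaccard overlap between the patient's phenotypes and each disease's annotations."""
    scored = []
    for disease, terms in annotations.items():
        union = patient | terms
        score = len(patient & terms) / len(union) if union else 0.0
        scored.append((disease, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


PATIENT = {"pheno_a", "pheno_b", "pheno_c"}
BASE = {
    "DISEASE_TARGET": {"pheno_a"},
    "DISEASE_OTHER": {"pheno_a", "pheno_b", "pheno_x"},
}
# Same annotations plus hypothetical extra terms for the target disease.
EXTENDED = {**BASE, "DISEASE_TARGET": {"pheno_a", "pheno_b", "pheno_c"}}

print(rank_diseases(PATIENT, BASE)[0][0])      # DISEASE_OTHER
print(rank_diseases(PATIENT, EXTENDED)[0][0])  # DISEASE_TARGET
```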

Rephetio algorithm on Wikidata knowledge graph

biomedically focused subgraph of Wikidata

  • 19 node types
  • 41 edge types
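Rephetio ranks compound-disease pairs using features such as the degree-weighted path count (DWPC). The DWPC is computed as follows:

  • take every path from the compound to the disease that follows a given metapath
  • for each edge on the path, look up the degrees of both endpoints, counted for that edge's type
  • multiply all of these degrees together and raise the product to the power -w
  • sum the results over all paths

The sketch below implements that definition on an invented, undirected toy graph and skips paths that revisit a node. Node types are implied by the edge types. The default w = 0.4 is the value we recall Rephetio using, so treat it as an assumption.

```python
from __future__ import annotations

from collections import defaultdict


def dwpc(edges, source, target, metapath, w=0.4):
    """Degree-weighted path count from source to target along a metapath.

    edges: (node, edge_type, node) triples, treated as undirected.
    metapath: sequence of edge types the path must follow, in order.
    w: damping exponent (0.4 assumed; w = 0 gives a plain path count).
    """
    adj = defaultdict(list)    # node -> [(edge_type, neighbour)]
    degree = defaultdict(int)  # (node, edge_type) -> number of incident edges of that type
    for u, etype, v in edges:
        adj[u].append((etype, v))
        adj[v].append((etype, u))
        degree[(u, etype)] += 1
        degree[(v, etype)] += 1

    total = 0.0

    def walk(node, step, visited, pdp):
        nonlocal total
        if step == len(metapath):
            if node == target:
                total += pdp  # add this path's degree product
            return
        for etype, nxt in adj[node]:
            if etype != metapath[step] or nxt in visited:
                continue  # wrong edge type, or the path would revisit a node
            factor = (degree[(node, etype)] * degree[(nxt, etype)]) ** -w
            walk(nxt, step + 1, visited | {nxt}, pdp * factor)

    walk(source, 0, {source}, 1.0)
    return total


EDGES = [
    ("compound:C", "binds", "gene:G1"),
    ("compound:C", "binds", "gene:G2"),
    ("gene:G1", "associates", "disease:D"),
    ("gene:G2", "associates", "disease:D"),
    ("gene:G2", "associates", "disease:D2"),
]
print(dwpc(EDGES, "compound:C", "disease:D", ["binds", "associates"], w=0))  # 2.0
```

With w = 0 the DWPC reduces to a plain path count. Larger w discounts paths through high-degree hubs; here, the path through gene:G2 counts for less because G2 has two disease edges.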

FAIR

  • Findable ✅
  • Accessible ✅
  • Interoperable ✅
  • Reusable ✅

question? comments? memes?

Wikidata

By Trang Le

2021-03-26
