Semantic MediaWiki as a platform for lab management and biological annotation

Toni Hermoso Pulido (@toniher)

Bioinformatics Core Facility

Centre for Genomic Regulation (BCN)



Work in laboratories or core facilities


LIMS: Lab Information Management System

Proteomics Unit, CRG




Form input

Mail communication

  • Based on Semantic Tasks extension
  • Asking user for action (bring samples to the lab)
  • Informing user about request status
  • Users can opt out of verbose communication

User satisfaction tracking

  • When request closed
  • Email sent. User directed to a Special Page form
  • Valid for a limited time (e.g., 2 weeks max)
  • Only editable a few times (or only once)
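
A minimal sketch of how the time and edit limits above could be checked. Python is used only for illustration, since the real form is a MediaWiki Special Page. The names `FeedbackToken`, `max_age`, and `max_edits` are hypothetical, and the two-week window and single edit are taken from the bullets as example values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FeedbackToken:
    """Hypothetical record created when a request is closed."""
    issued_at: datetime
    max_age: timedelta = timedelta(weeks=2)  # assumed validity window
    max_edits: int = 1                       # assumed edit limit
    edits_used: int = 0

    def can_edit(self, now: datetime) -> bool:
        """True while the form is within both its time and edit limits."""
        not_expired = now - self.issued_at <= self.max_age
        return not_expired and self.edits_used < self.max_edits

    def record_edit(self, now: datetime) -> None:
        """Register one submission, refusing it if a limit has been reached."""
        if not self.can_edit(now):
            raise PermissionError("feedback form expired or already submitted")
        self.edits_used += 1
```

Keeping both limits on a single record means the form page needs only one check before rendering or saving.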

Extra input from lab operators

  • Wiki-way. Flexible. Some info structured, some not
    • Documentation
    • Standard Operating Procedures (SOPs)
    • Informal instrument queue

Biocore Wiki

Task management system

Bioinformatics Unit, CRG

Task input

Task view

Hours & costs list

Example of biological data Content Management System (CMS)

VastDB, Manuel Irimia's lab (CRG)

Biological data CMS


VastDB overview

Different data handling in MediaWiki as a CMS

  • User import via specific extensions
  • Using a modified External Data extension
  • Extensions accessing file system
    • Mirror of PDB structures

Semantic Data Import

Data from CSV input

Output view handled with Rickshaw (D3.js)

CouchDB + Lucene

Making search faster

  • CouchDB: NoSQL Document DBMS
  • Lucene: information retrieval library; ElasticSearch and Solr are based on it
  • Mapping SMW Templates to JSON documents
  • Indexing for coordinates and full-text search
  • It might be ported to ElasticSearch
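
The template-to-JSON mapping mentioned above can be sketched as follows. This is an illustration only, not the code used in VastDB: the template name `Event` and its fields are hypothetical, and the parser handles only flat template calls with named parameters.

```python
import json


def template_to_document(wikitext: str) -> dict:
    """Convert a flat template call such as {{Name|key=value|...}} to a dict.

    Nested templates and unnamed parameters are not handled in this sketch.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *params = text[2:-2].split("|")
    doc = {"template": name.strip()}
    for param in params:
        key, sep, value = param.partition("=")
        if not sep:
            continue  # skip unnamed parameters
        value = value.strip()
        # Store digit-only values as integers so coordinates can be range-indexed
        doc[key.strip()] = int(value) if value.isdigit() else value
    return doc


# Hypothetical template call; the resulting JSON is what would be sent to the document store
example = "{{Event|gene=EDEN|chr=ctg123|start=1000|end=9000}}"
print(json.dumps(template_to_document(example), sort_keys=True))
```

Storing coordinates as numbers rather than strings is what makes a coordinate (range) index possible alongside the full-text one.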

Coordinate search

Full-text search

Genome Annotation

Wiki framework



Import and export formats

  • FASTA files (sequences)
  • GFF or GTF (feature, relationship, location)
  • Others: chromosome sizes, etc.
  • Raw text files
  • External tools, when convenient:
    • NCBI BLAST
    • SAMtools
    • etc.

Import and export formats

##gff-version 3
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
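
A minimal sketch of reading feature lines like the ones above, assuming Python on the import side. GFF3 defines nine tab-separated columns; this version splits on any whitespace so that it also accepts the space-separated rendering shown here, which means it would not cope with a source column containing spaces. The function name `parse_gff3_line` is hypothetical.

```python
def parse_gff3_line(line: str):
    """Parse one GFF3 feature line into a dict; return None for directives and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Split into at most nine fields; the ninth keeps the whole attribute string
    cols = line.split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Attributes are semicolon-separated key=value pairs (ID, Parent, Name, ...)
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }
```

The `ID` and `Parent` attributes carry the feature relationships, so they are the natural candidates for page names and linking properties in the wiki.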

Integrating a genome browser

Linking pages, conceptual hierarchies

  • By using specific properties
  • SMWParent extension
    • Quick retrieval of linked elements
      • Parent, ancestors
      • Children, descendants
      • Number of hops
      • Filter by another property value
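
The kind of retrieval listed above can be illustrated with a plain graph walk. This is not the SMWParent API: it is a Python sketch over a hypothetical `parent_of` mapping (page name to parent page name, one parent per page assumed), showing ancestors, descendants, and a hop limit.

```python
from collections import deque


def ancestors(parent_of: dict, page: str, max_hops=None) -> list:
    """Follow parent links upwards; return (page, hops) pairs, nearest first."""
    result, hops, current = [], 0, page
    seen = {page}
    while current in parent_of and (max_hops is None or hops < max_hops):
        current = parent_of[current]
        if current in seen:  # guard against cyclic links
            break
        seen.add(current)
        hops += 1
        result.append((current, hops))
    return result


def descendants(parent_of: dict, page: str, max_hops=None) -> list:
    """Breadth-first walk downwards; return (page, hops) pairs."""
    # Invert the child -> parent mapping into parent -> [children]
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    result, queue = [], deque([(page, 0)])
    seen = {page}
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                result.append((child, hops + 1))
                queue.append((child, hops + 1))
    return result
```

Filtering by another property value would then be a simple filter over the returned pages, for example keeping only those whose type is `mRNA`.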

