Semantic MediaWiki as a platform for lab management and biological annotation

Toni Hermoso Pulido (@toniher)

Bioinformatics Core Facility

Centre for Genomic Regulation (BCN)

https://biocore.crg.eu

 

Context

Work in laboratories or core facilities

ProteoWiki

LIMS: Lab Information Management System

Proteomics Unit, CRG

ProteoWiki

Form input

Mail communication

  • Based on the Semantic Tasks extension
  • Asking users for action (e.g., bringing samples to the lab)
  • Informing users about request status
  • Users can opt out of verbose communication

User satisfaction tracking

  • Triggered when a request is closed
  • An email is sent, directing the user to a Special Page form
  • Valid for a limited time (e.g., 2 weeks max)
  • Editable only a few times (or only once)
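
The two constraints above (a limited validity window and a cap on edits) can be sketched as a small check. This is only an illustration, not ProteoWiki's actual code: the class name `FeedbackToken`, its fields, and the default values are assumptions chosen to mirror the slide ("2 weeks max", "only once").

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FeedbackToken:
    """Hypothetical record tied to one closed request's feedback form."""

    issued_at: datetime                      # when the request was closed and the email sent
    max_age: timedelta = timedelta(weeks=2)  # assumed window, "2 weeks max" on the slide
    max_edits: int = 1                       # assumed cap, "only once" on the slide
    edits_used: int = 0

    def can_edit(self, now: datetime) -> bool:
        """Return True while the form is still inside its window and edit cap."""
        within_window = now - self.issued_at <= self.max_age
        return within_window and self.edits_used < self.max_edits

    def record_edit(self, now: datetime) -> None:
        """Count one submission, refusing it if the form is no longer editable."""
        if not self.can_edit(now):
            raise PermissionError("feedback form expired or edit limit reached")
        self.edits_used += 1
```

Keeping the window and the cap as plain fields makes it easy to relax either one per request, for example allowing a few edits instead of one.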

Lab operators' extra input

  • The wiki way: flexible, with some information structured and some not
    • Documentation
    • Standard Operating Procedures (SOPs)
    • Informal instrument queue

Biocore Wiki

Task management system

Bioinformatics Unit, CRG

Biocore Wiki

Task input

Biocore Wiki

Task view

Biocore Wiki

Hours & costs list

Example of a biological data Content Management System (CMS)

VastDB, Manuel Irimia's lab (CRG)

Biological data CMS

VastDB

VastDB overview

Different ways of handling data in MediaWiki as a CMS

  • User import via specific extensions
  • Using a modified External Data extension
  • Extensions accessing the file system
    • Mirror of PDB structures

Semantic Data Import

Data from CSV input

Output view handled with handsontable.com

Semantic Data Import

Output view handled with Rickshaw (D3.js)

CouchDB + Lucene

Making search faster

  • CouchDB: NoSQL document DBMS
  • Lucene: information retrieval library; Elasticsearch and Solr are built on it
  • Mapping SMW templates to JSON documents
  • Indexing for coordinate and full-text search
  • This setup might be ported to Elasticsearch
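
To illustrate the template-to-JSON mapping, here is a minimal sketch that turns one flat template call into a dictionary ready to be serialized as a JSON document. It is not the code used in the talk: the function name `template_to_document`, the `Event` template, and its parameter names are hypothetical, and nested templates or positional parameters are deliberately not handled.

```python
import json


def template_to_document(wikitext: str) -> dict:
    """Convert a flat call such as {{Event|gene=EDEN|start=1000}} into a dict."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    # Split on '|': the first part is the template name, the rest are parameters.
    name, *params = (part.strip() for part in text[2:-2].split("|"))
    doc = {"template": name}
    for param in params:
        key, sep, value = param.partition("=")
        if not sep:
            continue  # positional parameters are ignored in this sketch
        value = value.strip()
        # Store digit-only values as integers so coordinates can be range-indexed.
        doc[key.strip()] = int(value) if value.isdigit() else value
    return doc


if __name__ == "__main__":
    call = "{{Event|gene=EDEN|chr=ctg123|start=1000|end=9000}}"
    print(json.dumps(template_to_document(call), indent=2))
```

Storing coordinates as numbers rather than strings is what would let an index answer range queries for coordinate search, while the string fields remain available for full-text indexing.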

CouchDB + Lucene

Coordinate search

CouchDB + Lucene

Full-text search

Genome Annotation

Wiki framework

AnnoWiki

Import and export formats

  • FASTA files (sequences)
  • GFF or GTF (feature, relationship, location)
  • Others: chromosome sizes, etc.
  • Raw text files
  • External tools, when convenient:
    • NCBI BLAST
    • SAMtools
    • etc.

Import and export formats

FASTA

Import and export formats

GFF

##gff-version 3
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
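
As a rough sketch of how such feature lines could be read before import, the function below splits one line into the nine GFF3 columns and parses the attribute column into a dictionary. The function name `parse_gff3_line` and the output keys are assumptions for illustration. Real GFF3 files are tab-separated; the sketch splits on any whitespace so that it also accepts the lines as printed above.

```python
def parse_gff3_line(line: str) -> dict:
    """Parse one GFF3 feature line into a dict (directives and comments are rejected)."""
    if line.startswith("#"):
        raise ValueError("directive or comment line, not a feature")
    # maxsplit=8 keeps the attribute column in one piece even if it contains spaces.
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 is a ';'-separated list of key=value pairs (ID, Parent, Name, ...).
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }
```

The `ID` and `Parent` attributes are the pieces that express relationships between features, so they map naturally onto page links in a wiki.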

Integrating a genome browser

Linking pages, conceptual hierarchies

  • By using specific properties
  • SMWParent extension
    • Quick retrieval of linked elements
      • Parent, ancestors
      • Children, descendants
      • Number of hops
      • Filter by another property value
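
The kinds of lookups listed above can be sketched as plain graph traversals over a parent property. This is not the SMWParent extension's API: the function names are hypothetical, each page is assumed to have at most one parent, and the page names reuse the GFF example IDs (with `exon00001` invented for illustration).

```python
from collections import deque


def build_children(parent_of: dict) -> dict:
    """Invert a child -> parent mapping into parent -> list of children."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    return children


def ancestors(page: str, parent_of: dict, max_hops=None) -> list:
    """Walk up the hierarchy, returning (ancestor, hops) pairs, nearest first."""
    result, current, hops, seen = [], page, 0, {page}
    while current in parent_of and (max_hops is None or hops < max_hops):
        current = parent_of[current]
        if current in seen:
            break  # guard against accidental cycles in the wiki data
        seen.add(current)
        hops += 1
        result.append((current, hops))
    return result


def descendants(page: str, children: dict, max_hops=None) -> list:
    """Breadth-first walk down the hierarchy, returning (descendant, hops) pairs."""
    result, seen = [], {page}
    queue = deque([(page, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the requested number of hops
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                result.append((child, hops + 1))
                queue.append((child, hops + 1))
    return result
```

With `max_hops=1` these return the direct parent or the direct children; filtering by another property value would amount to an extra predicate applied to each returned page.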

Acknowledgements

Biocore Wiki

Carlos Company

Julia Ponomarenko

Luca Cozzuto

Sarah Bonnin

Guglielmo Roma

et al.

ProteoWiki

Eduard Sabidó

Francesco Mancuso

Cristina Chiva

Eva Borràs

Guadalupe Espadas

et al.

VastDB

Manuel Irimia

Javier Tapial

Luca Cozzuto

AnnoWiki

Luca Cozzuto

Carlos Company

... and the entire open-source community involved

Questions?