Some good practices in Open Science

Toni Hermoso Pulido (@toniher)

Bioinformatics Core Facility

Centre for Genomic Regulation (BCN)

https://biocore.crg.eu


Text-content license: CC-BY 4.0
Related slides: Open Science. Good practices in Bioinformatics

Open Science

Document

Write it down or ...

it didn't happen!

Document: Why?

  • Organise ideas
  • Help you and others understand code and steps in the future
  • Find and fix errors
  • Ease future publication

Document: Where?

  • File system (e.g. README or TODO files)
  • Version Control System
    • Git, SVN, etc.
  • Content Management System
    • Wikis, Drupal, etc.

Document: How?

  • Plain text
  • Formatted text (lightweight markup)

Markdown

Tag and track

I never said so!

Tag and track: Why?

  • Convenient backup
  • Error tracking and reversion
  • Checking history
  • Collaboration at different points in time
  • Publication of specific snapshots

Tag and track: Where?

  • Code, documentation: version control (e.g. Git)
  • Data, files
    • Document Management Systems
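
Version control systems such as Git track files by content: each stored version of a file (a "blob") is identified by a hash of its bytes, so any change, however small, produces a new identifier that can be compared, reverted to, or tagged. A minimal Python sketch of how Git computes a blob ID in a default (SHA-1) repository, as `git hash-object` does; the function name `git_blob_id` and the sample contents are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a file with this content.

    Git hashes a header "blob <size in bytes>" plus a NUL byte, followed by
    the raw bytes (SHA-1 in default repositories; SHA-256 repositories differ).
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    v1 = b"Sample A: 42 reads\n"
    v2 = b"Sample A: 43 reads\n"
    # Two versions differing by one character get unrelated IDs,
    # which is what lets history be checked and reverted reliably.
    print(git_blob_id(v1))
    print(git_blob_id(v2))
```

In practice Git computes these IDs itself; running `git hash-object` on a file should give the same value as this sketch for the same bytes.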

Git: collaboration

Tag and track: Publish

  • Working, executable code
    • Docker & Singularity hubs
  • Identify content and code with persistent identifiers (e.g. DOI)

Reproduce

Run it again, Sam!

Reproduce: Why?

  • Research output nowadays includes not only textual statements but also code and data
  • Peers and collaborators should be able to reproduce results by themselves to:
    • Check for errors
    • Improve code and data
    • Test in different conditions

Standing on the shoulders of giants

Reproduce: How?

  • Code requirements, recipes
    • Scripts
    • Test frameworks
    • Package managers (e.g. Conda)
    • Notebooks (e.g. Jupyter)
  • Virtualisation (e.g. containers)
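
A test framework turns expectations about code into checks that anyone can rerun. A minimal sketch in the style of pytest, which collects functions named `test_*` and runs their plain `assert` statements; the function `gc_content` and its expected values are hypothetical examples, not part of any specific project.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def test_gc_content_balanced():
    assert gc_content("ATGC") == 0.5


def test_gc_content_lowercase():
    assert gc_content("ggcc") == 1.0


def test_gc_content_empty_raises():
    try:
        gc_content("")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty sequence")


if __name__ == "__main__":
    # Without pytest installed, the same tests can still be run by hand.
    for test in (test_gc_content_balanced, test_gc_content_lowercase,
                 test_gc_content_empty_raises):
        test()
    print("all tests passed")
```

Saved as `test_gc.py` (a hypothetical file name), running `pytest test_gc.py` would discover and run the three test functions, so a collaborator can confirm the code still behaves as documented.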

Reproduce: Jupyter

  • Formerly IPython Notebook
  • Combines documentation (Markdown), comments and executable code, with its output, in a single notebook
  • Can be exported to PDF, HTML, etc.

Reproduce: DevOps

Reproduce: Containers

Reproduce: Docker & Singularity

Pipelines & Workflows

Guilty by association

Pipelines & Workflows: Why?

  • Write programs that do one thing and do it well.
  • Write programs to work together.
  • Write programs to handle text streams, because that is a universal interface.

Unix Philosophy

D. McIlroy, P. H. Salus
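
In that spirit, a minimal Python sketch of a filter that does one thing: it reads FASTA-formatted text and emits one `identifier<TAB>length` line per sequence, so its output can be piped into other tools such as `sort` or `awk`. The names `fasta_lengths` and `main` are hypothetical, and the parser is deliberately simple (it does not validate its input).

```python
import sys
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[str]:
    """Yield one "id<TAB>length" line for each record in FASTA text."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield f"{name}\t{length}\n"
            # The identifier is the first word after ">" (empty if missing).
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif line:
            length += len(line)
    if name is not None:
        yield f"{name}\t{length}\n"


def main() -> None:
    # Text in on stdin, text out on stdout: the universal interface.
    sys.stdout.writelines(fasta_lengths(sys.stdin))
```

With a call to `main()` at the end of a script (say `fasta_lengths.py`, a hypothetical name), it could be combined with standard tools, e.g. `python fasta_lengths.py < reads.fa | sort -k2,2n` to list sequences by length.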

Pipelines & Workflows: How?

  • Fast prototyping
  • Polyglot (any programming language can be included)
  • Highly scalable and portable (runs on many HPC and cloud environments)
  • Reproducible (native support for containers)
  • Continuous checkpoints and resuming (e.g. after expanding a pipeline)
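
Workflow managers implement checkpoints and resuming in their own ways (Nextflow, for instance, identifies each task by a hash of its inputs and command). The core idea can be sketched in Python with a hypothetical helper, `cached_step`, that stores each step's result under a hash of the step name and its parameters, so rerunning an unchanged step reuses the stored result instead of recomputing it. This is a simplified illustration, not how any particular workflow manager works internally: unlike real tools, it does not notice changes to the step's code or to the contents of input files.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cached_step(cache_dir: str, name: str, params: dict,
                func: Callable[..., Any]) -> Any:
    """Run func(**params), or reuse the result of an identical earlier run.

    Results are stored as JSON, so func must return JSON-serialisable data.
    """
    # Identify the step by its name and parameters (key order ignored).
    key_source = json.dumps({"name": name, "params": params}, sort_keys=True)
    key = hashlib.sha256(key_source.encode()).hexdigest()[:16]
    checkpoint = Path(cache_dir) / f"{name}-{key}.json"

    if checkpoint.exists():
        # Resume: this exact step already completed in an earlier run.
        return json.loads(checkpoint.read_text())

    result = func(**params)
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    checkpoint.write_text(json.dumps(result))
    return result
```

For example, calling `cached_step("work", "count", {"path": "reads.fa"}, count_reads)` twice, with a hypothetical `count_reads` function, would run `count_reads` only once; adding a new step later leaves the checkpoints of earlier steps usable.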

Questions?

Comments?