Reproducibility and dissemination

Faical Yannick P. Congo

faical.congo@nist.gov

Thursday, October 26, 2017

Computational Methods Talks - NIST

http://bit.ly/2ivhPtG 

what it takes and what is out there?

Buckle up for this Journey

BUT BEFORE...

LET'S START WITH SOME FACTS

reproducibility?

dissemination?

tools, web?

container technology

literate programming

web services

why is this important?

IN...

3 Scenarios

Things change fast in the open source world

I have to

cope with obsolescence

and new paradigms

better data structures

sophisticated computational methods

more optimization techniques

How do I keep up with all this while doing research?

Rapid prototyping

Research environment timeline is dynamic

week

day

month

year

semester

update lib1

system update

software patch

hardware swap

system upgrade

...

How do I keep my previous research working?

during a study timeline, the environment will change for different reasons!

flexible packaging/change agnostic layer
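One small step toward a change-agnostic layer is to record the environment a study ran in, so that a later rebuild has something to compare against. The sketch below is only an illustration using the Python standard library; the function name `environment_snapshot` is hypothetical and not part of any tool covered in this talk.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that carry no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Keep this output next to the study; compare it after each update.
    print(json.dumps(environment_snapshot(), indent=2))
```

Comparing two snapshots taken before and after a system update shows which libraries changed underneath the study.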

using existing publication materials

publication

extract these

reconstruct

execute

corroborate
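The last step, corroborate, can be sketched as a comparison between the numbers printed in the publication and the numbers obtained after reconstruction and execution. This is a minimal sketch under assumptions: the result names, the values, and the tolerance are hypothetical placeholders.

```python
import math


def corroborate(published, reproduced, rel_tol=1e-6):
    """Return the names of results that disagree or are missing."""
    mismatches = []
    for name, expected in published.items():
        actual = reproduced.get(name)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            mismatches.append(name)
    return mismatches


# Hypothetical values, for illustration only.
published = {"result_a": 3.615, "result_b": 140.2}
reproduced = {"result_a": 3.6150001, "result_b": 138.9}
print(corroborate(published, reproduced))
```

An empty list means every published value was reproduced within the chosen tolerance.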

research involves others using your study in theirs and you doing the same!

How can I improve my publication content so that others can come to the same conclusions easily?

straightforward documentation/automatic reconstruction

How do we approach reproducibility now?

there are two main ways

what are they?

literate programming

containerization

YES: They are two very different things!

literate programming

tell a human being what the computer should do

wrap the code with guided explanation (science)

make the scientist understand enough to deal with the changes

containerization

tell the computer what to do

wrap the code with automation (reconstruction)

make the automation powerful enough so the scientist does not feel the changes.

YES: you are probably already using an older form of at least one of these paradigms

publications (literate programming)

each time that you place a code block after a scientific explanation.

each time that you place a result or a graph after a code block.

virtual machines/sessions (containerization)

each time that you run another operating system on top of your default one: Windows on Mac, Linux on Mac, Linux on Windows, Windows on Linux.

each time that you open a terminal on your computer or log in to your account on a cluster.

So what is literate programming?

According to Donald Knuth, most of our problems with research in computer science come from the fact that we conventionally try to tell the computer what to do.

YET!

According to him, it is more sustainable not to put all our trust in the computer but instead to focus on telling a human being what we want the computer to do.

thus

literate programming is an adequate mixture of code and a guided explanation of what we intend to do with it.

And

nowadays, data and results visualization as well
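A small illustration of that mixture: the explanation leads, the code follows, and the result is reported right after the code that produced it. The measurements are made-up numbers chosen only for this sketch.

```python
"""A short study written for a human reader first.

Question: how spread out are repeated measurements of one quantity?
"""
import statistics

# Step 1. The raw measurements (made-up numbers for illustration).
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]

# Step 2. The mean summarizes the value we believe we measured.
mean = statistics.mean(measurements)

# Step 3. The sample standard deviation tells the reader how much
# individual measurements scatter around that mean.
spread = statistics.stdev(measurements)

# Step 4. The result is reported right after the code that produced it.
print(f"mean = {mean:.2f}, standard deviation = {spread:.3f}")
```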

what is the state of the art in terms of tools for literate programming?

there are mainly two approaches

executable papers

NOTEBOOKS

what is an executable paper?

it is a paper in which all results are actually outputs of executed code provided in the paper.

when accessed from an executable paper server, all the results can be re-executed by the user.
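The idea can be sketched with a toy model in which a paper is a sequence of text and code cells, and every result is regenerated by running the code. This is not how CodaLab, VisTrails, or any other executable paper server is implemented; `run_paper` and the cell layout are assumptions made for illustration.

```python
import contextlib
import io


def run_paper(cells):
    """Re-execute every code cell and return the rendered paper text."""
    namespace = {}  # shared state, so later cells can use earlier results
    rendered = []
    for kind, content in cells:
        if kind == "text":
            rendered.append(content)
        elif kind == "code":
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec(content, namespace)
            rendered.append(buffer.getvalue().strip())
        else:
            raise ValueError(f"unknown cell kind: {kind}")
    return "\n".join(rendered)


paper = [
    ("text", "We sum the first ten squares."),
    ("code", "total = sum(n * n for n in range(1, 11))\nprint(total)"),
    ("text", "The total above is recomputed on every run."),
]
print(run_paper(paper))
```

Because the printed total comes from executing the cell, changing the code changes the result shown in the paper.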

what is a notebook?

it is a digital interactive web page that can ingest text, data and code.

in addition to just ingesting them, it allows various formatting features for text.

it supports various programming languages and can therefore run the code.

it also provides some visualization features.
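As a concrete view of "text, data and code" in one document, a Jupyter notebook file is a JSON document holding a list of cells. The sketch below builds a minimal one with the standard library only, assuming the version 4 notebook layout; `make_notebook` is a hypothetical helper, not part of Jupyter.

```python
import json


def make_notebook(cells):
    """Build a notebook document from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry their execution state and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = make_notebook([
    ("markdown", "# Spread of a sample\nWe compute the range of the data."),
    ("code", "data = [3, 1, 4, 1, 5]\nmax(data) - min(data)"),
])

# Saving this text under a name ending in .ipynb gives a file a notebook server can open.
text = json.dumps(notebook, indent=1)
print(text)
```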

executable papers tools

introduction and demo in three perspectives

a high level perspective

an admin perspective

a user perspective

high

https://worksheets.codalab.org/

admin

https://github.com/codalab/codalab-worksheets/wiki/Server-Setup

user

https://github.com/codalab/codalab-worksheets/wiki/Executable-Papers

high

credit: Anita de Waard, Vice President, Research Data Collaborations at Elsevier

admin

https://www.vistrails.org/usersguide/v2.2/html/vistrails_server.html

user

https://www.vistrails.org/usersguide/v2.2/html/latex.html

Notebooks tools

introduction and demo in three perspectives

a high level perspective

an admin perspective

a user perspective

high

http://jupyter.org/

admin

Available on:

github

dockerhub

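# Option 1: install the packages directly
apt-get install npm nodejs-legacy
npm install -g configurable-http-proxy
pip3 install jupyterhub
pip3 install --upgrade notebook
jupyterhub
# Go to: http://localhost:8000

# Option 2: run the image from Docker Hub (publish port 8000 to the host)
docker pull jupyterhub/jupyterhub
docker run -d -p 8000:8000 --name jupyterhub jupyterhub/jupyterhub jupyterhub
# Go to: http://localhost:8000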

user

http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/Notebook/Index.ipynb

high

http://zeppelin.apache.org/

admin

https://zeppelin.apache.org/docs/0.7.3/install/install.html

user

high

http://beakernotebook.com/

admin

http://beakernotebook.com/getting-started

user

http://beakernotebook.com/videos

high

https://nteract.io/

admin

https://nteract.io/desktop

user

high

http://www.joelotter.com/kajero/

admin

https://github.com/JoelOtter/kajero

user

http://www.joelotter.com/kajero/blank

what is the state of the art in terms of tools for containerization?

there are two main focuses

container systems

hpc frameworks

what is a container system?

a container is a sandbox that isolates an application and its data from other applications.

communication and sharing are highly restricted.

within a container, an application has all its dependencies and required configurations (environment) to run properly.
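For Docker, that environment is described in a Dockerfile. The sketch below renders one from a list of dependencies to show what such a specification contains; `render_dockerfile` is a hypothetical helper, and the image tag, package pin, and script name are placeholders.

```python
def render_dockerfile(base_image, packages, script):
    """Render a Dockerfile that bundles a script with its dependencies."""
    lines = [f"FROM {base_image}", "WORKDIR /study"]
    if packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append(f"COPY {script} /study/{script}")
    lines.append(f'CMD ["python", "{script}"]')
    return "\n".join(lines) + "\n"


# The image tag, package pin, and script name are hypothetical.
dockerfile = render_dockerfile("python:3.6", ["numpy==1.13.3"], "analysis.py")
print(dockerfile)
```

Pinning the base image and the package versions is what keeps the rebuilt environment the same when the host system changes.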

what is an hpc framework?

an hpc framework is a system that automates and simplifies task orchestration.

schedulers such as slurm, grid engine, moab and condor are hpc frameworks.

we are talking here about hpc frameworks specialized in dealing with containers.
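To make "orchestration" concrete, here is a toy placement routine that assigns container jobs to nodes with free CPUs. It is a deliberately simplified sketch, not the algorithm used by Slurm, Kubernetes, or any other scheduler; the job and node names are made up.

```python
def schedule(jobs, nodes):
    """Place each job on the first node with enough free CPUs.

    jobs:  list of (job_name, cpus_needed)
    nodes: dict of node_name -> free CPUs
    Returns (placement, pending).
    """
    free = dict(nodes)  # copy so the caller's view is untouched
    placement, pending = {}, []
    # Largest jobs first, so they are not starved by many small ones.
    for name, cpus in sorted(jobs, key=lambda job: job[1], reverse=True):
        for node, available in free.items():
            if available >= cpus:
                placement[name] = node
                free[node] = available - cpus
                break
        else:
            pending.append(name)  # no node can host it right now
    return placement, pending


jobs = [("simulate", 4), ("plot", 1), ("fit", 2), ("huge", 16)]
nodes = {"node-a": 4, "node-b": 4}
print(schedule(jobs, nodes))
```

Real frameworks add queues, priorities, restarts, and availability guarantees on top of this basic placement step.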

these two are very complementary

container systems

creation of container specifications

building and management of containers

container image storage

hpc frameworks

run containers

orchestrate container executions

sophisticated scalability and availability

container systems/runtimes

introduction and demo in three perspectives

a high level perspective

an admin perspective

a user perspective

high

https://www.docker.com/

admin

https://www.docker.com/get-docker

user

https://docs.docker.com/get-started/part2/#define-a-container-with-a-dockerfile

high

https://coreos.com/rkt/

admin

https://github.com/rkt/rkt/blob/master/Documentation/distributions.md

user

https://coreos.com/rkt/docs/latest/getting-started-guide.html

high

https://linuxcontainers.org/

admin

https://linuxcontainers.org/lxc/getting-started/

user

high

https://www.opencontainers.org/

https://github.com/opencontainers/runc

admin

https://github.com/opencontainers/runc

user

hpc frameworks

introduction and demo in three perspectives

a high level perspective

a user perspective

high

https://github.com/docker/swarm

admin

https://docs.docker.com/swarm/plan-for-production/#swarm-manager-ha

high

https://kubernetes.io/

admin

https://kubernetes.io/docs/concepts/cluster-administration/cluster-administration-overview/

high

https://mesos.apache.org/

admin

https://mesos.apache.org/documentation/latest/architecture/

high

https://www.openstack.org/software/

admin

https://docs.openstack.org/developer/devstack/

web services you should know about

they provide open source solutions to disseminate:

literate programming

containerization

nbviewer

we do not need to have one!

https://nbviewer.jupyter.org/

gitlab

we have one at nist!

https://gitlab.nist.gov

Github

we have usnistgov!

https://github.com/usnistgov

dockerhub

it is open access and free!

https://hub.docker.com/explore/

reproducible dissemination

By Faical Yannick Congo