Reproducibility and dissemination
Faical Yannick P. Congo
faical.congo@nist.gov
Thursday, October 26, 2017
Computational Methods Talks - NIST
http://bit.ly/2ivhPtG
what it takes and what is out there?
Buckle up for this Journey
BUT BEFORE...
LET'S START WITH SOME FACTS
reproducibility?
dissemination?
tools, web?
container technology
literate programming
web services
why is this important?
IN...
3 Scenarios
Things change fast in the open source world
I have to
cope with obsolescence
and new paradigms
better data structures
SOPHISTICATED computational methods
more optimization techniques
How do I keep up with all this while doing research?
Rapid prototyping
Research environment timeline is dynamic
week
day
month
year
semester
update lib1
system update
software patch
hardware swap
system upgrade
...
How do I keep my previous research working?
during a study timeline, the environment will change for different reasons!
flexible packaging/change agnostic layer
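One small step toward a change agnostic layer is to record what the environment looked like when a result was produced, so a later update of lib1 or of the system can at least be detected. The sketch below is a minimal illustration using only the Python standard library; the helper names `snapshot_environment` and `diff_packages` and the layout of the snapshot are assumptions made for this example, not part of any tool named in this talk.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect interpreter, OS and installed-package versions (hypothetical helper)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


def diff_packages(old, new):
    """Return {name: (old_version, new_version)} for every package that changed."""
    names = set(old["packages"]) | set(new["packages"])
    return {
        name: (old["packages"].get(name), new["packages"].get(name))
        for name in sorted(names)
        if old["packages"].get(name) != new["packages"].get(name)
    }


if __name__ == "__main__":
    # Print the snapshot so it can be stored next to the results of a study.
    print(json.dumps(snapshot_environment(), indent=2))
```

Storing one snapshot per study run and diffing two of them shows which packages moved between, say, last month and today. It does not undo the change; that is what the containers discussed later are for.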
using existing publication materials
publication
extract these
reconstruct
execute
corroborate
research involves others using your study in theirs and you doing the same!
How can I improve my publication content so that others can come to the same conclusions easily?
STRAIGHTFORWARD documentation/automatic reconstruction
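The last step of the extract, reconstruct, execute, corroborate chain can be sketched as a comparison between the numbers printed in a publication and the numbers obtained by re-running the code. This is only an illustration under stated assumptions: the function name `corroborate`, the dictionary layout and the tolerance are hypothetical, and real studies may need a domain-specific notion of "same result".

```python
import math


def corroborate(published, reproduced, rel_tol=1e-6):
    """Compare published values with re-executed ones, key by key.

    Returns a dict mapping each published key to "match", "mismatch" or "missing".
    """
    report = {}
    for key, expected in published.items():
        if key not in reproduced:
            report[key] = "missing"
        elif math.isclose(expected, reproduced[key], rel_tol=rel_tol):
            report[key] = "match"
        else:
            report[key] = "mismatch"
    return report


if __name__ == "__main__":
    # Hypothetical values: what the paper states vs. what a re-run produced.
    paper = {"energy": 1.0, "error": 2.0, "runtime": 3.0}
    rerun = {"energy": 1.0 + 1e-9, "error": 2.5}
    print(corroborate(paper, rerun))
```

A relative tolerance is used because floating-point results rarely agree bit for bit across machines and library versions.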
How do we approach reproducibility now?
there are two main ways
what are they?
literate programming
containerization
YES: They are two very different things!
literate programming:
tell a human being what the computer should do
wrap the code with guided explanation (science)
make the scientist understand enough to deal with the changes
containerization:
tell the computer what to do
wrap the code with automation (reconstruction)
make the automation POWERFUL enough so the scientist does not feel the changes.
YES: you are probably already using at least an older paradigm
publications:
each time that you place a code block after a scientific explanation.
each time that you place a result or a graph after a code block.
virtual machines/session:
each time that you run another operating system on top of your default one: Windows on Mac, Linux on Mac, Linux on Windows, Windows on Linux.
each time that you open a terminal on your computer or log in to your account on a cluster.
So what is literate programming?
According to Donald Knuth, most of our problems with research in computer science come from the fact that we conventionally try to tell the computer what to do.
YET!
According to him, it is more sustainable not to put all our trust in the computer and instead to focus on telling a human being what we want the computer to do.
thus
literate programming is an adequate mixture of code and a guided explanation of what we intend to do with it.
And
data/results visualization in addition now
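A small way to see the mixture of explanation, code and result is Python's standard `doctest` module: the docstring tells a human what is intended, shows the call, and states the result, and the computer checks that the stated result is still true. This is not Knuth's original tooling, only an illustration of the idea; the function `sample_mean` is a hypothetical example.

```python
import doctest


def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Explanation for the reader: the mean is the sum of the observations
    divided by how many observations there are.

    Code and result, placed right after the explanation:

    >>> sample_mean([2.0, 4.0, 6.0])
    4.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    # Re-execute every example found in the docstrings and compare
    # the actual output with the output written in the text.
    doctest.testmod()
```

If a later change to the code breaks the documented result, running the file reports the discrepancy instead of leaving the explanation silently out of date.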
what is the state of the art in terms of tools for literate programming?
there are mainly two approaches
executable papers
NOTEBOOKS
what is an executable paper?
it is a paper in which all results are actually outputs of executed code provided in the paper.
when accessed from an executable paper server, all the results can be re-executed by the user.
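The core idea, results that are outputs of executed code rather than typed-in numbers, can be sketched with a text template whose placeholders are filled by running the analysis. This is a toy illustration with hypothetical names (`PAPER`, `compute_results`, `render_paper`); it makes no claim about how the executable paper servers listed in this talk work internally.

```python
from string import Template

# The "paper": prose with placeholders where results belong.
PAPER = Template("We measured $n samples; their mean is $mean.")


def compute_results(samples):
    """Run the analysis and return every number the paper quotes."""
    return {
        "n": len(samples),
        "mean": round(sum(samples) / len(samples), 3),
    }


def render_paper(samples):
    """Re-execute the analysis and substitute its outputs into the text."""
    return PAPER.substitute(compute_results(samples))


if __name__ == "__main__":
    print(render_paper([1.0, 2.0, 4.0]))
```

Re-running `render_paper` with different data regenerates the sentence, which is the behaviour a reader gets when re-executing results on an executable paper server.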
what is a notebook?
it is a digital interactive web page that can ingest text, data and code.
beyond ingesting them, it offers various formatting features for text.
it supports various programming languages and can thus run the code.
it also provides some visualization features.
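To make "ingest text, data and code" concrete, here is a minimal sketch that builds a Jupyter-style notebook file by hand with the standard `json` module. It assumes the notebook format version 4 layout (a list of markdown and code cells); the helper name `build_notebook` and the file name `example.ipynb` are hypothetical, and in practice a notebook server writes this file for you.

```python
import json


def build_notebook(explanation, code):
    """Return a minimal notebook (format version 4) as a plain dict."""
    return {
        "cells": [
            # Text cell: the guided explanation for the human reader.
            {"cell_type": "markdown", "metadata": {}, "source": [explanation]},
            # Code cell: not executed yet, so it has no outputs.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": [code],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


if __name__ == "__main__":
    nb = build_notebook("# Mean of a sample", "print(sum([2, 4, 6]) / 3)")
    # "example.ipynb" is a hypothetical file name used for illustration.
    with open("example.ipynb", "w", encoding="utf-8") as handle:
        json.dump(nb, handle, indent=1)
```

Once the code cell is run in a notebook server, its results are stored in the `outputs` list, which is how a notebook keeps explanation, code and results together in one file.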
executable paper tools
introduction and demo in three perspectives
a high level perspective
an admin perspective
a user perspective
high
https://worksheets.codalab.org/
admin
https://github.com/codalab/codalab-worksheets/wiki/Server-Setup
user
https://github.com/codalab/codalab-worksheets/wiki/Executable-Papers
high
credit: Anita de Waard, Vice President, Research Data Collaborations at Elsevier
admin
user
high
credit: Anita de Waard, Vice President, Research Data Collaborations at Elsevier
admin
https://www.vistrails.org/usersguide/v2.2/html/vistrails_server.html
user
https://www.vistrails.org/usersguide/v2.2/html/latex.html
Notebook tools
introduction and demo in three perspectives
a high level perspective
an admin perspective
a user perspective
high
http://jupyter.org/
admin
Available on:
github
dockerhub
apt-get install npm nodejs-legacy
npm install -g configurable-http-proxy
pip3 install jupyterhub
pip3 install --upgrade notebook
jupyterhub
Go to: http://localhost:8000
docker pull jupyter/jupyterhub
docker run -d -p 8000:8000 --name jupyterhub jupyter/jupyterhub jupyterhub
Go to: http://localhost:8000
user
http://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/Notebook/Index.ipynb
high
http://zeppelin.apache.org/
admin
https://zeppelin.apache.org/docs/0.7.3/install/install.html
user
high
http://beakernotebook.com/
admin
http://beakernotebook.com/getting-started
user
http://beakernotebook.com/videos
high
https://nteract.io/
admin
https://nteract.io/desktop
user
high
http://www.joelotter.com/kajero/
admin
https://github.com/JoelOtter/kajero
user
http://www.joelotter.com/kajero/blank
what is the state of the art in terms of tools for containerization?
there are mainly two focuses
container systems
hpc frameworks
what is a container system?
a container is a sandbox that isolates an application and its data from other containers.
communication and sharing are highly restricted.
within a container, an application has all its dependencies and required configurations (environment) to run properly.
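The "application plus dependencies plus configuration" idea is usually written down as a container specification. As a minimal sketch, the function below renders a Dockerfile for a single-script study; the helper name `render_dockerfile`, the base image tag, the pinned package and the `/study` path are all assumptions chosen for illustration, not a recommended layout.

```python
def render_dockerfile(base_image, packages, script):
    """Render a Dockerfile that bundles one script with pinned dependencies."""
    lines = [
        f"FROM {base_image}",  # the operating system and interpreter layer
        "WORKDIR /study",      # hypothetical working directory
    ]
    if packages:
        # Pin versions so the environment inside the container stays fixed.
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append(f"COPY {script} /study/{script}")
    lines.append(f'CMD ["python", "{script}"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical study: one script and one pinned dependency.
    print(render_dockerfile("python:3.6-slim", ["numpy==1.13.3"], "study.py"))
```

Saving the output as `Dockerfile` next to the script and building it with `docker build` yields an image whose environment no longer depends on the host's library versions.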
what is an hpc framework?
an hpc framework is a system that automates and simplifies task orchestration.
schedulers such as slurm, grid engine, moab and condor are hpc frameworks.
we are talking here about hpc frameworks specialized in dealing with containers.
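To give a feel for what "orchestration" means, here is a toy placement routine that assigns containers to cluster nodes by free CPU count. It is a deliberately simplified sketch: the function name `place_containers`, the node and container names, and the greedy "most free CPUs first" rule are assumptions for illustration, and the schedulers and frameworks named in this talk use far richer policies.

```python
import heapq


def place_containers(containers, nodes):
    """Greedily assign each container to the node with the most free CPUs.

    containers: {container_name: cpus_needed}
    nodes:      {node_name: cpus_available}
    Returns {container_name: node_name}.
    """
    # heapq is a min-heap, so store negative free CPUs to pop the largest first.
    heap = [(-cpus, name) for name, cpus in nodes.items()]
    heapq.heapify(heap)
    placement = {}
    # Place the most demanding containers first.
    for name, need in sorted(containers.items(), key=lambda item: -item[1]):
        neg_free, node = heapq.heappop(heap)
        free = -neg_free
        if need > free:
            # The node with the most free CPUs is too small, so every node is.
            raise RuntimeError(f"no node can host {name!r} ({need} CPUs)")
        placement[name] = node
        heapq.heappush(heap, (-(free - need), node))
    return placement


if __name__ == "__main__":
    # Hypothetical cluster and workload.
    print(place_containers({"sim": 3, "post": 2, "viz": 1},
                           {"node-a": 4, "node-b": 2}))
```

A real framework additionally restarts failed containers, scales them up and down, and moves them between nodes, which is the scalability and availability part of the picture.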
these two are very complementary
creation of container specifications
run containers
building and management of containers
orchestrate container executions
container image storage
SOPHISTICATED scalability and availability
container systems
hpc frameworks
container systems/runtimes
introduction and demo in three perspectives
a high level perspective
an admin perspective
a user perspective
high
https://www.docker.com/
admin
https://www.docker.com/get-docker
user
https://docs.docker.com/get-started/part2/#define-a-container-with-a-dockerfile
high
https://coreos.com/rkt/
admin
https://github.com/rkt/rkt/blob/master/Documentation/distributions.md
user
https://coreos.com/rkt/docs/latest/getting-started-guide.html
high
https://linuxcontainers.org/
admin
https://linuxcontainers.org/lxc/getting-started/
user
high
https://www.opencontainers.org/
https://github.com/opencontainers/runc
admin
https://github.com/opencontainers/runc
user
hpc frameworks
introduction and demo in two perspectives
a high level perspective
an admin perspective
high
https://github.com/docker/swarm
admin
https://docs.docker.com/swarm/plan-for-production/#swarm-manager-ha
high
https://kubernetes.io/
admin
https://kubernetes.io/docs/concepts/cluster-administration/cluster-administration-overview/
high
https://mesos.apache.org/
admin
https://mesos.apache.org/documentation/latest/architecture/
high
https://www.openstack.org/software/
admin
https://docs.openstack.org/developer/devstack/
web services you should know about
they provide open source solutions TO DISSEMINATE:
literate programming
containerization
nbviewer
we do not need to have one!
https://nbviewer.jupyter.org/
gitlab
we have one at nist!
https://gitlab.nist.gov
Github
we have usnistgov!
https://github.com/usnistgov
dockerhub
it is open access and free!
https://hub.docker.com/explore/
reproducible dissemination
By Faical Yannick Congo