Stian Soiland-Reyes (helped by Robin Long & Michael Crusoe)

eScience lab, The University of Manchester

@soilandreyes

https://orcid.org/0000-0001-9842-9718

https://slides.com/soilandreyes/

BioExcel Virtual Training
2020-04-29

This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the European Union under contracts H2020-INFRAEDI-02-2018-823830 and H2020-EINFRA-2015-1-675728.

Virtual Training:
Common Workflow Language

This tutorial follows the CWL User Guide

https://www.commonwl.org/user_guide/

 

For convenience we will use Virtual Machines from the

BioExcel Cloud Portal

https://bioexcel.ebi.ac.uk/

 

We will connect using Visual Studio Code and SSH.

 

Note: Cloud portal VMs are only available during the Virtual Training session.

 

If you are following this tutorial at a later point,

you will need to use a local computer

where you can install cwltool and Docker

BioExcel Cloud Portal

Access to the portal was available to participants subscribed to the Virtual Training session.

If you are reading this later, skip to
step #0 installation

Make sure you are on the team
CWL Training

Send questions in GoToTraining chat
Send to: Organizers

Visual Studio Code

CWL can be written using any text editor.

 

Here we'll use Visual Studio Code

as it can connect remotely to our virtual machine.

 

Download from https://code.visualstudio.com/download

and install on your machine.

 

Old-skool alternative: Use ssh to VM, run vim from shell

SSH keys
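
The BioExcel Cloud Portal asks for a public SSH key. If you do not already have a key pair, the sketch below shows one way to generate one with OpenSSH's ssh-keygen. It is an illustration, not the portal's official procedure: the temporary directory, the file name id_rsa_cwl and the comment cwl-training are hypothetical, and the empty passphrase is only there so the example runs unattended. A real key normally lives under ~/.ssh and is protected by a passphrase.

```shell
# Hypothetical scratch location, so an existing key is never overwritten
keydir="$(mktemp -d)"

# Generate a 4096-bit RSA key pair; -N "" sets an empty passphrase (demo only)
ssh-keygen -q -t rsa -b 4096 -N "" -C "cwl-training" -f "$keydir/id_rsa_cwl"

# Show the public half of the key pair
cat "$keydir/id_rsa_cwl.pub"
```

The single line printed by cat, starting with ssh-rsa, is the public key. The file without the .pub suffix is the private key and should stay on your own machine.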

Create the virtual machine

Note: You do not need Virtual Machines to run CWL;
here we use VMs to ensure a consistent training experience.

Follow the BioExcel Cloud Portal instructions

https://longr.github.io/cwl-virtual-tutorial/build_vm.html

 

Paste in your public SSH key from before, e.g.:

ssh-rsa AAAAB3NzaC1...vr0kA2L mchssss4@ds@vm-RNNK3N1

 

Wait for VM to be deployed (5-10 mins), then verify using ssh

https://longr.github.io/cwl-virtual-tutorial/accessing_vm.html

 

 


Connect to VM
from Visual Studio Code
using SSH

To access the remote machine from VS Code, install and use the extension Remote - SSH

 

Follow instructions on

https://longr.github.io/cwl-virtual-tutorial/connecting_via_vscode.html

 

Open folder
/home/ubuntu

#0 Installing locally

  • You will need a laptop or workstation where you have the privileges to install software.
    • Unprivileged Windows with domain login? Ask your administrator to install Miniconda for Windows (with Python 3.7), which allows user-space installation of tools
ubuntu@tsi1588147782483-1:~/training$ python3 --version
Python 3.6.8

Install Python 3

Tip: Not all CWL implementations use Python.

Cromwell uses Scala

CWLEXEC uses Java

 


ubuntu@tsi1588147782483-1:~/training$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Install Docker

ubuntu@tsi1588147782483-1:~/training$ pip3 install cwlref-runner

Processing ./.cache/pip/wheels/f2/5f/5f/8fc64a099199682669af6a9088cfb2f161d60570a840b0cf9e/cwlref_runner-1.0-py3-none-any.whl
Collecting cwltool
  Using cached cwltool-3.0.20200324120055-py3-none-any.whl (800 kB)
Collecting typing-extensions
  Using cached typing_extensions-3.7.4.2-py3-none-any.whl (22 kB)
Collecting pathlib2!=2.3.1
  Using cached pathlib2-2.3.5-py2.py3-none-any.whl (18 kB)
..
Collecting decorator>=4.3.0
  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Installing collected packages: typing-extensions, six, pathlib2, decorator, networkx, pyparsing, isodate, rdflib, python-dateutil, lxml, prov, bagit, idna, certifi, chardet, urllib3, requests, lockfile, rdflib-jsonld, mistune, CacheControl, ruamel.yaml.clib, ruamel.yaml, schema-salad, mypy-extensions, humanfriendly, coloredlogs, psutil, shellescape, cwltool, cwlref-runner
Successfully installed CacheControl-0.11.7 bagit-1.7.0 certifi-2020.4.5.1 chardet-3.0.4 coloredlogs-14.0 cwlref-runner-1.0 cwltool-3.0.20200324120055 decorator-4.4.2 humanfriendly-8.2 idna-2.9 isodate-0.6.0 lockfile-0.12.2 lxml-4.5.0 mistune-0.8.4 mypy-extensions-0.4.3 networkx-2.4 pathlib2-2.3.5 prov-1.5.1 psutil-5.7.0 pyparsing-2.4.7 python-dateutil-2.8.1 rdflib-4.2.2 rdflib-jsonld-0.5.0 requests-2.23.0 ruamel.yaml-0.16.5 ruamel.yaml.clib-0.2.0 schema-salad-5.0.20200416112825 shellescape-3.4.1 six-1.14.0 typing-extensions-3.7.4.2 urllib3-1.25.9

ubuntu@tsi1588147782483-1:~/training$ cwltool --version
/home/ubuntu/cwl_bioexcel/bin/cwltool 2.0.20200126090152

ubuntu@tsi1588147782483-1:~/training$ cwl-runner --version
/home/ubuntu/cwl_bioexcel/bin/cwl-runner 2.0.20200126090152

Install reference runner cwltool


Note: Other CWL engines (e.g. toil) installed in the same Python environment may lock cwltool to an older version

ubuntu@tsi1588147782483-1:~/training$ pip3 install toil[cwl]
Collecting toil[cwl]
  Using cached toil-4.0.0-py3-none-any.whl (464 kB)
Collecting pytz>=2012
  Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting docker==2.5.1
  Using cached docker-2.5.1-py2.py3-none-any.whl (111 kB)
Collecting pathlib2==2.3.2
  Using cached pathlib2-2.3.2-py2.py3-none-any.whl (16 kB)
Processing ./.cache/pip/wheels/6e/9c/ed/4499c9865ac1002697793e0ae05ba6be33553d098f3347fb94/future-0.18.2-py3-none-any.whl
Processing ./.cache/pip/wheels/01/63/4e/4513b03a36916a4988ba9dd0c0483e30f4973cc4b4ba56fb53/addict-2.2.0-py3-none-any.whl
Processing ./.cache/pip/wheels/a1/d9/f2/b5620c01e9b3e858c6877b1045fda5b115cf7df6490f883382/psutil-5.7.0-cp36-cp36m-linux_x86_64.whl
Collecting six>=1.10.0
  Using cached six-1.14.0-py2.py3-none-any.whl (10 kB)
Collecting requests<3,>=2
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting decorator>=4.3.0
  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Installing collected packages: pytz, six, websocket-client, docker-pycreds, certifi, chardet, idna, urllib3, requests, docker, pathlib2, future, addict, psutil, dill, python-dateutil, markupsafe, pyparsing, packaging, boltons, repoze.lru, routes, docutils, pyyaml, webencodings, bleach, galaxy-util, galaxy-containers, galaxy-tool-util, bagit, humanfriendly, coloredlogs, lxml, isodate, rdflib, decorator, networkx, prov, ruamel.yaml.clib, ruamel.yaml, shellescape, typing-extensions, mistune, rdflib-jsonld, lockfile, CacheControl, schema-salad, mypy-extensions, cwltool, toil
Successfully installed CacheControl-0.11.7 addict-2.2.0 bagit-1.7.0 bleach-3.1.4 boltons-20.1.0 certifi-2020.4.5.1 chardet-3.0.4 coloredlogs-14.0 cwltool-2.0.20200126090152 decorator-4.4.2 dill-0.2.7.1 docker-2.5.1 docker-pycreds-0.4.0 docutils-0.16 future-0.18.2 galaxy-containers-19.9.0 galaxy-tool-util-19.9.1 galaxy-util-19.9.0 humanfriendly-8.2 idna-2.9 isodate-0.6.0 lockfile-0.12.2 lxml-4.5.0 markupsafe-1.1.1 mistune-0.8.4 mypy-extensions-0.4.3 networkx-2.4 packaging-20.3 pathlib2-2.3.2 prov-1.5.1 psutil-5.7.0 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2020.1 pyyaml-5.3.1 rdflib-4.2.2 rdflib-jsonld-0.5.0 repoze.lru-0.7 requests-2.23.0 routes-2.4.1 ruamel.yaml-0.16.5 ruamel.yaml.clib-0.2.0 schema-salad-5.0.20200416112825 shellescape-3.4.1 six-1.14.0 toil-4.0.0 typing-extensions-3.7.4.2 urllib3-1.25.9 webencodings-0.5.1 websocket-client-0.57.0

ubuntu@tsi1588147782483-1:~/training$ toil --version
4.0.0

ubuntu@tsi1588147782483-1:~$ toil-cwl-runner --version
4.0.0

or toil with CWL support

ubuntu@tsi1588147782483-1:~$ virtualenv -p python3 ~/toil
Running virtualenv with interpreter /home/ubuntu/cwl_bioexcel/bin/python3
Using real prefix '/usr'
Path not in prefix '/home/ubuntu/cwl_bioexcel/include/python3.6m' '/usr'
New python executable in /home/ubuntu/toil/bin/python3
Also creating executable in /home/ubuntu/toil/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.

ubuntu@tsi1588147782483-1:~$ . ~/toil/bin/activate
(toil) ubuntu@tsi1588147782483-1:~$


(toil) ubuntu@tsi1588147782483-1:~$ pip3 install toil[cwl]
Collecting toil[cwl]
  Using cached toil-4.0.0-py3-none-any.whl (464 kB)
Collecting pytz>=2012
  Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
...
Collecting requests<3,>=2
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting decorator>=4.3.0
  Using cached decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Installing collected packages: pytz, six, websocket-client, docker-pycreds, certifi, chardet, idna, urllib3, requests, docker, pathlib2, future, addict, psutil, dill, python-dateutil, markupsafe, pyparsing, packaging, boltons, repoze.lru, routes, docutils, pyyaml, webencodings, bleach, galaxy-util, galaxy-containers, galaxy-tool-util, bagit, humanfriendly, coloredlogs, lxml, isodate, rdflib, decorator, networkx, prov, ruamel.yaml.clib, ruamel.yaml, shellescape, typing-extensions, mistune, rdflib-jsonld, lockfile, CacheControl, schema-salad, mypy-extensions, cwltool, toil
Successfully installed CacheControl-0.11.7 addict-2.2.0 bagit-1.7.0 bleach-3.1.4 boltons-20.1.0 certifi-2020.4.5.1 chardet-3.0.4 coloredlogs-14.0 cwltool-2.0.20200126090152 decorator-4.4.2 dill-0.2.7.1 docker-2.5.1 docker-pycreds-0.4.0 docutils-0.16 future-0.18.2 galaxy-containers-19.9.0 galaxy-tool-util-19.9.1 galaxy-util-19.9.0 humanfriendly-8.2 idna-2.9 isodate-0.6.0 lockfile-0.12.2 lxml-4.5.0 markupsafe-1.1.1 mistune-0.8.4 mypy-extensions-0.4.3 networkx-2.4 packaging-20.3 pathlib2-2.3.2 prov-1.5.1 psutil-5.7.0 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2020.1 pyyaml-5.3.1 rdflib-4.2.2 rdflib-jsonld-0.5.0 repoze.lru-0.7 requests-2.23.0 routes-2.4.1 ruamel.yaml-0.16.5 ruamel.yaml.clib-0.2.0 schema-salad-5.0.20200416112825 shellescape-3.4.1 six-1.14.0 toil-4.0.0 typing-extensions-3.7.4.2 urllib3-1.25.9 webencodings-0.5.1 websocket-client-0.57.0

Separate Python environments using virtualenv

(cwl) ubuntu@tsi1588147782483-1:~$ . ~/toil/bin/activate

(toil) (cwl) ubuntu@tsi1588147782483-1:~$ type toil-cwl-runner 
toil-cwl-runner is /home/ubuntu/toil/bin/toil-cwl-runner

(base) ubuntu@tsi1588147782483-1:~$ conda create -n cwl cwltool
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
...
  xorg-xextproto     conda-forge/linux-64::xorg-xextproto-7.3.0-h14c3975_1002
  xorg-xproto        conda-forge/linux-64::xorg-xproto-7.0.31-h14c3975_1007
  xz                 conda-forge/linux-64::xz-5.2.5-h516909a_0
  zlib               conda-forge/linux-64::zlib-1.2.11-h516909a_1006
  zstd               conda-forge/linux-64::zstd-1.4.4-h6597ccf_3


Proceed ([y]/n)?  y


Downloading and Extracting Packages
libpng-1.6.37        | 308 KB    | ######################################################## | 100% 
shellescape-3.4.1    | 7 KB      | ######################################################## | 100% 
libuuid-2.32.1       | 26 KB     | ######################################################## | 100% 
decorator-4.4.2      | 11 KB     | ######################################################## | 100% 
pathlib2-2.3.5       | 34 KB     | ######################################################## | 100% 
readline-8.0         | 441 KB    | ######################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate cwl

Separate environments using Conda

(base) ubuntu@tsi1588147782483-1:~$ conda activate cwl

(cwl) ubuntu@tsi1588147782483-1:~/training$ type cwltool
cwltool is /home/ubuntu/miniconda3/envs/cwl/bin/cwltool

(cwl) ubuntu@tsi1588147782483-1:~/training$ cwltool --version
/home/ubuntu/miniconda3/envs/cwl/bin/cwltool 3.0.20200317203547

(cwl) ubuntu@tsi1588147782483-1:~/training$ cwl-runner --version
/home/ubuntu/cwl_bioexcel/bin/cwl-runner 2.0.20200126090152

#1 YAML and CWL

CWL is written in a text file format called YAML

 

YAML is similar to JSON, in that it can express object structures made of

  • key: value pairs
  • [lists]
  • "strings"
  • numbers such as 1, 2, 3

 

YAML syntax is designed to be written by people rather than to be easy to parse, so you can omit most of JSON's punctuation (braces, brackets, commas, quotes) and use indented blocks instead

 

It is therefore important that you pay attention to consistent indentation when working with this tutorial.
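
As a rough illustration of the difference, the sketch below writes the same made-up structure twice, once as JSON and once as YAML. The file names and all keys and values are hypothetical.

```shell
# JSON: structure is marked with braces, brackets, commas and quotes
cat > example.json <<'EOF'
{
  "message": "Hello world!",
  "count": 3,
  "files": ["a.txt", "b.txt"],
  "options": {"verbose": true}
}
EOF

# YAML: the same structure, expressed with indentation and dashes
cat > example.yml <<'EOF'
message: Hello world!
count: 3
files:
  - a.txt
  - b.txt
options:
  verbose: true
EOF

cat example.yml
```

In the YAML version nesting is expressed purely by indentation, which is why a misplaced space can change the meaning of the file.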

 

#2 Writing your first CWL tool

Exercise

  1. Create and run the example with 1st-tool.cwl inp-job.yml
  2. Try to run just cwltool 1st-tool.cwl (without  inp-job.yml)
    1. Can you figure out another way to run the workflow?
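
For orientation, here is a sketch of the two files the exercise refers to, modelled on the first example in the CWL User Guide. Treat the guide itself as authoritative if the two differ; the input name message is taken from that example.

```shell
# Tool description: wraps the echo command and passes one string argument
cat > 1st-tool.cwl <<'EOF'
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
EOF

# Job file: supplies a value for the "message" input
cat > inp-job.yml <<'EOF'
message: Hello world!
EOF
```

With cwltool installed, cwltool 1st-tool.cwl inp-job.yml should then run echo with the message as its argument.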

Challenge

  1. Make 1st-tool.cwl executable using chmod and try to run it directly
  2. Copy and modify 1st-tool.cwl to run "ls" for a given folder path
    1. What potential issues do you see for running this workflow using containers or cloud instances?

#3 Adding parameters

Exercise

  1. Implement and run inp.cwl / inp-job.yml
    1. Modify and test inp-job2.yml to reference a File that does not exist
    2. Modify and test inp-job2.yml with a non-number for example_int
    3. The inp.cwl example "cheats", as echo accepts any parameters and simply prints them.
      Copy inp.cwl to ps.cwl and modify it to run a different Unix command that takes real parameters:
      1. ps -e -f
      2. ps -u root
    4. Try to give the parameters more "user friendly" names.
      1. Verify with cwltool ps.cwl --help
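
The job-file variations in the exercise can be made by hand in the editor, or scripted. The sketch below assumes a job file like the one in the User Guide's inp.cwl example (the input names example_flag, example_string, example_int and example_file come from that example and should match your inp.cwl). inp-job3.yml is a hypothetical extra file name, used here so both variations can be kept side by side.

```shell
# Baseline job file, plus the (empty) input file it points to
cat > inp-job.yml <<'EOF'
example_flag: true
example_string: hello
example_int: 42
example_file:
  class: File
  path: whale.txt
EOF
touch whale.txt

# Variation 1: reference a file that does not exist
sed 's/whale\.txt/missing.txt/' inp-job.yml > inp-job2.yml

# Variation 2: give example_int a value that is not a number
sed 's/^example_int: .*/example_int: forty-two/' inp-job.yml > inp-job3.yml
```

cwltool is expected to reject both variations when they are passed to inp.cwl; comparing the two error messages shows at which stage each problem is detected.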

Challenge

  1. Try to capture the stdout of running the workflow
    1. Why does it not look as expected?

#4 Capturing output

Exercise

  1. Implement tar.cwl and tar-job.yml from example
  2. Create the tar file for testing
  3. Run the workflow
  4. Try using cwltool --outdir to output files in a subdirectory
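
Step 2 can be done in the shell. A minimal sketch, assuming the archive is called hello.tar and contains a single file hello.txt as in the User Guide example, and that tar.cwl names its input tarfile:

```shell
# Create a small file and pack it into hello.tar
echo "Hello world" > hello.txt
tar -cf hello.tar hello.txt

# Remove the original so the extracted copy is easy to spot later
rm hello.txt

# List the archive contents; prints: hello.txt
tar -tf hello.tar

# Job file pointing the "tarfile" input at the archive
cat > tar-job.yml <<'EOF'
tarfile:
  class: File
  path: hello.tar
EOF
```

For step 4, a command such as cwltool --outdir extracted tar.cwl tar-job.yml, where extracted is a hypothetical directory name, should place the output file in that subdirectory instead of the current directory.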

Challenge

  1. Capture stdout of running the workflow
    1. What does it show?
    2. Why do you think the content of hello.txt is not shown on stdout?

#5 Capturing stdout

Exercise

  1. Implement and run stdout.cwl
  2. Copy stdout.cwl to grep.cwl, and modify to run a command line like
    grep pattern /filename
  3. Make and run a new job file for grep.cwl to find the line "path: whale.txt" in inp-job.yml
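
It can help to try the target command line by hand before wrapping it in grep.cwl. The sketch below assumes inp-job.yml from step #3 is in the current directory (a three-line stand-in is written only if the file is missing) and uses a hypothetical output file name, grep-output.txt, to mimic what the stdout field of a CWL tool does.

```shell
# Stand-in for the step #3 job file, created only if it does not exist yet
[ -f inp-job.yml ] || printf 'example_file:\n  class: File\n  path: whale.txt\n' > inp-job.yml

# The command line grep.cwl should end up running, with stdout captured in a file
grep 'path: whale.txt' inp-job.yml > grep-output.txt

# Inspect the captured output
cat grep-output.txt
```

In grep.cwl the pattern and the file become two inputs with inputBinding positions 1 and 2, and the redirection is replaced by the stdout field.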

Break

 

Not tired?

You can either work on the earlier Challenges,
or follow step #0 to install locally

Continue the User Guide at your own pace from https://www.commonwl.org/user_guide/07-containers/ onwards.

 

Take particular notice of how to write a workflow and iterate with scattering

2020-04-29 Virtual Training CWL tutorial

By Stian Soiland-Reyes
