Stian Soiland-Reyes

eScience lab, The University of Manchester

@soilandreyes

https://orcid.org/0000-0001-9842-9718

https://slides.com/soilandreyes/2018-06-03-cwl

BioExcel/MolSSI workshop
2018-12-13, Barcelona

This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the European Union under contract H2020-EINFRA-2015-1-675728.

CWL considerations

- Quantify the domain where your workflow is applicable. Think of hard
metrics for the scale your workflow operates at, e.g. how many users
can it support simultaneously, what is the throughput of jobs, how large
are the calculations you can run individually, how many of those
calculations can you run in parallel, and could you run more calculations if
they were smaller?

- Tell the group which scientific areas would be interested in your
workflow. Are you better suited to methods developers or black box users?
How much overlap is there between people who develop on your workflow and
the end users?

Advantages

Interoperability: not married to one workflow system on one compute platform ... even Windows!

Seamless move from laptop to cluster, cloud, HPC

 

Can reuse workflow snippets and tools from GitHub

(often lacking: attribution, license)

 

Encourages best practice workflow design (reproducibility, annotation)

... which makes it harder to cheat/hack
(even JavaScript expressions are sandboxed and can only mutate a single field)

 

 

Disadvantages

Learning curve: Moving from procedural scripts to a "functional" dataflow paradigm

 

Many StackOverflow questions come down to learning common design patterns
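
To illustrate the shift in thinking, here is a minimal Python sketch of the dataflow idea. It is not CWL and not how any CWL engine is implemented; the step functions, the STEPS wiring table and the run() helper are all hypothetical names made up for this example. Each step is a pure function of its declared inputs, and execution order is derived from the wiring rather than from the order in which the script is written.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step is a pure function: it reads only its declared inputs and
# returns a value, instead of mutating shared variables or files in place.
def count_words(text):
    return len(text.split())

def to_upper(text):
    return text.upper()

def report(upper, n_words):
    return f"{upper} ({n_words} words)"

# Hypothetical wiring table: output name -> (function, names it consumes).
# A consumed name is either a workflow input or the output of another step.
STEPS = {
    "report": (report, ["upper", "n_words"]),  # listed first on purpose
    "n_words": (count_words, ["text"]),
    "upper": (to_upper, ["text"]),
}

def run(steps, inputs):
    """Run each step once all the values it depends on are available."""
    values = dict(inputs)
    graph = {name: set(sources) for name, (_, sources) in steps.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # a workflow input, nothing to compute
            continue
        func, sources = steps[name]
        values[name] = func(*(values[s] for s in sources))
    return values

result = run(STEPS, {"text": "hello common workflow language"})
print(result["report"])
```

Although "report" is listed first, it runs last because its inputs come from the other two steps. A recurring pattern behind many beginner questions is exactly this: steps are connected by named data, not by their position in the file.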

 

Syntax (CWL in YAML) was designed for interchange, not user editing

Users want more syntactic sugar (should not affect model)

--> Move to "compiler" paradigm

 

Error handling: Differences in engine implementations.

E.g. handling nulls, default values, fallback, cascading errors --> CWL v2.0

 

Implementation zoo: Varying degrees of complexity, usability, scalability

... how to pick a CWL engine for your compute needs?

Biggest challenges

- What problems with your software have your users explicitly
reported, e.g. in person, in GitHub Issues, in Slack
communications, or in questions at conferences?

- What are your sustainability plans? What is the "Bus Factor"
(https://en.wikipedia.org/wiki/Bus_factor) for your project? Does your
project have a designed termination?

- How do you promote yourself? What is your marketing strategy? What do your
users tell you about how they found you?

 

2018-12-13 CWL Considerations

By Stian Soiland-Reyes
