1998–2004

Trondheim: A lot of coffee, cables, coding, ...

LAN party, TG99

2000: ..a lot more Web!

Working door-to-door with groundbreaking Web start-ups

Linpro made open source Web solutions

Håkon Wium Lie, later CTO of Opera, invented CSS

Trolltech made Qt; the Qt-based KHTML engine became WebKit, ancestor of today's Safari, Chrome and Edge engines

Waldemar Thranes gate 98, Oslo

Web standards

2004–2006

Travelling and reproducibility

Trondheim → Birmingham → Manchester

As a computer science MSc student
I need to reproduce a neural network model of a brain structure

so that I can use it in a robot that can learn sequences to navigate a maze

AI before it became a buzzword (again)

Computational reproducibility (~2004)

Paper: No code, only equations.

Implementation details missing (e.g. noise)

Can you really trust a C++ reimplementation?

 

def calc_GP(self, i):
  # Activation of unit i: weighted input from STN units,
  # inhibition from the striatum (STR), plus uniform noise.
  total = 0.0  # avoid shadowing the built-in sum()
  for j in range(self.inputs * 2):
    total += self.w[i, j] * self.STN[j]
  noise = random.uniform(-0.25, 0.25)  # the detail the paper left out
  result = total - self.effect * self.STR[i] + noise
  return self.sigmoid(result)
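The unseeded noise call above is exactly the kind of omitted detail that breaks reproducibility: two runs of the same code give different numbers. A minimal sketch (with a hypothetical `noisy_activation` helper, not from the original model) of how seeding the random source restores bit-for-bit repeatability:

```python
import random

def noisy_activation(weighted_sum, seed=None):
    """Toy illustration: the same computation differs between runs
    unless the noise source is explicitly seeded."""
    rng = random.Random(seed)
    noise = rng.uniform(-0.25, 0.25)
    return weighted_sum + noise

# Seeded: two runs agree exactly, so results can be reproduced.
a = noisy_activation(1.0, seed=42)
b = noisy_activation(1.0, seed=42)
assert a == b
```

Recording the seed alongside the code is a cheap first step; a C++ reimplementation would still need the same noise distribution and range to match.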

First time travel

It is the view of this author that the computer science community has a tendency to prefer mathematics in papers even when a source code extract could provide a better explanation: more understandable, and known to be correct.

The lack of included source code leads to incomplete publications that can be hard or impossible to reproduce. This problem is, to a great extent, more present in computer science than in other sciences.

Computer Science is still behind other disciplines in thinking about reproducibility and data sharing

2006–

Time travels in Manchester

2008

Needed domain name, got a new surname

http://soiland-reyes.com/

Carole Goble, Rishi, Donal

Stuart

Paolo

Mannie

Sean

Alex

Rob

Antoon

Jits

Alan

Katy

Pinar

Matt

Alasdair

Robert

Finn

Khalid

..before it became the "mega lab"

Shoaib

Workflow systems

Can we capture and share the computational method?

Taverna (2003–2020)

Workflows combine computational tools almost like LEGO blocks
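The LEGO analogy can be sketched in plain Python (all step names here are hypothetical, for illustration only): each step is a self-contained brick, and the workflow engine's job is to wire outputs to inputs, then add scheduling, caching and provenance on top.

```python
# Hypothetical three-step pipeline: each function is one "brick".
def fetch(accession):    # step 1: obtain raw data for an identifier
    return f"raw({accession})"

def clean(raw):          # step 2: normalise the raw data
    return f"clean({raw})"

def analyse(data):       # step 3: produce a result
    return f"result({data})"

# The workflow is just the composition of the steps; real systems
# like Taverna or Galaxy record this graph so it can be shared,
# re-run and inspected.
value = "P12345"
for step in (fetch, clean, analyse):
    value = step(value)
print(value)  # result(clean(raw(P12345)))
```

Because each brick only sees its declared inputs, steps can be swapped or recombined without touching the others.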

Permalink: https://s.apache.org/existing-workflow-systems

Computational Data Analysis Workflow Systems

An incomplete list

Please add new entries at the bottom. Try to include links to website and/or source code and ideally DOI for publications.

In addition to this list, actively developed free/open-source systems should be registered at https://workflows.community/systems

See also: https://github.com/pditommaso/awesome-pipeline

  1. Arvados - CWL-based distributed computing platform for data analysis on massive data sets. https://arvados.org/ https://github.com/arvados/arvados
  2. Apache Taverna http://www.taverna.org.uk/ https://taverna.incubator.apache.org/
  3. Galaxy http://galaxyproject.org/
  4. SHIWA https://www.shiwa-workflow.eu/
  5. Apache Oozie https://oozie.apache.org/
  6. DNANexus https://wiki.dnanexus.com/API-Specification-v1.0.0/IO-and-Run-Specifications https://documentation.dnanexus.com/developer/api/running-analyses/io-and-run-specifications https://wiki.dnanexus.com/API-Specification-v1.0.0/Workflows-and-Analyses https://documentation.dnanexus.com/developer/workflows
  7. BioDT http://www.biodatomics.com/ archived at https://web.archive.org/web/20180609011656/http://www.biodatomics.com/
  8. Agave http://agaveapi.co/live-docs/ https://docs.agaveapi.com/
  9. DiscoveryEnvironment http://www.iplantcollaborative.org/ci/discovery-environment
  10. Wings http://www.wings-workflows.org/
  11. Knime https://www.knime.org/
  12. make, rake, drake, ant, scons & many others. Software development relies heavily on tools to manage workflows related to compiling and packaging applications. For the most part these are file based and usually run on a single node, usually supporting parallel steps (make -j) and in some cases able to dispatch build steps to other machines https://code.google.com/p/distcc/ https://github.com/Factual/drake
  13. Snakemake https://snakemake.github.io
  14. BPipe http://bpipe.org http://docs.bpipe.org/ https://doi.org/10.1093/bioinformatics/bts167
  15. Ruffus https://github.com/cgat-developers/ruffus
  16. NextFlow http://nextflow.io
  17. Luigi. Python package that helps you build complex pipelines of batch jobs http://github.com/spotify/luigi https://luigi.readthedocs.io
  18. SciLuigi. Helper library built on top of Luigi to ease development of Scientific workflows in Luigi: http://github.com/pharmbio/sciluigi
  19. Luigi Analysis Workflow (LAW) https://github.com/riga/law
  20. GATK Queue https://www.broadinstitute.org/gatk/guide/topic?name=queue
  21. Yabi https://ccg.murdoch.edu.au/yabi
  22. seqware Workflows are written in Java and executed using the Oozie Workflow Engine on Hadoop or SGE clusters. Uses Zip64 files to group the workflow definition file, workflow itself, sample settings, and data dependencies in a single file that can be exchanged between SeqWare users or archived. https://seqware.github.io/ https://seqware.github.io/docs/6-pipeline/
  23. Ketrew https://github.com/hammerlab/ketrew
  24. Pegasus http://pegasus.isi.edu/
  25. Apache Airflow https://github.com/apache/airflow
  26. Couler https://github.com/couler-proj/couler - Unified interface for constructing and managing workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
  27. Cosmos https://cosmos.hms.harvard.edu/documentation/index.html https://doi.org/10.1093/bioinformatics/btu385 Cosmos2: https://github.com/LPM-HMS/COSMOS2 http://cosmos.hms.harvard.edu/COSMOS2/
  28. Pinball https://github.com/pinterest/pinball
  29. bcbio https://bcbio-nextgen.readthedocs.org/en/latest/
  30. Chronos https://github.com/mesos/chronos
  31. Azkaban https://azkaban.github.io/
  32. Apache NiFi https://nifi.apache.org/docs/nifi-docs/html/overview.html
  33. flowr (R-based) http://docs.flowr.space/ https://github.com/sahilseth/flowr
  34. Mistral https://github.com/arteria-project https://wiki.openstack.org/wiki/Mistral#What_is_Mistral.3F https://docs.openstack.org/mistral/latest/user/wf_lang_v2.html
  35. nipype http://nipy.org/nipype/
  36. End of Day https://github.com/joestubbs/endofday
  37. BioDSL https://github.com/maasha/BioDSL
  38. bds (BigDataScript) https://pcingola.github.io/bds/ https://doi.org/10.1093/bioinformatics/btu595
  39. Omics Pipe: uses Ruffus http://sulab.scripps.edu/omicspipe/
  40. Ensembl Hive https://github.com/Ensembl/ensembl-hive
  41. QuickNGS http://bifacility.uni-koeln.de/quickngs/web
  42. GenePattern http://www.broadinstitute.org/cancer/software/genepattern/
  43. Chipster http://chipster.csc.fi/
  44. The Genome Modeling System https://github.com/genome/gms
  45. Cuneiform, A Functional Workflow Language https://github.com/joergen7/cuneiform http://www.cuneiform-lang.org/ https://doi.org/10.1017/S0956796817000119
  46. Anvaya https://doi.org/10.1142/s0219720012500060 http://webapp.cabgrid.res.in/biocomp/Anvaya/ANVAYA_Main.html#HOWTO_INSTALL_ANVAYA
  47. Makeflow http://ccl.cse.nd.edu/software/makeflow/
  48. Apache Airavata http://airavata.apache.org/
  49. Pyflow https://github.com/Illumina/pyflow
  50. Cluster Flow http://clusterflow.io
  51. Unipro UGENE http://ugene.net/ https://doi.org/10.7717/peerj.644
  52. CloudSlang http://www.cloudslang.io/
  53. Stacks http://catchenlab.life.illinois.edu/stacks/
  54. Leaf http://www.francesconapolitano.it/leaf/index.html
  55. omictools http://omictools.com/
  56. Job Description Language. The Job Description Language, JDL, is a high-level, user-oriented language based on Condor classified advertisements for describing jobs and aggregates of jobs such as Directed Acyclic Graphs and Collections. https://edms.cern.ch/ui/file/590869/1/WMS-JDL.pdf
  57. YAWL yet another workflow language https://doi.org/10.1016/j.is.2004.02.002 http://www.yawlfoundation.org/
  58. Triquetrum https://projects.eclipse.org/projects/technology.triquetrum https://github.com/eclipse/triquetrum/
  59. Kronos https://github.com/jtaghiyar/kronos
  60. qsubsec https://github.com/alastair-droop/qsubsec https://doi.org/10.1093/bioinformatics/btv698
  61. YesWorkflow http://yesworkflow.org
  62. gwf - Grid WorkFlow https://github.com/gwforg/gwf http://gwf.readthedocs.io/
  63. Fireworks https://github.com/materialsproject/fireworks https://doi.org/10.1002/cpe.3505
  64. NGLess: NGS with less work http://ngless.rtfd.io
  65. pypipegraph https://github.com/TyberiusPrime/pypipegraph
  66. Cromwell https://github.com/broadinstitute/cromwell
  67. Dagobah - Simple DAG-based job scheduler in Python. https://github.com/thieman/dagobah
  68. sushi https://github.com/uzh/sushi
  69. Clinical Trial Processor - A program for processing clinical trials data. http://mircwiki.rsna.org/index.php?title=MIRC_CTP
  70. Noodles http://nlesc.github.io/noodles/
  71. Swift & Swift/T http://swift-lang.org/main/ http://swift-lang.org/Swift-T/
  72. Consonance (runs SeqWare & CWL) https://github.com/Consonance/consonance/wiki
  73. Dog https://github.com/dogtools/dog
  74. Produce https://github.com/texttheater/produce
  75. LONI Pipeline http://pipeline.loni.usc.edu/
  76. Cpipe https://github.com/MelbourneGenomics/cpipe
  77. AWE https://github.com/MG-RAST/AWE
  78. (Py)COMPSs https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/
  79. KLIKO https://github.com/gijzelaerr/kliko
  80. SoS Workflow https://github.com/vatlab/SoS https://vatlab.github.io/sos-docs/ https://doi.org/10.1093/bioinformatics/bty405 https://doi.org/10.1371/journal.pcbi.1006843
  81. XNAT Pipeline Engine https://wiki.xnat.org/display/XNAT/Pipeline+Engine https://wiki.xnat.org/display/XNAT/XNAT+Pipeline+Development+Schema
  82. Metapipe https://github.com/TorkamaniLab/metapipe
  83. OCCAM (Open Curation for Computer Architecture Modeling) https://occam.cs.pitt.edu/
  84. Copernicus http://www.copernicus-computing.org
  85. iRODS Rule Language https://github.com/samuell/irods-cheatsheets/blob/master/irods-rule-lang-full-guide.md
  86. VisTrails https://www.vistrails.org
  87. Bionode Watermill https://github.com/bionode/bionode-watermill
  88. BIOVIA Pipeline Pilot Overview http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/
  89. Dagman A meta-scheduler for HTCondor https://research.cs.wisc.edu/htcondor/dagman/dagman.html
  90. UNICORE https://www.unicore.eu/docstore/workflow-7.6.0/workflow-manual.html#wf_dialect
  91. Toil (A scalable, efficient, cross-platform and easy-to-use workflow engine in pure Python) https://github.com/BD2KGenomics/toil
  92. Cylc (a workflow engine for cycling systems) https://cylc.github.io/
  93. Autodesk Cloud Compute Canon https://github.com/Autodesk/cloud-compute-cannon
  94. Civet https://github.com/TheJacksonLaboratory/civet
  95. Cumulus https://github.com/Kitware/cumulus
  96. High-performance integrated virtual environment (HIVE) https://hive.biochemistry.gwu.edu
  97. Cloudgene http://cloudgene.uibk.ac.at/cloudgene-yaml
  98. FASTR https://bitbucket.org/bigr_erasmusmc/fastr/ http://fastr.readthedocs.io/en/stable/
  99. BioMake https://github.com/evoldoers/biomake https://doi.org/10.1101/093245
  100. remake https://github.com/richfitz/remake
  101. SciFloware http://www-sop.inria.fr/members/Didier.Parigot/pmwiki/Scifloware/
  102. OpenAlea https://openalea.rtfd.io https://hal.archives-ouvertes.fr/hal-01166298/file/openalea-PradalCohen-Boulakia.pdf
  103. COMBUSTI/O https://github.com/jarlebass/combustio http://hdl.handle.net/10037/9361
  104. BioCloud https://github.com/ccwang002/biocloud-server-kai https://doi.org/10.6342/NTU201601295
  105. Triana http://www.trianacode.org/ (website seems to have been taken over by SEO)
  106. Kepler https://kepler-project.org/
  107. Anduril http://anduril.org/site/
  108. dgsh http://www.dmst.aueb.gr/dds/sw/dgsh/
  109. EDGE bioinformatics: Empowering the Development of Genomics Expertise https://bioedge.lanl.gov/edge_ui/ http://edge.readthedocs.io/ https://lanl-bioinformatics.github.io/EDGE/
  110. Pachyderm http://pachyderm.io/ http://pachyderm.readthedocs.io/en/stable/advanced/advanced.html
  111. Digdag https://www.digdag.io/
  112. Agua / Automated Genomics Utilities Agent http://aguadev.org
  113. BioDepot Workflow Builder (BwB) https://github.com/BioDepot/BioDepot-workflow-builder https://doi.org/10.1101/099010
  114. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses http://r3lab.uni.lu/web/imp/ https://doi.org/10.1186/s13059-016-1116-8
  115. Butler https://github.com/llevar/butler
  116. adage / yadage https://github.com/diana-hep/adage https://github.com/diana-hep/yadage
  117. HI-WAY: Execution of Scientific Workflows on Hadoop YARN https://github.com/marcbux/Hi-WAY https://openproceedings.org/2017/conf/edbt/paper-248.pdf
  118. OpenMOLE https://github.com/openmole/openmole https://www.openmole.org/ https://doi.org/10.3389/fninf.2017.00021
  119. Biopet https://github.com/biopet/biopet
  120. Nephele https://nephele.niaid.nih.gov/
  121. TOPPAS https://doi.org/10.1021/pr300187f
  122. SBpipe https://pdp10.github.io/sbpipe/ https://github.com/pdp10/sbpipe https://doi.org/10.1186/s12918-017-0423-3
  123. Dray http://dray.it/ (website seems to have been taken over by SEO)
  124. GenomeVIP https://github.com/ding-lab/GenomeVIP https://doi.org/10.1101/gr.211656.116
  125. GridSAM https://sourceforge.net/projects/gridsam/
  126. Roddy https://github.com/eilslabs/Roddy
  127. SciFlo (historical; doesn't seem to be maintained anymore) https://web.archive.org/web/20161118011409/https://sciflo.jpl.nasa.gov/SciFloWiki/FrontPage
  128. GNU Guix Workflow Language https://git.roelj.com/guix/gwl.git#gnu-guix-workflow-language-extension https://github.com/UMCUGenetics/guix-workflows/blob/master/umcu/workflows/rnaseq.scm
  129. Porcupine https://timvanmourik.github.io/Porcupine/
  130. Parsl (a Parallel Scripting Library for Python) http://parsl-project.org https://doi.org/10.1145/3307681.3325400
  131. ECFLOW (Workflow primarily for Meteorological Applications) https://software.ecmwf.int/wiki/display/ECFLOW/ecflow+home
  132. Ophidia http://ophidia.cmcc.it/
  133. WebLicht https://weblicht.sfs.uni-tuebingen.de/
  134. GATE Cloud https://cloud.gate.ac.uk/
  135. SCIPION http://scipion.cnb.csic.es/m/home/ https://github.com/I2PC/scipion/wiki/Creating-a-Protocol
  136. Ergatis http://ergatis.sourceforge.net/
  137. TIGR "Workflow" https://sourceforge.net/projects/tigr-workflow/ http://tigr-workflow.sourceforge.net/
  138. Archivematica https://wiki.archivematica.org/Main_Page (A preservation workflow system that implements the ISO-OAIS standard using gearman/MCP)
  139. Martian http://martian-lang.org/about/
  140. BioMAJ http://genouest.github.io/biomaj/
  141. Conveyor http://conveyor.cebitec.uni-bielefeld.de (retired). https://doi.org/10.1093/bioinformatics/btr040
  142. Biopipe http://www.biopipe.org (appears to be defunct) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403782/
  143. Wildfire http://wildfire.bii.a-star.edu.sg/ https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-69
  144. BioWBI https://bioinformatics.hsanmartino.it/bits_library/library/00079.pdf
  145. BioWMS https://bioinformatics.hsanmartino.it/bits_library/library/00568.pdf
  146. BioMoby http://biomoby.open-bio.org/ https://doi.org/10.1186/1471-2105-7-523
  147. SIBIOS http://ieeexplore.ieee.org/document/1309094/
  148. NGSANE https://github.com/BauerLab/ngsane https://doi.org/10.1093/bioinformatics/btu036
  149. Pwrake https://github.com/misshie/Workflows https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180464/
  150. Nesoni https://github.com/Victorian-Bioinformatics-Consortium/nesoni
  151. Skam http://skam.sourceforge.net/skam-intro.html
  152. TREVA http://bioinformatics.petermac.org/treva/ https://doi.org/10.1371/journal.pone.0095217
  153. EGene https://www.semanticscholar.org/paper/EGene-a-configurable-pipeline-generation-system-fo-Durham-Kashiwabara/4c0656195b5efcdd3aa7bdcb55fc95a957c150aa https://doi.org/10.1093/bioinformatics/btu366
  154. WEP https://bioinformatics.cineca.it/wep/ https://doi.org/10.1186/1471-2105-14-S7-S11
  155. Microbase http://www.microbasecloud.com/
  156. e-Science Central http://www.esciencecentral.co.uk/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538293/
  157. Cyrille2 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-96
  158. PaPy https://code.google.com/archive/p/papy/ https://doi.org/10.1186/1471-2105-12-61
  159. JobCenter https://github.com/yeastrc/jobcenter https://doi.org/10.1186/1751-0473-7-8
  160. CoreFlow https://doi.org/10.1016/j.jprot.2014.01.023
  161. dynamic-pipeline https://code.google.com/archive/p/dynamic-pipeline/
  162. XiP http://xip.hgc.jp/wiki/en/Main_Page https://doi.org/10.1093/bioinformatics/bts630
  163. Eoulsan http://www.outils.genomique.biologie.ens.fr/eoulsan/ https://doi.org/10.1093/bioinformatics/bts165
  164. CloudDOE http://clouddoe.iis.sinica.edu.tw/
  165. BioPig https://github.com/JGI-Bioinformatics/biopig https://doi.org/10.1093/bioinformatics/btt528
  166. SeqPig https://github.com/HadoopGenomics/SeqPig https://doi.org/10.1093/bioinformatics/btt601
  167. zymake http://www-personal.umich.edu/~ebreck/code/zymake/
  168. JMS https://github.com/RUBi-ZA/JMS https://doi.org/10.1371/journal.pone.0134273
  169. CLC Genomics Workbench https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/
  170. NG6 http://ng6.toulouse.inra.fr/ https://doi.org/10.1186/1471-2164-13-462
  171. VIBE http://www.incogen.com/vibe/
  172. WDL (Workflow Description Language) https://github.com/broadinstitute/wdl
  173. SciFlow https://github.com/kaizhang/SciFlow (not to be confused with SciFloware and SciFlo).
  174. Bioshake https://github.com/PapenfussLab/bioshake https://doi.org/10.7717/peerj.7223
  175. SciPipe http://scipipe.org
  176. Kapacitor / TICKscripts https://docs.influxdata.com/kapacitor/v1.3/tick/
  177. AiiDA: Automated Interactive Infrastructure and Database for Computational Science http://www.aiida.net/ https://doi.org/10.1016/j.commatsci.2015.09.013
  178. Reflow: a language and runtime for distributed, integrated data processing in the cloud https://github.com/grailbio/reflow
  179. Resolwe: an open source dataflow package for Django framework https://github.com/genialis/resolwe
  180. Yahoo! Pipes (historical) https://en.wikipedia.org/wiki/Yahoo!_Pipes
  181. Walrus https://github.com/fjukstad/walrus
  182. Apache Beam https://beam.apache.org/
  183. CLOSHA https://closha.kobic.re.kr/ https://www.bioexpress.re.kr/go_tutorial http://docplayer.net/19700397-Closha-manual-ver1-1-kobic-korean-bioinformation-center-kogun82-kribb-re-kr-2016-05-08-bioinformatics-workflow-management-system-in-bio-express.html https://doi.org/10.1186/s12859-018-2019-3
  184. WopMars https://github.com/aitgon/wopmars http://wopmars.readthedocs.io/
  185. flowing-clj https://github.com/stain/flowing-clj
  186. Plumbing and Graph https://github.com/plumatic/plumbing
  187. LabView http://www.ni.com/en-us/shop/labview.html
  188. MyOpenLab http://myopenlab.org/
  189. Max/MSP https://cycling74.com/products/max/
  190. NoFlo https://noflojs.org/
  191. Flowstone http://www.dsprobotics.com/flowstone.html
  192. HyperLoom https://code.it4i.cz/ADAS/loom
  193. Dask http://dask.pydata.org/en/latest/ https://github.com/dask/dask
  194. Stimela https://github.com/SpheMakh/Stimela https://github.com/SpheMakh/Stimela/wiki https://www.acru.ukzn.ac.za/~cosmosafari2017/wp-content/uploads/2017/02/makhathini.pdf
  195. JTracker https://jtracker.io/ https://github.com/jtracker-io
  196. PipelineDog http://pipeline.dog/ https://github.com/zhouanbo/pipelinedog https://doi.org/10.1093/bioinformatics/btx759
  197. DALiuGE https://arxiv.org/abs/1702.07617 https://github.com/ICRAR/daliuge https://daliuge.readthedocs.io/
  198. Overseer https://github.com/framed-data/overseer
  199. Squonk https://squonk.it/
  200. GC3Pie https://github.com/uzh/gc3pie
  201. Fractalide https://github.com/fractalide/fractalide
  202. TOGGLe http://toggle.southgreen.fr/ https://doi.org/10.1101/245480
  203. Askalon http://www.askalon.org
  204. Eclipse ICE (The Integrated Computational Environment) https://www.eclipse.org/ice
  205. Sandia Analysis Workbench (SAW) http://www.sandia.gov/saw/
  206. dispel4py https://github.com/dispel4py/dispel4py
  207. Jobber https://pypi.python.org/pypi/Jobber/0.1.4
  208. NeatSeq-Flow http://neatseq-flow.readthedocs.io/
  209. S4M https://bitbucket.org/uqokorn/s4m_base/wiki/Home
  210. Loom http://med.stanford.edu/gbsc/loom.html https://github.com/StanfordBioinformatics/loom http://loom.readthedocs.io/en/latest/templates.html
  211. Watchdog - XML Schema-based bioinformatics workflow system https://www.bio.ifi.lmu.de/software/watchdog/ https://doi.org/10.1186/s12859-018-2107-4 https://github.com/klugem/watchdog
  212. phpflo https://github.com/phpflo/phpflo
  213. BASTet: Berkeley Analysis and Storage Toolkit https://openmsi.nersc.gov/openmsi/client/bastet.html https://biorack.github.io/BASTet/ https://doi.org/10.1109/TVCG.2017.2744479
  214. Tavaxy: Pattern based workflow system for the bioinformatics domain http://www.tavaxy.org/
  215. Ginflow: Decentralised adaptive workflow engine https://ginflow.inria.fr/
  216. SciApps: A cloud-based platform for reproducible bioinformatics workflows https://doi.org/10.1093/bioinformatics/bty439 https://www.sciapps.org/
  217. Stoa: Script Tracking for Observational Astronomy https://github.com/petehague/Stoa
  218. Collective Knowledge (CK) framework http://cknowledge.org/
  219. QosCosGrid (QCG) http://www.qoscosgrid.org/ http://www.qoscosgrid.org/trac/qcg-broker/wiki/qcg-advanced-client%20
  220. High-Throughput Binding Affinity Calculator (HTBAC) https://htbac.readthedocs.io/en/latest/ https://github.com/radical-cybertools/htbac https://arxiv.org/abs/1801.01174
  221. BioWorkbench (Swift-based) https://arxiv.org/abs/1801.03915 https://github.com/mmondelli/bioworkbench
  222. ENVI Task Engine https://gbdxdocs.digitalglobe.com/docs/envi-task-engine https://www.harrisgeospatial.com/Learn/Whitepapers/TabId/2359/ArtMID/10212/ArticleID/17299/Workflow-Tools-in-ENVI.aspx https://envi-py-engine.readthedocs.io/en/latest/index.html
  223. Pypeline https://github.com/cgarciae/pypeln
  224. mpipe http://vmlaker.github.io/mpipe/
  225. idseq-dag https://github.com/chanzuckerberg/idseq-dag
  226. Piper (based upon GATK Queue) https://github.com/NationalGenomicsInfrastructure/piper
  227. Apache Object Oriented Data Technology (OODT) http://oodt.apache.org/
  228. JX Workflow (DSL for Makeflow) https://ccl.cse.nd.edu/software/manuals/jx-quick.html http://ccl.cse.nd.edu/research/papers/jx-escience-2018.pdf
  229. The Adaptable IO System (ADIOS); applications that use ADIOS can be orchestrated into a workflow http://csmd.ornl.gov/adios
  230. GenPipes https://bitbucket.org/mugqic/genpipes https://doi.org/10.1101/459552
  231. Argo https://argoproj.github.io/ https://github.com/argoproj/argo https://github.com/argoproj/argo/blob/master/examples/README.md
  232. Reana https://reana.readthedocs.io/en/latest/ https://github.com/reanahub/reana
  233. Cuisine Framework https://www.astron.nl/~renting/cuisine.html
  234. Niassa https://github.com/oicr-gsi/niassa https://oicr-gsi.github.io/niassa-docs/
  235. pypeFLOW https://github.com/PacificBiosciences/pypeFLOW
  236. Tiny Cloud Engine http://ka.cb.k.u-tokyo.ac.jp/tce/
  237. Xbowflow https://github.com/ChrisSuess/Project-Xbow/tree/master/xbowflow
  238. AdaptiveMd https://github.com/markovmodel/adaptivemd
  239. Meshroom https://github.com/alicevision/meshroom
  240. LSST Data Management https://github.com/lsst/pipe_base
  241. CGAT-core https://github.com/cgat-developers/cgat-core
  242. Prefect https://docs.prefect.io/
  243. Apache SCXML engine https://commons.apache.org/proper/commons-scxml/guide/core-engine.html https://commons.apache.org/proper/commons-scxml/guide/scxml-documents.html
  244. IceProd https://github.com/WIPACrepo/iceprod
  245. AnADAMA2 http://huttenhower.sph.harvard.edu/anadama2
  246. Luna https://luna-lang.org/
  247. Passerelle https://code.google.com/archive/a/eclipselabs.org/p/passerelle
  248. Kurator-Akka https://github.com/kurator-org/kurator-akka
  249. Jug https://doi.org/10.5334/jors.161
  250. Node-RED https://nodered.org/
  251. Databolt Flow https://github.com/d6t/d6tflow
  252. Frictionless Data Package Pipelines https://github.com/frictionlessdata/datapackage-pipelines
  253. DataFlows https://github.com/datahq/dataflows
  254. Volcano https://github.com/volcano-sh/volcano
  255. DataJoint https://datajoint.io/
  256. DIRAC3 (LHCb grid software) https://doi.org/10.1088/1742-6596/219/6/062029
  257. Orange https://orange.biolab.si/ http://jmlr.org/papers/v14/demsar13a.html
  258. Ensemble Toolkit (EnTK) https://radicalentk.readthedocs.io/en/latest/entk.html https://arxiv.org/abs/1602.00678v2
  259. BioQueue http://www.bioqueue.org/ https://github.com/liyao001/BioQueue https://doi.org/10.1093/bioinformatics/btx403
  260. mlr3pipelines https://mlr3pipelines.mlr-org.com/
  261. Kedro - Production-Ready Data & ML Pipelines https://github.com/quantumblacklabs/kedro
  262. DATAVIEW - DATAVIEW is a big data workflow management system. https://github.com/shiyonglu/DATAVIEW
  263. SecDATAVIEW - SecDATAVIEW is a security oriented big data workflow management system compatible with the heterogeneous computing environments. https://github.com/shiyonglu/SecDATAVIEW
  264. Giraffe Tools https://giraffe.tools/
  265. CERAMICCA - Cloud Engine Resource for Accelerated Medical Image Computing for Clinical Applications: https://ceramicca.ensc.sfu.ca
  266. uap - Universal Analysis Pipeline. A workflow management system dedicated to robust, consistent, and reproducible HTS data analysis. https://github.com/yigbt/uap
  267. signac https://signac.io/ https://github.com/glotzerlab/signac-flow
  268. cwltool - reference implementation of Common Workflow Language (CWL) https://github.com/common-workflow-language/cwltool/
  269. CWLEXEC - CWL executor for IBM Spectrum LSF clusters https://github.com/IBMSpectrumComputing/cwlexec
  270. drmr https://github.com/ParkerLab/drmr/ https://drmr.readthedocs.io/
  271. Autosubmit - a versatile tool to manage Weather and Climate Experiments in diverse Supercomputing Environments. https://www.bsc.es/research-and-development/software-and-apps/software-list/autosubmit https://autosubmit.readthedocs.io/en/latest/
  272. JUDI https://doi.org/10.1093/bioinformatics/btz956 https://github.com/ncbi/JUDI
  273. Sumatra - a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research https://neuralensemble.org/sumatra/
  274. Netflix Conductor: https://netflix.github.io/conductor/
  275. Pipengine https://doi.org/10.21105/joss.00341 https://github.com/fstrozzi/bioruby-pipengine
  276. MyQueue https://doi.org/10.21105/joss.01844 https://gitlab.com/myqueue/myqueue
  277. Drake R package https://doi.org/10.21105/joss.00550 https://github.com/ropensci/drake (superseded by targets)
  278. MaDaTS https://doi.org/10.21105/joss.00830 https://github.com/dghoshal-lbl/madats
  279. Cadence https://cadenceworkflow.io/
  280. Merlin https://merlin.readthedocs.io/
  281. Janis https://janis.readthedocs.io/en/latest/index.html
  282. AlphaSQL https://github.com/Matts966/alphasql
  283. zeebe https://github.com/zeebe-io/zeebe
  284. durabletask https://github.com/Azure/durabletask
  285. Illumina State Language https://iap-docs.readme.io/docs/wes_illumina-states-language
  286. Moteur http://www.i3s.unice.fr/~johan/publis/MOTEUR-poster-A4.pdf https://indico.cern.ch/event/286/contributions/654974/attachments/530521/731563/userForumEGEE_MOTEUR.pdf
  287. SimStack http://www.simstack.de/
  288. Maestro Workflow Conductor https://github.com/LLNL/maestrowf
  289. HyWare https://doi.org/10.1007/s41060-020-00237-x
  290. HyperFlow https://github.com/hyperflow-wms/
  291. BRANE Framework https://onnovalkering.github.io/brane/
  292. ApolloWF https://apollowf.github.io/
  293. IS-EPOS Platform https://ieeexplore.ieee.org/document/9308147 https://tcs.ah-epos.eu/
  294. pyinvoke: Pythonic task management & command execution, uses python as workflow language. http://www.pyinvoke.org/
  295. targets R package https://cran.r-project.org/package=targets https://doi.org/10.21105/joss.02959
  296. Compi https://doi.org/10.7717/peerj-cs.593 https://github.com/sing-group/compi https://www.sing-group.org/compi
  297. TriggerFlow: Event-based Orchestration of Serverless Workflows https://github.com/triggerflow/triggerflow
  298. Google Cloud Workflows: Orchestrate and automate Google Cloud and HTTP-based API services with serverless workflows. https://cloud.google.com/workflows/docs
  299. PanDA Workflow Management System: https://doi.org/10.1051/epjconf/201921403050 https://doi.org/10.1051/epjconf/201610801003 https://github.com/PanDAWMS
  300. Harvester: https://github.com/HSF/harvester
  301. BD-Processor https://github.com/big-data-processor/bd-processor/
  302. redun (yet another redundant workflow engine) https://github.com/insitro/redun
  303. pyiron (The materials science IDE) https://pyiron.org/
  304. looper (pipeline submitting engine) https://github.com/pepkit/looper
  305. dagster (Python based data orchestration platform) https://dagster.io/
  306. StackStorm (Devops automation engine) https://stackstorm.com/
  307. Geoweaver (compose and execute full-stack deep learning workflows) https://esipfed.github.io/Geoweaver/
  308. Popper: Container-native task automation engine https://github.com/getpopper/popper
  309. Cloud Build: Build, test, and deploy on our serverless CI/CD platform https://cloud.google.com/build
  310. Task/Taskfile: A task runner / simpler Make alternative written in Go https://github.com/go-task/task https://taskfile.dev/
  311. pypyr: task runner for automation pipelines; script sequential workflow steps in YAML, with conditional execution, loops, error handling & retries https://pypyr.io/
  312. SimTool/Sim2Ls: Jupyter notebook-based pipelines of Simulation Tools for the HUBzero platform led by nanoHUB https://github.com/hubzero/simtool https://simtool.readthedocs.io/ https://doi.org/10.1371/journal.pone.0264492
  313. SideIO: A Side I/O system framework for hybrid scientific workflow (no project/source code available) https://doi.org/10.1016/j.jpdc.2016.07.001
  314. Flyte https://flyte.org/
  315. StreamFlow https://streamflow.di.unito.it/
  316. Jupyter Workflow https://jupyter-workflow.di.unito.it/
  317. Nnodes: a simple workflow manager for Python functions and command line tools https://github.com/icui/nnodes
  318. Orchest: A GUI for developing, running and managing container workflows https://github.com/orchest/orchest
  319. Wasmflow: platform for building applications out of WebAssembly code containers https://github.com/wasmflow/wasmflow https://wasmflow.com/
  320. HyperShell: cross-platform, high-performance computing utility for processing shell commands over a distributed, asynchronous queue. https://doi.org/10.1145/3491418.3535138 https://github.com/glentner/hyper-shell
  321. Covalent: Pythonic distributed workflow tool used to prototype and run high performance classical and quantum software https://github.com/AgnostiqHQ/covalent
  322. Icolos: workflow manager for structure-based workflows in computational chemistry https://github.com/MolecularAI/Icolos https://doi.org/10.26434/chemrxiv-2022-vqbxg
  323. dwork: Task graph scheduler with a minimalistic API. https://github.com/frobnitzem/dwork
  324. pmake: parallel make developed for use within batch jobs https://docs.olcf.ornl.gov/software/workflows/pmake.html#workflows-pmake
  325. Texera: Collaborative Data Analytics Using Workflows https://github.com/Texera/texera
  326. swif2: The Scientific Workflow Indefatigable Factotum https://scicomp.jlab.org/docs/swif2
  327. Jobflow: a library for writing computational workflows https://github.com/materialsproject/jobflow
  328. Balsam: High throughput workflows and automation for HPC https://github.com/argonne-lcf/balsam
  329. TaskVine: Dynamic workflows that run on HPC clusters, GPU clusters, and commercial clouds https://ccl.cse.nd.edu/software/taskvine/
  330. Globus Compute: High Performance Function Serving for Science https://github.com/funcx-faas/funcX
  331. mkite: Distributed computing for high-throughput materials simulation https://github.com/mkite-group
  332. hpcflow: Automated simulation, processing and archiving workflows on HPC systems https://github.com/hpcflow/hpcflow-new https://doi.org/10.12688/materialsopenres.17516.1
  333. Data Version Control: DVC data pipelines as "Makefile" system for machine learning projects https://dvc.org/doc/start/data-management/data-pipelines
  334. ZnFlow: General purpose framework for computational graphs https://github.com/zincware/ZnFlow
  335. Globus Flows: Globus automation services provide secure and reliable task orchestration across a set of heterogeneous resources at scale. https://docs.globus.org/api/flows/
  336. Kestra: an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines. https://github.com/kestra-io/kestra
  337. Celery: Distributed Task Queue. https://github.com/celery/celery
  338. dflow: a Python framework for constructing scientific computing workflows (e.g. concurrent learning workflows), employing Argo Workflows as the workflow engine. https://github.com/deepmodeling/dflow
  339. pydra: A simple dataflow engine with scalable semantics. https://github.com/nipype/pydra
  340. funsies: A python library and execution engine to build reproducible, fault-tolerant, distributed and composable computational workflows. https://github.com/aspuru-guzik-group/funsies
  341. Maize: A graph-based workflow manager for computational chemistry pipelines. https://github.com/MolecularAI/maize
  342. doit: task management & automation tool. https://github.com/pydoit/doit
  343. drama: asynchronous workflow executor engine based on Dramatiq and Apache Kafka https://github.com/KhaosResearch/drama
  344. Titan: Semantic scientific workflow platform based on Drama, Apache Kafka, Avro and Spark https://titan.khaos.uma.es/
  345. disBatch: Tool to distribute a list of computational tasks over a pool of compute resources. The pool can grow or shrink. https://github.com/flatironinstitute/disBatch
  346. dawgz: Directed Acyclic Workflow Graph Scheduling. https://github.com/francois-rozet/dawgz
  347. hummingqueue: Single tenant distributed scheduling. https://github.com/NablaChem/hummingqueue
  348. Ewoks: Python-based meta workflow system for large-scale facilities, with multiple execution bindings https://ewoks.esrf.fr/ https://ewoks.readthedocs.io/
  349. DAGonStar: simple Python-based workflow engine that runs jobs on everything from the local machine to distributed virtual HPC clusters hosted in private and public clouds. https://github.com/DagOnStar/dagonstar
  350. MaRDIFlow: automation of abstracting meta-data embedded in an ontology of mathematical objects https://doi.org/10.48550/arXiv.2405.00028 https://doi.org/10.5281/zenodo.7863519
  351. Kerblam: Rust-based pipeline system for scientific data analysis https://www.kerblam.dev/
  352. PlantCV: Python-based imaging processing pipeline system for plant phenotyping https://plantcv.readthedocs.io/ https://doi.org/10.1016/j.molp.2015.06.005
  353. Maestro: Netflix’s Workflow Orchestrator https://github.com/Netflix/maestro
  354. BEE: Build and Execution Environment https://github.com/lanl/BEE
  355. pipefunc: Lightweight function pipeline (DAG) creation in pure Python for scientific workflows with local and SLURM parallelization and QoL tools https://github.com/pipefunc/pipefunc
  356. LEXIS: orchestration of complex scientific and industrial workflows which can be easily run on distributed HPC and Cloud resources https://lexis-project.eu/web/lexis-platform/
  357. Perqueue: persistent queue. https://gitlab.com/asm-dtu/perqueue
  358. Playbook Workflow Builder: Interactive Construction of Bioinformatics Workflows from a Network of Microservices https://playbook-workflow-builder.cloud/ https://github.com/MaayanLab/Playbook-Workflow-Builder https://doi.org/10.1101/2024.06.08.598037

2014: CWL idea at Bioinformatics Open Source Conference

 

2025: ~14 implementations

cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
  ex: string

outputs:
  classout:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]

  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]

CWL is a YAML-based standard for workflows

 

Reusable tool definitions

 

Executed in Docker containers

 

Explicit inputs and outputs - can be rewired
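A reusable tool definition such as the `tar-param.cwl` referenced by the `untar` step could look roughly like this (a sketch following the CWL user guide's tar example; exact bindings assumed):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [tar, --extract]
inputs:
  tarfile:
    type: File
    inputBinding:
      prefix: --file
  extractfile:
    type: string
    inputBinding:
      position: 1
outputs:
  example_out:
    type: File
    outputBinding:
      glob: $(inputs.extractfile)
```

Because the tool declares its inputs and outputs explicitly, the same definition can be rewired into other workflows unchanged.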

Workflow provenance

Recording computational processes

PROV: Standard for provenance

Workflow description

Execution provenance

Time travel:
Common Workflow Language modelled on wfdesc, while CWL provenance came later

Identified the need for multiple implementation levels

Which PROV format?

<prov:wasGeneratedBy>
  <prov:entity prov:ref="ex:ent1"/>
  <prov:activity prov:ref="ex:act1"/>
  <prov:time>2017-10-26T21:32:52Z</prov:time>
  <ex:port>p1</ex:port>
</prov:wasGeneratedBy>
wasGeneratedBy(ent1, act1,
  2017-10-26T21:32:52Z, [ex:port="p1"])
:ent1
  a prov:Entity;
  prov:wasGeneratedBy :act1;
  prov:generatedAtTime "2017-10-26T21:32:52Z"^^xsd:dateTime ;
  ex:port "p1" .
    "wasGeneratedBy": {
        "ex:gen1": {
            "prov:entity": "ent1",
            "prov:activity": "act1",
            "prov:time": "2017-10-26T21:32:52Z",
            "ex:port": "p1"
        }
    }
{ "@context": { .. }, 
  "@id": "ent1",
  "@type": "prov:Entity",
  "ex:port": "p1",
  "prov:generatedAtTime":  "2017-10-26T21:32:52Z",
  "prov:wasGeneratedBy": {
    "@id": "act1",
    "@type": "prov:Activity"
  } 
}

PROV-N

PROV-XML

PROV-JSON

PROV-O Turtle

PROV-O JSON-LD
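Whichever serialization is chosen, the statement stays machine-processable. The PROV-JSON variant, for instance, is just a nested data structure; a minimal sketch with Python's standard json module, mirroring the example above:

```python
import json

# The same wasGeneratedBy statement as a plain Python structure,
# following the PROV-JSON shape shown above.
doc = {
    "wasGeneratedBy": {
        "ex:gen1": {
            "prov:entity": "ent1",
            "prov:activity": "act1",
            "prov:time": "2017-10-26T21:32:52Z",
            "ex:port": "p1",
        }
    }
}

# Serialize and parse back: the structure round-trips losslessly.
serialized = json.dumps(doc, indent=2)
roundtrip = json.loads(serialized)
```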

Linked Data

<Alice> <knows> <Bob> .

A triple expresses a relationship between two named resources.

<Alice>   <knows> <Bob> .
<Alice>   <knows> <Charlie> .
<Alice>   <plays> <Tennis> .
<Bob>     <knows> <Charlie> .
<Bob>     <plays> <Football> .
<Charlie> <plays> <Tennis> .

Multiple triples with shared resources form a graph

<http://example.com/Alice>   <knows> <http://example.com/Bob> .
<http://example.com/Alice>   <knows> <http://example.com/Charlie> .
<http://example.com/Alice>   <plays> <http://dbpedia.org/resource/Tennis> .
<http://example.com/Bob>     <knows> <http://example.com/Charlie> .
<http://example.com/Bob>     <plays> <http://dbpedia.org/resource/Football> .
<http://example.com/Charlie> <plays> <http://dbpedia.org/resource/Tennis> .

URLs on the Web can serve as global identifiers
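An RDF-style graph needs nothing more than a set of subject–predicate–object tuples; shared resources connect the statements into a graph that can be queried from either end. A minimal sketch in plain Python, using the triples above:

```python
# A graph is just a set of (subject, predicate, object) triples.
triples = {
    ("http://example.com/Alice", "knows", "http://example.com/Bob"),
    ("http://example.com/Alice", "knows", "http://example.com/Charlie"),
    ("http://example.com/Alice", "plays", "http://dbpedia.org/resource/Tennis"),
    ("http://example.com/Bob", "knows", "http://example.com/Charlie"),
    ("http://example.com/Bob", "plays", "http://dbpedia.org/resource/Football"),
    ("http://example.com/Charlie", "plays", "http://dbpedia.org/resource/Tennis"),
}

def objects(subject, predicate):
    """All objects o where (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    """All subjects s where (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}

tennis_players = subjects("plays", "http://dbpedia.org/resource/Tennis")
```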

<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Leadóg"@ga .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@sv .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@es .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "テニス"@ja .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@cs .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "网球"@zh .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Теннис"@ru .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@pl .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@de .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Teniso"@eo .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@en .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@eu .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tennis is a racket sport that is played either individually against a single opponent (singles) or between two teams of two players each (doubles). Each player uses a tennis racket that is strung with cord to strike a hollow rubber ball covered with felt over or around a net and into the opponent's court. The object of the game is to manoeuvre the ball in such a way that the opponent is not able to play a valid return. The player who is unable to return the ball validly will not gain a point, while the opposite player will. Tennis is an Olympic sport and is played at all levels of society and at all ages. The sport can be played by anyone who can hold a racket, including wheelchair users. The modern game of tennis originated in Birmingham, England, in the late 19th century as lawn tennis. It had close connections both to various field (lawn) games such as croquet and bowls as well as to the older racket sport today called real tennis. The rules of modern tennis have changed little since the 1890s. Two exceptions are that until 1961 the server had to keep one foot on the ground at all times, and the adoption of the tiebreak in the 1970s. A recent addition to professional tennis has been the adoption of electronic review technology coupled with a point-challenge system, which allows a player to contest the line call of a point, a system known as Hawk-Eye. Tennis is played by millions of recreational players and is a popular worldwide spectator sport. The four Grand Slam tournaments (also referred to as the majors) are especially popular: the Australian Open, played on hardcourts; the French Open, played on red clay courts; Wimbledon, played on grass courts; and the US Open, also played on hardcourts."@en .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Ténis (português europeu) ou tênis (português brasileiro) é um esporte de origem inglesa, disputado em quadras geralmente abertas e de superfícies sintéticas, cimento, saibro ou relva. Participam no jogo dois oponentes ou duas duplas de oponentes, podendo ser mistas (homens e mulheres) ou não. A quadra é dividida em duas meia-quadras por uma rede, e o objetivo do jogo é rebater uma pequena bola para além da rede (para a meia-quadra adversária) com ajuda de uma raquete. Para marcar um ponto é preciso que a bola toque no solo em qualquer parte dentro da quadra adversária incluindo as linhas que demarcam o campo do oponente, fazendo com que o adversário não consiga devolver a bola antes do segundo toque, ou que a devolva para fora dos limites da outra meia-quadra. O desporto assim possui aspectos de ataque (rebater bem a bola, dificultando a devolução do adversário) e defesa (bom posicionamento em quadra, antecipação do lance adversário etc). O tênis possui um intricado sistema de pontuação, que subdivide o jogo em games/jogospt e sets/partidas pt. Grosso modo, um game é um conjunto de pontos (15-30-40-game) e um set é um conjunto de games (1-2-3-4-5-set). Cada game tem um jogador responsável por recolocar a bola em jogo: fazer o serviço ou sacar. No tênis de competição, é comum que o jogador que serve fature o game, já que tem a vantagem do ataque e dita o ritmo do jogo. Desta forma, uma das estratégias de jogo é tentar inverter esta vantagem durante a troca de bola ou durante a defesa fazer com que o adversário, através de erros, perca os games em que está sacando. Ganha o jogobr/encontropt aquele que atingir um número de sets pré-definido — geralmente 2 sets, sendo de 3 sets para os grandes torneios masculinos."@pt .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Il tennis, il cui nome può essere tradotto in italiano con un termine arcaico non più in uso di “schiaffo palla” e storicamente noto col nome di \"pallacorda\", è uno sport della racchetta che vede opporsi due o quattro giocatori (due contro due, incontro di doppio) in un campo da gioco diviso in due metà da una rete alta 0,914 m al centro e 1,07 m ai lati. È considerato uno sport completo e armonico (sebbene fisicamente in parte asimmetrico perché un braccio viene escluso dal movimento del corpo) in quanto sono richieste al contempo ottime capacità fisiche (coordinazione, velocità e resistenza sulla distanza) e mentali (tecnica nei colpi, tattica, visione di gioco, riflessi e intuito)."@it .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenis (angl. tennis < angl. tenes, tenetz < fr. tenez! = berte, držte! (imperativ pl. slova držet)), označovaný také jako bílý sport, je míčová hra pro 2 nebo 4 hráče. Varianta se 2 hráči se nazývá , varianta se 4 hráči pak čtyřhra. Hraje se také smíšená čtyřhra, při které v každé dvojici hraje jedna žena a jeden muž. Tenisový míč Soupeři stojí proti sobě, na obdélníkovém hřišti (tenisovém dvorci) a pokoušejí se odrazit tenisový míček tenisovou raketou do pole tak, aby jej soupeř nemohl vrátit nebo aby se s míčkem trefil vedle tenisového dvorce (do autu)."@cs .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "كرة المضرب (أو التنس الأرضي في الترجمات الحرفية) نوع من رياضات الراح والتي يتنافس فيها لاعبان في مباريات فردية، أو فريقان مكونان من لاعبين في مباريات زوجية. كلٌ منهم يحمل مضربا ليستخدمه في ضرب الكرة فوق الشبكة نحو منطقة الخصم. وعدد الضربات ليس محددا، إنما النتيجة تحدد الرابح. كرة المضرب أو التنس هو رياضة أولمبية تُلعب على جميع مستويات المجتمع وفئاته ولجميع الأعمار، يمكن لأي شخص أن يمسك بمضرب ويضرب الكرة بما في ذلك ذوو الاحتياجات الخاصة ومستخدمو الكراسي المتحركة. نشأت لعبة التنس الحديثة في برمنغهام بإنجلترا في أواخر القرن التاسع عشر كرياضة تلعب على العشب، لذا فهي وثيقة الصلة بكل من الألعاب الميدانية كالكريكت والبولينغ، إضافة إلى رياضة المضرب القديمة. تغيرت قواعد كرة المضرب الحديثة قليلاً منذ تسعينيات القرن التاسع عشر، باستثناء قاعدتين: من 1908 إلى 1961 كان على اللاعب أن يضع قدمًا واحدة على الأرض في جميع أوقات المباراة، وفي السبعينيات اعتمد الشوط الفاصل (تاي بريك). ومن الإضافات الحديثة إلى اللعبة على المستوى الاحترافي اعتماد تقنية المراجعة الإلكترونية (عين الصقر) إلى جانب نظام تحدي النقاط . يلعب كرة المضرب الملايين من لاعبين، وهي رياضة مشهورة ومنتشرة في جميع أنحاء العالم، وتحظى البطولات الكبرى لكرة المضرب بمتابعة جماهيرية وإعلامية كبيرة، تضعها في مرتبة متقدمة بين الأحداث الرياضية الأكثر أهمية ومتابعة في العالم سنويا. وأهم هذه البطولات: رولان غاروس أو بطولة فرنسا المفتوحة على الملاعب الرملية، بطولة أستراليا المفتوحة على الملاعب الصلبة، بطولة ويمبلدون على الملاعب العشبية، وبطولة الولايات المتحدة المفتوحة على الملاعب الصلبة."@ar .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Teniso estas rakedsporto, kutime ludata de du homoj, sed ankaŭ foje de kvar homoj. Specifa para ludo estas la ambaŭseksa para ludo, kiam ĉiu teamo konsistas el unu viro kaj unu virino. Estas principe tri kategorioj de tenisejo, nome la gazona, malmola kaj batita terenoj. Dumlude, la ludantoj ĉiam devas revenigi la pilkon per siaj rakedoj al kontraŭa kampo, provante samtempe malebligi tion al oponanto(j)."@eo .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenisa Ingalaterran eta XIX. mendean sortutako kirola da, pilota batekin eta bi edo lau jokalarien artean jokatua. Tenis jokalariei tenislari ere deritze. Mende haren amaieran arautu zuten: Sare batek banatzen duen zelai batean jokatzen da, jokalariak erraketez baliatzen dira pilota sarearen gainetik aurkariaren zelaira botatzeko, partidek ez dute iraupen jakinik (beharrezko puntuazioa lortzean bukatzen da), prestakuntza fisikoa eta teknika behar dira tenisaren ezaugarriak eta arautegia betetzeko."@eu .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenis atau bola tampel adalah olahraga yang biasanya dimainkan antara dua pemain atau antara dua pasangan masing-masing dua pemain. Setiap pemain menggunakan raket untuk memukul bola karet. Tujuan permainan adalah memainkan bola dengan cara tertentu sehingga pemain lawan tidak dapat mengembalikan bola tersebut. Tenis adalah salah satu cabang olahraga Olimpiade dan dimainkan pada semua tingkat masyarakat di segala usia. Olahraga ini dapat dimainkan oleh siapa saja, termasuk orang-orang yang menggunakan kursi roda. Permainan tenis modern berasal dari Birmingham, Inggris pada akhir abad ke-19 sebagai \"tenis lapangan rumput\". Peraturan tenis berubah sedikit sejak 1890-an. Dua perubahan kecil adalah sejak 1908 hingga 1961 pemain yang melakukan service (pukulan pertama) harus menjaga salah satu kakinya tetap di tanah hingga service berpindah dan adopsi sistem tie-break pada 1970-an. Tambahan terakhir yang diterapkan pada tenis profesional adalah teknologi tinjauan ulang elektronik. Tenis dimainkan oleh jutaan orang sebagai olahraga rekreasi dan juga merupakan olahraga tontontan populer di seluruh dunia.Empat kejuaraan tenis terkemuka adalah Australia Terbuka yang dimainkan di lapangan keras, Prancis Terbuka yang dimainkan di lapangan tanah liat, Wimbledon yang dimainkan di lapangan rumput, dan AS Terbuka yang dimainkan juga di lapangan keras."@in .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/venue> <http://dbpedia.org/resource/Tennis_court> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Net_sport> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Tennis_ball> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Racket_(sports_equipment)> .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Bola tenis adalah bola yang dirancang untuk olahraga tenis. Bola tenis berwarna kuning fluoresens pada acara olahraga besar, tetapi dalam permainan rekreasi bisa berwarna apa saja. Bola tenis ditutupi dengan serat berserat yang memodifikasi sifat aerodinamisnya, dan masing-masing memiliki lekukan lonjong melengkung putih yang menutupinya. Bola tenis diisi dengan udara dan permukaannya ditumbuhi oleh kompon karet berserat seragam. Keterlambatan terasa pemisahan aliran di lapisan batas yang mengurangi hambatan aerodinamik dan memberi bola sifat terbang yang lebih baik."@in .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Tenisový míč či tenisák je vyrobený z pevné gumové duše, která je obalena meltonem a nahuštěná vzduchem. Jen míče značky Tretorn jsou zcela vyplněny pórovitou hmotou. Jedná se o míč určený pro tenis. Povrch je jednobarevný, dříve byl bílý či oranžový. Existují však i míčky vícebarevné, například zeleno-růžové. Barva dnešního míčku závisí na osvětlení a člověku, který se na míček dívá. Někteří ho vidí neonově žlutý, jiní světle zelený a další zeleno-žlutý."@cs .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "كرة التنس هي كرة مصممة لممارسة رياضة التنس. كرات التنس لونها صفراء مُتفتحة في الأحداث الرياضية الكبرى، ولكن في اللعب الترفيهي يمكن أن يكون أي لون تقريبًا. وتغطي كرات التنس بواسطة لباد ليفي الذي يعدل من مرور الكرات في الهواء. تمتلئ كرات التنس بالهواء وتظهر على السطح بواسطة مركب مطاطي مغطى باللباد. يؤخر فصل التدفق في الطبقة الحدودية مما يقلل من السحب الديناميكي الهوائي ويمنح الكرة خصائص طيران أفضل. غالبًا ما تحتوي الكرات على رقم بالإضافة إلى اسم العلامة التجارية. وهذا يساعد على تمييز مجموعة من الكرات عن مجموعة أخرى من نفس العلامة التجارية في ملعب مجاور."@ar .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "テニスボール(英: tennis ball)とは、狭義には、テニス(狭義のテニス、すなわち、硬式テニスともいうローンテニス)用に作られたボール。厚いに毛羽のあるフェルトを被せた球の形で、フェルトによる特有の反発性をもつ。art-antsを呼称する言葉でもある。 また、広義の「テニス」にはソフトテニス(en. 軟式テニス、軟式庭球)も含まれることから、ソフトテニスのボールも含む。"@ja .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Der Tennisball ist das Spielgerät, mit dem Tennis gespielt wird. Er wird mit Hilfe der Tennisschläger gespielt."@de .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Een tennisbal is een bal die wordt gebruikt bij tennis. Er wordt tegenaan geslagen met een tennisracket en met de bal worden punten gescoord. Er zijn verschillende soorten tennisballen voor verschillende baansoorten en weertypen. De meest voorkomende kleur is fluorescerend geel met een lichtgekleurde kromme streep."@nl .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Una pelota de tenis es un objeto esférico de pequeño diámetro (especialmente en comparación con la medida de otras pelotas) que se utiliza en la práctica del deporte homónimo. Para impulsarla se utiliza una raqueta. Un profesional puede hacer que llegue a coger una velocidad de 263 km/h. Se recubre con fieltro (en general de color amarillo fluorescente con rayas de separación blancas), puesto que es el «color más visible por el ojo humano» y permite una trayectoria más correcta, en comparación con una totalmente lisa.​ De media, una pelota de tenis pesa de 56 a 58 gramos.​​"@es .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Tennisboll är en typ av boll som används inom sporten tennis. Officiella bollar skall vara vita eller gula (träningsbollar kan ha andra färger) och ha en diameter mellan 65,4–73,0 mm (2,58–2,87 tum). De skall väga mellan 56,0 och 59,4 gram. Bollarna ska studsa 135–147 cm efter att ha släppts från en höjd av 254 cm på ett betonggolv. Den moderna tennisbollen tillverkas av två likformade delar av format gummi som sammanfogats till ett ihåligt klot. Två hantelformade filtade delar av flanell eller nylon limmas utanpå bollen och ger den dess karakteristiska utseende."@sv .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "A tennis ball is a ball designed for the sport of tennis. Tennis balls are fluorescent yellow in organised competitions, but in recreational play can be virtually any color. Tennis balls are covered in a fibrous felt which modifies their aerodynamic properties, and each has a white curvilinear oval covering it."@en .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Теннисный мяч — предназначен для тенниса. Он примерно 6,7 см (2,63 дюйма) в диаметре. Теннисные мячи на крупных спортивных соревнованиях желтые, но могут быть практически любого цвета. Теннисные мячи покрыты войлоком, который модифицирует их аэродинамические свойства, и имеют белую волнистую линию, которая охватывает снаряд."@ru .

Linked Data: following the URLs to find more triples
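Following a Linked Data URL is ordinary HTTP plus content negotiation; a minimal sketch with Python's standard library (the Accept values are real RDF media types, but which serialization a given server honours varies):

```python
from urllib.request import Request

def linked_data_request(uri):
    # Ask the server for an RDF serialization (Turtle, falling back
    # to JSON-LD) instead of the human-readable HTML page.
    return Request(uri, headers={
        "Accept": "text/turtle, application/ld+json;q=0.9"})

req = linked_data_request("http://dbpedia.org/resource/Tennis")
# urlopen(req) would then return more triples about Tennis;
# the request is not sent here, to keep the sketch offline.
```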

Linked Data standards

..used by >50% of all websites?

2010–: Research Objects

Model for Research Objects

Reused multiple ontologies

Implemented by workflow systems and supporting RO services

Data (and metadata) should be:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

Aim of FAIR principles

Machine readable data & metadata

Community agreement on standards

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

 


To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available


To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

 


To be Reusable:

R1. meta(data) are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards


The FAIR Principles have propelled the global debate about better data stewardship in data-driven and open science, and they have triggered funding bodies (e.g., UKRI, Wellcome Trust, USA NIH, EC Horizon 2020) to mandate FAIR data in their data policies.

2019: My PhD journey starts

Supervisors: Paul Groth, Carole Goble

 Peter Sefton at Open Repositories 2019

Cultural heritage records

Research Object Crate

Aims of FAIR Research Objects

Describe and package data collections, datasets, software etc. with their metadata

Platform-independent object exchange between repositories and services

Support reproducibility and analysis: link data with codes and workflows

Transfer of sensitive/large distributed datasets with persistent identifiers

 

Propagate provenance, citations and existing metadata

Reuse existing standards, but hide their complexity

Vocabulary: A set of types with properties

 

schema.org is used by almost half of all popular websites -- mainly to improve search engine ranking?

 

 

Explaining by example for developers: Linked Data by stealth

Capturing workflows as FAIR Research Objects

Executions of workflows can be tracked using a profile of RO-Crate
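A consumer needs nothing beyond plain JSON tooling to navigate a crate; a sketch over a hypothetical, minimal metadata document (structure per RO-Crate conventions, all values invented):

```python
import json

# A hypothetical ro-crate-metadata.json: the descriptor entity points
# "about" at the root dataset, which lists its parts.
crate_json = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json",
     "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "hasPart": [{"@id": "results.csv"}]},
    {"@id": "results.csv", "@type": "File", "name": "Results"}
  ]
}
"""

crate = json.loads(crate_json)
entities = {e["@id"]: e for e in crate["@graph"]}

# The root dataset is whatever the metadata descriptor is "about".
root_id = entities["ro-crate-metadata.json"]["about"]["@id"]
root = entities[root_id]
```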

 

Interoperable reproducibility of computational methods

 

Developed as a community task force, 7+ implementations

Multiple implementation levels

Pushing the envelope of FAIR

Five Safes principles

Green & Ritchie (2023)

https://doi.org/10.29012/jpc.831

Trusted Research Environments (TRE) ensure safe access to sensitive data

 

TREs provide a locked-down computational space (e.g. a VM) and enforce disclosure control before outputs leave the premises.

Federated Analytics with Five Safes RO-Crate

RO-Crate used a common vocabulary, so it was easy to extend for human actions

FAIR Research Objects can be created in a pragmatic way

 

By using Web technologies and explaining by example,

Linked Data technology can be used by most developers

 

Profiles as a lightweight approach to ontologies, a "just enough" formalisation of usage patterns and domain vocabularies

 

Linked Data as basis:
formalisations, extension mechanisms, graph queries, validation, integration/mapping, ++

(only use these if you need to!)

The RO-Crate team is:

{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@type": "CreativeWork",
  "@id": "#talk",
  "name": "Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows",
  "description": "A presentation discussing the role of Linked Data and RO-Crate in the era of LLMs, emphasizing their necessity for FAIR computational workflows.",
  "author": {
    "@type": "Person",
    "name": "[Your Name]",
    "affiliation": "[Your Institution]",
    "orcid": "[Your ORCID]"
  },
  "datePublished": "2025-02-19",
  "keywords": ["RO-Crate", "Linked Data", "FAIR", "Computational Workflows"],
  "subjectOf": {
    "@type": "Event",
    "name": "[Conference or Workshop Name]",
    "startDate": "[Event Date]",
    "location": {
      "@type": "Place",
      "name": "[Event Location]"
    }
  },
  "relatedMaterial": [
    {
      "@type": "CreativeWork",
      "name": "RO-Crate Specification",
      "url": "https://www.researchobject.org/ro-crate/"
    }
  ]
}
{
  "@context": [
    "https://schema.org/",
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata for the presentation: Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Presentation RO-Crate",
      "hasPart": [
        {"@id": "presentation.pdf"},
        {"@id": "#presentation"}
      ],
      "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
      "publisher": "FAIR Symposium 2023",
      "datePublished": "2023-10-15"
    },
    {
      "@id": "#presentation",
      "@type": ["PresentationDigitalDocument", "File"],
      "name": "Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows",
      "description": "A critical analysis of Linked Data practices in FAIR workflows and Research Objects.",
      "author": {"@id": "#alice"},
      "dateCreated": "2023-10-10",
      "keywords": ["FAIR", "Linked Data", "RO-Crate", "Workflows"],
      "about": [
        {"@id": "https://doi.org/10.0000/example"},
        {"@id": "#event"}
      ]
    },
    {
      "@id": "#alice",
      "@type": "Person",
      "name": "Alice Researcher",
      "affiliation": "FAIR Institute",
      "identifier": "https://orcid.org/0000-0000-0000-0000"
    },
    {
      "@id": "#event",
      "@type": "Event",
      "name": "FAIR Symposium 2023",
      "startDate": "2023-10-15",
      "location": "Virtual"
    },
    {
      "@id": "presentation.pdf",
      "@type": "File",
      "name": "Slide deck (PDF)",
      "encodingFormat": "application/pdf",
      "contentSize": "2.5 MB"
    }
  ]
}
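The machine-actionability argument can be made concrete with a short consistency check over a crate like the one above: does the metadata descriptor exist, and does every `hasPart` reference resolve to an entity in the @graph? These checks are a sketch, not the official RO-Crate validation rules.

```python
# Illustrative consistency checks on an RO-Crate @graph.  These are NOT
# the normative RO-Crate validation rules, just a sketch of what
# structured metadata makes possible without human intervention.
def check_crate(crate):
    by_id = {e["@id"]: e for e in crate["@graph"]}
    errors = []
    root = by_id.get("./")
    if root is None:
        errors.append("missing root dataset './'")
    else:
        for part in root.get("hasPart", []):
            if part["@id"] not in by_id:
                errors.append(f"dangling hasPart: {part['@id']}")
    if "ro-crate-metadata.json" not in by_id:
        errors.append("missing metadata descriptor")
    return errors

crate = {"@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "presentation.pdf"}]},
    {"@id": "presentation.pdf", "@type": "File"},
]}
print(check_crate(crate))  # -> []
```

A validator, repository, or workflow engine can run checks like these automatically; probabilistic text generation alone cannot guarantee them.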

Anticipated Question:

"With the rise of LLMs, why do we still need structured data formats like RO-Crate and Linked Data?"

Answer Framework

  1. Reliability & Consistency:
    LLMs generate probabilistic outputs, which can be inconsistent or hallucinate metadata. Structured data (e.g., JSON-LD in RO-Crate) ensures machine-actionable, predictable, and domain-specific metadata, critical for reproducibility and FAIR compliance.

  2. Interoperability:
    LLMs lack built-in semantics for cross-system data exchange. Formats like RO-Crate use standards (e.g., schema.org, PROV-O) to enable automated integration across repositories, workflow engines, and research infrastructures.

  3. Context Preservation:
    LLMs may strip contextual relationships (e.g., authorship, funding, dependencies). RO-Crate explicitly captures provenance, licensing, and dependencies as structured graphs, preserving research context.

  4. Computational Actionability:
    Structured metadata allows tools (e.g., workflow runners, validators) to programmatically resolve dependencies, verify licenses, or trace data lineage without human intervention—LLMs alone cannot do this.

  5. Synergy with LLMs:
    LLMs can enhance RO-Crate creation (e.g., auto-annotating files) or querying (natural language to SPARQL). However, they rely on structured data as a ground-truth source for training and reasoning.


What about the below JSON-LD of an RO-Crate, can you summarise it for me?
(..)

1. Overview of the Dataset

  • Title: "BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment - Local outputs and comparative analysis"

  • Description:
    The dataset contains local outputs and comparative analysis from a federated study across multiple European regions (Aragon, Spain; Brussels and Wallonia, Belgium; Finland). It evaluates the effectiveness of SARS-CoV-2 vaccines in preventing infections using observational, retrospective longitudinal data.

  • Research Question:
    "How effective have the SARS-CoV-2 vaccination programmes been in preventing SARS-CoV-2 infections?"

  • Study Design:
    Emulates a hypothetical Target Trial (TT) using real-world data, with a federated analysis approach to ensure data privacy and security.

2. Key Components

Funding

3. Workflow and Computational Details

  • Main Workflow:

    • Title: "BY-COVID WP5 T5.2 Baseline Use Case"

    • DOI: 10.48546/workflowhub.workflow.502.4

    • Programming Language: Quarto Markdown

    • Inputs: Vaccine effectiveness data (CSV files)

    • Outputs: Interactive HTML reports and Excel files

    • Containerization: Docker and Singularity containers were used for execution in secure environments.

Workflow Runs

  • Regional Workflows:

    • Aragon, Spain: Executed manually in a Docker container due to secure environment constraints.

    • Brussels and Wallonia, Belgium: Executed in a manually configured R environment.

    • Finland: Executed in a Singularity container.

  • Comparative Analysis:
    Aggregated results from regional workflows into an interactive HTML report using a Quarto script.

4. Files and Outputs

Local Outputs

  • Interactive HTML Reports:
    Each region produced 6 interactive reports covering:

    1. Data Quality Assessment (DQA)

    2. Validation

    3. Imputation

    4. Matching

    5. Descriptive Analysis

    6. Survival Analysis

  • Excel Files:
    Summarized results for each region (e.g., results-survival-analysis-ESP.xlsx).

Comparative Analysis

  • Interactive HTML Report:
    Aggregated results from all regions into a single interactive report.

  • Quarto Script:
    Used to generate the comparative analysis report.

Reproducible Environment

  • R Environment:

    • renv/: R environment with activation scripts.

    • renv.lock: Lockfile for reproducible package versions.

    • .Rprofile: R profile for environment configuration.

5. Provenance and Execution

  • Provenance:
    Detailed execution logs for each workflow run, including:

    • Container Images: Docker and Singularity images used for execution.

    • Execution Times: Start and end times for each run.

    • Agents: Researchers who executed the workflows (e.g., Marjan Meurisse, Francisco Estupiñán-Romero).

  • Action Status: All workflows were completed successfully.

6. Linked Data and Interoperability

  • Linked Data:
    The RO-Crate uses JSON-LD to link entities (e.g., workflows, datasets, authors, organizations) using standardized vocabularies like schema.org, EDAM, and PROV-O.

  • Interoperability:
    The crate is designed to be interoperable with tools like WorkflowHub and Zenodo, ensuring FAIR (Findable, Accessible, Interoperable, Reusable) compliance.

7. Significance

  • FAIR Compliance:
    The RO-Crate ensures that the research outputs are FAIR by providing structured metadata, provenance, and reusable workflows.

  • Reproducibility:
    The use of containers (Docker, Singularity) and reproducible environments (R, renv) ensures that the analysis can be rerun in the future.

  • Privacy and Security:
    The federated approach and secure processing environments (SPEs) ensure that sensitive health data is not exposed.

Conclusion

This RO-Crate exemplifies how structured data formats like JSON-LD and standards like RO-Crate enhance reproducibility, interoperability, and FAIRness in research. While LLMs can assist in generating or querying metadata, structured data remains essential for ensuring machine-actionability, provenance tracking, and long-term preservation of research outputs.

 


Challenging Linked Data practices for FAIR Research Objects and Computational Workflows

By Stian Soiland-Reyes

Presented 2025-02-19 at Inaugural seminar at Department of Computer Science, The University of Manchester
