- Lots of experimental tools
- Lots of bash/R/python scripts
- Manual steps in data processing pipelines
- Hacks around software dependencies and requirements
- Monolithic code base for analysis tools
- Various programming languages used for various pipelines
- Inefficient steps in larger data processing pipelines
Standards are more important than software.
All of this creates headaches for anyone trying to reproduce or build on the work of others
A way to standardize the flow of a data processing pipeline:
Common Workflow Language
- Scripts become tools
- Predictable inputs and outputs
- Tools are chained in workflows
- Tools are packaged up utilizing container technologies
- Implicit dependencies are captured and made explicit
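As a minimal sketch of "scripts become tools": a CWL tool description declares typed inputs and outputs and can pin a container image, so the tool runs the same way everywhere. The wrapped command (`wc -l`) and container image here are illustrative, not from the talk:

```yaml
cwlVersion: v1.2
class: CommandLineTool
# The script/command being turned into a reusable tool
baseCommand: [wc, -l]
hints:
  DockerRequirement:
    # Dependencies are made explicit via a container image
    dockerPull: ubuntu:22.04
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: count.txt
```

Because inputs and outputs are declared, a workflow engine can chain this tool with others by connecting `line_count` to the next step's input.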
By Eugene de Beste