Computation and Data Management
Jason Coposky, Interim Executive Director
Goals
Trends in Data with respect to Computation
Drive towards "reproducible science" - just seen at MSST 2016
This is already a feature provided by iRODS, implemented in one of two ways:
Driven by the layer in the stack at which iRODS is integrated
A Health Science Institute
1. Instrument or user places raw data into a landing zone subdirectory
2. Staging rule moves data from the landing zone to long-term storage (LTS)
3. Metadata extraction rule pulls metadata from the data AND organizes raw data into project collections
4. Data is replicated to high performance storage for processing
5. Processing results replicated to long term storage
6. Trim rule periodically deletes replicas from high performance storage
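The six steps above can be sketched as a small, self-contained simulation. This is not iRODS rule code and does not use any iRODS API: plain local directories stand in for the landing zone, long-term storage, and high performance storage. The function names, the `projects/` layout, the convention that the project name is the filename prefix before the first underscore, and the SHA-256 "processing" step are all assumptions made for illustration only.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def stage(landing: Path, lts: Path) -> list[Path]:
    """Step 2: move each file from the landing zone into long-term storage."""
    staged = []
    for src in sorted(p for p in landing.iterdir() if p.is_file()):
        dst = lts / src.name
        shutil.move(str(src), str(dst))
        staged.append(dst)
    return staged


def organize(staged: list[Path], lts: Path) -> dict[Path, dict]:
    """Step 3: extract metadata and file raw data into project collections.

    Assumption: the project name is the filename prefix before the first "_".
    """
    catalog = {}
    for f in staged:
        project = f.stem.split("_", 1)[0]
        collection = lts / "projects" / project
        collection.mkdir(parents=True, exist_ok=True)
        dst = collection / f.name
        shutil.move(str(f), str(dst))
        catalog[dst] = {"project": project, "size_bytes": dst.stat().st_size}
    return catalog


def replicate(src: Path, src_root: Path, dst_root: Path) -> Path:
    """Steps 4 and 5: copy a file to the same relative path under another root."""
    dst = dst_root / src.relative_to(src_root)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst


def process(replica: Path) -> Path:
    """Placeholder computation: write the replica's SHA-256 digest beside it."""
    result = replica.with_name(replica.name + ".sha256")
    result.write_text(hashlib.sha256(replica.read_bytes()).hexdigest())
    return result


def trim(hps: Path, lts_projects: Path) -> int:
    """Step 6: delete HPS replicas that already have a copy in long-term storage."""
    removed = 0
    for replica in [p for p in hps.rglob("*") if p.is_file()]:
        if (lts_projects / replica.relative_to(hps)).exists():
            replica.unlink()
            removed += 1
    return removed


def run_pipeline(landing: Path, lts: Path, hps: Path) -> dict[Path, dict]:
    """Run steps 2 through 5; trimming (step 6) is left to a periodic caller."""
    catalog = organize(stage(landing, lts), lts)
    projects = lts / "projects"
    for raw in catalog:
        replica = replicate(raw, projects, hps)   # step 4: LTS -> HPS
        result = process(replica)                 # compute next to the fast copy
        replicate(result, hps, projects)          # step 5: results -> LTS
    return catalog


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        landing, lts, hps = (root / n for n in ("landing", "lts", "hps"))
        for d in (landing, lts, hps):
            d.mkdir()
        (landing / "genomics_run1.raw").write_bytes(b"ACGT")  # step 1
        run_pipeline(landing, lts, hps)
        print(trim(hps, lts / "projects"), "replicas trimmed")
```

In this sketch the trim step only removes a high performance replica once a copy exists at the same relative path in long-term storage, so a periodic trim cannot delete the only copy of a result. A real deployment would express each step as a policy rule rather than as direct filesystem calls.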
Computational Resources
iRODS 'resource server' as a compute node - take the compute to the data
Derived from the iPlant project at the University of Arizona
https://docs.google.com/presentation/d/1gyYCU0YZGDO-MHNZxd41t5FcTlqhy7uMKQJsaz2qitc/edit?usp=sharing
Reference Implementations
Derive a collection of use cases from existing IBM customers?
Build a packaged reference implementation of 'science in a box' for:
Work with RENCI on a Proof of Concept and combine both approaches into a flexible reference architecture
iRODS and IBM