iRODS Metadata Templates

Working Group:

Building Blocks and Lessons Learned

May 28-31, 2024

iRODS User Group Meeting 2024

Amsterdam, Netherlands

Terrell Russell, Ph.D.

Executive Director

iRODS Consortium

Metadata Templates Working Group

Founded mid-2018

 

 

Motivation

 

iRODS needs to help curators define and validate 'good' metadata for their pipelines and environments.

Pre-History

Applications - Boiling the Ocean

 

2014-2016

  • Metalnx
  • CloudBrowser
  • Yoda
  • DataHub
  • Dataverse
  • CyVerse

Pre-History - Metalnx

Pre-History - CloudBrowser

  • JSON Schema to define template
  • templates themselves defined the schema for metadata
  • stored in .irods collection
  • parser, validator, resolver, exporter
  • handled combining/merging templates into java object
  • Mike Conway, Cesar Garde, Terrell Russell

Pre-History - CloudBrowser and Jargon

Defining some of these endpoints in the web client and the Java client library led to discussion about a Swagger API (later known as OpenAPI).

 

This also restarted a conversation about a REST API for all of iRODS itself, but now to include some metadata template endpoints.

Pre-History - May 2017

TRiRODS

Rick Skarbez

Metadata Templates Working Group - Formed

June 2018

 

 

and by March 2019...

 

Metalnx metadata templates stored in the Metalnx database as JSON Schema

 

 

Metadata Templates Working Group - June 2019

Maastricht and Utrecht demonstrated iRODS rules to provide a round trip from JSON to AVU to JSON
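
A minimal sketch of what such a round trip can look like, for illustration only. This is not the Maastricht or Utrecht implementation: it assumes a flat JSON object, JSON-encodes each value so types survive, and uses a hypothetical units marker "json". The function names are made up for this example.

```python
import json


def json_to_avus(document: dict) -> list:
    """Flatten a flat JSON object into (attribute, value, units) triples.

    Assumption: each value is JSON-encoded so its type survives the trip,
    and the units field carries the hypothetical marker "json".
    """
    return [(key, json.dumps(value), "json") for key, value in sorted(document.items())]


def avus_to_json(avus: list) -> dict:
    """Rebuild the JSON object from the triples produced by json_to_avus."""
    return {
        attribute: json.loads(value)
        for attribute, value, units in avus
        if units == "json"  # ignore AVUs that did not come from this encoding
    }


original = {"title": "Survey 12", "samples": 48, "public": False}
avus = json_to_avus(original)
restored = avus_to_json(avus)

for avu in avus:
    print(avu)
print("round trip intact:", restored == original)
```

Nested objects and arrays need a richer encoding (for example, paths in the attribute name or the units field), which is where the real implementations differ.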

 

Non-Consortium implementations - some convergence appearing...

  • Utrecht / Yoda
  • Maastricht / DataHub
  • NIEHS / Data Commons
  • Arizona / CyVerse

 

CEDAR coming online as interface / home for editor

Metadata Templates Working Group - June 2019

Identified Five Elements

  • Definition / Representation of the Schema (CEDAR itself?, NIEHS)
  • Tools for template/schema creation / curation / versions / management (CEDAR itself?, NIEHS)
  • Tools for managing the data with relation to the templates (DataHub+, Yoda)
  • Translation from schema to AVUs and back (DataHub+, Yoda)
  • Multiple UIs / utilities handling the translation/presentation (Yoda, NIEHS)
  • (saved) Search queries and results, virtual collections

Metadata Templates Working Group - July 2019

Five Layers, reordered

Metadata Templates Working Group - Late 2019

September 2019

  • swimlanes, more separation of layers
  • first use of external schema applied to iRODS AVUs, Yoda
  • identification that atomic application of AVUs is more important than batch/multiple

 

October 2019

  • Operations in a Swagger API
    • Resolve MTs based on an object/collection
    • List attached MTs on an object/collection
    • Attach/Apply MT to an object/collection as required/optional
    • Remove MT from an object/collection
    • List all available MTs in the pool
    • Resolve JSON schema(s) that defines the metadata to be applied via template X to collection Y
    • POSSIBLE - Rasterize? Set of nested/attached schemas down into a single schema

Metadata Templates Working Group - Early 2020

February 2020

  • discussion of creation of 3-4 CEDAR JSON schemas for testing
  • discussion of using CEDAR as editor, then export to local defined schema
  • discussion of using API PEPs rather than database PEPs

 

April 2020 - Atomic AVUs merged into iRODS

 

July 2020

  • CEDAR as editor, but not publisher/host, needs to be elsewhere
  • investigation of schemas/json to xml/html/forms (jsonforms)

Metadata Templates Working Group - Late 2020

August 2020

  • Yoda has an atomic endpoint
  • discussion about aggregating templates recursively
  • ELEMENTS OF ARCHITECTURE
    • CREATION/DEFINITION of templates (punt to CEDAR / others)
    • HOSTING of templates (perhaps CEDAR, perhaps irods.org or github)
    • BINDING/MANAGEMENT of templates to collections/data (part of MTWG MVP)
    • USE of templates in GUI (part of MTWG MVP)
  • relevant API components
    • CLIENT/BROWSER: some javascript code to execute client side, wraps an Ajax POST call to the web server
    • WEB SERVER: passes the rule call onto iRODS
    • iRODS PYTHON RULE ENGINE: processes the api call


November 2020 - CEDAR moving to JSON-LD

Metadata Templates Working Group - 2021

January 2021

  • GO FAIR using Jinja templates, mostly rendering/layout
  • assessment - maybe there is no 'one ring'
    • different applications will choose to handle rendering themselves
    • stick to the API
    • GUI asks for templates, renders it, sends filled information

 

February 2021

  • decision to be schema/application agnostic

 

July 2021

  • subject areas should drive this work
  • iRODS should not define or manage templates for anyone
  • iRODS should validate

Metadata Templates Working Group - 2022

February 2022

  • eResearchNZ validates this work
  • curators want to know/define what is required
  • and then enforcement / flagging for humans to come help

 

March 2022

  • KU Leuven building a portal (became ManGO)
  • templates / editor / required/optional, collection and data objects

 

June 2022 - We should have a working group whitepaper

 

August 2022

  • Community is 'ahead' of consortium
  • iRODS server should provide building blocks
  • Python now -> C++ later once agreed/good

 

October 2022

  • MIAME (Minimum Information About a Microarray Experiment)
  • machine actionable data management plans
  • validating - iRODS should be a consumer of these efforts

 

Metadata Templates Working Group - Early 2023

March 2023

  • RDA20 - Sweden - they are struggling with getting consensus
    • every discipline has own language/details
    • consistency is really hard / impossible
  • KU Leuven
    • wrote editor schema in javascript
    • working on versioning, new template affects old data
    • templates are namespaced, so no collisions
    • based on project-level management of associated templates

 

June 2023

  • KU Leuven - namespacing!

 

Metadata Templates Working Group - July 2023

  • MTs get their own database table in iRODS?
  • KU Leuven - using template to render the form, not validate the metadata itself
    • two types of template?  form and metadata
  • IT4I - forcing users to only use a single schema
    • exporting to elastic for search / multiple zones
  • Microservices
    • Attach (type, schema, object_id)
      • Initially, type will just be 'url'
      • Could later be 'irods_schema' and store the id from the new table
      • Or 'form' for ManGO wrapper/form information
    • Detach (type, schema, object_id)
    • Validate (object_id, recursive)
      • Run gather (below) to build the effective json schema
      • Get and build json payload with all current AVUs
      • Run payload and schema through validator
      • Return result (OK or failure/explanation)
    • Export/Collapse/Rasterize/Gather/Dump (object_id, recursive)
      • Find all associated schemas and construct effective schema
      • Recursive would check/gather all parents up to root
  • JSON Schema only, no JSON-LD
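
The Attach / Detach / Gather operations above can be sketched with an in-memory stand-in for AVUs. This is an illustration under assumptions, not the working group's code: the class and method names are hypothetical, the example.org URLs are placeholders, and the attribute name irods::metadata_templates with the type stored in the units field follows the AVU shown on the Running Code slides.

```python
import posixpath

ATTRIBUTE = "irods::metadata_templates"


class TemplateRegistry:
    """In-memory stand-in for AVUs on collections (not an iRODS client)."""

    def __init__(self):
        # logical path -> list of (attribute, value, units) triples
        self._avus = {}

    def attach(self, logical_path, schema_location, schema_type="url"):
        """Record the schema as an AVU; attaching twice is a no-op."""
        avu = (ATTRIBUTE, schema_location, schema_type)
        entries = self._avus.setdefault(logical_path, [])
        if avu not in entries:
            entries.append(avu)

    def detach(self, logical_path, schema_location, schema_type="url"):
        """Remove the matching AVU if it is present."""
        avu = (ATTRIBUTE, schema_location, schema_type)
        entries = self._avus.get(logical_path, [])
        if avu in entries:
            entries.remove(avu)

    def gather(self, logical_path, recursive=False):
        """Return schema locations attached here, then on each parent up to root."""
        found = []
        path = logical_path
        while True:
            for attribute, value, _units in self._avus.get(path, []):
                if attribute == ATTRIBUTE and value not in found:
                    found.append(value)
            parent = posixpath.dirname(path)
            if not recursive or parent == path:  # stop at root or when not recursive
                break
            path = parent
        return found


registry = TemplateRegistry()
registry.attach("/tempZone/home/rods", "https://example.org/project.json")
registry.attach("/tempZone/home/rods/thedir", "https://example.org/dataset.json")

print(registry.gather("/tempZone/home/rods/thedir"))
print(registry.gather("/tempZone/home/rods/thedir", recursive=True))
```

Gather returns locations nearest-first, so a caller that wants the closest schema to take precedence can rely on list order.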

Metadata Templates Working Group - Late 2023

August 2023

  • Utrecht has done this separation
    • UI schema - react
    • metadata schema - JSON schema
    • research space - not required
    • vault space - requirements, full schema
  • schema information to be protected by metadata guard?

 

November 2023

  • initial Python rules
    • Attach and Detach - initial work done, need error checking
    • Gather - next
    • Validate - last, depends on Gather

Metadata Templates Working Group - 2024

January 2024

  • collection can have more than one schema

 

March 2024

  • gather - using allOf to combine schemas or loop through all
  • users and groups and data objects and resources?   not for now
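
One way to read the allOf option, as a hedged sketch: wrap the gathered schemas in a single JSON Schema whose allOf array lists each of them, so a payload must satisfy every attached schema. The function name and the two sample schemas are hypothetical.

```python
import json


def combine_schemas(schemas):
    """Combine gathered schemas into one effective JSON Schema via allOf.

    An empty list yields {} (the schema that accepts anything), because
    JSON Schema requires allOf to be a non-empty array.
    """
    if not schemas:
        return {}
    if len(schemas) == 1:
        return schemas[0]
    return {"allOf": list(schemas)}


# Hypothetical schemas, standing in for ones attached to a collection and its parent
project_schema = {"type": "object", "required": ["project_id"]}
dataset_schema = {"type": "object", "required": ["title", "creator"]}

effective = combine_schemas([project_schema, dataset_schema])
print(json.dumps(effective, indent=2))
```

The alternative on the slide, looping through all schemas and validating against each in turn, gives the same accept/reject answer but makes it easier to report which schema failed.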

 

May 2024

  • implementation
    • attach
    • detach
    • gather - returns array of attached schemas, possibly recursive
    • validate - data object
    • validate - collection (all data objects below)
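
A rough sketch of the two validate steps, under loud assumptions: AVUs are modeled as plain tuples, only the JSON Schema required keyword is checked, and all names and sample data are hypothetical. A real implementation would pass the payload and schemas to a full JSON Schema validator.

```python
def build_payload(avus):
    """Turn (attribute, value, units) triples into a flat JSON-like payload."""
    return {attribute: value for attribute, value, _units in avus}


def validate_data_object(avus, schemas):
    """Return a list of error strings; an empty list means the object passed.

    Simplification: only the 'required' keyword of each schema is checked.
    """
    payload = build_payload(avus)
    errors = []
    for schema in schemas:
        for name in schema.get("required", []):
            if name not in payload:
                errors.append(f"missing required attribute '{name}'")
    return errors


def validate_collection(data_objects, schemas):
    """Validate every data object; map logical path -> errors, failures only."""
    results = {}
    for path, avus in data_objects.items():
        errors = validate_data_object(avus, schemas)
        if errors:
            results[path] = errors
    return results


# Hypothetical schema list (as returned by gather) and collection contents
schemas = [{"type": "object", "required": ["title", "creator"]}]
data_objects = {
    "/tempZone/home/rods/thedir/a.txt": [("title", "Survey 12", ""), ("creator", "rods", "")],
    "/tempZone/home/rods/thedir/b.txt": [("title", "Survey 13", "")],
}

print(validate_collection(data_objects, schemas))
```

Returning only the failing paths keeps the collection-level result small and gives curators a direct list of objects that need attention.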

Conclusions

  • Site-specific knowledge and interfaces are too diverse
  • Template management is too big a task for the server/policy

 

  • iRODS should focus on the capabilities and functionality
    • Rather than defining policy/schemas for applications and users
  • iRODS cannot / should not be defining the templates for anyone
    • Should provide PEPs / microservices / functions to validate
      • But not manage the templates themselves

 

  • Provide 70-80% of the original intent of metadata templates
  • Community to use/test/incorporate prototype Python functions
    • Once good... we port to C++ and ship with the server as microservices

Running Code

# attach a template
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance \
    "metadata_templates_collection_attach('*logical_path', '*schema_location', 'url')" \
    '*logical_path=/tempZone/home/rods/thedir%*schema_location=https://raw.githubusercontent.com/fge/sample-json-schemas/master/jsonrpc2.0/jsonrpc-request-2.0.json' \
    ruleExecOut


# show AVU
$ imeta ls -C thedir
AVUs defined for collection /tempZone/home/rods/thedir:
attribute: irods::metadata_templates
value: https://raw.githubusercontent.com/fge/sample-json-schemas/master/jsonrpc2.0/jsonrpc-request-2.0.json
units: url


# detach a template
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance \
    "metadata_templates_collection_detach('*logical_path', '*schema_location', 'removeme')" \
    '*logical_path=/tempZone/home/rods/thedir%*schema_location=doesnotexist' \
    ruleExecOut

Running Code

# gather, print to stdout
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance \
    "metadata_templates_collection_gather('*logical_path', '*recursive', *schemas); \
    writeLine('stdout', *schemas)" \
    '*logical_path=/tempZone/home/rods/thedir%*recursive=0%*schemas=""' \
    ruleExecOut


# validate data object
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance \
    "metadata_templates_collection_gather('*logical_path', '*recursive', *schemas); \
    metadata_templates_data_object_validate('*data_object_path', *schemas, *rc); \
    writeLine('stdout', *rc)" \
    '*logical_path=/tempZone/home/rods/thedir%*recursive=0%*schemas=""%*data_object_path=/tempZone/home/rods/thedir/a.txt%*rc=""' \
    ruleExecOut


# validate a collection
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance \
    "metadata_templates_collection_gather('*logical_path', '*recursive', *schemas); \
    metadata_templates_collection_validate('*logical_path', *schemas, *recursive, *errors); \
    writeLine('stdout', *errors)" \
    '*logical_path=/tempZone/home/rods/thedir%*recursive=0%*schemas=""%*errors=""' \
    ruleExecOut

Running Code

$ bash bats-core/bin/bats test_metadata_templates.bats

test_metadata_templates.bats

✓ collection - attach, gather, detach template

✓ attach bad schema

✓ validate data object

✓ validate collection

 

4 tests, 0 failures

Thank you!

Questions?

UGM 2024 - iRODS Metadata Templates Working Group: Building Blocks and Lessons Learned

By iRODS Consortium
