Terrell Russell, Ph.D.

Executive Director, iRODS Consortium

Director of Data Management, RENCI

iRODS Overview and UGM2024

July 18, 2024

RENCI Lunch and Learn

Chapel Hill, NC

The iRODS Consortium

Founded in 2013 by RENCI, DICE, and DDN

Our Mission

  • Continuous Improvement
  • Grow the Community
  • Standardization
  • Show value to our Membership

Our Membership

Consortium Member

Since iRODS UGM 2023

  • 22 Membership Renewals
  • 2 New Members
  • 2 New Service Contracts
  • Multiple Proofs of Concept
  • 15 Conferences and Events
  • 2 New Hires
  • 3 Internships

What is iRODS?

Open Source

  • C++ client-server architecture
  • BSD-3 licensed: install it today and try before you buy

Distributed

  • Runs on a laptop or a cluster, on premises or geographically distributed

Data Centric & Metadata Driven

  • Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer

Why use iRODS?

People need a solution for:

  • Managing large amounts of data across various storage technologies
  • Controlling access to data
  • Searching their data quickly and efficiently
  • Automation

The larger the organization, the more they need software like iRODS.

iRODS Core Competencies

The underlying technology, categorized into four areas

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On-premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.

Projection of the Physical into the Logical

(Figure: one or more Physical Path(s) projected into a single Logical Path)
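To make the projection concrete, here is a minimal Python sketch of the idea, not iRODS code: a toy namespace in which one logical path resolves to several physical replicas. The class names, resource names (`fast_disk`, `s3_archive`), and vault/bucket paths are hypothetical; only the zone name `tempZone` mirrors a default iRODS install.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy model (not the iRODS catalog schema): one logical path
# is backed by one or more physical replicas on different storage resources.
@dataclass
class Replica:
    resource: str       # storage resource name (illustrative)
    physical_path: str  # where the bytes live on that resource


@dataclass
class DataObject:
    logical_path: str
    replicas: list[Replica] = field(default_factory=list)


class Namespace:
    """Unified namespace: clients address data only by its logical path."""

    def __init__(self) -> None:
        self._objects: dict[str, DataObject] = {}

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        """Record one more physical replica behind a logical path."""
        obj = self._objects.setdefault(logical_path, DataObject(logical_path))
        obj.replicas.append(Replica(resource, physical_path))

    def resolve(self, logical_path: str) -> list[str]:
        """Project a logical path onto every physical location backing it."""
        return [
            f"{r.resource}:{r.physical_path}"
            for r in self._objects[logical_path].replicas
        ]


ns = Namespace()
path = "/tempZone/home/alice/scan.tif"
ns.register(path, "fast_disk", "/var/lib/irods/Vault/home/alice/scan.tif")
ns.register(path, "s3_archive", "/example-bucket/home/alice/scan.tif")
print(ns.resolve(path))
# ['fast_disk:/var/lib/irods/Vault/home/alice/scan.tif', 's3_archive:/example-bucket/home/alice/scan.tif']
```

The user only ever sees the single logical path; adding or removing a replica changes the physical side without changing how the data is addressed.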

Data Discovery

Attach metadata to any first-class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS supports automated and user-provided metadata, which makes your data and infrastructure more discoverable, operational, and valuable.
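iRODS metadata takes the form of Attribute-Value-Unit (AVU) triples. The sketch below assumes a purely in-memory catalog (the real one is a database) and shows AVUs attached to different entity types, plus a simple attribute/value search. The `MetadataCatalog` class and the entity-type labels are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Optional, Tuple


class AVU(NamedTuple):
    """An Attribute-Value-Unit triple; the unit is optional."""
    attribute: str
    value: str
    unit: Optional[str] = None


class MetadataCatalog:
    """Toy in-memory stand-in for catalog metadata (not the real schema)."""

    def __init__(self) -> None:
        # (entity_type, entity_name) -> list of AVUs attached to that entity
        self._avus: DefaultDict[Tuple[str, str], List[AVU]] = defaultdict(list)

    def add(self, entity_type: str, name: str, avu: AVU) -> None:
        self._avus[(entity_type, name)].append(avu)

    def find(self, entity_type: str, attribute: str, value: str) -> List[str]:
        """Sorted names of entities of one type carrying attribute == value."""
        return sorted(
            name
            for (etype, name), avus in self._avus.items()
            if etype == entity_type
            and any(a.attribute == attribute and a.value == value for a in avus)
        )


catalog = MetadataCatalog()
# Entity-type labels are illustrative, mirroring the list of entities above.
catalog.add("data_object", "/tempZone/home/alice/scan.tif", AVU("instrument", "confocal"))
catalog.add("data_object", "/tempZone/home/alice/run2.tif", AVU("instrument", "confocal"))
catalog.add("collection", "/tempZone/home/alice", AVU("project", "imaging"))
catalog.add("resource", "fast_disk", AVU("capacity", "100", "TB"))
print(catalog.find("data_object", "instrument", "confocal"))
# ['/tempZone/home/alice/run2.tif', '/tempZone/home/alice/scan.tif']
```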

Metadata Everywhere

Workflow Automation

Policy Enforcement Points (PEPs) are triggered by every operation within the framework

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine framework provides the ability to capture real-world policy as computer-actionable rules that may allow, deny, or add context to operations within the system.

Dynamic Policy Enforcement

The iRODS rule may:

  • restrict access
  • log for audit and reporting
  • provide additional context
  • send a notification
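Here is a toy Python rule that shows the four behaviors above in one place. It does not use the iRODS rule language or the Python Rule Engine Plugin API. `Operation`, `example_rule`, `PolicyDenied`, and the assumption that only the `rods` administrator may touch a `/tempZone/restricted/` collection are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class PolicyDenied(Exception):
    """Raised by a rule to refuse an operation."""


@dataclass
class Operation:
    user: str
    action: str          # e.g. "read", "write", "delete" (illustrative)
    logical_path: str
    context: Dict[str, str] = field(default_factory=dict)


audit_log: List[str] = []      # stands in for an audit/reporting sink
notifications: List[str] = []  # stands in for email, a message queue, etc.


def example_rule(op: Operation) -> None:
    """Hypothetical rule: restrict, log, add context, notify."""
    # 1. Restrict access: only the (assumed) admin user may touch /restricted.
    if op.logical_path.startswith("/tempZone/restricted/") and op.user != "rods":
        audit_log.append(f"DENY {op.user} {op.action} {op.logical_path}")
        raise PolicyDenied(f"{op.user} may not {op.action} {op.logical_path}")

    # 2. Log for audit and reporting.
    audit_log.append(f"ALLOW {op.user} {op.action} {op.logical_path}")

    # 3. Provide additional context: record the zone parsed from the path.
    op.context["zone"] = op.logical_path.split("/")[1]

    # 4. Send a notification for destructive operations.
    if op.action == "delete":
        notifications.append(f"{op.user} deleted {op.logical_path}")


op = Operation("alice", "delete", "/tempZone/home/alice/old.csv")
example_rule(op)
try:
    example_rule(Operation("alice", "read", "/tempZone/restricted/secret.csv"))
except PolicyDenied as err:
    print("denied:", err)
print(op.context, audit_log, notifications)
```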

Dynamic Policy Enforcement

A single API call expands into many plugin operations, all of which may invoke policy enforcement.

Plugin Interfaces:

  • Authentication
  • Database
  • Storage
  • Network
  • Rule Engine
  • Microservice
  • RPC API
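The fan-out can be sketched as follows, assuming a simplified model in which every operation is wrapped by a pre and a post enforcement point. The PEP names only loosely follow the `pep_..._pre` / `pep_..._post` naming pattern; the operations, the dictionaries standing in for storage and catalog, and the checksum policy are all hypothetical.

```python
import hashlib
from functools import wraps
from typing import Callable, Dict, List

fired: List[str] = []                               # every PEP reached, in order
rules: Dict[str, List[Callable[[str], None]]] = {}  # PEP name -> policies
vault: Dict[str, bytes] = {}                        # toy storage resource
catalog: Dict[str, int] = {}                        # toy catalog: path -> size
checksums: Dict[str, str] = {}                      # filled in by a policy


def rule(pep_name: str):
    """Register a policy function at a named enforcement point."""
    def register(fn):
        rules.setdefault(pep_name, []).append(fn)
        return fn
    return register


def enforce(pep_name: str, path: str) -> None:
    """Record that a PEP was reached and run any policies attached to it."""
    fired.append(pep_name)
    for policy in rules.get(pep_name, []):
        policy(path)


def plugin_operation(name: str):
    """Wrap an operation so a pre PEP fires before it and a post PEP after."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(path: str, *args):
            enforce(f"pep_{name}_pre", path)
            result = fn(path, *args)
            enforce(f"pep_{name}_post", path)
            return result
        return wrapper
    return decorate


@plugin_operation("storage_create")
def storage_create(path: str) -> None:
    vault[path] = b""


@plugin_operation("storage_write")
def storage_write(path: str, data: bytes) -> None:
    vault[path] += data


@plugin_operation("database_register")
def database_register(path: str) -> None:
    catalog[path] = len(vault[path])


@plugin_operation("api_put")
def api_put(path: str, data: bytes) -> None:
    """One client-facing API call that fans out into several plugin operations."""
    storage_create(path)
    storage_write(path, data)
    database_register(path)


@rule("pep_storage_write_post")
def checksum_after_write(path: str) -> None:
    # Example policy: checksum the bytes as soon as a write completes.
    checksums[path] = hashlib.sha256(vault[path]).hexdigest()


api_put("/tempZone/home/alice/data.csv", b"a,b\n1,2\n")
print(fired)
```

Because the policy is attached to the storage write rather than to the client call, it would fire no matter which API triggered the write.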

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

What is a Policy?

A Definition of Policy

A set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people...

So how does iRODS do this?

iRODS Policies

The reflection of real-world data management decisions in computer-actionable code.

(a plan of what to do in particular situations)

Possible Policies

  • Data Movement
  • Data Verification
  • Data Retention
  • Data Replication
  • Data Placement
  • Checksum Validation
  • Metadata Extraction
  • Metadata Application
  • Metadata Conformance
  • Replica Verification
  • Vault to Catalog Verification
  • Catalog to Vault Verification
  • ...

Policy Composition

Consider Storage Tiering:

  • Violating Object Identification
  • Data Movement
    • Data Replication
    • Data Verification
  • Data Retention
  • Packaged and supported solutions
  • Require configuration not code
  • Derived from the majority of use cases observed in the user community
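The Storage Tiering composition above can be sketched by chaining small, independent policies. This Python toy assumes a two-tier configuration keyed on time since last access. The tier names, the `max_age` setting, and all function names are hypothetical, and they do not reflect the configuration of the actual Storage Tiering capability.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical tier configuration: tiers are ordered, and an object on a tier
# whose max_age (seconds since last access) is exceeded violates that tier.
TIERS = [
    {"name": "fast_ssd", "max_age": 3600},
    {"name": "archive", "max_age": None},  # final tier: nothing moves further
]


@dataclass
class TieredObject:
    path: str
    last_access: float
    replicas: Dict[str, bytes] = field(default_factory=dict)  # resource -> bytes


def violating_objects(objects: List[TieredObject], tier: int, now: float) -> List[TieredObject]:
    """Violating object identification: stale objects still on this tier."""
    name, max_age = TIERS[tier]["name"], TIERS[tier]["max_age"]
    if max_age is None:
        return []
    return [o for o in objects if name in o.replicas and now - o.last_access > max_age]


def replicate(obj: TieredObject, src: str, dst: str) -> None:
    """Data replication: copy the bytes to the destination resource."""
    obj.replicas[dst] = bytes(obj.replicas[src])


def verify(obj: TieredObject, src: str, dst: str) -> bool:
    """Data verification: compare checksums of source and destination."""
    return (hashlib.sha256(obj.replicas[src]).digest()
            == hashlib.sha256(obj.replicas[dst]).digest())


def retain(obj: TieredObject, src: str, preserve: bool = False) -> None:
    """Data retention: trim the source replica unless told to preserve it."""
    if not preserve:
        del obj.replicas[src]


def tier_out(objects: List[TieredObject], tier: int, now: float) -> List[str]:
    """Compose the smaller policies into one storage tiering pass."""
    if tier + 1 >= len(TIERS):
        return []
    src, dst = TIERS[tier]["name"], TIERS[tier + 1]["name"]
    moved = []
    for obj in violating_objects(objects, tier, now):
        replicate(obj, src, dst)
        if not verify(obj, src, dst):
            raise RuntimeError(f"checksum mismatch for {obj.path}")
        retain(obj, src)
        moved.append(obj.path)
    return moved


now = 10_000.0
old = TieredObject("/tempZone/home/alice/old.dat", now - 7200, {"fast_ssd": b"old"})
new = TieredObject("/tempZone/home/alice/new.dat", now - 60, {"fast_ssd": b"new"})
moved = tier_out([old, new], tier=0, now=now)
print(moved)  # ['/tempZone/home/alice/old.dat']
```

Each step is its own function, so a deployment could change how verification works or keep the source replica without touching the rest; this mirrors the composition idea, not the internals of any plugin.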

iRODS Capabilities

Automated Ingest - Landing Zone

Automated Ingest - Filesystem Scanning

Storage Tiering

(Figure: Core Competencies, Policy, Capabilities)

Indexing

Publishing

Filesystem Synchronization

Data to Compute

Compute to Data

Data Transfer Nodes

iRODS Clients

Protocol Plumbing - Presenting iRODS as other Protocols

  • WebDAV
  • FUSE
  • HTTP
  • NFS
  • SFTP
  • K8s CSI
  • S3

Over the last few years, the ecosystem around the iRODS server has continued to expand.

Integration with other types of systems is a valuable way to increase accessibility without teaching existing tools about the iRODS protocol or introducing new tools to users.

With some plumbing, existing tools get the benefit of visibility into an iRODS deployment.

16th Annual iRODS User Group Meeting

2023-2024 Working Groups

Technology Working Group

  • Goal: To keep everyone up to date, provide a forum for roadmap discussion and collaboration opportunities
    • All iRODS Consortium Membership

Metadata Templates Working Group

  • Goal: To define a standardized process for the application and management of metadata templates by the iRODS Server
    • NIEHS, Utrecht, Maastricht, Arizona / CyVerse, KU Leuven

Authentication Working Group

  • Goal: To provide a more flexible authentication mechanism to the iRODS Server
    • SURF, NIEHS, Sanger, Arizona / CyVerse, IT4Innovations, Utrecht, KU Leuven

S3 Working Group

  • Goal: To develop tools to present iRODS as S3-compatible storage to existing S3 clients
    • Arizona / CyVerse, NIEHS, SURF

Imaging Working Group

  • Goal: To provide a standardized suite of imaging policies and practices for integration with existing tools and pipelines
    • New York University, Santa Clara University, UC San Diego, NIEHS, Harvard, Arizona / CyVerse, Open Microscopy Environment (OMERO), UNC Neuroscience Microscopy Core, KU Leuven, Maastricht, NYU Langone, UMass Medical, Netherlands Cancer Institute, Sanger, UCSC, Crick (UK), U. Osnabrück, CRS4 (Italy), RIKEN (Japan)

Organized community efforts to standardize protocols, technologies, and methodologies

UGM2024 - Core Development Team Talks

  • Separate Talks

    • Phil Owen

      • iRODS Build and Test v9: Automation via GitHub and Kubernetes

    • Markus Kitsinger

      • iRODS Build and Packaging: 2024 Update

    • Kory Draughn and Martin Flores

      • iRODS HTTP API v0.3.0 with OpenID Connect

    • Justin James

      • iRODS S3 API v0.2.0 with Multipart

    • Terrell Russell

      • DAViDD: Initial data management solution for UNC's READDI AViDD Center

      • iRODS Metadata Templates Working Group: Building Blocks and Lessons Learned

  • Included in the Technology Update

    • Kory Draughn

      • Server Updates
      • Indexing Capability Plugin
      • Python Rule Engine Plugin

      • NFSRODS

    • Derek Dong

      • Metadata Guard Rule Engine Plugin

    • Justin James

      • S3 Resource Plugin

      • Globus Connector

    • Daniel Moore

      • Python iRODS Client

UGM2024 - Other Selected Talks

  • Safeguard your sensitive data in iRODS using data encryption feature available in GoCommands (CyVerse / University of Arizona)

  • Streamlining iRODS: Kafka-based Data Pipeline (KU Leuven)

  • The Intersection Between Policy-Based Data Management and Emerging Health Science Data Standards (NIEHS)

  • iRODS-based system turbocharged next-gen sequencing analysis during pandemic and beyond (Dutch National Institute for Public Health and the Environment (RIVM))

  • iRODS Security Challenges Within an Enterprise Environment (Dow)

iRODS Internships - Summer 2024

Convert existing web applications to our new HTTP API (ReactJS + HTTP)
The relatively new iRODS Zone Management Tool is due for its first refactor. Originally built to talk to a REST API, it needs to be converted to talk to the new iRODS HTTP API. If this work on the administrator tool proves straightforward, we are also interested in evaluating a similar refactor of our user-level GUI, Metalnx, or even in designing a new web app from scratch.

Create new client libraries around our new HTTP API (Various Languages)
Our new iRODS HTTP API is making it easier for developers to interact with the iRODS ecosystem. We would like to help them even more by providing new client libraries in various languages that wrap their native or library-provided HTTP calls. We are most interested in Java, Python, and JavaScript, but any language will provide a learning opportunity and help map out the space for other languages.
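As a possible starting point for such a library, the Python sketch below wraps two calls using only the standard library. The endpoint layout is an assumption based on the HTTP API's published examples: a versioned base URL, `POST /authenticate` with HTTP Basic credentials returning a bearer token as plain text, and `GET /collections` selected by an `op` query parameter. Check it against the documentation for the version you deploy. The class name `IrodsHttpClient` is hypothetical, and request construction is kept separate from sending so requests can be inspected without a server.

```python
import base64
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class IrodsHttpClient:
    """Minimal sketch of a wrapper around the iRODS HTTP API (assumed layout)."""

    def __init__(self, base_url: str) -> None:
        # e.g. "http://host:port/irods-http-api/<version>" (assumed form)
        self.base_url = base_url.rstrip("/")
        self.token: Optional[str] = None

    def authenticate_request(self, username: str, password: str) -> Request:
        """Build (but do not send) the Basic-auth request for a bearer token."""
        creds = base64.b64encode(f"{username}:{password}".encode()).decode()
        return Request(
            f"{self.base_url}/authenticate",
            method="POST",
            headers={"Authorization": f"Basic {creds}"},
        )

    def authenticate(self, username: str, password: str) -> str:
        """Send the auth request and keep the returned token for later calls."""
        with urlopen(self.authenticate_request(username, password)) as resp:
            self.token = resp.read().decode()
        return self.token

    def collections_request(self, op: str, lpath: str) -> Request:
        """Build a GET on the collections endpoint for a logical path."""
        if self.token is None:
            raise RuntimeError("call authenticate() first")
        query = urlencode({"op": op, "lpath": lpath})
        return Request(
            f"{self.base_url}/collections?{query}",
            headers={"Authorization": f"Bearer {self.token}"},
        )

    def list_collection(self, lpath: str) -> dict:
        """Send a list request and decode the JSON response."""
        with urlopen(self.collections_request("list", lpath)) as resp:
            return json.loads(resp.read())


# Usage against a running server (host, port, and credentials are placeholders):
# client = IrodsHttpClient("http://localhost:9000/irods-http-api/0.3.0")
# client.authenticate("rods", "rods")
# print(client.list_collection("/tempZone/home/rods"))
```

Ports to Java or JavaScript would follow the same shape: one method per endpoint operation, with the token held by the client object.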

Big Picture

Core

  • 4.3 - Focus on stability, bug fixes, plugins, clients

  • 5.0 - Modernize the deployment process, improve determinism, move to libstdc++

Clients

  • GUIs (ZMT, Metalnx, et al.)

  • Onboarding and Syncing (Automated Ingest)

  • File System Integration (NFSRODS, SFTP)

  • iRODS HTTP API

  • iRODS S3 API

Continue building out policy components (Capabilities).

We want installing and managing iRODS to center on policy design, composition, and configuration.

RENCI Lunch and Learn - iRODS Overview and UGM2024

By iRODS Consortium