iRODS Overview

June 13, 2016

Public Health England

London, England

Jason M. Coposky

@jason_coposky

Interim Executive Director

iRODS is open source software for…

• Working with data distributed across storage technologies

• Annotating and searching data with rich metadata

• Implementing access control, auditing, preservation, organization, and data movement policies

• Providing a single interface to share data between organizations

Data Virtualization

  • Standard file systems: Any mount point
  • Archival storage: HPSS, TSM
  • Object stores: Cleversafe, DDN WOS, Ceph/Rados
  • Cloud-based storage: Amazon S3
  • Separates Logical and Physical
    • Logical - entry in the catalog
    • Physical - a single replica on a storage resource

iRODS presents multiple separate storage technologies in a unified namespace.
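
The logical/physical split above can be sketched as a toy model. This is an illustration only, not the iRODS API: the `Catalog` and `Replica` classes, the zone name, the resource names, and all paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a data object on a storage resource."""
    resource: str       # hypothetical resource name, e.g. "s3_archive"
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy stand-in for the catalog: logical path -> list of replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        # A logical entry may accumulate several physical replicas.
        self.entries.setdefault(logical_path, []).append(replica)

    def replicas(self, logical_path: str) -> list:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
logical = "/exampleZone/home/alice/sample.fastq"  # hypothetical logical path
catalog.register(logical, Replica("unix_disk", "/mnt/vault/alice/sample.fastq"))
catalog.register(logical, Replica("s3_archive", "s3://example-bucket/alice/sample.fastq"))

# One logical name, two physical locations on different technologies.
resources = [r.resource for r in catalog.replicas(logical)]
print(resources)
```

The user only ever refers to the logical path; which storage technology holds each replica is a detail of the catalog entry.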

Data Discovery

  • Metadata can be system- or user-generated.
  • Users can find data using features such as description, study ID, access date.
  • Metadata can be used to link processed results to raw data (i.e., tracking provenance).
  • Administrators can use metadata to control policy, such as archiving and access control policies.

iRODS provides a catalog, the iCAT, that links data and metadata.
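
As a rough sketch of metadata-driven discovery, the following toy index attaches attribute/value pairs to logical paths and searches on them. It does not use the iRODS client libraries; the `annotate` and `find` helpers, the attribute names, and the paths are assumptions made up for illustration.

```python
from collections import defaultdict

# Toy metadata store: logical path -> {attribute: value}
metadata = defaultdict(dict)


def annotate(logical_path, attribute, value):
    """Attach one attribute/value pair to a data object."""
    metadata[logical_path][attribute] = value


def find(attribute, value):
    """Return the logical paths whose attribute equals value, sorted."""
    return sorted(
        path for path, attrs in metadata.items()
        if attrs.get(attribute) == value
    )


raw = "/exampleZone/raw/run1.fastq"        # hypothetical raw data object
result = "/exampleZone/results/run1.vcf"   # hypothetical processed result

annotate(raw, "study_id", "S-001")
annotate(result, "study_id", "S-001")
annotate(result, "derived_from", raw)      # provenance link back to raw data
annotate("/exampleZone/raw/run2.fastq", "study_id", "S-002")

matches = find("study_id", "S-001")
print(matches)
print(metadata[result]["derived_from"])
```

The `derived_from` attribute shows how a processed result can point back to its raw input, which is the provenance use case mentioned above.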

Workflow Automation

  • Policy enforcement points (PEPs) surround API calls and database, resource, and authentication operations
  • iRODS rule engines execute PEP implementations
  • PEP implementations can influence, deny or provide additional context to each operation

iRODS lets you use any operation within the system to trigger a programmatic action.
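
The idea of hooks that run around each operation can be sketched in a few lines. This is a simplified model, not the iRODS rule engine: the `run_operation` dispatcher, the `pre_peps`/`post_peps` registries, the operation name `data_obj_put`, and the example policies are all hypothetical.

```python
class PolicyDenied(Exception):
    """Raised by a pre-operation hook to deny the operation."""


pre_peps = {}    # operation name -> hook run before the operation
post_peps = {}   # operation name -> hook run after the operation
audit_log = []   # filled in by the example post hook


def run_operation(name, operation, context):
    """Run an operation wrapped by its optional pre and post hooks."""
    pre = pre_peps.get(name)
    if pre is not None:
        pre(context)              # may raise PolicyDenied to stop here
    result = operation(context)
    post = post_peps.get(name)
    if post is not None:
        post(context, result)     # e.g. auditing or follow-up actions
    return result


def deny_executables(context):
    # Example policy: refuse uploads whose path ends in ".exe".
    if context["path"].endswith(".exe"):
        raise PolicyDenied("executables are not allowed: " + context["path"])


def record_put(context, result):
    # Example policy: keep an audit trail of successful uploads.
    audit_log.append((context["user"], context["path"]))


pre_peps["data_obj_put"] = deny_executables
post_peps["data_obj_put"] = record_put


def put(context):
    """Stand-in for the real operation."""
    return "stored " + context["path"]


outcome = run_operation(
    "data_obj_put", put,
    {"user": "alice", "path": "/exampleZone/home/alice/a.txt"},
)
print(outcome)
```

Because the pre hook runs before the operation, a denied upload never reaches `put` and never appears in the audit log.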

Secure Collaboration

  • Described as a Federation of iRODS Zones
  • Users may access data held in resources in other Zones, wherever those Zones are located
  • A user from a remote zone must be granted access after federation
  • A remote zone's data management policy is enforced for data accessed within that zone 

iRODS lets you share data across administrative units at any time after deployment.
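
The federation rules above (federate first, then grant access, with the owning zone deciding) can be modeled with a small sketch. The `Zone` class, its methods, the zone names, and the path are hypothetical; users are written as `user#zone`, and real federation setup involves more than this toy shows.

```python
class Zone:
    """Toy zone that owns its data and enforces its own access rules."""

    def __init__(self, name):
        self.name = name
        self.federated_with = set()   # names of zones we federate with
        self.acl = {}                 # logical path -> set of "user#zone"

    def federate(self, other):
        # Simplification: one call records the relationship on both sides.
        self.federated_with.add(other.name)
        other.federated_with.add(self.name)

    def grant(self, path, qualified_user):
        """Grant read access on a path to a user such as 'alice#zoneA'."""
        self.acl.setdefault(path, set()).add(qualified_user)

    def can_read(self, path, qualified_user):
        # The zone holding the data applies its own policy.
        user_zone = qualified_user.split("#", 1)[1]
        if user_zone != self.name and user_zone not in self.federated_with:
            return False              # unknown zone: not federated
        return qualified_user in self.acl.get(path, set())


zone_a = Zone("zoneA")
zone_b = Zone("zoneB")
path = "/zoneB/home/shared/report.csv"   # hypothetical data in zoneB

before_federation = zone_b.can_read(path, "alice#zoneA")
zone_a.federate(zone_b)
after_federation = zone_b.can_read(path, "alice#zoneA")   # still no grant
zone_b.grant(path, "alice#zoneA")
after_grant = zone_b.can_read(path, "alice#zoneA")

print(before_federation, after_federation, after_grant)
```

Federation alone does not open any data: the remote user gains access only once the zone that holds the data grants it.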

Questions?

iRODS Server Architecture

  • Metadata Catalog
    • Where we write everything down

  • Catalog Service Provider
    • Server which provides access to the metadata catalog

  • Catalog Service Consumer
    • Distributed nodes to provide access to storage and other resources

Catalog Service Consumer

Servers which provide access to storage resources

  • Connect to the Catalog Service Provider for
    • resource configuration
    • authentication
    • system metadata
    • user assigned metadata
  • Provide scalable access to iRODS services
  • May be geographically distributed
  • May have an arbitrary number of resources attached

Catalog Service Provider

Same capabilities as the Consumer with the addition of a database plugin

  • May serve storage capabilities
  • Provides access to the metadata catalog
  • May be placed in a High Availability configuration for failover and load balancing

The iRODS Metadata Catalog

  • Relational Database
    • PostgreSQL, MySQL, or Oracle
  • Single source of truth for the Zone
    • Holds users, groups, resources, system metadata, user metadata
  • Co-resident with the iRODS server, or hosted on a clustered server farm
  • Referenced by a database plugin implemented with ODBC

iRODS Data Flow

iRODS Clients

  • Command Line
    • iCommands

  • Web interfaces
    • Cloud Browser
    • Metalnx
  • Desktop
    • Kanki
    • Cyberduck

  • Services
    • NFS
    • WebDAV

Questions?

The iRODS Plugin Architecture
