iRODS Overview

October 5, 2016

RENCI

Chapel Hill, NC

Terrell Russell, Ph.D.

@terrellrussell

Acting Chief Technologist, iRODS Consortium

iRODS is open source software for…

• Working with data distributed across storage technologies

• Annotating and searching data with rich metadata

• Implementing access control, auditing, preservation, organization, and data movement policies

• Providing a single interface to share data between organizations

Data Virtualization

  • Standard file systems: Any mount point
  • Archival storage: HPSS, TSM
  • Object stores: Cleversafe, DDN WOS, Ceph/Rados
  • Cloud-based storage: Amazon S3
  • Separates Logical and Physical
    • Logical - entry in the catalog
    • Physical - a single replica on a storage resource

iRODS presents multiple separate storage technologies in a unified namespace.
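The logical/physical split can be pictured as one catalog entry that owns a list of replicas. The Python sketch below is a hypothetical model, not iRODS code. `DataObject`, `Replica`, and their fields are names chosen for illustration, and the paths are the example paths used later in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One physical copy of a data object (hypothetical model)."""
    number: int          # replica number, as listed by `ils -L`
    resource: str        # storage resource (or hierarchy) holding the copy
    physical_path: str   # where the bytes live inside that resource's vault


@dataclass
class DataObject:
    """One logical catalog entry that maps to one or more replicas."""
    logical_path: str
    replicas: List[Replica] = field(default_factory=list)

    def add_replica(self, resource: str, physical_path: str) -> Replica:
        # Replicas are numbered sequentially; this sketch never deletes them.
        replica = Replica(len(self.replicas), resource, physical_path)
        self.replicas.append(replica)
        return replica

    def physical_paths(self) -> List[str]:
        return [r.physical_path for r in self.replicas]


obj = DataObject("/tempZone/home/rods/thefile.txt")
obj.add_replica("demoResc", "/var/lib/irods/iRODS/Vault/home/rods/thefile.txt")
obj.add_replica("repl;u2", "/tmp/u2vault/home/rods/thefile.txt")
obj.add_replica("repl;u1", "/tmp/u1vault/home/rods/thefile.txt")

# One logical name, three physical locations.
print(obj.logical_path)
for path in obj.physical_paths():
    print("  ", path)
```

Clients only ever name the logical path; which replica serves a read is a decision left to the server side.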

Data Virtualization

(Diagram: one Logical Path mapped to one or more Physical Path(s).)

Data Virtualization

Logical Path: /tempZone/home/rods/thefile.txt

Physical Paths (replicas):
  /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  /tmp/u2vault/home/rods/thefile.txt
  /tmp/u1vault/home/rods/thefile.txt

$ ils -L /tempZone/home/rods/thefile.txt
  rods              0 demoResc        29606 2016-10-05.09:05 & thefile.txt
        generic    /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  rods              1 repl;u2        29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u2vault/home/rods/thefile.txt
  rods              2 repl;u1        29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u1vault/home/rods/thefile.txt
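In the listing above each replica takes two lines: a summary line (owner, replica number, resource hierarchy, size, timestamp, a `&` marker, name) and a second line ending in the physical path. The Python sketch below pulls those fields apart. The field labels are my own, and the layout (two lines per replica, no spaces in names or paths) is an assumption read off this single listing; `ils` output may differ between iRODS versions, so scraping it is fragile compared with querying the catalog.

```python
import re

# Sample output copied from the slide above.
ILS_OUTPUT = """\
  rods              0 demoResc        29606 2016-10-05.09:05 & thefile.txt
        generic    /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  rods              1 repl;u2        29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u2vault/home/rods/thefile.txt
  rods              2 repl;u1        29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u1vault/home/rods/thefile.txt
"""

# Summary line: owner, replica number, resource hierarchy, size,
# timestamp, marker (such as "&"), and the data object name.
SUMMARY = re.compile(
    r"^\s*(?P<owner>\S+)\s+(?P<number>\d+)\s+(?P<hierarchy>\S+)\s+"
    r"(?P<size>\d+)\s+(?P<timestamp>\S+)\s+(?P<marker>\S+)\s+(?P<name>\S+)\s*$"
)


def parse_ils_long(text):
    """Pair each summary line with the physical path on the line after it."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected summary/detail line pairs")
    replicas = []
    for summary, detail in zip(lines[0::2], lines[1::2]):
        match = SUMMARY.match(summary)
        if match is None:
            raise ValueError(f"unexpected summary line: {summary!r}")
        replica = match.groupdict()
        replica["number"] = int(replica["number"])
        replica["size"] = int(replica["size"])
        # The physical path is the last whitespace-separated field.
        replica["physical_path"] = detail.split()[-1]
        # A hierarchy such as "repl;u2" lists resources from root to leaf.
        replica["leaf_resource"] = replica["hierarchy"].split(";")[-1]
        replicas.append(replica)
    return replicas


for r in parse_ils_long(ILS_OUTPUT):
    print(r["number"], r["leaf_resource"], r["physical_path"])
```

The `repl;u1` and `repl;u2` hierarchies show two replicas placed through a coordinating resource named `repl` onto storage resources `u1` and `u2`.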

Data Discovery

  • Metadata can be system- or user-generated.
  • Users can find data by attributes such as description, study ID, or access date.
  • Metadata can be used to link processed results to raw data (i.e., tracking provenance).
  • Administrators can use metadata to control policy, such as archiving and access control policies.

iRODS provides a catalog, the iCAT, that links data and metadata.
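iRODS stores metadata as attribute-value-unit (AVU) triples attached to catalog entries. The toy in-memory catalog below sketches how such triples support both search and provenance links. The class, the `study_id` and `derived_from` attribute names, and the paths are hypothetical; a real deployment would query the iCAT instead.

```python
from collections import defaultdict


class MetadataCatalog:
    """Toy catalog linking logical paths to AVU (attribute, value, unit) triples."""

    def __init__(self):
        self._avus = defaultdict(list)   # logical path -> list of AVU tuples

    def add(self, path, attribute, value, unit=""):
        self._avus[path].append((attribute, value, unit))

    def find(self, attribute, value):
        """Return the sorted logical paths carrying a matching attribute/value pair."""
        return sorted(
            path
            for path, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


catalog = MetadataCatalog()
catalog.add("/tempZone/home/rods/raw/scan01.dat", "study_id", "S-42")
catalog.add("/tempZone/home/rods/results/scan01.csv", "study_id", "S-42")
# Provenance: link a processed result back to its raw input.
catalog.add("/tempZone/home/rods/results/scan01.csv", "derived_from",
            "/tempZone/home/rods/raw/scan01.dat")

print(catalog.find("study_id", "S-42"))
print(catalog.find("derived_from", "/tempZone/home/rods/raw/scan01.dat"))
```

Because the provenance link is just another AVU, "which results came from this raw file?" is the same kind of query as "which files belong to this study?".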

Workflow Automation

  • Policy Enforcement Points (PEPs) surround API calls and database, resource, and authentication operations
  • iRODS rule engines execute PEP implementations
  • PEP implementations can influence, deny, or provide additional context to each operation

iRODS lets you use any operation within the system to trigger a programmatic action.
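The PEP idea reduces to "run policy code before and after an operation, and let the pre-step add context or veto". The toy dispatcher below sketches that control flow in Python. It is not an iRODS rule engine or plugin API: `ToyRuleEngine`, the `put` operation name, and the hook functions are all hypothetical.

```python
class PolicyDenied(Exception):
    """Raised by a pre-operation hook to veto an operation."""


class ToyRuleEngine:
    """Runs hooks registered before and after named operations."""

    def __init__(self):
        self.pre_hooks = {}    # operation name -> list of hook callables
        self.post_hooks = {}

    def on_pre(self, operation, hook):
        self.pre_hooks.setdefault(operation, []).append(hook)

    def on_post(self, operation, hook):
        self.post_hooks.setdefault(operation, []).append(hook)

    def run(self, operation, context, action):
        # Pre hooks may enrich the context or raise PolicyDenied to deny.
        for hook in self.pre_hooks.get(operation, []):
            hook(context)
        result = action(context)
        # Post hooks observe the finished operation (audit, replication, etc.).
        for hook in self.post_hooks.get(operation, []):
            hook(context, result)
        return result


audit_log = []


def deny_temp_files(ctx):
    if ctx["logical_path"].endswith(".tmp"):
        raise PolicyDenied("temporary files may not be stored: " + ctx["logical_path"])


def require_checksum(ctx):
    ctx["verify_checksum"] = True   # extra context handed to the operation


def audit(ctx, result):
    audit_log.append((ctx["user"], ctx["logical_path"], result))


def store(ctx):
    # Stand-in for the real data movement.
    return "stored (checksum verified)" if ctx.get("verify_checksum") else "stored"


engine = ToyRuleEngine()
engine.on_pre("put", deny_temp_files)
engine.on_pre("put", require_checksum)
engine.on_post("put", audit)

print(engine.run("put", {"user": "rods", "logical_path": "/tempZone/home/rods/thefile.txt"}, store))
try:
    engine.run("put", {"user": "rods", "logical_path": "/tempZone/home/rods/scratch.tmp"}, store)
except PolicyDenied as err:
    print("denied:", err)
```

The denied call never reaches `store` or the audit hook, which mirrors how a pre-operation policy can stop an operation outright.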

Secure Collaboration

  • Implemented as a Federation of iRODS Zones
  • Users can access data in resources in other Zones, wherever those Zones are located
  • After federation, a user from a remote Zone must still be granted access explicitly
  • A remote Zone's data management policy is enforced for data accessed within that Zone

iRODS lets you share data across administrative units at any time after deployment.
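iRODS refers to users from another Zone in `user#zone` form. The sketch below borrows that notation to model the rules on this slide: a remote user needs both a federation between the Zones and an explicit grant, and the Zone that holds the data applies its own policy. `ToyZone`, its methods, the user `alice`, and the zone `otherZone` are hypothetical; real federation is configured by administrators on both Zones.

```python
def qualified(user, zone):
    """Render a user in the user#zone form."""
    return f"{user}#{zone}"


class ToyZone:
    """A zone that enforces its own access policy (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.federated_with = set()   # names of trusted remote zones
        self.read_acl = {}            # logical path -> set of "user#zone"

    def federate(self, remote_zone):
        self.federated_with.add(remote_zone)

    def grant_read(self, path, user, zone):
        self.read_acl.setdefault(path, set()).add(qualified(user, zone))

    def can_read(self, path, user, zone):
        # Users from zones we have not federated with are rejected outright.
        if zone != self.name and zone not in self.federated_with:
            return False
        # Federation alone grants nothing: an explicit ACL entry is required.
        return qualified(user, zone) in self.read_acl.get(path, set())


home = ToyZone("tempZone")
path = "/tempZone/home/rods/thefile.txt"
home.federate("otherZone")

print(home.can_read(path, "alice", "otherZone"))   # federated, not yet granted
home.grant_read(path, "alice", "otherZone")
print(home.can_read(path, "alice", "otherZone"))   # federated and granted
```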

Glossary

Agent: An Agent is an instance of a server process that handles application programming interface (API) requests. Each time a client connects to an iRODS server, the server spawns an agent and a network connection is established between the agent and the requesting client.
Catalog Service Consumer: An iRODS Server within an iRODS Zone that does not hold the connection to the iCAT but is employed for distributed data management.
Catalog Service Provider: An iRODS Server within an iRODS Zone that holds the connection to (i.e., communicates with) the iCAT.
Collection: A Collection is the logical representation of a physical container, similar to the directories or folders found in a file system. A Collection can have sub-collections, and hence provides a hierarchical structure.
Composable Resources: Composable Resources are plugins that allow you to manage storage and retrieval of data on storage devices. There are two types of composable resources: Coordinating and Storage.
Control Plane: The Control Plane receives status updates from all servers and issues commands to servers to pause, resume, shut down, etc.
Coordinating Resource: A Coordinating Resource is a type of Composable Resource that actively makes decisions about which physical storage device will receive or serve up a Data Object.
Data Object: A Data Object is the logical representation of data that maps to one or more physical instances of the data at rest in Storage Resources.
Delayed Execution Rule: A Delayed Execution Rule is a rule that invokes the delay keyword (i.e., a reserved word), which places the rule script in the delayed execution queue rather than executing it immediately.

Grid: The hardware, operating system, and other machinery that supports a Zone.
iCAT: The iCAT, or iRODS Metadata Catalog, is a database (e.g., PostgreSQL, MySQL, Oracle) that stores metadata about the Data Objects in an iRODS Zone. There is one iCAT per iRODS Zone.
iCommands: The iCommands are Unix utilities that give users a command-line interface (CLI) to operate on data stored within iRODS.
Microservice: A microservice is a small, well-defined C/C++ procedure that performs a server-side task and is either compiled into the iRODS server code or packaged independently as a shared object. Rules invoke Microservices to implement data management policies.
Policy Enforcement Point (PEP): A hook within the code of the iRODS Agent that invokes an interpreted rule script via the iRODS rule engine for the purpose of influencing a data management operation.
Replica: An identical, physical copy of a Data Object.
Storage Resource: A Storage Resource is the logical representation of, or pointer to, a physical storage device. It includes the hostname and the directory path to the location of the Data Object on the storage device.
Vault: The physical location of Data Objects on a storage device. For example, Vaults can be located on a Unix file system, a Ceph cluster, or on Amazon S3.
Workflow: A computation or action performed on Data Objects; a workflow can be defined by a series of rules.
Zone: An iRODS deployment, specifically the logical aspect of iRODS serviced by the iRODS Remote Procedure Call (RPC) application programming interface (API).
Zone Report: A snapshot of an iRODS Zone, retrieved by using the izonereport iCommand.

Questions?

iRODS Server Architecture

  • Metadata Catalog
    • Where we write everything down
  • Catalog Service Provider
    • Server that provides access to the metadata catalog
  • Catalog Service Consumer
    • Distributed nodes that provide access to storage and other resources

Catalog Service Consumer

Servers that provide access to storage resources

  • Connect to the Catalog Service Provider for
    • resource configuration
    • authentication
    • system metadata
    • user-assigned metadata
  • Provide scalable access to iRODS services
  • May be geographically distributed
  • May have an arbitrary number of resources attached
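The consumer/provider split means a consumer owns storage but no catalog: every question about resources or metadata is forwarded to the provider. The Python sketch below models only that delegation. The class names mirror the slide's terms, but the methods, the `example.org` hostnames, and the in-process calls (standing in for network requests) are assumptions for illustration.

```python
class CatalogServiceProvider:
    """Stand-in for the server holding the iCAT connection (hypothetical)."""

    def __init__(self):
        self.resources = {}   # resource name -> host that serves it
        self.metadata = {}    # logical path -> list of (attribute, value, unit)
        self.requests = 0     # how many catalog requests were answered

    def get_resource_host(self, resource):
        self.requests += 1
        return self.resources[resource]

    def get_metadata(self, logical_path):
        self.requests += 1
        return list(self.metadata.get(logical_path, []))


class CatalogServiceConsumer:
    """Stand-in for a storage server that has no catalog of its own."""

    def __init__(self, hostname, provider):
        self.hostname = hostname
        self.provider = provider   # every catalog question goes here

    def serves(self, resource):
        return self.provider.get_resource_host(resource) == self.hostname

    def metadata_for(self, logical_path):
        return self.provider.get_metadata(logical_path)


provider = CatalogServiceProvider()
provider.resources.update({"u1": "storage1.example.org", "u2": "storage2.example.org"})
provider.metadata["/tempZone/home/rods/thefile.txt"] = [("study_id", "S-42", "")]

c1 = CatalogServiceConsumer("storage1.example.org", provider)
c2 = CatalogServiceConsumer("storage2.example.org", provider)
print(c1.serves("u1"), c2.serves("u1"))
print(c1.metadata_for("/tempZone/home/rods/thefile.txt")
      == c2.metadata_for("/tempZone/home/rods/thefile.txt"))
```

Adding another consumer adds storage capacity without adding a second source of truth, since both consumers read the same provider state.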

Catalog Service Provider

Same capabilities as the Consumer with the addition of a database plugin

  • May serve storage capabilities
  • Provides access to the metadata catalog
  • May be placed in a High Availability configuration for failover and load balancing

The iRODS Metadata Catalog

  • Relational Database
    • PostgreSQL, MySQL, or Oracle
  • Single source of truth for the Zone
    • Holds users, groups, resources, system metadata, and user metadata
  • Can be co-resident with the iRODS server or hosted on a clustered database server farm
  • Accessed through a database plugin implemented with ODBC
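To make "single source of truth" concrete, here is a toy relational catalog that uses Python's built-in `sqlite3` in place of PostgreSQL, MySQL, or Oracle. The table and column names are hypothetical and far simpler than the real iCAT schema. The point is that one SQL query can join logical paths, replicas, and metadata.

```python
import sqlite3

# A deliberately simplified schema; the real iCAT schema is far larger.
SCHEMA = """
CREATE TABLE data_objects (
    id           INTEGER PRIMARY KEY,
    logical_path TEXT NOT NULL UNIQUE
);
CREATE TABLE replicas (
    data_id            INTEGER NOT NULL REFERENCES data_objects(id),
    replica_num        INTEGER NOT NULL,
    resource_hierarchy TEXT NOT NULL,
    physical_path      TEXT NOT NULL,
    PRIMARY KEY (data_id, replica_num)
);
CREATE TABLE avus (
    data_id   INTEGER NOT NULL REFERENCES data_objects(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL,
    unit      TEXT NOT NULL DEFAULT ''
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

cur = conn.execute("INSERT INTO data_objects (logical_path) VALUES (?)",
                   ("/tempZone/home/rods/thefile.txt",))
data_id = cur.lastrowid
conn.executemany(
    "INSERT INTO replicas VALUES (?, ?, ?, ?)",
    [(data_id, 0, "demoResc", "/var/lib/irods/iRODS/Vault/home/rods/thefile.txt"),
     (data_id, 1, "repl;u2", "/tmp/u2vault/home/rods/thefile.txt"),
     (data_id, 2, "repl;u1", "/tmp/u1vault/home/rods/thefile.txt")],
)
conn.execute("INSERT INTO avus (data_id, attribute, value) VALUES (?, ?, ?)",
             (data_id, "study_id", "S-42"))

# One query answers "where are the copies of every object tagged study_id=S-42?"
rows = conn.execute(
    """
    SELECT d.logical_path, r.replica_num, r.physical_path
    FROM data_objects d
    JOIN avus a ON a.data_id = d.id
    JOIN replicas r ON r.data_id = d.id
    WHERE a.attribute = ? AND a.value = ?
    ORDER BY r.replica_num
    """,
    ("study_id", "S-42"),
).fetchall()
for row in rows:
    print(row)
```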

iRODS Data Flow

iRODS Clients

  • Command Line
    • iCommands
  • Web interfaces
    • Cloud Browser
    • Metalnx
  • Desktop
    • Kanki
    • Cyberduck
  • Services
    • NFS
    • WebDAV

Questions?

The iRODS Plugin Architecture

RENCI - iRODS Overview

By iRODS Consortium
