iRODS Overview
November 2, 2016
Bibliothèque et Archives nationales du Québec
Montréal, Québec, Canada
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
iRODS is open source software for…
• Working with data distributed across storage technologies
• Annotating and searching data with rich metadata
• Implementing access control, auditing, preservation, organization, and data movement policies
• Providing a single interface to share data between organizations
Data Virtualization
iRODS presents multiple separate storage technologies in a unified namespace.
Logical Path | /tempZone/home/rods/thefile.txt |
Physical Path(s) (replicas) | /var/lib/irods/iRODS/Vault/home/rods/thefile.txt, /tmp/u2vault/home/rods/thefile.txt, /tmp/u1vault/home/rods/thefile.txt |
$ ils -L /tempZone/home/rods/thefile.txt
  rods              0 demoResc        29606 2016-10-05.09:05 & thefile.txt
        generic    /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  rods              1 repl;u2         29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u2vault/home/rods/thefile.txt
  rods              2 repl;u1         29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u1vault/home/rods/thefile.txt
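The listing above pairs one logical path with three physical replicas. As a minimal sketch, the same output can be parsed into records in Python. The `Replica` class and `parse_ils_long` function are hypothetical names introduced here, and the parser assumes the exact two-lines-per-replica layout shown on this slide (status flag `&` present, no checksum column); other `ils -L` output may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    owner: str
    number: int
    resource: str        # resource hierarchy, e.g. "repl;u2"
    size: int
    modified: str
    name: str
    physical_path: str


def parse_ils_long(text: str) -> List[Replica]:
    """Parse the layout shown on the slide: a summary line, then a path line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    replicas = []
    # Assumption: lines alternate strictly between summary and detail.
    for summary, detail in zip(lines[0::2], lines[1::2]):
        owner, number, resource, size, modified, _status, name = summary.split(None, 6)
        _data_type, physical_path = detail.split(None, 1)
        replicas.append(
            Replica(owner, int(number), resource, int(size), modified, name, physical_path)
        )
    return replicas


# Output lines copied from the slide (the "$ ils -L ..." command line is omitted).
SAMPLE = """\
  rods              0 demoResc        29606 2016-10-05.09:05 & thefile.txt
        generic    /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  rods              1 repl;u2         29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u2vault/home/rods/thefile.txt
  rods              2 repl;u1         29606 2016-10-05.09:06 & thefile.txt
        generic    /tmp/u1vault/home/rods/thefile.txt
"""

replicas = parse_ils_long(SAMPLE)
for r in replicas:
    print(r.number, r.resource, r.physical_path)
```

Each record keeps the replica number and resource hierarchy next to its physical path, which is the mapping the unified namespace hides from the user.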
Data Discovery
iRODS provides a catalog, the iCAT, that links data and metadata.
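To illustrate what "links data and metadata" can mean in practice, here is a toy in-memory catalog in Python. It is not the iCAT or any iRODS API: `ToyCatalog`, its methods, and the sample attributes are hypothetical. The one detail taken from iRODS itself is that user metadata is stored as attribute-value-unit (AVU) triples.

```python
from collections import defaultdict


class ToyCatalog:
    """Hypothetical stand-in for a metadata catalog keyed by logical path."""

    def __init__(self):
        # logical path -> set of (attribute, value, unit) triples
        self._avus = defaultdict(set)

    def add_metadata(self, logical_path, attribute, value, unit=""):
        self._avus[logical_path].add((attribute, value, unit))

    def find(self, attribute, value):
        """Return the logical paths carrying the given attribute/value pair."""
        return sorted(
            path
            for path, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _unit in avus)
        )


catalog = ToyCatalog()
# Sample metadata below is invented for illustration.
catalog.add_metadata("/tempZone/home/rods/thefile.txt", "project", "archive-demo")
catalog.add_metadata("/tempZone/home/rods/thefile.txt", "length", "29606", "bytes")
catalog.add_metadata("/tempZone/home/rods/other.txt", "project", "archive-demo")

matches = catalog.find("project", "archive-demo")
print(matches)
```

The search returns logical paths only; where the bytes live is a separate question answered by the replica listing.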
Workflow Automation
iRODS lets you use any operation within the system to trigger a programmatic action.
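The idea of operations triggering actions can be sketched as a small hook registry in Python. This is an illustration of the pattern only, not the iRODS rule engine: `ToyPolicyEngine`, the `"put"` operation name, and the two policies are all hypothetical.

```python
from collections import defaultdict


class ToyPolicyEngine:
    """Hypothetical hook registry: callbacks fire before and after an operation."""

    def __init__(self):
        self._hooks = defaultdict(list)   # (operation, stage) -> callbacks

    def on(self, operation, stage):
        def register(func):
            self._hooks[(operation, stage)].append(func)
            return func
        return register

    def run(self, operation, context, action):
        for hook in self._hooks[(operation, "pre")]:
            hook(context)                 # a pre hook may raise to veto the operation
        result = action(context)
        for hook in self._hooks[(operation, "post")]:
            hook(context)
        return result


engine = ToyPolicyEngine()
audit_log = []


@engine.on("put", "pre")
def reject_empty_uploads(context):
    if context["size"] == 0:
        raise ValueError("empty uploads are not allowed")


@engine.on("put", "post")
def record_upload(context):
    audit_log.append(f"put {context['logical_path']} by {context['user']}")


def store(context):
    # Stand-in for the real data movement.
    return f"stored {context['size']} bytes"


result = engine.run(
    "put",
    {"logical_path": "/tempZone/home/rods/thefile.txt", "user": "rods", "size": 29606},
    store,
)
print(result)
print(audit_log)
```

Keeping policy in hooks rather than in the operation itself means an administrator can add auditing or validation without changing client behavior.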
Secure Collaboration
iRODS lets you share data across administrative units at any time after deployment.
Glossary
Agent | An Agent is an instance of a server process that handles application programming interface (API) requests. Each time a client connects to an iRODS server, the server spawns an agent and a network connection is established between the agent and the requesting client. |
Catalog Service Consumer | An iRODS Server within an iRODS Zone that does not hold the connection to the iCAT, but is employed for distributed data management. |
Catalog Service Provider | An iRODS Server within an iRODS Zone that holds the connection to (i.e., communicates with) the iCAT. |
Collection | A Collection is the logical representation of a physical container, similar to a directory or folder in a file system. A Collection can have sub-collections, and hence provides a hierarchical structure. |
Composable Resources | Composable Resources are plugins that allow you to manage storage and retrieval of data on storage devices. There are two types of composable resources: Coordinating and Storage. |
Control Plane | The Control Plane receives status updates from all servers, and issues commands to servers to pause, resume, shut down, etc. |
Coordinating Resource | A Coordinating Resource is a type of Composable Resource that actively makes decisions about which physical storage device will receive or serve up a Data Object. |
Data Object | A Data Object is the logical representation of data that maps to one or more physical instances of the data at rest in Storage Resources. |
Delayed Execution Rule | A Delayed Execution Rule is a rule that invokes the delay keyword (i.e., a reserved word), which places the rule script in the delayed execution queue rather than immediately executing the rule. |
Grid | The hardware, operating system, and other machinery that supports a Zone. |
iCAT | The iCAT, or iRODS Metadata Catalog, is a database (e.g. PostgreSQL, MySQL, Oracle) that stores metadata about the Data Objects in an iRODS Zone. There is one iCAT per iRODS Zone. |
iCommands | The iCommands are Unix utilities that give users a command-line interface (CLI) to operate on data stored within iRODS. |
Microservice | A microservice is a small, well-defined C/C++ procedure that performs a server-side task and is either compiled into the iRODS server code or packaged independently as a shared object. Rules invoke Microservices to implement data management policies. |
Policy Enforcement Point (PEP) | A hook within the code of the iRODS Agent that invokes an interpreted rule script via the iRODS rule engine for the purpose of influencing a data management operation. |
Replica | An identical, physical copy of a Data Object. |
Storage Resource | A Storage Resource is the logical representation of, or pointer to, a physical storage device. It includes the hostname and the directory path to the location of the Data Object on the storage device. |
Vault | The physical location of Data Objects on a storage device. For example, Vaults can be located on a Unix file system, a Ceph cluster, or on Amazon S3. |
Workflow | Some form of computation or action performed on Data Objects. Can be defined by a series of rules. |
Zone | An iRODS deployment, specifically the logical aspect of iRODS serviced by the iRODS Remote Procedure Call (RPC) application programming interface (API). |
Zone Report | A snapshot of an iRODS Zone, retrieved by using the izonereport iCommand. |
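The glossary's distinction between Coordinating and Storage Resources can be sketched with two toy Python classes. This is not the iRODS plugin interface: the class names, the `write` method, and the rule for deriving a physical path from a logical one (drop the zone name, prepend the vault path) are assumptions made for illustration.

```python
class ToyStorageResource:
    """Hypothetical storage resource: a name plus a vault directory."""

    def __init__(self, name, vault_path):
        self.name = name
        self.vault_path = vault_path
        self.files = {}   # physical path -> bytes, standing in for a real device

    def write(self, logical_path, data):
        # Assumed mapping: "/zone/rest/of/path" -> "<vault>/rest/of/path"
        relative = logical_path.split("/", 2)[2]
        physical_path = f"{self.vault_path}/{relative}"
        self.files[physical_path] = data
        return physical_path


class ToyReplicationResource:
    """Hypothetical coordinating resource: decides that every child gets a copy."""

    def __init__(self, name, children):
        self.name = name
        self.children = children

    def write(self, logical_path, data):
        return [child.write(logical_path, data) for child in self.children]


u1 = ToyStorageResource("u1", "/tmp/u1vault")
u2 = ToyStorageResource("u2", "/tmp/u2vault")
repl = ToyReplicationResource("repl", [u1, u2])

paths = repl.write("/tempZone/home/rods/thefile.txt", b"example bytes")
print(paths)
```

The coordinating resource holds no data itself; it only decides which storage resources receive the Data Object, so swapping the decision (for example, picking one child instead of all) would not change the storage classes.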
Questions?
iRODS Server Architecture
Catalog Service Consumer
A server that provides access to storage resources
Catalog Service Provider
Same capabilities as the Consumer with the addition of a database plugin
The iRODS Metadata Catalog
iRODS Data Flow
iRODS Clients
Questions?
The iRODS Plugin Architecture