iRODS Overview
November 14, 2016
Supercomputing 2016
Salt Lake City, Utah
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
iRODS is open source software for…
• Working with data distributed across storage technologies
• Annotating and searching data with rich metadata
• Implementing access control, auditing, preservation, organization, and data movement policies
• Providing a single interface to share data between organizations
Data Virtualization
- Standard file systems: Any mount point
- Archival storage: HPSS, TSM
- Object stores: Cleversafe, DDN WOS, Ceph/Rados
- Cloud-based storage: Amazon S3
Separates Logical and Physical
- Logical - entry in the catalog
- Physical - a single replica on a storage resource
iRODS presents multiple separate storage technologies in a unified namespace.
Data Virtualization
Logical Path | /tempZone/home/rods/thefile.txt |
Physical Path(s) (replicas) | /var/lib/irods/iRODS/Vault/home/rods/thefile.txt, /tmp/u2vault/home/rods/thefile.txt, /tmp/u1vault/home/rods/thefile.txt |
$ ils -L /tempZone/home/rods/thefile.txt
  rods 0 demoResc 29606 2016-10-05.09:05 & thefile.txt
      generic /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
  rods 1 repl;u2 29606 2016-10-05.09:06 & thefile.txt
      generic /tmp/u2vault/home/rods/thefile.txt
  rods 2 repl;u1 29606 2016-10-05.09:06 & thefile.txt
      generic /tmp/u1vault/home/rods/thefile.txt
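The listing above maps one logical path to three replicas. As a minimal sketch, the following parses that kind of listing into one record per replica. It assumes two lines per replica (an information line followed by a location line) and paths without whitespace; the `Replica` class and `parse_ils_long` function are hypothetical names for illustration, not part of iRODS.

```python
from dataclasses import dataclass

# Replica lines copied from the listing above (two lines per replica).
SAMPLE = """\
rods 0 demoResc 29606 2016-10-05.09:05 & thefile.txt
    generic /var/lib/irods/iRODS/Vault/home/rods/thefile.txt
rods 1 repl;u2 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u2vault/home/rods/thefile.txt
rods 2 repl;u1 29606 2016-10-05.09:06 & thefile.txt
    generic /tmp/u1vault/home/rods/thefile.txt
"""


@dataclass
class Replica:  # hypothetical record type for this sketch
    owner: str
    number: int
    resource_hierarchy: str
    size: int
    physical_path: str


def parse_ils_long(text):
    """Pair each information line with the location line that follows it."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    replicas = []
    for info, location in zip(rows[0::2], rows[1::2]):
        owner, number, hierarchy, size = info[:4]
        # Assumption: the physical path is the last field of the location line.
        replicas.append(Replica(owner, int(number), hierarchy, int(size), location[-1]))
    return replicas


replicas = parse_ils_long(SAMPLE)
for replica in replicas:
    print(replica.number, replica.resource_hierarchy, replica.physical_path)
```

Each record keeps the replica number and resource hierarchy next to the physical path, which is the logical/physical separation the slide describes.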
Data Discovery
- Metadata can be system- or user-generated.
- Users can find data using features such as description, study ID, access date.
- Metadata can be used to link processed results to raw data (i.e., tracking provenance).
- Administrators can use metadata to control policy, such as archiving and access control policies.
iRODS provides a catalog, the iCAT, that links data and metadata.
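To illustrate how a catalog can link data and metadata, here is a minimal in-memory sketch. It is a toy model, not the iCAT schema: the `Catalog` class, its methods, and the attribute names (`study_id`, `derived_from`) are assumptions made for the example.

```python
from collections import defaultdict


class Catalog:
    """Toy stand-in for a metadata catalog: logical path -> (attribute, value) pairs."""

    def __init__(self):
        self._metadata = defaultdict(set)

    def annotate(self, logical_path, attribute, value):
        self._metadata[logical_path].add((attribute, value))

    def find(self, attribute, value):
        """Return the logical paths carrying the given attribute/value pair."""
        return sorted(
            path for path, pairs in self._metadata.items() if (attribute, value) in pairs
        )


catalog = Catalog()
catalog.annotate("/tempZone/home/rods/raw.dat", "study_id", "S-001")
catalog.annotate("/tempZone/home/rods/result.csv", "study_id", "S-001")
# Provenance: link the processed result back to its raw input.
catalog.annotate("/tempZone/home/rods/result.csv", "derived_from", "/tempZone/home/rods/raw.dat")
catalog.annotate("/tempZone/home/rods/other.dat", "study_id", "S-002")

matches = catalog.find("study_id", "S-001")
print(matches)
```

The search works on logical paths only, so the result does not depend on where the replicas are stored.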
Workflow Automation
- Policy Enforcement Points (PEPs) surround API calls and database, resource, and authentication operations
- iRODS rule engines execute PEP implementations
- PEP implementations can influence, deny or provide additional context to each operation
iRODS lets you use any operation within the system to trigger a programmatic action.
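The idea of an operation triggering policy can be sketched as a small dispatcher that fires a hook before and after an operation. This is a simplified model, not the iRODS rule engine API: the `RuleEngine` class, the PEP names, and the `put` function are hypothetical.

```python
class PolicyDenied(Exception):
    """Raised by a PEP implementation to deny an operation."""


class RuleEngine:  # hypothetical, for illustration only
    def __init__(self):
        self._peps = {}

    def implement(self, pep_name):
        """Decorator that registers a function as the implementation of a PEP."""
        def register(func):
            self._peps[pep_name] = func
            return func
        return register

    def fire(self, pep_name, context):
        handler = self._peps.get(pep_name)
        if handler is not None:
            handler(context)


engine = RuleEngine()
audit_log = []


@engine.implement("data_obj_put_pre")
def deny_tmp_files(context):
    # A "pre" implementation can deny the operation before it runs.
    if context["logical_path"].endswith(".tmp"):
        raise PolicyDenied("temporary files are not accepted")


@engine.implement("data_obj_put_post")
def record_upload(context):
    # A "post" implementation can add context, here an audit entry.
    audit_log.append(context["logical_path"])


def put(logical_path):
    context = {"logical_path": logical_path}
    engine.fire("data_obj_put_pre", context)
    # The actual upload would happen here.
    engine.fire("data_obj_put_post", context)


put("/tempZone/home/rods/thefile.txt")
try:
    put("/tempZone/home/rods/scratch.tmp")
    denied = False
except PolicyDenied:
    denied = True
```

The first upload is audited; the second is rejected by the "pre" hook before the operation runs, so it never reaches the audit log.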
Secure Collaboration
- Described as a Federation of iRODS Zones
- Users may access data in resources in other Zones anywhere
- A user from a remote zone must be granted access after federation
- A remote zone's data management policy is enforced for data accessed within that zone
iRODS lets you share data across administrative units at any time after deployment.
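The federation rules above can be sketched with a toy model. It assumes a `Zone` class with hypothetical `federate`, `grant_read`, and `read` methods; real federation is configured by zone administrators and is not exposed this way.

```python
class Zone:  # hypothetical model, for illustration only
    def __init__(self, name):
        self.name = name
        self.federated_with = set()
        self.grants = {}      # logical path -> set of "user#zone" identities
        self.access_log = []  # stands in for this zone's own audit policy

    def federate(self, other):
        self.federated_with.add(other.name)
        other.federated_with.add(self.name)

    def grant_read(self, logical_path, user, user_zone):
        self.grants.setdefault(logical_path, set()).add(f"{user}#{user_zone}")

    def read(self, logical_path, user, user_zone):
        identity = f"{user}#{user_zone}"
        if user_zone != self.name and user_zone not in self.federated_with:
            raise PermissionError("zones are not federated")
        if identity not in self.grants.get(logical_path, set()):
            raise PermissionError("access has not been granted")
        # The policy of the zone holding the data applies, whoever the caller is.
        self.access_log.append((identity, logical_path))
        return f"contents of {logical_path}"


temp_zone = Zone("tempZone")
other_zone = Zone("otherZone")
temp_zone.federate(other_zone)

path = "/tempZone/home/rods/thefile.txt"
try:
    temp_zone.read(path, "alice", "otherZone")
    refused_before_grant = False
except PermissionError:
    refused_before_grant = True

temp_zone.grant_read(path, "alice", "otherZone")
data = temp_zone.read(path, "alice", "otherZone")
```

Federation alone does not give the remote user access; the read succeeds only after an explicit grant, and it is logged by the zone that owns the data.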
Glossary
Agent | An Agent is an instance of a server process that handles application programming interface (API) requests. Each time a client connects to an iRODS server, the server spawns an agent and a network connection is established between the agent and the requesting client. |
Catalog Service Consumer | An iRODS Server within an iRODS Zone that does not hold the connection to the iCAT, but is employed for distributed data management. |
Catalog Service Provider | An iRODS Server within an iRODS Zone that holds the connection to (i.e., communicates with) the iCAT. |
Collection | A Collection is the logical representation of a physical container, similar to the directories or folders found in a file system. A Collection can have sub-collections, and hence provides a hierarchical structure. |
Composable Resources | Composable Resources are plugins that allow you to manage storage and retrieval of data on storage devices. There are two types of composable resources: Coordinating and Storage. |
Control Plane | The Control Plane receives status updates from all servers, and issues commands to servers to pause, resume, shut down, etc. |
Coordinating Resource | A Coordinating Resource is a type of Composable Resource that actively makes decisions about which physical storage device will receive or serve up a Data Object. |
Data Object | A Data Object is the logical representation of data that maps to one or more physical instances of the data at rest in Storage Resources. |
Delayed Execution Rule | A Delayed Execution Rule is a rule that invokes the delay keyword (i.e., a reserved word), which places the rule script in the delayed execution queue rather than immediately executing the rule. |
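The delayed execution idea from the glossary entry above can be sketched as a priority queue keyed by run time. This is a simplified model, not the iRODS delay queue: `DelayQueue`, `checksum_rule`, and the integer timestamps are assumptions made for the example.

```python
import heapq
import itertools


class DelayQueue:  # hypothetical, for illustration only
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker so rules are never compared

    def delay(self, run_at, rule, *args):
        """Queue the rule instead of executing it immediately."""
        heapq.heappush(self._heap, (run_at, next(self._order), rule, args))

    def run_due(self, now):
        """Execute every queued rule whose run time has arrived."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            _, _, rule, args = heapq.heappop(self._heap)
            results.append(rule(*args))
        return results


def checksum_rule(logical_path):
    return f"checksummed {logical_path}"


queue = DelayQueue()
queue.delay(60, checksum_rule, "/tempZone/home/rods/thefile.txt")
early = queue.run_due(now=10)  # nothing is due yet
late = queue.run_due(now=60)   # the queued rule runs now
```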
Glossary
Grid | The hardware, operating system, and other machinery that supports a Zone. |
iCAT | The iCAT, or iRODS Metadata Catalog, is a database (e.g. PostgreSQL, MySQL, Oracle) that stores metadata about the Data Objects in an iRODS Zone. There is one iCAT per iRODS Zone. |
iCommands | The iCommands are Unix utilities that give users a command-line interface (CLI) to operate on data stored within iRODS. |
Microservice | A microservice is a small, well-defined C/C++ procedure that performs a server-side task and is either compiled into the iRODS server code or packaged independently as a shared object. Rules invoke Microservices to implement data management policies. |
Policy Enforcement Point (PEP) | A hook within the code of the iRODS Agent that invokes an interpreted rule script via the iRODS rule engine for the purpose of influencing a data management operation. |
Replica | An identical, physical copy of a Data Object. |
Storage Resource | A Storage Resource is the logical representation of, or pointer to, a physical storage device. It includes the hostname and the directory path to the location of the Data Object on the storage device. |
Vault | The physical location of Data Objects on a storage device. For example, Vaults can be located on a Unix file system, a Ceph cluster, or on Amazon S3. |
Workflow | Some form of computation or action performed on Data Objects. Can be defined by a series of rules. |
Zone | An iRODS deployment, specifically the logical aspect of iRODS serviced by the iRODS Remote Procedure Call (RPC) application programming interface (API). |
Zone Report | A snapshot of an iRODS Zone, retrieved by using the izonereport iCommand. |
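The Coordinating Resource and Storage Resource entries in the glossary can be illustrated with a small tree in which a replicating coordinator sends each write to all of its children. This is a sketch under stated assumptions: the class names and the `put` signature are hypothetical, and only replication is modeled.

```python
class StorageResource:  # hypothetical model of a storage resource
    def __init__(self, name, vault_path):
        self.name = name
        self.vault_path = vault_path
        self.files = {}

    def put(self, relative_path, data, hierarchy=""):
        physical_path = f"{self.vault_path}/{relative_path}"
        self.files[physical_path] = data
        return [(hierarchy + self.name, physical_path)]


class ReplicationResource:  # hypothetical coordinating resource
    def __init__(self, name, children):
        self.name = name
        self.children = children

    def put(self, relative_path, data, hierarchy=""):
        # The coordinator stores nothing itself; it decides which children receive data.
        placed = []
        for child in self.children:
            placed += child.put(relative_path, data, hierarchy + self.name + ";")
        return placed


u1 = StorageResource("u1", "/tmp/u1vault")
u2 = StorageResource("u2", "/tmp/u2vault")
repl = ReplicationResource("repl", [u1, u2])

replicas = repl.put("home/rods/thefile.txt", b"hello")
for hierarchy, physical_path in replicas:
    print(hierarchy, physical_path)
```

The resulting hierarchy strings (`repl;u1`, `repl;u2`) follow the semicolon-separated form shown in the `ils -L` listing earlier in the deck.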
Questions?
iRODS Server Architecture
- Metadata Catalog
  - Where we write everything down
- Catalog Service Provider
  - Server which provides access to the metadata catalog
- Catalog Service Consumer
  - Distributed nodes to provide access to storage and other resources
Catalog Service Consumer
Servers which provide access to storage resources
- Connect to the Catalog Service Provider for
  - resource configuration
  - authentication
  - system metadata
  - user assigned metadata
- Provide scalable access to iRODS services
- May be geographically distributed
- May have an arbitrary number of resources attached
Catalog Service Provider
Same capabilities as the Consumer with the addition of a database plugin
- May serve storage capabilities
- Provides access to the metadata catalog
- May be placed in a High Availability configuration for failover and load balancing
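The division of labor between the two server roles can be sketched as follows: the Provider holds the catalog connection, and a Consumer forwards catalog lookups to it. The classes and the `lookup` method are hypothetical, and a plain dictionary stands in for the database plugin.

```python
class CatalogServiceProvider:  # hypothetical model
    def __init__(self, catalog):
        # The dictionary stands in for the database plugin and the catalog behind it.
        self._catalog = catalog

    def lookup(self, logical_path):
        return self._catalog.get(logical_path, [])


class CatalogServiceConsumer:  # hypothetical model
    def __init__(self, provider):
        self._provider = provider  # a Consumer has no catalog connection of its own

    def lookup(self, logical_path):
        # Catalog questions are forwarded to the Provider.
        return self._provider.lookup(logical_path)


provider = CatalogServiceProvider(
    {"/tempZone/home/rods/thefile.txt": ["/tmp/u1vault/home/rods/thefile.txt"]}
)
consumer = CatalogServiceConsumer(provider)

physical_paths = consumer.lookup("/tempZone/home/rods/thefile.txt")
missing = consumer.lookup("/tempZone/home/rods/absent.txt")
```

Adding more Consumers does not add catalog connections, which is why the catalog can remain the single source of truth for the Zone.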
The iRODS Metadata Catalog
- Relational Database
  - PostgreSQL, MySQL, or Oracle
- Single source of truth for the Zone
- Holds users, groups, resources, system metadata, user metadata
- Co-resident with the iRODS server or hosted on a clustered server farm
- Referenced by a database plugin implemented with ODBC
iRODS Data Flow
iRODS Clients
- Command Line
  - iCommands
- Web interfaces
  - Cloud Browser
  - Metalnx
- Desktop
  - Kanki
  - Cyberduck
- Services
  - NFS
  - WebDAV
Questions?
The iRODS Plugin Architecture
SC16 - iRODS Overview
By iRODS Consortium