Justin James

Applications Engineer

iRODS Consortium

May 11, 2022

GlobusWorld 2022

Chicago, IL and Virtual

iRODS Globus Connector

Quick iRODS Overview

iRODS - Integrated Rule Oriented Data System

  • Open source
  • Distributed
  • Metadata Driven

 

A flexible framework for the abstraction of infrastructure

iRODS as the Integration Layer

iRODS Overview - Data Virtualization

Logical Path

Physical Path(s)

- POSIX file systems

- Object Stores (S3)

- Tape

iRODS Overview - Data Discovery

Attach metadata to any first class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS provides automatically generated and user-provided metadata, which makes your data and infrastructure more discoverable, operational, and valuable.

Metadata Everywhere

iRODS Overview - Workflow Automation

An integrated scripting language that can be triggered by any operation within the framework

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine provides the ability to capture real-world policy as computer-actionable rules that may allow, deny, or add context to operations within the system.
 

Rules can be written in C++, Python, or the native rule language.
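To illustrate how a rule attached to a policy enforcement point can allow, deny, or add context to an operation, here is a minimal self-contained sketch. It is a toy model and does not use the iRODS rule engine: `PolicyEngine`, `register`, `run_operation`, the `pre_put` / `post_put` hook names, and the example path are all hypothetical.

```python
# Toy model of policy enforcement points (PEPs). Every name here is a
# hypothetical stand-in, not the iRODS rule engine API.

class PolicyDenied(Exception):
    """Raised by a rule to deny the operation."""


class PolicyEngine:
    def __init__(self):
        self._rules = {}  # hook name -> list of rule callables

    def register(self, hook, rule):
        self._rules.setdefault(hook, []).append(rule)

    def fire(self, hook, context):
        for rule in self._rules.get(hook, []):
            rule(context)

    def run_operation(self, name, context, operation):
        self.fire(f"pre_{name}", context)   # a rule may raise to deny
        result = operation(context)
        self.fire(f"post_{name}", context)  # a rule may add context
        return result


def deny_tmp_files(context):
    # Deny: refuse to store temporary files.
    if context["path"].endswith(".tmp"):
        raise PolicyDenied(f"refusing to store {context['path']}")


def tag_with_owner(context):
    # Add context: record who stored the object.
    context.setdefault("metadata", {})["owner"] = context["user"]


engine = PolicyEngine()
engine.register("pre_put", deny_tmp_files)
engine.register("post_put", tag_with_owner)

ctx = {"path": "/tempZone/home/alice/data.csv", "user": "alice"}
engine.run_operation("put", ctx, lambda c: "stored")
```

In this sketch the `pre_` hook runs before the operation and can stop it, while the `post_` hook runs afterwards and annotates the request.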

Dynamic Policy Enforcement Points

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

iRODS Globus Connector - Overview

  • Provides access to iRODS storage resources from a Globus Connect Server

    • iRODS appears as a filesystem to Globus regardless of the storage type used underneath

    • Based on work from EUDAT (European Data Collaborative), with enhancements by UNC/RENCI and our Globus partners
         

  • The connector is a plugin for the Globus Connect Server and a client to the iRODS servers
     

  • All Globus requests use the iRODS C++ client API to get information from iRODS and to transfer data to and from it

 

 

iRODS Globus Connector - Globus Plugin Interface

The iRODS Globus Connector implements the globus_gfs_storage_iface_t interface.


The connector implements the following interface functions:
 

  • INIT - Called when a new session is initiated.  Reads the user environment and calls clientLogin().

  • DESTROY - Called at the end of the session.  Cleans up and calls rcDisconnect().

  • SEND - Called when client requests to receive a file.  Calls rcDataObjRead().

  • RECEIVE - Called when client requests to transfer a file to the server.  Calls rcDataObjWrite().

  • COMMAND - Called when a client sends a command to the server (see next slide).

  • STAT - Called when the server needs information about the file.  Calls rcObjStat().

iRODS Globus Connector - Globus Client Commands

The following client commands are implemented:

 

  • GLOBUS_GFS_CMD_MKD - Creates a collection in iRODS.   Calls rcCollCreate().
     
  • GLOBUS_GFS_CMD_RMD - Removes a collection.  Calls rcRmColl().
     
  • GLOBUS_GFS_CMD_DELE - Deletes an object.  Calls rcDataObjUnlink().
     
  • GLOBUS_GFS_CMD_CKSM - Gets the checksum for an object (see the Checksum/Hashing slide).

 

iRODS Globus Connector - Recent Enhancements / Fixes

  • Migrated the code from C to C++.
    • Used RAII concepts for memory and resource cleanup.
       
  • Changed how file hashing works (see next slide).
     
  • Improved upload and download performance (see last slide).
     
  • Support incremental directory listings for very large directories.
    • Once either X entries have been collected or Y seconds have passed since the last partial listing, the pending entries are sent via globus_gridftp_server_finished_stat_partial().
       
  • Implemented heartbeats for long running checksum operations.
     
  • Implemented the realpath feature so that alternate paths don't allow users to bypass path restrictions.
     
  • Fixed some memory leaks in existing code and other minor bug fixes / performance improvements.
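The incremental directory listing rule above (flush after X entries or Y seconds, whichever comes first) can be sketched as a small generator. This is only an illustration: `batch_entries`, `max_entries`, `max_seconds`, and the injectable `clock` are hypothetical names, and in the connector each batch would be handed to globus_gridftp_server_finished_stat_partial() rather than yielded.

```python
import time


def batch_entries(entries, max_entries=1000, max_seconds=5.0, clock=time.monotonic):
    """Yield lists of entries, flushing when either limit is reached."""
    batch = []
    last_flush = clock()
    for entry in entries:
        batch.append(entry)
        # Flush on the entry-count limit or the elapsed-time limit.
        if len(batch) >= max_entries or clock() - last_flush >= max_seconds:
            yield batch
            batch = []
            last_flush = clock()
    if batch:
        yield batch  # final listing with whatever remains


for partial in batch_entries(range(5), max_entries=2, max_seconds=60.0):
    print(partial)
```

Passing the clock in as a parameter is a design choice made here so the time limit can be exercised without sleeping.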

iRODS Globus Connector - Checksum/Hashing

The original implementation relied on the default hashing scheme in iRODS.

iRODS only supports MD5 and SHA256, and the algorithm used is system-wide.


To support client-requested hashing and a larger set of hashing algorithms, file hashes are now calculated by the iRODS Globus Connector and stored in metadata as follows:
 

  • AVU Name - Globus::<algorithm>
  • AVU Value - <checksum value>
  • AVU Units - epoch time when the checksum was calculated

 

If the file has been updated since the last hash was calculated, a new hash is calculated.
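A minimal sketch of this caching logic, using an in-memory dictionary in place of iRODS metadata. Only the AVU layout (name `Globus::<algorithm>`, value = checksum, units = epoch time) comes from the slide; `get_checksum`, `avu_store`, `modify_time`, and `now` are hypothetical names, and the real connector works against iRODS through the client API rather than on a byte string.

```python
import hashlib
import time


def get_checksum(data, modify_time, avu_store, algorithm="sha256", now=time.time):
    """Return a cached checksum if it is still fresh, otherwise recompute it.

    avu_store maps AVU name -> (value, units); a stand-in for iRODS metadata.
    modify_time is the object's last-modified time in epoch seconds.
    """
    name = f"Globus::{algorithm}"
    cached = avu_store.get(name)
    if cached is not None:
        value, units = cached
        if int(units) >= modify_time:  # hash was calculated after the last update
            return value
    # No usable cached hash: calculate it and store it with the current time.
    value = hashlib.new(algorithm, data).hexdigest()
    avu_store[name] = (value, str(int(now())))
    return value


store = {}
print(get_checksum(b"example contents", modify_time=0, avu_store=store))
print(store)
```

Because the algorithm is part of the AVU name, hashes for several algorithms can sit side by side on the same object.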

iRODS Globus Connector - File Transfer Improvements

The original connector did not use multithreaded transfers and had some performance bottlenecks.

 

  • The iRODS mechanism for multithreaded transfers differs from the one Globus uses.
     
  • To implement multithreaded transfers, the iRODS Globus Connector needed to bridge these two methodologies. 

 

Average performance comparison for a 5G file, using 3 threads and local storage:

  • Upload - Improved from 31.6 seconds to 9.9 seconds!
  • Download - Improved from 28 seconds to 11 seconds!
    • Download performance peaked at 555 MiB/s (9 seconds) using six threads.
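For reference, the speedup implied by these timings can be worked out directly. The timings are the ones quoted above; the `speedup` helper is just illustrative arithmetic.

```python
def speedup(before_seconds, after_seconds):
    """Return how many times faster the new transfer is than the old one."""
    return before_seconds / after_seconds


upload = speedup(31.6, 9.9)
download = speedup(28, 11)
print(f"upload: {upload:.1f}x faster, download: {download:.1f}x faster")
```

That works out to roughly 3.2x for uploads and 2.5x for downloads with 3 threads.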

Questions?

Thank you!

Justin James

Applications Engineer

iRODS Consortium
