Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
June 8-11, 2021
iRODS User Group Meeting 2021
Virtual Event
Technology Update
Philosophical Drivers
Plugin Architecture
core is generic: protocol, API, bookkeeping
plugins are specific
policy composition
Modern core libraries
standardized interfaces
refactor iRODS internals
ease of (re)use
fewer bugs
Replicas as first-class entities
logical locking
Consolidation of data movement
dstreams: all data movement on port 1247, no more high ports
In The Last Year
iRODS Release | Issues Closed |
---|---|
4.2.9 | 314 |
~/irods $ git shortlog --summary --numbered 4.2.8..4.2.9
129 Kory Draughn
125 Alan King
35 Markus Kitsinger (SwooshyCueb)
22 Terrell Russell
9 d-w-moore
8 Justin James
5 Jason Coposky
2 Ilari Korhonen
1 Martin Pollard
1 Matthew Vernon
1 Nick Hastings
Plugins
Clients
Working Groups
Technology Working Group
Metadata Templates Working Group
Authentication Working Group
Imaging Working Group (Announcing Now!)
Last Year and Next Year
Internal Refactoring - New C++ iRODS Libraries
6 in 2019
9 in 2020
2021 - 11 New iRODS C++ Libraries
2021 - 4 New API Plugins
Verification mode performs the following operations:
Step 3 can be time-consuming, depending on the size of the replica. The server can be instructed to skip detection of mismatched checksums by passing the NO_COMPUTE_KW keyword.
The verification operations work for one or more replicas. However, when a specific replica is targeted, step 4 is not performed.
2021 - 1 Refactored API
Lookup/Update mode performs the following operations:
The following keyword(s) are now NOPs with regard to rxDataObjChksum:
Operations that target multiple replicas will only affect replicas that are marked good. This means intermediate, locked, and stale replicas will be ignored.
Operations that target a specific replica are allowed to operate on stale replicas.
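The replica-selection rules above can be sketched as a small filter. This is a minimal Python sketch, not the server's implementation: `ReplicaStatus`, `Replica`, and `select_replicas` are hypothetical names, and rejecting intermediate or locked replicas when a single replica is targeted is an assumption (the slide only states that stale replicas are allowed in that case).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReplicaStatus(Enum):
    # Hypothetical status names mirroring the terms used on the slide.
    GOOD = "good"
    STALE = "stale"
    INTERMEDIATE = "intermediate"
    LOCKED = "locked"


@dataclass
class Replica:
    number: int
    status: ReplicaStatus


def select_replicas(replicas: List[Replica], target: Optional[int] = None) -> List[Replica]:
    """Return the replicas an operation would act on (hypothetical helper)."""
    if target is None:
        # Multi-replica operations: only good replicas are affected;
        # intermediate, locked, and stale replicas are skipped.
        return [r for r in replicas if r.status is ReplicaStatus.GOOD]

    # Single-replica operations may also act on a stale replica.
    # Assumption: intermediate and locked replicas are still rejected.
    allowed = {ReplicaStatus.GOOD, ReplicaStatus.STALE}
    return [r for r in replicas if r.number == target and r.status in allowed]


if __name__ == "__main__":
    replicas = [
        Replica(0, ReplicaStatus.GOOD),
        Replica(1, ReplicaStatus.STALE),
        Replica(2, ReplicaStatus.LOCKED),
    ]
    print([r.number for r in select_replicas(replicas)])            # [0]
    print([r.number for r in select_replicas(replicas, target=1)])  # [1]
    print([r.number for r in select_replicas(replicas, target=2)])  # []
```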
S3 Streaming Plugin - Review
S3 Streaming Plugin - Review (Read)
S3 Streaming Plugin - Review (Write)
S3 Streaming Plugin - Changes in the last year
We have had many partners test the streaming S3 plugin over the last year. Throughout this process, bugs and limitations were identified and resolved. The following is a summary of some of these issues.
S3 Streaming Plugin - Rules for Cache File Use / Policy for Transfers
S3 Streaming Plugin - Rationale for the Parallel Transfer Contract
Example: 24 MB file with a 10 MB per-thread buffer and 3 transfer threads.
Scenario 1: Thread 0 sends 11 MB. Thread 1 sends 5 MB. Thread 2 sends 8 MB.
Scenario 2: Thread 0 sends 8 MB. Thread 1 sends 8 MB. Thread 2 sends 8 MB.
In both scenarios, thread 2 gets an offset of 16 MB, knows there are 3 threads, and the file size (24 MB).
What is thread 2's starting part number? In scenario 1 it would be part 4 (because thread 0, sending 11 MB through a 10 MB buffer, would require two parts), and in scenario 2 it would be part 3. How many bytes is it sending?
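The example above can be worked through in code. This Python sketch assumes a hypothetical contract in which every thread sends `file_size // num_threads` bytes (the last thread also takes the remainder) and each thread splits its range into S3 parts no larger than its buffer; the function names are illustrative, not the plugin's API. Under such a contract each thread can compute its starting part number and byte count from its index, the thread count, and the file size alone. With an arbitrary split (scenario 1), the answer depends on how much the earlier threads sent, which thread 2 cannot know from its 16 MB offset.

```python
from typing import List, Tuple

MB = 1024 * 1024  # sizes in the example are treated as MiB


def parts_for(length: int, buffer_size: int) -> int:
    """Number of S3 parts needed to send `length` bytes, one buffer per part."""
    return max(1, (length + buffer_size - 1) // buffer_size)


def contract_ranges(file_size: int, num_threads: int) -> List[Tuple[int, int]]:
    """(offset, length) per thread under the assumed even-split contract."""
    chunk = file_size // num_threads
    ranges = []
    for t in range(num_threads):
        offset = t * chunk
        # The last thread also picks up any remainder.
        length = file_size - offset if t == num_threads - 1 else chunk
        ranges.append((offset, length))
    return ranges


def starting_part(thread: int, file_size: int, num_threads: int, buffer_size: int) -> int:
    """1-based part number where `thread` starts; needs no input from other threads."""
    ranges = contract_ranges(file_size, num_threads)
    return 1 + sum(parts_for(length, buffer_size) for _, length in ranges[:thread])


def starting_part_from_sizes(sizes: List[int], thread: int, buffer_size: int) -> int:
    """Same question when the actual bytes sent by each thread are known."""
    return 1 + sum(parts_for(s, buffer_size) for s in sizes[:thread])


if __name__ == "__main__":
    size, threads, buf = 24 * MB, 3, 10 * MB
    print(starting_part(2, size, threads, buf))                         # 3
    print(contract_ranges(size, threads)[2][1] // MB)                   # 8
    print(starting_part_from_sizes([11 * MB, 5 * MB, 8 * MB], 2, buf))  # 4 (scenario 1)
    print(starting_part_from_sizes([8 * MB] * 3, 2, buf))               # 3 (scenario 2)
```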
S3 Streaming Plugin - Upload Performance to Local MinIO Server
S3 Streaming Plugin - Download Performance from Local MinIO Server
Philosophy to Policy
With the new libraries and first-class replicas, we can rewrite 90% of the internals and fix the things that depend on them later, with little expectation of regression, because the interfaces remain the same.
Internally
Externally
Continuation within the Rule Engine Plugin Framework allows administrators to break apart monolithic policy implementations into reusable components.
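Continuation can be illustrated with a toy chain of policy components. In this Python sketch, `CONTINUE`, the component functions, `invoke`, and the `/tempZone/...` paths are hypothetical stand-ins, not the Rule Engine Plugin Framework's API. It assumes that when a component returns the continue signal the next component configured for the same policy enforcement point is invoked, and that any other result ends the chain.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the framework's "keep going" return value.
CONTINUE = "continue"

Context = Dict[str, str]
Component = Callable[[Context, List[str]], str]


def log_access(ctx: Context, audit: List[str]) -> str:
    """Reusable auditing component: records the event, then defers."""
    audit.append(f"accessed {ctx['path']}")
    return CONTINUE


def deny_tmp_writes(ctx: Context, audit: List[str]) -> str:
    """Reusable guard component: stops the chain for one illustrative path."""
    if ctx["path"].startswith("/tempZone/tmp/"):
        audit.append("denied")
        return "error: writes under /tempZone/tmp are not allowed"
    return CONTINUE


def default_policy(ctx: Context, audit: List[str]) -> str:
    """Terminal component: handles the event and ends the chain."""
    audit.append("default")
    return "ok"


def invoke(chain: List[Component], ctx: Context) -> Tuple[str, List[str]]:
    """Run components in order until one returns something other than CONTINUE."""
    audit: List[str] = []
    for component in chain:
        result = component(ctx, audit)
        if result != CONTINUE:
            return result, audit
    # Assumption: if every component continues, the event is treated as handled.
    return "ok", audit


if __name__ == "__main__":
    chain = [log_access, deny_tmp_writes, default_policy]
    print(invoke(chain, {"path": "/tempZone/home/alice/data.txt"}))
    print(invoke(chain, {"path": "/tempZone/tmp/scratch.txt"}))
```

Because `log_access` returns `CONTINUE`, it can sit in front of any other component without changing that component's behavior, which is the kind of reuse a monolithic policy implementation makes difficult.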
Active Development Work
iRODS Internships - Summer 2021
iRODS Server Async Facility (4.2.x)
The iRODS Server Process Model consists of a main long-running server, a child long-running Agent Factory, and many short-lived Agent processes that serve client requests. It would also be useful to run a number of 'background tasks' for cleanup, bookkeeping, etc. This project would design and implement an asynchronous facility for the iRODS Server.
Automated Ingest Refactor (Python client)
The iRODS Automated Ingest tool can currently scan a local filesystem or an S3 bucket in parallel for new and updated files. We are interested in adding the ability to scan an iRODS path as well, which would allow the scanner to see when files are removed (the negative space). The current logic needs to be refactored so that these different targets are provided as separate scanning strategies. Separating these strategies would also allow us to scan a queue or log (Kafka, RabbitMQ, etc.).
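Separating scan targets into strategies might look like the following. This Python sketch is hypothetical: `ScanStrategy`, `FilesystemScan`, `ListScan`, and `removed_paths` are illustrative names, not part of the iRODS Automated Ingest tool. `ListScan` stands in for an iRODS collection listing or a queue consumer, and `removed_paths` shows how comparing a source scan with an iRODS scan exposes the negative space.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Set


class ScanStrategy(ABC):
    """One scan target; yields paths relative to the scan root."""

    @abstractmethod
    def scan(self) -> Iterator[str]:
        ...


class FilesystemScan(ScanStrategy):
    """Walks a local directory tree."""

    def __init__(self, root: str):
        self.root = root

    def scan(self) -> Iterator[str]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                yield os.path.relpath(os.path.join(dirpath, name), self.root)


class ListScan(ScanStrategy):
    """Stand-in for an iRODS listing or a Kafka/RabbitMQ feed."""

    def __init__(self, paths: Iterable[str]):
        self.paths = list(paths)

    def scan(self) -> Iterator[str]:
        yield from self.paths


def removed_paths(source: ScanStrategy, irods: ScanStrategy) -> Set[str]:
    """Paths still registered in iRODS that no longer exist at the source."""
    return set(irods.scan()) - set(source.scan())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "a.txt"), "w").close()
        print(removed_paths(FilesystemScan(root), ListScan(["a.txt", "b.txt"])))  # {'b.txt'}
```

The ingest loop only depends on `ScanStrategy.scan()`, so adding a new target means adding one class rather than branching inside shared logic.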
iRODS delayServer w/ Implicit remote() (4.3.0)
Currently, jobs executed by the delay server are tied to the machine on which the delay server is running (the catalog provider in 4.2.x). The delay server should be able to execute delayed rules on any machine in a zone. This is currently possible by embedding a remote() block inside a delay() block. The server should gain a configuration option holding a list of eligible rule execution servers, from which one would be chosen at random to execute rules on the delay queue. If no such list is provided, the server would behave as it does now, defaulting to a list of one server.
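The proposed selection behavior is small enough to sketch. This Python fragment is hypothetical: `pick_execution_server` and its parameters do not exist in iRODS, and the host names are placeholders. It only shows a random choice from a configured list, falling back to a list containing just the server that runs the delay server.

```python
import random
from typing import Optional, Sequence


def pick_execution_server(
    eligible: Optional[Sequence[str]],
    local_server: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Choose the server that runs the next delayed rule (hypothetical helper).

    With no configured list, fall back to a list of one: the server running
    the delay server, matching current behavior.
    """
    candidates = list(eligible) if eligible else [local_server]
    return (rng or random).choice(candidates)


if __name__ == "__main__":
    print(pick_execution_server(None, "provider.example.org"))  # provider.example.org
    rng = random.Random(0)
    servers = ["consumer1.example.org", "consumer2.example.org"]
    print([pick_execution_server(servers, "provider.example.org", rng) for _ in range(4)])
```

Passing a seeded `random.Random` makes the choice reproducible in tests; a real implementation might also want to skip servers that are unreachable, which this sketch does not attempt.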
Big Picture
Core
4.3.0 - Harden and Polish
Clients
GUIs (Metalnx, ZMT, et al.)
Onboarding and Syncing (Automated Ingest)
File System Integration (NFSRODS / SMBRODS)
iRODS Console (alongside existing iCommands)
C++ REST API
Continue building out policy components (Capabilities)
We want installation and management of iRODS to become about policy design, composition, and configuration.
Please share your:
use cases
pain points
hopes and dreams
Open Source Community Engagement
Get Involved
Working Groups
GitHub Issues
Pull Requests
Chat List
Consortium Membership
Tell Others
Publish, Cite, Advocate, Refer