Daniel Moore
Applications Engineer, iRODS Consortium
February 17, 2021
TRiRODS
Virtual
Python iRODS Client:
Atomic Metadata and
Parallel Transfer
The Python iRODS Client Library - Improvements
Atomic Metadata
- allows a batch of add/remove operations in a single transaction
- increased efficiency over individual operations
- API available in the iRODS server since the 4.2.8 release
- Available in the Python iRODS Client since the 0.8.6 release
- Responds to Consortium member / community requests
- Yoda (Utrecht) web application
- metadata templates
  - instrument metadata (microscope, satellite, etc.)
The Python iRODS Client Library - Atomic Metadata
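To make the "single transaction" idea concrete, here is a minimal in-memory sketch of all-or-nothing batch semantics. It is a toy model, not the iRODS server's implementation: the function name `apply_atomically`, the plain-tuple AVU representation, and the choice to fail when removing a missing AVU are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

AVU = Tuple[str, str, str]      # hypothetical stand-in: (attribute, value, units)
Operation = Tuple[str, AVU]     # ('add' | 'remove', avu)


def apply_atomically(avus: Set[AVU], operations: Iterable[Operation]) -> Set[AVU]:
    """Return a new AVU set with every operation applied, or raise and change nothing."""
    working = set(avus)  # work on a copy so the caller's set is untouched on failure
    for verb, avu in operations:
        if verb == 'add':
            working.add(avu)
        elif verb == 'remove':
            # Assumption for this toy model: removing a missing AVU is an error.
            if avu not in working:
                raise KeyError(f"cannot remove missing AVU {avu!r}")
            working.remove(avu)
        else:
            raise ValueError(f"unknown operation {verb!r}")
    return working  # only reached when the whole batch succeeded
```

Because the batch is applied to a copy and only returned on success, a failure halfway through leaves the original set exactly as it was, which is the property a transaction provides.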
Operate on collection, data object, user/group, or resource metadata:
from irods.meta import (iRODSMeta, AVUOperation)
# ...
irods_object.metadata.apply_atomic_operations(
    AVUOperation(operation='remove', avu=iRODSMeta('tag1', 'value1', 'units1')),
    AVUOperation(operation='add', avu=iRODSMeta('tag2', 'value2', 'units2')),
    # ...
)
We can also build the argument list programmatically.
To delete/replace all existing AVUs at once while adding a flag AVU:
import os
# Remove every AVU currently on the object ...
operations = [AVUOperation(operation='remove', avu=m) for m in irods_object.metadata.items()]
# ... and add a single flag AVU whose units field records this process ID.
operations += [AVUOperation(operation='add', avu=iRODSMeta('flag', '1', str(os.getpid())))]
irods_object.metadata.apply_atomic_operations(*operations)
The Python iRODS Client Library - Atomic Metadata - Usage
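Building the argument list programmatically also allows sending only the changes. The sketch below computes the removals and additions needed to move from a current set of AVUs to a desired set, using plain `(attribute, value, units)` tuples so it runs without a server. The helper name `plan_operations` is hypothetical; each resulting pair could presumably be wrapped as `AVUOperation(operation=verb, avu=iRODSMeta(*avu))` before calling `apply_atomic_operations`.

```python
from typing import Iterable, List, Set, Tuple

AVU = Tuple[str, str, str]  # (attribute, value, units)


def plan_operations(current: Iterable[AVU], desired: Iterable[AVU]) -> List[Tuple[str, AVU]]:
    """Return ('remove', avu) and ('add', avu) pairs that turn `current` into `desired`."""
    current_set: Set[AVU] = set(current)
    desired_set: Set[AVU] = set(desired)
    # AVUs present now but not wanted any more.
    removals = [('remove', avu) for avu in sorted(current_set - desired_set)]
    # AVUs wanted but not yet present.
    additions = [('add', avu) for avu in sorted(desired_set - current_set)]
    # AVUs in both sets are left alone, which keeps the batch small.
    return removals + additions
```

Compared with removing everything and re-adding it, this keeps the batch proportional to what actually changed, which may matter for template-driven metadata with many unchanged fields.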
Live Demo
Questions?
Original
- Uses server coordination
- For large transfers (32MB by default)
- Destination server opens sockets on high ports
- Client may put/get data in parallel
- Never implemented within the Python client
- Python client has only had (single threaded) streaming
New
- Client spawns desired number of threads / connections
- Each transfers data through the connected iRODS server
- All on the main iRODS port (1247 by default)
- Will be available in Python client 0.9.0 alongside iRODS 4.2.9
The Python iRODS Client Library - Parallel Transfer
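When a client spawns N threads, each one needs its own slice of the object. A minimal sketch of one way to divide a transfer into contiguous byte ranges follows; the function name `split_ranges` and the even-split policy are assumptions for illustration, not necessarily how the Python client partitions data.

```python
from typing import List, Tuple


def split_ranges(total_size: int, num_threads: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into at most num_threads contiguous (offset, length) ranges."""
    if total_size < 0 or num_threads < 1:
        raise ValueError("total_size must be >= 0 and num_threads >= 1")
    base, extra = divmod(total_size, num_threads)
    ranges = []
    offset = 0
    for i in range(num_threads):
        # The first `extra` ranges take one additional byte so the sizes sum exactly.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer bytes than threads: stop instead of emitting empty ranges
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each `(offset, length)` pair can then be handed to one thread, which seeks to `offset` on both ends and moves `length` bytes over its own connection.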
- client makes a request
- server redirects client to destination
- destination server opens up high ports
- client connects to destination
- data is moved in parallel
Limitations
- requires many open ports everywhere
- lacks durability / retries
- lacks control from client side
- lacks ability to provide a DMZ server outside a firewall
The Python iRODS Client Library - Original iRODS Parallel Transfer
- client makes a request for destination
- server (re)directs client to destination
- client connects N streams to destination
- data is moved in parallel over 1247
Benefits
- requires a single port
- clients may retry given any failures
- clients are in control of data movement
- data may be forwarded by a DMZ server
The Python iRODS Client Library - New iRODS Parallel Transfer
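The client-driven model (N streams, client-side retries) can be sketched with local files standing in for the iRODS connections. This is an illustration under stated assumptions, not the client's actual code: `copy_range` and `parallel_copy` are hypothetical names, local file handles replace network connections, and the retry policy (repeat the whole range up to three times on `OSError`) is chosen only for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_range(src_path, dst_path, offset, length, retries=3, block=1024 * 1024):
    """Copy one byte range, retrying the whole range if an OSError occurs."""
    for attempt in range(1, retries + 1):
        try:
            # Each worker opens its own handles, like one connection per thread.
            with open(src_path, 'rb') as src, open(dst_path, 'r+b') as dst:
                src.seek(offset)
                dst.seek(offset)
                remaining = length
                while remaining > 0:
                    chunk = src.read(min(block, remaining))
                    if not chunk:
                        raise OSError('source ended before the range was complete')
                    dst.write(chunk)
                    remaining -= len(chunk)
            return length
        except OSError:
            if attempt == retries:
                raise  # give up after the final attempt


def parallel_copy(src_path, dst_path, num_threads=3):
    """Copy src to dst using num_threads workers; return the number of bytes copied."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so every worker can write at its own offset.
    with open(dst_path, 'wb') as dst:
        dst.truncate(size)
    base, extra = divmod(size, num_threads)
    ranges, offset = [], 0
    for i in range(num_threads):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
            offset += length
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(copy_range, src_path, dst_path, off, ln)
                   for off, ln in ranges]
        return sum(f.result() for f in futures)
```

Because each range is independent, a failure in one stream only requires repeating that range, which is the kind of client-side control the benefits list refers to.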
[Diagram: Client, Connected Server, and Destination Server. Steps: (1) destination request, (2) destination response, (3) N direct connections, (4) data movement. Between the servers: potential redirection of data.]
Preliminary Performance Comparison
Transfer time (in seconds) of a single 2.7GB data object:
The Python iRODS Client Library - Parallel Transfer - Performance
Client | Threads | PUT (s) | GET (s) |
---|---|---|---|
iCommands | 1 | ~30 | ~30 |
Python | 1 | ~30 | ~30 |
iCommands | 3 | 24.7 | 24.2 |
Python | 3 | 24.9 | 24.5 |
A more complete performance analysis will be published soon.
Network: 1 Gbps direct connection
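To put the timings in context, the arithmetic below converts them to effective throughput and compares them with the link's capacity. It assumes "2.7GB" means 2.7 × 10^9 bytes and treats the approximate "~30" entries as exactly 30 seconds, so the results are rough indications only.

```python
def throughput_mbps(size_bytes: float, seconds: float) -> float:
    """Effective throughput in megabits per second."""
    return size_bytes * 8 / seconds / 1e6


SIZE_BYTES = 2.7e9    # assumption: decimal gigabytes
LINK_MBPS = 1000.0    # 1 Gbps link

# Python client PUT timings from the table (the 1-thread value is approximate).
timings = {'1 thread': 30.0, '3 threads': 24.9}

for label, seconds in timings.items():
    rate = throughput_mbps(SIZE_BYTES, seconds)
    print(f"{label}: {rate:.0f} Mbps ({rate / LINK_MBPS:.0%} of the link)")

# Lower bound on transfer time if the link were fully saturated.
min_seconds = SIZE_BYTES * 8 / (LINK_MBPS * 1e6)
print(f"best case on this link: {min_seconds:.1f} s")
```

Under these assumptions a single stream already uses roughly 70% of the link, so the gain from three threads is bounded by the remaining headroom rather than by the client.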
- Multithreaded
- Seamlessly switches to parallel implementation at 32MB (configurable)
- Uses existing Python client put() and get() method calls
- Uses server APIs file_descriptor_info() and replica_close()
  - Available in iRODS 4.2.9
- Re-uses existing server connection
  - SSL credentials are preserved
Limitations / Future Work
- PUTs and GETs flow through the client's connected iRODS server
  - To be resolved by redirects in the future
The Python iRODS Client Library - Parallel Transfer - Implementation
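The size-based switch between streaming and parallel transfer can be pictured as a simple threshold check. The sketch below is hypothetical: the names `choose_transfer_mode` and `DEFAULT_PARALLEL_THRESHOLD`, the reading of "32MB" as 32 MiB, and the inclusive comparison are assumptions, and the real client may decide differently.

```python
# Assumption: "32MB" is interpreted here as 32 MiB.
DEFAULT_PARALLEL_THRESHOLD = 32 * 1024 * 1024


def choose_transfer_mode(size_bytes: int,
                         threshold: int = DEFAULT_PARALLEL_THRESHOLD,
                         num_threads: int = 3) -> str:
    """Return 'parallel' for large objects when more than one thread is allowed."""
    # Assumption: objects exactly at the threshold go parallel.
    if num_threads > 1 and size_bytes >= threshold:
        return 'parallel'
    # Small objects, or a single-thread setting, fall back to plain streaming.
    return 'streaming'
```

Keeping the decision inside the transfer layer is what lets existing `put()` and `get()` calls stay unchanged while large objects take the parallel path.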
Live Demo
Possible Enhancements
- Redirect to server with data when possible / appropriate
- Progress bar(s)
- Append mode
Questions?
TRiRODS February 2021 - Python iRODS Client: Atomic Metadata and Parallel Transfer
By iRODS Consortium