Daniel Moore

Applications Engineer, iRODS Consortium

February 17, 2021

TRiRODS

Virtual

Python iRODS Client:

Atomic Metadata and

Parallel Transfer


The Python iRODS Client Library - Improvements

 

Atomic Metadata

  • allows a batch of add/remove operations in a single transaction
  • more efficient than issuing each operation individually

 

  • Functionality / API exists in iRODS server since 4.2.8 release
  • Available in Python iRODS Client since 0.8.6 release

 

  • Responds to Consortium member / community requests

    • Yoda (Utrecht) web application

      • metadata templates

    • instrument metadata (microscope, satellite, etc.)

The Python iRODS Client Library - Atomic Metadata

Operate on collection, data object, user/group, or resource metadata:

from irods.meta import (iRODSMeta, AVUOperation)

# ...

irods_object.metadata.apply_atomic_operations(
    AVUOperation(operation = 'remove', avu = iRODSMeta('tag1','value1','units1')),
    AVUOperation(operation = 'add',    avu = iRODSMeta('tag2','value2','units2')),
    # ...
)

We can also build the argument list programmatically. To delete/replace all existing AVUs at once while adding a flag AVU:

import os

# Remove every AVU currently attached to the object ...
operations = [ AVUOperation(operation = 'remove', avu = m) for m in irods_object.metadata.items() ]

# ... and add a flag AVU whose units field records the current process ID
operations += [ AVUOperation( operation = 'add', avu=iRODSMeta('flag','1',str(os.getpid()))) ]

# Apply all removals and the addition in a single transaction
irods_object.metadata.apply_atomic_operations( *operations )

The Python iRODS Client Library - Atomic Metadata - Usage
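For the metadata-template use case, a client often wants to move an object from its current set of AVUs to a desired set without touching AVUs that are already correct. The sketch below shows one way to compute that batch. It is plain Python and does not talk to a server; `PlannedOp` and `plan_avu_operations` are hypothetical names introduced here for illustration, and AVUs are modeled as `(name, value, units)` tuples.

```python
from collections import namedtuple

# Hypothetical stand-in for one (operation, AVU) pair. With the real client,
# each entry would be turned into an AVUOperation wrapping an iRODSMeta.
PlannedOp = namedtuple("PlannedOp", ["operation", "avu"])


def plan_avu_operations(current, desired):
    """Return the remove/add operations that turn `current` into `desired`.

    Both arguments are iterables of (name, value, units) tuples.
    AVUs present in both sets are left untouched.
    """
    current, desired = set(current), set(desired)
    removals = [PlannedOp("remove", avu) for avu in sorted(current - desired)]
    additions = [PlannedOp("add", avu) for avu in sorted(desired - current)]
    return removals + additions


current = [("tag1", "value1", "units1"), ("keep", "x", "u")]
desired = [("tag2", "value2", "units2"), ("keep", "x", "u")]
plan = plan_avu_operations(current, desired)
for op in plan:
    print(op.operation, op.avu)
```

Each planned entry could then be mapped to `AVUOperation(operation = op.operation, avu = iRODSMeta(*op.avu))` and the whole list passed to `apply_atomic_operations( *operations )`, so that the removals and additions succeed or fail together.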

Live Demo

The Python iRODS Client Library - Atomic Metadata

Questions?

The Python iRODS Client Library - Atomic Metadata

Original

  • Uses server coordination
  • Used for large transfers (threshold: 32MB by default)
    • Destination server opens sockets on high ports
    • Client may put/get data in parallel
  • Never implemented within the Python client
    • Python client has only had (single threaded) streaming

 

New

  • Client spawns desired number of threads / connections
  • Each transfers data through the connected iRODS server
    • All on the main iRODS port (1247 by default)
  • Will be available in Python client 0.9.0 alongside iRODS 4.2.9
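The client-driven approach can be pictured as splitting the object into byte ranges and letting each thread move its own range over its own connection. The sketch below is not the Python client's implementation: it uses local files as a stand-in for the iRODS connections, and `split_ranges`, `copy_range`, and `parallel_copy` are hypothetical helper names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, num_threads):
    """Divide [0, total_size) into contiguous (offset, length) ranges."""
    if total_size <= 0:
        return []
    chunk = -(-total_size // num_threads)  # ceiling division
    return [(offset, min(chunk, total_size - offset))
            for offset in range(0, total_size, chunk)]


def copy_range(src_path, dst_path, offset, length, block_size=1 << 20):
    """Copy one byte range using this worker's own pair of file handles."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            data = src.read(min(block_size, remaining))
            if not data:
                break
            dst.write(data)
            remaining -= len(data)
    return length - remaining  # bytes actually copied


def parallel_copy(src_path, dst_path, num_threads=3):
    """Copy src to dst with one worker thread per byte range."""
    total = os.path.getsize(src_path)
    # Pre-size the destination so every worker can seek to its own offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(total)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(copy_range, src_path, dst_path, off, length)
                   for off, length in split_ranges(total, num_threads)]
        return sum(f.result() for f in futures)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "copy.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1_000_003))
    copied = parallel_copy(src, dst, num_threads=3)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()
print(copied, identical)
```

Giving every worker its own handles mirrors the idea of one connection per thread: no worker shares a file position with another, so the ranges can be written in any order.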

The Python iRODS Client Library - Parallel Transfer

  1. client makes a request
  2. server redirects client to destination
  3. destination server opens up high ports
  4. client connects to destination
  5. data is moved in parallel

 

 

Limitations

  • requires many open ports everywhere
  • lacks durability / retries
  • lacks control from client side
  • lacks ability to provide a DMZ server outside a firewall

The Python iRODS Client Library - Original iRODS Parallel Transfer

  1. client makes a request for destination
  2. server (re)directs client to destination
  3. client connects N streams to destination
  4. data is moved in parallel over 1247

 

 

Benefits

  • requires a single port
  • clients may retry given any failures
  • clients are in control of data movement
  • data may be forwarded by a DMZ server
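Because the client owns each connection, a failed stream can be retried without involving the server in the recovery. A minimal sketch of that idea follows; `with_retries` is a hypothetical helper, the flaky transfer is simulated, and nothing here reflects how the Python client itself handles failures.

```python
import time


def with_retries(transfer, attempts=3, delay=0.0):
    """Call transfer() until it succeeds or the attempts are used up.

    Only OSError (which includes ConnectionError) is treated as retryable.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return transfer()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # simple linear backoff
    raise last_error


calls = {"count": 0}


def flaky_transfer():
    """Simulated chunk transfer that fails on its first two attempts."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "ok"


result = with_retries(flaky_transfer, attempts=3)
print(result, calls["count"])
```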

The Python iRODS Client Library - New iRODS Parallel Transfer

[Diagram: Client, Connected Server, Destination Server. Destination request (1) and destination response (2) between Client and Connected Server; N direct connections (3) for data movement (4) between Client and Destination Server; potential redirection of data.]

Preliminary Performance Comparison

Transfer time (in seconds) of a single 2.7GB data object:

The Python iRODS Client Library - Parallel Transfer - Performance

Client      Threads   PUT (s)   GET (s)
iCommands   1         ~30       ~30
Python      1         ~30       ~30
iCommands   3         24.7      24.2
Python      3         24.9      24.5

A more complete performance analysis will be published soon.

Measured over a 1Gbps direct connection.
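To put the timings in context, they can be converted to effective throughput and compared with the link speed. The arithmetic below assumes decimal units (2.7GB = 2.7 × 10^9 bytes, 1Gbps = 10^9 bits per second) and ignores protocol overhead; both are assumptions, not statements from the measurements.

```python
SIZE_BYTES = 2.7e9   # 2.7GB data object, assuming decimal gigabytes
LINK_BPS = 1e9       # 1Gbps link, assuming decimal gigabits


def throughput_gbps(seconds):
    """Effective throughput in Gbps for moving SIZE_BYTES in `seconds`."""
    return SIZE_BYTES * 8 / seconds / 1e9


# Shortest possible transfer time if the link were fully saturated.
link_floor_seconds = SIZE_BYTES * 8 / LINK_BPS

timings = [
    ("1 thread (~30 s)", 30.0),
    ("iCommands, 3 threads, PUT", 24.7),
    ("Python, 3 threads, PUT", 24.9),
]
for label, seconds in timings:
    print(f"{label}: {throughput_gbps(seconds):.2f} Gbps")
print(f"link floor: {link_floor_seconds:.1f} s")
```

Under these assumptions the single-threaded runs use roughly 0.72 Gbps and the three-thread runs roughly 0.87 Gbps, against a floor of 21.6 seconds for a saturated link, so the three-thread numbers leave little headroom on this connection.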

  • Multithreaded
  • Seamlessly switches to parallel implementation at 32MB (configurable)
    • Uses existing Python client put() and get() method calls
  • Uses server APIs file_descriptor_info() and replica_close()
    • Available in iRODS 4.2.9
  • Re-uses existing server connection
    • SSL credentials are preserved
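Since the switch happens behind the existing calls, application code should not need to change. The sketch below illustrates that: `round_trip` uses only the ordinary `put()` and `get()` methods on a session that is assumed to be an `irods.session.iRODSSession`, while `expected_mode` is a hypothetical helper that guesses which path a transfer would take. The threshold is taken as 32 MiB and as exclusive; neither detail is confirmed by the slides.

```python
# Assumed default threshold: 32MB interpreted as 32 * 1024 * 1024 bytes.
PARALLEL_THRESHOLD = 32 * 1024 * 1024


def expected_mode(size_bytes, threshold=PARALLEL_THRESHOLD):
    """Guess which transfer path would be used for an object of this size.

    Assumption: sizes strictly above the threshold go parallel.
    """
    return "parallel" if size_bytes > threshold else "streaming"


def round_trip(session, local_path, logical_path, download_path):
    """Upload a file and fetch it back using the ordinary client calls.

    `session` is assumed to be an already connected iRODSSession; the caller
    does not choose between the streaming and parallel implementations.
    """
    session.data_objects.put(local_path, logical_path)
    session.data_objects.get(logical_path, download_path)


print(expected_mode(10 * 1024 * 1024))   # small object
print(expected_mode(2_700_000_000))      # object the size of the 2.7GB test file
```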

 

Limitations / Future Work

  • PUTs and GETs flow through the client's connected iRODS server
    • To be resolved by redirects in the future

The Python iRODS Client Library - Parallel Transfer - Implementation

Live Demo

The Python iRODS Client Library - Parallel Transfer

Possible Enhancements

  • Redirect to server with data when possible / appropriate
  • Progress bar(s)
  • Append mode

The Python iRODS Client Library - Parallel Transfer

Questions?

The Python iRODS Client Library - Parallel Transfer

TRiRODS February 2021 - Python iRODS Client: Atomic Metadata and Parallel Transfer

By iRODS Consortium
