IOStreams and Filesystem

February 20, 2019

Renaissance Computing Institute

UNC-Chapel Hill

Kory Draughn

korydraughn@renci.org

Software Developer, iRODS Consortium

C API - iRODS and POSIX

iRODS provides a C API for managing data objects and collections.

 

The functions making up this API follow the POSIX standard as much as possible.  

 

However, unlike the standard POSIX C API, the iRODS API requires a lot of code to manipulate objects, which makes errors easy to introduce.

Enter New Libraries!

Coming in v4.3.0 are two new C++ libraries:

  • iRODS IOStreams
  • iRODS Filesystem

 

Our goals with these libraries are:

  • Provide familiar interfaces
  • Make it harder for developers to introduce bugs
  • Make it easier for developers to interact with the virtual filesystem of iRODS

Agenda

  • iRODS IOStreams
  • iRODS Filesystem
  • iput prototype using new libraries

What is iRODS IOStreams?

iRODS IOStreams is a collection of classes and functions that simplify data object I/O.

 

Features:

  • Simple to use
  • Built on top of the C++ IOStreams classes
  • Provides a familiar interface to all C++ developers
  • Works on the client-side and server-side
  • Equivalent to C++'s std::fstream

What's Included in iRODS IOStreams?

At this time, the library consists of a single header file called dstream.hpp.

 

Four classes are defined by this header file:

  • idstream - A stream class supporting only input
  • odstream - A stream class supporting only output
  • dstream - A stream class supporting input and output
  • basic_data_object_buf - The stream buffer class that the three stream classes above are implemented in terms of
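
The relationship between a stream class and its stream buffer can be sketched with the standard library alone. The block below is a minimal illustration, not the code in dstream.hpp: memory_object_buf, memory_ostream, and write_greeting are hypothetical names, and the "data object" is just a std::string held in memory rather than anything on an iRODS server.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a data object buffer: bytes are appended to a
// std::string instead of being sent to an iRODS server.
class memory_object_buf : public std::streambuf
{
public:
    const std::string& contents() const { return storage_; }

protected:
    // Called by the stream for single characters, because no put area is set up.
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        storage_.push_back(traits_type::to_char_type(ch));
        return ch;
    }

    // Called by the stream for bulk writes such as std::ostream::write.
    std::streamsize xsputn(const char* data, std::streamsize count) override
    {
        storage_.append(data, static_cast<std::size_t>(count));
        return count;
    }

private:
    std::string storage_;
};

// Output-only stream implemented in terms of the buffer above, in the same
// spirit as an output stream class sitting on top of a stream buffer class.
class memory_ostream : public std::ostream
{
public:
    memory_ostream() : std::ostream(nullptr)
    {
        // Attach the buffer once the member has been constructed.
        rdbuf(&buf_);
    }

    const std::string& contents() const { return buf_.contents(); }

private:
    memory_object_buf buf_;
};

// Uses the stream exactly like any other std::ostream.
std::string write_greeting()
{
    memory_ostream out;
    out << "hello, " << 42;
    out.write("!", 1);
    assert(out.good());
    return out.contents();
}
```

Because all formatting lives in std::ostream, the buffer only has to answer one question: where do the bytes go. That is presumably why the three stream classes can share a single buffer class.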

Old vs New - Open, Write, Close

Old

New

What's next for IOStreams?

We envision the library consisting of multiple header files, each providing a different capability.  For example, the transport layer could be made customizable through separate headers:

 

  • dstream.hpp
  • transport/default.hpp
  • transport/rdma.hpp
  • transport/udt.hpp
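
One way a customizable transport layer could look is sketched below. This is only an assumption about the general shape, not the planned iRODS design: transport, recording_transport, transport_buf, and send_through_transport are all hypothetical names, and the "network" is a std::string that records what was sent.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical transport interface. The slides do not define one.
class transport
{
public:
    virtual ~transport() = default;

    // Sends count bytes and returns how many were accepted.
    virtual std::streamsize send(const char* data, std::streamsize count) = 0;
};

// Hypothetical transport that records bytes in memory. A real one might
// wrap a socket, RDMA, or UDT connection instead.
class recording_transport : public transport
{
public:
    std::streamsize send(const char* data, std::streamsize count) override
    {
        received_.append(data, static_cast<std::size_t>(count));
        return count;
    }

    const std::string& received() const { return received_; }

private:
    std::string received_;
};

// Stream buffer that forwards every write to whichever transport it was given.
class transport_buf : public std::streambuf
{
public:
    explicit transport_buf(transport& tp) : tp_(tp) {}

protected:
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        const char c = traits_type::to_char_type(ch);
        return tp_.send(&c, 1) == 1 ? ch : traits_type::eof();
    }

    std::streamsize xsputn(const char* data, std::streamsize count) override
    {
        return tp_.send(data, count);
    }

private:
    transport& tp_;
};

// The calling code never changes when the transport is swapped.
std::string send_through_transport()
{
    recording_transport tp;
    transport_buf buf(tp);
    std::ostream out(&buf);
    out << "chunk-" << 1;
    return tp.received();
}
```

With this split, each header under transport/ would only need to supply another class implementing the interface, and the stream classes would stay untouched.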

 

The aim is to abstract common patterns, following approaches already seen in the C++ standards.

 

Input welcome - tell us how to do this well...

What is iRODS Filesystem?

iRODS Filesystem is an implementation of the standard filesystem library introduced in the ISO C++17 standard.

 

It provides abstractions that simplify management of iRODS-based filesystem components such as paths, data objects, and collections.

 

Features:

  • Implements a standardized interface
  • Readable code
  • Works on the client-side and server-side
  • Users of Boost.Filesystem will feel comfortable with this library
  • Throws exceptions with detailed messages for common errors
  • Works with C++ standard library algorithms

iRODS Filesystem Facilities

Standardized Functions:

copy

copy_data_object

create_collection

create_collections

exists

equivalent

data_object_size

is_data_object

is_collection

is_other

is_empty

last_write_time

remove

remove_all

permissions

rename

status

status_known

iRODS Specific Functions:

data_object_checksum

set_metadata

remove_metadata

Standardized Types:

path

collection_iterator

recursive_collection_iterator
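
Since the library implements the C++17 filesystem interface, the standard library version gives a feel for how the facilities above fit together. The sketch below uses std::filesystem only; it is not iRODS code. The assumed correspondence is directory_iterator to collection_iterator, is_regular_file to is_data_object, and create_directories to create_collections. The iRODS signatures are not shown in these slides and may differ, for example by taking a connection argument. list_regular_files and demo_listing are hypothetical helper names.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the sorted names of the regular files directly inside dir.
// Subdirectories are skipped, not descended into.
std::vector<std::string> list_regular_files(const fs::path& dir)
{
    std::vector<std::string> names;

    for (const auto& entry : fs::directory_iterator(dir)) {
        if (fs::is_regular_file(entry.status())) {
            names.push_back(entry.path().filename().string());
        }
    }

    // Iteration order is unspecified, so sort for a stable result.
    std::sort(names.begin(), names.end());
    return names;
}

// Builds a scratch directory, lists it, and cleans up afterwards.
std::vector<std::string> demo_listing()
{
    const fs::path root = fs::temp_directory_path() / "fs_sketch_demo";

    fs::remove_all(root);                      // start from a clean slate
    fs::create_directories(root / "nested");   // creates root and root/nested

    {
        std::ofstream a(root / "a.txt");
        a << "alpha";
        std::ofstream b(root / "b.txt");
        b << "beta";
    }   // streams close here, before the directory is listed

    auto names = list_regular_files(root);
    fs::remove_all(root);
    return names;
}
```

The range-based for loop over an iterator is the part that "works with C++ standard library algorithms": anything that accepts an iterator pair or a range can consume directory entries the same way.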

Old vs New - Iterating Over A Collection

Old

New

iput prototype

                    4.2.4         Prototype   Improvement
Lines of Code       242 + 1149    281         +80%
1000 512k files     17s           5s          +70%
2000 256k files     32s           6s          +81%
4000 128k files     58s           8s          +86%
8000 64k files      111s          10s         +90%
16000 32k files     212s          18s         +91%
1 10G file          91s           94s         -3%
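
The Improvement column appears to be the reduction relative to the 4.2.4 figure. The slide does not state the formula, so treating it as (baseline - prototype) / baseline is an assumption, and improvement_percent is a hypothetical helper name. The published figures look rounded or truncated to whole percentages.

```cpp
#include <cassert>

// Percentage reduction relative to the baseline.
// Positive means the prototype is faster (or smaller), negative means slower.
double improvement_percent(double baseline, double prototype)
{
    return (baseline - prototype) / baseline * 100.0;
}

// Worked rows from the table, using the 4.2.4 and Prototype columns:
//   1000 512k files: (17 - 5) / 17       is about 70.6
//   2000 256k files: (32 - 6) / 32       is exactly 81.25
//   Lines of Code:   (1391 - 281) / 1391 is about 79.8 (1391 = 242 + 1149)
//   1 10G file:      (91 - 94) / 91      is about -3.3
```

Under this reading, the many-small-files cases gain the most, while the single large file is slightly slower with the prototype.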

Uses both IOStreams and Filesystem.

 

This is a single test run.

Network: 1000BASE-T (gigabit Ethernet)

Prototype: Used 16 threads

2 Machines: 32 cores each

TRiRODS February 2019 - IOStreams and Filesystem

By iRODS Consortium
