IOStreams and Filesystem
February 20, 2019
Renaissance Computing Institute
UNC-Chapel Hill
Kory Draughn
korydraughn@renci.org
Software Developer, iRODS Consortium
C API - iRODS and POSIX
iRODS provides a C API for managing data objects and collections.
The functions making up this API follow the POSIX standard as much as possible.
However, compared with the standard POSIX C API, manipulating objects through the iRODS API requires much more code and is error-prone.
Enter New Libraries!
Coming in v4.3.0 are two new C++ libraries:
iRODS IOStreams
iRODS Filesystem
Our goals with these libraries are:
Agenda
What is iRODS IOStreams?
iRODS IOStreams is a collection of classes and functions that simplify data object I/O.
Features:
What's Included in iRODS IOStreams?
At this time, the library consists of a single header file called dstream.hpp.
Four classes are defined by this header file:
Old vs New - Open, Write, Close
Old
New
What's next for IOStreams?
We envision the library consisting of multiple header files that provide different capabilities, such as a customizable transport layer.
Abstractions of common patterns already established in the standards.
Input is welcome: tell us how to do this well.
What is iRODS Filesystem?
iRODS Filesystem is an implementation of the standard filesystem library introduced in the ISO C++17 standard.
It provides abstractions that simplify management of iRODS-based filesystem components such as paths, data objects, and collections.
Features:
iRODS Filesystem Facilities
Standardized Functions:
copy
copy_data_object
create_collection
create_collections
exists
equivalent
data_object_size
is_data_object
is_collection
is_other
is_empty
last_write_time
remove
remove_all
permissions
rename
status
status_known
iRODS Specific Functions:
data_object_checksum
set_metadata
remove_metadata
Standardized Types:
path
collection_iterator
recursive_collection_iterator
Old vs New - Iterating Over A Collection
Old
New
iput prototype
Measure | 4.2.4 iput | Prototype | Improvement
---|---|---|---
Lines of Code | 242 + 1149 | 281 | +80%
1000 512k files | 17s | 5s | +70%
2000 256k files | 32s | 6s | +81%
4000 128k files | 58s | 8s | +86%
8000 64k files | 111s | 10s | +90%
16000 32k files | 212s | 18s | +91%
1 10G file | 91s | 94s | -3%
The prototype uses both IOStreams and Filesystem.
Results are from a single test run.
Network: 1000T
Prototype: 16 threads
2 Machines: 32 cores each