Kory Draughn
Chief Technologist
iRODS Consortium
GenQuery2: A richer query interface into the iRODS namespace
GenQuery2: A richer query interface into the iRODS namespace
February 15, 2023
TRiRODS
Chapel Hill, NC
What is GenQuery?
An SQL-like syntax backed by APIs which allow users to query an iRODS Catalog without needing to know any details about the underlying database schema.
It also enforces the iRODS permission model.
Limitations of GenQuery1
- No support for logical OR and NOT
-
No support for grouping via parentheses
- Difficult to change (i.e. brittle)
- Metadata queries receive special treatment when part of the conditions clause
- Cannot handle embedded apostrophes
A brief history of GenQuery2
- First implementation written by Ben Keller in 2015
- Picked up by Jason Coposky in 2020
- Picked up by Kory Draughn (me) in 2022
Goals of GenQuery2
- Provide syntax closer to standard SQL
- Provide better support for logical AND, OR, and NOT
- Provide support for parentheses
- Allow queries that include conditions for metadata that is attached to different entities (i.e. data objects and collections)
- Implementation is easy to understand, maintain, and enhance
*** This is not a reimplementation of GenQuery1 ***
How do we satisfy these Goals?
- Use proven tools for implementing parsers
- Flex - A tool for generating scanners (lexical analysis)
- Bison - A tool for generating parsers (tokenization)
- Use the Boost Graph library (BGL) to represent the relationships between database tables
Approach 1
Use BGL's pre-defined algorithms to derive a path that connects all tables referenced by a query.
- BGL breadth-first search algorithm with Predecessor map
Pros / Cons
- Computes paths from source table to all other tables
- Alternative paths are unavailable
- Doesn't support cases where a table needs to be joined multiple times
Approach 2
Use BGL, but replace use of pre-defined algorithms with custom breadth-first search implementation.
- Allowed to include tables which aren't derived from the input
- Finds the shortest path which connects all necessary tables
Pros / Cons
- Provides access to all paths starting from a single table
- Allows filtering paths using any criteria we want
- Complicated
- Doesn't support cases where a table needs to be joined multiple times
- Requires designing data types which contain contextual information needed to determine joins
Approach 3
Use BGL to represent the relationships between database tables and manually craft all SQL based on what we know.
Pros / Cons
- Implementation is simple and straightforward
- Does not require knowledge of complicated graph theory
GenQuery2 - Demo 1
List all data objects having a metadata attribute name of foo and parent collection having a metadata attribute value of bar.
GenQuery2 - Demo 2
List three data objects, starting with the second one, sorted by parent collection in descending order and data name in ascending order.
GenQuery2 - Demo 3
GenQuery2 - Demo 4
Paging
- The parser supports OFFSET, LIMIT, FETCH FIRST N ROWS ONLY
- Implicitly supports Keyset Pagination scheme
- Paging is the responsibility of the client
- However, we're considering enforcing a default limit on the number rows returned when no limit is specified
Remaining Work
- Add support for permissions
- Add support for various tables (e.g. R_TICKET_ALLOWED_HOSTS)
- Finish implementing microservices
- Consider off-loading certain features to specific queries
- Resource / Storage quotas
- Add tests
- Add packaging for OS that aren't Debian-based
Future Work
- Add support for other SQL keywords
- CASE, CURRENT_TIMESTAMP, etc.
- Add support for sub-selects
- Add support for concatenating columns (i.e. || operator)
- Add support for multi-argument SQL functions
Community Engagement
- Available as an experimental package
-
https://github.com/irods/irods_api_plugin_genquery2
- Requires compiling from source
- Allows the community to try GenQuery2 and provide feedback
- Allows frequent updates (no ties to a server release)
- Does not target a specific version of iRODS
-
https://github.com/irods/irods_api_plugin_genquery2
- Once it's stable, the code will be shipped with the server making the experimental package obsolete
Thank you!
Questions?
GenQuery2 - A richer query interface into the iRODS namespace
By korydraughn
GenQuery2 - A richer query interface into the iRODS namespace
GenQuery2 Progress Update for TRiRODS
- 496