Kory Draughn

Chief Technologist

iRODS Consortium

GenQuery2: A richer query interface into the iRODS namespace

GenQuery2: A richer query interface into the iRODS namespace

February 15, 2023

TRiRODS

Chapel Hill, NC

What is GenQuery?

An SQL-like syntax backed by APIs which allow users to query an iRODS Catalog without needing to know any details about the underlying database schema.

 

It also enforces the iRODS permission model.

Limitations of GenQuery1

  • No support for logical OR and NOT
  • No support for grouping via parentheses
  • Difficult to change (i.e. brittle)
  • Metadata queries receive special treatment when part of the conditions clause
  • Cannot handle embedded apostrophes

A brief history of GenQuery2

  • First implementation written by Ben Keller in 2015
  • Picked up by Jason Coposky in 2020
  • Picked up by Kory Draughn (me) in 2022

Goals of GenQuery2

  • Provide syntax closer to standard SQL
  • Provide better support for logical AND, OR, and NOT
  • Provide support for parentheses
  • Allow queries that include conditions for metadata that is attached to different entities (i.e. data objects and collections)
  • Implementation is easy to understand, maintain, and enhance

*** This is not a reimplementation of GenQuery1​​​ ***

How do we satisfy these Goals?

  • Use proven tools for implementing parsers
    • Flex - A tool for generating scanners (lexical analysis)
    • Bison - A tool for generating parsers (tokenization)
  • Use the Boost Graph library (BGL) to represent the relationships between database tables

Approach 1

Use BGL's pre-defined algorithms to derive a path that connects all tables referenced by a query.

  • BGL breadth-first search algorithm with Predecessor map

 

Pros / Cons

  • Computes paths from source table to all other tables
  • Alternative paths are unavailable
  • Doesn't support cases where a table needs to be joined multiple times

Approach 2

Use BGL, but replace use of pre-defined algorithms with custom breadth-first search implementation.

  • Allowed to include tables which aren't derived from the input
  • Finds the shortest path which connects all necessary tables

 

Pros / Cons

  • Provides access to all paths starting from a single table
  • Allows filtering paths using any criteria we want
  • Complicated
  • Doesn't support cases where a table needs to be joined multiple times
  • Requires designing data types which contain contextual information needed to determine joins

Approach 3

Use BGL to represent the relationships between database tables and manually craft all SQL based on what we know.

 

Pros / Cons

  • Implementation is simple and straightforward
  • Does not require knowledge of complicated graph theory

GenQuery2 - Demo 1

List all data objects having a metadata attribute name of foo and parent collection having a metadata attribute value of bar.

 

https://asciinema.org/a/559830

GenQuery2 - Demo 2

List three data objects, starting with the second one, sorted by parent collection in descending order and data name in ascending order.

 

https://asciinema.org/a/559836

GenQuery2 - Demo 3

List all collections which have a prefix of col or a suffix of .d.

 

https://asciinema.org/a/559840

GenQuery2 - Demo 4

List all resources that are not in the set of target resources.

 

https://asciinema.org/a/559843

Paging

  • The parser supports OFFSET, LIMIT, FETCH FIRST N ROWS ONLY
    • Implicitly supports Keyset Pagination scheme
  • Paging is the responsibility of the client
    • However, we're considering enforcing a default limit on the number rows returned when no limit is specified

Remaining Work

  • Add support for permissions
  • Add support for various tables (e.g. R_TICKET_ALLOWED_HOSTS)
  • Finish implementing microservices
  • Consider off-loading certain features to specific queries
    • Resource / Storage quotas
  • Add tests
  • Add packaging for OS that aren't Debian-based

Future Work

  • Add support for other SQL keywords
    • CASE, CURRENT_TIMESTAMP, etc.
  • Add support for sub-selects
  • Add support for concatenating columns (i.e. || operator)
  • Add support for multi-argument SQL functions

Community Engagement

  • Available as an experimental package
  • Once it's stable, the code will be shipped with the server making the experimental package obsolete

Thank you!

Questions?

TRiRODS February 2023 - GenQuery2: A richer query interface into the iRODS namespace

By iRODS Consortium

TRiRODS February 2023 - GenQuery2: A richer query interface into the iRODS namespace

  • 511