Kory Draughn

Chief Technologist

iRODS Consortium

GenQuery2:

A more standardized, powerful parser for the iRODS namespace

June 13-16, 2023

iRODS User Group Meeting 2023

Chapel Hill, NC

Overview

  • What is GenQuery2?
  • GitHub Repository
  • General Features
  • Components and Examples
    • API Plugin
    • Rule Engine Plugin
    • iCommand
  • Remaining Work
  • Future Plans
  • Community Engagement

What is GenQuery2?

An experimental redesign (and implementation) of the iRODS GenQuery parser.

 

This project exists as a means for allowing the iRODS community to test the implementation and provide feedback so that the iRODS Consortium can produce a GenQuery parser that is easy to understand, maintain, and enhance all while providing a syntax that mirrors standard SQL as much as possible.

 

Once stable, the code will be merged into the iRODS server making it available with future releases of iRODS.

GitHub Repository

https://github.com/irods/irods_api_plugin_genquery2

 

The repository contains all source code for generating a package containing the following ...

  • An API Plugin
  • A Rule Engine Plugin
  • An iCommand

 

Everything discussed in this talk can be found in the repository.

General Features

  • Enforces the iRODS permission model
  • Logical AND, OR, and NOT
  • Grouping via parentheses
  • SQL CAST
  • SQL GROUP BY
  • SQL aggregate functions (e.g. count, sum, avg, etc)
  • Per-column sorting via ORDER BY [ASC|DESC]
  • SQL FETCH FIRST N ROWS ONLY (LIMIT offered as an alias)
  • Metadata queries involving different iRODS entities (i.e. data objects, collections, users, and resources)
  • Operators: =, !=, <, <=, >, >=, LIKE, BETWEEN, IS [NOT] NULL
  • SQL keywords are case-insensitive
  • Federation is supported

Components and Examples - API Plugin

Wraps the parser and makes it available to all clients.

 

Interface Details

  • API Number
    • 1000001 (may change in the future)
  • Input Parameters
    • query_string - The GenQuery2 string.
    • zone - The name of the zone to execute the query in.
    • sql_only - An integer instructing the plugin to return the generated SQL.
  • Output
    • On success, returns a JSON string representing the resultset
    • On failure, returns an iRODS error code

 

Defaults to returning a max of 16 rows if the client does not specify the number of rows to return.

Components and Examples - Rule Engine Plugin

Makes GenQuery2 available to the iRODS Rule Language and other rule engine plugins.

 

The use of a rule engine plugin is temporary, but required for 4.3.0. This requirement will be lifted following the release of iRODS 4.3.1.

 

The rule engine plugin includes the following rules ...

  • genquery2_execute(*handle, *query_string)

  • genquery2_next_row(*handle)

  • genquery2_column(*handle, *index, *value)

  • genquery2_destroy(*handle)

Components and Examples - Rule Engine Plugins Example

genquery2_test_rule()
{
    # Execute a query. The results are stored in the Rule Engine Plugin.
    genquery2_execute(*handle, "select COLL_NAME, DATA_NAME order by DATA_NAME desc limit 1");

    # Iterate over the resutls.
    while (errorcode(genquery2_next_row(*handle)) == 0) {
        genquery2_column(*handle, '0', *coll_name); # Copy the COLL_NAME into *coll_name.
        genquery2_column(*handle, '1', *data_name); # Copy the DATA_NAME into *data_name.
        writeLine("stdout", "logical path => [*coll_name/*data_name]");
    }

    # Free any resources used. This is handled for you when the agent is shut down as well.
    genquery2_destroy(*handle);
}

Enable access to the rules by adding the following to the rule_engines stanza of server_config.json. For example ...

Example rule ...

{
    "instance_name": "irods_rule_engine-genquery2-instance",
    "plugin_name": "irods_rule_engine-genquery2",
    "plugin_specific_configuration": {}
}

Components and Examples - iCommand

iquery - A binary which enables execution of GenQuery2 queries via the command line.

iquery - Query the iRODS Catalog

Usage: iquery [OPTION]... QUERY_STRING

Queries the iRODS Catalog using GenQuery2.

QUERY_STRING is expected to be a string matching the GenQuery2 syntax. Failing
to meet this requirement will result in an error.

Mandatory arguments to long options are mandatory for short options too.

Options:
      --sql-only        Print the SQL generated by the parser. The generated
                        SQL will not be executed.
  -z, --zone=ZONE_NAME  The name of the zone to run the query against. Defaults
                        to the local zone.
  -h, --help            Display this help message and exit.

iRODS Version 4.3.0                iquery (experimental)

Components and Examples - iCommand Examples

List the number of replicas for all data objects. jq is used for formatting purposes.

$ iquery "select COLL_NAME, DATA_NAME, count(DATA_ID) group by COLL_NAME, DATA_NAME" | jq
[
  [
    "/tempZone/home/rods",
    "foo",
    "3"
  ],
  [
    "/tempZone/home/rods",
    "bar",
    "1"
  ]
]

Below is the output from running the command.

Components and Examples - iCommand Examples

Show the SQL generated by the parser. pg_format is used for formatting purposes.

$ iquery --sql-only \
     "select COLL_NAME, DATA_NAME, count(DATA_ID) group by COLL_NAME, DATA_NAME" | \
     pg_format -
SELECT DISTINCT
    t0.coll_name,
    t1.data_name,
    count(t1.data_id)
FROM
    R_COLL_MAIN t0
    INNER JOIN R_DATA_MAIN t1 ON t0.coll_id = t1.coll_id
    INNER JOIN R_OBJT_ACCESS pdoa ON t1.data_id = pdoa.object_id
    INNER JOIN R_TOKN_MAIN pdt ON pdoa.access_type_id = pdt.token_id
    INNER JOIN R_USER_MAIN pdu ON pdoa.user_id = pdu.user_id
    INNER JOIN R_OBJT_ACCESS pcoa ON t0.coll_id = pcoa.object_id
    INNER JOIN R_TOKN_MAIN pct ON pcoa.access_type_id = pct.token_id
    INNER JOIN R_USER_MAIN pcu ON pcoa.user_id = pcu.user_id
WHERE
    pdu.user_name = ?
    AND pcu.user_name = ?
    AND pdoa.access_type_id >= 1050
    AND pcoa.access_type_id >= 1050
GROUP BY
    t0.coll_name,
    t1.data_name FETCH FIRST 16 ROWS ONLY

Below is the output from running the command. The SQL is never executed.

Remaining Work

The following items must be resolved before making GenQuery2 a part of the server.

Future Plans

  • Expose more SQL features
    • CASE, HAVING clauses
    • Sub-selects
    • Multi-argument functions
  • Consider controlling various options through GenQuery2 syntax
    • e.g. iquery "option distinct off; select DATA_NAME"
  • Consider switching from boost::variant to std::variant
  • Simplify pagination
    • Provide a utility library that manages the page information
    • Provide a document explaining how the utility may be implemented

Community Engagement

We are considering the idea of releasing GenQuery2 as an experimental package.

  • Allows the community to try GenQuery2 and provide feedback
  • Allows frequent updates (no ties to a server release)
  • Does not target a specific version of iRODS

 

We need your feedback!

 

The more the community participates, the better GenQuery2 will become.

Thank you!

Questions?

If you're interested in learning more about the implementation and/or seeing more examples of GenQuery2, please watch this TRiRODS talk.