Policy Training

Publishing

Jason M. Coposky

@jason_coposky

Executive Director, iRODS Consortium

June 25-28, 2019

iRODS User Group Meeting

University of Utrecht, NL

iRODS Capabilities

  • Packaged and supported solutions
  • Require configuration, not code
  • Derived from the majority of use cases observed in the user community

Publishing

A policy framework that provides an asynchronous, scalable data publishing service driven by metadata

  • Publishing technology of choice is reached by delegating policy implementation
  • Persistent identifier generation is delegated to a policy invocation

Publishing Policy Components

  • Persistent Identifier
  • Publishing Policy Implementation
    • irods_policy_publishing_object_publish_<technology>
    • irods_policy_publishing_object_purge_<technology>
    • irods_policy_publishing_collection_publish_<technology>
    • irods_policy_publishing_collection_purge_<technology>

<technology> is directly derived from metadata and is used to delegate the policy invocation
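As a minimal sketch of the naming template above, the following Python snippet derives the four policy names from a metadata value. The helper `policy_names` is hypothetical and not part of iRODS; only the `irods_policy_publishing_..._<technology>` pattern comes from the slide.

```python
# Hypothetical helper: illustrates the name template only, not the plugin's code.
TEMPLATE = "irods_policy_publishing_{target}_{operation}_{technology}"


def policy_names(technology):
    """Return the four policy names implied by a given <technology> value."""
    return [
        TEMPLATE.format(target=target, operation=operation, technology=technology)
        for target in ("object", "collection")
        for operation in ("publish", "purge")
    ]


if __name__ == "__main__":
    for name in policy_names("dataworld"):
        print(name)
```

With the value `dataworld`, the first generated name is `irods_policy_publishing_object_publish_dataworld`.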

Core Competencies

Policy

Capabilities

Publishing Overview

Example Implementation

Getting Started

Installing the Publishing Plugins

As the ubuntu user

Download the publishing packages

wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-dataworld.deb

wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-publishing.deb

wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-persistent-identifier.deb

Install the publishing packages

sudo dpkg -i irods-rule-engine-plugin-dataworld.deb irods-rule-engine-plugin-publishing.deb irods-rule-engine-plugin-persistent-identifier.deb

Configuring Publishing Plugins

As the irods user

Edit /etc/irods/server_config.json

        "rule_engines": [
            ...
            {
                "instance_name": "irods_rule_engine_plugin-dataworld-instance",
                "plugin_name": "irods_rule_engine_plugin-dataworld",
                "plugin_specific_configuration": {
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-publishing-instance",
                "plugin_name": "irods_rule_engine_plugin-publishing",
                "plugin_specific_configuration": {
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-persistent_identifier-instance",
                "plugin_name": "irods_rule_engine_plugin-persistent_identifier",
                "plugin_specific_configuration": {
                }
            },
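The same edit could be scripted rather than made by hand. The sketch below works on an already-loaded configuration dictionary and adds the three plugin entries if they are missing. It assumes the "rule_engines" array sits under a "plugin_configuration" key and that appending after the existing entries is acceptable; both are assumptions, since the slide elides the surrounding file. The function name `add_publishing_plugins` is hypothetical.

```python
import json

# Plugin names as they appear in the slide.
PUBLISHING_PLUGINS = [
    "irods_rule_engine_plugin-dataworld",
    "irods_rule_engine_plugin-publishing",
    "irods_rule_engine_plugin-persistent_identifier",
]


def add_publishing_plugins(config):
    """Append the publishing rule engine entries unless already configured.

    Assumption: "rule_engines" lives under "plugin_configuration".
    """
    plugin_config = config.setdefault("plugin_configuration", {})
    engines = plugin_config.setdefault("rule_engines", [])
    present = {engine.get("plugin_name") for engine in engines}
    for name in PUBLISHING_PLUGINS:
        if name not in present:
            engines.append({
                "instance_name": name + "-instance",
                "plugin_name": name,
                "plugin_specific_configuration": {},
            })
    return config


if __name__ == "__main__":
    # Demonstrate on an in-memory example instead of the live config file.
    example = {"plugin_configuration": {"rule_engines": []}}
    print(json.dumps(add_publishing_plugins(example), indent=4))
```

Rule engine order can affect which plugin handles a policy, so the position of the new entries should be checked against the deployment before a script like this is used on a real server.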

Standing up data.world

Go to https://data.world:

  • Create an account
  • Find your API Token:  User Icon > Settings > Advanced

Create an iRODS user that matches your data.world account

iadmin mkuser <username> rodsuser

iadmin moduser <username> password <password>

Annotate the user with the token

imeta set -u <username> irods::publishing::api_token <your token>

We will use the irodsconsortium user name and account

Tagging collections for publishing

Collections and Data Objects are tagged with metadata to indicate they should be published

A new AVU applied to a populated collection will schedule all objects for publication

New objects cannot be placed into a collection that has a publishing AVU applied, nor can those objects be modified with POSIX operations.

Tagging for publication

Publishing metadata takes the form:

A:  irods::publishing::publish
V:  <service>

The service name is applied directly to the policy name template, which dictates which policies are invoked.

Tagging for publication

Download some data

wget https://cdn.patricktriest.com/data/books.zip

unzip books.zip

Initialize the ubuntu user as your iRODS dataworld username

Create a collection to be published

imkdir published_collection

Put a collection of data into the collection to be published

iput -r ./books published_collection/books0

Tagging collections for publishing

Set the metadata on published_collection

imeta set -C published_collection irods::publishing::publish dataworld

A delayed execution job is scheduled which will then scan and schedule publishing jobs

id     name
11269 {"collection-name":"/tempZone/home/rods/published_collection","publish-type":"collection","publisher":"dataworld","rule-engine-instance-name":"irods_rule_engine_plugin-publishing-instance","rule-engine-operation":"irods_policy_publishing_collection_publish","user-name":"rods"}

Once the queue is empty, check data.world for the data set
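The queued job row above carries a JSON payload describing what will be published. As a rough sketch, a row in that "id name" layout can be split and decoded as follows. The function name `parse_delayed_rule` is hypothetical, and the row format is assumed to match the listing on the slide exactly.

```python
import json


def parse_delayed_rule(row):
    """Split an 'id name' row into the numeric id and the decoded JSON payload."""
    rule_id, payload = row.strip().split(None, 1)
    return int(rule_id), json.loads(payload)


# Row copied from the slide.
ROW = (
    '11269 {"collection-name":"/tempZone/home/rods/published_collection",'
    '"publish-type":"collection","publisher":"dataworld",'
    '"rule-engine-instance-name":"irods_rule_engine_plugin-publishing-instance",'
    '"rule-engine-operation":"irods_policy_publishing_collection_publish",'
    '"user-name":"rods"}'
)

if __name__ == "__main__":
    rule_id, job = parse_delayed_rule(ROW)
    print(rule_id, job["publisher"], job["rule-engine-operation"])
```

The "publisher" and "rule-engine-operation" fields together identify the service and the policy that the delayed job will invoke.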

Tagging objects for publishing

Create a new data object

truncate -s 1M 1Mfile

iput 1Mfile published_file0

Set the metadata on published_file0

imeta set -d published_file0 irods::publishing::publish dataworld

Tagging objects for publishing

Inspect the metadata for the object

imeta ls -d published_file0

AVUs defined for dataObj published_file0:
attribute: irods::access_time
value: 1561193732
units:
----
attribute: irods::publishing::persistent_identifier
value: OTcwNTczZGMtY2UwMC00MDU4LWJmNjEtMWRlNmMwZjAwNWEw
units:
----
attribute: irods::publishing::publish
value: dataworld
units:
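The persistent identifier value in the listing above is base64 text, which matches the default policy described later in the deck (a base64 encoded UUID). A minimal sketch of decoding and generating such a value, assuming the encoding is standard base64 over the UUID's textual form; the function names are hypothetical and this is not the plugin's own code:

```python
import base64
import uuid


def encode_pid(value):
    """Base64-encode the textual form of a UUID."""
    return base64.b64encode(str(value).encode("ascii")).decode("ascii")


def decode_pid(pid):
    """Decode a base64 persistent identifier back into a UUID."""
    return uuid.UUID(base64.b64decode(pid).decode("ascii"))


if __name__ == "__main__":
    # Value copied from the imeta listing on the slide.
    print(decode_pid("OTcwNTczZGMtY2UwMC00MDU4LWJmNjEtMWRlNmMwZjAwNWEw"))
    # A freshly generated identifier in the same style.
    print(encode_pid(uuid.uuid4()))
```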

Immutability of Published Content

rodsusers cannot modify or delete published content

irm -f published_file0

remote addresses: 127.0.0.1 ERROR: rmUtil: rm error for /tempZone/home/irodsconsortium/published_file0, status = -35000 status = -35000 SYS_INVALID_OPR_TYPE
Level 0: object is published and now immutable [/tempZone/home/irodsconsortium/file3]

Users cannot remove publication metadata

imeta rm -d file3 irods::publishing::publish dataworld

remote addresses: 127.0.0.1 ERROR: Level 0: publishing metadata tags are immutable [/tempZone/home/irodsconsortium/file3]
remote addresses: 127.0.0.1 ERROR: rcModAVUMetadata failed with error -35000 SYS_INVALID_OPR_TYPE
Level 0: publishing metadata tags are immutable [/tempZone/home/irodsconsortium/file3]
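The behavior shown above can be modeled as a guard that runs before an operation is allowed. The sketch below is only an illustration of that idea for the rodsuser case. The class, the function, and the operation names ("delete", "modify", "remove_publish_tag") are hypothetical; the error code and messages are taken from the output on the slide.

```python
SYS_INVALID_OPR_TYPE = -35000  # error code shown in the output above
PUBLISH_ATTRIBUTE = "irods::publishing::publish"


class PolicyViolation(Exception):
    """Carries an iRODS-style error code alongside the message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def guard_published(path, attributes, operation):
    """Raise if `operation` would alter a published object or its publishing tag."""
    if PUBLISH_ATTRIBUTE not in attributes:
        return  # unpublished content is not restricted by this guard
    if operation == "remove_publish_tag":
        message = f"publishing metadata tags are immutable [{path}]"
    else:
        message = f"object is published and now immutable [{path}]"
    raise PolicyViolation(SYS_INVALID_OPR_TYPE, message)


if __name__ == "__main__":
    try:
        guard_published("/tempZone/home/irodsconsortium/published_file0",
                        {PUBLISH_ATTRIBUTE}, "delete")
    except PolicyViolation as error:
        print(error.code, error)
```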

Overriding the Persistent Identifier Policy

The data.world publication policy delegates the generation of persistent identifiers.

  • By default it is a base64 encoded UUID

As the irods user

Edit /etc/irods/persistent_identifier.re

irods_policy_publishing_persistent_identifier(
    *object_path, *service_name, *pid) {
    writeLine("serverLog", "Persistent Identifier - [*object_path]")
    *pid = "ABC123"
}

Overriding the Persistent Identifier Policy

Configure the rule base

            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {
                    "re_data_variable_mapping_set": [
                        "core"
                    ],
                    "re_function_name_mapping_set": [
                        "core"
                    ],
                    "re_rulebase_set": [
                        "persistent_identifier",
                        "document_type",
                        "elasticsearch",
                        "core"
                    ],

Overriding the Persistent Identifier Policy

Disable the Persistent Identifier plugin

        "rule_engines": [

            {
                "instance_name": "irods_rule_engine_plugin-dataworld-instance",
                "plugin_name": "irods_rule_engine_plugin-dataworld",
                "plugin_specific_configuration": {
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-publishing-instance",
                "plugin_name": "irods_rule_engine_plugin-publishing",
                "plugin_specific_configuration": {
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-persistent_identifier-instance",
                "plugin_name": "irods_rule_engine_plugin-persistent_identifier",
                "plugin_specific_configuration": {
            }

Overriding the Persistent Identifier Policy

As the ubuntu user, test the new policy

iput 1Mfile publishing_pid_test

imeta set -d publishing_pid_test irods::publishing::publish dataworld

Check our results

imeta ls -d publishing_pid_test

AVUs defined for dataObj pid_test7:
attribute: irods::publishing::publish
value: dataworld
units:
----
attribute: irods::publishing::persistent_identifier
value: ABC123
units:

Overriding the Publishing Policy

Policy Signatures - Implement these four policies to integrate a new publishing service

irods_policy_publishing_object_publish_<service>(
    *object_path, *user_name, *service_name)

irods_policy_publishing_object_purge_<service>(
    *object_path, *user_name, *service_name)

irods_policy_publishing_collection_publish_<service>(
    *collection_name, *user_name, *service_name)

irods_policy_publishing_collection_purge_<service>(
    *collection_name, *user_name, *service_name)
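When integrating a new service, it can help to confirm that all four policies exist before tagging any data. The sketch below checks a list of rule names against the publish and purge names listed on the Publishing Policy Components slide. The function `missing_policies` and the service name `myservice` are hypothetical, and how the list of defined rule names is obtained is left open.

```python
# Suffixes from the Publishing Policy Components slide.
REQUIRED_SUFFIXES = (
    "object_publish",
    "object_purge",
    "collection_publish",
    "collection_purge",
)


def missing_policies(service, defined_rules):
    """Return the required policy names for `service` that are not yet defined."""
    defined = set(defined_rules)
    required = [
        f"irods_policy_publishing_{suffix}_{service}"
        for suffix in REQUIRED_SUFFIXES
    ]
    return [name for name in required if name not in defined]


if __name__ == "__main__":
    # "myservice" is a placeholder for a new publishing service.
    implemented = [
        "irods_policy_publishing_object_publish_myservice",
        "irods_policy_publishing_object_purge_myservice",
    ]
    for name in missing_policies("myservice", implemented):
        print("not implemented:", name)
```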

Publishing Policy

The Publishing Policy provides a framework that reacts to metadata attributes.  Once the publishing service policy is invoked, it may provide any implementation desired.

For instance, some services may simply need a URI to the data set, whereas others, such as data.world, may require that the data be uploaded.

The publishing service may require a specific submission package format, additional metadata, or other prerequisites, in which case the publishing job must wait until these needs are met.
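One way a service policy could decide whether a job has to wait is to compare the object's metadata with a list of attributes the service requires. This is only a sketch of that idea: the function names and the `example::` attribute names are hypothetical and do not come from the plugins.

```python
def missing_requirements(avus, required_attributes):
    """Return required attributes that have no non-empty value yet, sorted."""
    present = {attribute for attribute, value in avus if value}
    return sorted(set(required_attributes) - present)


def ready_to_publish(avus, required_attributes):
    """True when every required attribute has a value."""
    return not missing_requirements(avus, required_attributes)


if __name__ == "__main__":
    # Hypothetical requirements for an imagined publishing service.
    required = ["example::title", "example::license"]
    avus = [
        ("irods::publishing::publish", "dataworld"),
        ("example::title", "Books"),
    ]
    print(ready_to_publish(avus, required))
    print(missing_requirements(avus, required))
```

A job that is not ready could be rescheduled and checked again later, rather than failing outright.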

Questions?
