Policy Training
Publishing
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
June 25-28, 2019
iRODS User Group Meeting
University of Utrecht, NL
iRODS Capabilities
Publishing
A policy framework that provides an asynchronous, scalable data publishing service driven by metadata
Publishing Policy Components
<service> is derived directly from the metadata value and is used to delegate the policy invocation
Core Competencies
Policy
Capabilities
Publishing Overview
Example Implementation
Getting Started
Installing the Publishing Plugins
As the ubuntu user
Download the publishing packages
wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-dataworld.deb
wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-publishing.deb
wget http://people.renci.org/~jasonc/irods/irods-rule-engine-plugin-persistent-identifier.deb
Install the publishing packages
sudo dpkg -i irods-rule-engine-plugin-dataworld.deb irods-rule-engine-plugin-publishing.deb irods-rule-engine-plugin-persistent-identifier.deb
Configuring Publishing Plugins
As the irods user
Edit /etc/irods/server_config.json
"rule_engines": [
    ...
    {
        "instance_name": "irods_rule_engine_plugin-dataworld-instance",
        "plugin_name": "irods_rule_engine_plugin-dataworld",
        "plugin_specific_configuration": {
        }
    },
    {
        "instance_name": "irods_rule_engine_plugin-publishing-instance",
        "plugin_name": "irods_rule_engine_plugin-publishing",
        "plugin_specific_configuration": {
        }
    },
    {
        "instance_name": "irods_rule_engine_plugin-persistent_identifier-instance",
        "plugin_name": "irods_rule_engine_plugin-persistent_identifier",
        "plugin_specific_configuration": {
        }
    },
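The same configuration change can be scripted rather than made by hand. The following is a minimal sketch, not part of the plugins themselves: add_rule_engines is a hypothetical helper, and it assumes that rule_engines sits under the plugin_configuration key, as in a stock iRODS 4.2 server_config.json. It appends the three entries after whatever is already configured, matching the position shown in the fragment.

```python
import copy

# Plugin names taken from the configuration fragment in this section.
PUBLISHING_PLUGINS = [
    "irods_rule_engine_plugin-dataworld",
    "irods_rule_engine_plugin-publishing",
    "irods_rule_engine_plugin-persistent_identifier",
]


def add_rule_engines(config, plugin_names):
    """Return a copy of config with any missing plugins appended to rule_engines.

    Hypothetical helper. Assumes rule_engines lives under plugin_configuration.
    """
    updated = copy.deepcopy(config)
    plugin_config = updated.setdefault("plugin_configuration", {})
    engines = plugin_config.setdefault("rule_engines", [])
    present = {entry.get("plugin_name") for entry in engines}
    for name in plugin_names:
        if name not in present:
            engines.append({
                # The slides name each instance "<plugin_name>-instance".
                "instance_name": name + "-instance",
                "plugin_name": name,
                "plugin_specific_configuration": {},
            })
    return updated
```

To use it, load /etc/irods/server_config.json with json.load as the irods user, pass the resulting dictionary through add_rule_engines, and write it back with json.dump. Running it twice leaves the file unchanged, because entries that are already present are skipped.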
Standing up data.world
Go to https://data.world:
Create an iRODS user that matches your data.world account
iadmin mkuser <username> rodsuser
iadmin moduser <username> password <password>
Annotate the user with the token
imeta set -u <user name> irods::publishing::api_token <your token>
We will use the irodsconsortium user name and account
Tagging collections for publishing
Collections and Data Objects are tagged with metadata to indicate they should be published
A new AVU applied to a populated collection will schedule all objects for publication
New objects cannot be placed into a collection that has a publishing AVU applied, nor can the objects in that collection be modified with POSIX operations.
Tagging for publication
Publishing metadata takes the form:
A: irods::publishing::publish
V: <service>
The service name is applied directly to the policy name template, which determines which policies are invoked.
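How a metadata value becomes a policy name can be illustrated with a few lines of Python. This is only a sketch of the naming convention: the template string is the object publish signature listed later in this deck, policy_name_for is a hypothetical helper, and the plugin's actual dispatch code is not shown in these slides.

```python
# Attribute that marks an object or collection for publication (from the slides).
PUBLISH_ATTRIBUTE = "irods::publishing::publish"

# Policy name template; the service name fills the trailing placeholder.
OBJECT_PUBLISH_TEMPLATE = "irods_policy_publishing_object_publish_{service}"


def policy_name_for(attribute, value):
    """Return the policy name a publishing AVU would select, or None.

    Hypothetical helper illustrating the naming convention only.
    """
    if attribute != PUBLISH_ATTRIBUTE:
        return None  # not a publishing AVU, so no policy is selected
    return OBJECT_PUBLISH_TEMPLATE.format(service=value)


print(policy_name_for("irods::publishing::publish", "dataworld"))
```

With the value dataworld, this prints irods_policy_publishing_object_publish_dataworld. A different value would select a differently named policy, which is how support for a new service can be added without changing the framework.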
Tagging for publication
Initialize the ubuntu user as your iRODS data.world user
Download some data
wget https://cdn.patricktriest.com/data/books.zip
unzip books.zip
Create a collection to be published
imkdir published_collection
Put a collection of data into the collection to be published
iput -r ./books published_collection/books0
Tagging collections for publishing
Set the metadata on published_collection
imeta set -C published_collection irods::publishing::publish dataworld
A delayed execution job is scheduled which will then scan and schedule publishing jobs
id name
11269 {"collection-name":"/tempZone/home/rods/published_collection","publish-type":"collection","publisher":"dataworld","rule-engine-instance-name":"irods_rule_engine_plugin-publishing-instance","rule-engine-operation":"irods_policy_publishing_collection_publish","user-name":"rods"}
Once the queue is empty check data.world for the data set
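The name column of the queued job is a JSON document, so its fields can be read programmatically. The sketch below parses the entry shown on this slide; describe_job is a hypothetical helper, and the field names are taken from that single entry, so other job types may carry different keys.

```python
import json

# The name column of the delayed job shown on this slide, copied verbatim.
QUEUE_ENTRY = (
    '{"collection-name":"/tempZone/home/rods/published_collection",'
    '"publish-type":"collection","publisher":"dataworld",'
    '"rule-engine-instance-name":"irods_rule_engine_plugin-publishing-instance",'
    '"rule-engine-operation":"irods_policy_publishing_collection_publish",'
    '"user-name":"rods"}'
)


def describe_job(name_column):
    """Summarize a queued publishing job from its JSON description.

    Hypothetical helper; assumes the keys seen in the sample entry.
    """
    job = json.loads(name_column)
    return "{} -> {} ({}) for user {}".format(
        job["rule-engine-operation"],
        job["collection-name"],
        job["publisher"],
        job["user-name"],
    )


print(describe_job(QUEUE_ENTRY))
```

The entry records which rule engine instance will run the job, which operation it will invoke, and which service (the publisher field) the content is destined for.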
Tagging objects for publishing
Create a new data object
truncate -s 1M 1Mfile
iput 1Mfile published_file0
Set the metadata on published_file0
imeta set -d published_file0 irods::publishing::publish dataworld
Tagging objects for publishing
Inspect the metadata for the object
imeta ls -d published_file0
AVUs defined for dataObj published_file0:
attribute: irods::access_time
value: 1561193732
units:
----
attribute: irods::publishing::persistent_identifier
value: OTcwNTczZGMtY2UwMC00MDU4LWJmNjEtMWRlNmMwZjAwNWEw
units:
----
attribute: irods::publishing::publish
value: dataworld
units:
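The persistent identifier in this listing is Base64 text, and decoding this particular value yields a version 4 UUID string. The sketch below shows the decoding, plus a hypothetical make_pid helper that produces identifiers of the same shape. Whether the plugin always generates identifiers this way is an assumption drawn from this one sample, not something the slides state.

```python
import base64
import uuid

# Identifier value copied from the imeta listing on this slide.
SAMPLE_PID = "OTcwNTczZGMtY2UwMC00MDU4LWJmNjEtMWRlNmMwZjAwNWEw"


def decode_pid(pid):
    """Decode a Base64 identifier back to its underlying text."""
    return base64.b64decode(pid).decode("ascii")


def make_pid():
    """Build an identifier of the same shape: a Base64-encoded random UUID.

    Hypothetical helper; the plugin's real generator may differ.
    """
    return base64.b64encode(str(uuid.uuid4()).encode("ascii")).decode("ascii")


print(decode_pid(SAMPLE_PID))
```

For the sample value this prints 970573dc-ce00-4058-bf61-1de6c0f005a0.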
Immutability of Published Content
rodsusers cannot modify or delete published content
irm -f published_file0
remote addresses: 127.0.0.1 ERROR: rmUtil: rm error for /tempZone/home/irodsconsortium/published_file0, status = -35000 status = -35000 SYS_INVALID_OPR_TYPE
Level 0: object is published and now immutable [/tempZone/home/irodsconsortium/file3]
Users cannot remove publication metadata
imeta rm -d file3 irods::publishing::publish dataworld
remote addresses: 127.0.0.1 ERROR: Level 0: publishing metadata tags are immutable [/tempZone/home/irodsconsortium/file3]
remote addresses: 127.0.0.1 ERROR: rcModAVUMetadata failed with error -35000 SYS_INVALID_OPR_TYPE
Level 0: publishing metadata tags are immutable [/tempZone/home/irodsconsortium/file3]
Overriding the Persistent Identifier Policy
The data.world publication policy delegates the generation of persistent identifiers.
As the irods user
Edit /etc/irods/persistent_identifier.re
irods_policy_publishing_persistent_identifier(*object_path, *service_name, *pid) {
    writeLine("serverLog", "Persistent Identifier - [*object_path]");
    *pid = "ABC123";
}
Overriding the Persistent Identifier Policy
Configure the rule base
{
"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
"plugin_name": "irods_rule_engine_plugin-irods_rule_language",
"plugin_specific_configuration": {
"re_data_variable_mapping_set": [
"core"
],
"re_function_name_mapping_set": [
"core"
],
"re_rulebase_set": [
"persistent_identifier",
"document_type",
"elasticsearch",
"core"
],
Overriding the Persistent Identifier Policy
Disable the Persistent Identifier plugin by removing its entry (the last one shown here) from rule_engines
"rule_engines": [
{
"instance_name": "irods_rule_engine_plugin-dataworld-instance",
"plugin_name": "irods_rule_engine_plugin-dataworld",
"plugin_specific_configuration": {
}
},
{
"instance_name": "irods_rule_engine_plugin-publishing-instance",
"plugin_name": "irods_rule_engine_plugin-publishing",
"plugin_specific_configuration": {
}
},
{
"instance_name": "irods_rule_engine_plugin-persistent_identifier-instance",
"plugin_name": "irods_rule_engine_plugin-persistent_identifier",
"plugin_specific_configuration": {
}
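One way to disable a rule engine plugin is to drop its entry from the rule_engines array, and this sketch assumes that is what the step intends. disable_rule_engine is a hypothetical helper, and it again assumes that rule_engines sits under plugin_configuration in server_config.json.

```python
import copy


def disable_rule_engine(config, plugin_name):
    """Return a copy of config without the named rule engine plugin.

    Hypothetical helper. Assumes plugin_configuration and rule_engines
    are both present, and raises KeyError if they are not.
    """
    updated = copy.deepcopy(config)
    engines = updated["plugin_configuration"]["rule_engines"]
    # Keep every entry except the one being disabled, preserving order.
    engines[:] = [e for e in engines if e.get("plugin_name") != plugin_name]
    return updated
```

Calling disable_rule_engine(config, "irods_rule_engine_plugin-persistent_identifier") leaves the dataworld and publishing entries in place, so the rule in persistent_identifier.re is the only remaining implementation of the persistent identifier policy.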
Overriding the Persistent Identifier Policy
As the ubuntu user test the new policy
iput 1Mfile publishing_pid_test
imeta set -d publishing_pid_test irods::publishing::publish dataworld
Check our results
imeta ls -d publishing_pid_test
AVUs defined for dataObj pid_test7:
attribute: irods::publishing::publish
value: dataworld
units:
----
attribute: irods::publishing::persistent_identifier
value: ABC123
units:
Overriding the Publishing Policy
Policy Signatures - Implement these four policies to integrate a new publishing service
irods_policy_publishing_object_publish_<service>(
    *object_path, *user_name, *service_name)
irods_policy_publishing_object_purge_<service>(
    *object_path, *user_name, *service_name)
irods_policy_publishing_collection_publish_<service>(
    *collection_name, *user_name, *service_name)
irods_policy_publishing_collection_purge_<service>(
    *collection_name, *user_name, *service_name)
Publishing Policy
The Publishing Policy provides a framework that reacts to metadata attributes. Once invoked, the publishing service policy may implement whatever behavior the service requires.
For instance, some services may simply need a URI to the data set, whereas others, such as data.world, require the data to be uploaded.
The publishing service may require a specific submission package format, additional metadata, or other prerequisites, in which case the publishing job must wait until they are met.
Questions?