Policy Training

Native Rule Language

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium

August 3-6, 2020

KU Leuven Training

Webinar Presentation

The Native Rule Language

The iRODS Rule Language is a domain-specific language (DSL) provided by iRODS to define policies and actions in the system.

The iRODS Rule Language provides a syntax similar to C

rule_name(*rule, *parameters)
{
    # a comment in a rule
    
    writeLine("stdout", "Hello, World!")

    0 # return value
}

Boolean Values

Boolean literals include true and false

Boolean operations include

!  # not
&& # and
|| # or

Numeric Values

Numeric literals include integers and doubles

Numeric operations include

-  # Negation
^  # Power
*  # Multiplication
/  # Division
%  # Modulo
-  # Subtraction
+  # Addition
>  # Greater than
<  # Less than
>= # Greater than or equal
<= # Less than or equal

Numeric functions include

exp(<num>)
log(<num>)
abs(<num>)
floor(<num>)
ceiling(<num>)
average(<num>,<num>,...)
max(<num>,<num>,...)
min(<num>,<num>,...)

String Values

String literals include 'I am a string' and "I am also a string"

String operations include

str()      # converts other values to strings
int()      # converts a string to an integer
double()   # converts a string to a double
bool()     # converts a string to a boolean
++         # concatenates two strings
like       # wildcard comparison of two strings
like regex # regular expression matching
substr()   # extract a substring from a string
strlen()   # compute the length of a string
split()    # split a string on a given character
triml()    # trim left to a given character
trimr()    # trim right to a given character

Variables are expanded inside strings: 'I am a variable: *x'

The * must be escaped to be a literal: 'I am not a variable: \*x'
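Here is a minimal Python model of the expansion rule. It helps predict what a message such as "msiObjStat failed for *logical_path" will contain. This is a sketch, not iRODS code. Two behaviors are assumptions of the model: the variable-name pattern (a letter or underscore, then letters, digits, or underscores) and the KeyError raised for an unset variable.

```python
import re

# Optional backslash, a '*', then a variable name.
# The exact name rules are an assumption of this sketch.
_VAR = re.compile(r"(\\?)\*([A-Za-z_][A-Za-z0-9_]*)")


def expand(template: str, variables: dict) -> str:
    """Expand *name references; a backslash-escaped \\*name stays literal."""

    def substitute(match) -> str:
        escaped, name = match.group(1), match.group(2)
        if escaped:
            return "*" + name          # '\*x' becomes the literal text '*x'
        return str(variables[name])    # '*x' becomes the value of x (KeyError if unset)

    return _VAR.sub(substitute, template)


print(expand("I am a variable: *x", {"x": 42}))         # I am a variable: 42
print(expand("I am not a variable: \\*x", {"x": 42}))   # I am not a variable: *x
```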

Key Value Pairs

The rule language provides a dictionary-style data structure:

*var.key = "value"

For example:

*A.a="A";
*A.b="B";
*A.c="C";
str(*A); # a=A++++b=B++++c=C

Currently only string values are supported
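The str() output above follows a key=value++++key=value pattern. The same shape appears later in this deck in the msiDataObjChksum option string "forceChksum=++++replNum=*repl_num". Below is a minimal Python model of both directions. It assumes pairs are rendered in insertion order, as in the example. The function names are hypothetical.

```python
from typing import Dict


def kvp_to_str(kvp: Dict[str, str]) -> str:
    """Render a key-value structure as key=value pairs joined by '++++'."""
    for key, value in kvp.items():
        # Mirror the restriction above: only string values are supported.
        if not isinstance(value, str):
            raise TypeError(f"only string values are supported (key {key!r})")
    return "++++".join(f"{key}={value}" for key, value in kvp.items())


def str_to_kvp(text: str) -> Dict[str, str]:
    """Parse the 'a=A++++b=B' form back into a dictionary."""
    result: Dict[str, str] = {}
    for pair in text.split("++++"):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


A = {"a": "A", "b": "B", "c": "C"}
print(kvp_to_str(A))                            # a=A++++b=B++++c=C
print(str_to_kvp("forceChksum=++++replNum=0"))  # {'forceChksum': '', 'replNum': '0'}
```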

Lists

The rule language provides a list-style data structure:

*x = list('this', 'is', 'a', 'list')

List operations include:

*x = list('this', 'is', 'a', 'list')

elem(*x, 1)             # extracts an element, evaluates to 'is'
setelem(*x, 1, "isn't") # sets element 1, replacing 'is' with "isn't"
size(*x)                # computes the size of a list, evaluates to 4
hd(*x)                  # head of the list, returns 'this'
tl(*x)                  # tail of the list, returns ('is', 'a', 'list')
cons('foo', *x)         # prepends an element to a list,
                        # returns ('foo', 'this', 'is', 'a', 'list')

All entries in a list must be of the same type
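The list operations can be modeled in Python with immutable tuples, which is useful for reasoning about results. This is a sketch, not iRODS code. Two behaviors are assumptions here: setelem returning a new list rather than modifying *x in place, and the TypeError used to model the same-type rule.

```python
from typing import Tuple


def make_list(*items) -> Tuple:
    """Build an immutable list, enforcing that all entries share one type."""
    if items and any(type(item) is not type(items[0]) for item in items):
        raise TypeError("all entries in a list must be of the same type")
    return tuple(items)


def elem(xs: Tuple, i: int):
    return xs[i]                           # zero-based: elem(x, 1) == 'is'


def setelem(xs: Tuple, i: int, value) -> Tuple:
    return xs[:i] + (value,) + xs[i + 1:]  # new list; xs itself is untouched


def size(xs: Tuple) -> int:
    return len(xs)


def hd(xs: Tuple):
    return xs[0]                           # IndexError on an empty list


def tl(xs: Tuple) -> Tuple:
    return xs[1:]


def cons(value, xs: Tuple) -> Tuple:
    return (value,) + xs


x = make_list("this", "is", "a", "list")
print(elem(x, 1))              # is
print(setelem(x, 1, "isn't"))  # ('this', "isn't", 'a', 'list')
print(size(x), hd(x), tl(x))   # 4 this ('is', 'a', 'list')
print(cons("foo", x))          # ('foo', 'this', 'is', 'a', 'list')
```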

Flow Control

The rule language provides a standard if-then-else structure

For Example:

if(*x == 'one') {
    # code for case one  
}
else if(*x == 'two') {
    # code for case two
}
else {
    # code for default
}

Iteration

The rule language provides foreach and while constructs for iteration

For Example:

*x = list('this', 'is', 'a', 'list')

foreach(*e in *x) {
    writeLine('stdout', 'element *e') 
}

*y = 0
while(*y < 10) {
    writeLine('stdout', 'Hello, World!')
    *y = *y + 1
}

Error Handling

The rule language provides errormsg and errorcode constructs to capture and manage errors from microservices

For Example:


*logical_path = '/tempZone/home/rods/example.txt'

*err = errorcode(msiObjStat(*logical_path, *stat))
if(*err < 0) {
    writeLine('serverLog', "msiObjStat failed for *logical_path")
}

*err = errormsg(msiObjStat(*logical_path, *stat), *msg)
if(*err < 0) {
    writeLine('serverLog', "msiObjStat failed for *logical_path with message *msg")
}

If the error is not captured, the failing microservice aborts the rule

Error Handling

Alternatively, fail() and failmsg() allow a rule to report an error, with an optional message, to its caller

For Example:


*logical_path = '/tempZone/home/rods/example.txt'

*err = errorcode(msiObjStat(*logical_path, *stat))
if(*err < 0) {
    writeLine('serverLog', "msiObjStat failed for *logical_path")
    fail(*err)
}

*err = errormsg(msiObjStat(*logical_path, *stat), *msg)
if(*err < 0) {
    *msg = "msiObjStat failed for *logical_path with message *msg"
    writeLine('serverLog', *msg)
    failmsg(*err, *msg)
}
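The control flow of errorcode/errormsg versus fail/failmsg can be modeled with Python exceptions:

- An uncaptured microservice error propagates and aborts the rule.
- A captured error becomes an ordinary return value.
- fail/failmsg raise a new error for the caller.

This sketch covers the semantics only. The MicroserviceError class, the obj_stat_missing stand-in, and its error code and message are all hypothetical.

```python
class MicroserviceError(Exception):
    """Hypothetical stand-in for a failing microservice: a code plus a message."""

    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code
        self.message = message


def errorcode(action) -> int:
    """Run action; return 0 on success, or its error code instead of aborting."""
    try:
        action()
        return 0
    except MicroserviceError as err:
        return err.code


def errormsg(action):
    """Like errorcode, but also return the error message ('' on success)."""
    try:
        action()
        return 0, ""
    except MicroserviceError as err:
        return err.code, err.message


def fail(code: int):
    raise MicroserviceError(code, "")       # abort, reporting only a code


def failmsg(code: int, message: str):
    raise MicroserviceError(code, message)  # abort, reporting code and message


def obj_stat_missing():
    # Hypothetical stand-in for msiObjStat on a path that does not exist.
    raise MicroserviceError(-1, "object not found")


logical_path = "/tempZone/home/rods/example.txt"
err, msg = errormsg(obj_stat_missing)  # captured: no exception escapes here
if err < 0:
    print(f"msiObjStat failed for {logical_path} with message {msg}")
```

Calling obj_stat_missing() directly, without errorcode or errormsg, would let the exception escape. That corresponds to the rule erroring out.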

Language Integrated General Query

Provides a coupling between catalog queries and the rule language

Follows the General Query syntax, much like iquest

For Example:


*data_name = 'example.txt'
*coll_name = '/tempZone/home/rods'

*query = SELECT RESC_NAME, DATA_REPL_NUM WHERE COLL_NAME = '*coll_name' AND DATA_NAME = '*data_name'

foreach(*row in *query) {
    *resc_name = *row.RESC_NAME
    *repl_num  = *row.DATA_REPL_NUM

    writeLine('stdout', 'replica *repl_num of *coll_name/*data_name found on *resc_name')
}

The row returned is a key value structure whose keys are the column names

Example Policy Implementations

Consider 3 use cases:

  • Do we have a sufficient number of replicas?
  • Are the replicas in the correct locations?
  • Is the data correct at rest?

How do we provide these assertions and guarantees?

When do we enforce this policy?

How do we know we should enforce this policy?

What do we do when policy is in violation?

Example Policy Implementations

Example code can be found here. Clone the repository and stage the rule bases:

git clone https://github.com/jasoncoposky/irods_capability_integrity

sudo cp irods_capability_integrity/*.re /etc/irods

Data Integrity Policy - Replica Number

Provides checks around the number of replicas

For Example:

imkdir placement_policy

imeta set -C placement_policy irods::verification::replica_number 3

Driven by collection metadata:

irods::verification::replica_number <positive integer>

Data Integrity Policy - Replica Number

# Single point of truth for an error value
get_error_value(*err) { *err = "ERROR_VALUE" }

# The code to return for the rule engine plugin framework to look for additional PEPs to fire.
RULE_ENGINE_CONTINUE { 5000000 }

# Error code if input is incorrect
SYS_INVALID_INPUT_PARAM { -130000 }

# metadata attribute driving the replica number policy
verify_replica_number_attribute { "irods::verification::replica_number" }

verify_replica_number(*violations)
{
    *attr = verify_replica_number_attribute

    # get a list of all matching collections given the metadata attribute
    foreach(*row0 in SELECT COLL_NAME, META_COLL_ATTR_VALUE WHERE META_COLL_ATTR_NAME = "*attr") {

        *number_of_replicas = int(*row0.META_COLL_ATTR_VALUE)

        *coll_name = *row0.COLL_NAME

        # get all data objects under the given collection (prefix match on COLL_NAME)
        foreach(*row1 in SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME like "*coll_name%") {
            *matched = 0

            *coll_name = *row1.COLL_NAME
            *data_name = *row1.DATA_NAME

            # get all of the resource names where this object's replicas reside
            foreach(*row2 in SELECT RESC_NAME WHERE COLL_NAME = "*coll_name" AND DATA_NAME = "*data_name") {
                *matched = *matched + 1
            } # for resources

            if(*matched < *number_of_replicas) {
                *violations = cons("*coll_name/*data_name violates the number policy " ++ str(*number_of_replicas), *violations)
            }

        } # for objects

    } # for collections

} # verify_replica_number
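The nested queries above can be modeled in Python over an in-memory list of replica rows, which lets you check the logic without a running zone. The rows, the sibling collection placement_policy2, and the helper names are all hypothetical. Objects are sorted only to make the output deterministic, since no ORDER BY is given. The resource count uses a set to mirror the distinct rows GenQuery returns by default.

```python
from typing import Dict, List, Set, Tuple

# (collection, data object name, resource name): one row per replica.
Replica = Tuple[str, str, str]


def objects_under(coll: str, replicas: List[Replica]) -> List[Tuple[str, str]]:
    """Distinct objects whose collection starts with coll, like COLL_NAME like "coll%"."""
    return sorted({(c, d) for c, d, _ in replicas if c.startswith(coll)})


def resources_of(coll: str, data: str, replicas: List[Replica]) -> Set[str]:
    """Distinct resource names holding a replica of coll/data."""
    return {r for c, d, r in replicas if c == coll and d == data}


def verify_replica_number(policies: Dict[str, int], replicas: List[Replica]) -> List[str]:
    violations: List[str] = []
    for coll, required in policies.items():
        for c, d in objects_under(coll, replicas):
            matched = len(resources_of(c, d, replicas))
            if matched < required:
                # cons() prepends, so insert at the front to keep the same order.
                violations.insert(0, f"{c}/{d} violates the number policy {required}")
    return violations


home = "/tempZone/home/rods"
replicas = [
    (f"{home}/placement_policy", "a.txt", "demoResc"),
    (f"{home}/placement_policy", "a.txt", "ufs0"),
    (f"{home}/placement_policy", "a.txt", "ufs1"),
    (f"{home}/placement_policy", "b.txt", "demoResc"),
    (f"{home}/placement_policy2", "c.txt", "demoResc"),
]
for v in verify_replica_number({f"{home}/placement_policy": 3}, replicas):
    print(v)
```

The c.txt violation shows a side effect of the prefix match. placement_policy2 is a sibling collection, yet "*coll_name%" includes it. Testing c == coll or c.startswith(coll + "/") in the model would restrict the check to the collection and its subcollections.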

Data Integrity Policy - Replica Number

execute_replica_number_policy {
    *violations = list()

    verify_replica_number(*violations)

    foreach(*v in *violations) {
        writeLine("stdout", "*v")
    }
}
INPUT null
OUTPUT ruleExecOut

Executing the replica number policy

 irule -r irods_rule_engine_plugin-irods_rule_language-instance -F execute_replica_number_policy.r

Test from the command line

Data Integrity Policy - Replica Placement

Provides checks around the location of replicas on specific resources

For Example:

imkdir placement_policy

imeta set -C placement_policy irods::verification::replica_placement "demoResc, ufs0"

Driven by collection metadata:

irods::verification::replica_placement <list of resource names>

Data Integrity Policy - Replica Placement

# Single point of truth for an error value
get_error_value(*err) { *err = "ERROR_VALUE" }

# The code to return for the rule engine plugin framework to look for additional PEPs to fire.
RULE_ENGINE_CONTINUE { 5000000 }

# Error code if input is incorrect
SYS_INVALID_INPUT_PARAM { -130000 }

# metadata attribute driving the replica placement policy
verify_replicas_attribute { "irods::verification::replica_placement" }

verify_replica_placement(*violations)
{
    *attr = verify_replicas_attribute

    # get a list of all matching collections given the metadata attribute
    foreach(*row0 in SELECT COLL_NAME, META_COLL_ATTR_VALUE WHERE META_COLL_ATTR_NAME = "*attr") {
        *resource_list = *row0.META_COLL_ATTR_VALUE

        *number_of_resources = size(split(*resource_list, ","))

        *coll_name = *row0.COLL_NAME

        # get all data objects under the given collection (prefix match on COLL_NAME)
        foreach(*row1 in SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME like "*coll_name%") {
            *matched = 0

            *coll_name = *row1.COLL_NAME
            *data_name = *row1.DATA_NAME

            # get all of the resource names where this object's replicas reside
            foreach(*row2 in SELECT RESC_NAME WHERE COLL_NAME = "*coll_name" AND DATA_NAME = "*data_name") {
                *resource_name = *row2.RESC_NAME

                # split the configured resource list, e.g. "demoResc, ufs0"
                *split_list = split(*resource_list, ",")

                while(size(*split_list) > 0) {
                    # pull head of list
                    *name = str(hd(*split_list))

                    # subset remainder of list
                    *split_list = tl(*split_list)

                    # chomp space
                    *name = triml(*name, ' ')
                    *name = trimr(*name, ' ')

                    # count a match if this replica is on a listed resource
                    if(*name == *resource_name) {
                        *matched = *matched + 1
                    }
                }
            } # for resources

            if(*matched < *number_of_resources) {
                *violations = cons("*coll_name/*data_name violates the placement policy "
                                   ++ *resource_list, *violations)
            }
        } # for objects
    } # for collections
} # verify_replica_placement
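The placement check reduces to counting, for each object, how many of its replicas sit on a listed resource. Below is a Python model of that counting over hypothetical in-memory rows. str.strip(" ") stands in for the triml/trimr space chomping. As in the rule, an object is flagged when fewer matches are found than resources are listed.

```python
from typing import Dict, List, Tuple

Replica = Tuple[str, str, str]  # (collection, data object name, resource name)


def verify_replica_placement(policies: Dict[str, str], replicas: List[Replica]) -> List[str]:
    violations: List[str] = []
    for coll, resource_list in policies.items():
        # split(*resource_list, ",") followed by trimming spaces from each name
        wanted = [name.strip(" ") for name in resource_list.split(",")]
        objects = sorted({(c, d) for c, d, _ in replicas if c.startswith(coll)})
        for c, d in objects:
            held = {r for rc, rd, r in replicas if rc == c and rd == d}
            # one match per (replica resource, listed name) pair that agree
            matched = sum(1 for r in held for name in wanted if name == r)
            if matched < len(wanted):
                violations.insert(0, f"{c}/{d} violates the placement policy {resource_list}")
    return violations


coll = "/tempZone/home/rods/placement_policy"
replicas = [
    (coll, "a.txt", "demoResc"),
    (coll, "a.txt", "ufs0"),
    (coll, "b.txt", "demoResc"),
]
print(verify_replica_placement({coll: "demoResc, ufs0"}, replicas))
```

Extra replicas on unlisted resources do not cause a violation. Only missing listed resources do.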

Data Integrity Policy - Replica Placement

execute_replica_placement_policy {
    *violations = list()

    verify_replica_placement(*violations)

    foreach(*v in *violations) {
        writeLine("stdout", "*v")
    }
}
INPUT null
OUTPUT ruleExecOut

Executing the replica placement policy

 irule -r irods_rule_engine_plugin-irods_rule_language-instance -F execute_replica_placement_policy.r

Test from the command line

Data Integrity Policy - Replica Checksum

Provides checks around the integrity of replica data at rest

For Example:

imkdir placement_policy

imeta set -C placement_policy irods::verification::replica_checksum sha256

Driven by collection metadata:

irods::verification::replica_checksum <checksum type>

Data Integrity Policy - Replica Checksum

# Single point of truth for an error value
get_error_value(*err) { *err = "ERROR_VALUE" }

# The code to return for the rule engine plugin framework to look for additional PEPs to fire.
RULE_ENGINE_CONTINUE { 5000000 }

# Error code if input is incorrect
SYS_INVALID_INPUT_PARAM { -130000 }

# metadata attribute driving the replica checksum policy
verify_checksum_attribute { "irods::verification::replica_checksum" }

verify_replica_checksum(*all_flag, *resource_name, *violations)
{
    *attr = verify_checksum_attribute

    # get a list of all matching collections given the metadata attribute
    foreach(*row0 in SELECT COLL_NAME, META_COLL_ATTR_VALUE WHERE META_COLL_ATTR_NAME = "*attr") {

        *coll_name = *row0.COLL_NAME

        # get all data objects under the given collection (prefix match on COLL_NAME)
        foreach(*row1 in SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME like "*coll_name%") {

            *coll_name = *row1.COLL_NAME

            *data_name = *row1.DATA_NAME

            if(true == *all_flag) {
                foreach(*row2 in SELECT DATA_REPL_NUM, DATA_CHECKSUM WHERE COLL_NAME = "*coll_name" AND DATA_NAME = "*data_name") {
                    *repl_num = *row2.DATA_REPL_NUM
                    *checksum = *row2.DATA_CHECKSUM
                    msiDataObjChksum("*coll_name/*data_name", "forceChksum=++++replNum=*repl_num", *out)

                    if(*checksum != *out) {
                        *violations = cons("*coll_name/*data_name violates the checksum policy *out vs *checksum", *violations)
                    }

                } # for resources
            }
            else {
                foreach(*row2 in SELECT DATA_REPL_NUM, DATA_CHECKSUM WHERE COLL_NAME = "*coll_name" AND DATA_NAME = "*data_name" AND RESC_NAME = "*resource_name") {
                    *repl_num = *row2.DATA_REPL_NUM
                    *checksum = *row2.DATA_CHECKSUM                  
                    msiDataObjChksum("*coll_name/*data_name", "forceChksum=++++replNum=*repl_num", *out)

                    if(*checksum != *out) {
                        *violations = cons("*coll_name/*data_name violates the checksum policy *out vs *checksum", *violations)
                    }

                } # for resources
            }

        } # for objects

    } # for collections

} # verify_replica_checksum
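The comparison in verify_replica_checksum can be sketched in Python. The sketch assumes the zone uses SHA-256 checksums, which iRODS records as sha2: followed by the base64-encoded digest. It compares a catalog value with one recomputed from the bytes at rest and does not call iRODS. The tuple layout and sample data are hypothetical.

```python
import base64
import hashlib
from typing import List, Tuple


def irods_sha256(data: bytes) -> str:
    """SHA-256 checksum in the 'sha2:' + base64(digest) form."""
    return "sha2:" + base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def verify_checksums(replicas: List[Tuple[str, int, str, bytes]]) -> List[str]:
    """replicas: (logical path, replica number, catalog checksum, bytes at rest)."""
    violations: List[str] = []
    for path, _repl_num, checksum, data in replicas:
        out = irods_sha256(data)  # recompute from the data at rest
        if checksum != out:
            violations.insert(0, f"{path} violates the checksum policy {out} vs {checksum}")
    return violations


path = "/tempZone/home/rods/placement_policy/a.txt"
good = irods_sha256(b"hello\n")
# replica 1 has drifted from the bytes the catalog checksum was computed over
print(verify_checksums([(path, 0, good, b"hello\n"), (path, 1, good, b"hellO\n")]))
```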

Data Integrity Policy - Replica Checksum

execute_replica_checksum_policy {
    *violations = list()

    # all_flag      : checksum all replicas
    # resource_name : if all_flag is false, provide a resource name
    # violations    : list of violating objects
    verify_replica_checksum(true, "", *violations)

    foreach(*v in *violations) {
        writeLine("stdout", "*v")
    }
}
INPUT null
OUTPUT ruleExecOut

Executing the replica checksum policy

 irule -r irods_rule_engine_plugin-irods_rule_language-instance -F execute_replica_checksum_policy.r

Test from the command line

Configuring the Policies

Place the rule base files into /etc/irods

Add them to /etc/irods/server_config.json for the rule language plugin:

        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "shared_memory_instance": "irods_rule_language_rule_engine",
                "plugin_specific_configuration": {
                    "re_rulebase_set": [
                        "verify_replica_placement",
                        "verify_replica_number",
                        "verify_checksum",
                        "core"
                    ],

Enforcing Policies

Synchronous Enforcement

process_violations(*v)
{
    # auditing code?

    # reporting code?

    # recovery code?
}

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT)
{
    *logical_path = *DATAOBJINP.obj_path
    *violations = list()

    # hypothetical per-object variants of the policies above
    verify_replica_checksum_for_object(*logical_path, true, "", *violations)
    verify_replica_number_for_object(*logical_path, *violations)
    verify_replica_placement_for_object(*logical_path, *violations)

    process_violations(*violations)
}

(our implementations were designed to be asynchronous)

Enforcing Policies

Asynchronous Enforcement

execute_integrity_policies
{
    # run now, then repeat every 7 days
    delay("<PLUSET>1s</PLUSET><EF>7d REPEAT FOR EVER</EF>") {
        # initialize inside the delayed body, which runs later on its own
        *violations = list()
        verify_replica_checksum(true, "", *violations)
        verify_replica_number(*violations)
        verify_replica_placement(*violations)
        process_violations(*violations)
    }
}
INPUT null
OUTPUT ruleExecOut

Questions?