Data Management for the Smart Farm

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium


The iRODS Consortium

  • A nonprofit organization embedded in the Renaissance Computing Institute (RENCI) at UNC Chapel Hill, North Carolina
  • Membership ranges from public enterprise companies to universities around the world
  • Provides sustainability for an open source data management project with a 20-year history in research

Our Membership

What is iRODS?

Distributed - runs on a laptop, a cluster, on premises or geographically distributed

Open Source - BSD-3 licensed; install it today and try before you buy

Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer

iRODS Core Competencies

The underlying technology falls into four areas

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.

Projection of the Physical into the Logical

Diagram: a single Logical Path maps to one or more Physical Path(s).
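As a minimal sketch of this projection, assuming a toy in-memory catalog (the `Catalog` and `Replica` classes, the `/avZone` zone, and the resource names are hypothetical, not part of any iRODS API), one logical path can resolve to several physical replicas:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    resource: str       # storage resource holding this copy (hypothetical name)
    physical_path: str  # where the bytes actually live on that resource


@dataclass
class Catalog:
    # Logical path in the unified namespace -> list of physical replicas.
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        """Record one more physical copy behind a logical path."""
        self.entries.setdefault(logical_path, []).append(Replica(resource, physical_path))

    def resolve(self, logical_path: str, preferred: Optional[str] = None) -> Replica:
        """Return a replica, favoring the preferred resource when it holds one."""
        replicas = self.entries[logical_path]
        for replica in replicas:
            if replica.resource == preferred:
                return replica
        return replicas[0]


# Usage: the same logical file lives on farm storage and in an archive.
catalog = Catalog()
logical = "/avZone/home/farm1/soil_moisture.csv"
catalog.register(logical, "farmResc", "/data/sensors/soil_moisture.csv")
catalog.register(logical, "archiveResc", "s3://av-archive/farm1/soil_moisture.csv")
print(catalog.resolve(logical, preferred="archiveResc").physical_path)
print(catalog.resolve(logical).physical_path)
```

Users only ever see the logical path, so the physical copies can be moved or added without breaking anyone's references.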

Data Discovery

Attach metadata to any first-class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS supports both automated and user-provided metadata, which makes your data and infrastructure more discoverable, operational, and valuable.
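iRODS stores metadata as attribute-value-unit (AVU) triples. The sketch below models that idea with a plain dictionary so it runs without a server; the helpers `add_avu` and `find` and the sample paths are hypothetical. In a real zone, the equivalent operations are performed with the `imeta` command or a client library.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Entity name -> list of (attribute, value, unit) triples. Entities can be
# data objects, collections, users, or storage resources.
metadata: DefaultDict[str, List[Tuple[str, str, str]]] = defaultdict(list)


def add_avu(entity: str, attribute: str, value: str, unit: str = "") -> None:
    """Attach one AVU triple to an entity; the unit is optional."""
    metadata[entity].append((attribute, value, unit))


def find(attribute: str, value: str) -> List[str]:
    """Return every entity carrying the given attribute/value pair, sorted."""
    return sorted(
        entity
        for entity, avus in metadata.items()
        if any(a == attribute and v == value for a, v, _ in avus)
    )


# Usage: tag two data objects and a storage resource, then query.
add_avu("/avZone/home/farm1/soil_moisture.csv", "sensor_type", "soil_moisture")
add_avu("/avZone/home/farm1/soil_moisture.csv", "depth", "30", "cm")
add_avu("/avZone/home/farm2/rain.csv", "sensor_type", "rain_gauge")
add_avu("farmResc", "location", "farm1")
print(find("sensor_type", "soil_moisture"))
```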

Metadata Everywhere

Workflow Automation

An integrated scripting language whose rules can be triggered by any operation within the framework:

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine provides the ability to capture real-world policy as computer-actionable rules, which may allow, deny, or add context to operations within the system.

Dynamic Policy Enforcement

The iRODS rule may:

  • restrict access
  • log for audit and reporting
  • provide additional context
  • send a notification
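The following is a minimal sketch of the idea, not the iRODS rule engine itself: rules are registered against an operation, and each may allow, deny, or add context. In iRODS such hooks are policy enforcement points (for example `acPostProcForPut` in the native rule language); the `rule` decorator, the `enforce` function, and the user names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Operation:
    kind: str   # e.g. "put" or "delete"
    user: str
    path: str
    context: Dict[str, str] = field(default_factory=dict)


Rule = Callable[[Operation], bool]  # a rule returns False to deny
rules: Dict[str, List[Rule]] = {}
audit_log: List[str] = []
notifications: List[str] = []


def rule(kind: str):
    """Register the decorated function as a policy for one operation kind."""
    def register(fn: Rule) -> Rule:
        rules.setdefault(kind, []).append(fn)
        return fn
    return register


def enforce(op: Operation) -> bool:
    """Run every rule for this operation; the first denial stops it."""
    return all(fn(op) for fn in rules.get(op.kind, []))


@rule("delete")
def restrict_delete(op: Operation) -> bool:
    # Restrict access: only the data steward may delete sensor data.
    return op.user == "steward"


@rule("put")
def audit_tag_notify(op: Operation) -> bool:
    audit_log.append(f"{op.user} put {op.path}")      # log for audit
    op.context["ingest_source"] = "iot_gateway"        # add context
    if op.path.endswith(".csv"):
        notifications.append(f"new sensor data: {op.path}")  # notify
    return True


print(enforce(Operation("delete", "farmhand", "/avZone/home/farm1/rain.csv")))
put = Operation("put", "farmhand", "/avZone/home/farm1/rain.csv")
print(enforce(put), put.context, audit_log, notifications)
```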

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Supports temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

Federation - Shared Data and Services

Ingest to Institutional repository

As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.

iRODS and the Smart Farm

Challenges with Sensor Networks

  • Varying Communication Protocols
  • Data Collection
  • Data Organization
  • Data Harmonization
  • Data Movement
  • Data Discovery
  • Security and Privacy

Challenges of the Smart Farm

  • Geographic Distribution
  • Network Capacity
  • Network Reliability
  • Large Geographic Areas
  • Variety of Sensors to Interface
  • Variety of Data Formats to Process
  • Variety of Required Policy

The iRODS IoT Gateway - Data Collection

  • Automate data collection
  • Leverage the rule engine to call out to other libraries for specific interface protocols
  • Many iRODS client libraries: REST, C++, Python, Java
  • Operate in a push and/or pull model
  • Include user-submitted data
  • Schedule periodic data collection

The iRODS IoT Gateway - Data Organization

  • Automate data organization - driven by policy
  • Route data to specific collections and storage
  • Harvest metadata - apply it for discovery and provenance
  • Initiate data transformation
  • Trigger analytics workflows
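Policy-driven organization can be as simple as a lookup from sensor type to a target collection and storage resource. A minimal sketch; the routing table, zone name, collection layout, and resource names are hypothetical and would come from local policy:

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical routing table: sensor type -> (collection name, storage resource).
ROUTES = {
    "soil_moisture": ("soil", "fastResc"),
    "weather": ("weather", "fastResc"),
    "camera": ("imagery", "capacityResc"),
}
DEFAULT_ROUTE = ("unsorted", "defaultResc")


def route(farm: str, sensor_type: str, timestamp: datetime) -> Tuple[str, str]:
    """Return (logical collection, storage resource) for one incoming file.

    Files are grouped by farm, sensor type, and UTC day so that collections
    stay small and time ranges are easy to find. `timestamp` should be
    timezone-aware.
    """
    collection, resource = ROUTES.get(sensor_type, DEFAULT_ROUTE)
    day = timestamp.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"/avZone/home/{farm}/{collection}/{day}", resource


print(route("farm1", "camera", datetime(2019, 3, 5, 12, 0, tzinfo=timezone.utc)))
```

Unknown sensor types fall through to a catch-all collection rather than being rejected, so nothing is lost while the policy catches up.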

The iRODS IoT Gateway - Data Harmonization

Prep data for analytics:

  • Normalize time scales
  • Normalize geographic projection
  • Normalize internal representation
  • Subset data
  • Transform data to common formats
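Two of these steps, normalizing time scales and internal representation, can be sketched in a few lines: readings from sensors with different sampling rates are averaged into common fixed-width buckets, and a Fahrenheit sensor is converted to Celsius first. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (seconds since epoch, value)


def resample(readings: List[Reading], interval_s: int) -> List[Reading]:
    """Average readings into fixed buckets of interval_s seconds.

    Each bucket is labeled by its start time; output is sorted by time.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, value in readings:
        buckets[t - t % interval_s].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9


# Sensor A reports Celsius every 5 minutes; sensor B reports Fahrenheit every 10.
sensor_a = [(0, 20.0), (300, 22.0), (600, 24.0), (900, 26.0)]
sensor_b = [(0, 68.0), (600, 77.0)]

a_10min = resample(sensor_a, 600)
b_10min = resample([(t, fahrenheit_to_celsius(f)) for t, f in sensor_b], 600)
print(a_10min)
print(b_10min)
```

After harmonization both series share the same 10-minute grid and units, so they can be compared or joined directly.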

The iRODS IoT Gateway - Data Movement

Data movement can be initiated by policy or by the user

  • Replicate data to archive storage
  • Synchronize data across federated namespaces
  • Replicate data to HPC storage for analytics
  • Move data to a central location for publication

The iRODS IoT Gateway - Data Discovery

Metadata within the catalog may be attached to any entity within the system: data, collections, users, storage

  • Metadata can be applied automatically or by the user
  • Once data is at rest, it may be indexed for full-text search
  • Metadata may be used to reference other data sets
  • Data may be discovered by queries across federated namespaces
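Full-text search over data at rest is typically built on an inverted index, which maps each word to the documents containing it. A minimal in-memory sketch, assuming plain-text content; the function names and paths are hypothetical, and a production deployment would more likely hand this job to a dedicated search engine:

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set

# Word -> set of logical paths whose content contains that word.
index: DefaultDict[str, Set[str]] = defaultdict(set)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(path: str, text: str) -> None:
    for token in tokenize(text):
        index[token].add(path)


def search(query: str) -> List[str]:
    """Return paths containing every query term (AND semantics), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


index_document("/avZone/home/farm1/notes.txt", "Soil moisture probe recalibrated at paddock 4")
index_document("/avZone/home/farm2/notes.txt", "Rain gauge cleaned; soil still dry")
print(search("soil"))
print(search("soil moisture"))
```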

The iRODS IoT Gateway - Architecture (Federated)

  • Each farm may host its own iRODS Zone
  • Data is gathered from sensors over the protocol of choice
  • Data is periodically synchronized to Agriculture Victoria

Diagram: each Farm Zone, with its own Catalog, is federated with the Agriculture Victoria Zone and its Catalog.

The iRODS IoT Gateway - Architecture (Single Zone)

  • Each farm hosts Agriculture Victoria servers
  • Data is gathered from sensors over the protocol of choice
  • Data is periodically replicated to Agriculture Victoria

Diagram: the servers at each farm and the central servers form one Agriculture Victoria Zone with a single Catalog.

iRODS Service Integration

Once data is at rest in the Agriculture Victoria namespace:

  • Data may be replicated to HPC storage for analytics
  • Data may be published to CKAN
  • Data may be shared or made accessible via the API gateway
  • Data may be shared over an iRODS interface: WebDAV, Metalnx, NFS, command line

Diagram: the Agriculture Victoria Zone and its Catalog, connected through a REST or Python interface to the DPC API Gateway.

Things to consider in an iRODS Deployment

  • Number of users and expected simultaneous connections
  • Network Performance
  • Expected ingest rate
  • Sizes of files
  • Many small files (more overhead per connection)
  • Partial read/write versus get/put semantics
  • Replication for durability
  • Replication for locality of reference
  • Load balancing versus high availability
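The cost of many small files can be estimated with a simple model: total transfer time is the per-file overhead (for example, connection setup and catalog registration) times the number of files, plus the payload size divided by bandwidth. The sketch below assumes serial transfers and uses hypothetical figures (100 MB/s and 10 ms of overhead per file); measure both on your own deployment.

```python
def transfer_time_s(n_files: int, total_bytes: float,
                    bandwidth_bytes_per_s: float, per_file_overhead_s: float) -> float:
    """Estimate wall-clock seconds to move a dataset one file at a time."""
    return n_files * per_file_overhead_s + total_bytes / bandwidth_bytes_per_s


BANDWIDTH = 100e6   # hypothetical: 100 MB/s
OVERHEAD = 0.010    # hypothetical: 10 ms per file
ONE_GB = 1e9

one_big_file = transfer_time_s(1, ONE_GB, BANDWIDTH, OVERHEAD)
many_small = transfer_time_s(1_000_000, ONE_GB, BANDWIDTH, OVERHEAD)
print(f"1 x 1 GB: {one_big_file:.2f} s")
print(f"1,000,000 x 1 KB: {many_small:.0f} s (~{many_small / 3600:.1f} h)")
```

For the same gigabyte, the single file takes about 10 s while a million 1 KB files take about 10,010 s, because the per-file overhead dominates once files are small.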

iRODS will run on a Raspberry Pi or a rack of servers

Questions?
