Data Management for the Smart Farm
Jason Coposky
@jason_coposky
Executive Director, iRODS Consortium
The iRODS Consortium
- A nonprofit organization embedded in the Renaissance Computing Institute at UNC Chapel Hill, North Carolina
- Membership ranges from public enterprise companies to universities around the world
- Provides sustainability for an open source data management project with a 20-year history in research
Our Membership
What is iRODS
Distributed - runs on a laptop, a cluster, on premises or geographically distributed
Open Source - BSD-3 Licensed, install it today and try before you buy
Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure
iRODS as the Integration Layer
iRODS Core Competencies
The underlying technology is categorized into four areas
Data Virtualization
Combine various distributed storage technologies into a Unified Namespace
- Existing file systems
- Cloud storage
- On premises object storage
- Archival storage systems
iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.
Projection of the Physical into the Logical
Logical Path
Physical Path(s)
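The projection of physical paths into one logical path can be pictured with a small in-memory model. This is only a sketch: the `Catalog` and `Replica` classes, the resource names, and every path below are hypothetical and are not the iRODS catalog schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy stand-in for a catalog: one logical path, many replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        # Record one more physical copy under the same logical name.
        self.entries.setdefault(logical_path, []).append(Replica(resource, physical_path))

    def resolve(self, logical_path: str) -> List[Replica]:
        # Users only name the logical path; the replicas are looked up here.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
logical = "/farmZone/home/sensors/soil/probe42.csv"
catalog.register(logical, "farm_disk", "/var/data/vault/soil/probe42.csv")
catalog.register(logical, "cloud_bucket", "s3://example-bucket/soil/probe42.csv")

for replica in catalog.resolve(logical):
    print(logical, "->", replica.resource, replica.physical_path)
```

The user-facing name stays the same while copies are added, moved, or retired underneath it.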
Data Discovery
Attach metadata to any first class entity within the iRODS Zone
- Data Objects
- Collections
- Users
- Storage Resources
- The Namespace
iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
Metadata Everywhere
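iRODS metadata takes the form of attribute, value, unit (AVU) triples. A minimal sketch of attaching such triples to different kinds of entities and searching on them, assuming a toy in-memory `MetadataCatalog` class (hypothetical, not the iRODS API) and made-up entity names:

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


class MetadataCatalog:
    """Toy store: any (entity_type, name) pair can carry AVUs."""

    def __init__(self):
        self._avus = defaultdict(set)

    def add(self, entity_type, name, avu):
        self._avus[(entity_type, name)].add(avu)

    def find(self, attribute, value):
        # Return every entity, of any type, carrying a matching AVU.
        return sorted(
            key for key, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


meta = MetadataCatalog()
soil_file = "/farmZone/home/sensors/soil/probe42.csv"
meta.add("data_object", soil_file, AVU("sensor_type", "soil_moisture"))
meta.add("data_object", soil_file, AVU("depth", "30", "cm"))
meta.add("collection", "/farmZone/home/sensors/soil", AVU("sensor_type", "soil_moisture"))
meta.add("resource", "farm_disk", AVU("location", "north_paddock"))

print(meta.find("sensor_type", "soil_moisture"))
```

The same query returns both a collection and a data object, which is the point of allowing metadata on every first class entity.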
Workflow Automation
An integrated scripting language, triggered by any operation within the framework
- Authentication
- Storage Access
- Database Interaction
- Network Activity
- Extensible RPC API
The iRODS rule engine provides the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
Dynamic Policy Enforcement
The iRODS rule may:
- restrict access
- log for audit and reporting
- provide additional context
- send a notification
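The behaviours a rule may take (restrict, log, add context) can be sketched as hooks that fire around an operation. This is an illustration in plain Python, not the iRODS rule language: `PolicyEngine`, `PolicyDenied`, the operation name `data_object_read`, and the context keys are all assumptions made for the example.

```python
from collections import defaultdict


class PolicyDenied(Exception):
    """Raised by a rule to refuse an operation."""


class PolicyEngine:
    def __init__(self):
        self._rules = defaultdict(list)
        self.audit_log = []

    def on(self, operation):
        # Decorator that registers a rule for a named operation.
        def decorator(func):
            self._rules[operation].append(func)
            return func
        return decorator

    def enforce(self, operation, context):
        try:
            for rule in self._rules[operation]:
                rule(context)  # a rule may raise PolicyDenied or annotate context
        except PolicyDenied:
            self.audit_log.append((operation, context["user"], "denied"))
            raise
        self.audit_log.append((operation, context["user"], "allowed"))
        return context


engine = PolicyEngine()


@engine.on("data_object_read")
def restrict_embargoed(ctx):
    # Restrict access: only the owner may read embargoed data.
    if ctx.get("embargoed") and ctx["user"] != ctx["owner"]:
        raise PolicyDenied("embargoed data is readable by its owner only")


@engine.on("data_object_read")
def add_context(ctx):
    # Provide additional context to whatever runs after the rules.
    ctx["checked_by"] = "policy_engine"


result = engine.enforce(
    "data_object_read", {"user": "alice", "owner": "alice", "embargoed": True}
)
print(result["checked_by"], engine.audit_log)
```

A notification rule would be one more function registered on the same operation.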
Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment
- No need for common infrastructure
- No need for shared funding
- Affords temporary collaborations
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.
Federation - Shared Data and Services
Ingest to Institutional repository
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
iRODS and the Smart Farm
Challenges with Sensor Networks
- Varying Communication Protocols
- Data Collection
- Data Organization
- Data Harmonization
- Data Movement
- Data Discovery
- Security and Privacy
Challenges of the Smart Farm
- Geographic Distribution
- Network Capacity
- Network Reliability
- Large Geographic Areas
- Variety of Sensors to Interface
- Variety of Data Formats to Process
- Variety of Required Policy
The iRODS IoT Gateway - Data Collection
- Automate data collection
- Leverage rule engine to reach out to other libraries for specific interface protocols
- Many iRODS client libraries: REST, C++, Python, Java
- Operate in a push and/or pull model
- Includes user submitted data
- Schedule periodic data collection
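The push and pull models with periodic scheduling can be sketched as follows. The `Gateway` class, the sensor names, and the payloads are hypothetical. Time is passed in explicitly so the example stays deterministic; a real deployment would drive `tick` from a clock or scheduler.

```python
class Gateway:
    """Toy collector supporting pushed readings and scheduled pulls."""

    def __init__(self):
        self.inbox = []     # collected (sensor_id, payload) pairs
        self._sources = []  # pull sources with their schedules

    def push(self, sensor_id, payload):
        # Push model: the sensor (or a user) submits data on its own.
        self.inbox.append((sensor_id, payload))

    def schedule_pull(self, sensor_id, read_fn, interval_s, start=0.0):
        # Pull model: the gateway calls read_fn every interval_s seconds.
        self._sources.append(
            {"id": sensor_id, "read": read_fn, "interval": interval_s, "next_due": start}
        )

    def tick(self, now):
        # Poll every source whose next collection time has arrived.
        pulled = 0
        for src in self._sources:
            if now >= src["next_due"]:
                self.inbox.append((src["id"], src["read"]()))
                src["next_due"] = now + src["interval"]
                pulled += 1
        return pulled


gateway = Gateway()
gateway.push("gate-camera", {"event": "motion"})
gateway.schedule_pull("soil-probe-42", lambda: {"moisture_pct": 31.5}, interval_s=60)

for now in (0, 30, 60):
    gateway.tick(now)

print(gateway.inbox)
```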
The iRODS IoT Gateway - Data Organization
- Automate data collection - driven by policy
- Route data to specific collections and storage
- Harvest metadata - apply for discovery and provenance
- Initiate data transformation
- Trigger analytics workflows
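Routing data to collections and storage, and harvesting metadata on the way in, could look like the sketch below. The routing table, collection paths, resource names, and the `plan_ingest` helper are all assumptions for illustration; in iRODS this decision would live in policy rather than in client code.

```python
from datetime import datetime, timezone

# Hypothetical routing table: sensor type -> (collection, storage resource).
ROUTES = {
    "soil_moisture": ("/farmZone/home/sensors/soil", "farm_disk"),
    "weather": ("/farmZone/home/sensors/weather", "farm_disk"),
}
DEFAULT_ROUTE = ("/farmZone/home/sensors/unsorted", "archive")


def plan_ingest(sensor_type, sensor_id, observed_at):
    """Decide where a reading should land and which metadata to attach."""
    collection, resource = ROUTES.get(sensor_type, DEFAULT_ROUTE)
    day = observed_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return {
        "collection": f"{collection}/{day}",
        "resource": resource,
        # Harvested metadata, kept for discovery and provenance.
        "metadata": {
            "sensor_type": sensor_type,
            "sensor_id": sensor_id,
            "observed_at": observed_at.astimezone(timezone.utc).isoformat(),
        },
    }


plan = plan_ingest(
    "soil_moisture", "probe42", datetime(2019, 3, 5, 12, 0, tzinfo=timezone.utc)
)
print(plan["collection"], plan["resource"])
```

Unknown sensor types fall through to a default route instead of being dropped, so nothing is lost while the policy catches up.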
The iRODS IoT Gateway - Data Harmonization
Prep data for analytics:
- Normalize time scales
- Normalize geographic projection
- Normalize internal representation
- Subset data
- Transform data to common formats
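As one example of normalizing time scales, the sketch below converts mixed timestamp formats to UTC and averages readings into fixed buckets. The function names, the 10-minute bucket size, and the sample readings are assumptions; other harmonization steps such as reprojection or format conversion are not shown.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_utc(ts):
    """Accept epoch seconds or an ISO 8601 string; return an aware UTC datetime.

    ISO strings are assumed to carry a UTC offset. A naive string would be
    interpreted as local time by astimezone().
    """
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


def resample_mean(readings, bucket_s=600):
    """Average (timestamp, value) pairs into fixed-width UTC time buckets."""
    buckets = defaultdict(list)
    for ts, value in readings:
        epoch = int(to_utc(ts).timestamp())
        buckets[epoch - epoch % bucket_s].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
        for start, vals in sorted(buckets.items())
    }


# Two sensors reporting the same moments in different formats.
readings = [
    ("2019-03-05T22:05:00+11:00", 30.0),  # local time with offset
    (1551783900, 32.0),                   # epoch seconds, same instant
    ("2019-03-05T11:12:00+00:00", 40.0),  # falls in the next 10-minute bucket
]
for start, mean in resample_mean(readings).items():
    print(start.isoformat(), mean)
```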
The iRODS IoT Gateway - Data Movement
Data movement can be initiated by policy or by the user
- Replicate data to archive storage
- Synchronize data across federated namespaces
- Replicate data to HPC storage for analytics
- Move data to a central location for publication
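Policy-initiated movement can be thought of as a function from an object's metadata and age to a list of replication targets. In this sketch the resource names, the 30-day threshold, and the `analytics` attribute are hypothetical choices, not iRODS defaults.

```python
ARCHIVE_AFTER_DAYS = 30  # assumed retention threshold, for illustration only


def plan_replication(metadata, age_days, existing_resources):
    """Return the resources a data object should be replicated to next."""
    targets = []
    if age_days >= ARCHIVE_AFTER_DAYS:
        targets.append("archive_tape")  # durability copy for older data
    if metadata.get("analytics") == "pending":
        targets.append("hpc_scratch")   # copy close to the compute
    # Skip resources that already hold a replica.
    return [t for t in targets if t not in existing_resources]


print(plan_replication({"analytics": "pending"}, age_days=45,
                       existing_resources={"farm_disk"}))
```

A user-initiated move would bypass the function and name the target directly.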
The iRODS IoT Gateway - Data Discovery
Metadata within the catalog may be attached to any entity within the system: data, collections, users, storage
- Metadata can be applied automatically or by the user
- Once data is at rest it may be indexed for full text search
- Metadata may be used to reference other data sets
- Data may be discovered by queries across federated namespaces
The iRODS IoT Gateway - Architecture
Farm Zone
- Each farm may host its own iRODS Zone
- Data is gathered from sensors over the protocol of choice
- Data is periodically synchronized to Agriculture Victoria
[Diagram labels: Agriculture Victoria Zone, Federation, Catalog, Catalog]
The iRODS IoT Gateway - Architecture
- Each farm hosts Agriculture Victoria servers
- Data is gathered from sensors over the protocol of choice
- Data is periodically replicated to Agriculture Victoria
[Diagram labels: Agriculture Victoria Zone, Catalog]
iRODS Service Integration
Once data is at rest in the Agriculture Victoria namespace:
- Data may be replicated to HPC storage for analytics
- Data may be published to CKAN
- Data may be shared or made accessible via the API gateway
- Data may be shared over an iRODS interface: WebDAV, Metalnx, NFS, command line
[Diagram labels: Agriculture Victoria Zone, Catalog, REST or Python Interface, DPC API Gateway]
Things to consider in an iRODS Deployment
- Number of users and expected simultaneous connections
- Network Performance
- Expected ingest rate
- Sizes of files
- Many small files (more overhead per connection)
- Partial read / write versus get / put semantics
- Replication for durability
- Replication for locality of reference
- Load balancing vs High Availability
iRODS will run on a Raspberry Pi or a rack of servers
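Expected ingest rate and the small-file concern can be estimated with simple arithmetic before sizing a deployment. The sensor count, reading size, and frequency below are invented inputs, and `estimate_ingest` is a hypothetical helper, not a benchmark.

```python
import math


def estimate_ingest(sensors, readings_per_day, bytes_per_reading, readings_per_file=1):
    """Rough daily ingest estimate; batching readings reduces the file count."""
    files_per_sensor = math.ceil(readings_per_day / readings_per_file)
    files_per_day = sensors * files_per_sensor
    bytes_per_day = sensors * readings_per_day * bytes_per_reading
    return {
        "files_per_day": files_per_day,
        "bytes_per_day": bytes_per_day,
        "avg_bytes_per_second": bytes_per_day / 86400,
    }


# Assumed farm: 500 sensors, one 200-byte reading every 10 minutes.
unbatched = estimate_ingest(500, 144, 200)
hourly = estimate_ingest(500, 144, 200, readings_per_file=6)
print(unbatched["files_per_day"], hourly["files_per_day"], unbatched["bytes_per_day"])
```

Under these assumptions the data volume is the same either way, while batching readings hourly cuts the number of files, and therefore the per-file overhead, by a factor of six.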
Questions?
iRODS brings data management, provenance, and policy to the extreme edge as an IoT Gateway