Implementation of Data Ingestion Pipeline for CoBotAGV readings in edge-cloud environment
Piotr Grzesik, Paweł Benecki, Daniel Kostrzewa, Dariusz Mrozek, Bohdan Shubyn
Agenda
- Fetching data from OPC UA Server
- Placement of the OPC UA Client
- High-level overview of the current data storage architecture
- Details about specific parts of the cloud architecture
- Implementation using the Infrastructure as Code approach with Terraform
Fetching data from OPC UA
Subscription-based approach
Pros:
- No unnecessary calls to OPC UA Server
- Immediate, granular updates exactly when a variable changes
- Changes are persisted in a local database in case of failures
Cons:
- More complex architecture: single-variable updates must be aggregated to project the state of the system
- The state of the system must be maintained locally as part of the client
Fetching data from OPC UA
Subscription-based approach with incremental updates
Pros:
- No unnecessary calls to OPC UA Server
- Immediate, granular updates exactly when a variable changes
- Simple implementation of OPC UA Client
Cons:
- More complex processing architecture on the backend
- Requires a failover strategy, since updates are not persisted locally
Fetching data from OPC UA
Periodic fetch approach
Pros:
- Simpler architecture
- No need to maintain local state as part of the OPC UA Client
Cons:
- Unnecessary calls to the OPC UA Server when no changes are observed
- Requires a failover strategy, since readings are not persisted locally
Placement of OPC UA Client
Co-located with AGV and OPC UA Server
Pros:
- Lower latency between the OPC UA Server and Client
- Better resilience to Internet connection outages
Cons:
- More complex architecture at the edge
- More challenging maintenance of the OPC UA Client at the edge
- Multiple clients are needed if some AGVs are not co-located
Placement of OPC UA Client
Located in the cloud environment
Pros:
- Simpler architecture
- Can support data ingestion from multiple OPC UA Servers
Cons:
- Higher communication latency between the OPC UA Server and Client
- Lower resilience to Internet connection outages
High-level diagram - cloud part
Azure Data Lake Storage Gen2
A centralized, single storage platform for data ingestion, processing, and visualisation. It is massively scalable: according to the documentation, it can store exabytes of data with throughput measured in gigabits per second. Its hierarchical namespace organizes objects into directories, which allows for efficient data access. It can be integrated with multiple analytical frameworks and offers Hadoop-compatible access.
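A minimal Terraform sketch of how such a storage layer could be declared, assuming the azurerm provider 3.x: Data Lake Storage Gen2 is an ordinary StorageV2 account with the hierarchical namespace switched on (`is_hns_enabled = true`). The resource labels `storage_account` and `storage_container` are chosen to match those referenced by the Event Hub capture configuration in this deck; `var.storage_account_name` and the container name `agv-readings` are hypothetical.

```hcl
# Sketch: ADLS Gen2 = StorageV2 account with the hierarchical namespace enabled.
# var.storage_account_name and "agv-readings" are hypothetical names.
resource "azurerm_storage_account" "storage_account" {
  name                     = var.storage_account_name # 3-24 lowercase letters/digits, globally unique
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_kind             = "StorageV2"
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # hierarchical namespace, i.e. Data Lake Storage Gen2
}

# Private container (file system) usable as a destination for captured readings.
resource "azurerm_storage_container" "storage_container" {
  name                  = "agv-readings"
  storage_account_name  = azurerm_storage_account.storage_account.name
  container_access_type = "private"
}
```

The hierarchical namespace can only be chosen when the account is created, so it is worth enabling it from the first deployment rather than migrating later.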
Azure IoT Hub
A service that enables secure and reliable communication between the cloud and IoT devices. It supports per-device management, authentication, and authorization, and integrates with services such as Azure Stream Analytics and Azure Data Lake Storage. Additionally, it can be extended with Azure IoT Edge to deploy services directly on edge devices.
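A possible Terraform sketch of an IoT Hub for the AGV readings, under the assumption that a downstream service (for example Stream Analytics) reads from the hub's built-in Event Hub-compatible endpoint, which is named `events`. The variable `var.iothub_name` and the consumer group name `stream-analytics` are hypothetical.

```hcl
# Sketch: IoT Hub for AGV / OPC UA client devices.
# var.iothub_name is a hypothetical variable.
resource "azurerm_iothub" "iothub" {
  name                = var.iothub_name
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location

  sku {
    name     = "S1" # Standard tier
    capacity = 1    # number of IoT Hub units
  }
}

# Dedicated consumer group on the built-in "events" endpoint, so that one
# downstream reader does not compete with others on the "$Default" group.
resource "azurerm_iothub_consumer_group" "asa" {
  name                   = "stream-analytics"
  iothub_name            = azurerm_iothub.iothub.name
  eventhub_endpoint_name = "events"
  resource_group_name    = azurerm_resource_group.rg.name
}
```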
Azure Stream Analytics
A fully managed stream analytics engine designed to process large volumes of streaming data. It can be used to enrich or preprocess the data, or to discard invalid events. It can also be integrated with Azure Functions or Azure Machine Learning, for example to enable anomaly detection on incoming data streams. Azure Stream Analytics jobs can also run on edge devices.
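One way the "discard invalid events" step could be expressed in Terraform is sketched below. It is an illustration under explicit assumptions, not the project's actual job: it assumes an IoT Hub `azurerm_iothub.iothub` with a consumer group `azurerm_iothub_consumer_group.asa`, plus the Event Hub and namespace resources `azurerm_eventhub.eventhub` and `azurerm_eventhub_namespace.eventhub_namespace`. The message field `agvId` and the variable `var.stream_analytics_job_name` are hypothetical.

```hcl
# Sketch: read device messages from IoT Hub, drop events without an AGV id
# (agvId is a hypothetical field), and forward the rest to Event Hub.
resource "azurerm_stream_analytics_job" "asa_job" {
  name                                     = var.stream_analytics_job_name
  resource_group_name                      = azurerm_resource_group.rg.name
  location                                 = azurerm_resource_group.rg.location
  compatibility_level                      = "1.2"
  data_locale                              = "en-US"
  events_late_arrival_max_delay_in_seconds = 60
  events_out_of_order_max_delay_in_seconds = 50
  events_out_of_order_policy               = "Adjust"
  output_error_policy                      = "Drop"
  streaming_units                          = 1

  # [iothub-input] and [eventhub-output] must match the input/output names below.
  transformation_query = <<QUERY
    SELECT *
    INTO [eventhub-output]
    FROM [iothub-input]
    WHERE agvId IS NOT NULL
QUERY
}

resource "azurerm_stream_analytics_stream_input_iothub" "iothub_input" {
  name                         = "iothub-input"
  stream_analytics_job_name    = azurerm_stream_analytics_job.asa_job.name
  resource_group_name          = azurerm_resource_group.rg.name
  endpoint                     = "messages/events"
  eventhub_consumer_group_name = azurerm_iothub_consumer_group.asa.name
  iothub_namespace             = azurerm_iothub.iothub.name
  # Assumes policy index 0 is "iothubowner"; a narrower policy is preferable in production.
  shared_access_policy_key  = azurerm_iothub.iothub.shared_access_policy[0].primary_key
  shared_access_policy_name = "iothubowner"

  serialization {
    type     = "Json"
    encoding = "UTF8"
  }
}

resource "azurerm_stream_analytics_output_eventhub" "eventhub_output" {
  name                      = "eventhub-output"
  stream_analytics_job_name = azurerm_stream_analytics_job.asa_job.name
  resource_group_name       = azurerm_resource_group.rg.name
  eventhub_name             = azurerm_eventhub.eventhub.name
  servicebus_namespace      = azurerm_eventhub_namespace.eventhub_namespace.name
  shared_access_policy_key  = azurerm_eventhub_namespace.eventhub_namespace.default_primary_key
  shared_access_policy_name = "RootManageSharedAccessKey"

  serialization {
    type     = "Json"
    encoding = "UTF8"
    format   = "LineSeparated"
  }
}
```

Keeping the filtering in the query means invalid readings never reach the Event Hub, so they are also never captured into the data lake.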
Azure Event Hub
Azure Event Hub is a generic event ingestion service. It supports multiple sources and outputs and integrates natively with services such as Azure Functions. Producers can publish events over three protocols (AMQP, Kafka, and HTTPS), while consumers read over AMQP or Kafka. Its Capture feature automatically saves incoming data to Azure Data Lake Storage for long-term retention.
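A minimal sketch of the supporting resources around the Event Hub, assuming the azurerm provider 3.x: the namespace that the Event Hub configuration in this deck refers to as `azurerm_eventhub_namespace.eventhub_namespace`, a dedicated consumer group, and a send-only credential for producers. `var.eventhub_namespace_name` and the names `analytics` and `producer-send` are hypothetical.

```hcl
# Sketch: namespace, consumer group and producer credential for the Event Hub.
# var.eventhub_namespace_name and the group/rule names are hypothetical.
resource "azurerm_eventhub_namespace" "eventhub_namespace" {
  name                = var.eventhub_namespace_name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Standard" # Capture and the Kafka endpoint are unavailable on Basic
  capacity            = 1          # throughput units
}

# Separate consumer group for a downstream reader, e.g. an Azure Function.
resource "azurerm_eventhub_consumer_group" "analytics" {
  name                = "analytics"
  namespace_name      = azurerm_eventhub_namespace.eventhub_namespace.name
  eventhub_name       = azurerm_eventhub.eventhub.name
  resource_group_name = azurerm_resource_group.rg.name
}

# Least-privilege credential for producers: send only, no listen/manage rights.
resource "azurerm_eventhub_authorization_rule" "producer" {
  name                = "producer-send"
  namespace_name      = azurerm_eventhub_namespace.eventhub_namespace.name
  eventhub_name       = azurerm_eventhub.eventhub.name
  resource_group_name = azurerm_resource_group.rg.name
  listen              = false
  send                = true
  manage              = false
}
```

The producer would then use `azurerm_eventhub_authorization_rule.producer.primary_connection_string` instead of the namespace-wide root key.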
Azure Time Series Insights
A set of services for ingesting, storing, processing, organizing, and visualizing time series data, optimized for data coming from IoT devices. It provides warm and cold storage for interactive and historical analysis, respectively, and can be integrated with other services such as Azure Machine Learning and Azure Databricks for further analysis of stored data. Unfortunately, the service is being retired and will no longer be available in 2025.
Infrastructure as Code
Infrastructure as Code (IaC) is the practice of defining and managing cloud infrastructure through configuration files rather than through manual interaction with a GUI or CLI. It allows repeatable cloud infrastructure to be defined and deployed, while also providing a single definition and overview of all services.
Terraform
Terraform is an IaC tool for managing infrastructure across multiple cloud providers, such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. It uses a human-readable configuration language (HCL) to define resources, and it records state to track changes across deployments. Its configuration can be committed to a version control system to provide an audit trail of changes to the infrastructure.
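The state tracking and provider setup described above can be sketched as a root configuration. This is an illustrative sketch only: the backend names (`tfstate-rg`, `tfstatecobotagv`, `tfstate`, `ingestion.terraform.tfstate`), the default region, and the variable names are hypothetical, and the backend storage account must already exist before `terraform init`. The version is pinned to azurerm 3.x because the `azurerm_eventhub` arguments used in this deck (`namespace_name`, `resource_group_name`) follow the 3.x schema.

```hcl
# Sketch: provider setup, remote state, and the shared resource group.
# All backend names and variable defaults are hypothetical.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }

  # Remote state in Azure Storage, so every deployment works against the same state.
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatecobotagv"
    container_name       = "tfstate"
    key                  = "ingestion.terraform.tfstate"
  }
}

provider "azurerm" {
  features {}
}

variable "location" {
  type    = string
  default = "westeurope"
}

variable "resource_group_name" {
  type = string
}

variable "eventhub_name" {
  type = string
}

resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
}
```

A typical workflow is `terraform init` (configures the backend and downloads the provider), `terraform plan` (shows the diff against the recorded state), and `terraform apply`; committing the `.tf` files, but not the state, to version control gives the audit trail mentioned above.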
Terraform config
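```hcl
# Event Hub whose incoming events are automatically written to the storage
# container as Avro files (Event Hubs Capture).
resource "azurerm_eventhub" "eventhub" {
  name                = var.eventhub_name
  namespace_name      = azurerm_eventhub_namespace.eventhub_namespace.name
  resource_group_name = azurerm_resource_group.rg.name
  partition_count     = 1 # a single partition limits the number of parallel readers
  message_retention   = 1 # retention in days

  capture_description {
    enabled             = true
    encoding            = "Avro"
    interval_in_seconds = 300 # capture window; a file may be written earlier if the size limit is hit

    destination {
      name                = "EventHubArchive.AzureBlockBlob"
      blob_container_name = azurerm_storage_container.storage_container.name
      storage_account_id  = azurerm_storage_account.storage_account.id
      # Azure requires all of these tokens, including {Second}, in any order.
      archive_name_format = "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
    }
  }
}
```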
Potential services for data analytics
- Azure Databricks - processing data with Apache Spark
- Azure Data Lake Analytics - parallel data transformation and processing in a serverless manner with U-SQL (retired in February 2024)
- Azure Synapse Analytics - unified analytics over data from both data lakes and data warehouses
- Azure HDInsight - managed platform for provisioning Hadoop, Spark, and Storm clusters
- Azure Data Explorer - managed data analytics service for real-time analysis on large volumes of streaming data
Implementation of data lake for CoBotAGV readings in cloud environment