Implementation of a Data Ingestion Pipeline for CoBotAGV readings in an edge-cloud environment

Piotr Grzesik, Paweł Benecki, Daniel Kostrzewa, Dariusz Mrozek, Bohdan Shubyn

Agenda

  1. Fetching data from OPC UA Server
  2. Location of OPC UA Client
  3. High-level overview of current data storage architecture
  4. Details about specific parts of the cloud architecture
  5. Implementation using the "Infrastructure as Code" approach with Terraform

Fetching data from OPC UA

Subscription-based approach

Pros:

  • No unnecessary calls to OPC UA Server
  • Instant, granular updates exactly when a variable changes
  • Changes are persisted in a local database in case of failures

Cons:

  • More complex architecture: single-variable updates must be aggregated to project the state of the system
  • The state of the system must be maintained locally as part of the client

Fetching data from OPC UA

Subscription-based approach with incremental updates

Pros:

  • No unnecessary calls to OPC UA Server
  • Instant, granular updates exactly when a variable changes
  • Simple implementation of the OPC UA Client

Cons:

  • More complex processing architecture on the backend
  • Requires a failover scenario

Fetching data from OPC UA

Periodic fetch approach

Pros:

  • Simpler architecture
  • No need to maintain local state as part of the OPC UA Client

Cons:

  • Unnecessary calls to the OPC UA Server when no changes are observed
  • Requires a failover scenario

Placement of OPC UA Client

Co-located with AGV and OPC UA Server

Pros:

  • Lower latency between OPC UA Server/Client
  • Better resilience to outages in Internet connection

Cons:

  • More complex architecture at the edge
  • More challenging maintenance of the OPC UA Client at the edge
  • Multiple clients are needed if the AGVs are not co-located

Placement of OPC UA Client

Located in Cloud environment

Pros:

  • Simpler architecture
  • Can support data ingestion from multiple OPC UA Servers

Cons:

  • Higher communication latency between OPC UA Server/Client
  • Less resilience to Internet connection outages

High-level diagram - cloud part

Azure Data Lake Storage Gen2

A centralized, single-storage platform for data ingestion, processing, and visualisation. It is massively scalable: according to the documentation, it can handle exabytes of data with throughput measured in gigabits per second. It supports a hierarchical namespace, which allows for efficient data access. It can be integrated with multiple analytical frameworks and offers Hadoop-compatible access.

Azure IoT Hub

A service that allows for secure and reliable communication between the cloud and IoT devices. It supports management of individual devices, authentication, and authorization, and integrates with services such as Azure Stream Analytics and Azure Data Lake Storage. Additionally, it can be extended with Azure IoT Edge to deploy services directly on edge devices.
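
As a rough illustration, an IoT Hub could be declared in the same Terraform style as the rest of the deployment. This is a minimal sketch only: the resource label, the hub name, and the S1 tier are assumptions made for the example, and `azurerm_resource_group.rg` is the resource group referenced by the Terraform config in this deck.

```hcl
# Hypothetical IoT Hub definition; name and SKU are illustrative assumptions.
resource "azurerm_iothub" "iothub" {
  name                = "agv-iothub-example" # assumed name, must be globally unique
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location

  sku {
    name     = "S1" # assumed tier
    capacity = 1    # number of IoT Hub units
  }
}
```

Device registration and message routing are intentionally left out of this sketch.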

Azure Stream Analytics

A fully managed stream analytics engine designed to process large volumes of streaming data. It can be used to enrich the data, preprocess it, or discard invalid events. It can also be integrated with Azure Functions or Azure Machine Learning to enable, for example, anomaly detection on incoming data streams. Azure Stream Analytics jobs can also be executed on edge devices.
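
Discarding invalid events could look roughly like the job below. This is a sketch under stated assumptions: the job name, the input and output aliases (`eventhub-input`, `datalake-output`), and the `battery_level` field are hypothetical and do not come from the actual CoBotAGV schema. The input and output themselves would need to be defined as separate resources, which are omitted here.

```hcl
# Hypothetical Stream Analytics job that drops events missing a required field.
resource "azurerm_stream_analytics_job" "asa_job" {
  name                = "agv-stream-job-example" # assumed name
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  streaming_units     = 3

  # The aliases and the field name below are assumptions for illustration only.
  transformation_query = <<-QUERY
    SELECT *
    INTO [datalake-output]
    FROM [eventhub-input]
    WHERE battery_level IS NOT NULL
  QUERY
}
```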

Azure Event Hubs

Azure Event Hubs is a generic event ingestion service. It supports multiple sources and outputs and integrates natively with services such as Azure Functions. It supports three protocols for consumers and producers: AMQP, Kafka, and HTTPS. It also offers the Capture feature, which saves data to Azure Data Lake Storage for long-term retention.

Azure Time Series Insights

A set of services for ingesting, storing, processing, organizing, and visualizing time series data. It is optimized for data coming from IoT devices. It supports warm and cold storage for both interactive and historical analysis. It can also be integrated with other services, such as Azure Machine Learning and Azure Databricks, for further analysis of stored data. Unfortunately, this service will no longer be available in 2025.

Infrastructure as Code

Infrastructure as Code (IaC) is the practice of defining and managing cloud infrastructure through configuration files rather than through manual interaction with a GUI or CLI. It makes it possible to define and deploy repeatable cloud infrastructure while also providing a single definition and overview of all your services.

Terraform

Terraform is an IaC tool for managing infrastructure across multiple cloud providers, such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform. It uses a human-readable language to define resources and records state to track changes across deployments. Its configuration can be committed to a version control system to provide an audit trail of changes to your infrastructure.
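
A Terraform project targeting Azure typically starts with a provider declaration. The sketch below is one plausible setup, not necessarily the one used in this project; in particular, the `~> 3.0` version constraint is an assumption.

```hcl
# Minimal provider setup; the version constraint is an assumed example.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

# The azurerm provider requires a features block, even if it is empty.
provider "azurerm" {
  features {}
}
```

With this in place, `terraform init` downloads the provider, `terraform plan` previews the changes, and `terraform apply` deploys them.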

Terraform config

# Event Hub that receives AGV readings and archives them via Capture.
resource "azurerm_eventhub" "eventhub" {
  name                = var.eventhub_name
  namespace_name      = azurerm_eventhub_namespace.eventhub_namespace.name
  resource_group_name = azurerm_resource_group.rg.name
  partition_count     = 1
  message_retention   = 1 # days

  # Capture periodically writes incoming events to blob storage.
  capture_description {
    enabled             = true
    encoding            = "Avro"
    interval_in_seconds = 300 # flush a capture file every 5 minutes

    destination {
      name                = "EventHubArchive.AzureBlockBlob"
      blob_container_name = azurerm_storage_container.storage_container.name
      storage_account_id  = azurerm_storage_account.storage_account.id
      # All placeholders, including {Second}, are required in the name format.
      archive_name_format = "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
    }
  }
}
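
The config above references several resources and one variable that are not shown on the slide. They might be defined roughly as follows. This is a sketch: all names, the region, and the SKU choices are assumptions for illustration. Only the resource labels (`rg`, `storage_account`, `storage_container`, `eventhub_namespace`) and the variable name are taken from the config itself.

```hcl
variable "eventhub_name" {
  type        = string
  description = "Name of the Event Hub receiving AGV readings"
}

resource "azurerm_resource_group" "rg" {
  name     = "agv-ingestion-rg" # assumed name
  location = "West Europe"      # assumed region
}

# Storage account with hierarchical namespace enabled (Data Lake Storage Gen2).
resource "azurerm_storage_account" "storage_account" {
  name                     = "agvdatalakeexample" # assumed; must be globally unique
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true
}

# Container that receives the Avro files written by Event Hubs Capture.
resource "azurerm_storage_container" "storage_container" {
  name                  = "agv-readings" # assumed name
  storage_account_name  = azurerm_storage_account.storage_account.name
  container_access_type = "private"
}

# Capture is not available on the Basic tier, so Standard is assumed here.
resource "azurerm_eventhub_namespace" "eventhub_namespace" {
  name                = "agv-eventhub-ns-example" # assumed name
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "Standard"
  capacity            = 1
}
```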

Potential services for data analytics

  • Azure Databricks - processing data with Apache Spark
  • Azure Data Lake Analytics - parallel data transformation and processing in serverless manner with U-SQL
  • Azure Synapse Analytics - combination of analytics on data from data lakes and data warehouses
  • Azure HDInsight - platform to provision Hadoop, Spark, Storm clusters
  • Azure Data Explorer - managed data analytics service for real-time analysis on large volumes of streaming data 