Databases in IoT

Piotr Grzesik

Agenda

  • Characteristics of IoT data
  • Characteristics of IoT data processing
  • Important features of IoT databases
  • InfluxDB
  • TimescaleDB
  • MongoDB
  • Cassandra
  • Azure Time Series Insights

Characteristics of IoT data

  • usually timestamped
  • few relationships between data points
  • usually metrics or events
  • temporal ordering of records (time-series)
  • high volume
  • high frequency
  • often low-level, raw data
  • usually numerical values

Characteristics of IoT data processing

  • high number of insert operations, often in batches
  • updates of individual records are very rare
  • queries often perform aggregations over time
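Because writes arrive as many small inserts, clients usually group them into batches before sending. The sketch below shows one way to do that, assuming a `(timestamp, sensor_id, value)` record layout and a caller-supplied `flush_fn` that stands in for the target database's bulk-insert call; both are illustrative assumptions, not a specific client library.

```python
import time
from typing import Callable, List, Optional, Tuple

# Assumed record layout: (timestamp in seconds, sensor id, value).
Reading = Tuple[float, str, float]


class BatchingWriter:
    """Buffers readings and passes them to `flush_fn` in batches.

    `flush_fn` is a placeholder for whatever bulk-insert call the target database offers.
    """

    def __init__(self, flush_fn: Callable[[List[Reading]], None],
                 max_batch: int = 500, max_delay_s: float = 1.0) -> None:
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self.buffer: List[Reading] = []
        self.oldest: Optional[float] = None  # monotonic time of the first buffered reading

    def write(self, reading: Reading) -> None:
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(reading)
        waited = time.monotonic() - self.oldest
        # Flush on a full batch, or when the oldest buffered reading has waited long enough.
        if len(self.buffer) >= self.max_batch or waited >= self.max_delay_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.oldest = None
```

The delay is only checked when a new reading arrives; a production writer would typically also flush from a background timer and on shutdown.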

Important features of IoT databases

  • Support for very fast writes, including sudden bursts of data
  • Support for fast reads and analytical queries
  • Support for retention policies
  • Support for downsampling
  • Support for compression
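As an example of why time-series data compresses well, consider delta-of-delta encoding of timestamps, a technique popularised by Facebook's Gorilla design and used in several time-series engines. The sketch below is a simplified illustration: regularly sampled timestamps turn into a run of zeros, which a subsequent variable-length bit encoding (not shown) can store in very few bits.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode sorted integer timestamps as [first, first_delta, dod, dod, ...]."""
    if len(timestamps) <= 1:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # 0 whenever the sampling interval is unchanged
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if len(encoded) <= 1:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod  # rebuild the current interval
        ts.append(ts[-1] + delta)
    return ts
```

For readings taken every 10 seconds with one late sample, `[1000, 1010, 1020, 1030, 1041]` encodes to `[1000, 10, 0, 0, 1]`.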

Time-series databases

  • Database management systems that are optimized to handle timestamped or time-series data
  • Data is stored in the form of measurements, events, or metrics, usually numerical
  • Support for advanced queries over time windows
  • Time-series databases are built either as standalone databases (e.g. InfluxDB) or as extensions of an existing database (e.g. TimescaleDB, built on PostgreSQL)

InfluxDB

  • Open source time-series database written in Go, developed and maintained by InfluxData, Inc.
  • Queried with InfluxQL, a custom SQL-like query language, or with Flux, a functional scripting language
  • Has support for aggregation functions over time-series data
  • Part of the popular TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor)
  • Clustering for scalability and high availability (offered in the commercial InfluxDB Enterprise edition)
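Data is written to InfluxDB in its text-based line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). The helper below is a minimal sketch of building such a line; its name and signature are hypothetical, and its escaping covers the common cases (commas, equals signs, spaces, quotes) rather than every rule in the specification.

```python
from typing import Dict, Optional, Union

FieldValue = Union[float, int, bool, str]


def _escape_key(s: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(v: FieldValue) -> str:
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    escaped = v.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # string fields are double-quoted


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Hypothetical helper that formats one point as an InfluxDB line-protocol string."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for k in sorted(tags):  # tags sorted by key
        head += f",{_escape_key(k)}={_escape_key(tags[k])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"
```

For example, a temperature reading from room "lab 1" becomes `temperature,room=lab\ 1,sensor=s1 value=21.5,count=3i,ok=true 1700000000000000000`.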

InfluxDB Demo

TimescaleDB

  • Open source, time-series PostgreSQL extension written in C, developed and maintained by Timescale, Inc. 
  • Uses SQL and is compatible with standard PostgreSQL
  • Supports the same client libraries and CLI tools as PostgreSQL
  • Adds time-series functions, such as time_bucket(), for aggregating data over time intervals
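To make time-bucketed aggregation concrete, the sketch below computes in plain Python roughly what a query such as `SELECT time_bucket('5 minutes', time) AS bucket, avg(value) FROM readings GROUP BY bucket ORDER BY bucket;` returns. The table and column names are hypothetical, and buckets here are aligned to the Unix epoch as a simplifying assumption; TimescaleDB's own default bucket origin may differ for some intervals.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    # Floor the timestamp to a multiple of `width`, counted from the Unix epoch.
    return EPOCH + ((ts - EPOCH) // width) * width


def avg_per_bucket(rows: Iterable[Tuple[datetime, float]],
                   width: timedelta) -> List[Tuple[datetime, float]]:
    """Average `value` per time bucket, sorted by bucket start."""
    acc: Dict[datetime, List[float]] = defaultdict(lambda: [0.0, 0.0])  # [sum, count]
    for ts, value in rows:
        entry = acc[bucket_start(ts, width)]
        entry[0] += value
        entry[1] += 1
    return [(start, total / count) for start, (total, count) in sorted(acc.items())]
```

Readings at 12:00 and 12:02 land in the 12:00 bucket, while a reading at 12:06 lands in the 12:05 bucket.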

TimescaleDB Demo

Document-oriented databases

  • Also referred to as document stores; used for managing semi-structured data, often in the form of JSON-like documents
  • Schemaless: they do not require a predefined schema, which makes them well suited to storing dynamic, semi-structured data
  • Often there is no need for object-relational mapping at the application level
  • Advanced querying capabilities 
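The sketch below illustrates both points in plain Python: two documents with different shapes sit in the same collection, and a filter reaches into nested fields using dot notation, the style MongoDB uses in queries such as `db.readings.find({"meta.location": "lab"})`. The collection contents and the `find`/`get_path` helpers are hypothetical and support equality matching only.

```python
from typing import Any, Dict, List

# Two documents in the same (hypothetical) collection with different shapes.
readings: List[Dict[str, Any]] = [
    {"device": "t-01", "meta": {"location": "lab", "fw": "1.2"}, "temp": 21.4},
    {"device": "h-07", "meta": {"location": "hall"}, "humidity": 48, "battery": 0.83},
]

_MISSING = object()  # sentinel: field not present in the document


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'meta.location' into nested dicts."""
    cur: Any = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return _MISSING
        cur = cur[part]
    return cur


def find(docs: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return documents whose fields equal every value in `query` (equality only)."""
    return [d for d in docs if all(get_path(d, k) == v for k, v in query.items())]
```

Documents that lack a queried field simply do not match, so no schema migration is needed when a new device type adds fields.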

MongoDB

  • General-purpose, document-oriented database, developed by MongoDB, Inc.
  • Licensed under Server Side Public License (SSPL)
  • Stores JSON-like documents (BSON internally); queries are also expressed as JSON
  • Support for geospatial queries
  • Two ways of modelling relationships: references and embedded documents
  • Horizontal scaling using sharding
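The two relationship styles can be compared with plain Python dictionaries standing in for documents. The sketch assumes a hypothetical device with temperature readings: embedding keeps everything in one document, while referencing stores readings separately with a `device_id` that the application (or a `$lookup` aggregation stage) resolves.

```python
from typing import Any, Dict, List

# Embedded: readings live inside the device document, so one read fetches everything.
device_embedded: Dict[str, Any] = {
    "_id": "t-01",
    "location": "lab",
    "readings": [{"ts": 1700000000, "temp": 21.4}, {"ts": 1700000060, "temp": 21.6}],
}

# Referenced: readings are separate documents pointing at the device by id.
devices: List[Dict[str, Any]] = [{"_id": "t-01", "location": "lab"}]
readings: List[Dict[str, Any]] = [
    {"device_id": "t-01", "ts": 1700000000, "temp": 21.4},
    {"device_id": "t-01", "ts": 1700000060, "temp": 21.6},
]


def readings_for(device_id: str) -> List[Dict[str, Any]]:
    # With references, the relationship is resolved by a second lookup.
    return [r for r in readings if r["device_id"] == device_id]
```

For IoT data, references are usually the safer default: an ever-growing embedded array of readings eventually runs into MongoDB's 16 MB document size limit, while embedding suits small, bounded sub-documents such as device metadata.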

MongoDB demo

Column-oriented databases

  • Database management systems that store data tables by column instead of by row
  • Often use SQL or an SQL-like language for querying
  • Aimed at workloads that read a few columns (specific values) across many records, rather than whole records (rows)
  • Very often used in analytical applications
  • Wide-column databases are often grouped under this term too, although they are better understood as two-dimensional key-value stores
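The difference between row and column layouts can be shown with the same small table stored both ways. In the sketch below (a simplified illustration, assuming every record has the same keys), averaging temperature in the columnar layout touches a single contiguous list instead of every full record, which is why analytical queries over a few columns benefit from column storage.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record's values are stored together.
rows: List[Dict[str, Any]] = [
    {"ts": 1, "sensor": "s1", "temp": 20.0, "humidity": 40},
    {"ts": 2, "sensor": "s1", "temp": 21.0, "humidity": 41},
    {"ts": 3, "sensor": "s2", "temp": 22.0, "humidity": 39},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row records into one list per column (assumes identical keys)."""
    cols: Dict[str, List[Any]] = {}
    for rec in records:
        for key, value in rec.items():
            cols.setdefault(key, []).append(value)
    return cols


columns = to_columns(rows)
# Column-oriented aggregate: only the "temp" column is read.
avg_temp = sum(columns["temp"]) / len(columns["temp"])
```

Values within one column also share a type and often repeat or change slowly, which makes columnar data easier to compress.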

Apache Cassandra

  • Open source, distributed, wide-column database, initially developed at Facebook and now developed by the Apache Software Foundation
  • Designed to scale both writes and reads as more machines are added to the cluster
  • Uses CQL (Cassandra Query Language)
  • Integrates with Hadoop
  • Automatically replicates data to multiple nodes to provide fault tolerance
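Replication in Cassandra is driven by a token ring: each partition key is hashed to a token, the first node clockwise from that token owns it, and with a simple placement strategy the next nodes on the ring hold the remaining replicas. The sketch below is a simplified model, not Cassandra's implementation: it uses one token per node (real clusters use many virtual nodes) and MD5 instead of the default Murmur3 partitioner. The partition key `sensor-42:2024-01-01` is hypothetical and mirrors a common IoT pattern of bucketing each sensor's data by day, as in a CQL key like `PRIMARY KEY ((sensor_id, day), ts)`.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Stand-in hash; Cassandra's default partitioner is Murmur3, not MD5.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        # One token per node, sorted around the ring.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, rf: int) -> List[str]:
        """Owner is the first node whose token is >= the key's token (wrapping around);
        the next rf - 1 nodes clockwise hold the other replicas."""
        if not 1 <= rf <= len(self.entries):
            raise ValueError("replication factor must be between 1 and the number of nodes")
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
placement = ring.replicas("sensor-42:2024-01-01", rf=3)
```

With a replication factor of 3, the partition survives the loss of any two of its three replica nodes, and adding nodes spreads partitions over more machines.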

Apache Cassandra Demo

Azure Time Series Insights

  • Cloud-based service for ingesting, modelling, querying, and visualising time-series data
  • Columnar store for ingested data
  • Ingests data from Azure IoT Hub and Azure Event Hubs
  • Offers warm and cold storage tiers
  • Offers a query service for analytical querying of stored data
  • Offers a visualisation service

Azure Time Series Insights Demo

Quiz

Q&A + Contact

@p_grzesik

pj.grzesik@gmail.com
