Databases in IoT

Piotr Grzesik

Agenda

Characteristics of IoT data
Characteristics of IoT data processing
Important features of IoT databases
TimescaleDB
InfluxDB
MongoDB
Cassandra
Azure Time Series Insights

Characteristics of IoT data

usually timestamped
low number of relationships between data
usually metrics or events
temporal ordering of records (time-series)
high volume
high frequency
often low-level, raw data
usually numerical values

Characteristics of IoT data processing

high number of insert operations, often in batches
very rare updates of individual records
queries often perform aggregations over time
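The access pattern above (batched inserts, then aggregation over time) can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for an IoT database, and the table name, columns, and the function `ingest_and_aggregate` are assumptions made for this example.

```python
import sqlite3

def ingest_and_aggregate(readings, bucket_seconds=60):
    """Insert (ts, sensor, value) tuples in one batch, then average per time bucket."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts INTEGER, sensor TEXT, value REAL)")
    # Batched insert: a single executemany call inside one transaction.
    with conn:
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
    # Aggregation over time: integer division maps each timestamp
    # to the start of the bucket it falls into.
    rows = conn.execute(
        "SELECT (ts / ?) * ? AS bucket, sensor, AVG(value) "
        "FROM readings GROUP BY bucket, sensor ORDER BY bucket, sensor",
        (bucket_seconds, bucket_seconds),
    ).fetchall()
    conn.close()
    return rows

batch = [(0, "a", 1.0), (30, "a", 3.0), (60, "a", 5.0)]
per_minute = ingest_and_aggregate(batch)
```

Note that the sketch never updates a row: records are appended once and only read back in aggregate.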

Important features of IoT databases

Support for very fast write operations (also handling bursts of data)
Support for fast reads and analytical queries
Support for retention policies
Support for downsampling
Support for compression
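Downsampling and retention from the list above can be illustrated in a few lines of plain Python. The functions `downsample` and `apply_retention` are hypothetical helpers written for this sketch; real databases implement these features internally and configure them declaratively.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, bucket_seconds):
    """Replace raw (timestamp, value) points with one mean value per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        # ts - ts % bucket_seconds is the start of the bucket containing ts.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

def apply_retention(points, now, max_age_seconds):
    """Keep only points that are at most max_age_seconds old."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]

raw = [(0, 1.0), (10, 3.0), (60, 5.0)]
coarse = downsample(raw, 60)
recent = apply_retention(raw, now=100, max_age_seconds=50)
```

A common setup combines both: raw data is kept for a short period, while downsampled data is kept much longer.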

Time-series databases

Database management systems that are optimized to handle timestamped or time-series data
Data is stored in the form of measurements, events, or metrics, usually numerical
Support for advanced queries over time windows
Time-series databases are built in one of two ways: as a standalone database or as an extension to an existing database

Time-series databases

InfluxDB (https://www.influxdata.com/)
TimescaleDB (https://www.timescale.com/)
OpenTSDB (http://opentsdb.net/)
Riak TS (https://riak.com/)
Amazon Timestream (https://aws.amazon.com/timestream/)
Prometheus (https://prometheus.io/)

InfluxDB

Open source time-series database written in Go, developed and maintained by InfluxData, Inc.
Queried with InfluxQL, a custom SQL-like query language, or with Flux
Has support for aggregation functions over time-series data
Part of the popular TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor)
Scalable and highly available through clustering (offered in the commercial editions rather than the open source build)
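Data is usually written to InfluxDB in its text-based line protocol: a measurement name, optional tags, one or more fields, and a timestamp. The slides do not cover this format, so the following is a minimal sketch of building one such line; the helper names are hypothetical and only basic escaping is handled.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def _format_field(value):
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is written as a double-quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += f",{_escape(key, ',= ')}={_escape(str(value), ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}" for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"

line = to_line_protocol(
    "temperature",
    {"room": "lab 1"},
    {"value": 21.5, "ok": True, "count": 3},
    1700000000000000000,
)
```

Tags are indexed metadata (which device, which room), while fields hold the measured values, so the split mirrors the "metrics plus context" shape of IoT data.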

InfluxDB Demo

TimescaleDB

Open source, time-series PostgreSQL extension written in C, developed and maintained by Timescale, Inc.
Uses SQL and is compatible with "native" PostgreSQL
Supports the same client libraries and CLI tools as PostgreSQL
Adds support for aggregation functions over time-series data
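Because TimescaleDB is used through ordinary SQL, a typical session looks like regular PostgreSQL plus a few extra functions. The sketch below keeps the SQL in Python strings; `create_hypertable` and `time_bucket` are TimescaleDB functions, while the `conditions` table, its columns, and `setup_and_query` are assumptions made for this example. The connection is expected to be any DB-API 2.0 connection, for instance one opened with a PostgreSQL driver.

```python
# Plain PostgreSQL DDL; nothing TimescaleDB-specific yet.
CREATE_TABLE = """
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    device_id   TEXT             NOT NULL,
    temperature DOUBLE PRECISION
);
"""

# create_hypertable() converts the table into a hypertable partitioned by time.
CREATE_HYPERTABLE = "SELECT create_hypertable('conditions', 'time');"

# time_bucket() groups rows into fixed-width time intervals.
AVG_PER_15_MIN = """
SELECT time_bucket('15 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temperature
FROM conditions
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
"""

def setup_and_query(conn):
    """Create the hypertable, then return 15-minute averages per device."""
    cur = conn.cursor()
    cur.execute(CREATE_TABLE)
    cur.execute(CREATE_HYPERTABLE)
    conn.commit()
    cur.execute(AVG_PER_15_MIN)
    return cur.fetchall()
```

Everything except the two function calls is standard SQL, which is why existing PostgreSQL clients and tools keep working.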

TimescaleDB Demo

Document-oriented databases

Also referred to as document stores, used for managing semi-structured data, often in the form of JSON-like documents
Schemaless: no predefined schema is required, which makes them well suited to storing dynamic, semi-structured data
Often there is no need for object-relational mapping at the application level
Advanced querying capabilities

Document-oriented databases

MongoDB (https://www.mongodb.com/)
Apache CouchDB (https://couchdb.apache.org/)
Azure Cosmos DB (https://azure.microsoft.com/en-us/services/cosmos-db/)
Elasticsearch (https://www.elastic.co/)
RethinkDB (https://rethinkdb.com/)
Couchbase (https://www.couchbase.com/)

MongoDB

General-purpose, document-oriented database developed by MongoDB, Inc.
Licensed under the Server Side Public License (SSPL)
Stores JSON-like documents; queries are also expressed as JSON-like documents
Support for geospatial queries
Two ways to model relationships: references and embedded documents
Horizontal scaling using sharding
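The two relationship styles and a JSON-style geospatial query can be shown with plain Python dictionaries, which is also how queries are typically written when using a Python driver. The device and reading documents below are invented for illustration; `$near`, `$geometry`, and `$maxDistance` are MongoDB query operators, and `$near` needs a geospatial index on the queried field.

```python
# Embedded relationship: readings are stored inside the device document.
device_embedded = {
    "_id": "sensor-1",
    # GeoJSON point: coordinates are [longitude, latitude].
    "location": {"type": "Point", "coordinates": [19.94, 50.06]},
    "readings": [
        {"ts": "2024-01-01T00:00:00Z", "temperature": 21.5},
        {"ts": "2024-01-01T00:01:00Z", "temperature": 21.7},
    ],
}

# Reference relationship: each reading is its own document and
# points back to the device through device_id.
device = {
    "_id": "sensor-1",
    "location": {"type": "Point", "coordinates": [19.94, 50.06]},
}
reading = {"device_id": "sensor-1", "ts": "2024-01-01T00:00:00Z", "temperature": 21.5}

def near_filter(longitude, latitude, max_distance_m):
    """Build a query document matching devices within max_distance_m metres."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,
            }
        }
    }

query = near_filter(19.94, 50.06, 500)
```

For high-frequency sensor data the reference style (or bucketing a bounded number of readings per document) is usually the safer choice, since an embedded array that grows without limit eventually hits the document size limit.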

MongoDB demo

Column-oriented databases

Database management systems that store data tables by column instead of by row
Often use SQL or an SQL-like language for querying
Aimed at workloads that read a few columns across many records rather than whole records (rows)
Very often used in analytical applications
The term is often also applied to wide-column databases, which are better understood as two-dimensional key-value stores
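The difference between row and column layout can be sketched in plain Python. This is a conceptual illustration only: the sample records and the helpers `to_columns` and `column_mean` are made up, and real column stores add compression and on-disk formats on top of this idea.

```python
# Row layout: each record keeps all of its values together.
rows = [
    {"ts": 0, "sensor": "a", "temperature": 21.0, "humidity": 40.0},
    {"ts": 60, "sensor": "a", "temperature": 22.0, "humidity": 41.0},
    {"ts": 120, "sensor": "a", "temperature": 23.0, "humidity": 42.0},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def column_mean(columns, name):
    """Aggregate one column without touching any of the others."""
    values = columns[name]
    return sum(values) / len(values)

columns = to_columns(rows)
avg_temperature = column_mean(columns, "temperature")
```

An analytical query such as "average temperature" only has to scan one list in the column layout, whereas the row layout forces it to read every full record.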

Column-oriented databases

Apache Cassandra (http://cassandra.apache.org/)
Apache HBase (https://hbase.apache.org/)
Amazon Redshift (https://aws.amazon.com/redshift/)
Google BigTable (https://cloud.google.com/bigtable/)
ClickHouse (https://clickhouse.yandex/)
Scylla (https://www.scylladb.com/)
Apache Druid (https://druid.apache.org/)

Apache Cassandra

Open source, distributed, wide-column database, initially developed at Facebook, currently developed under the Apache Software Foundation
Designed to scale both writes and reads as more machines are added to the cluster
Uses CQL (Cassandra Query Language)
Integrates with Hadoop
Automatically replicates data to multiple nodes to provide fault tolerance
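Replication to multiple nodes can be pictured as a ring: each node owns a position on the ring, a partition key is hashed onto it, and the next few nodes clockwise hold the copies. The sketch below is a deliberately simplified model and not Cassandra's actual implementation; the MD5-based hash, the node names, and `replicas_for` are assumptions chosen for brevity.

```python
import hashlib
from bisect import bisect_right

def _token(key):
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def replicas_for(partition_key, nodes, replication_factor=3):
    """Return the nodes that would store copies of the given partition key."""
    if not nodes:
        raise ValueError("the cluster needs at least one node")
    ring = sorted((_token(node), node) for node in nodes)
    tokens = [token for token, _ in ring]
    # First node clockwise from the key's position, wrapping around the ring.
    start = bisect_right(tokens, _token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

cluster = ["node-1", "node-2", "node-3", "node-4", "node-5"]
owners = replicas_for("sensor-42", cluster)
```

Because placement depends only on the hash, any node can work out where a key lives, and adding machines spreads both reads and writes over more owners.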

Apache Cassandra Demo

Azure Time Series Insights

Cloud-based service for ingesting, modelling, querying, and visualising time-series data
Columnar store for ingested data
Ingests data from Azure IoT Hub and Azure Event Hubs
Offers warm and cold storage tiers
Offers a query service for analytical querying of stored data
Offers a visualisation service

Azure Time Series Insights Demo

Quiz

Q&A + Contact

@p_grzesik

pj.grzesik@gmail.com
