Databases in IoT
Piotr Grzesik
Agenda
Characteristics of IoT data
Characteristics of IoT data processing
Important features of IoT databases
TimescaleDB
InfluxDB
MongoDB
Cassandra
Azure Time Series Insights
Characteristics of IoT data
usually timestamped
low number of relationships between data
usually metrics or events
temporal ordering of records (time-series)
high volume
high frequency
often low-level, raw data
usually numerical values
Characteristics of IoT data processing
high number of insert operations, often in batches
very rare updates of individual records
queries often perform aggregations over time
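The write and read pattern above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the `readings` table, its columns, and the sample values are hypothetical, and SQLite stands in for whichever database is actually used.

```python
import sqlite3

def build_demo_db(readings):
    """Create an in-memory table and load all readings in one batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (ts INTEGER, device TEXT, temperature REAL)"
    )
    # One executemany call inside a single transaction stands in for a batched write.
    with conn:
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
    return conn

def average_per_minute(conn):
    """Aggregate over time: mean temperature per device per 60-second bucket."""
    query = """
        SELECT (ts / 60) * 60 AS bucket, device, AVG(temperature)
        FROM readings
        GROUP BY bucket, device
        ORDER BY bucket, device
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data: (unix timestamp in seconds, device id, temperature)
SAMPLE = [
    (0, "dev-1", 20.0),
    (30, "dev-1", 22.0),
    (60, "dev-1", 24.0),
    (90, "dev-1", 26.0),
]

if __name__ == "__main__":
    for row in average_per_minute(build_demo_db(SAMPLE)):
        print(row)
```

Rows are only ever appended, never updated, and the query reads them back as per-minute averages rather than as individual records.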
Important features for IoT databases
Support for very fast write operations (also handling bursts of data)
Support for fast reads and analytical queries
Support for retention policies
Support for downsampling
Support for compression
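A minimal pure-Python sketch of what downsampling, retention, and compression mean for a series of (timestamp, value) points. The function names and sample data are hypothetical, and delta encoding is just one simple compression idea; real databases use more elaborate schemes.

```python
from collections import defaultdict
from itertools import accumulate

def downsample(points, bucket_seconds):
    """Replace raw (timestamp, value) points with one mean value per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

def apply_retention(points, now, max_age_seconds):
    """Drop points older than the retention window."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]

def delta_encode(timestamps):
    """Store the first timestamp, then only the gaps between neighbours."""
    return timestamps[:1] + [b - a for a, b in zip(timestamps, timestamps[1:])]

def delta_decode(deltas):
    """Rebuild the original timestamps with a running sum."""
    return list(accumulate(deltas))

raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (130, 9.0)]
print(downsample(raw, 60))                                # [(0, 2.0), (60, 6.0), (120, 9.0)]
print(apply_retention(raw, now=130, max_age_seconds=60))  # [(70, 7.0), (130, 9.0)]
print(delta_encode([ts for ts, _ in raw]))                # [0, 10, 50, 10, 60]
```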
Time-series databases
Database management systems that are optimized to handle timestamped or time-series data
Data is stored in form of measurements, events, or metrics, usually numerical
Support for advanced queries over time windows
Time-series databases are built in one of two ways: as a standalone database or as an extension to an existing database
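One example of a query over a time window is a trailing moving average. The sketch below computes it in plain Python, assuming the points arrive sorted by timestamp; `moving_average` is a hypothetical helper, not the API of any particular database.

```python
from collections import deque

def moving_average(points, window_seconds):
    """For each point, average all values within the trailing time window."""
    window = deque()
    total = 0.0
    result = []
    for ts, value in points:  # points must be sorted by timestamp
        window.append((ts, value))
        total += value
        # Evict points outside the half-open window (ts - window_seconds, ts].
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        result.append((ts, total / len(window)))
    return result

points = [(0, 10.0), (10, 20.0), (20, 30.0), (30, 40.0)]
print(moving_average(points, window_seconds=20))
# [(0, 10.0), (10, 15.0), (20, 25.0), (30, 35.0)]
```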
Time-series databases
InfluxDB (https://www.influxdata.com/)
TimescaleDB (https://www.timescale.com/)
OpenTSDB (http://opentsdb.net/)
Riak TS (https://riak.com/)
Amazon Timestream (https://aws.amazon.com/timestream/)
Prometheus (https://prometheus.io/)
InfluxDB
Open source, time-series database written in Go, developed and maintained by InfluxData, Inc.
Queried with InfluxQL, a custom SQL-like query language, or with Flux
Has support for aggregation functions over time-series data
Part of the popular TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) stack
Scalable and highly available thanks to support for clustering (offered in the commercial editions)
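The slides show neither a write nor a query, so the following is only a hedged sketch. It assumes InfluxDB's text-based line protocol for writes and handles just the basic escaping rules; the measurement, tag, and field names are hypothetical. The InfluxQL string shows an aggregation over 5-minute windows and is not executed here.

```python
def _escape(text, chars):
    """Backslash-escape each character listed in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(value), ",= ")
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integers carry an "i" suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:
            # Strings are double-quoted; escape backslashes first, then quotes.
            rendered = '"' + _escape(str(value), '\\"') + '"'
        parts.append(_escape(key, ",= ") + "=" + rendered)
    return f"{head} {','.join(parts)} {timestamp_ns}"

# Hypothetical InfluxQL aggregation: mean temperature per device in 5-minute windows.
MEAN_TEMPERATURE_QUERY = (
    'SELECT MEAN("temperature") FROM "weather" '
    'WHERE time > now() - 1h GROUP BY time(5m), "device"'
)

print(to_line_protocol(
    "weather",
    {"device": "dev-1"},
    {"temperature": 21.5, "humidity": 40},
    1700000000000000000,
))
# weather,device=dev-1 temperature=21.5,humidity=40i 1700000000000000000
```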
InfluxDB Demo
TimescaleDB
Open source, time-series PostgreSQL extension written in C, developed and maintained by Timescale, Inc.
Uses SQL and is compatible with "native" PostgreSQL
Supports the same client libraries and CLI tools as PostgreSQL
Adds support for aggregation functions over time-series data
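Because TimescaleDB is queried with plain SQL, a sketch can stay close to ordinary PostgreSQL code. The `conditions` table and its columns are hypothetical; `create_hypertable` and `time_bucket` are functions provided by the extension. `setup_and_query` assumes a DB-API connection (for example from a PostgreSQL driver) to a database with the extension installed, and nothing here runs against a real server.

```python
CREATE_TABLE = """
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device      TEXT        NOT NULL,
    temperature DOUBLE PRECISION
);
"""

# create_hypertable turns the plain table into a time-partitioned hypertable.
CREATE_HYPERTABLE = "SELECT create_hypertable('conditions', 'time');"

# time_bucket groups rows into fixed-width time windows for aggregation.
AVG_PER_5_MIN = """
SELECT time_bucket('5 minutes', time) AS bucket, device, AVG(temperature)
FROM conditions
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket, device
ORDER BY bucket;
"""

def setup_and_query(conn):
    """Run the statements on a DB-API connection and return the aggregated rows."""
    cur = conn.cursor()
    cur.execute(CREATE_TABLE)
    cur.execute(CREATE_HYPERTABLE)
    cur.execute(AVG_PER_5_MIN)
    return cur.fetchall()
```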
TimescaleDB Demo
Document-oriented databases
Also referred to as document stores; used for managing semi-structured data, often in the form of JSON-like documents
Schemaless: they do not require a predefined schema, which makes them well suited to storing dynamic, loosely structured data
Often there is no need for object-relational mapping at the application level
Advanced querying capabilities
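A toy in-memory sketch of the document model: documents in one collection carry different fields, and a query matches on field values, including nested ones. The `find` and `get_path` helpers and the sample documents are hypothetical and do not correspond to any specific database API.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as 'location.room' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def find(collection, criteria):
    """Return documents whose fields equal every value in `criteria`."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == expected for path, expected in criteria.items())
    ]

# No shared schema: each document has its own set of fields.
devices = [
    {"id": "dev-1", "type": "thermometer", "location": {"room": "lab"}},
    {"id": "dev-2", "type": "camera", "resolution": "1080p"},
    {"id": "dev-3", "type": "thermometer", "location": {"room": "hall"}, "battery": 0.8},
]

matches = find(devices, {"type": "thermometer", "location.room": "lab"})
print([doc["id"] for doc in matches])  # ['dev-1']
```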
Document-oriented databases
MongoDB (https://www.mongodb.com/)
Apache CouchDB (https://couchdb.apache.org/)
Azure Cosmos DB (https://azure.microsoft.com/en-us/services/cosmos-db/)
Elasticsearch (https://www.elastic.co/)
RethinkDB (https://rethinkdb.com/)
Couchbase (https://www.couchbase.com/)
MongoDB
General-purpose, document-oriented database, developed by MongoDB, Inc.
Licensed under the Server Side Public License (SSPL)
Stores JSON documents; queries are also expressed in the form of JSON
Support for geospatial queries
Two types of relationships: reference and embedded
Horizontal scaling using sharding
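The two relationship types can be illustrated with plain Python dictionaries. All document shapes and field names below are hypothetical. The filter at the end uses MongoDB's `$gt` comparison operator and is shown as data only; nothing here talks to a server.

```python
# Embedded: the readings live inside the device document.
device_embedded = {
    "_id": "dev-1",
    "type": "thermometer",
    "readings": [
        {"ts": 0, "temperature": 20.0},
        {"ts": 60, "temperature": 22.0},
    ],
}

# Reference: each reading is its own document that points at the device by id.
devices = {"dev-1": {"_id": "dev-1", "type": "thermometer"}}
readings = [
    {"_id": 1, "device_id": "dev-1", "ts": 0, "temperature": 20.0},
    {"_id": 2, "device_id": "dev-1", "ts": 60, "temperature": 22.0},
]

def readings_with_device(readings, devices):
    """Resolve the reference by hand, the way an application-level join would."""
    return [{**r, "device": devices[r["device_id"]]} for r in readings]

# A filter document in the JSON query style: readings warmer than 21 degrees.
hot_readings_filter = {"temperature": {"$gt": 21.0}}

print(readings_with_device(readings, devices)[0]["device"]["type"])  # thermometer
```

Embedding keeps everything in one read, while references avoid unbounded growth of a single document; for high-volume sensor readings this trade-off matters.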
MongoDB demo
Column-oriented databases
Database management systems that store data tables by column instead of by row
Often use SQL or an SQL-like language for querying
Aimed at workloads that consider columns (specific values) more than whole records (rows)
Very often used in analytical applications
Wide-column databases are often grouped under this term as well, although they are better interpreted as two-dimensional key-value stores
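A small sketch contrasting the two layouts in plain Python. The sample rows and the `to_columns` helper are hypothetical; the point is only that an aggregate over one column never touches the others, and that a wide-column store looks like a map of maps.

```python
rows = [
    {"ts": 0, "device": "dev-1", "temperature": 20.0},
    {"ts": 60, "device": "dev-1", "temperature": 22.0},
    {"ts": 120, "device": "dev-2", "temperature": 27.0},
]

def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)
# An aggregate over one column reads only that column's values.
mean_temperature = sum(columns["temperature"]) / len(columns["temperature"])
print(mean_temperature)  # 23.0

# Wide-column view: row key -> column name -> value; rows may have different columns.
wide = {
    "dev-1": {"temperature@0": 20.0, "temperature@60": 22.0},
    "dev-2": {"temperature@120": 27.0, "firmware": "1.2"},
}
print(wide["dev-1"]["temperature@60"])  # 22.0
```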
Column-oriented databases
Apache Cassandra (http://cassandra.apache.org/)
Apache HBase (https://hbase.apache.org/)
Amazon Redshift (https://aws.amazon.com/redshift/)
Google BigTable (https://cloud.google.com/bigtable/)
ClickHouse (https://clickhouse.yandex/)
Scylla (https://www.scylladb.com/)
Apache Druid (https://druid.apache.org/)
Apache Cassandra
Open source, distributed, wide-column database, initially developed at Facebook, currently developed by the Apache Software Foundation
Designed to scale both writes and reads as more machines are added to the cluster
Uses CQL (Cassandra Query Language)
Integrates with Hadoop
Automatically replicates data to multiple nodes to provide fault tolerance
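The idea of spreading data across nodes and replicating it can be sketched with a simplified hash ring. This is an illustration only, not Cassandra's actual implementation: the `Ring` class and node names are hypothetical, each node gets a single token, and MD5 is used purely to obtain a deterministic hash.

```python
import bisect
import hashlib

def token(key):
    """Map a string to a position on the ring (MD5 used only for determinism)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor):
        # Node names are assumed unique, so each ring position is a distinct node.
        self.replication_factor = min(replication_factor, len(nodes))
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Walk clockwise from the key's token and take the next N nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("dev-1"))  # three distinct nodes that hold copies of this key
```

With a replication factor of 3, losing any single node still leaves two copies of every partition, and adding a node to the ring takes over part of the key range, which is how both reads and writes can scale out.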
Apache Cassandra Demo
Azure Time Series Insights
Cloud-based service that lets you ingest, model, query, and visualise time-series data
Columnar store for ingested data
Ingests data from Azure IoT Hub and Azure Event Hubs
Offers warm and cold storage tiers
Offers a query service for analytical querying of stored data
Offers a visualisation service
Azure Time Series Insights Demo
Quiz
Q&A + Contact
@p_grzesik
pj.grzesik@gmail.com