Databases in IoT
Piotr Grzesik
Agenda
- Characteristics of IoT data
 - Characteristics of IoT data processing
 - Important features of IoT databases
 - TimescaleDB
 - InfluxDB
 - MongoDB
 - Cassandra
 - Azure Time Series Insights
 
Characteristics of IoT data
- usually timestamped
 - low number of relationships between data
 - usually metrics or events
 - temporal ordering of records (time-series)
 - high volume
 - high frequency
 - often low-level, raw data
 - usually numerical values
 
Characteristics of IoT data processing
- high number of insert operations, often in batches
 - very rare updates of individual records
 - queries often perform aggregations over time
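The write and query pattern listed above can be sketched in plain Python. This is only an illustration: the `batch` and `average_per_window` helpers and the sample readings are hypothetical, and a real system would send each batch to a database instead of keeping it in memory.

```python
from collections import defaultdict
from statistics import mean


def batch(readings, size):
    """Yield consecutive slices of at most `size` readings (one insert per slice)."""
    for start in range(0, len(readings), size):
        yield readings[start:start + size]


def average_per_window(readings, window_seconds):
    """Group (timestamp, value) pairs into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical readings: (Unix timestamp in seconds, temperature)
readings = [(0, 20.0), (10, 22.0), (65, 21.0), (70, 23.0), (130, 19.0)]
batches = list(batch(readings, 2))            # three batches: 2 + 2 + 1 readings
averages = average_per_window(readings, 60)   # one average per 60-second window
```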
 
Important features for IoT databases
- Support for very fast write operations (also handling bursts of data)
 - Support for fast reads and analytical queries
 - Support for retention policies
 - Support for downsampling
 - Support for compression
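Two of these features can be illustrated with a small Python sketch. It assumes points are `(timestamp, value)` tuples and uses delta encoding as one simple compression technique; the function names are hypothetical, and real databases use more elaborate schemes than this.

```python
def apply_retention(points, now, max_age_seconds):
    """Keep only points whose timestamp is at most max_age_seconds older than now."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


def delta_encode(timestamps):
    """Store the first timestamp, then only the difference to the previous one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original timestamps by accumulating the deltas."""
    timestamps = []
    total = 0
    for delta in deltas:
        total += delta
        timestamps.append(total)
    return timestamps


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
recent = apply_retention(raw, now=120, max_age_seconds=60)

stamps = [1700000000, 1700000010, 1700000020, 1700000030]
encoded = delta_encode(stamps)   # regular sampling yields small, repetitive deltas
```

Regularly sampled sensor data produces long runs of identical deltas, which is why this kind of encoding compresses time-series data well.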
 
Time-series databases
- Database management systems that are optimized to handle timestamped or time-series data
 - Data is stored in the form of measurements, events, or metrics, usually numerical
 - Support for advanced queries over time windows
 - Time-series databases are built in one of two ways: as a standalone database or as an extension to an existing database
 
Time-series databases
- InfluxDB (https://www.influxdata.com/)
 - TimescaleDB (https://www.timescale.com/)
 - OpenTSDB (http://opentsdb.net/)
 - Riak TS (https://riak.com/)
 - Amazon Timestream (https://aws.amazon.com/timestream/)
 - Prometheus (https://prometheus.io/)
 
InfluxDB
- Open-source time-series database written in Go, developed and maintained by InfluxData
 - Queried with InfluxQL, a custom SQL-like query language, or with Flux, a functional scripting language
 - Has support for aggregation functions over time-series data
 - Part of the popular TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor)
 - Scalable and highly available thanks to clustering support (in InfluxDB 1.x, clustering is part of the commercial Enterprise edition)
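Points are typically written to InfluxDB as text in its line protocol: measurement, optional tags, fields, and a timestamp. The sketch below formats one point in that shape. The helper names and the sample measurement are hypothetical, and the escaping covers only the common cases, so treat it as an illustration and not as a full client.

```python
def _escape(text, chars):
    """Backslash-escape each character from `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):   # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build 'measurement,tag=value field=value timestamp' for a single point."""
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={format_field(value)}" for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "temperature",
    {"device": "sensor-1", "room": "lab 2"},
    {"value": 21.5, "battery": 87},
    1700000000000000000,
)
```

Tags are indexed and hold metadata such as the device identifier, while fields hold the measured values.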
 
InfluxDB Demo
TimescaleDB
- Open source, time-series PostgreSQL extension written in C, developed and maintained by Timescale, Inc.
 - Uses SQL and is compatible with "native" PostgreSQL
 - Supports the same client libraries and CLI tools as PostgreSQL
 - Adds support for aggregation functions over time-series data
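Because TimescaleDB is queried with ordinary SQL, a time-series workload can be sketched as a few statements. The Python below only builds the SQL text. The `conditions` table, its columns, and the helper function are hypothetical; `create_hypertable` and `time_bucket` are TimescaleDB functions. Running the statements would need a PostgreSQL driver and a TimescaleDB instance, which are not shown here.

```python
# Hypothetical schema for sensor readings.
CREATE_TABLE = """
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT NOT NULL,
    temperature DOUBLE PRECISION
);
"""

# Turn the plain table into a hypertable partitioned by the time column.
CREATE_HYPERTABLE = "SELECT create_hypertable('conditions', 'time');"

# Only fixed bucket widths are accepted, so no user input ends up in the SQL text.
ALLOWED_BUCKETS = {"1 minute", "5 minutes", "1 hour", "1 day"}


def average_temperature_query(bucket):
    """Return SQL that averages temperature per device and per time bucket."""
    if bucket not in ALLOWED_BUCKETS:
        raise ValueError(f"unsupported bucket width: {bucket!r}")
    return (
        f"SELECT time_bucket('{bucket}', time) AS bucket, device_id, "
        "avg(temperature) AS avg_temperature "
        "FROM conditions "
        "WHERE time > now() - INTERVAL '1 day' "
        "GROUP BY bucket, device_id "
        "ORDER BY bucket;"
    )


query = average_temperature_query("5 minutes")
```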
 
TimescaleDB Demo
Document-oriented databases
- Also referred to as document stores; used for managing semi-structured data, often in the form of JSON-like documents
 - Schemaless: no predefined schema is required, which suits dynamic, loosely structured data
 - Often no object-relational mapping is needed at the application level
 - Advanced querying capabilities
 
Document-oriented databases
- MongoDB (https://www.mongodb.com/)
 - Apache CouchDB (https://couchdb.apache.org/)
 - Azure Cosmos DB (https://azure.microsoft.com/en-us/services/cosmos-db/)
 - Elasticsearch (https://www.elastic.co/)
 - RethinkDB (https://rethinkdb.com/)
 - Couchbase (https://www.couchbase.com/)
 
MongoDB
- General-purpose, document-oriented database developed by MongoDB, Inc.
 - Licensed under Server Side Public License (SSPL)
 - Stores JSON-like documents; queries are also expressed as JSON-like documents
 - Support for geospatial queries
 - Two ways to model relationships: references and embedded documents
 - Horizontal scaling using sharding
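The two relationship styles and a JSON-style query can be sketched with plain Python dictionaries. The device and reading documents and the `readings_for` helper are hypothetical; `$near`, `$geometry`, and `$maxDistance` are MongoDB query operators. With a driver such as pymongo the filter would be passed to a `find` call on a collection with a geospatial index, which this sketch does not do.

```python
# Embedded: the readings live inside the device document.
device_embedded = {
    "_id": "sensor-1",
    "location": {"type": "Point", "coordinates": [18.67, 50.29]},  # [longitude, latitude]
    "readings": [
        {"ts": 1700000000, "temperature": 21.5},
        {"ts": 1700000060, "temperature": 21.7},
    ],
}

# Referenced: each reading is its own document pointing at the device by _id.
device = {"_id": "sensor-1", "location": device_embedded["location"]}
readings = [
    {"device_id": "sensor-1", "ts": 1700000000, "temperature": 21.5},
    {"device_id": "sensor-2", "ts": 1700000000, "temperature": 19.0},
    {"device_id": "sensor-1", "ts": 1700000060, "temperature": 21.7},
]

# Query filter in document form: devices within 1000 metres of a point.
near_filter = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [18.67, 50.29]},
            "$maxDistance": 1000,
        }
    }
}


def readings_for(device_id, reading_docs):
    """Resolve the reference in application code (a stand-in for a second query)."""
    return [doc for doc in reading_docs if doc["device_id"] == device_id]


own_readings = readings_for(device["_id"], readings)
```

Embedding keeps related data in a single read, while references avoid unbounded document growth when a device produces many readings.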
 
MongoDB Demo
Column-oriented databases
- Database management systems that store data tables by column instead of by row
 - Often use SQL or an SQL-like language for querying
 - Aimed at workloads that read a few columns across many records rather than whole records (rows)
 - Very often used in analytical applications
 - The term is often also applied to wide-column databases, which are better understood as two-dimensional key-value stores
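The difference between row and column layout can be shown with a minimal Python sketch. The sample rows and the `to_columns` helper are hypothetical, and the sketch assumes every row has the same keys.

```python
# Row-oriented layout: one record per reading.
rows = [
    {"device": "a", "ts": 1, "temperature": 20.0},
    {"device": "b", "ts": 1, "temperature": 22.0},
    {"device": "a", "ts": 2, "temperature": 21.0},
]


def to_columns(row_records):
    """Pivot a list of row dictionaries into one list of values per column."""
    columns = {}
    for row in row_records:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An analytical query touches only the column it needs, not the whole records.
average_temperature = sum(columns["temperature"]) / len(columns["temperature"])
```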
 
Column-oriented databases
- Apache Cassandra (http://cassandra.apache.org/)
 - Apache HBase (https://hbase.apache.org/)
 - Amazon Redshift (https://aws.amazon.com/redshift/)
 - Google BigTable (https://cloud.google.com/bigtable/)
 - ClickHouse (https://clickhouse.yandex/)
 - Scylla (https://www.scylladb.com/)
 - Apache Druid (https://druid.apache.org/)
 
Apache Cassandra
- Open-source, distributed, wide-column database, initially developed at Facebook and now developed under the Apache Software Foundation
 - Designed to scale both writes and reads as more machines are added to the cluster
 - Uses CQL (Cassandra Query Language)
 - Integrates with Hadoop
 - Automatically replicates data to multiple nodes to provide fault tolerance
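The idea of replicating each partition to several nodes can be sketched with a simplified token ring. This is not Cassandra's implementation: the sketch hashes with MD5 and gives each node a single token, whereas Cassandra defaults to a Murmur3-based partitioner and virtual nodes. The `Ring` class, node names, and key are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a position on the ring (MD5 is used here for simplicity)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified ring: one token per node, replicas on the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds the number of nodes")
        self.replication_factor = replication_factor
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [node_token for node_token, _ in self.entries]

    def replicas(self, partition_key):
        """Return the nodes that would hold copies of this partition."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [
            self.entries[(start + offset) % len(self.entries)][1]
            for offset in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("sensor-1:2024-01-01")
```

With a replication factor of 3 on four nodes, any single node can fail and every partition still has two live copies.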
 
Apache Cassandra Demo
Azure Time Series Insights
- Cloud-based service for ingesting, modelling, querying, and visualising time-series data
 - Columnar store for ingested data
 - Ingests data from Azure IoT Hub and Azure Event Hubs
 - Offers warm and cold storage tiers
 - Offers a query service for analytical querying of stored data
 - Offers a visualisation service
 
Azure Time Series Insights Demo
Quiz
Q&A + Contact
@p_grzesik
pj.grzesik@gmail.com