Databases in IoT
Piotr Grzesik
Agenda
- Characteristics of IoT data
- Characteristics of IoT data processing
- Important features of IoT databases
- TimescaleDB
- InfluxDB
- MongoDB
- Cassandra
- Azure Time Series Insights
Characteristics of IoT data
- usually timestamped
- low number of relationships between data
- usually metrics or events
- temporal ordering of records (time-series)
- high volume
- high frequency
- often low-level, raw data
- usually numerical values
Characteristics of IoT data processing
- high number of insert operations, often in batches
- very rare updates of individual records
- queries often perform aggregations over time
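The write and query pattern above can be sketched with Python's built-in `sqlite3` module standing in for a real IoT database. The table name, column names, and sample readings are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical readings: (unix timestamp in seconds, sensor id, temperature)
readings = [
    (0, "s1", 20.0),
    (30, "s1", 22.0),
    (60, "s1", 24.0),
    (90, "s1", 26.0),
    (120, "s1", 30.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER, sensor_id TEXT, temperature REAL)")

# Insert the whole batch in one call instead of one statement per record.
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
conn.commit()

# Aggregate over one-minute windows: integer division groups timestamps
# into buckets that start at multiples of 60 seconds.
rows = conn.execute(
    """
    SELECT (ts / 60) * 60 AS bucket_start,
           AVG(temperature) AS avg_temp,
           COUNT(*) AS n
    FROM readings
    GROUP BY bucket_start
    ORDER BY bucket_start
    """
).fetchall()

for bucket_start, avg_temp, n in rows:
    print(bucket_start, avg_temp, n)
```

Records are only appended and never updated, and the query reads many rows to return a few aggregated ones, which is the typical shape of an IoT workload.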
Important features of IoT databases
- Support for very fast write operations (also handling bursts of data)
- Support for fast reads and analytical queries
- Support for retention policies
- Support for downsampling
- Support for compression
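Downsampling and retention can be illustrated in a few lines of plain Python. This is a minimal sketch of the two ideas, not how any particular database implements them; the function names and sample points are hypothetical.

```python
from statistics import mean


def downsample(points, window_s):
    """Average (timestamp, value) points into fixed windows of window_s seconds."""
    buckets = {}
    for ts, value in points:
        start = ts - ts % window_s  # start of the window this point falls into
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def apply_retention(points, now, max_age_s):
    """Keep only points that are at most max_age_s seconds older than now."""
    return [(ts, value) for ts, value in points if now - ts <= max_age_s]


# Hypothetical raw readings: (unix timestamp in seconds, value)
raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (130, 9.0)]

per_minute = downsample(raw, 60)                      # fewer, coarser points
recent = apply_retention(raw, now=130, max_age_s=60)  # old raw points dropped
```

A common combination is to keep raw data for a short retention period and the downsampled series for much longer.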
Time-series databases
- Database management systems that are optimized to handle timestamped or time-series data
- Data is stored in the form of measurements, events, or metrics, usually numerical
- Support for advanced queries over time windows
- Time-series databases are built in one of two ways: as a standalone database or as an extension to an existing database
Time-series databases
- InfluxDB (https://www.influxdata.com/)
- TimescaleDB (https://www.timescale.com/)
- OpenTSDB (http://opentsdb.net/)
- Riak TS (https://riak.com/)
- Amazon Timestream (https://aws.amazon.com/timestream/)
- Prometheus (https://prometheus.io/)
InfluxDB
- Open-source time-series database written in Go, developed and maintained by InfluxData, Inc.
- Queried with InfluxQL, a custom SQL-like query language, or with Flux
- Has support for aggregation functions over time-series data
- Part of the popular TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) stack
- Scalable and highly available thanks to clustering support (offered in the commercial editions rather than the open-source one)
InfluxDB Demo
TimescaleDB
- Open-source PostgreSQL extension for time-series data, written in C, developed and maintained by Timescale, Inc.
- Uses SQL and is compatible with "native" PostgreSQL
- Supports the same client libraries and CLI tools as PostgreSQL
- Adds support for aggregation functions over time-series data
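Because TimescaleDB speaks plain SQL, a typical session uses ordinary PostgreSQL statements plus a few extension functions such as `create_hypertable` and `time_bucket`. The sketch below keeps the SQL in Python strings; the `conditions` table, its columns, and the `run_all` helper are assumptions for illustration, and running it needs a PostgreSQL server with the extension installed.

```python
# Plain PostgreSQL table definition (hypothetical table and column names).
CREATE_TABLE = """
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT        NOT NULL,
    temperature DOUBLE PRECISION
);
"""

# create_hypertable turns the table into a hypertable partitioned on "time".
MAKE_HYPERTABLE = "SELECT create_hypertable('conditions', 'time');"

# time_bucket groups rows into fixed-width time windows for aggregation.
FIVE_MINUTE_AVERAGES = """
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
"""


def run_all(connection):
    """Run the statements on a DB-API 2.0 connection and return the averages."""
    cur = connection.cursor()
    cur.execute(CREATE_TABLE)
    cur.execute(MAKE_HYPERTABLE)
    cur.execute(FIVE_MINUTE_AVERAGES)
    rows = cur.fetchall()
    connection.commit()
    return rows
```

Any PostgreSQL driver should work here, since the extension does not change the wire protocol or client libraries.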
TimescaleDB Demo
Document-oriented databases
- Also referred to as document stores; used for managing semi-structured data, often in the form of JSON-like documents
- Schemaless: they do not require a predefined schema, which makes them well suited to storing dynamic, loosely structured data
- Often there is no need for object-relational mapping at the application level
- Advanced querying capabilities
Document-oriented databases
- MongoDB (https://www.mongodb.com/)
- Apache CouchDB (https://couchdb.apache.org/)
- Azure Cosmos DB (https://azure.microsoft.com/en-us/services/cosmos-db/)
- Elasticsearch (https://www.elastic.co/)
- RethinkDB (https://rethinkdb.com/)
- Couchbase (https://www.couchbase.com/)
MongoDB
- General-purpose, document-oriented database, developed by MongoDB, Inc.
- Licensed under Server Side Public License (SSPL)
- Stores JSON-like documents; queries are also expressed as JSON-like documents
- Support for geospatial queries
- Two ways to model relationships: references and embedded documents
- Horizontal scaling using sharding
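The two relationship styles and the document-shaped queries can be shown as plain Python dictionaries, which is how they would be passed to a driver such as PyMongo (for example to `collection.find(filter)`). All field names, identifiers, and coordinates below are made up for illustration.

```python
import json

# Embedded relationship: readings live inside the device document.
device_embedded = {
    "_id": "device-1",
    "location": {"type": "Point", "coordinates": [18.67, 50.29]},  # [lon, lat]
    "readings": [
        {"ts": "2024-01-01T00:00:00Z", "temperature": 21.5},
        {"ts": "2024-01-01T00:01:00Z", "temperature": 21.7},
    ],
}

# Reference relationship: each reading is its own document
# and points back to the device through its id.
reading_referenced = {
    "device_id": "device-1",
    "ts": "2024-01-01T00:00:00Z",
    "temperature": 21.5,
}

# A query is itself a document: readings of one device above 21.0 degrees.
hot_readings_filter = {"device_id": "device-1", "temperature": {"$gt": 21.0}}

# Geospatial query: devices within 500 metres of a point.
# This operator needs a 2dsphere index on the "location" field.
nearby_devices_filter = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [18.67, 50.29]},
            "$maxDistance": 500,
        }
    }
}

print(json.dumps(hot_readings_filter))
```

For high-frequency sensor data, referenced documents usually fit better than embedding, because an embedded array keeps growing with every new reading.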
MongoDB demo
Column-oriented databases
- Database management systems that store data tables by column instead of by row
- Often use SQL or an SQL-like language for querying
- Aimed at workloads that read a few columns across many records rather than whole records (rows)
- Very often used in analytical applications
- The term is often also applied to wide-column databases, which are better understood as two-dimensional key-value stores
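The difference between row and column layout can be seen with ordinary Python structures. This is a toy illustration with made-up data, not a model of any real storage engine.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"ts": 0, "sensor": "s1", "temperature": 20.0},
    {"ts": 60, "sensor": "s1", "temperature": 22.0},
    {"ts": 120, "sensor": "s2", "temperature": 27.0},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query over one column has to walk every whole record
# in the row layout...
avg_from_rows = sum(row["temperature"] for row in rows) / len(rows)

# ...but only touches a single contiguous list in the column layout.
temperatures = columns["temperature"]
avg_from_columns = sum(temperatures) / len(temperatures)
```

Values in one column also share a type and tend to be similar, which is one reason columnar stores typically compress well.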
Column-oriented databases
- Apache Cassandra (http://cassandra.apache.org/)
- Apache HBase (https://hbase.apache.org/)
- Amazon Redshift (https://aws.amazon.com/redshift/)
- Google BigTable (https://cloud.google.com/bigtable/)
- ClickHouse (https://clickhouse.yandex/)
- Scylla (https://www.scylladb.com/)
- Apache Druid (https://druid.apache.org/)
Apache Cassandra
- Open-source, distributed, wide-column database, initially developed at Facebook, now developed under the Apache Software Foundation
- Designed to scale both writes and reads as more machines are added to the cluster
- Uses CQL (Cassandra Query Language)
- Integrates with Hadoop
- Automatically replicates data to multiple nodes to provide fault tolerance
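A common way to model sensor data in CQL is to partition by sensor and day and to order rows inside a partition by time. The sketch below keeps the CQL in Python strings; the keyspace, table, and column names, the replication factor of 3, and the `partition_key` helper are assumptions for illustration.

```python
from datetime import datetime, timezone

# Each piece of data is replicated to 3 nodes (an assumed value).
CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS iot
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
"""

# (sensor_id, day) is the partition key: it decides which nodes hold the rows.
# ts is the clustering column: rows in a partition are sorted by it, newest first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS iot.readings (
    sensor_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


def partition_key(sensor_id, ts):
    """Return the (sensor_id, day) pair that selects the partition for a reading."""
    return sensor_id, ts.astimezone(timezone.utc).date().isoformat()


key = partition_key("s1", datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc))
print(key)
```

Bucketing by day keeps each partition bounded in size, while spreading different sensors and days across the cluster so that writes scale with the number of nodes.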
Apache Cassandra Demo
Azure Time Series Insights
- Cloud-based service for ingesting, modelling, querying, and visualising time-series data
- Columnar store for ingested data
- Ingests data from Azure IoT Hub and Azure Event Hubs
- Offers warm and cold storage tiers
- Offers a query service for analytical querying of stored data
- Offers a visualisation service
Azure Time Series Insights Demo
Quiz
Q&A + Contact
@p_grzesik
pj.grzesik@gmail.com