Piotr Grzesik
dr hab. inż. Dariusz Mrozek, prof. PŚ
Edge computing is a computing paradigm that brings data processing and storage closer to the place where they are needed. It reduces the volume of data that needs to be sent over the Internet, shortens the reaction time to changes in the state of the system, and improves resilience by preventing data loss where the Internet connection is unreliable or unavailable most of the time.
Jetson Nano
Source: https://www.nvidia.com/
Raspberry Pi 4
Source: https://www.raspberrypi.org/
Beaglebone Black
Source: https://beagleboard.org/
Coral Dev Board
Source: https://coral.ai/
Growth trend of various types of databases in the last 24 months according to DB-Engines.com
In their research, Bader et al. focused on open source time-series databases, examined 83 different solutions, and presented a comparison of twelve selected databases, including InfluxDB, PostgreSQL, and OpenTSDB. All selected solutions were compared based on their scalability and functionality, as well as licensing and support.
Wlodarczyk, in his article, provides an overview and comparison of four offerings: Chukwa, OpenTSDB, TempoDB, and Squwk. The analysis focused on feature differences between the selected technologies, without any performance benchmarks. The author identified OpenTSDB as the most popular choice for time-series storage.
Pungila et al. compared databases for use in a system that stores large volumes of sensor data from smart meters. During the research, they compared three relational databases (SQLite3, MySQL, PostgreSQL), one time-series database (IBM Informix with the DataBlade module), and three NoSQL databases (MonetDB, Hypertable, and Oracle BerkeleyDB).
Fadhel et al. presented research concerning the evaluation of databases for a low-cost water quality sensing system. The authors identified InfluxDB as the most suitable solution, listing the ease of installation and maintenance, support for multiple interface formats, and an HTTP GUI as the deciding factors. In the second part of the research, they conducted performance experiments and determined that InfluxDB can handle the load from 450 sensors.
6LoWPAN-based sensor network, collecting measurements such as air quality and weather condition metrics
Edge device that serves both as a 6LoWPAN gateway and as a database and analytical engine
Cloud-based service, long-term storage of aggregated metrics
During tests, a sensor readings generator was used so that tests could be run multiple times in a short span of time
Data model used for the performance experiments
Each data point sent by a sensor consists of air quality metrics: the NO2 concentration and the PM2.5 and PM10 particulate matter metrics. It also carries information about weather conditions: ambient temperature, pressure, and humidity. Each reading is timestamped and tagged with the location of the sensor and the unique sensor identifier.
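The data model described above can be sketched as a small Python record type together with a simple readings generator. This is only an illustration: the field names, the value ranges, and the `generate_readings` helper are assumptions and are not taken from the experiments themselves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import random


@dataclass(frozen=True)
class Reading:
    """One data point; field names are hypothetical."""
    timestamp: datetime   # time of the measurement
    sensor_id: str        # tag: unique sensor identifier
    location: str         # tag: location of the sensor
    no2: float            # air quality: NO2
    pm25: float           # air quality: PM2.5
    pm10: float           # air quality: PM10
    temperature: float    # weather: ambient temperature
    pressure: float       # weather: pressure
    humidity: float       # weather: humidity


def generate_readings(sensor_id, location, start, count, interval_s=15, seed=0):
    """Yield `count` synthetic readings spaced `interval_s` seconds apart.

    The value ranges below are arbitrary placeholders, not measured data.
    """
    rng = random.Random(seed)
    for i in range(count):
        yield Reading(
            timestamp=start + timedelta(seconds=i * interval_s),
            sensor_id=sensor_id,
            location=location,
            no2=rng.uniform(0.0, 100.0),
            pm25=rng.uniform(0.0, 50.0),
            pm10=rng.uniform(0.0, 80.0),
            temperature=rng.uniform(-10.0, 35.0),
            pressure=rng.uniform(980.0, 1040.0),
            humidity=rng.uniform(20.0, 100.0),
        )


if __name__ == "__main__":
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    for reading in generate_readings("sensor-1", "site-a", start, count=3):
        print(reading)
```

Keeping the sensor identifier and location as separate tag-like fields mirrors the description above, where they are used for grouping and filtering rather than as measured values.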
Versions of databases and client libraries used during experiments
Data from 10 simulated sensors was used, where each sensor sends a reading every 15 seconds over 24 hours, which results in 28800 data points
Insertion of data points one by one and in batches of 10 points
Insertion simulations were run 50 times (except for SQLite, where simulations were run 20 times due to the relatively long simulation time)
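The two insertion modes listed above can be illustrated with a minimal sketch that uses Python's standard `sqlite3` module. The table layout, the synthetic rows, and the helper names (`make_rows`, `insert_one_by_one`, `insert_batched`, `points_per_second`) are assumptions made for this example; it is not the benchmark code used in the experiments, and an in-memory database will not reproduce the measured numbers.

```python
import sqlite3
import time

# Hypothetical table layout following the data model described earlier.
SCHEMA = """
CREATE TABLE readings (
    ts INTEGER, sensor_id TEXT, location TEXT,
    no2 REAL, pm25 REAL, pm10 REAL,
    temperature REAL, pressure REAL, humidity REAL
)
"""
INSERT = "INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"


def make_rows(n_sensors, n_per_sensor, interval_s=15):
    """Build synthetic rows; the constant values are placeholders."""
    rows = []
    for step in range(n_per_sensor):
        for s in range(n_sensors):
            rows.append((step * interval_s, f"sensor-{s}", f"loc-{s % 2}",
                         20.0 + s, 10.0, 15.0, 21.5, 1013.0, 40.0))
    return rows


def insert_one_by_one(conn, rows):
    """Insert and commit every data point separately."""
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()


def insert_batched(conn, rows, batch_size=10):
    """Insert and commit data points in batches of `batch_size`."""
    for i in range(0, len(rows), batch_size):
        conn.executemany(INSERT, rows[i:i + batch_size])
        conn.commit()


def points_per_second(insert_fn, rows):
    """Run one insertion simulation and return (stored rows, throughput)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    started = time.perf_counter()
    insert_fn(conn, rows)
    elapsed = time.perf_counter() - started
    stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    conn.close()
    return stored, stored / elapsed


if __name__ == "__main__":
    rows = make_rows(n_sensors=10, n_per_sensor=100)
    for fn in (insert_one_by_one, insert_batched):
        stored, rate = points_per_second(fn, rows)
        print(f"{fn.__name__}: {stored} points, {rate:.0f} points/s")
```

Repeating `points_per_second` several times and averaging the results would correspond to the repeated simulation runs described above.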
Query for average temperature in the chosen period, grouped by location
Query for minimum and maximum values of NO2, PM2.5, and PM10 in the selected period, grouped by location
Query that counts data points grouped by sensor ID in the selected period, for which NO2 was larger than a selected value and the location was equal to a specific one
Each query was executed 5000 times
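The three query types listed above could look roughly like the following when written in SQL and run against SQLite from Python. The schema, column names, sample rows, and thresholds are assumptions made for illustration; the exact queries used in the experiments differ per database and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (ts INTEGER, sensor_id TEXT, location TEXT, "
    "no2 REAL, pm25 REAL, pm10 REAL, temperature REAL)"
)
# Small hand-made sample; the last row lies outside the selected period.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0,  "s1", "north", 10.0, 5.0,  8.0,  20.0),
        (15, "s1", "north", 30.0, 7.0, 12.0,  22.0),
        (0,  "s2", "south", 25.0, 6.0,  9.0,  18.0),
        (15, "s2", "south", 35.0, 4.0, 11.0,  19.0),
        (30, "s2", "south", 45.0, 9.0, 14.0, 100.0),
    ],
)

PERIOD = (0, 15)  # selected period as inclusive timestamp bounds

# Query 1: average temperature in the period, grouped by location.
avg_temp = conn.execute(
    "SELECT location, AVG(temperature) FROM readings "
    "WHERE ts BETWEEN ? AND ? GROUP BY location ORDER BY location",
    PERIOD,
).fetchall()

# Query 2: minimum and maximum NO2, PM2.5 and PM10, grouped by location.
extremes = conn.execute(
    "SELECT location, MIN(no2), MAX(no2), MIN(pm25), MAX(pm25), "
    "MIN(pm10), MAX(pm10) FROM readings "
    "WHERE ts BETWEEN ? AND ? GROUP BY location ORDER BY location",
    PERIOD,
).fetchall()

# Query 3: count of data points per sensor where NO2 exceeds a threshold
# at one specific location.
counts = conn.execute(
    "SELECT sensor_id, COUNT(*) FROM readings "
    "WHERE ts BETWEEN ? AND ? AND no2 > ? AND location = ? "
    "GROUP BY sensor_id ORDER BY sensor_id",
    (*PERIOD, 20.0, "south"),
).fetchall()

print(avg_temp)
print(extremes)
print(counts)
```

Parameterized queries are used so that the period, threshold, and location can be varied between runs without rebuilding the SQL text.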
Number of data points ingested per second for each tested database
Tested aggregation query
Average query execution time
Tested aggregation query
Average query execution time
Tested aggregation query
Average query execution time
PostgreSQL emerges as the best-performing solution for the evaluated workloads, with the exception of the first evaluated query, where InfluxDB performed better
Batching records for insertion causes performance gains, as high as 8.65 more data points ingested per second for InfluxDB
With the exception of Riak TS, all databases executed the tested queries in less than 80 ms on average, and the relative differences in query performance are smaller than in the case of insertion
DBMS popularity broken down by database model (accessed on February 2nd, 2020), https://db-engines.com/en/ranking_categories
InfluxDB overview (accessed on February 2nd, 2020), https://www.influxdata.com/products/influxdb-overview/
PostgreSQL documentation (accessed on January 9th, 2020), https://www.postgresql.org/about/
Raspberry Pi 4 datasheet (accessed on February 4th, 2020), https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/rpi_DATA_2711_1p0_preliminary.pdf
Riak TS datasheet (accessed on January 9th, 2020), https://riak.com/content/uploads/2016/05/Riak-Riak-TS-Datasheet.pdf
SQLite documentation (accessed on January 9th, 2020), https://www.sqlite.org/about.html