200 Megabytes
70 Terabytes
200 rooms
What if we disable heaters?
Product for Dormitories
500 devices
Each car driving on the road will generate about as much data as about 3,000 people"
Intel CEO Brian Krzanich
500 GB of data
per 1 hour
Use case #1
Internet of Things
Use case #2
Monitoring
Use case #3
Financial Data
agenda
1. Time series databases
2. State of libraries in Elixir
3. Queries & Functions
time series databases
- speed up IO read
- fast and easy range queries
-
high number of writes
-
data compression
Factors of Time series databases
How many
time series databases
are there?
InFlux Db
- Popularity
- NoSQL
- Tooling
timescale db
- PostgreSQL
- Ecto support
- SQL
InfluxDB
field (not indexed)
tags (indexed)
measurement
time
2019-10-11T12:30:11+00:00
TIME SERIES
"speaker_heart_rates"
pulse="120"
conf="ElixirLite" city="BER"
TimescaleDB
hypertables
Chunks
Ideas from time series databases
- CONTINOUS QUERY
- DATA RETENTION POLICY
Continous Query
Data retention policy
INFLUX DB
Timescale
CREATE RETENTION POLICY
"twenty_four_hours"
ON "measurements"
DURATION 24h
REPLICATION 1 DEFAULT
SELECT drop_chunks(interval
'24 hours', 'measurements');
Influx DB
CREATE CONTINUOUS QUERY "continous_query"
ON "demo_database"
BEGIN
SELECT mean("temp")
INTO "average_temp"
FROM "measurements"
GROUP BY time(1h)
END
Timescale DB
CREATE VIEW ten_minutes_stats_points WITH (
timescaledb.continuous,
timescaledb.refresh_lag = '-10 minutes',
timescaledb.refresh_interval = '5 minutes'
)
AS SELECT ROUND(AVG(value))::INTEGER AS average_value,
TIME_BUCKET('10 minutes', inserted_at) AS time_interval,
device_id
FROM devices_states
GROUP BY time_interval, device_id
State of libraries in ELixir
Influx DB
-
connection with the Influx database
-
connects to Influx via Http
-
authenticates to the database
-
writes with use of UDP
-
logging
-
querying the data with use of InfluxQL/Flux
Timescale Db
-
timescale_ecto
-
collection of helpful aggreagate functions
queries & functions
What is the average temperature in 5 minutes intervals?
Time_bucket
measurements
+----------------+---------------------------+----------------+
| :device_name | :inserted_at | :temperature |
+----------------+---------------------------+----------------+
| "device_1" | ~N[2019-10-05 19:27:12] | 12 |
| "device_2" | ~N[2019-10-05 19:27:12] | 14 |
| "device_1" | ~N[2019-10-05 19:22:02] | 25 |
| "device_2" | ~N[2019-10-05 19:22:02] | 27 |
+----------------+---------------------------+----------------+
+------------+----------------------------------+
| :average | :time_bucket |
+------------+----------------------------------+
| 13 | ~N[2019-10-05 19:25:00.000000] |
| 26 | ~N[2019-10-05 19:20:00.000000] |
+------------+----------------------------------+
Timescale db Time_bucket
from(state in State,
select: %{
time_bucket: time_bucket("5 minutes", state.inserted_at),
average: type(avg(state.temperature), :integer)
},
group_by: time_bucket("5 minutes", state.inserted_at)
)
Influx DB - Time
"""
select mean(temperature)
from \"measurement\"
GROUP BY time(5m)
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")
%{
results: [
%{
series: [
%{
columns: ["time", "mean"],
name: "measurement",
values: [
["2019-10-05T17:20:00Z", 26],
["2019-10-05T17:25:00Z", 13],
["2019-10-05T17:30:00Z", nil],
["2019-10-05T17:35:00Z", nil],
[...],
...
]
}
],
statement_id: 0
}
]
}
What is the last value of temperature recorded by the device?
Last
measurements
+----------------+---------------------------+----------------+
| :device_name | :inserted_at | :temperature |
+----------------+---------------------------+----------------+
| "device_1" | ~N[2019-10-06 10:22:17] | 12 |
| "device_2" | ~N[2019-10-06 10:22:17] | 14 |
| "device_1" | ~N[2019-10-06 10:17:07] | 25 |
| "device_2" | ~N[2019-10-06 10:17:07] | 27 |
+----------------+---------------------------+----------------+
+----------------+---------------+
| :device_name | :last_value |
+----------------+---------------+
| "device_1" | 12 |
| "device_2" | 14 |
+----------------+---------------+
last
from(state in State,
select: %{
device_name: state.device_name,
last_value: timescale_last(state.temperature, state.inserted_at)
},
group_by: state.device_name
)
last
"""
select last(temperature)
from \"measurement\"
GROUP BY device_name
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")
%{
results: [
%{
series: [
%{
columns: ["time", "last"],
name: "measurement",
tags: %{device_name: "device_1"},
values: [["2019-10-05T17:27:12Z", 12]]
},
%{
columns: ["time", "last"],
name: "measurement",
tags: %{device_name: "device_2"},
values: [["2019-10-05T17:27:12Z", 14]]
}
],
statement_id: 0
}
]
}
How to handle lack of data from some sensors?
Interpolate
measurements
+----------------+---------------------------+----------------+
| :device_name | :inserted_at | :temperature |
+----------------+---------------------------+----------------+
| "device_1" | ~N[2019-10-06 03:30:49] | 12 |
| "device_2" | ~N[2019-10-06 03:30:49] | 14 |
| "device_1" | ~N[2019-10-06 03:15:39] | 25 |
| "device_2" | ~N[2019-10-06 03:15:39] | 27 |
+----------------+---------------------------+----------------+
Interpolate
from(state in State,
select: %{
time_bucket: time_bucket_gapfill("5 minutes", state.inserted_at),
average: type(avg(state.temperature), :integer),
average_interpolated: interpolate(avg(state.temperature))
},
where: fragment("inserted_at > now () - interval '140 minutes'"),
where: fragment("inserted_at < now ()"),
group_by: time_bucket_gapfill("5 minutes", state.inserted_at)
)
Interpolate
+------------+-------------------------+----------------------------------+
| :average | :average_interpolated | :time_bucket |
+------------+-------------------------+----------------------------------+
| nil | nil | ~N[2019-10-06 03:10:00.000000] |
| 26 | 26.0 | ~N[2019-10-06 03:15:00.000000] |
| nil | 21.666666666666668 | ~N[2019-10-06 03:20:00.000000] |
| nil | 17.333333333333332 | ~N[2019-10-06 03:25:00.000000] |
| 13 | 13.0 | ~N[2019-10-06 03:30:00.000000] |
| nil | nil | ~N[2019-10-06 03:35:00.000000] |
| nil | nil | ~N[2019-10-06 03:40:00.000000] |
FILL
"""
select mean(temperature)
from "measurement"
GROUP BY time(5m)
fill(linear)
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")
FILL
%{
results: [
%{
series: [
%{
columns: ["time", "mean"],
name: "measurement",
values: [
["2019-10-05T17:20:00Z", 26],
["2019-10-05T17:25:00Z", 13],
["2019-10-05T17:30:00Z", 33.33333333333333],
["2019-10-05T17:35:00Z", 53.666666666666664],
["2019-10-05T17:40:00Z", 74],
["2019-10-05T17:45:00Z", nil],
["2019-10-05T17:50:00Z", nil],
["2019-10-05T17:55:00Z", nil],
...
7 billion Connected Devices
Links
deck
By Łukasz Jankowski
deck
- 152