200 Megabytes

70 Terabytes

200 rooms

What if we disable heaters?

Product for Dormitories

500 devices

Each car driving on the road will generate about as much data as about 3,000 people"

Intel CEO Brian Krzanich

500 GB of data

per 1 hour

Use case #1

Internet of Things

Use case #2

Monitoring

Use case #3

Financial Data

agenda

1. Time series databases

2. State of libraries in Elixir

3. Queries & Functions

time series databases

  • speed up IO read
  • fast and easy range queries
  • high number of writes

  • data compression

Factors of Time series databases

How many

time series databases

are there?

InFlux Db

  • Popularity
  • NoSQL
  • Tooling

timescale db

  • PostgreSQL
  • Ecto support
  • SQL

InfluxDB

field (not indexed)

tags (indexed)

measurement

time

2019-10-11T12:30:11+00:00

TIME SERIES

"speaker_heart_rates"

pulse="120"

conf="ElixirLite" city="BER"

TimescaleDB

hypertables

Chunks

Ideas from time series databases

  • CONTINOUS QUERY
  • DATA RETENTION POLICY

Continous Query

Data retention policy

INFLUX DB

Timescale

CREATE RETENTION POLICY
	"twenty_four_hours"
ON "measurements"
DURATION 24h
REPLICATION 1 DEFAULT
SELECT drop_chunks(interval
	'24 hours', 'measurements');

Influx DB

CREATE CONTINUOUS QUERY "continous_query"
  ON "demo_database"
BEGIN
  SELECT mean("temp")
  	INTO "average_temp"
    FROM "measurements"
    GROUP BY time(1h)
END

Timescale DB

CREATE VIEW ten_minutes_stats_points WITH (
        timescaledb.continuous,
        timescaledb.refresh_lag = '-10 minutes',
        timescaledb.refresh_interval = '5 minutes'
)
AS SELECT ROUND(AVG(value))::INTEGER AS average_value,
          TIME_BUCKET('10 minutes', inserted_at) AS time_interval,
          device_id
   FROM devices_states
   GROUP BY time_interval, device_id

State of libraries in ELixir

Influx DB

  • connection with the Influx database

  • connects to Influx via Http

  • authenticates to the database

  • writes with use of UDP

  • logging

  • querying the data with use of InfluxQL/Flux

Timescale Db

  • timescale_ecto

  • collection of helpful aggreagate functions

queries & functions

What is the average temperature in 5 minutes intervals?

Time_bucket

measurements

+----------------+---------------------------+----------------+
| :device_name   | :inserted_at              | :temperature   |
+----------------+---------------------------+----------------+
| "device_1"     | ~N[2019-10-05 19:27:12]   | 12             |
| "device_2"     | ~N[2019-10-05 19:27:12]   | 14             |
| "device_1"     | ~N[2019-10-05 19:22:02]   | 25             |
| "device_2"     | ~N[2019-10-05 19:22:02]   | 27             |
+----------------+---------------------------+----------------+
+------------+----------------------------------+
| :average   | :time_bucket                     |
+------------+----------------------------------+
| 13         | ~N[2019-10-05 19:25:00.000000]   |
| 26         | ~N[2019-10-05 19:20:00.000000]   |
+------------+----------------------------------+

Timescale db Time_bucket

from(state in State,
  select: %{
    time_bucket: time_bucket("5 minutes", state.inserted_at),
    average: type(avg(state.temperature), :integer)
  },
  group_by: time_bucket("5 minutes", state.inserted_at)
)

Influx DB - Time

"""
select mean(temperature)
from \"measurement\"
GROUP BY time(5m)
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")
%{
  results: [
    %{
      series: [
        %{
          columns: ["time", "mean"],
          name: "measurement",
          values: [
            ["2019-10-05T17:20:00Z", 26],
            ["2019-10-05T17:25:00Z", 13],
            ["2019-10-05T17:30:00Z", nil],
            ["2019-10-05T17:35:00Z", nil],
            [...],
            ...
          ]
        }
      ],
      statement_id: 0
    }
  ]
}

What is the last value of temperature recorded by the device?

Last

measurements

+----------------+---------------------------+----------------+
| :device_name   | :inserted_at              | :temperature   |
+----------------+---------------------------+----------------+
| "device_1"     | ~N[2019-10-06 10:22:17]   | 12             |
| "device_2"     | ~N[2019-10-06 10:22:17]   | 14             |
| "device_1"     | ~N[2019-10-06 10:17:07]   | 25             |
| "device_2"     | ~N[2019-10-06 10:17:07]   | 27             |
+----------------+---------------------------+----------------+
+----------------+---------------+
| :device_name   | :last_value   |
+----------------+---------------+
| "device_1"     | 12            |
| "device_2"     | 14            |
+----------------+---------------+

last

from(state in State,
  select: %{
    device_name: state.device_name,
    last_value: timescale_last(state.temperature, state.inserted_at)
  },
  group_by: state.device_name
)

last

"""
select last(temperature)
from \"measurement\"
GROUP BY device_name
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")
%{
  results: [
    %{
      series: [
        %{
          columns: ["time", "last"],
          name: "measurement",
          tags: %{device_name: "device_1"},
          values: [["2019-10-05T17:27:12Z", 12]]
        },
        %{
          columns: ["time", "last"],
          name: "measurement",
          tags: %{device_name: "device_2"},
          values: [["2019-10-05T17:27:12Z", 14]]
        }
      ],
      statement_id: 0
    }
  ]
}

How to handle lack of data from some sensors?

Interpolate

measurements

+----------------+---------------------------+----------------+
| :device_name   | :inserted_at              | :temperature   |
+----------------+---------------------------+----------------+
| "device_1"     | ~N[2019-10-06 03:30:49]   | 12             |
| "device_2"     | ~N[2019-10-06 03:30:49]   | 14             |
| "device_1"     | ~N[2019-10-06 03:15:39]   | 25             |
| "device_2"     | ~N[2019-10-06 03:15:39]   | 27             |
+----------------+---------------------------+----------------+

Interpolate

from(state in State,
  select: %{
    time_bucket: time_bucket_gapfill("5 minutes", state.inserted_at),
    average: type(avg(state.temperature), :integer),
    average_interpolated: interpolate(avg(state.temperature))
  },
  where: fragment("inserted_at > now () - interval '140 minutes'"),
  where: fragment("inserted_at < now ()"),
  group_by: time_bucket_gapfill("5 minutes", state.inserted_at)
)

Interpolate

+------------+-------------------------+----------------------------------+
| :average   | :average_interpolated   | :time_bucket                     |
+------------+-------------------------+----------------------------------+
| nil        | nil                     | ~N[2019-10-06 03:10:00.000000]   |
| 26         | 26.0                    | ~N[2019-10-06 03:15:00.000000]   |
| nil        | 21.666666666666668      | ~N[2019-10-06 03:20:00.000000]   |
| nil        | 17.333333333333332      | ~N[2019-10-06 03:25:00.000000]   |
| 13         | 13.0                    | ~N[2019-10-06 03:30:00.000000]   |
| nil        | nil                     | ~N[2019-10-06 03:35:00.000000]   |
| nil        | nil                     | ~N[2019-10-06 03:40:00.000000]   |

FILL

"""
select mean(temperature)
from "measurement"
GROUP BY time(5m)
fill(linear)
""" |> InfluxDemo.InfluxConnection.query(database: "mydb")

FILL

%{
  results: [
    %{
      series: [
        %{
          columns: ["time", "mean"],
          name: "measurement",
          values: [
            ["2019-10-05T17:20:00Z", 26],
            ["2019-10-05T17:25:00Z", 13],
            ["2019-10-05T17:30:00Z", 33.33333333333333],
            ["2019-10-05T17:35:00Z", 53.666666666666664],
            ["2019-10-05T17:40:00Z", 74],
            ["2019-10-05T17:45:00Z", nil],
            ["2019-10-05T17:50:00Z", nil],
            ["2019-10-05T17:55:00Z", nil],
            ...

7 billion Connected Devices

Links

deck

By Łukasz Jankowski