ElasticSearch vs Druid

Comparing two time series databases

about:nfroidure

Technical Leader @DiagRAMS

Building softwares for Lille's innovative startups.

Living in the Wayne's World.

Almost passed the Rust compiler checks once.

Blog & Tweet randomly

Time Series

Series of measures representing the evolution of a value overtime

Main steps

Receive/pull data (FTP/HTTP/web hooks)
Enrich data (add labels for later aggregation)
Stream the data to the time series db (queueing)
Index the data
Backup raw / enriched data

Challenges

managing an ever growing dataset ("big data")
reduce data loss to its minimum
keep good query performances for most common queries/aggregations (cold/warm, resampling, archiving...)
be able to reingest the data with schema changes
reduce ingestion/indexing time
ensure ingestion reliability (message queueing)
ensure system availability (monitoring)

Distributed queries steps

Filter: select the data points
Map : operate on each data point
Reduce : compute a reduced result for each node
Combine : combine every nodes results

Druid

Open-source project by the Apache Software Foundation

Pros

Dedicated for time series
Open-source
Cheap ?
Fast ingestion
UI with ingestion overview

Cons

Not managed
Cluster need several processes
Relatively new
Less complete docs
No security out of the box

Elastic Search

Mature, feature complete open-source

project

Pros

Good performances
Feature complete
Lots of managed offers
Very good documentation
Kibana (nice datavisualization)
Out of the box security

Cons

Expensive
Not only for time series
Less efficient

Benchmarking

Working with 13G (73M points) of data on a dockerized local environment

Ingestion

3h50 with ES (bulk queries via JS) :
data is available while ingesting
2.43Go for indice size
20 minutes with Druid (local source) :
data available only when ingestion ends
504Mo of deep storage used

Heavy aggregation

Counting each sensors data points

Druid

# X results in 0.13s to 0.05s
SELECT "sensor", "name", SUM("count") as "count"
FROM "raw_data" GROUP BY "sensor", "name"

Elastic Search

# 120 buckets in 21.6488s to 18.571s
GET /diagrams-data-v1-2020.*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1,
  "aggs": {
    "sensors": {
      "terms": {
        "field": "sensorId",
        "size": 200
      },
      "aggs": {
        "count": {
          "value_count": {
            "field": "value"
          }
        }
      }
    }
  }
}

One sensor for one day grouped hourly

Druid

# 10 results in 0.28s to 0.10s
SELECT FLOOR("__time" to HOUR) AS HourTime,
  "sensor",  "name",  SUM("count") as "count",
  AVG("value") as "value"
FROM "raw_data"
WHERE "__time" BETWEEN
  TIMESTAMP '2020-09-15 00:00:00'
  AND TIMESTAMP '2020-09-16 00:00:00'
  AND "sensor" = '04c140cd-71b8-470f-97ac-6901007c9176'
GROUP BY 1, "sensor", "name"

Elastic Search

# 10 buckets in 0.234s to 0.070s

GET /diagrams-data-v1-2020.*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "sensorId": {
              "value": "04c140cd-71b8-470f-97ac-6901007c9176"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2020-09-15T00:00:00Z",
              "lte": "2020-09-16T00:00:00Z",
              "format": "strict_date_optional_time_nanos"
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "aggs": {
    "data": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}

One sensor for one day grouped per minutes

Druid

# 577 results in 0.17s to 0.13s
SELECT FLOOR("__time" to MINUTE) AS MinuteTime,
  "sensor",  "name",  SUM("count") as "count",
  AVG("value") as "value"
FROM "raw_data"
WHERE "__time" BETWEEN
  TIMESTAMP '2020-09-15 00:00:00'
  AND TIMESTAMP '2020-09-16 00:00:00'
  AND "sensor" = '04c140cd-71b8-470f-97ac-6901007c9176'
GROUP BY 1, "sensor", "name"

Elastic Search

# 577 buckets in 0.146 to 0.102s

GET /diagrams-data-v1-2020.*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "sensorId": {
              "value": "04c140cd-71b8-470f-97ac-6901007c9176"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2020-09-15T00:00:00Z",
              "lte": "2020-09-16T00:00:00Z",
              "format": "strict_date_optional_time_nanos"
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "aggs": {
    "data": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "minute"
      }
    }
  }
}

All sensors for one day grouped per minutes

Druid

# 97,823 results in 3.02s
SELECT FLOOR("__time" to MINUTE) AS MinuteTime,
  "sensor",  "name",  SUM("count") as "count",
  AVG("value") as "value"
FROM "raw_data"
WHERE "__time" BETWEEN
  TIMESTAMP '2020-09-15 00:00:00'
  AND TIMESTAMP '2020-09-16 00:00:00'
GROUP BY 1, "sensor", "name"

Elastic Search

# too_many_buckets_exception
GET /diagrams-data-v1-2020.*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "2020-09-15T00:00:00Z",
              "lte": "2020-09-16T00:00:00Z",
              "format": "strict_date_optional_time_nanos"
            }
          }
        }
      ]
    }
  },
  "size": 1,
  "aggs": {
    "data": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "minute"
      },
      "aggs": {
        "sensors": {
          "terms": {
            "field": "sensorId",
            "size": 300
          }
        }
      }
    }
  }
}

Thanks! Questions?

ElasticSearch vs Druid for Time Series

By Nicolas FROIDURE

ElasticSearch vs Druid for Time Series

A little comparison between those two options for Time Series

Nicolas FROIDURE

Web Platform Architect

ElasticSearch vs Druid

about:nfroidure

Time Series

Main steps

Challenges

Distributed queries steps

Druid

Pros

Cons

Elastic Search

Pros

Cons

Benchmarking

Ingestion

Heavy aggregation

Druid

Elastic Search

One sensor for one day grouped hourly

Druid

Elastic Search

One sensor for one day grouped per minutes

Druid

Elastic Search

All sensors for one day grouped per minutes

Druid

Elastic Search

Thanks! Questions?

ElasticSearch vs Druid for Time Series

More from Nicolas FROIDURE