TimescaleDB

Stéphane Fréchette

twitter: @sfrechette | blog: stephanefrechette.com

Time-series Data Is Everywhere!

TimescaleDB

What is TimescaleDB?

TimescaleDB is an open-source time-series database optimized for fast ingest and complex queries. It speaks "full SQL" and is correspondingly easy to use like a traditional relational database, yet scales in ways previously reserved for NoSQL databases.

TimescaleDB

Easy to Use

  • Full SQL interface for all SQL natively supported by PostgreSQL (including secondary indexes, non-time based aggregates, sub-queries, JOINs, window functions).
  • Connects to any client or tool that speaks PostgreSQL, no changes needed.
  • Time-oriented features, API functions, and optimizations.
  • Robust support for Data retention policies.

TimescaleDB

Scalable

  • Transparent time/space partitioning for both scaling up (single node) and scaling out (forthcoming).
  • High data write rates (including batched commits, in-memory indexes, transactional support, support for data backfill).
  • Right-sized chunks (two-dimensional data partitions) on single nodes to ensure fast ingest even at large data sizes.
  • Parallelized operations across chunks and servers.

TimescaleDB

Reliable

  • Engineered up from PostgreSQL, packaged as an extension.
  • Proven foundations benefiting from 20+ years of PostgreSQL research (including streaming replication, backups).
  • Flexible management options (compatible with existing PostgreSQL ecosystem and tooling).

TimescaleDB

What is Time-series data?

"Time-series data is data that collectively represents how a system, process, or behaviour changes over time."

 

TimescaleDB

Data Model

TimescaleDB utilizes a "wide-table" data model, which is quite common in the world of relational databases. This makes Timescale somewhat different than most other time-series databases, which typically use a "narrow-table" model.

TimescaleDB

Architecture & Concepts

TimescaleDB is implemented as an extension on PostgreSQL, which means that a Timescale database runs within an overall PostgreSQL instance. The extension model allows the database to take advantage of many of the attributes of PostgreSQL such as reliability, security, and connectivity to a wide range of third-party tools.

TimescaleDB

Architecture & Concepts

TimescaleDB exposes what look like singular tables, called hypertables, that are actually an abstraction or a virtual view of many individual tables holding the data, called chunks.

TimescaleDB

Hypertables

  • The primary point of interaction with your data is a hypertable, the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL.
  • Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable.
  • A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value, and one (optional) column specifying an additional partitioning key.

TimescaleDB

Chunks

  • Internally, TimescaleDB automatically splits each hypertable into chunks, with each chunk corresponding to a specific time interval and a region of the partition key’s space (using hashing). These partitions are disjoint (non-overlapping), which helps the query planner to minimize the set of chunks it must touch to resolve a query.
  • Each chunk is implemented using a standard database table. (In PostgreSQL internals, the chunk is actually a "child table" of the "parent" hypertable.)
  • Chunks are right-sized, ensuring that all of the B-trees for a table’s indexes can reside in memory during inserts.

TimescaleDB

Single Node vs. Clustering

The current open-source release of TimescaleDB only supports single-node deployments. Of note is that the single-node version of TimescaleDB has been benchmarked to over 10-billion-row hypertables on commodity machines without a loss in insert performance.

 

*clustered deployments (in development)

TimescaleDB

Comparison: Postgres

TimescaleDB offers three key benefits over vanilla PostgreSQL or other traditional RDBMSs for storing time-series data:

  1. Much higher data ingest rates, especially at larger database sizes.
  2. Query performance ranging from equivalent to orders of magnitude greater.
  3. Time-oriented features.

TimescaleDB

Much Higher Ingest Rates

TimescaleDB

Time-oriented Analytics

TimescaleDB includes new functions for time-oriented analytics, including some of the following:

  • Time bucketing: A more powerful version of the standard date_trunc function, it allows for arbitrary time intervals (e.g., 5 minutes, 6 hours, etc.), as well as flexible groupings and offsets, instead of just second, minute, hour, etc.

TimescaleDB

SELECT time_bucket('3 hours', time) AS period
    asset_code,
    first(price, time) AS opening, last(price, time) AS closing,
    max(price) AS high, min(price) AS low
  FROM prices
  WHERE time > NOW() - interval '7 days'
  GROUP BY period, asset_code
  ORDER BY period DESC, asset_code;

Time-oriented Analytics

Last and first aggregates: These functions allow you to get the value of one column as ordered by another. For example, last(temperature, time) will return the latest temperature value based on time within a group (e.g., an hour).

TimescaleDB

SELECT time_bucket('1 day', time) AS day,
    asset_code,
    last(price, time_recorded)
  FROM prices
  WHERE time > '2017-01-01'
  GROUP BY day, asset_code
  ORDER BY day DESC, asset_code;

Time-oriented Data Management

TimescaleDB allows efficient deletion of old data at the chunk level, rather than at the row level, via its drop_chunks functionality.

TimescaleDB

SELECT drop_chunks(interval '7 days', 'conditions');

Demo

TimescaleDB

Summary

TimescaleDB

Why Use TimescaleDB over NoSQL?

Compared to general NoSQL databases (e.g., MongoDB, Cassandra) or even more specialized time-oriented ones (e.g., InfluxDB, KairosDB), TimescaleDB provides both qualitative and quantitative differences:

  • Normal SQL
  • Operational simplicity
  • JOINs 
  • Query performance
  • Manage like PostgreSQL
  • Native support for geospatial data
  • Third-party tools

 

TimescaleDB

When Not to Use TimescaleDB?

If any of the following is true, you might not want to use TimescaleDB:

  • Simple read requirements
  • Very sparse or unstructured data
  • Heavy compression is a priority
  • Infrequent or offline analysis

 

TimescaleDB

Resources

TimescaleDB

TimescaleDB

By Stéphane Fréchette

TimescaleDB

TimescaleDB [April 2018]

  • 2,534