Downsampling Timescale Data with Continuous Aggregations

Timescale v1.3 was released recently and the major feature is continuous aggregations. The main use case for these aggregations is downsampling data: which has been brought up several times before, so it’s great that these issues have been addressed.

Downsampling data allows queries spanning a long time interval to complete in a reasonable time frame. I wrote and maintain OhmGraphite, which I have configured to send hardware sensor data to a Timescale instance every couple seconds. Querying one sensor over the course of a week will cause the db to scan over 600k values. This query will take 10 or so seconds to run on penny pinched hardware. A single grafana dashboard can be composed of dozens of queries, which will compound performance degradations. If one could downsample the data from a resolution of 2 seconds to 5 minutes, the db would scan 150x fewer values.

This is why I’m so excited about the new continuous aggregations feature. It’s starting to get to the point where Timescale is checking all the boxes for me. I’ve compared it against graphite before and not downsampling was a caveat preventing me from investing more time with Timescale. To be clear, one could have approximated downsampling prior to this release, by implementing multi-table approach + a system timer to compute aggregations. But in my opinion, downsampling should be a native feature of any time series database.

There are caveats with continuous aggregations but they are documented and we’ll explore them here as well. Here I hope to outline how I’ve already started using them.

So OhmGraphite writes to a table structure like so

CREATE TABLE IF NOT EXISTS ohm_stats (
   time TIMESTAMPTZ NOT NULL,
   host TEXT,
   hardware TEXT,
   hardware_type TEXT,
   identifier TEXT,
   sensor TEXT,
   sensor_type TEXT,
   sensor_index INT,
   value REAL
);

SELECT create_hypertable('ohm_stats', 'time', if_not_exists => TRUE);

Basically it’s time + sensor info + sensor value. Here is how we downsample to 5 minute intervals without losing sensor granularity.

CREATE VIEW ohm_stats_5m
WITH (timescaledb.continuous)
AS SELECT
  time_bucket('5m', time) as time,
  host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index,
  avg(value) as value_avg,
  max(value) as value_max,
  min(value) as value_min
FROM ohm_stats
GROUP BY 1, host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index;

Some notes:

The CREATE VIEW has a backing table that materializes the data and refreshes automatically (so it’s more akin to a materialized view in that it trades spaces for faster querying times).
While I used avg, max, and min, any aggregate function can be used like count, stddev_samp, sum
Every column in the GROUP BY has an index created for it automatically unless timescaledb.create_group_indexes is set to false. I’m still grappling with how one should decide on this option. One can always create indices later on the materialized data, but I’m unsure how removing indices effects the refresh performance.
drop_chunks on ohm_stats will delete data on ohm_stats_5m, but decoupling this behavior is slated for future work. This unfortunately means that continuous aggregations are far from “space saver” as now there needs to be room for additional tables and indices. I’ve commented before that graphite’s disk format is already lightweight compared to a postgres db, and continuous aggregations only exasperates the difference.
The continuous aggregations lag behind realtime data by a set amount, but as I’ll demonstrate this caveat doesn’t concern my use case.

The one area where we have to do some heavy lifting is our queries should be rewritten on the client side to utilize the continuous aggregations when a “large” time range is requested. A large time range would be when each data point is 5 or more minutes apart. In grafana, I changed CPU load calculation from:

SELECT
  $__timeGroupAlias("time", $__interval),
  max(value)
FROM ohm_stats
WHERE
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%'
GROUP BY 1
ORDER BY 1

(SELECT
  $__timeGroupAlias("time", $__interval),
  max(value)
FROM ohm_stats
WHERE
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%' AND
  '$__interval' < '5m'::interval
GROUP BY 1
ORDER BY 1)

UNION ALL

SELECT
  $__timeGroupAlias("time", $__interval),
  max(value_max)
FROM ohm_stats_5m
WHERE
  $__timeFilter("time") AND
  hardware_type = 'CPU' AND
  sensor_type = 'Load' AND
  sensor LIKE '%Total%' AND
  '$__interval' >= '5m'::interval
GROUP BY 1
ORDER BY 1

Some notes:

The first UNIONed query is exactly the same as the first query outside of '$__interval' < '5m'::interval, which will aid rewriting queries as they can be copied and pasted
The '$__interval' < '5m'::interval causes the first query to be a noop if the interval is 5 minutes or greater, so only the second query will compute.
More tables at a less granular time interval (think ohm_graphite_1d for year or multi-year reports), can be accomplished by just tacking more queries to be unioned.
Since the fine grained data (ohm_stats) is still present, one can zoom in on a past time period to see the high resolution data.
When the day comes when drop_chunks can keep around the continuous aggregations, then the query will need to always pull data from continuous aggregation if viewing data from more than 7 days ago (for example):

('$__interval' >= '5m'::interval OR
  (NOW() - $__timeFrom()) > '7 days'::interval)

Long range queries using the continuous aggregations complete instantly freeing up server resources

Conclusion

Positives:

Continuous aggregations are a welcome feature and beat downsampling by hand
Postgres has more aggregate functions than graphite (though simulating graphite’s last aggregation may require a tad of SQL finesse).
Since the continuous aggregations are loselessly downsampling data, one gets both the performance of aggregated data and fine-grained access when zoomed in.
Multiple aggregations can be used per downsample (graphite only allows users to choose one unless they break it out into multiple series).

More work is needed to make continuous aggregations a clear win-win

Cons:

Queries ideally shouldn’t be rewritten to seemlessly query the continuous aggregations (unsure if this can be solved without compromises)
More disk space needed to hold materialized data and indices in addition to the regular data. This will be partialy solved when drop_chunks allows one to selectively keep the continuous aggregation data past the policy period.
Some future continuous aggregation features may be enterprise only.

Result: I will continue porting all timescale queries to continuous aggregations. The feature isn’t perfect but I only expect it to get better with time.

Downsampling Timescale Data with Continuous Aggregations

Table of Contents

Conclusion

Comments