Downsampling Timescale Data with Continuous AggregationsPublished on:
Timescale v1.3 was released recently and the major feature is continuous aggregations. The main use case for these aggregations is downsampling data: which has been brought up several times before, so it’s great that these issues have been addressed.
Downsampling data allows queries spanning a long time interval to complete in a reasonable time frame. I wrote and maintain OhmGraphite, which I have configured to send hardware sensor data to a Timescale instance every couple seconds. Querying one sensor over the course of a week will cause the db to scan over 600k values. This query will take 10 or so seconds to run on penny pinched hardware. A single grafana dashboard can be composed of dozens of queries, which will compound performance degradations. If one could downsample the data from a resolution of 2 seconds to 5 minutes, the db would scan 150x fewer values.
This is why I’m so excited about the new continuous aggregations feature. It’s starting to get to the point where Timescale is checking all the boxes for me. I’ve compared it against graphite before and not downsampling was a caveat preventing me from investing more time with Timescale. To be clear, one could have approximated downsampling prior to this release, by implementing multi-table approach + a system timer to compute aggregations. But in my opinion, downsampling should be a native feature of any time series database.
There are caveats with continuous aggregations but they are documented and we’ll explore them here as well. Here I hope to outline how I’ve already started using them.
So OhmGraphite writes to a table structure like so
CREATE TABLE IF NOT EXISTS ohm_stats ( time TIMESTAMPTZ NOT NULL, host TEXT, hardware TEXT, hardware_type TEXT, identifier TEXT, sensor TEXT, sensor_type TEXT, sensor_index INT, value REAL ); SELECT create_hypertable('ohm_stats', 'time', if_not_exists => TRUE);
Basically it’s time + sensor info + sensor value. Here is how we downsample to 5 minute intervals without losing sensor granularity.
CREATE VIEW ohm_stats_5m WITH (timescaledb.continuous) AS SELECT time_bucket('5m', time) as time, host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index, avg(value) as value_avg, max(value) as value_max, min(value) as value_min FROM ohm_stats GROUP BY 1, host, hardware, hardware_type, identifier, sensor, sensor_type, sensor_index;
CREATE VIEWhas a backing table that materializes the data and refreshes automatically (so it’s more akin to a materialized view in that it trades spaces for faster querying times).
- While I used avg, max, and min, any aggregate function can be used like
- Every column in the
GROUP BYhas an index created for it automatically unless
timescaledb.create_group_indexesis set to
false. I’m still grappling with how one should decide on this option. One can always create indices later on the materialized data, but I’m unsure how removing indices effects the refresh performance.
ohm_statswill delete data on
ohm_stats_5m, but decoupling this behavior is slated for future work. This unfortunately means that continuous aggregations are far from “space saver” as now there needs to be room for additional tables and indices. I’ve commented before that graphite’s disk format is already lightweight compared to a postgres db, and continuous aggregations only exasperates the difference.
- The continuous aggregations lag behind realtime data by a set amount, but as I’ll demonstrate this caveat doesn’t concern my use case.
The one area where we have to do some heavy lifting is our queries should be rewritten on the client side to utilize the continuous aggregations when a “large” time range is requested. A large time range would be when each data point is 5 or more minutes apart. In grafana, I changed CPU load calculation from:
SELECT $__timeGroupAlias("time", $__interval), max(value) FROM ohm_stats WHERE $__timeFilter("time") AND hardware_type = 'CPU' AND sensor_type = 'Load' AND sensor LIKE '%Total%' GROUP BY 1 ORDER BY 1
(SELECT $__timeGroupAlias("time", $__interval), max(value) FROM ohm_stats WHERE $__timeFilter("time") AND hardware_type = 'CPU' AND sensor_type = 'Load' AND sensor LIKE '%Total%' AND '$__interval' < '5m'::interval GROUP BY 1 ORDER BY 1) UNION ALL SELECT $__timeGroupAlias("time", $__interval), max(value_max) FROM ohm_stats_5m WHERE $__timeFilter("time") AND hardware_type = 'CPU' AND sensor_type = 'Load' AND sensor LIKE '%Total%' AND '$__interval' >= '5m'::interval GROUP BY 1 ORDER BY 1
- The first UNIONed query is exactly the same as the first query outside of
'$__interval' < '5m'::interval, which will aid rewriting queries as they can be copied and pasted
'$__interval' < '5m'::intervalcauses the first query to be a noop if the interval is 5 minutes or greater, so only the second query will compute.
- More tables at a less granular time interval (think
ohm_graphite_1dfor year or multi-year reports), can be accomplished by just tacking more queries to be unioned.
- Since the fine grained data (
ohm_stats) is still present, one can zoom in on a past time period to see the high resolution data.
- When the day comes when
drop_chunkscan keep around the continuous aggregations, then the query will need to always pull data from continuous aggregation if viewing data from more than 7 days ago (for example):
('$__interval' >= '5m'::interval OR (NOW() - $__timeFrom()) > '7 days'::interval)
- Long range queries using the continuous aggregations complete instantly freeing up server resources
- Continuous aggregations are a welcome feature and beat downsampling by hand
- Postgres has more aggregate functions than graphite (though simulating graphite’s
lastaggregation may require a tad of SQL finesse).
- Since the continuous aggregations are loselessly downsampling data, one gets both the performance of aggregated data and fine-grained access when zoomed in.
- Multiple aggregations can be used per downsample (graphite only allows users to choose one unless they break it out into multiple series).
More work is needed to make continuous aggregations a clear win-win
- Queries ideally shouldn’t be rewritten to seemlessly query the continuous aggregations (unsure if this can be solved without compromises)
- More disk space needed to hold materialized data and indices in addition to the regular data. This will be partialy solved when
drop_chunksallows one to selectively keep the continuous aggregation data past the policy period.
- Some future continuous aggregation features may be enterprise only.
Result: I will continue porting all timescale queries to continuous aggregations. The feature isn’t perfect but I only expect it to get better with time.
If you'd like to leave a comment, please email email@example.com