[collectd] Proposition: Holt-Winters Forecasting

Sergiusz Pawlowicz sergiusz at pawlowicz.name
Wed Jun 27 13:49:48 CEST 2007


Hello,
collectd currently does not expose the specialized RRDtool functions
that provide data smoothing via the Holt-Winters forecasting algorithm.
The forecasting RRAs also make it possible to draw confidence bands and
to flag aberrant behavior in the data source time series.

I propose adding them with the following config syntax in the rrdtool
plugin section:

<Plugin rrdtool>
   Forecast "On|Semi|Off"
   ForecastRows 21777
   ForecastAlpha 0.1
   ForecastBeta 0.0035
   ForecastSeason 3111
</Plugin>
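
For reference, rrdcreate already takes these four values in a single RRA
definition of the form RRA:HWPREDICT:rows:alpha:beta:seasonal period.
Below is a minimal Python sketch of how the proposed options might map onto
it. The function name hwpredict_rra and the sanity checks are only an
illustration, not existing collectd code:

```python
def hwpredict_rra(rows: int, alpha: float, beta: float, season: int) -> str:
    """Build an rrdcreate RRA argument from the proposed Forecast* options."""
    # alpha and beta are smoothing coefficients and must lie between 0 and 1.
    for name, value in (("alpha", alpha), ("beta", beta)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    # Assumption: the RRA should hold more rows than one seasonal cycle.
    if rows <= season:
        raise ValueError("rows should be larger than the seasonal period")
    return f"RRA:HWPREDICT:{rows}:{alpha}:{beta}:{season}"


# ForecastRows, ForecastAlpha, ForecastBeta, ForecastSeason from the example.
print(hwpredict_rra(21777, 0.1, 0.0035, 3111))
# prints RRA:HWPREDICT:21777:0.1:0.0035:3111
```

When only the HWPREDICT RRA is given, rrdcreate creates the companion
SEASONAL, DEVSEASONAL, DEVPREDICT and FAILURES RRAs implicitly, so one
string per RRD file should be enough for a first implementation.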

Legend:
---------

Forecast "On" - Holt-Winters Forecasting is added by default
for all plugins, but can be switched off explicitly in a module's definition.

Forecast "Semi" - Holt-Winters Forecasting is added only
for modules that enable it explicitly.

Forecast "Off" - Holt-Winters Forecasting is globally switched off.

ForecastRows <value> - specifies the length of the RRA prior to wrap around.

ForecastSeason <value> - specifies the number of primary data points
in a seasonal cycle.

ForecastAlpha <value> - is the adaption parameter of the intercept
(or baseline) coefficient in the Holt-Winters forecasting algorithm.
The ForecastAlpha value must lie between 0 and 1. A value closer to 1
means that more recent observations carry greater weight in predicting
the baseline component of the forecast. A value closer to 0 means that
past history carries greater weight in predicting the baseline component.

ForecastBeta <value> - is the adaption parameter of the slope (or
linear trend) coefficient in the Holt-Winters forecasting algorithm.
ForecastBeta value must lie between 0 and 1 and plays the same role as
ForecastAlpha with respect to the predicted linear trend.
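
To make the role of the two parameters concrete, here is a minimal sketch
of one update step of the textbook additive Holt-Winters method in Python.
It is not RRDtool's actual code. The names HoltWintersState and
update are made up for illustration, and gamma (the seasonal adaption
parameter) is not part of this proposal; the sketch assumes it defaults
to alpha:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HoltWintersState:
    baseline: float        # intercept, adapted with alpha
    slope: float           # linear trend, adapted with beta
    seasonal: List[float]  # one coefficient per point in the seasonal cycle


def update(state: HoltWintersState, observed: float, t: int,
           alpha: float, beta: float, gamma: Optional[float] = None) -> float:
    """Fold one observation into the state; return the value predicted for it."""
    if gamma is None:
        gamma = alpha  # assumption: seasonal coefficient adapts like the baseline
    idx = t % len(state.seasonal)
    season = state.seasonal[idx]
    # Forecast made before seeing the observation.
    predicted = state.baseline + state.slope + season

    # alpha near 1: the new observation dominates; near 0: history dominates.
    new_baseline = (alpha * (observed - season)
                    + (1 - alpha) * (state.baseline + state.slope))
    # beta plays the same role for the linear trend.
    new_slope = (beta * (new_baseline - state.baseline)
                 + (1 - beta) * state.slope)
    state.seasonal[idx] = gamma * (observed - new_baseline) + (1 - gamma) * season
    state.baseline, state.slope = new_baseline, new_slope
    return predicted


state = HoltWintersState(baseline=10.0, slope=0.0, seasonal=[0.0, 0.0])
print(update(state, observed=20.0, t=0, alpha=0.1, beta=0.0035))
print(state)
```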

Each module can define its own forecasting values, using the same syntax
as the defaults in the rrdtool section, but of course without the "Semi" value.
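
The interaction between the global Forecast mode and a per-module setting
described above could be resolved roughly as follows. This is only a
sketch in Python; forecast_enabled is a hypothetical helper, and how a
module would carry its own Forecast setting is an assumption:

```python
from typing import Optional


def forecast_enabled(global_mode: str, module_setting: Optional[str]) -> bool:
    """Decide whether a module gets Holt-Winters RRAs.

    global_mode    -- "On", "Semi" or "Off" from the rrdtool plugin section
    module_setting -- "On" or "Off" from the module, or None if not set
    """
    mode = global_mode.lower()
    setting = module_setting.lower() if module_setting is not None else None
    if setting not in (None, "on", "off"):
        raise ValueError("a module may only set Forecast to On or Off")
    if mode == "off":
        return False             # globally switched off, modules cannot override
    if mode == "on":
        return setting != "off"  # default on, explicit opt-out
    if mode == "semi":
        return setting == "on"   # default off, explicit opt-in
    raise ValueError(f"unknown Forecast mode: {global_mode}")


for mode in ("On", "Semi", "Off"):
    print(mode, [forecast_enabled(mode, s) for s in (None, "On", "Off")])
```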

The example values in the config above mean:

- The seasonal cycle is meant to be one day. Note that 3111 data points
at 10 second intervals cover only about 8.6 hours; a full day at that
step is 8640 data points,
- The RRD file will store 7 seasonal cycles (21777 data points) of
forecasts and deviation predictions before wrap around; 7 full days at a
10 second step would need 60480 rows,
- The forecasting algorithm baseline adapts quickly; with an alpha of 0.1
the most recent 13 observations (about two minutes at 10 second
intervals) account for about 75% of the baseline prediction,
- The linear trend forecast adapts much more slowly. With a beta of
0.0035 it takes about 300 observations (about 50 minutes at 10 second
intervals) to account for 65% of the predicted linear trend.
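
Percentages like the ones above follow from exponential smoothing: with
an adaption parameter c, the most recent n observations carry a combined
weight of 1 - (1 - c)^n. Here is a small Python sketch for checking such
figures at a given step; the helper names are made up for illustration:

```python
import math


def recent_weight(coef: float, n: int) -> float:
    """Share of the prediction explained by the most recent n observations."""
    return 1.0 - (1.0 - coef) ** n


def observations_for_weight(coef: float, target: float) -> int:
    """Smallest n whose combined weight reaches the target share."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - coef))


STEP = 10  # seconds between observations, as in the example

# Baseline (alpha = 0.1): weight of the last 13 observations.
print(round(recent_weight(0.1, 13), 3))            # prints 0.746
# Observations needed for the trend (beta = 0.0035) to reach 65%.
n = observations_for_weight(0.0035, 0.65)
print(n, "observations =", n * STEP / 60, "minutes")
```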

Please note that these values have not yet been tested in a production
environment, and I reserve the right to change them after observing
their behavior for some time on my server farm.

Cheers,
Sergio


