[collectd] fix to create RRDs correctly

Florian Forster octo at verplant.org
Tue Feb 19 23:41:45 CET 2008

Hi Thorsten,

On Mon, Feb 18, 2008 at 12:13:12PM -0800, Thorsten von Eicken wrote:
> The problem is independent of specifying timespans explicitly. The
> docs/comments say that by default collectd creates 5 RRAs, in fact it
> only creates 4.

Oh, of course I'll update the documentation, but I assume that's not the
central point here.

> The first one is >8000 points, which is way more than the 1200
> requested and wastes a *lot* of space.

Well, it enforces that (a) the first RRA doesn't consolidate data points
and (b) the RRA has a length of at least 1200 data points.

> Basically it doubles the size of the RRD files. I'll try your patch
> (can't do it at the moment), but you should check that the default
> result is "reasonable" and matches the intent expressed in the
> comments.

The idea behind this concept was that the interval at which data is
collected is unknown. This is often the case for data received via the
network plugin or data collected by the exec plugin. What one does know
is the graphs one expects, typically one hour, day, week, month and
year. The logic should design the RRAs so that for each timespan a graph of
the desired width can be drawn.

A problem arises when there are multiple timespans where
 step * #datapoints > timespan.                                      (1)
Assume timespans of 3600, 7200, and 14400 seconds are configured. You
could calculate the number of consolidated datapoints, set it to 1 if
it's <1, and then calculate the number of datapoints needed. So for this
example you'd end up with:
- RRA0:  1 datapoints consolidated,  360 `rows'
- RRA1:  1 datapoints consolidated,  720 `rows'
- RRA2: 12 datapoints consolidated, 1200 `rows'
Obviously RRA0 is redundant.
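
The calculation described above could be sketched roughly like this. It
is an illustration only, not collectd's actual code: the function name
`plan_rras', the 10 second step, the 1200-row target and the rounding
(integer division) are all assumptions, so the figures for the longest
timespan need not match the list above.

```python
def plan_rras(timespans, step=10, target_rows=1200):
    """Return (pdp_per_row, rows) for each timespan.

    Hypothetical helper; step and target_rows are assumed values.
    """
    plan = []
    for span in timespans:
        # Number of primary data points consolidated into one row.
        pdp_per_row = span // (step * target_rows)
        if pdp_per_row < 1:
            pdp_per_row = 1  # clamp: never less than one data point per row
        # Rows needed to cover the timespan at that resolution (rounded up).
        rows = -(-span // (pdp_per_row * step))
        plan.append((pdp_per_row, rows))
    return plan


for i, (pdp, rows) in enumerate(plan_rras([3600, 7200, 14400])):
    print(f"RRA{i}: {pdp} datapoints consolidated, {rows} rows")
```

Under these assumptions the first two entries come out as 1/360 and
1/720, as in the list: two unconsolidated RRAs of which the shorter one
stores nothing the longer one doesn't already have.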

We might ignore all timespans where (1) is true except the one with the
longest timespan and then calculate the length as described above. Would
that be a better solution in your opinion?
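
A sketch of that proposal, to make it concrete. The helper name
`filter_timespans' and the default step and row count are made up for
illustration; nothing like this exists in collectd yet. It drops every
timespan whose consolidation count would be clamped to 1, except the
longest of them.

```python
def filter_timespans(timespans, step=10, target_rows=1200):
    """Drop redundant unconsolidated timespans, keeping the longest one.

    Hypothetical helper; step and target_rows are assumed values.
    """
    # A timespan is "clamped" when span < step * target_rows, i.e. the
    # computed number of consolidated data points rounds down to 0.
    clamped = [s for s in timespans if s // (step * target_rows) < 1]
    others = [s for s in timespans if s // (step * target_rows) >= 1]
    kept = list(others)
    if clamped:
        # Only the longest clamped timespan gets its own RRA.
        kept.append(max(clamped))
    return sorted(kept)


print(filter_timespans([3600, 7200, 14400]))
```

For the example above this keeps 7200 and 14400 and drops 3600, so the
redundant RRA0 is never created.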

> Perhaps the target timespans are not realistic. Here's what I use,
> note the 20 second stepsize and that I'm targeting 600 pixel wide
> graphs:

The timespans offer a very graph-centric view of the RRD files and, in
my experience, the chosen ones are those most often used. Sometimes
two-day, quarterly or two-year graphs are used, too, but as far as I
can judge they are far less common.

> I produce graphs for "now", "day", "week", "month", "quarter", and
> "year", which is what is reflected in the above timespans. The RRDs I
> generate are 1/3 the size of the default ones.

Yes, choosing the timespans to match your interval can result in much
smaller files. But the logic is designed to work without knowing the
interval.

Tuning the default parameters may be possible, for example by changing
the one-hour timespan to a four-hour timespan, but this will only work
with the default interval. As soon as there's a datasource with a 20
second interval you'll have the same effect again.

Following the argument that people who change default parameters
should know what they are doing, this would be an acceptable change,
too. And it would decrease the size of files created with the default
parameters. Thoughts on this?

Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D