[collectd] Question re: rrd formats in collectd....

Sun Jan 20 14:35:20 CET 2008

Hi Chad,

first off, please send further emails to the mailing list, too (see the Cc:
field for the address). The easiest way to do this is to `group-reply' to
this message. I have left in most of your mail for people on the mailing list.

On Tue, Jan 15, 2008 at 08:56:27PM -0800, Chad Manning wrote:
> First off... NICE WORK on the collectd tool! I've been using it
> extensively for more than a year in a production environment of
> about 80 servers and am very thankful for the ability to have such high
> resolution performance data of the servers I am monitoring.

Thanks :)

> Before I go digging into the code and start looking into how to tinker
> with the rrd db schemas you have chosen, etc... I have been running
> collectd v3.11.2 for the last year or so across all of my servers and
> thought I would take a look at upgrading to your latest release.
> While reviewing the latest release, I noticed that the actual .rrd
> files generated in v4.x have changed since v3.11.2, at least for the
> cpu, haven't gotten further yet.  It looks like what used to be a
> single .rrd with data including idle, system, user, etc values has now
> been broken down into numerous .rrd files each with only a single
> value collected...  Unfortunately, I am using Cacti to provide a
> templatized graphing interface atop the data collection and this isn't
> going to be a very smooth upgrade for me.  In fact, using cacti, it is
> possible but very hacky/challenging to be able to use multiple input
> .rrd files and create cdef's that span multiple rrd files to create a
> single aggregate graph.  It is possible, but it is very ugly, and I
> fault the cacti architecture for this as it should be trivial to
> perform these cross file data merges into graphs...  Nonetheless, it
> was trivial in the v3.x series of collectd given that all of the data
> was in a single rrd.
> 
> Before I go hacking your code, is there any easy means of reverting in
> a patch on my side the ability to reproduce single rrd's for each type
> of data collected a la v3 or am I best off forgoing the latest
> features and sticking to 3.x?

No, there's no easy way: you'd have to change back every plugin that was
changed, for example the cpu and memory plugins. If it is possible to
use multiple RRD-files in one graph, I'd do that: there are not many
cases where it is needed. The ones that come to mind are:
- cpu
- memory
- swap
- ps_state (the former processes)
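
For what it's worth, RRDtool itself can read several files in one graph,
because every DEF names its own file. Below is a minimal sketch that only
builds the argument list for `rrdtool graph'. The function name, the colors,
the directory `cpu-0', the file pattern `cpu-<state>.rrd' and the data source
name `value' are assumptions for illustration; check them against your own
files. How to express the same thing inside Cacti is a separate matter that
this does not cover.

```python
def stacked_cpu_graph_args(rrd_dir, states, out_png="cpu.png"):
    """Build an `rrdtool graph` argument list that stacks one RRD file per CPU state.

    Assumptions (not taken from the mail): files are named
    <rrd_dir>/cpu-<state>.rrd and each holds a single data source "value".
    """
    colors = ["#ff0000", "#0000ff", "#00e000", "#f0a000", "#a000a0", "#e8e8e8"]
    args = ["rrdtool", "graph", out_png, "--start", "-1h", "--title", "CPU usage"]

    # One DEF per file: DEF:<vname>=<rrdfile>:<ds-name>:<CF>
    for state in states:
        args.append(f"DEF:{state}={rrd_dir}/cpu-{state}.rrd:value:AVERAGE")

    # The first series is a plain AREA, the following ones are stacked on top.
    for i, state in enumerate(states):
        color = colors[i % len(colors)]
        suffix = "" if i == 0 else ":STACK"
        args.append(f"AREA:{state}{color}:{state}{suffix}")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so no RRD files are needed here.
    print(" ".join(stacked_cpu_graph_args("cpu-0", ["system", "user", "idle"])))
```

The resulting list can be handed to a process runner once the paths point at
real files.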

> Why did the format change?

To be more flexible. Linux has `buffered' memory (the buffer cache), Mac
OS X has something called `wired' memory. In recent versions of Linux
the `steal' CPU-counter has been added and so forth. When something like
that comes up we only need to generate a new RRD-file, which is trivial
and backwards compatible. With 3.* we either needed some kind of hack
(using `buffered' for `wired' or something like that) or simply couldn't
support new features (the `steal' time, for example). Changing existing
RRD-files is troublesome, not backwards compatible and ugly in general.
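
To illustrate the point about backwards compatibility, here is a small
sketch of a one-file-per-value layout. The helper `rrd_path' and the exact
naming pattern are assumptions made for this example, not a statement of
what the daemon does internally. It only shows that a new value such as
`steal' adds a file and leaves the existing ones alone.

```python
def rrd_path(host, plugin, plugin_instance, type_, type_instance):
    """Return a relative RRD path for one value (hypothetical naming scheme)."""
    plugin_dir = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    file_name = f"{type_}-{type_instance}.rrd" if type_instance else f"{type_}.rrd"
    return "/".join([host, plugin_dir, file_name])


def cpu_files(host, cpu, states):
    """Set of files needed for the given CPU states of one CPU."""
    return {rrd_path(host, "cpu", cpu, "cpu", state) for state in states}


old_files = cpu_files("example-host", "0", ["user", "system", "idle"])
new_files = cpu_files("example-host", "0", ["user", "system", "idle", "steal"])

# Supporting a new counter only adds a file; nothing existing is rewritten.
added = new_files - old_files
```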

> Are there performance benefits to splitting the datafiles, thread-safe
> issues?

No, updating one file with several DSes is actually ``cheaper'' than
updating several files with one DS each. RRD-files are written
sequentially, so there are no thread-safety issues either way.
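
A rough way to see the difference in cost is to count the update operations.
The sketch below only builds `rrdtool update' command lines and does not run
them. The function names and file names are made up for illustration; the
`N:value[:value...]' argument is the usual update syntax, with `N' meaning
"now".

```python
def updates_single_file(rrd_file, values):
    """One update call that writes all data sources of a single file."""
    joined = ":".join(str(v) for v in values)
    return [["rrdtool", "update", rrd_file, f"N:{joined}"]]


def updates_split_files(values_by_file):
    """One update call per file when every file holds a single data source."""
    return [
        ["rrdtool", "update", path, f"N:{value}"]
        for path, value in values_by_file.items()
    ]


single = updates_single_file("cpu.rrd", [12, 3, 85])
split = updates_split_files(
    {"cpu-user.rrd": 12, "cpu-system.rrd": 3, "cpu-idle.rrd": 85}
)

# Same three values, but one file update versus three.
calls_single, calls_split = len(single), len(split)
```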

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/