[collectd] CSV plugin

Tue Feb 26 11:19:06 CET 2008

Hi Christophe,

On Mon, Feb 25, 2008 at 04:11:00PM -0500, Christophe Kalt wrote:
> Arg.. i'll never complete the migration to v4.x
> It's been such a painful and lengthy process that i need to
> rant/vent a bit, sorry.  (skip the next 2 paragraphs;)

No problem. I'm in the middle of migrating a company's home-grown
solution to collectd myself.. ;)

> The latest round of pain has been the fact that data that belongs
> together has been spread around various files for no obvious reason
> (at least to me).  (I'm curious to hear why, btw)

In one word: Flexibility. Take the cpu plugin as an example: Some time
after collectd 4 was released, some Xen specific stuff made it into the
Linux kernel. That introduced a new counter for `steal time'. With all
data in one file and the promised backwards compatibility we wouldn't be
able to collect that value before version 5. With the split data it was
a matter of adding two or so lines of code.

> There's also no apparent consistency.  The load plugin keeps it all in
> one file, but the cpu plugin doesn't.

We keep in one file what cannot be separated from one another. RX and TX
traffic of network interfaces, for example. The 1, 5, and 15 minute load
averages have been around for so long that I doubt 2, 10, and 30 minute
values will become common anytime soon.

> And then, the df plugin keeps it all in one file per filesystem, and
> all of these within a single directory, while the cpu plugin has a
> directory per cpu.

If only one of `plugin instance' and `type instance' is used, you have
to choose one of the two, and I have to admit that there are
inconsistencies. In general we want to have one directory for a
``logical unit'', e.g. a disk drive or a cpu. For the df plugin one
might argue that there's only mtab, but the truth is that I simply
didn't think that one file per directory would be a very good
distribution.
In fact, I feel quite comfortable with the df plugin having all files in
one directory. A much worse decision was to put all the interfaces into
one directory.
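
To make the naming concrete, here is a minimal sketch of how the
identifiers could map to a directory and file name. The function name
format_path_sketch is made up for illustration and this is not
collectd's actual code; it assumes the layout
plugin[-plugin instance]/type[-type instance] and leaves out the host
directory and the date suffix that the CSV files carry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of collectd): returns 1 if the optional
 * instance string is set and non-empty. */
static int non_empty(const char *s)
{
  return (s != NULL) && (strlen(s) > 0);
}

/* Builds "plugin[-plugin_instance]/type[-type_instance]" into buf.
 * Returns 0 on success, -1 if the result did not fit or on error. */
int format_path_sketch(char *buf, size_t buf_len,
                       const char *plugin, const char *plugin_instance,
                       const char *type, const char *type_instance)
{
  int has_pi = non_empty(plugin_instance);
  int has_ti = non_empty(type_instance);
  int n;

  assert(buf != NULL && plugin != NULL && type != NULL);

  n = snprintf(buf, buf_len, "%s%s%s/%s%s%s",
               plugin, has_pi ? "-" : "", has_pi ? plugin_instance : "",
               type, has_ti ? "-" : "", has_ti ? type_instance : "");

  /* snprintf reports the length it wanted to write; treat truncation
   * as an error. */
  if ((n < 0) || ((size_t)n >= buf_len))
    return -1;
  return 0;
}
```

Under these assumptions, ("cpu", "0", "cpu", "steal") gives
cpu-0/cpu-steal (one directory per cpu, via the plugin instance), while
("swap", NULL, "swap", "free") gives swap/swap-free (everything in one
directory, via the type instance).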

If you want to help prevent such annoying inconsistencies in the future
please consider checking out the Git repository every now and then, give
it a try on some test machine and start asking ``stupid questions''[*].

> Maybe that makes more sense when outputting RRD files, but it's
> definitely a pain with the CSV plugin.  On top of this, i now find
> that the epoch used for each file may vary:
> 
> $ egrep '^120396949' *
> swap-free-2008-02-25    :1203969496,20311580672.000000
> swap-reserved-2008-02-25:1203969496,4972544.000000
> swap-used-2008-02-25    :1203969495,59842560.000000
>                                   ^

The timestamp written here is determined in the plugins and passed to
collectd in the `value_list_t' structure. The swap plugin, like most
plugins, has a `submit' function which fills this structure and calls
time(2) for each value that is submitted, so values from the same read
can end up on either side of a second boundary. Pulling that call out
of the submit function will solve this problem - for the swap plugin
only.

I'll see whether I can do that at least for some of the main plugins,
but since it's an easy fix and these `submit' functions are used all
over the place, patches are very welcome ;)
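
A rough sketch of what such a patch boils down to, assuming a
simplified stand-in for the real structure: sample_t, submit_at and
swap_read_sketch are names invented for this example and are not
collectd's API. The point is only that time(2) is called once per read
and the result is handed to every submit.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified stand-in for `value_list_t'; only the fields this sketch
 * needs. The real structure has more members. */
typedef struct {
  time_t time;
  double value;
  const char *type_instance;
} sample_t;

/* Submit one value with a timestamp chosen by the caller, instead of
 * calling time(2) inside the submit function. */
void submit_at(sample_t *s, time_t now, const char *type_instance,
               double value)
{
  assert(s != NULL);
  s->time = now;
  s->value = value;
  s->type_instance = type_instance;
}

/* Hypothetical read callback: time(2) is called exactly once, so all
 * three values of this read carry the same timestamp. */
void swap_read_sketch(sample_t out[3], double used, double free_bytes,
                      double reserved)
{
  time_t now = time(NULL);

  submit_at(&out[0], now, "used", used);
  submit_at(&out[1], now, "free", free_bytes);
  submit_at(&out[2], now, "reserved", reserved);
}
```

With the values from the egrep output above, all three samples would
then be written with one and the same epoch.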

> Any way i could convince you to fix this particular discrepancy within
> collectd?  Please?

As I said: I'm not at all opposed to that, but it's a trivial fix in ~40
places - not exactly a fun activity..

Regards,
-octo

[*] which, as we all know, don't exist, because there are only stupid
    answers.
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/