[collectd] collectd + YaketyStats

Thu Jan 22 16:50:00 CET 2009

Hi again!

Sorry for the delay!

Florian Forster <octo at verplant.org> writes:

> first off, nice job on YaketyStats :) If we could get both projects to
> play with one another nicely, that'd be a huge win for both, I think :)

:)

> The minimum is always especially interesting, if you're regarding
> metrics, where low values are a bad thing. Take, for example, the
> predicted time an UPS will be able to provide energy for the current
> load.

> ...

OK, I'm sold.  I'll look at changing YaketyStats to store all three
consolidation functions (MIN, AVERAGE and MAX) and at a script to
convert existing YS RRDs to have all three.

> If you graph all three consolidation functions, you probably get the
> best overall picture. If you look at [0], you will see the minimum and
> maximum values as a light blue area and the average as a dark blue line.

I like that graph.  Graphing all three works nicely if you have just one
stat on a graph.  If you have more than one stat and graph all three CFs
for each, it's hard to tell what's going on.
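
For the one-stat case, here is a rough sketch of how such a graph could
be put together.  It only builds the argument list for 'rrdtool graph';
the function name, the colors and the file/DS names are made up for
illustration, and I'm assuming the usual idiom of an invisible AREA for
the minimum with the (max - min) spread stacked on top of it.

```python
def minmax_graph_args(rrd_path, ds_name, out_png):
    """Build an 'rrdtool graph' argument list for a min/max band plus average.

    Assumes the RRD has MIN, AVERAGE and MAX RRAs for ds_name.
    Note: a ':' in rrd_path would need escaping for rrdtool; not handled here.
    """
    return [
        "rrdtool", "graph", out_png,
        # One DEF per consolidation function.
        "DEF:lo={0}:{1}:MIN".format(rrd_path, ds_name),
        "DEF:avg={0}:{1}:AVERAGE".format(rrd_path, ds_name),
        "DEF:hi={0}:{1}:MAX".format(rrd_path, ds_name),
        # Height of the band between minimum and maximum (RPN: hi - lo).
        "CDEF:spread=hi,lo,-",
        # Invisible area up to the minimum, then the band stacked on top.
        "AREA:lo",
        "AREA:spread#B7B7F7::STACK",
        # Average drawn last so it sits on top of the band.
        "LINE1:avg#0000FF:average",
    ]
```

The list can be handed to subprocess.call() as-is.  With several stats
on one graph you'd probably drop the band and keep only the LINE1 part.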

>> B)  Jart assumes you have one DS per RRD file.  There's a
>> proof-of-concept implementation of handling more than one DS per file,
>> but it's not the prettiest.  Unless there are compelling reasons to
>> keep multiple DSes per file, it would be nice to break them up.
>
> Most of collectd's ``data sets'' have only one ``data source'' (which
> basically directly translates to: Most RRD files have only one DS). Some
> have two, especially interface statistics (RX and TX) and other
> IO-stats.
>
> I think splitting the IO-stats up into two data sets / RRD files doesn't
> make much sense - IO, as the name implies, will always consist of input
> and output, so putting those two into one file is the reasonable thing
> to do.

It *is* reasonable, but would you lose anything (other than the pain of
conversion) by splitting them up?  It is definitely possible to code
around having multiple DSes in each RRD file.  And it's possible to
discover the DS name(s) in each RRD file.  But it's easier and cleaner
if you don't have to do that.

For example, it seems like there are two choices for determining DS
names.  One way is to look in each RRD file as you get to it.  But if
you have a lot of graphs to draw (say you click on "[All]" for a certain
host name in Jart, or open a playlist that graphs the rx_bits for all
hosts at one of your sites), that's a lot of extra 'rrdtool info'
commands.
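
To make the first option concrete, here is a minimal sketch of pulling
DS names out of 'rrdtool info' output.  It relies on the info lines for
a data source starting with "ds[NAME]." (e.g. ds[rx].type = "COUNTER");
the function names are mine, not anything that exists in Jart.

```python
import re
import subprocess

# Matches lines such as: ds[rx].type = "COUNTER"
_DS_LINE = re.compile(r"^ds\[([^\]]+)\]\.")


def ds_names_from_info(info_text):
    """Return the DS names found in 'rrdtool info' output, in first-seen order."""
    names = []
    for line in info_text.splitlines():
        match = _DS_LINE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def ds_names(rrd_path):
    """Run 'rrdtool info' on one file and return its DS names."""
    output = subprocess.check_output(["rrdtool", "info", rrd_path])
    return ds_names_from_info(output.decode("utf-8", "replace"))
```

That's one extra process per RRD file per page, which is the cost I'm
worried about.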

The other way is to cache the data.  We could have a cron job that looks
at each RRD file and stores all the DS names for Jart to read as needed.
But then Jart is less dynamic than it is now.  That is, if a new host or
stat shows up, Jart won't know its DS names until the cron job runs.  Or
you could make Jart say "If the DS name isn't in the cache, run 'rrdtool
info'", which keeps it dynamic at the cost of an occasional extra
command.
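
The cache-with-fallback idea could look roughly like this.  It's only a
sketch: the class name is made up, the cache is an in-memory dict rather
than whatever file the cron job would write, and "lookup" stands for any
function that takes an RRD path and returns its DS names (for instance
one that runs 'rrdtool info').

```python
class DSNameCache(object):
    """DS names per RRD file, with a fallback lookup on a cache miss."""

    def __init__(self, lookup, initial=None):
        # lookup: callable(rrd_path) -> iterable of DS names (hypothetical).
        # initial: optional dict of path -> names, e.g. loaded from the
        # file a cron job produced.
        self._lookup = lookup
        self._names = dict(initial or {})

    def get(self, rrd_path):
        # Only files the cron job hasn't seen yet pay for a lookup,
        # and each of them pays only once.
        if rrd_path not in self._names:
            self._names[rrd_path] = list(self._lookup(rrd_path))
        return self._names[rrd_path]
```

A new host or stat then shows up right away, and everything else is
served from the cache.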

> Oh, there's the `system load' that has three data sources and very very
> few other types with more than two data sources. If you ask me, they can
> be ignored, but other people may disagree ;)

Ha!