[collectd] collectd + YaketyStats

Fri Jan 23 19:14:42 CET 2009

Florian Forster <octo at verplant.org> writes:

> Hi Mark,
>
> On Thu, Jan 22, 2009 at 10:50:00AM -0500, Mark Plaksin wrote:
>> OK, I'm sold.  I'll look at changing YaketyStats to include all three
>> and at a script to convert existing YS RRDs to have all three.
>
> in collectd's contrib/ directory is a script called `rrd_filter.px'
> which can do that. Run
>   $ perldoc contrib/rrd_filter.px
> to access the built-in documentation.

Cool--thanks for the pointer!

>> It *is* reasonable but would you lose anything (other than the pain of
>> conversion) by splitting them up?  It is definitely possible to code
>> around having multiple DSes in each RRD file.  And it's possible to
>> discover the DS name(s) in each RRD file.  But it's easier and cleaner
>> if you don't have to do that.
>
> Yes, it would technically be possible to split up all files so each one
> has only one DS. You would be able to store the same data afterwards and
> would hardly lose any data while converting the files.
>
> I do see that having just one DS per file and having all DSes have the
> same name makes graphing a great deal easier. However, changing the layout
> of data in collectd plugins would be a backwards incompatible change.
> Making backwards incompatible changes every now and then is important to
> keep stuff elegant and clean, but right now there aren't enough problems
> in collectd that would be solved by such a dramatic step to justify it.

I see :)  FWIW, I've been planning on making a conversion script for
existing collectd RRDs to see how well they work with Jart.  Perhaps
this will uncover other things we'd need to change to make the two work
well together.  And maybe it will help provide some justification :)  Or
maybe not.  It's an easy first step anyhow!

>> For example, it seems like there are two choices for determining DS
>> names.  One way is to look in each RRD file as you get to it.  If you
>> have a lot of graphs to draw (say you click on "[All]" for a certain
>> host name in Jart, or a playlist that graphs the rx_bits for all hosts at
>> one of your sites) that's a lot of extra 'rrdtool info' commands.
>
> I think the most elegant solution to this is to consider the DS name
> part of the overall name of the data. Where previously you used `$FILE'
> (and actually meant `the "value" DS of the file $FILE'), you could use
> `($FILE [, $DS])' to mean `the $DS DS of the file $FILE'.
>
> This way the DSes of an RRD file need to be looked up when scanning for
> new/available files to build a new graph, but not when displaying a
> ``playlist'' or pre-created graph. Kind of `caching in the graph name'.

True, this would work.  For the UI to be the prettiest we'd want to
distinguish between meaningless DS names (such as those in one-DS RRD
files) and DS names that have meaning.  That is, for a YaketyStats load
average RRD, we'd want to display the tree like this with *no* DS name:
	jojo.example.com/load/1-minute

For a collectd RRD about IO it might be this where "read" is the DS name:
	collectd.example.com/disk-dm-0/disk_ops/read
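
A rough sketch of that display rule, in Python purely for illustration.
The function name `display_path` and the "exactly one DS means the name
is meaningless" test are assumptions of this sketch, not anything Jart
or collectd does today; the "write" DS in the usage note is likewise
only assumed for the example:

```python
def display_path(file_key, ds_names, ds):
    """Return the tree label for one data source.

    file_key -- host/plugin/type path of the RRD file, without ".rrd"
    ds_names -- all DS names found in that file
    ds       -- the DS being displayed
    """
    if ds not in ds_names:
        raise ValueError(f"{ds!r} is not a DS of {file_key!r}")
    # Assumption: a file with exactly one DS has a meaningless DS name
    # (e.g. "value"), so the name is left out of the label.
    if len(ds_names) == 1:
        return file_key
    return f"{file_key}/{ds}"
```

With this, display_path("jojo.example.com/load/1-minute", ["value"],
"value") gives the first label above, and
display_path("collectd.example.com/disk-dm-0/disk_ops",
["read", "write"], "read") gives the second.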

Playlists in Jart are very useful but I only use them maybe one third or
one quarter of the time.  The rest of the time I'm looking at stats for
a specific host (or cluster).  When you're poking through the UI that
way, the UI has to either already know all of the DS names or it has to
discover them as you poke around.  So I think we'd need a cache of all
the DS names in addition to (or in place of) the playlist cache you
mention.
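
To make the idea concrete, here is a minimal sketch of such a cache,
again in Python just for illustration.  It relies on 'rrdtool info'
printing lines of the form `ds[NAME].type = ...`; the names
`parse_ds_names`, `rrdtool_info` and `DSCache` are made up for this
sketch, and a real cache would probably also need to be persisted and
invalidated when files change:

```python
import re
import subprocess

# Matches lines such as: ds[read].type = "COUNTER"
_DS_LINE = re.compile(r"^ds\[([^\]]+)\]\.")


def parse_ds_names(info_text):
    """Extract DS names, in order of first appearance, from 'rrdtool info' output."""
    names = []
    for line in info_text.splitlines():
        match = _DS_LINE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def rrdtool_info(path):
    """Run 'rrdtool info' on one file and return its output as text."""
    result = subprocess.run(
        ["rrdtool", "info", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


class DSCache:
    """In-memory cache of DS names, keyed by RRD file path (hypothetical helper)."""

    def __init__(self, fetch_info=rrdtool_info):
        self._fetch_info = fetch_info
        self._names = {}

    def ds_names(self, path):
        # Only the first lookup for a file pays for an 'rrdtool info' call.
        if path not in self._names:
            self._names[path] = parse_ds_names(self._fetch_info(path))
        return self._names[path]
```

The UI would ask the cache for a file's DS names as you poke around, so
drawing many graphs for one host costs at most one 'rrdtool info' per
file rather than one per graph.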