[collectd] Shouldn't collectd also provide an interface to the data?

Sun Sep 13 15:44:02 CEST 2009

Heya,

On Sun, Sep 13, 2009 at 6:40 PM, Lindsay Holmwood
<lindsay at holmwood.id.au> wrote:
> G'day,
>
> 2009/9/13 Spike Spiegel <fsmlab at gmail.com>:
>>
>> The main problem I see is that new frontends and GUIs come up tied to
>> some application the developer is using and while that's normal I
>> think it doesn't help the cause as much as it could. Effectively what
>> is being discussed here is a generic toolkit to interact with/graph
>> data stored by a monitoring system, and also focusing exclusively on
>> rrd is possibly limiting.
>>
>
> By "application", do you mean the statistics collection app like
> collectd or Ganglia, or the data abstraction like Visage?

The former. Ganglia has its own frontend, and so do collectd, Cacti,
Nagios, etc. They all get data into rrd in some way and then build
their own UI on top of it, which means lots of duplicated code.

>
> I'm all for building a generic solution to the world's graphing
> problems, but for the time being I think the focus should be on
> improving the graphs of individual statistic collection apps, then
> genericise later.

I disagree, but of course it's just an opinion. Personally I think
it's this approach that led to the vast amount of duplication we have
today: by the time you have something that works for one app, people
will have built all sorts of hacks on top of it, so changes won't be
possible/welcome. Rinse and repeat.

Don't get me wrong, I understand the perils of generic frameworks and
I don't want to overengineer it, yet I think that design and
implementation should be completely agnostic.

> There are enough inconsistencies in the collectd
> data to keep me busy for a while. :-)

I think the crucial point here is to think of this as a plugin versus
an application. Collectd's inconsistencies should be addressed in the
collectd module of the data webservice.

> People will have strong opinions on how they want their data graphed.
> Some will think JS is insane, while others will think that server side
> rendered images are too brittle.

And that is why we split the data services from the UI, so that people
will be able to build whatever UI they want in whatever language they
want, but implementing the same API, which also means that for example
you can use JS on a browser, but switch to the server generated images
for a phone or something.
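
To make that split concrete, here is a rough sketch in Python. Every
name in it (Sample, DataService, InMemoryService, as_json,
as_text_bars) is made up for illustration and is not an existing
collectd or Visage API; the in-memory backend only stands in for
whatever would actually read the rrd files.

```python
import json
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Sample:
    host: str
    metric: str
    timestamp: int
    value: float


class DataService(Protocol):
    """Hypothetical API that every backend would implement."""

    def fetch(self, host: str, metric: str, start: int, end: int) -> List[Sample]:
        ...


class InMemoryService:
    """Stand-in backend; a real one would read rrd files or similar."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = samples

    def fetch(self, host: str, metric: str, start: int, end: int) -> List[Sample]:
        return [s for s in self._samples
                if s.host == host and s.metric == metric
                and start <= s.timestamp <= end]


def as_json(samples: List[Sample]) -> str:
    """Payload a browser-side JS grapher could consume."""
    return json.dumps([[s.timestamp, s.value] for s in samples])


def as_text_bars(samples: List[Sample], width: int = 20) -> str:
    """Crude server-side rendering, standing in for a generated image."""
    peak = max((s.value for s in samples), default=0.0) or 1.0
    return "\n".join(
        f"{s.timestamp} " + "#" * round(s.value / peak * width)
        for s in samples)
```

Both consumers call the same fetch() and never touch the storage, so
swapping the browser view for the server-rendered one is a change on
the consumer side only.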

>
> Where the graphs are rendered will strongly influence how reusable
> the code is. Client side code won't be usable on the server side, and
> vice versa.

agreed, but this has nothing to do with having an independent web
service that abstracts your data into a sane API that different
consumers can access to build all sorts of graphs.

> I'm part way there with Visage right now. The JavaScript that
> generates the graphs (visageGraph) is abstracted away from the fetch
> phase (visageBase). I've done it this way so I can easily build
> different graph types while utilising the same fetch code.
>
> Right now the fetch code is tied to the Visage JSON api, but it
> wouldn't take a lot to refactor if someone wants to take up that
> mantle.

Have you seen the latest thread on the rrd ml? They are talking about
building a JSON option into rrdxport/graph (there are some
problems there). This is also why having things modular matters: it's
easy to swap them out.

>> indeed, so here something has to be done, either by writing a web
>> service like visage or maybe using rrdcached, point again is, it's
>> irrelevant to the UI and graphs generation
>
> Actually it's pretty relevant, if you don't want god-awful
> normalisation in your graphing code.

I've lost you here, sorry. I don't see why normalizing the data is
relevant to the UI, that should all happen in the backend, that's the
point of it. All the UI sees is a tuple (host,metric,timestamp) or
something along those lines.

> I've spent the last week battling inconsistencies in the way collectd
> stores its RRD data. collectd uses 2 different conventions:
>
> plugin/plugin_instance => datasource, e.g.
> cpu-0/cpu-load => value
> load/load => shortterm, midterm, longterm
>
> Even within that, plugins log data slightly differently (e.g. multiple
> plugin instances relating to a single plugin, like the cpu plugin),
> and the processing phase gets tricky, regardless of whether the code
> is client or server side.

Sure, and since it's specific to the collection application, that's
dealt with in the relevant module of the data webservice, so that the
inconsistencies in the way different apps handle rrds can be
dealt with appropriately and cleanly.
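
As a rough sketch of what such a per-application module could look
like, here is some Python that flattens the two collectd layouts quoted
above into one metric name per datasource. The dotted naming scheme,
the function names and the NORMALIZERS registry are all my own
assumptions for illustration, not anything collectd or Visage ships.

```python
from typing import Callable, Dict, List


def collectd_metrics(rrd_path: str, datasources: List[str]) -> List[str]:
    """Flatten 'plugin[-instance]/type' plus a datasource into one name."""
    plugin_part, type_part = rrd_path.split("/", 1)
    plugin, _, instance = plugin_part.partition("-")
    names = []
    for ds in datasources:
        parts = [plugin]
        if instance:
            parts.append(instance)
        # Skip the type when it merely repeats the plugin name (load/load).
        if type_part != plugin:
            parts.append(type_part)
        # A lone datasource called "value" carries no extra information.
        if ds != "value":
            parts.append(ds)
        names.append(".".join(parts))
    return names


# One normalizer per collection app; a Ganglia module would register here
# with its own rules (hypothetical registry, not an existing API).
NORMALIZERS: Dict[str, Callable[[str, List[str]], List[str]]] = {
    "collectd": collectd_metrics,
}


def normalize(app: str, rrd_path: str, datasources: List[str]) -> List[str]:
    """Dispatch to the module that knows the app's on-disk conventions."""
    return NORMALIZERS[app](rrd_path, datasources)
```

With this, the UI asks for "load.shortterm" or "cpu.0.cpu-load" and
never learns which convention the file on disk followed.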

> I don't know what form Ganglia's RRDs are, but I assume there are some
> corner cases that will make it difficult.

there are some indeed, and that's fine, different apps/devels make
different choices on how to implement stuff.

>>
>>> Yeah but then for keeping it in some kind of db
>>
>> I lost you here, sorry, why would you want to keep the data in some kind of db?
>
> If you don't want large RRDs containing lots of historical data.

and instead you want large DBs and have to implement your own purge and stuff?

>>>
>>> And if you're interested only in actual data you can run collectd on
>>> "graph" node and get data using unixsock plugin
>>
>> what about historical data? the graph node shouldn't be active 24/7,
>> graphs should be on demand. Am I missing something here?
>
> See above.

As long as you're fine with not having *all* data, rrd is a better
solution. Of course there is the argument for being able to drill down
to the minute at any point in time, but that's a different discussion.
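
For anyone wondering why rrd doesn't need a purge job, here is a toy
model of the idea in Python: a fixed number of rows, each one the
average of several raw samples, with the oldest row falling off the
end. ToyArchive and its parameters are invented for this sketch; it is
not how rrdtool is actually implemented, only the general shape.

```python
from collections import deque
from statistics import mean
from typing import Deque, List


class ToyArchive:
    """Fixed-size archive: every `step` raw samples collapse into one average."""

    def __init__(self, rows: int, step: int) -> None:
        # deque(maxlen=...) silently drops the oldest row once it is full.
        self.rows: Deque[float] = deque(maxlen=rows)
        self.step = step
        self._pending: List[float] = []

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self.step:
            self.rows.append(mean(self._pending))
            self._pending.clear()
```

Storage never grows past `rows` entries, and the price is exactly the
one mentioned above: the individual samples behind each average are
gone.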

> Opinions? Let's keep the discussion going!

thumbs up, let's keep it going indeed.

Spike

-- 
"Behind every great man there's a great backpack" - B.