[collectd] Monitoring of a massive amount of sensors

Endre Szabo collectd-list at urbnet.hu
Sun Jul 17 20:29:02 CEST 2011


Dear Collectd list,

I'd like to port my monitoring solution to collectd, but I've run into
problems; I suspect my current approach is wrong. What is the best way
to achieve the following?

I have 2 application servers, and I'd like to monitor 3850 sensors on
each, 7700 in total. Right now a Perl script runs from cron at a
5-minute interval, collects this data, and writes the gathered values
as CSV in the following format:

timestamp,hostname,sensor_name,sensor_value
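To make the format concrete, here is one such line and how it parses (a
Python sketch; the timestamp, host, and value shown are made up):

```python
import csv
from io import StringIO

# One illustrative line in the cron script's output format:
sample = "1310917139,appserver1,system/subtree/property,3921\n"

rows = [
    (int(ts), host, name, float(val))
    for ts, host, name, val in csv.reader(StringIO(sample))
]
print(rows[0])
```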

The sensors are GAUGE types, with values ranging from 0 to roughly
200000. One problem I'm facing is the 63-character limit on the type
instance string: several of my sensors' names are longer than that.
Hashing the type instance names might do the job, but then I'd have to
keep a record of the hash-to-name mapping somewhere else. The sensor
naming scheme is in fact a serialization of a tree structure, with
nodes separated by '/' ('/' is another character the type instance
string can't contain, so I started using '\' instead).
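The hashing idea would look roughly like this (a Python sketch; the
helper name is mine, and the dict stands in for whatever external
record I'd keep of the hash-to-name mapping):

```python
import hashlib

MAX_TYPE_INSTANCE = 63  # the type instance length limit mentioned above

def type_instance_for(sensor_name, mapping):
    """Map a sensor name to a collectd-safe type instance.

    Short names pass through with '/' swapped for '\\'; names over the
    limit are replaced by their SHA-1 hex digest (40 chars), and the
    digest -> original-name pair is recorded for later lookup.
    """
    safe = sensor_name.replace("/", "\\")
    if len(safe) <= MAX_TYPE_INSTANCE:
        return safe
    digest = hashlib.sha1(sensor_name.encode()).hexdigest()
    mapping[digest] = sensor_name
    return digest

mapping = {}
long_name = "system/" + "x" * 80            # longer than 63 chars
print(type_instance_for("system/load", mapping))   # system\load
print(len(type_instance_for(long_name, mapping)))  # 40
```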

I've tried to push values to collectd this way (perl script talks to
collectd via unix socket):

PUTVAL emon/appserver1/system\somesystemsubtree\somesystemsubsystem\somesubsystemproperty interval=300 1310917139:3921
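The writer side is roughly equivalent to this Python sketch; reading
the daemon's one-line status reply after each PUTVAL, instead of
writing blindly, would at least throttle the sender to collectd's pace
(the identifier, socket path, and helper names here are made up):

```python
import socket

def format_putval(identifier, interval, timestamp, value):
    # Build one PUTVAL command line for the unixsock plugin;
    # quoting the identifier keeps unusual characters intact.
    return f'PUTVAL "{identifier}" interval={interval} {timestamp}:{value}\n'

def send_putval(sock_path, line):
    # Write the command, then block on the daemon's status reply
    # ("0 Success: ..."), so the script never outruns collectd.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(line.encode())
        reply = s.makefile().readline()
    return reply.startswith("0")

line = format_putval("emon/appserver1/gauge-sensor", 300, 1310917139, 3921)
```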

The problem with this is that collectd can't keep up with the input on
the socket and either blocks or truncates the command lines; I see
truncated type instance names in the rrd/csv directories. I guess this
is not the recommended method anyway, especially since collectd has
its own Perl and Python bindings.

What do you guys think?
--
end.re

PGP: http://end.re/endre.gpg
PGP FP: 090B 77BC 2055 8306 5B3A  C635 00DB 7F46 4AAB 7A78


