[collectd] Scalability

Florian Forster octo at verplant.org
Thu Nov 15 10:11:36 CET 2007

Hi Petri,

On Thu, Nov 15, 2007 at 12:13:08AM -0800, Petri Jarre wrote:
> Even without the rrdtool plugin this seems to be too much on a single
> Pentium machine. I have been using the exec plugin with a simple C
> program that spews out fake numbers for 10000 "cpu's".

hm, it's hard to say anything about that without any numbers.. :/

I've done the same thing just now: I wrote a small shell script which
spits out fantasy values for 100000 ``CPUs'' as fast as it can and fed
that to collectd's exec plugin. Since you want to access the values
somehow, I've loaded the `unixsock' plugin, which uses a binary search
tree to store the values and provides a UNIX socket you can use to query
the values as you need them.
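
A generator of that kind could look roughly like the sketch below. It is
written in Python rather than shell and is not the script used for the
test. It assumes the exec plugin reads lines of the form
`PUTVAL identifier time:value' from the program's standard output; the
host name `testhost' and the `cpu-N/cpu-idle' identifiers are made up
for illustration.

```python
import random
import sys
import time


def putval_lines(count, host="testhost", now=None):
    """Yield one PUTVAL line per fake CPU.

    Assumption: the line format is "PUTVAL host/plugin-instance/type-instance
    time:value". The host name and identifiers are hypothetical.
    """
    timestamp = int(time.time()) if now is None else int(now)
    for cpu in range(count):
        # Fantasy value: a random "idle" percentage between 0 and 100.
        value = random.uniform(0.0, 100.0)
        yield "PUTVAL %s/cpu-%d/cpu-idle %d:%f" % (host, cpu, timestamp, value)


if __name__ == "__main__":
    # Number of fake CPUs; defaults to the 100000 used in the test above.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 100000
    for line in putval_lines(count):
        sys.stdout.write(line + "\n")
    sys.stdout.flush()
```

Run once, this prints one line per fake CPU and exits; a real feeder
would repeat the loop for every interval.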

The results on my machine (see below for specs) are:
Dispatching 100000 values through the exec plugin takes a little less
than 30 seconds, i.e. about 3350 values/second. So collecting 40k
datapoints from 8 servers should be possible at a 120 second interval.
(I'm using the current development version, which has a cache inside
collectd AND inside the unixsock plugin. The cache in the unixsock
plugin will most likely be discarded before the next release.)
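
The arithmetic behind that estimate can be spelled out as a small Python
sketch. It reads ``40k datapoints from 8 servers'' as 40k datapoints per
server, which is an assumption, and it takes the measured rate of about
3350 values/second at face value.

```python
def values_per_second(values, seconds):
    """Throughput for a given number of values dispatched in `seconds`."""
    return values / seconds


def seconds_needed(servers, datapoints_per_server, rate):
    """Time to dispatch one full round of datapoints at `rate` values/s."""
    return servers * datapoints_per_server / rate


# 100000 values in exactly 30 s is a lower bound on the measured rate,
# since the test took a little less than 30 s.
lower_bound = values_per_second(100000, 30)

rate = 3350.0  # values/second, the figure quoted above
needed = seconds_needed(8, 40000, rate)  # assumption: 40k per server

print("lower bound on rate: %.0f values/s" % lower_bound)
print("time for one round:  %.1f s" % needed)
print("fits into 120 s:     %s" % (needed < 120))
```

Under these assumptions one round of 320000 values takes about 95.5
seconds, which leaves some headroom in a 120 second interval.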

The machine I tested on:
  AMD Athlon(tm) XP 2100+ (1733 MHz, 256 kByte cache)
  786 MByte memory at 266 MHz
(So, all in all, just your everyday, slightly outdated workstation)

> I looked at collectd because it had a very promising mindset, but it
> looks like it does not match the task I have at hand after all. Am I
> missing some clever way of using collectd for this, or is this just
> too far from collectd's original purpose?

You could try to use the UNIX socket provided by the unixsock plugin to
dispatch the values. I have no idea if sockets are faster than pipes,
but it might be worth a try. Other than that: As far as I understand
what you're trying to achieve, it sounds like the unixsock plugin is
your friend: You can easily put data in and, whenever needed, get the
values out again. Since the data is kept in memory, this will probably
not be IO-bound (during my test the instance had an RSS of 50 MByte).
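
Talking to that socket could be sketched in Python as follows. The
sketch assumes a line-based text protocol in which each reply starts
with a status line `<number> <message>', where a positive number gives
the count of lines that follow and a negative number signals an error.
The socket path, the class name `UnixsockClient' and the identifiers
in the usage note are placeholders; check them against your
configuration.

```python
import socket


class UnixsockClient:
    """Tiny client for a line-based command socket (hypothetical helper)."""

    def __init__(self, sock):
        self.sock = sock
        # One buffered reader per connection so no reply data gets lost.
        self.reader = sock.makefile("r", encoding="ascii", newline="\n")

    def command(self, line):
        """Send one command and return (status, message, extra_lines)."""
        self.sock.sendall((line + "\n").encode("ascii"))
        status_line = self.reader.readline().rstrip("\n")
        code_text, _, message = status_line.partition(" ")
        status = int(code_text)
        # Assumption: a positive status is the number of lines that follow.
        extra = [self.reader.readline().rstrip("\n")
                 for _ in range(max(status, 0))]
        return status, message, extra


def connect(path="/var/run/collectd-unixsock"):
    """Connect to the socket; the default path is an assumption."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return UnixsockClient(sock)
```

With a running instance, `connect().command("GETVAL testhost/cpu-0/cpu-idle")'
would ask for one value, and a `PUTVAL ...' line sent the same way
would put one in.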

Hope this helps. If I missed your point or your tests show something
different from my findings, please let me know.

Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D