[collectd] collected 4.10.1 stops writing and high CPU

Fabien Wernli collectd at faxm0dem.org
Thu Jan 17 13:57:21 CET 2013


Hi,

On Wed, Jan 16, 2013 at 11:03:00PM +1030, Jesse Reynolds wrote:
> We are not using rrdcached. I was aware of it but not in any detail, thanks for the recommendation. I can see that even just decoupling the RRD file writes to a separate process has big benefits, eg just being able to restart collectd without triggering a flush of all RRD files. I'll look at using it on the next build. 

I am currently moving my collectd-rrdtool servers to collectd-rrdcached.
We have a relatively large setup (roughly a thousand collectd clients, with
4 load-balanced servers).
The main reasons for this move are:

1) as you mentioned: decoupling collectd and rrd writes, very handy when
   restarting collectd
2) getting up-to-date graphs without having to change code (setting
   RRDCACHED_ADDRESS)
3) finer control over rrd cache settings (e.g. possibility to set the number
   of write threads)
4) performance increase ?
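
To illustrate point 2), here is a minimal Python sketch of how a graphing
script could be pointed at rrdcached without touching its rrdtool calls.
rrdtool documents the environment variable as RRDCACHED_ADDRESS; the socket
path and the helper name rrdtool_env below are only assumptions for the
example, not taken from our setup.

```python
import os


def rrdtool_env(daemon_address, base_env=None):
    """Return a copy of the environment with RRDCACHED_ADDRESS set.

    rrdtool commands started with this environment behave as if
    --daemon <daemon_address> had been given on the command line.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RRDCACHED_ADDRESS"] = daemon_address
    return env


# Hypothetical socket path; use whatever address your rrdcached listens on.
env = rrdtool_env("unix:/var/run/rrdcached.sock", base_env={"PATH": "/usr/bin"})
print(env["RRDCACHED_ADDRESS"])
```

The resulting dictionary can be passed as env= to subprocess.run() when
calling rrdtool graph, so pending updates get flushed before the graph is
drawn.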

The main problems so far:

a) syslog saturation (gazillions of rrdcached update -1 messages, especially
   shortly after rrdcached startup)
b) rrdcached tuning can be tricky, mainly due to c)
c) the system's load and I/O performance can be quite chaotic upon rrdcached
   startup: everything runs smoothly at first, then comes some I/O hell,
   and later everything settles back into a somewhat stable state. This was
   much smoother with the rrdtool plugin.

We addressed a) by removing the syslog plugin :-/.
As far as performance is concerned, I believe the gain is reasonable:
server load tends to go down (from 2.5 to 0.7, averaged over a day), IOPS
went down a little, but are less stable than before (we see some beating,
similar to yours). Also, CPU usage seems much more evenly spread among cores,
with considerably less io_wait (we're using 6 rrdcached write threads).
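
For anyone wanting to compare io_wait before and after such a move, here is
a rough Python sketch that computes the io_wait share between two samples of
the aggregate "cpu" line from /proc/stat on Linux. The function name and the
sample values are made up for illustration; they are not our measurements.

```python
def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' lines.

    Both arguments are the aggregate 'cpu' line of /proc/stat, whose fields
    are: user nice system idle iowait irq softirq steal [guest guest_nice].
    """
    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Guest time is already counted in user time, so keep 8 fields only.
        return [int(x) for x in parts[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


# Made-up samples, taken some interval apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  200 0 100 1500 200 0 0 0 0 0"
print(iowait_share(sample_before, sample_after))
```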

The bottom line for me is: as of this writing, go for rrdcached, for
the benefit of being able to restart collectd without leaving holes behind
in your graphs. But once collectd has proper reload functionality,
and maybe better write threading, I may come back to the rrdtool plugin.
Don't forget that the code in rrdcached actually comes from collectd's
rrdtool plugin, so you get the caching both ways.

Hope this helps



