[collectd] terrible performance of collectd

Sun Nov 15 18:35:54 CET 2009

Hi Israel

2009/11/15 Israel Garcia <igalvarez at gmail.com>:
> Hi list, I'm running collectd 4.7.2 on a xen  domU(debian lenny) and
> on a Dual Intel 1.4GHz, RAID1 with 2x36GB SCSI 10,500RPM . I'm
> collecting information of more than 100 servers (a lot of rrd files),
> so cpu load is always at 100% iowait, the load is always over 3, disk
> is doing over 400 IOPS and 3MB/s throughput. No more domU's are running
> in this dom0 server. How can I improve the performance in this server?
> Can you help me?

The first step could be tuning the filesystem. In my case (I'm collecting
data from more than 150 servers), I mount the rrd directory with the options
"data=writeback,commit=60,noatime,nodiratime". This improves the disk I/O
a bit.

The next step is tuning the rrdtool plugin, or rrdcached if you are using
it. In my case I use the rrdtool plugin directly, with the following values:

CacheFlush    7200
CacheTimeout  900
RandomTimeout 10

You can find information about these parameters in the collectd.conf(5) man
page. RandomTimeout was committed to collectd only recently and I'm not
sure whether it is available in version 4.7.2 :(

Finally, as Josef says, check the interval in your config. In my case
we have an aggressive interval of 10s (we require "real time monitoring" in
some situations), but values of 30s or greater are usually enough for
normal collection.
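
To put rough numbers on the cache and interval settings, here is a small
back-of-the-envelope sketch. It assumes every RRD file gets one value per
interval and that cached values are written in one batch once they are
CacheTimeout seconds old. The figure of 50 RRD files per host is purely
hypothetical, and the function name is only for illustration. Real disk
load also depends on the filesystem and on how many I/O operations each
RRD update costs.

```python
def rrd_writes_per_second(rrd_files, interval_s, cache_timeout_s=0):
    """Rough average number of RRD file writes per second.

    Assumes one value per file per interval, and that cached values
    are flushed to disk in a single write once they are
    cache_timeout_s seconds old (0 means no caching).
    """
    seconds_between_writes = max(interval_s, cache_timeout_s)
    return rrd_files / seconds_between_writes


# Hypothetical load: 100 hosts with 50 RRD files each (assumed figure).
RRD_FILES = 100 * 50

for interval in (10, 30):
    uncached = rrd_writes_per_second(RRD_FILES, interval)
    cached = rrd_writes_per_second(RRD_FILES, interval, cache_timeout_s=900)
    print(f"interval {interval:>2}s: {uncached:6.1f} writes/s uncached, "
          f"{cached:4.1f} writes/s with CacheTimeout 900")
```

Under these assumptions a 10s interval without caching means 500 file
writes per second, while a CacheTimeout of 900 brings that down to fewer
than 6, each write carrying a batch of values instead of a single one.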

One more thing: the Xen hypervisor, AFAIK, routes the I/O operations of the
VMs (domUs) through dom0, so if you have a number of VMs that work the disk
hard you could have a big bottleneck in the host. I have not worked much
with Xen, but I have heard creepy stories about how I/O is handled for the
domUs... :(

Regards,
  Andres