[collectd] terrible performance of collectd
Andrés J. Díaz
ajdiaz at connectical.com
Sun Nov 15 18:35:54 CET 2009
2009/11/15 Israel Garcia <igalvarez at gmail.com>:
> Hi list, I'm running collectd 4.7.2 on a Xen domU (Debian lenny) and
> on a Dual Intel 1.4GHz, RAID1 with 2x36GB SCSI 10,500RPM. I'm
> collecting information from more than 100 servers (a lot of rrd files),
> so cpu load is always at 100% iowait, the load is always over 3, disk
> is doing over 400 IOPS and 3MB/s throughput. No more domUs are running
> on this dom0 server. How can I improve the performance of this server?
> Can you help me?
The first step could be setting up the filesystem properly. In my case
(I'm collecting data from more than 150 servers), I mount the rrd
directory with "data=writeback,commit=60,noatime,nodiratime". This
improves disk I/O a bit.
The next step is tuning the rrdtool plugin, or rrdcached if you are using
that instead. In my case I use rrdtool directly, with the following values:
You can find information about these parameters in the collectd.conf(5) man page.
The RandomTimeout option was committed to collectd only recently and I'm not
sure if it is available in version 4.7.2 :(
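
As a rough illustration of why the caching parameters matter, here is a small sketch that estimates how many RRD file updates per second reach the disk with and without a cache timeout. The 50 RRD files per host and the 120s timeout are assumptions made up for the example, not values from this thread, and the model ignores explicit flushes and the kernel page cache.

```python
def rrd_writes_per_second(hosts, files_per_host, interval_s, cache_timeout_s=0):
    """Rough estimate of RRD file updates hitting the disk each second.

    Without caching, every file is updated once per collection interval.
    With a cache timeout, values are batched and each file is written
    roughly once per timeout period instead.
    """
    total_files = hosts * files_per_host
    # The effective write period is whichever is longer: interval or timeout.
    period = max(interval_s, cache_timeout_s)
    return total_files / period


# Assumed numbers: 100 hosts, 50 RRD files per host (hypothetical), 10s interval.
uncached = rrd_writes_per_second(100, 50, 10)
cached = rrd_writes_per_second(100, 50, 10, cache_timeout_s=120)
print(f"uncached: {uncached:.0f} updates/s, cached: {cached:.1f} updates/s")
```

Under these assumptions the uncached rate is in the same range as the IOPS reported above, which is why batching writes in the plugin cache is usually the first thing to try.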
Finally, as Josef says, check the interval in your config. In my case
we have an aggressive interval of 10s (we require "real time monitoring" in
some situations), but usually values of 30s or greater will be enough for a
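
To get a feeling for how much the interval alone changes the load, the following sketch prints the update rate for a few interval values. The total of 5000 RRD files is a hypothetical figure (the thread does not say how many files exist), and the model assumes one update per file per interval with no caching at all.

```python
def updates_per_second(total_rrd_files, interval_s):
    """Each RRD file receives one update per collection interval."""
    return total_rrd_files / interval_s


TOTAL_FILES = 5000  # hypothetical: the real number of files is not given

for interval in (10, 30, 60):
    rate = updates_per_second(TOTAL_FILES, interval)
    print(f"Interval {interval:>2}s -> {rate:6.1f} updates/s")
```

Going from 10s to 30s cuts the number of updates to a third in this simple model, at the price of coarser data.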
One more thing. The Xen hypervisor, AFAIK, routes the I/O operations of the
VMs through dom0, so if you have a number of VMs that work the disk hard you
could have a big bottleneck on the host. I have not worked much with Xen, but I
have heard creepy stories about I/O management in dom0.. :(