[collectd] terrible performance of collectd
igalvarez at gmail.com
Sun Nov 15 20:28:59 CET 2009
On 11/15/09, Andrés J. Díaz <ajdiaz at connectical.com> wrote:
> Hi Israel
> 2009/11/15 Israel Garcia <igalvarez at gmail.com>:
>> Hi list, I'm running collectd 4.7.2 on a Xen domU (Debian lenny) on
>> a dual Intel 1.4GHz host with RAID1 on 2x36GB SCSI 10,500RPM disks. I'm
>> collecting data from more than 100 servers (a lot of RRD files),
>> so the CPU is always at 100% iowait, the load is always over 3, and the
>> disk is doing over 400 IOPS and 3MB/s throughput. No other domUs are
>> running on this dom0 server. How can I improve the performance of this
>> server? Can you help me?
> The first step could be setting up the filesystem properly. In my case
> (I'm collecting data from more than 150 servers), I mount the rrd
> directory with "data=writeback,commit=60,noatime,nodiratime". This
> will improve the disk I/O a bit.
> The next step is tuning the rrdtool plugin, or rrdcached if you are using
> that instead. In my case I use the rrdtool plugin directly, with the
> following settings:
> CacheFlush 7200
> CacheTimeout 900
> RandomTimeout 10
> You can find information about these parameters in the collectd.conf(5)
> man page. RandomTimeout was committed to collectd recently and I'm not
> sure if it is available in version 4.7.2 :(
OK, I'll check whether it is available or not. I'll let you know. :-)
> Finally, as Josef says, check the interval in your config. In my case
> we have an aggressive interval of 10s (we require "real time monitoring" in
> some situations), but usually values of 30s or greater will be enough for
> normal collection.
I've already changed Interval to 60 but the performance was still not good.
Do you think that with an Interval over 60 there won't be any loss of data?
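
To put rough numbers on the Interval and CacheTimeout settings discussed above, here is a back-of-the-envelope sketch. It assumes every RRD file is written once per Interval when nothing is cached, and roughly once per CacheTimeout when the rrdtool plugin cache is in use. The figure of 40 RRD files per host is purely hypothetical, and updates_per_second is just an illustrative helper, not part of collectd.

```python
# Rough estimate of RRD update rates. All figures below are hypothetical.

def updates_per_second(rrd_files: int, seconds_between_writes: float) -> float:
    """Average number of RRD file updates per second, assuming every file
    is written once every `seconds_between_writes` seconds."""
    if seconds_between_writes <= 0:
        raise ValueError("seconds_between_writes must be positive")
    return rrd_files / seconds_between_writes

# Hypothetical setup: 100 hosts with 40 RRD files each (assumed, not measured).
RRD_FILES = 100 * 40

# No caching: every file is updated once per Interval.
for interval in (10, 60):
    rate = updates_per_second(RRD_FILES, interval)
    print(f"Interval {interval:>3}s, no cache: {rate:7.1f} updates/s")

# With the rrdtool plugin cache: roughly one write per file per CacheTimeout.
cached = updates_per_second(RRD_FILES, 900)
print(f"CacheTimeout 900s:         {cached:7.1f} updates/s")
```

Under these assumptions, raising Interval only reduces the write rate in proportion to the interval, while a CacheTimeout of 900 cuts the number of file updates much further, because each write then carries many cached values at once. The trade-off is that values still held in the cache have not reached disk yet.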
> One more thing. The Xen hypervisor, AFAIK, routes the I/O operations from
> the VMs through dom0, so if you have a number of VMs that work the disk
> hard you could have a big bottleneck on the host. I haven't worked much
> with Xen, but I've heard creepy stories about the I/O management of domUs.. :(
You're right, but in my case I have only one domU (collectd) on my
server. Nothing else is running on dom0 either.
Thanks again, Andrés,