[collectd] cpu wait time on collectd server

eric fauser ef_cd at apa.at
Wed Sep 12 17:36:39 CEST 2007


Hi

 > Athlon 64 3200, 1Gb ram running gentoo In the moment it
 > receives stats from 30 hosts. In total there are 1426 rrd files.

Our setup: 70 hosts (3556 RRD files) report to a server with
2x dual-core 3.2GHz CPUs, a disk backend of 6x 36GB SAS disks (RAID5)
and 8GB of memory for the page cache. Note that we are running
collectd3 at the moment.

 > large cpu wait times averaging about 70%. it must be waiting
 > on network IO because disk write throughput is only ~1Mb/sec,

When we had only 2GB of RAM, we first ran into a UDP kernel buffer
problem (visible with netstat -su) and then into a disk bottleneck:
physical read IO rose to 50MByte/sec and CPU IO wait went
from 20% to 60%.
The only way out was to dramatically increase the memory available
to the page cache (in our case to 8GB).
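
To check for the UDP buffer problem without reading the netstat -su
output by eye, something like the following might help. It is only a
sketch: it assumes a Linux host where /proc/net/snmp exposes the same
UDP counters that netstat -su prints, and the function names are made
up for this example. Counter names such as RcvbufErrors depend on the
kernel version and may be missing.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file holds pairs of lines per protocol: a header line with the
    counter names and a second line with the values.
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]  # does not match "UdpLite:"
    if len(udp_lines) < 2:
        return {}
    header, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the UDP counters (assumed path, Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

Calling read_udp_counters() twice, a minute apart, and comparing
InErrors (and RcvbufErrors, where present) shows whether the receive
buffer is still dropping datagrams.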

So, IMHO, with collectd4 the solution should be:
 1.) use rrd-cachetimeout
 2.) use a very fast disk backend
 3.) add as much RAM as possible ;)
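
A back-of-the-envelope sketch of why point 1 should help. The model is
an assumption, not a measurement: every RRD file gets one update per
collection interval when nothing is cached, and one batched update per
cache timeout when values are cached. The 10 second interval and the
120 second timeout below are example values only, and the function
name is hypothetical.

```python
def rrd_writes_per_second(num_files, interval_s, cache_timeout_s=0.0):
    """Rough estimate of RRD file updates hitting the disk per second.

    Without caching each file is updated once per interval. With a
    cache timeout longer than the interval, values are held back and
    each file is updated roughly once per timeout instead.
    """
    if num_files < 0 or interval_s <= 0 or cache_timeout_s < 0:
        raise ValueError("need num_files >= 0, interval > 0, timeout >= 0")
    seconds_between_writes = max(interval_s, cache_timeout_s)
    return num_files / seconds_between_writes
```

With our 3556 files, rrd_writes_per_second(3556, 10) comes to roughly
356 file updates per second, while rrd_writes_per_second(3556, 10, 120)
comes to roughly 30. Each batched update carries more values, but the
number of files touched per second drops by about a factor of 12.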

eric


