[collectd] collectd versus rrdcollect-remote rrdtool IO

Tue Jan 20 09:17:25 CET 2009

Hi Greg,

On Mon, Jan 19, 2009 at 10:46:49AM +0100, Greg wrote:
> Now please compare both CPU graph generated by Drraw: one with rrd
> from rrdcollect, and one with rrd from collectd. The "collectd" one
> doesn't have all data !! Because of disk IO. Seems that rrdcollect is
> better with IO performance, or my collectd isn't tuned for IO perf...

I'm confident that once a value has made it into the system, it will
get written to an RRD file. I have run collectd in some high-IO
situations and have never seen this happen. So values aren't ``lost''
inside collectd.

That doesn't mean that high-IO situations don't come with potential
problems. As I've written in [0], I have encountered two problems so far:

 1) Overflow of the (UDP) receive-buffer.
 2) A corrupted RRD file causing collectd to quit ungracefully.

If you have gaps in all graphs except the ones the server collects about
itself, the first point is the most likely cause. I've made the changes
I was talking about in [0]; you can get the code by checking out the Git
repository or by using Sebastian's snapshots. If you see collectdmon
restarting collectd frequently, the second case is very likely.
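One way to check whether the first case applies, sketched for Linux only:
the kernel keeps system-wide UDP counters in /proc/net/snmp, and on
reasonably recent kernels `RcvbufErrors` counts datagrams dropped because
a socket's receive buffer was full (older kernels may not report that
field). The helper name `udp_errors` is hypothetical. A counter that keeps
growing while gaps appear in the graphs points at the receive buffer.

```shell
#!/bin/sh
# Hypothetical helper: print the UDP error counters from /proc/net/snmp
# (Linux only). The file has two "Udp:" lines: field names, then values.
# Counters are system-wide, not specific to collectd's socket.
udp_errors() {
  awk '
    $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
    $1 == "Udp:" && seen {
      for (i = 2; i <= NF; i++)
        if (name[i] == "InErrors" || name[i] == "RcvbufErrors")
          print name[i] "=" $i
    }
  ' "${1:-/proc/net/snmp}"
}

# Run twice a few minutes apart and compare the numbers.
if [ -r /proc/net/snmp ]; then
  udp_errors
fi
```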

The guess that collectd isn't ``tuned for IO perf'' borders on a
personal insult. Please read [1], which explains all the trouble
collectd goes through to provide the best IO performance there is.
(To my knowledge, only the RRD accelerator `rrdcached' is comparable.)

Since you have this
> <Plugin rrdtool>
>        DataDir "/var/lib/collectd/rrd"
>        CacheTimeout 120
> </Plugin>
in your config, collectd will write to each file only once every 120
seconds. rrdcollect, however, writes to each file every 10 seconds. The
critical number here is the number of updates per second, not the
amount of data put into the files. It appears you have both tools
running on the same machine at the same time, so my guess is that
rrdcollect is saturating the machine with IO and collectd is not
scheduled often enough to read incoming packets, so the UDP
receive buffer overflows. Try increasing the receive buffer and
restarting collectd:
  # sysctl -w net.core.rmem_max=8388608
  # sysctl -w net.core.rmem_default=4194304
  # /etc/init.d/collectd restart
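To put rough numbers on the updates-per-second argument, here is a small
sketch. It assumes each RRD file receives one update per write interval
and that rrdcollect maintains a comparable number of files; `DATADIR`
defaults to the DataDir from the config quoted above, and
`updates_per_second` is a hypothetical helper.

```shell
#!/bin/sh
# Rough estimate only: assumes one update per RRD file per interval.
# DATADIR defaults to the DataDir from the quoted collectd config.
DATADIR=${DATADIR:-/var/lib/collectd/rrd}

# Hypothetical helper: updates_per_second FILES INTERVAL -> integer updates/s
updates_per_second() {
  echo $(( $1 / $2 ))
}

# Count the RRD files below DATADIR (0 if the directory does not exist).
files=$(find "$DATADIR" -name '*.rrd' 2>/dev/null | wc -l)

echo "RRD files: $files"
echo "every 10 s (rrdcollect): $(updates_per_second "$files" 10) updates/s"
echo "every 120 s (CacheTimeout 120): $(updates_per_second "$files" 120) updates/s"
```

With 2000 files, for example, that is 200 updates per second at a
10-second interval versus roughly 16 at 120 seconds.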

If increasing the buffer helps, add these settings to /etc/sysctl.conf
so they are applied at system startup.
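The corresponding /etc/sysctl.conf entries, using the same values as the
sysctl commands above, would look like this:

```
net.core.rmem_max = 8388608
net.core.rmem_default = 4194304
```

Running `sysctl -p` re-reads /etc/sysctl.conf without a reboot.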

Regards,
-octo

[0] <http://mailman.verplant.org/pipermail/collectd/2009-January/002335.html>
[1] <http://collectd.org/documentation/inside_rrdtool.shtml>
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/