[collectd] collectd rrdtool performance
Thorsten von Eicken
tve at voneicken.com
Wed Dec 19 17:29:37 CET 2007
I'm running into performance issues with collectd's rrdtool plugin. I'm
collecting data from ~150 hosts with the standard 10-second step on a
dual-core, dual-drive machine with a striped XFS filesystem, and it's
sitting in 90%+ I/O wait. The disks are at 100% utilization. Even if I
set the rrdtool plugin cache to 60 seconds, the situation is not much
better. In terms of numbers, the 150 hosts result in 10500 RRDs being
updated every 10 seconds, so >1000 RRD updates per second. I see the
disks doing about 500 writes/sec and almost no reads (large memory cache).
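For reference, the arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope calculation from the numbers quoted above; the helper name update_rate and the 30- and 60-second intervals are made up for illustration and are not measurements.

```python
# Back-of-the-envelope figures taken from the numbers quoted above.
HOSTS = 150
TOTAL_RRDS = 10500
STEP_SECONDS = 10


def update_rate(rrd_count, interval_seconds):
    """RRD updates per second if every RRD is updated once per interval."""
    return rrd_count / interval_seconds


rrds_per_host = TOTAL_RRDS / HOSTS
current_rate = update_rate(TOTAL_RRDS, STEP_SECONDS)

print(f"RRDs per host: {rrds_per_host:.0f}")
print(f"updates/sec at {STEP_SECONDS}s step: {current_rate:.0f}")

# Hypothetical longer intervals, only to show how the rate scales.
for interval in (30, 60):
    print(f"updates/sec at {interval}s step: {update_rate(TOTAL_RRDS, interval):.0f}")
```

The rate scales linearly with the number of RRD files and inversely with the interval, which is why both reducing the file count and lengthening the interval are worth considering.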
The biggest issue I see is that 150 hosts = 10500 RRDs. I'm planning to
go ahead and reorganize how the RRD data is stored by placing all
related variables of a plugin into a single RRD, as opposed to the
current scheme where almost every variable is in its own RRD. The reason
I'm writing is to get some feedback on how to do this so that it can be
accepted into the collectd source. Here are the options I see:
1. fuhgetaboutit, will never get integrated into collectd source
2. add a CompactRRD option to each relevant plugin to switch between the
   standard layout and the new compact layout
3. clone each plugin and create a new version using the compact layout
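To make the difference between the two layouts concrete, here is a minimal sketch of the update commands each one would generate. The plugin names, variable names, values and file paths are hypothetical and do not reflect collectd's actual file naming; the only real interface assumed is the rrdtool update command with its --template option.

```python
from collections import defaultdict

# Hypothetical sample: (plugin, variable) -> latest value.
samples = {
    ("memory", "used"): 512.0,
    ("memory", "free"): 256.0,
    ("memory", "cached"): 128.0,
    ("cpu-0", "user"): 3.0,
    ("cpu-0", "idle"): 95.0,
}


def per_variable_updates(samples):
    """Current scheme: one RRD file, and so one update, per variable."""
    return [
        ["rrdtool", "update", f"{plugin}/{var}.rrd", f"N:{value}"]
        for (plugin, var), value in sorted(samples.items())
    ]


def compact_updates(samples):
    """Compact scheme: one RRD per plugin, one data source per variable."""
    grouped = defaultdict(dict)
    for (plugin, var), value in samples.items():
        grouped[plugin][var] = value

    commands = []
    for plugin in sorted(grouped):
        names = sorted(grouped[plugin])
        template = ":".join(names)
        values = ":".join(str(grouped[plugin][name]) for name in names)
        commands.append(
            ["rrdtool", "update", f"{plugin}.rrd",
             "--template", template, f"N:{values}"]
        )
    return commands


print(len(per_variable_updates(samples)), "updates in the per-variable layout")
print(len(compact_updates(samples)), "updates in the compact layout")
```

In this toy sample the number of files touched per interval drops from five to two. How much the real count would drop depends on how many variables each plugin reports.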
I like #2 the best, but would love to hear some feedback before I roll
up my sleeves. Also any other suggestions would be welcome. Other things
I want to pursue:
- try rrdtool 1.2.24, which has disk I/O fadvise optimizations
- increase the collection interval a bit
Comments?
By the way, I'm very impressed by how well collectd has handled the
current overload situation. It looks like there is enough buffering
between the network input and the rrdtool updates that apparently no
data gets lost. Nice!
Thorsten