[collectd] collectd rrdtool performance

Thorsten von Eicken tve at voneicken.com
Wed Dec 19 17:29:37 CET 2007


I'm running into performance issues with collectd's rrdtool plugin. I'm
collecting data from ~150 hosts with the standard 10-second step on a
dual-core, dual-drive machine with a striped XFS filesystem, and it's
sitting in 90%+ I/O wait with the disks at 100% utilization. Even with
the rrdtool plugin cache set to 60 seconds the situation is not much
better. In terms of numbers, the 150 hosts result in 10500 RRDs being
updated every 10 seconds, so >1000 RRD updates per second. I see the
disks doing about 500 writes/sec and almost no reads (large memory cache).
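
To make the arithmetic explicit, here is a quick back-of-the-envelope
calculation using only the figures quoted above. The variable names are
mine, and the ratio of updates to disk writes is just the two observed
numbers divided, not a measurement of its own.

```python
# Rough load estimate from the figures in this message.
hosts = 150              # hosts being collected from
rrds_total = 10500       # RRD files across all hosts
step_seconds = 10        # standard collection step
disk_writes_per_sec = 500  # approximate observed write rate

rrds_per_host = rrds_total / hosts
updates_per_second = rrds_total / step_seconds
updates_per_disk_write = updates_per_second / disk_writes_per_sec

print(f"RRDs per host:          {rrds_per_host:.0f}")
print(f"RRD updates per second: {updates_per_second:.0f}")
print(f"Updates per disk write: {updates_per_disk_write:.1f}")
```

So each host contributes about 70 RRD files, and every one of them is
touched once per step.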

The biggest issue I see is that 150 hosts = 10500 RRDs. I'm planning to
go ahead and reorganize how the RRD data is stored, placing all related
variables of a plugin into a single RRD as opposed to the current scheme
where almost every variable is in its own RRD. The reason I'm writing is
to get some feedback on how to do this so it can be accepted into the
collectd source. Here are the options I see:

1. fuhgetaboutit, will never get integrated into collectd source

2. add a CompactRRD option to each relevant plugin to switch between the
   standard layout and the new compact layout

3. clone each plugin and create a new version using the compact layout
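
To illustrate the difference between the two layouts, here is a minimal
sketch, not actual collectd code. It only builds the arguments one would
pass to "rrdtool update" (file, optional --template list, and the
timestamp:value string) and does not call rrdtool. The file paths, the
variable names and the timestamp are made-up examples.

```python
def per_variable_updates(base, timestamp, values):
    """Standard layout: one RRD file, and so one update, per variable."""
    return [(f"{base}-{name}.rrd", f"{timestamp}:{value}")
            for name, value in sorted(values.items())]


def compact_update(base, timestamp, values):
    """Compact layout: one RRD with one data source per variable.

    Returns (file, template, data); the template is what would be
    passed to `rrdtool update --template`.
    """
    names = sorted(values)
    template = ":".join(names)
    data = ":".join(str(values[name]) for name in names)
    return (f"{base}.rrd", template, f"{timestamp}:{data}")


# Hypothetical sample: three related variables from one plugin instance.
values = {"user": 12, "system": 3, "idle": 85}
ts = 1198081777  # arbitrary example timestamp

old = per_variable_updates("host1/cpu-0/cpu", ts, values)
new = compact_update("host1/cpu-0/cpu", ts, values)

print(len(old), "updates in the standard layout")
print("1 update in the compact layout:", new)
```

With N related variables per plugin the compact layout turns N file
updates into one, which is where I'd expect the savings to come from.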

I like #2 the best, but would love to hear some feedback before I roll 
up my sleeves. Also any other suggestions would be welcome. Other things 
I want to pursue:

- try rrdtool 1.2.24, which has disk I/O fadvise optimizations
- increase the collection interval a bit

Comments?

By the way, I'm very impressed by how well collectd has handled the
current overload situation. It looks like there is enough buffering
between the network input and the rrdtool updates that apparently no
data gets lost. Nice!

Thorsten


