[collectd] collectd rrdtool performance
    Thorsten von Eicken 
    tve at voneicken.com
       
    Wed Dec 19 17:29:37 CET 2007
    
    
  
I'm running into performance issues with collectd's rrdtool plugin. I'm 
collecting data from ~150 hosts with the standard 10-second step on a 
dual-core, dual-drive machine with a striped XFS filesystem, and it's 
sitting at 90%+ I/O wait. The disks are at 100% utilization. Even if I set 
the rrdtool plugin cache to 60 seconds, the situation is not much better. 
In terms of numbers, the 150 hosts result in 10500 RRDs being updated 
every 10 seconds, so >1000 RRD updates per second. I see the disks doing 
about 500 writes/sec and almost no reads (large memory cache).
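
For reference, the arithmetic behind those numbers can be sketched as below. The host count, RRD count, and step are the figures from this mail. The cached figure is only an idealized assumption: it supposes each RRD file is written about once per cache interval, and it says nothing about how many disk writes a single RRD update turns into.

```python
# Back-of-the-envelope numbers taken from the figures in this mail.
HOSTS = 150
RRDS = 10500
STEP_SECONDS = 10


def updates_per_second(rrds: int, step_seconds: float) -> float:
    """Average RRD updates per second if every RRD is updated once per step."""
    return rrds / step_seconds


def cached_file_updates_per_second(rrds: int, cache_seconds: float) -> float:
    """Idealized assumption: with a write cache, each RRD file is touched
    roughly once per cache interval, with the cached values batched into
    that one update."""
    return rrds / cache_seconds


rrds_per_host = RRDS / HOSTS

print("RRDs per host:", rrds_per_host)
print("updates/sec without cache:", updates_per_second(RRDS, STEP_SECONDS))
print("file updates/sec with 60 s cache (idealized):",
      cached_file_updates_per_second(RRDS, 60))
```

So the uncached load is about 1050 updates per second across roughly 70 RRDs per host, and even the idealized cached case still touches every one of the 10500 files once a minute.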
The biggest issue I see is that 150 hosts = 10500 RRDs. I'm planning to 
go ahead and reorganize how the RRD data is stored by placing all related 
variables of a plugin into a single RRD, as opposed to the current scheme 
where almost every variable is in its own RRD. The reason I'm writing is 
to get some feedback on how to do this so that it can be accepted into 
the collectd source. Here are the options I see:
1. fuhgetaboutit, this will never get integrated into the collectd source
2. add a CompactRRD option to each relevant plugin to switch between the 
   standard layout and the new compact layout
3. clone each plugin and create a new version using the compact layout
I like #2 the best, but would love to hear some feedback before I roll 
up my sleeves. Also any other suggestions would be welcome. Other things 
I want to pursue:
- try rrdtool 1.2.24, which has disk I/O fadvise optimizations
- increase the collection interval a bit
Comments?
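
To make the compact layout idea a bit more concrete, here is a minimal sketch of grouping per-variable RRDs into one RRD per host and plugin, with one data source (DS) per variable. The host, plugin, and variable names, the file path, and the GAUGE type, heartbeat, and RRA choices are all hypothetical placeholders for illustration, not what collectd does today. The sketch only builds the `rrdtool create` argument list and does not run anything.

```python
from collections import defaultdict


def group_compact(variables):
    """Group (host, plugin, variable) triples into one RRD per (host, plugin)."""
    layout = defaultdict(list)
    for host, plugin, variable in variables:
        layout[(host, plugin)].append(variable)
    return dict(layout)


def create_args(path, ds_names, step=10):
    """Build an argument list for `rrdtool create` with one DS per variable.

    The GAUGE type, the 2*step heartbeat, and the single RRA are
    illustrative assumptions only.
    """
    args = ["rrdtool", "create", path, "--step", str(step)]
    args += [f"DS:{name}:GAUGE:{2 * step}:U:U" for name in ds_names]
    args.append("RRA:AVERAGE:0.5:1:8640")
    return args


# Hypothetical variables, one tuple per RRD in the current layout.
variables = [
    ("host1", "memory", "used"),
    ("host1", "memory", "free"),
    ("host1", "memory", "cached"),
    ("host1", "swap", "used"),
    ("host1", "swap", "free"),
]

layout = group_compact(variables)
for (host, plugin), names in layout.items():
    print(" ".join(create_args(f"{host}/{plugin}.rrd", names)))
```

In this toy case five single-variable RRDs become two multi-DS RRDs. The catch is that all data sources in one RRD have to be updated together, so this only fits variables that a plugin reports at the same time.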
By the way, I'm very impressed by how well collectd has handled the 
current overload situation. It looks like there is enough buffering 
between the network input and the rrdtool updates that apparently no 
data gets lost. Nice!
Thorsten
    
    