[collectd] collectd 4.10.1 stops writing and high CPU
jesse at bulletproof.net
Wed Jan 16 07:32:29 CET 2013
We have a collectd server that is writing to about 24,000 RRD files, most of which are 15 MB each (with some at 30 MB and some at 45 MB), about 480 GB of RRD files in all.
On occasion we see disk writes drop right down to a trickle while, at the same time, collectd's CPU usage shoots through the roof. Once collectd goes into this state it can stay that way for hours, and most of the RRD files are not updated during that time. The only way to get things going again is to 'kill -9 <collectd's pids>' and start collectd again.
The RRD files for data originating within this instance of collectd (and not coming via the network plugin) are not interrupted, so it is something to do with the network plugin, it seems.
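To put numbers on the observation above, here is a minimal sketch that counts stale RRD files per top-level directory, so that locally collected hosts can be compared with hosts received via the network plugin. It assumes one directory per host under the data directory; the path /var/lib/collectd/rrd and the 300-second threshold in the usage comment are assumptions, not values taken from our setup. If the rrdtool plugin is configured to cache updates, file modification times lag even on a healthy server, so the threshold would need to allow for that.

```python
import os
import time
from collections import Counter


def stale_rrd_counts(root, max_age_seconds, now=None):
    """Return {top_level_dir: (stale_files, total_files)} for *.rrd files under root.

    A file counts as stale when its mtime is more than max_age_seconds
    older than `now` (defaults to the current time).
    """
    now = time.time() if now is None else now
    stale = Counter()
    total = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        # The first path component below root is assumed to be the host name.
        host = os.path.relpath(dirpath, root).split(os.sep)[0]
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            total[host] += 1
            if now - mtime > max_age_seconds:
                stale[host] += 1
    return {host: (stale[host], total[host]) for host in total}


# Example usage (path and threshold are assumptions):
#   for host, (old, n) in sorted(stale_rrd_counts("/var/lib/collectd/rrd", 300).items()):
#       print("%-40s %d of %d files stale" % (host, old, n))
```

Running this while collectd is in the bad state should show whether the stall really is limited to hosts arriving over the network.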
Has anyone got any advice on how we might chase this problem down further? We are on Ubuntu 12.04.
Is it possible to peer into collectd to see if it's a problem with the network plugin, or the rrd plugin, or something else?
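One possible starting point, sketched under assumptions rather than offered as a known answer: on Linux, per-thread CPU time can be read from /proc/<pid>/task/<tid>/stat, so sampling it twice shows which collectd thread is burning the CPU. The helper names below are hypothetical, and the sketch only identifies the busy thread ID; it does not say which plugin that thread belongs to.

```python
import os
import time


def parse_task_stat(text):
    """Return (comm, utime + stime in clock ticks) from one /proc stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    start = text.index("(")
    end = text.rindex(")")
    comm = text[start + 1:end]
    fields = text[end + 2:].split()
    # fields[0] is the state (field 3 in proc(5)); utime is field 14, stime field 15.
    return comm, int(fields[11]) + int(fields[12])


def thread_ticks(pid):
    """Return {tid: (comm, cpu_ticks)} for every thread of the given process."""
    ticks = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                ticks[int(tid)] = parse_task_stat(fh.read())
        except (IOError, OSError):
            continue  # thread exited while we were reading
    return ticks


def busiest_threads(before, after, top=5):
    """Return up to `top` tuples (tick_delta, tid, comm), busiest first."""
    deltas = []
    for tid, (comm, ticks) in after.items():
        if tid in before:
            deltas.append((ticks - before[tid][1], tid, comm))
    deltas.sort(reverse=True)
    return deltas[:top]


def sample(pid, interval=5.0):
    """Sample thread CPU usage of `pid` twice, `interval` seconds apart."""
    before = thread_ticks(pid)
    time.sleep(interval)
    return busiest_threads(before, thread_ticks(pid))
```

Calling sample() with collectd's PID while it is in the bad state would list the busiest thread IDs. `top -H -p <pid>` gives a similar per-thread view interactively, and a debugger attached to the process could then show a backtrace for the busy thread, which should indicate whether it is in the network plugin, the rrd plugin, or somewhere else.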