[collectd] collectd with rrdcached, not reconnecting when rrdcached gets stopped/restarted

Ulf Zimmermann ulf at openlane.com
Thu Dec 30 08:57:26 CET 2010


I am currently upgrading a number of things to collectd 4.10.2 and rrdtool 1.4.5. Unfortunately, one of my major problems is still around: when rrdcached dies or is stopped and then gets restarted, collectd does not reconnect. This was discussed on IRC at some point, but I never got back to do more testing. Even with the latest version of rrdtool on both client and server, collectd goes into a spin, logging messages like:

Dec 29 23:45:40 appbuild01 collectd: collectd startup succeeded
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-user.rrd, [1293695340:49451049], 1) failed with status -3.
Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-nice.rrd, [1293695340:28472], 1) failed with status -3.
Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-system.rrd, [1293695340:10117087], 1) failed with status -3.

At this point I have to restart collectd, after which everything is fine again. The pain is having to do this on more than 300 machines.
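
For illustration, the reconnect behaviour being asked for can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not collectd's or librrd's actual code: the class name ReconnectingClient, the helper unix_connector, and the one-line request/one-line reply exchange are all hypothetical simplifications. The idea is only that a failed write drops the stale socket and retries once on a fresh connection, instead of failing on every subsequent update.

```python
import socket


class ReconnectingClient:
    """Hypothetical line-based client that reconnects once when a send fails."""

    def __init__(self, connect):
        # connect: zero-argument callable returning a connected socket-like object
        self._connect = connect
        self._sock = None

    def _drop(self):
        # Forget the current socket so the next attempt opens a fresh one.
        if self._sock is not None:
            try:
                self._sock.close()
            except OSError:
                pass
            self._sock = None

    def _send_once(self, line):
        if self._sock is None:
            self._sock = self._connect()
        self._sock.sendall(line.encode("ascii") + b"\n")
        reply = self._sock.recv(4096)
        if not reply:
            # An empty read means the peer closed the connection.
            raise ConnectionError("daemon closed the connection")
        return reply.decode("ascii", "replace").strip()

    def send(self, line):
        # Try on the existing connection, then once more on a new one.
        for attempt in (1, 2):
            try:
                return self._send_once(line)
            except OSError:
                self._drop()
                if attempt == 2:
                    raise


def unix_connector(path):
    """Return a connect() callable for a Unix socket at path (assumed location)."""
    def connect():
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(path)
        return sock
    return connect
```

A caller would build the client once, for example with unix_connector pointed at whatever socket path the daemon is configured to listen on, and call send() for each update. If the daemon has been restarted in the meantime, the first attempt fails on the stale socket, and the second attempt goes out over a new connection.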

I would like to use this post to track this down further. The client for this particular test was Red Hat EL4 32-bit running collectd 4.10.2 and rrdtool 1.4.5 (with patches to remove graphing). The server is EL5 64-bit with rrdtool 1.4.5 (full code).

Ulf.



