[collectd] collectd with rrdcached, not reconnecting when rrdcached gets stopped/restarted

Ulf Zimmermann ulf at openlane.com
Fri Dec 31 01:07:52 CET 2010


Yes, they all go to a central server. A long time ago I tried using the collectd network plugin, and the result was too many lost updates. Because of that I used to run collectd locally and copy the rrd files to a central location every 30 minutes, which was not very efficient. Then we started moving to VMware, and with the older versions of collectd and rrdtool, the many small writes were destroying our SAN. At that point I switched to rrdcached. It mostly works great and allows flushing when I graph, etc. There have just been some memory issues/bugs, which should be fixed now. Because of those bugs I had mentioned that the collectd rrdcached plugin isn't reconnecting, and was told that it should be. But it doesn't for me, even with the latest versions.


From: XANi [mailto:xani666 at gmail.com]
Sent: Thursday, December 30, 2010 10:25 AM
To: Ulf Zimmermann
Cc: 'collectd at verplant.org'
Subject: Re: [collectd] collectd with rrdcached, not reconnecting when rrdcached gets stopped/restarted

I am currently upgrading a number of things to collectd 4.10.2 and rrdtool 1.4.5. Unfortunately, one of my major problems is still around: when rrdcached dies or stops and then gets restarted, collectd does not reconnect. This was discussed on IRC at some point, but I never got back to doing more testing. Even with the latest version of rrdtool on both client and server, collectd goes into a spin with messages like:

Dec 29 23:45:40 appbuild01 collectd: collectd startup succeeded
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-user.rrd, [1293695340:49451049], 1) failed with status -3.
Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-nice.rrd, [1293695340:28472], 1) failed with status -3.
Dec 29 23:49:00 appbuild01 collectd[10338]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Dec 29 23:49:00 appbuild01 collectd[10338]: rrdcached plugin: rrdc_update (appbuild01.autc.com/cpu-0/cpu-system.rrd, [1293695340:10117087], 1) failed with status -3.

At this point I have to restart collectd, and then everything is fine. The pain is having to do this on more than 300 machines.
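The reconnect behaviour expected here can be sketched as a "retry once after reconnecting" wrapper around each update. This is a minimal illustration under stated assumptions, not the actual collectd rrdcached plugin code: `update_with_reconnect`, `connect_fn` and `update_fn` are hypothetical names, and in a real client the two hooks would presumably wrap librrd's `rrdc_connect()` and `rrdc_update()`. Using function pointers keeps the logic testable without a running rrdcached.

```c
/* Hypothetical hooks (assumption): in a real client these would presumably
 * wrap librrd's rrdc_connect() and rrdc_update(). Passing them as function
 * pointers lets the retry logic run without a live rrdcached. */
typedef int (*connect_fn)(void *ctx);
typedef int (*update_fn)(void *ctx, const char *file, const char *value);

/* Send one update. If it fails (for example because rrdcached was restarted
 * and the old connection is dead), reconnect once and retry the same update.
 * Returns 0 on success, otherwise a non-zero failure status. */
int update_with_reconnect(void *ctx, connect_fn do_connect,
                          update_fn do_update,
                          const char *file, const char *value)
{
    int status = do_update(ctx, file, value);
    if (status == 0)
        return 0;

    /* Reconnect; if that fails too, report the original update error. */
    if (do_connect(ctx) != 0)
        return status;

    /* Retry exactly once, so a daemon that stays down cannot make the
     * caller spin on the same update. */
    return do_update(ctx, file, value);
}
```

Retrying only once keeps a dead daemon from stalling the write path; a production client would probably also close the stale socket before reconnecting and rate-limit reconnect attempts, which this sketch leaves out.
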

Hi,

Are you directing all 300 machines to the same rrdcached server? Wouldn't it be better to run collectd with the rrdcached and network plugins enabled (the network plugin set up to act as a server) on the machine that collects the data, and just use the network plugin on all the other machines to send their data to it?

Regards



--

Mariusz Gronczewski (XANi) <xani666 at gmail.com>

GnuPG: 0xEA8ACE64

http://devrandom.pl

