[collectd] v5upgrade target causing high IO wait, no RRD updates
jesse at bulletproof.net
Tue Apr 8 09:56:45 CEST 2014
So I've used omnibus to build a package containing rrdtool 1.4.7 and collectd 5.4.1, and it all works fairly nicely with rrdcached as long as I don't use the v5upgrade target. However, I am seeing a lot of rrdcached errors because it is unable to update RRD files that were created with timestamps in the far future, and that kind of thing.
Are you using the v5upgrade target? Or do you have no v4 clients, or is it OK to live with the old metric naming scheme for df, mysql, etc.?
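The "far future" errors follow from an rrdtool rule: an update is accepted only if its timestamp is strictly later than the file's last-update time, so an RRD stamped in the future refuses every write until the clock passes that stamp. Below is a minimal Python sketch for spotting such files. It assumes the `rrdtool` binary is on the PATH (`rrdtool last FILE` prints the file's last-update time as a Unix timestamp). The function names are illustrative and are not part of collectd or rrdtool.

```python
import subprocess
import time


def rrd_last_update(path):
    """Return the last-update time of an RRD file via `rrdtool last`."""
    out = subprocess.run(["rrdtool", "last", path],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())


def seconds_until_writable(last_update, now):
    """rrdtool rejects updates whose timestamp is not strictly later than
    the last update, so return how long until an update at the current
    clock would be accepted (0 means it would be accepted now)."""
    return max(0, last_update - now + 1)


def find_future_rrds(paths, now=None, last_update=rrd_last_update):
    """Map each RRD that would currently refuse updates to the seconds
    remaining. `last_update` is injectable so the check can be tested
    without rrdtool installed."""
    now = int(time.time()) if now is None else now
    stuck = {}
    for path in paths:
        wait = seconds_until_writable(last_update(path), now)
        if wait:
            stuck[path] = wait
    return stuck
```

For example, `find_future_rrds(glob.glob("/var/lib/collectd/rrd/*/*/*.rrd"))` would list the stuck files. The DataDir path here is an assumption about your setup.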
On 8 Apr 2014, at 16:41:40, GRAY Andrew G (SPARQ) <andrew.gray1 at sparq.com.au> wrote:
> Not sure about Debian packaging, but the 5.4 RPMs on CentOS package the collectd rrdcached plugin separately.
> Check that it's installed, and check that the rrdcached daemon and the collectd config reference the same socket.
> One thing I did notice: new RRD files are created owned by root while rrdcached runs as a low-privilege user, which implies that collectd is creating the RRD directories and rrdcached then cannot write to them. A cron job that re-chowns them is not very elegant, but it does work.
> My 5.4 + rrdcached 4.7 is now running pretty smoothly and I'm really happy about all the IO wait going away :)
> Andrew Gray.
> RHCSA, Professional Unix Administration.
> -----Original Message-----
> From: collectd-bounces at verplant.org [mailto:collectd-bounces at verplant.org] On Behalf Of Jesse Reynolds
> Sent: Tuesday, 8 April 2014 4:52 PM
> To: collectd at verplant.org
> Subject: [collectd] v5upgrade target causing high IO wait, no RRD updates
> Hi all,
> I've just upgraded one of our collectd servers from 4.10.1 to 5.4.1 with rrdcached (1.4.7). Woo! One thing that surprised me was that with the v5upgrade target enabled, the RRDs weren't being updated (or not that I could discern) and top was reporting 99% IO wait (compared with about 10 to 20% without v5upgrade). So I'm currently running without the v5upgrade target.
> Has anyone seen anything like this, or have any idea what might be causing it? I didn't see this behaviour in my testing prior to the upgrade; the main difference is the size of the RRD files.
> I also ran `iostat 5` while testing, and IO wait stayed consistently at about 99% even during periods with no disk activity to speak of.
> This is on Ubuntu 12.04 (Precise), 64-bit, with no virtualisation.
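Andrew's cron re-chown workaround could look like the minimal Python sketch below. The helper name `fix_rrd_ownership` is hypothetical and is not part of collectd or rrdtool. The sketch assumes you know collectd's DataDir and the account rrdcached runs as. It walks the RRD tree and re-owns only the directories and files whose owner or group differs, using `lstat`/`lchown` so that symlinks are never followed.

```python
import os


def fix_rrd_ownership(data_dir, uid, gid):
    """Re-own every directory and file under data_dir that is not uid:gid.

    Hypothetical helper for a cron job; returns how many paths it changed.
    lstat/lchown are used so symlinks are never followed.
    """
    changed = 0
    # os.walk yields each (non-symlinked) directory once as `root`, so
    # checking `root` plus its files covers the directories and files.
    for root, _dirs, files in os.walk(data_dir):
        for path in [root] + [os.path.join(root, name) for name in files]:
            st = os.lstat(path)
            if st.st_uid != uid or st.st_gid != gid:
                os.lchown(path, uid, gid)
                changed += 1
    return changed
```

Run as root from cron, it would be called with rrdcached's IDs, e.g. `pw = pwd.getpwnam("rrdcached")` followed by `fix_rrd_ownership("/var/lib/collectd/rrd", pw.pw_uid, pw.pw_gid)`. Both the user name and the path are assumptions about your setup. Because it only touches paths that are wrong, frequent runs stay cheap on a large RRD tree.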