[collectd] Debugging NaN values being recorded

Tue Oct 23 07:17:14 CEST 2012

Hi Brandon,

On 23 October 2012 03:43, Brandon Hume <hume-ml+collectd at bofh.ca> wrote:
>  I've got the following collectd arrangement:
>
>     Solaris Zone 1 collectd --.
>     Solaris Zone 2 collectd --+--  Linux collectd -> rrd
>     Solaris Zone 3 collectd --'
>     Solaris zone 4 collectd --'
>
> So four Solaris zones, which all exist on the same host server, reporting
> (via network plugin) to collectd running on Linux.  It actually works very
> well.

Have you verified that the processes plugin is actually sending the
fork_rate metric over the wire? Capturing the traffic with tcpdump and
inspecting it in Wireshark works well for this.
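
If you'd rather script the check than eyeball a capture, here is a rough
sketch in Python that walks the parts of a single network plugin UDP
payload and reports whether a given type name appears in it. The part type
IDs and the "2-byte type, 2-byte length including the header" layout are
taken from my reading of the collectd binary protocol documentation, so
treat them as assumptions and check them against your version. The function
names are made up for this example, and it will not cope with signed or
encrypted packets.

```python
import struct

# Part type IDs as I understand them from the collectd binary protocol
# documentation. These are assumptions; verify them for your version.
STRING_PARTS = {
    0x0000: "host",
    0x0002: "plugin",
    0x0003: "plugin_instance",
    0x0004: "type",
    0x0005: "type_instance",
}


def iter_parts(packet):
    """Yield (part_type, payload) for each part in one UDP payload."""
    offset = 0
    while offset + 4 <= len(packet):
        # Each part starts with a 16-bit type and a 16-bit length
        # (network byte order); the length counts the 4 header bytes too.
        part_type, length = struct.unpack_from("!HH", packet, offset)
        if length < 4 or offset + length > len(packet):
            break  # malformed or truncated part; stop rather than guess
        yield part_type, packet[offset + 4:offset + length]
        offset += length


def string_parts(packet):
    """Return the (name, value) string parts found in the packet, in order."""
    found = []
    for part_type, payload in iter_parts(packet):
        name = STRING_PARTS.get(part_type)
        if name is not None:
            # String parts are NUL-terminated; strip the terminator.
            text = payload.rstrip(b"\x00").decode("ascii", "replace")
            found.append((name, text))
    return found


def mentions_type(packet, wanted="fork_rate"):
    """True if any 'type' part in the packet equals the wanted type name."""
    return ("type", wanted) in string_parts(packet)
```

Feed it the UDP payloads extracted from your capture; if mentions_type()
never returns True for packets coming from the zones, the value is not
leaving the Solaris side in the first place.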

> What are my best means of finding out *why* RRD would reject a value?  I've
> checked to make sure the "heartbeat" of each rrd matches the interval... and
> I've tried turning up syslogging but there's a lot of traffic and it's hard
> to pick things out when I don't know what I'm looking for.
>
> Is there a means of detecting rrd rejections?

Generally the syslog or logfile plugins, with LogLevel set to debug,
are pretty good for this. Note that the debug level is only available
if collectd was compiled with debugging support.

You should then be able to grep the log for messages from the rrdtool
plugin, which might look something like this:

    [2012-10-04 11:22:09] rrdtool plugin: rrd_update_r
      (/var/lib/collectd/rrd/foobar.example.org/disk-dm-0/disk_ops.rrd)
      failed: mmaping file
      '/var/lib/collectd/rrd/foobar.example.org/disk-dm-0/disk_ops.rrd':
      Invalid argument

If any errors are being raised, they will most likely be reported as
failures of the rrd_update_r function, although you may also see
failures from the rrd_create_r function.
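
Since you mentioned the log is noisy, a small filter can help. The sketch
below (Python) pulls out only the rrdtool plugin failures and counts them
per RRD file. The regular expression is modelled on the sample message
above and assumes each message sits on one line in the log; other collectd
versions may word it differently, and the function names are just ones I
picked for the example.

```python
import re
from collections import Counter

# Pattern modelled on the sample message above (an assumption; adjust it
# if your collectd version formats the message differently).
FAILURE = re.compile(
    r"rrdtool plugin: (?P<func>rrd_\w+)\s*"
    r"\((?P<path>[^)]*)\)\s*"
    r"failed: (?P<reason>.*)"
)


def rrd_failures(lines):
    """Yield (function, rrd_path, reason) for each rrdtool plugin failure."""
    for line in lines:
        match = FAILURE.search(line)
        if match:
            yield (
                match.group("func"),
                match.group("path"),
                match.group("reason").strip(),
            )


def count_by_file(lines):
    """Count failures per RRD file so the worst offenders stand out."""
    return Counter(path for _func, path, _reason in rrd_failures(lines))
```

Open your collectd log file, pass the file object to count_by_file(), and
print the result of most_common() to see which RRD files are rejecting
updates most often.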

As a last resort, I have in the past enabled the csv plugin and
checked that the value is being written there correctly.
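
Checking those files by hand gets tedious, so here is a minimal sketch
(Python) that scans one csv plugin output file for NaN values. It assumes
the file starts with a header line such as "epoch,value" followed by one
row per sample, with missing values written as "nan"; that matches what I
have seen, but please confirm it against your own files. nan_rows() is a
name invented for this example.

```python
import csv
import io
import math


def nan_rows(fh):
    """Yield (epoch, column_name) for every NaN or unparsable value."""
    reader = csv.reader(fh)
    header = next(reader, None)
    if header is None:
        return  # empty file, nothing to report
    for row in reader:
        if not row:
            continue  # skip blank lines
        epoch = row[0]
        # Pair every value column with its name from the header line.
        for name, raw in zip(header[1:], row[1:]):
            try:
                value = float(raw)
            except ValueError:
                yield epoch, name  # not a number at all
                continue
            if math.isnan(value):
                yield epoch, name


# Small in-memory demonstration using made-up data.
sample = io.StringIO("epoch,value\n1350969434,12.5\n1350969444,nan\n")
print(list(nan_rows(sample)))
```

If the csv files written on the Linux side already contain NaN at the
timestamps in question, the problem is upstream of the rrdtool plugin.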

Cheers and good luck,
Lindsay

-- 
w: http://holmwood.id.au/~lindsay/
t: @auxesis