[collectd] Debugging NaN values being recorded
Brandon Hume
hume-ml+collectd at bofh.ca
Mon Oct 22 18:43:56 CEST 2012
I've got the following collectd arrangement:
Solaris Zone 1 collectd --.
Solaris Zone 2 collectd --+-- Linux collectd -> rrd
Solaris Zone 3 collectd --+
Solaris Zone 4 collectd --'
So four Solaris zones, which all exist on the same host server,
reporting (via network plugin) to collectd running on Linux. It
actually works very well.
The binaries and configurations for all four zones are identical, except
for Hostname. Most of the stats are working fine, *except* for
"fork_rate" from the processes plugin.
This is where it gets weird.
"fork_rate", because these are zones and not full VMs, is the exact same
metric across all four. So it's wasteful for me to be recording it four
times, but not terribly so - and it helps avoid needing to flip pages
when viewing the stats.
However, two of the zones are reporting "NaN" for that metric, while the
other two are happily recording real, useful values. Keep in mind that
this is effectively the same number being sent by all four zones. I
don't think it would vary much depending on when each zone's collectd
gets CPU time, and certainly not this consistently.
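To put a number on how much of each zone's data is unknown, one option is
to run the text printed by "rrdtool fetch" through a small script. This is
only a rough sketch: the function name and the sample rows are made up for
illustration, and it assumes the usual fetch layout of a header line with
the data source names followed by "timestamp: value" rows. It shows how
many samples are NaN, not why they were rejected.

```python
import math

def nan_fraction(fetch_text):
    """Return the fraction of NaN samples in `rrdtool fetch`-style text."""
    total = 0
    nans = 0
    for line in fetch_text.splitlines():
        stamp, sep, rest = line.partition(":")
        # Skip the header line and blank lines: data rows start with an
        # integer timestamp followed by a colon.
        if not sep or not stamp.strip().isdigit():
            continue
        for field in rest.split():
            total += 1
            # float() accepts "nan" and "-nan", both of which can appear.
            if math.isnan(float(field)):
                nans += 1
    return nans / total if total else 0.0

# Hypothetical sample, typed by hand; not real data from these zones.
SAMPLE_FETCH = """\
                          value

1350924200: 1.2000000000e+01
1350924210: -nan
1350924220: nan
1350924230: 1.1500000000e+01
"""

if __name__ == "__main__":
    print(nan_fraction(SAMPLE_FETCH))
```

Comparing the result for a "good" zone's fork_rate file against a "bad"
one would at least show whether the bad ones are entirely NaN or only
intermittently so.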
What's the best way to find out *why* RRD would reject a value?
I've checked to make sure the "heartbeat" of each rrd matches the
interval... and I've tried turning up syslogging but there's a lot of
traffic and it's hard to pick things out when I don't know what I'm
looking for.
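For reference, the heartbeat check can be scripted along these lines. It
is a minimal sketch, assuming the input is the text printed by
"rrdtool info", which has a "step = N" line and one
"ds[NAME].minimal_heartbeat = N" line per data source. The function name
and the sample text are hypothetical.

```python
import re

def heartbeat_report(info_text):
    """Return {ds_name: (step, minimal_heartbeat)} from `rrdtool info` text."""
    step_match = re.search(r"^step = (\d+)\s*$", info_text, re.MULTILINE)
    if step_match is None:
        raise ValueError("no 'step = N' line found")
    step = int(step_match.group(1))

    report = {}
    pattern = r"^ds\[([^\]]+)\]\.minimal_heartbeat = (\d+)\s*$"
    for name, heartbeat in re.findall(pattern, info_text, re.MULTILINE):
        report[name] = (step, int(heartbeat))
    return report

# Hypothetical sample, typed by hand in the shape `rrdtool info` uses.
SAMPLE_INFO = """\
filename = "fork_rate.rrd"
step = 10
ds[value].minimal_heartbeat = 20
"""

if __name__ == "__main__":
    for name, (step, heartbeat) in heartbeat_report(SAMPLE_INFO).items():
        # Updates arriving further apart than the heartbeat are stored
        # as unknown, so a heartbeat below the step is suspicious.
        status = "ok" if heartbeat >= step else "heartbeat shorter than step"
        print(name, step, heartbeat, status)
```

Running it over the fork_rate file for each of the four zones makes it
easy to diff the settings side by side.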
Is there a means of detecting rrd rejections?