I'm exploring using collectd (5.0.1 on CentOS) as an adjunct to
monitoring. In most cases I'll be using non-persistent notifications
just to cut down on noise.

So far I've followed the docs here
http://collectd.org/wiki/index.php/Notifications_and_thresholds and
have basic thresholds/notifications working.

As it stands, for hosts whose plugins are configured with thresholds,
alerts are sent when either the stat being monitored falls outside
the threshold boundaries or the collectd server has not seen a stat
from the host within the past N seconds.

What about transient outages such as a network blip? In that case
collectd would generate notifications for host(s)/stat(s). When the
network starts working, stat collection resumes. It might be nice if
there were a corresponding OK message indicating that we are now
seeing stats come in for the same set of servers.
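
To make the behaviour I'm describing concrete, here is a rough Python
sketch of the bookkeeping I have in mind. It is not how collectd
implements anything; the class and method names are made up for
illustration, and "OKAY" is simply the collectd severity name I would
expect such a notice to carry.

```python
class StalenessTracker:
    """Track when each host last reported; flag missing/resumed transitions."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a host counts as missing
        self.last_seen = {}      # host -> timestamp of its most recent stat
        self.missing = set()     # hosts currently flagged as missing

    def observe(self, host, ts):
        """Record a stat; return "OKAY" if the host had been flagged missing."""
        self.last_seen[host] = ts
        if host in self.missing:
            self.missing.discard(host)
            return "OKAY"
        return None

    def check(self, now):
        """Return hosts that have newly exceeded the timeout (once each)."""
        newly_missing = [
            host for host, ts in self.last_seen.items()
            if now - ts > self.timeout and host not in self.missing
        ]
        self.missing.update(newly_missing)
        return newly_missing


# Hypothetical timeline: web01 reports, goes quiet, then comes back.
tracker = StalenessTracker(timeout=20)
tracker.observe("web01", 0)
went_missing = tracker.check(30)          # web01 has been silent for 30s
resumed = tracker.observe("web01", 45)    # first stat after the gap
```

The point is only that the "resumed" transition is cheap to detect
once the missing state is being tracked per host.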

It might also be nice from the perspective of measuring 'flap' or just
the outage window if we were able to get the failure notices and also
a transition notice where stats resume.
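
As an illustration of what the paired notices would allow, the sketch
below turns a stream of notifications into outage windows. The
(timestamp, host, severity) tuple format and the function name are
assumptions of mine, not a collectd data structure; something like
this would have to sit behind whatever receives the notifications.

```python
def outage_windows(events):
    """Pair each run of FAILURE notices with the next OKAY for the same host.

    events: iterable of (timestamp, host, severity) tuples, sorted by timestamp.
    Returns a dict mapping host -> list of (start, end) tuples.
    """
    open_since = {}   # host -> timestamp of the first FAILURE in the current run
    windows = {}
    for ts, host, severity in events:
        if severity == "FAILURE":
            # Repeated failures extend the same outage; keep the first timestamp.
            open_since.setdefault(host, ts)
        elif severity == "OKAY" and host in open_since:
            windows.setdefault(host, []).append((open_since.pop(host), ts))
    return windows


# Hypothetical notices for one host: two outages, the first with a repeat.
events = [
    (100, "web01", "FAILURE"),
    (130, "web01", "FAILURE"),
    (160, "web01", "OKAY"),
    (400, "web01", "FAILURE"),
    (420, "web01", "OKAY"),
]
windows = outage_windows(events)
flaps = len(windows["web01"])                            # number of outages
downtime = sum(end - start for start, end in windows["web01"])
```

Without the OKAY notice on resume, an outage has a start but no end,
so neither the flap count nor the window length can be computed.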

I just want to note that this is different from regular plugin
thresholds, which generate the FAILURE/OK events as boundaries are
crossed.

Does this seem right? I am still getting my feet under me with
collectd so correct me where I have it wrong.

