Hi,<br><br>I'm having trouble getting notifications to work consistently, or sometimes to work at all. Every machine in our cluster runs collectd and sends its data via the network plugin to a central master server. The master server writes the data using rrd and also contains the <threshold/> definitions for alarming. An interval of 30 seconds is used everywhere at the moment (10 was thrashing the master's disk too much). The cluster runs a mixture of 4.5.x and 4.6.x, and the master is running 4.6.2.<br>
<br>In this particular instance, one of the machines runs a custom <exec> plugin that reports on a queue backlog size. The data is definitely making it to the server because the RRD graph shows a nice fluctuation of queue size (typically 0 and 10). On the central server I have the following definition:<br>
<br><Threshold><br> <Type "current"><br> WarningMin 0.00<br> WarningMax 1.00<br> FailureMin 0.00<br> FailureMax 1.00<br> DataSource "value"<br> Invert false<br>
Persist true<br> Instance "FetchQueueCount"<br> </Type><br></Threshold><br><br>The queue fluctuates between full and empty quite often, so this alarm ought to be getting triggered and then okay and so on repeatedly. Instead, I have never seen this alarm on a value range problem. Once in a long while it will alarm on not receiving data, but never on data out of threshold.<br>
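<br>To spell out what I expect from that definition, here is a minimal sketch of the range check as I understand it. It is not collectd's actual code; the function name `classify` is made up for illustration, and I am assuming a value is flagged when it falls outside the Min/Max range, with the failure range checked before the warning range.<br>

```python
def classify(value, warning_min=0.0, warning_max=1.0,
             failure_min=0.0, failure_max=1.0):
    """Return the severity I would expect for a single value.

    Defaults mirror the <Type "current"> block above. This is an
    assumption about the semantics, not collectd's implementation.
    """
    # Failure range is checked first, so it wins when both ranges match.
    if value < failure_min or value > failure_max:
        return "FAILURE"
    if value < warning_min or value > warning_max:
        return "WARNING"
    return "OKAY"


if __name__ == "__main__":
    # Queue sizes I typically see on the graph.
    for queue_size in (0, 1, 10):
        print(queue_size, classify(queue_size))
```

By that reading, a queue size of 10 should produce a failure, and a drop back to 0 should produce an okay.<br>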
<br>The notification exec looks like:<br><br><Plugin exec><br> NotificationExec "dotspots:dotspots" "/dist/collectd/bin/notify-exec.rb"<br></Plugin><br><br>I have a logfile running on the master server and it is not indicating any errors. I also don't see anything about notifications in the log, except when I get a failure due to the occasional incident of data not being received.<br>
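<br>To rule out the handler itself, a stripped-down stand-in like the one below could be swapped in temporarily. It is a hypothetical sketch, not my actual notify-exec.rb. I am assuming the exec plugin passes the notification on standard input as "Key: Value" header lines, then a blank line, then the message text; the names `parse_notification` and `handle` and the log path are made up.<br>

```python
import sys


def parse_notification(text):
    """Split a notification into (headers dict, message string).

    Assumed layout: header lines such as "Severity: FAILURE",
    a blank line, then the free-form message.
    """
    header_part, _, message = text.partition("\n\n")
    headers = {}
    for line in header_part.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "Key: Value"
            headers[key.strip()] = value.strip()
    return headers, message.strip()


def handle(stream, log_path):
    """Append one line per notification to log_path (hypothetical path)."""
    headers, message = parse_notification(stream.read())
    with open(log_path, "a") as log:
        log.write("%s %s: %s\n" % (headers.get("Severity", "?"),
                                   headers.get("Host", "?"),
                                   message))

# As a NotificationExec script this would end with something like:
#   handle(sys.stdin, "/tmp/collectd-notify.log")
```

If nothing shows up in that file while the queue is full, the notification is never being dispatched, and the problem is on the threshold side rather than in the script.<br>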
<br>Am I doing something wrong? <br><br>Thanks,<br>Brian<br>