[collectd] Having problems with notifications

Andrés J. Díaz ajdiaz at connectical.com
Mon May 4 13:10:06 CEST 2009


Hi,

I have the same problem in my installation, with exactly the same
scenario, and I found a possible explanation in the plugin.c module.
In the 4.5 branch, plugin.c contains a call to the function
ut_check_threshold (defined in utils_threshold.c), but in 4.6.2 (and,
I think, also in 4.6.1) that call is missing, so threshold checking
never runs.

In fact, running grep -r ut_check_threshold over the src directory
shows only the definition of the function in utils_threshold.c.
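The mechanism described above can be illustrated with a toy model in C. This is a hypothetical sketch, not collectd's plugin.c and not the attached patch: the type toy_value_t and the functions toy_dispatch_values, toy_write and toy_check_threshold are invented for illustration. It shows why the symptom looks the way it does: the writer still receives every value, so the RRD graphs look normal, but nothing is compared against the <Threshold> ranges unless the dispatch path actually calls the check.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for one dispatched measurement.
 * These names are illustrative only, not collectd's real types. */
typedef struct {
    const char *type;          /* e.g. "current" */
    const char *type_instance; /* e.g. "FetchQueueCount" */
    double value;
} toy_value_t;

static int writes = 0;           /* how often a writer ran */
static int threshold_checks = 0; /* how often the threshold hook ran */

/* Hypothetical writer, standing in for e.g. the rrdtool plugin. */
static void toy_write(const toy_value_t *v)
{
    writes++;
    printf("write %s/%s = %g\n", v->type, v->type_instance, v->value);
}

/* Hypothetical threshold hook; a real one would compare the value with
 * the configured <Threshold> ranges and dispatch a notification. */
static int toy_check_threshold(const toy_value_t *v)
{
    threshold_checks++;
    return v->value > 1.0; /* toy rule: 1 means "out of range" */
}

/* Dispatch path: writers always run, but thresholds are only evaluated
 * if the dispatch code calls the hook. hook_present == 0 models the
 * behaviour reported for 4.6.2: data reaches the RRD files, yet no
 * threshold is ever checked. */
static int toy_dispatch_values(const toy_value_t *v, int hook_present)
{
    toy_write(v);
    return hook_present ? toy_check_threshold(v) : 0;
}
```

With v = { "current", "FetchQueueCount", 10.0 }, calling toy_dispatch_values(&v, 0) writes the value but leaves threshold_checks at 0; calling toy_dispatch_values(&v, 1) runs the check and returns 1.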

I'm not sure whether it's really a bug, but after I patched my code it
works fine for me :) I'm using version 4.6.2 from the tar.gz.


Best regards,
  Andrés


2009/4/15 Brian Long <brian at dotspots.com>:
> Hi,
>
> I'm having some problems getting notifications to work consistently, or
> sometimes at all. Every machine in our cluster runs collectd and sends
> its data via the network plugin to a central master server. The master
> server writes the data using rrd and also contains the <threshold/>
> definitions for alarming. An interval of 30 is used everywhere at the moment
> (10 was thrashing the master disk too much). The cluster runs a mixture of
> 4.5.x and 4.6.x, and the master is running 4.6.2.
>
> In this particular instance, one of the machines runs a custom <exec> plugin
> that reports on a queue backlog size. The data is definitely making it to
> the server because the RRD graph shows a nice fluctuation of queue size
> (typically between 0 and 10). On the central server I have the following definition:
>
> <Threshold>
>   <Type "current">
>      WarningMin    0.00
>      WarningMax    1.00
>      FailureMin    0.00
>      FailureMax    1.00
>      DataSource    "value"
>      Invert        false
>      Persist       true
>      Instance      "FetchQueueCount"
>   </Type>
> </Threshold>
>
> The queue fluctuates between full and empty quite often, so this alarm ought
> to trigger, then return to okay, and so on repeatedly. Instead, I have never
> seen it fire on an out-of-range value. Once in a long while it will alarm
> on missing data, but never on data outside the threshold.
>
> The notification exec looks like:
>
> <Plugin exec>
>         NotificationExec "dotspots:dotspots" "/dist/collectd/bin/notify-exec.rb"
> </Plugin>
>
> I have a logfile running on the master server and it is not indicating any
> errors, and I don't see any info about notifications in the log (except when
> I get a failure due to the occasional incident of data not being
> received.)
>
> Am I doing something wrong?
>
> Thanks,
> Brian
>
> _______________________________________________
> collectd mailing list
> collectd at verplant.org
> http://mailman.verplant.org/listinfo/collectd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collectd-plugin.c.patch
Type: text/x-patch
Size: 269 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20090504/dacde753/attachment.bin 
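The <Threshold> block quoted above can be checked against a small model. The C sketch below is a simplification of the threshold semantics described in the collectd.conf(5) manual page (FailureMin/FailureMax, WarningMin/WarningMax, Invert, Persist), assuming that FAILURE is checked before WARNING and that Persist only changes how often out-of-range values notify. It is not collectd's utils_threshold.c; toy_threshold_t, toy_evaluate and toy_count_notifications are hypothetical names.

```c
#include <stddef.h>

/* Simplified model of threshold semantics; not collectd's code.
 * All names are hypothetical. */
typedef enum { STATE_OKAY, STATE_WARNING, STATE_FAILURE } toy_state_t;

typedef struct {
    double warning_min, warning_max;
    double failure_min, failure_max;
    int invert;  /* Invert true: values inside the range are bad */
    int persist; /* Persist true: notify for every bad value */
} toy_threshold_t;

/* Nonzero if value is unacceptable for the range [min, max]. */
static int toy_out_of_range(double value, double min, double max, int invert)
{
    int outside = (value < min) || (value > max);
    return invert ? !outside : outside;
}

/* FAILURE takes precedence over WARNING. */
static toy_state_t toy_evaluate(const toy_threshold_t *th, double value)
{
    if (toy_out_of_range(value, th->failure_min, th->failure_max, th->invert))
        return STATE_FAILURE;
    if (toy_out_of_range(value, th->warning_min, th->warning_max, th->invert))
        return STATE_WARNING;
    return STATE_OKAY;
}

/* Counts notifications for a series of values, starting from OKAY.
 * Simplification: with persist set, every non-OKAY value notifies;
 * without it, only a change of state (including recovery) does. */
static int toy_count_notifications(const toy_threshold_t *th,
                                   const double *values, size_t n)
{
    toy_state_t prev = STATE_OKAY;
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        toy_state_t cur = toy_evaluate(th, values[i]);
        if ((th->persist && cur != STATE_OKAY) || cur != prev)
            count++;
        prev = cur;
    }
    return count;
}
```

Plugging in the quoted values (all four bounds at 0 and 1, Invert false, Persist true), toy_evaluate returns STATE_FAILURE for a queue size of 10 and STATE_OKAY for 0, so a series such as 0, 10, 10, 0 produces three notifications in this model. Because the warning and failure ranges are identical, the model never reports a WARNING; a wider failure range would be needed for that.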

