[collectd] curl plugin & persistent failure notification

Dave Cottlehuber dch at skunkwerks.at
Sun Jan 20 16:40:45 CET 2019


ahoy,

I have a cluster of 3 nodes, each with an HTTP /_up endpoint that returns 200 OK when things are well and hangs when they are not (as the node is offline). I'm expecting to receive a persistent FAILURE notification each time round the main event loop if one of the nodes is down.
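
To make the expectation concrete, here is a minimal Python sketch of the behaviour I am hoping for. It is not collectd code and does not describe how the threshold plugin actually works. The function names are hypothetical, the 200..299 range mirrors the FailureMin/FailureMax values from my config further down, and None stands for a read that produced no value because curl timed out.

```python
from typing import List, Optional

# Assumption: these mirror FailureMin / FailureMax in the threshold config.
FAILURE_MIN = 200
FAILURE_MAX = 299


def severity(response_code: Optional[int]) -> str:
    """Severity expected for a single read of a single node.

    None models a read with no value at all (e.g. a curl timeout).
    """
    if response_code is None:
        return "FAILURE"
    if FAILURE_MIN <= response_code <= FAILURE_MAX:
        return "OKAY"
    return "FAILURE"


def expected_notifications(reads: List[Optional[int]]) -> List[str]:
    """With persistent notifications, every read yields one notification."""
    return [severity(code) for code in reads]


if __name__ == "__main__":
    # One healthy read, two reads while the node hangs, then recovery.
    print(expected_notifications([200, None, None, 200]))
```

In other words, a node that stays down for two consecutive reads should produce two FAILURE notifications, not one.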

BTW the full log and minimal config are more readable here: https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d

The metric collection works as expected, but if one node is down, I only see 1 FAILURE notification, and not a persistent one each time collectd does its loop:

option = Hostname; value = i09;
Created new plugin context.
plugin_load: plugin "uptime" successfully loaded.
plugin_load: plugin "curl" successfully loaded.
plugin_load: plugin "threshold" successfully loaded.
[2019-01-20 12:06:51] plugin_load: plugin "logfile" successfully loaded.
[2019-01-20 12:06:51] type = logfile, key = LogLevel, value = info
[2019-01-20 12:06:51] [info] plugin_load: plugin "target_notification" successfully loaded.
[2019-01-20 12:06:51] [info] Initialization complete, entering read-loop.

[2019-01-20 12:07:11] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:12] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c02, type = response_code, message = Host i09, plugin curl (instance couchdb_c02) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:12] [error] curl plugin: curl_easy_perform failed with status 28: Connection timed out after 515 milliseconds
[2019-01-20 12:07:21] [info] Notification: severity = FAILURE, host = i09, plugin = curl, plugin_instance = couchdb_c03, type = response_code, message = i09/curl-couchdb_c03/response_code has not been updated for 29.474 seconds.
^^^ good, this is what I expected to see - curl fails and a notification is triggered

[2019-01-20 12:07:21] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:22] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c02, type = response_code, message = Host i09, plugin curl (instance couchdb_c02) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:32] [error] curl plugin: curl_easy_perform failed with status 28: Connection timed out after 528 milliseconds
^^^ whoops, where is the next notification?

[2019-01-20 12:07:41] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
...

# input

https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d#file-collectd-conf-L23-L39

<Plugin curl>
  <Page "couchdb_c01">
    URL "http://c01.skunkwerks.at:5984/_up"
    Timeout 500
    MeasureResponseCode true
  </Page>
  <Page "couchdb_c02">
...

# notification

https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d#file-collectd-conf-L23-L39

LoadPlugin target_notification
LoadPlugin threshold
<Plugin "threshold">
  <Plugin "curl">
  Instance "couchdb_c01"
    <Type "response_code">
      FailureMin    200
      FailureMax    299
      Persist       true
      PersistOK     true
    </Type>
  Instance "couchdb_c02"
...

Is this a bug, or do I need to arrange my collectd.conf differently?

FreeBSD 12.0-RELEASE-p2 amd64
collectd 5.8.1.git (FreeBSD packages)

A+
Dave


