[collectd] Perl plugin to pass notifications to nagios

Russell Poyner rpoyner at engr.wisc.edu
Tue Aug 28 01:37:06 CEST 2012


Attached is a short perl plugin for collectd along with some sample 
config files.

The plugin catches collectd notifications and passes them to nagios as 
passive checks submitted to the nagios command pipe. It has been tested 
with collectd 4.10 and Icinga 1.6.1 on Ubuntu 12.04.

nagios-passive.pm          the perl plugin
collectd.conf              collectd config with 2 configuration examples
collectd-nagios-passive.cfg      Sample nagios config file with entries
                                 for all collectd plugins.
README.nagios-passive      the README


I had fun writing this. Hopefully someone will have fun using it.

RP

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios_passive.pm
Type: application/x-perl
Size: 7211 bytes
Desc: not available
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20120827/abe1c9f5/attachment-0001.bin>
-------------- next part --------------
# Config samples for nagios-passive
# Need this with collectd 4.10 at least
<LoadPlugin perl>
	Globals true
</LoadPlugin>

# The minimal config
# Will report all notices to nagios with a service_description of collectd
<Plugin perl>
        IncludeDir "/usr/local/collectd/perl" # where the perl modules are kept
        BaseName "Collectd::Plugins"
#       EnableDebugger ""
        LoadPlugin nagios_passive
        <Plugin nagios_passive>
                cmd_file "/usr/local/icinga/var/rw/icinga.cmd" # the nagios command pipe
        </Plugin>
</Plugin>

# A fuller example
# Reports custom service_descriptions for hddtemp, sensors, and df
# Two hosts get hddtemps per drive: hddtemp-sd[a-z]
# Other hosts get hddtemp
# Other plugins are reported as my-collectd
# name or defaultname can contain ${plugin}, ${plugin_instance}, ${type} or ${type_instance}
# which get replaced by the values from the collectd notification
# defaultname ${plugin} will report all service_descriptions as the collectd plugin name.
<Plugin perl>
	IncludeDir "/usr/local/collectd/perl" # where the perl modules are kept
	BaseName "Collectd::Plugins"
#	EnableDebugger ""
	LoadPlugin nagios_passive
	<Plugin nagios_passive>
		cmd_file "/usr/local/icinga/var/rw/icinga.cmd" # the nagios command pipe
		debug false # Set this to true to get output on STDOUT
		no_unknown true # Default is false. True prevents unknown events from being sent
		log_level 1 # 1 default, 0-4 possible
		defaultname "my-collectd" # collectd is the default, but can be any valid name string
		<ServiceName "hddtemp"> # This is the name of a collectd plugin
			name "${plugin}" # ${plugin} gets replaced with the plugin name
			<HostName "unreliable"> # A file server
				name "${plugin}-${type_instance}" # Can declare per-host service_descriptions
			</HostName>
			<HostName "kiss-your-data-goodbye"> # Another file server
				name "${plugin}-${type_instance}"
			</HostName>
		</ServiceName>
		ServiceName "sensors" # Same as <ServiceName "sensors">name ${plugin}</ServiceName>
		ServiceName "df"
	</Plugin>
</Plugin>

Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"

-------------- next part --------------
# collectd-nagios-passive.cfg
# Example/template file for integrating nagios with collectd
#
# This file is set up to use passive checks sent from collectd to nagios
# via the collectd perl plugin nagios-passive where possible.
# It uses the collectd-nagios plugin in cases where collectd thresholds aren't currently working.
#
# Russ Poyner 8/2012

# Define a collectd-nagios command to use in cases where passive checks with thresholds don't work
# 'collectd-nagios' command definition
# ARG1 = plugin/plugin-instance
# ARG2 = -g <consolidation-function>
# ARG3 = warning limits
# ARG4 = critical limits
define command{
        command_name    collectd-nagios
        command_line    /usr/bin/collectd-nagios -s /tmp/collectd-unixsock -n $ARG1$ -H $HOSTNAME$ $ARG2$ -w $ARG3$ -c $ARG4$
        }

# Define a service template to use with per-plugin passive checks
define service {
        name       collectd-service
        action_url http://monitor.school.edu/collection3/bin/index.cgi?hostname=$HOSTNAME$&plugin=$SERVICEDESC$&timespan=86400&action=show_selection&ok_button=OK  ;action URL points to graphs, in this case from collection3
	max_check_attempts      1	; With passive checks from collectd nagios should go to a hard state on the first notice.
	active_checks_enabled   0	; Don't need active checks
	passive_checks_enabled  1	; Passive checks from nagios-passive
	initial_state           o	; OK to start out
	check_command           check_ping!100.0,20%!500.0,60%	; A dummy command to keep nagios config happy
	contact_groups          admins	; default notification group
        register   0
        }

# By default nagios-passive will send all passive notifications to the collectd service
# If you don't want to see your collectd checks separated by plugin in nagios this is all you need.
define service{
	use			generic-service
#	host_name		* ; Could use this to enable collectd for all your hosts
	hostgroup_name		linux-servers,fileservers,cluster
	service_description	collectd
	max_check_attempts	1
	active_checks_enabled	0
	passive_checks_enabled	1
	initial_state		o
	check_command		check_ping!100.0,20%!500.0,60%
	contact_groups		admins
}

# This begins the list of per-collectd plugin services.
# If you configure nagios-passive to send notifications by plugin-name
# you need to uncomment, and adapt the corresponding services below.

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	apcups
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	apple_sensors
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ascent
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	battery
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	bind
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	conntrack
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	contextswitch
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	cpu
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	cpufreq
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	curl
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	curl_json
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	curl_xml
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	dbi
#}

# Using collectd-nagios to check the used disk space
define service{
	use			collectd-service
#	host_name		*
	hostgroup_name		linux-servers,fileservers,cluster
	service_description	df-root ; Service name chosen to match the plugin-instance string in collectd
	max_check_attempts	3
        active_checks_enabled	1
        passive_checks_enabled	0
	check_command		collectd-nagios!df/df-root!-g percentage!90!100
	contact_groups		+admins_sms ; At least nagios texts me ;-)
}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	dns
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	email
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	entropy
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ethstat
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	exec
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	filecount
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	fscache
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	gmond
#}

# service hddtemp used for most hosts with one disk
define service{
	use			collectd-service
#	host_name		*
	hostgroup_name		linux-servers,cluster
	service_description	hddtemp
}

# individual service hddtemp-sd? set up for file servers
# nagios-passive is configured with name ${plugin}-${type_instance} for these hosts.
define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sda
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sdb
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sdc
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sdd
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sde
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sdf
}

define service{
        use                     collectd-service
#       host_name               *
        hostgroup_name          fileservers
        service_description     hddtemp-sdg
}

define service{
	use			collectd-service
#	host_name		*
	hostgroup_name		linux-servers,fileservers,cluster
	service_description	interface
}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	iptables
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ipmi
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ipvs
#}

define service{
	use			collectd-service
#	host_name		*
	hostgroup_name		linux-servers,fileservers,cluster
	service_description	irq
}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	java
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	load
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	lpar
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	libvirt
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	madwifi
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	mbmon
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	md
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	memcachec
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	memcached
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	memory
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	modbus
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	multimeter
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	mysql
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	netapp
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	netlink
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	network
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	nfs
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	nginx
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ntpd
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	nut
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	numa
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	olsrd
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	onewire
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	openvpn
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	oracle
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	perl
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	pinba
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ping
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	postgresql
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	powerdns
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	processes
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	protocols
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	python
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	redis
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	routeros
#}

#define service{
#	use			collectd-service
#	host_name		monitor
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	rrdcached
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	sensors
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	serial
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	snmp
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	swap
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	table
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	tail
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	tape
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	tcpconns
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	teamspeak2
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	ted
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	thermal
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	tokyotyrant
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	uptime
#}

#define service{
#	use			collectd-service
##	host_name		*
#	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	users
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	varnish
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	vmem
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	vserver
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	wireless
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	xmms
#}

#define service{
#	use			collectd-service
#	host_name		*
##	hostgroup_name		linux-servers,fileservers,cluster
#	service_description	zfs_arc
#}
-------------- next part --------------
nagios-passive is a perl plugin for collectd. It catches collectd notifications 
and passes them to nagios as passive checks submitted to the nagios command pipe. 
It has been tested with collectd 4.10 and Icinga 1.6.1 on Ubuntu 12.04.
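
As a rough illustration of what "a passive check submitted to the command pipe" looks like, here is a minimal Python sketch. It is not the attached Perl plugin. It only builds a line in Nagios' standard PROCESS_SERVICE_CHECK_RESULT external command format. The function names and the severity-to-state mapping are assumptions made for this example.

```python
import time

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from collectd notification severity names to Nagios
# states; the attached plugin may map them differently.
SEVERITY_TO_STATE = {"OKAY": OK, "WARNING": WARNING, "FAILURE": CRITICAL}


def format_passive_check(host, service, severity, message, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command pipe."""
    if timestamp is None:
        timestamp = int(time.time())
    # Anything that is not a known severity is reported as UNKNOWN.
    state = SEVERITY_TO_STATE.get(severity, UNKNOWN)
    # A newline ends the command, so keep newlines out of the output text.
    text = message.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, state, text)


def submit(cmd_file, line):
    """Write the line to the command pipe (cmd_file in the plugin config)."""
    with open(cmd_file, "w") as pipe:
        pipe.write(line)
```

With the sample config, cmd_file would be /usr/local/icinga/var/rw/icinga.cmd, and the service argument would be whatever service_description nagios-passive is configured to send.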

nagios-passive.pm          the perl plugin
collectd.conf              collectd config with 2 configuration examples
collectd-nagios-passive.cfg      Sample nagios config file with entries for all collectd plugins.

Getting nagios to accept passive checks from collectd is as much about configuring 
nagios as about getting collectd to submit the checks. Nagios ignores checks that 
don't match one of its configured service_descriptions, so you may end up adding a 
lot of service_descriptions to nagios.
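
Since each service_description needs its own nagios service definition, it can be easier to generate the stanzas than to type them. The following Python sketch does that, assuming the collectd-service template and hostgroup names from the sample collectd-nagios-passive.cfg. The helper name service_stanza is made up for this example.

```python
def service_stanza(description, hostgroups, template="collectd-service"):
    """Return one nagios 'define service' block as text."""
    lines = [
        "define service{",
        "\tuse\t\t\t%s" % template,
        "\thostgroup_name\t\t%s" % ",".join(hostgroups),
        "\tservice_description\t%s" % description,
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example plugin names; use the ones you actually load in collectd.
    for plugin in ["cpu", "load", "memory"]:
        print(service_stanza(plugin, ["linux-servers", "fileservers", "cluster"]))
```

The output can be pasted into a nagios config file or written to its own .cfg file and checked with the usual nagios config verification before reloading.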

The simplest way to get all collectd notifications into nagios is to give them all 
the same service_description. nagios-passive defaults to submitting all checks with 
the description collectd. This has the advantage of being easy, but the disadvantage 
of lumping disparate checks under one name in nagios.

The next simplest option is to submit all notifications with a name based on the collectd 
plugin name. defaultname ${plugin} will accomplish that. Separating checks by plugin 
improves the granularity in nagios, but complicates the nagios config.
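
To make the placeholder substitution concrete, here is a small Python sketch of how a name template such as ${plugin} or ${plugin}-${type_instance} could be expanded from a notification's fields. It illustrates the idea only and is not code from the plugin. The function name expand_name and the choice to replace a missing field with an empty string are assumptions.

```python
import re

# The four placeholders named in the sample collectd.conf.
FIELDS = ("plugin", "plugin_instance", "type", "type_instance")
_PLACEHOLDER = re.compile(r"\$\{(%s)\}" % "|".join(FIELDS))


def expand_name(template, notification):
    """Replace ${field} placeholders with values from the notification."""
    def lookup(match):
        # Assumption: a field missing from the notification becomes "".
        return str(notification.get(match.group(1), ""))
    return _PLACEHOLDER.sub(lookup, template)
```

For a notification with plugin hddtemp and type_instance sda, the template ${plugin}-${type_instance} expands to hddtemp-sda, which matches the per-drive service names in the sample nagios config.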

A middle-complexity config is to have nagios-passive submit some checks under a default 
name, and other, high-priority checks under custom names. This is what happens if 
defaultname is a static string, and you have ServiceName "<collectd plugin>" entries.

nagios-passive can also do fancier name-mangling of the service_description per-plugin 
and per-host. This will present the data with a lot of granularity in nagios, and has 
the potential to require a long and messy nagios configuration.
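
The per-plugin and per-host naming can be pictured as a lookup with fallbacks. The Python sketch below mirrors the fuller example in collectd.conf as a dictionary and resolves a service_description for a notification. The precedence shown (HostName entry, then ServiceName entry, then defaultname) is an assumption about how the plugin behaves, and the data layout and function name are invented for illustration.

```python
from string import Template

# Hypothetical in-memory form of the fuller collectd.conf example.
CONFIG = {
    "defaultname": "my-collectd",
    "services": {
        "hddtemp": {
            "name": "${plugin}",
            "hosts": {
                "unreliable": "${plugin}-${type_instance}",
                "kiss-your-data-goodbye": "${plugin}-${type_instance}",
            },
        },
        "sensors": {"name": "${plugin}", "hosts": {}},
        "df": {"name": "${plugin}", "hosts": {}},
    },
}


def service_description(config, notification):
    """Pick the name template for a notification and expand it."""
    entry = config["services"].get(notification["plugin"])
    if entry is None:
        # No ServiceName entry for this plugin: use the default name.
        template = config.get("defaultname", "collectd")
    else:
        # A per-host name wins over the plugin-wide name (assumed order).
        template = entry["hosts"].get(notification["host"], entry["name"])
    # safe_substitute leaves unknown ${...} placeholders untouched.
    return Template(template).safe_substitute(notification)
```

Every distinct name this can produce needs a matching service definition in nagios, which is where the long config comes from.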

Collectd can send notifications based on missing values. By default nagios-passive will 
submit these to nagios as UNKNOWN checks. My experience was that this just caused the 
nagios services to flap, so I added the no_unknown boolean to the config to suppress these.
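
The effect of no_unknown can be sketched as a simple filter applied before a check is written to the command pipe. This Python fragment assumes the notification has already been mapped to a nagios state, with 3 meaning UNKNOWN. How the plugin itself recognises missing-value notifications is not shown here, and should_submit is a made-up name.

```python
UNKNOWN = 3  # Nagios return code for an UNKNOWN state


def should_submit(state, no_unknown=False):
    """Return False only for UNKNOWN results when no_unknown is set."""
    return not (no_unknown and state == UNKNOWN)
```

With no_unknown left at its default of false, every state is passed through, including UNKNOWN.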

