[collectd] exec plugin stuck on mutex
Ryan Tomayko
r at tomayko.com
Wed Mar 3 05:27:27 CET 2010
We're seeing some strange behavior with the exec plugin. It works
great for a short period of time (usually a few hours) and then stops
reporting. I've also confirmed that the configured scripts are not
being exec'd once collectd gets into this funny state, so it appears
not to be a network/reporting problem, but a problem with the exec
plugin itself. All other aspects of collectd work fine while the exec
plugin is in this state.
Basic info:
$ uname -a
Linux fs5b.rs.github.com 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux
$ collectd --help
<snip>
collectd 4.8.1, http://collectd.org/
Once I notice the plugin has stopped reporting, I have an extra
process (28489) hanging around:
$ pstree -apu 22935
collectdmon,22935 -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
collectd,22936 -C /etc/collectd/collectd.conf -f
collectd,28489 -C /etc/collectd/collectd.conf -f
{collectd},22937
{collectd},22938
{collectd},22939
{collectd},22940
{collectd},22941
{collectd},28487
That process seems to exist only when the exec plugin is no longer
reporting. Sometimes there are two of these processes.
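
To catch this state without watching pstree by hand, something like the following sketch could flag collectd processes whose parent is also collectd. It assumes `ps -eo pid=,ppid=,comm=` is available (procps) and that the daemon's command name is exactly `collectd`; the function name `stray_children` is made up for illustration. A healthy exec run will also show up for a moment between fork and exec, so a PID should only be treated as stuck if it is still there on a second check.

```python
import subprocess


def stray_children(ps_output, name="collectd"):
    """Return PIDs of `name` processes whose parent is also a `name` process."""
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank lines and any header line
        pid, ppid, comm = int(fields[0]), int(fields[1]), fields[2].strip()
        procs[pid] = (ppid, comm)
    return sorted(
        pid
        for pid, (ppid, comm) in procs.items()
        if comm == name and procs.get(ppid, (None, None))[1] == name
    )


if __name__ == "__main__":
    # Assumption: procps-style ps; the trailing "=" suppresses column headers.
    out = subprocess.check_output(["ps", "-eo", "pid=,ppid=,comm="], text=True)
    print(stray_children(out))
```

Fed the process tree shown above (collectdmon 22935, collectd 22936, collectd 28489), this would report only 28489.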
strace reports that the extra process is blocked in a futex wait, i.e.
waiting on a mutex. It never leaves this state:
$ sudo strace -p 28489
Process 28489 attached - interrupt to quit
futex(0x7f2f7d4e8fb0, FUTEX_WAIT_PRIVATE, 2, NULL
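
One general way a forked child can end up parked in a futex wait like this is a fork() from a multithreaded process while another thread holds a mutex: the child gets a copy of the locked mutex but not the thread that would unlock it. Whether that is what happens here is only a guess, not something confirmed. The sketch below shows the mechanism in Python rather than in collectd's C code; `child_inherits_held_lock` is a made-up name, and the child only probes the lock with a non-blocking acquire so the demo itself cannot hang.

```python
import os
import threading


def child_inherits_held_lock():
    """Fork while another thread holds a lock; report what the child sees.

    Returns True if the child found the lock already held.
    """
    lock = threading.Lock()
    locked = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            locked.set()    # tell the main thread the lock is now held
            release.wait()  # keep holding it until after the fork

    t = threading.Thread(target=holder)
    t.start()
    locked.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing can ever
        # release the lock. A blocking acquire() would wait forever, so we
        # only try a non-blocking one and report the result via exit status.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child saw lock held:", child_inherits_held_lock())
```

If the child did a blocking acquire instead, strace on it would show the same kind of endless wait as above.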
We currently have two different scripts configured for the exec plugin
on this machine. Both are short-lived (i.e. they exit after each run
instead of looping and sleeping for INTERVAL):
<Plugin exec>
Exec "nobody" "/etc/collectd/exec/haproxy-fs.sh"
Exec "nobody" "/etc/collectd/exec/ernie-fs.sh"
</Plugin>
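
For reference, a short-lived exec script in this style prints one or more PUTVAL lines on stdout and exits. The sketch below is not one of the two scripts above; the plugin name `example`, the `gauge-sessions` type instance and the value 42 are placeholders. It assumes the exec plugin exports COLLECTD_HOSTNAME and COLLECTD_INTERVAL to the script, as described in collectd-exec(5), and falls back to defaults if they are missing.

```python
import os
import time


def putval_line(host, plugin, type_, value, interval, now=None):
    """Format one PUTVAL line in collectd's plain-text protocol."""
    ts = int(time.time()) if now is None else int(now)
    return 'PUTVAL "%s/%s/%s" interval=%d %d:%s' % (
        host, plugin, type_, interval, ts, value)


if __name__ == "__main__":
    # Assumption: both variables are set by the exec plugin when it runs us.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))
    # Emit a single sample and exit; no sleep loop.
    print(putval_line(host, "example", "gauge-sessions", 42, interval))
```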
Any ideas what might be going on here or information I could provide
to help find a root cause?
Thanks,
Ryan