[collectd] exec plugin stuck on mutex

Ryan Tomayko r at tomayko.com
Wed Mar 3 05:27:27 CET 2010


We're seeing some strange behavior with the exec plugin. It works
great for a short period of time (usually a few hours) and then stops
reporting. I've also confirmed that the configured scripts are not
being exec'd once collectd gets into this funny state, so it appears
not to be a network/reporting problem, but a problem with the exec
plugin itself. All other aspects of collectd work fine while the exec
plugin is in this state.

Basic info:

    $ uname -a
    Linux fs5b.rs.github.com 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux

    $ collectd --help
    <snip>
    collectd 4.8.1, http://collectd.org/

Once I notice the plugin has stopped reporting, I have an extra
process (28489) hanging around:

    $ pstree -apu 22935
    collectdmon,22935 -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
      collectd,22936 -C /etc/collectd/collectd.conf -f
          collectd,28489 -C /etc/collectd/collectd.conf -f
          {collectd},22937
          {collectd},22938
          {collectd},22939
          {collectd},22940
          {collectd},22941
          {collectd},28487

That process seems to exist only when the exec plugin is no longer
reporting. Sometimes there are two of these processes.

strace shows the extra process blocked in a futex wait, which suggests
it is waiting on a mutex. It never leaves this state:

    $ sudo strace -p 28489
    Process 28489 attached - interrupt to quit
    futex(0x7f2f7d4e8fb0, FUTEX_WAIT_PRIVATE, 2, NULL
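
In case it is useful for comparison, the same state can also be read
from /proc without attaching strace. This is only a rough sketch that
assumes a Linux /proc filesystem; inspect_pid is a hypothetical helper
name, not something that ships with collectd:

```shell
#!/bin/sh
# Hypothetical helper (not part of collectd): summarize what a process
# is doing, using only the Linux /proc filesystem.
inspect_pid() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Name, scheduler state, parent PID and thread count.
    grep -E '^(Name|State|PPid|Threads):' "/proc/$pid/status"
    # Kernel function the process is sleeping in, when it is readable.
    if [ -r "/proc/$pid/wchan" ]; then
        printf 'wchan:\t%s\n' "$(cat "/proc/$pid/wchan")"
    fi
}
```

Calling inspect_pid 28489 on the stuck process should show whether it
is sleeping, which process forked it, and where in the kernel it is
waiting.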

We currently have two scripts configured under the exec plugin on this
machine. Both are short-lived (i.e. they exit after each run instead of
looping and sleeping for INTERVAL):

    <Plugin exec>
        Exec "nobody" "/etc/collectd/exec/haproxy-fs.sh"
        Exec "nobody" "/etc/collectd/exec/ernie-fs.sh"
    </Plugin>

Any ideas what might be going on here or information I could provide
to help find a root cause?

Thanks,
Ryan


