[collectd] Exec plugin - forked process freezing with 4.10.1

Gerrie Roos groos at xiplink.com
Wed Jun 6 15:40:43 CEST 2012


Hi Jesse

We are using collectd 4.10.2 (close enough) with unixsock, exec and a host
of other plugins.  We experienced similar problems and I spent a lot of time
debugging and getting it stable.  I've posted my patches to the mailing
list, but will attach them here again.  They should apply easily (touch
wood!) and there's a chance it will make things better.  Here's the patches,
note not all are really relevant, I suggest you just apply the first 3 or 4:

Quick overview:
    Fixed various collectd memory leaks.
    Fixed collectd's unixsock read interrupted by SIGCHLD's.
    Fixed collectd exec module: incorrectly exited select loop on signal.
    Fixed incorrect collectd warning when configuring network plugin.
    Collectd's scale plugin now supports scaling only on specific data
sources, yay!  (New feature)
 
Apologies if I wasn't supposed to paste the patch in the body of the email,
but here goes:
 
 
commit ba04be9feeec4f40b9ada1cae64c0a27634f1c76
Author: Gerrie Roos <groos at xiplink.com>
Date:   Fri May 25 05:38:47 2012 +0200
 
    Fixed various collectd memory leaks.
    
    Once I understood what's going on I tried to keep the changes to a
minimum.

diff --git a/src/collectd.c b/src/collectd.c
index 277d3b0..f5df460 100644
--- a/src/collectd.c
+++ b/src/collectd.c
@@ -77,6 +77,7 @@ static void sig_usr1_handler (int __attribute__((unused))
signal)
  pthread_attr_init (&attr);
  pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
  pthread_create (&thread, &attr, do_flush, NULL);
+ pthread_attr_destroy (&attr);
 }
 
 static int init_hostname (void)
diff --git a/src/exec.c b/src/exec.c
index 322ff67..265ec01 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -827,6 +827,7 @@ static int exec_read (void) /* {{{ */
     pthread_attr_init (&attr);
     pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
     pthread_create (&t, &attr, exec_read_one, (void *) pl);
+    pthread_attr_destroy(&attr);
   } /* for (pl) */
 
   return (0);
@@ -870,6 +871,7 @@ static int exec_notification (const notification_t *n,
/* {{{ */
     pthread_attr_init (&attr);
     pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
     pthread_create (&t, &attr, exec_notification_one, (void *) pln);
+    pthread_attr_destroy (&attr);
   } /* for (pl) */
 
   return (0);
diff --git a/src/meta_data.c b/src/meta_data.c
index 6a336c4..9307e5e 100644
--- a/src/meta_data.c
+++ b/src/meta_data.c
@@ -214,6 +214,7 @@ void meta_data_destroy (meta_data_t *md) /* {{{ */
   if (md == NULL)
     return;
 
+  pthread_mutex_destroy(&md->lock);
   md_entry_free (md->head);
   free (md);
 } /* }}} void meta_data_destroy */
diff --git a/src/unixsock.c b/src/unixsock.c
index 1010b59..dea8be0 100644
--- a/src/unixsock.c
+++ b/src/unixsock.c
@@ -187,6 +187,7 @@ static void *us_handle_client (void *arg)
   close (fdin);
   close (fdout);
   pthread_exit ((void *) 1);
+  return ((void *) 0);
  }
 
  fhout = fdopen (fdout, "w");
@@ -198,6 +199,7 @@ static void *us_handle_client (void *arg)
   fclose (fhin); /* this closes fdin as well */
   close (fdout);
   pthread_exit ((void *) 1);
+  return ((void *) 0);
  }
 
  /* change output buffer to line buffered mode */
@@ -209,6 +211,7 @@ static void *us_handle_client (void *arg)
   fclose (fhin);
   fclose (fhout);
   pthread_exit ((void *) 1);
+  return ((void *) 0);
  }
 
  while (42)
@@ -253,6 +256,7 @@ static void *us_handle_client (void *arg)
    fclose (fhin);
    fclose (fhout);
    pthread_exit ((void *) 1);
+   return ((void *) 0);
   }
 
   if (strcasecmp (fields[0], "getval") == 0)
@@ -307,9 +311,13 @@ static void *us_server_thread (void
__attribute__((unused)) *arg)
  pthread_t th;
  pthread_attr_t th_attr;
 
+ pthread_attr_init (&th_attr);
+ pthread_attr_setdetachstate (&th_attr, PTHREAD_CREATE_DETACHED);
+
  if (us_open_socket () != 0)
   pthread_exit ((void *) 1);
 
+
  while (loop != 0)
  {
   DEBUG ("unixsock plugin: Calling accept..");
@@ -325,6 +333,7 @@ static void *us_server_thread (void
__attribute__((unused)) *arg)
      sstrerror (errno, errbuf, sizeof (errbuf)));
    close (sock_fd);
    sock_fd = -1;
+   pthread_attr_destroy(&th_attr);
    pthread_exit ((void *) 1);
   }
 
@@ -341,9 +350,6 @@ static void *us_server_thread (void
__attribute__((unused)) *arg)
 
   DEBUG ("Spawning child to handle connection on fd #%i", *remote_fd);
 
-  pthread_attr_init (&th_attr);
-  pthread_attr_setdetachstate (&th_attr, PTHREAD_CREATE_DETACHED);
-
   status = pthread_create (&th, &th_attr, us_handle_client, (void *)
remote_fd);
   if (status != 0)
   {
@@ -358,6 +364,7 @@ static void *us_server_thread (void
__attribute__((unused)) *arg)
 
  close (sock_fd);
  sock_fd = -1;
+ pthread_attr_destroy(&th_attr);
 
  status = unlink ((sock_file != NULL) ? sock_file : US_DEFAULT_PATH);
  if (status != 0)
diff --git a/src/utils_cmd_listval.c b/src/utils_cmd_listval.c
index 4ca9646..f923fca 100644
--- a/src/utils_cmd_listval.c
+++ b/src/utils_cmd_listval.c
@@ -89,8 +89,9 @@ int handle_listval (FILE *fh, char *buffer)
 
   print_to_socket (fh, "%i Value%s found\n",
       (int) number, (number == 1) ? "" : "s");
-  for (i = 0; i < number; i++)
+  for (i = 0; i < number; i++) {
     print_to_socket (fh, "%u %s\n", (unsigned int) times[i], names[i]);
+  }
 
   free_everything_and_return (0);
 } /* int handle_listval */
diff --git a/src/utils_fbhash.c b/src/utils_fbhash.c
index d20b7e3..9f1058b 100644
--- a/src/utils_fbhash.c
+++ b/src/utils_fbhash.c
@@ -234,6 +234,7 @@ void fbh_destroy (fbhash_t *h) /* {{{ */
   if (h == NULL)
     return;
 
+  pthread_mutex_destroy(&h->lock);
   free (h->filename);
   fbh_free_tree (h->tree);
 } /* }}} void fbh_destroy */
 
commit 402e0a1746a67e47619ffc4ce2ad6317e8884cfe
Author: Gerrie Roos <groos at xiplink.com>
Date:   Tue Apr 17 15:22:36 2012 +0200
 
    Fixed collectd's unixsock read interrupted by SIGCHLD's.
 
diff --git a/src/unixsock.c b/src/unixsock.c
index 0b89748..1010b59 100644
--- a/src/unixsock.c
+++ b/src/unixsock.c
@@ -222,6 +222,9 @@ static void *us_handle_client (void *arg)
   errno = 0;
   if (fgets (buffer, sizeof (buffer), fhin) == NULL)
   {
+   if (errno == EINTR)
+    continue;
+
    if (errno != 0)
    {
     char errbuf[1024];
 
commit 9f4c960f94bc70d680400b50a84655f288e7e96d
Author: Gerrie Roos <groos at xiplink.com>
Date:   Tue Jan 10 09:00:42 2012 +0200
 
    Fixed collectd exec module: incorrectly exited select loop on signal.
 
diff --git a/src/exec.c b/src/exec.c
index c64f949..322ff67 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -584,8 +584,15 @@ static void *exec_read_one (void *arg) /* {{{ */
   /* We use a copy of fdset, as select modifies it */
   copy = fdset;
 
-  while (select(highest_fd + 1, &copy, NULL, NULL, NULL ) > 0)
+  while (1)
   {
+    int e = select(highest_fd + 1, &copy, NULL, NULL, NULL );
+    if (e < 0) {
+      if (errno == EINTR)
+ continue;
+      break;
+    }
+
     int len;
 
     if (FD_ISSET(fd, &copy))
 
commit 27dd489e1242af63d2df04dfe627c440512e8da1
Author: Gerrie Roos <groos at xiplink.com>
Date:   Tue Dec 20 06:30:59 2011 +0200
 
    Fixed incorrect collectd warning when configuring network plugin.
 
diff --git a/src/network.c b/src/network.c
index 412b457..aa1c858 100644
--- a/src/network.c
+++ b/src/network.c
@@ -1685,9 +1685,9 @@ static int network_set_interface (const sockent_t *se,
const struct addrinfo *ai
  }
 
  /* else: Not a multicast interface. */
-#if defined(HAVE_IF_INDEXTONAME) && HAVE_IF_INDEXTONAME &&
defined(SO_BINDTODEVICE)
  if (se->interface != 0)
  {
+#if defined(HAVE_IF_INDEXTONAME) && HAVE_IF_INDEXTONAME &&
defined(SO_BINDTODEVICE)
   char interface_name[IFNAMSIZ];
 
   if (if_indextoname (se->interface, interface_name) == NULL)
@@ -1704,11 +1704,10 @@ static int network_set_interface (const sockent_t
*se, const struct addrinfo *ai
      sstrerror (errno, errbuf, sizeof (errbuf)));
    return (-1);
   }
- }
 /* #endif HAVE_IF_INDEXTONAME && SO_BINDTODEVICE */
 
 #else
- WARNING ("network plugin: Cannot set the interface on a unicast "
+  WARNING ("network plugin: Cannot set the interface on a unicast "
    "socket because "
 # if !defined(SO_BINDTODEVICE)
    "the the \"SO_BINDTODEVICE\" socket option "
@@ -1718,6 +1717,8 @@ static int network_set_interface (const sockent_t *se,
const struct addrinfo *ai
    "is not available on your system.");
 #endif
 
+ }
+
  return (0);
 } /* }}} network_set_interface */
 
commit 641dc2dd4c595ede24c3997e4aa9db5c4528a2cd
Author: Gerrie Roos <groos at xiplink.com>
Date:   Wed May 25 18:28:55 2011 +0200
 
    Collectd's scale plugin now supports scaling only on specific data
sources, yay!
 
diff --git a/src/target_scale.c b/src/target_scale.c
index 29fecdf..a8d7038 100644
--- a/src/target_scale.c
+++ b/src/target_scale.c
@@ -29,6 +29,9 @@ struct ts_data_s
 {
  double factor;
  double offset;
+
+ char **data_sources;
+ size_t data_sources_num;
 };
 typedef struct ts_data_s ts_data_t;
 
@@ -300,6 +303,67 @@ static int ts_config_set_double (double *ret,
oconfig_item_t *ci) /* {{{ */
  return (0);
 } /* }}} int ts_config_set_double */
 
+static int ts_config_add_data_source(ts_data_t *data, /* {{{ */
+  oconfig_item_t *ci)
+{
+ size_t new_data_sources_num;
+ char **temp;
+ int i;
+
+ /* Check number of arbuments. */
+ if (ci->values_num < 1)
+ {
+  ERROR ("`value' match: `%s' needs at least one argument.",
+    ci->key);
+  return (-1);
+ }
+
+ /* Check type of arguments */
+ for (i = 0; i < ci->values_num; i++)
+ {
+  if (ci->values[i].type == OCONFIG_TYPE_STRING)
+   continue;
+
+  ERROR ("`value' match: `%s' accepts only string arguments "
+    "(argument %i is a %s).",
+    ci->key, i + 1,
+    (ci->values[i].type == OCONFIG_TYPE_BOOLEAN)
+    ? "truth value" : "number");
+  return (-1);
+ }
+
+ /* Allocate space for the char pointers */
+ new_data_sources_num = data->data_sources_num + ((size_t) ci->values_num);
+ temp = (char **) realloc (data->data_sources,
+   new_data_sources_num * sizeof (char *));
+ if (temp == NULL)
+ {
+  ERROR ("`value' match: realloc failed.");
+  return (-1);
+ }
+ data->data_sources = temp;
+
+ /* Copy the strings, allocating memory as needed.  */
+ for (i = 0; i < ci->values_num; i++)
+ {
+  size_t j;
+
+  /* If we get here, there better be memory for us to write to.  */
+  assert (data->data_sources_num < new_data_sources_num);
+
+  j = data->data_sources_num;
+  data->data_sources[j] = sstrdup (ci->values[i].value.string);
+  if (data->data_sources[j] == NULL)
+  {
+   ERROR ("`value' match: sstrdup failed.");
+   continue;
+  }
+  data->data_sources_num++;
+ }
+
+ return (0);
+} /* }}} int ts_config_add_data_source */
+
 static int ts_destroy (void **user_data) /* {{{ */
 {
  ts_data_t **data;
@@ -309,6 +373,13 @@ static int ts_destroy (void **user_data) /* {{{ */
 
  data = (ts_data_t **) user_data;
 
+ if (*data && (*data)->data_sources) {
+  size_t i;
+  for (i = 0; i < (*data)->data_sources_num; i++)
+   free((*data)->data_sources[i]);
+  free((*data)->data_sources);
+ }
+
  free (*data);
  *data = NULL;
 
@@ -341,6 +412,8 @@ static int ts_create (const oconfig_item_t *ci, void
**user_data) /* {{{ */
     status = ts_config_set_double (&data->factor, child);
   else if (strcasecmp ("Offset", child->key) == 0)
     status = ts_config_set_double (&data->offset, child);
+  else if (strcasecmp ("DataSource", child->key) == 0)
+    status = ts_config_add_data_source(data, child);
   else
   {
    ERROR ("Target `scale': The `%s' configuration option is not understood
"
@@ -393,6 +466,18 @@ static int ts_invoke (const data_set_t *ds,
value_list_t *vl, /* {{{ */
 
  for (i = 0; i < ds->ds_num; i++)
  {
+  /* If we've got a list of data sources, is it in the list? */
+  if (data->data_sources) {
+   size_t j;
+   for (j = 0; j < data->data_sources_num; j++)
+    if (strcasecmp(ds->ds[i].name, data->data_sources[j]) == 0)
+     break;
+
+   /* No match, ignore */
+   if (j >= data->data_sources_num)
+    continue;
+  }
+
   if (ds->ds[i].type == DS_TYPE_COUNTER)
    ts_invoke_counter (ds, vl, data, i);
   else if (ds->ds[i].type == DS_TYPE_GAUGE)
 
 
 
 
Hope you find this useful,
gerrie





Message: 1
Date: Wed, 6 Jun 2012 10:02:31 +0930
From: Jesse Reynolds <jesse at bulletproof.net>
To: collectd at verplant.org
Subject: [collectd] Exec plugin - forked process freezing with 4.10.1
	on	Ubuntu 12.04 (precise)
Message-ID: <B76A766A-8364-4C57-936C-65EE0150F7C3 at bulletproof.net>
Content-Type: text/plain; charset=us-ascii

Hello

We've been using the Exec plugin to great effect on our collectd servers
running on Ubuntu 10.04 (lucid) with collectd version 4.9.x but after
upgrading to Ubuntu 12.04 (precise) and using the new version of collectd it
ships with (4.10.1) the Exec plugin is now intermittently failing. What
happens is similar to a thread from 2010 where the collectd process forks
but then freezes up before exec'ing the external command specified in the
Exec plugin definition. 

I have reproduced this on my laptop in a VirtualBox VM running precise and
the same version of collectd. The details are here:
https://gist.github.com/2878994 ... 

This feels like a collectd bug, but before I raise a bug report is there
anything obvious anyone can think of here that I might have missed? 

Thank you
Jesse

   Jesse Reynolds
   R&D Engineer
   Bulletproof Networks
   Mission Critical Hosting
   http://www.bulletproof.net/ 
   tel: 1300 663 903




------------------------------

Message: 2
Date: Wed, 6 Jun 2012 13:48:21 +0930
From: Jesse Reynolds <jesse at bulletproof.net>
To: collectd at verplant.org
Subject: Re: [collectd] Exec plugin - forked process freezing with
	4.10.1 on	Ubuntu 12.04 (precise)
Message-ID: <BB0E7288-9C54-4C09-A206-121819CE6985 at bulletproof.net>
Content-Type: text/plain; charset=us-ascii


On 06/06/2012, at 10:02 AM, Jesse Reynolds wrote:
...
> I have reproduced this on my laptop in a VirtualBox VM running precise and
the same version of collectd. The details are here:
https://gist.github.com/2878994 ... 

Doing some more testing and I've managed to reproduce the problem with the
network and rrdtool plugins not activated. If I also deactivate the unixsock
plugin then the problem goes away. So I think this is something to do with
an interplay between the unixsock and exec plugins. 

Is anyone else using the exec and unixsock plugins with the collectd 4.10.1
as shipped with precise? 

Is it worth compiling 4.10.1 with debug symbols, reproducing, and attaching
gdb to the frozen process? 

Jesse




------------------------------

_______________________________________________
collectd mailing list
collectd at verplant.org
http://mailman.verplant.org/listinfo/collectd


End of collectd Digest, Vol 81, Issue 3
***************************************
-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.2178 / Virus Database: 2433/5048 - Release Date: 06/05/12




More information about the collectd mailing list