[collectd] 4.5.0 on solaris with gcc and the postgresql plugin

Admin collectd-info at internode.com.au
Tue Oct 14 06:17:17 CEST 2008


Hi all

Here are a few things I've come across while running 4.5.0 on 
Solaris.

Building with gcc fails for two reasons.
The first is that gcc emits pragma warnings during compilation (as it does 
with most builds on Solaris), and the AM_CFLAGS setting (-Wall -Werror) 
turns them into errors, so the build fails. Fixed by changing this to 
-Werror only:

--- src/Makefile.am.orig        Thu Sep  4 22:31:09 2008
+++ src/Makefile.am     Tue Oct  7 14:39:57 2008
@@ -10,7 +10,7 @@
 endif
 
 if COMPILER_IS_GCC
-AM_CFLAGS = -Wall -Werror
+AM_CFLAGS = -Werror
 endif

--- src/Makefile.in.orig        Fri Sep  5 18:23:51 2008
+++ src/Makefile.in     Tue Oct  7 14:40:22 2008
@@ -1150,7 +1150,7 @@
 top_builddir = @top_builddir@
 top_srcdir = @top_srcdir@
 SUBDIRS = $(am__append_1) $(am__append_2) $(am__append_3)
-@COMPILER_IS_GCC_TRUE@AM_CFLAGS = -Wall -Werror
+@COMPILER_IS_GCC_TRUE@AM_CFLAGS = -Werror
 AM_CPPFLAGS = -DPREFIX='"${prefix}"' \
        -DCONFIGFILE='"${sysconfdir}/${PACKAGE_NAME}.conf"' \
        -DLOCALSTATEDIR='"${localstatedir}"' \

The second is that any reference to networking headers (specifically 
sys/strft.h, which most of them include) causes the gcc build to fail with 
an error like the following:

In file included from /usr/include/sys/stream.h:28,
                 from /usr/include/netinet/in.h:66,
                 from /usr/include/sys/socket.h:45,
                 from interface.c:32:
/usr/include/sys/strft.h:112:4: attempt to use poisoned "sprintf"
/usr/include/sys/strft.h:114:4: attempt to use poisoned "sprintf"
*** Error code 1

Stop.

Here is the offending code from sys/strft.h:

        for (_ix = 0; _tot != 0ll && _ix < tdelta_t_sz; _ix++) {        \
                if ((_ix + 1) == tdelta_t_sz) {                         \
                        *_t = '>';                                      \
                } else if (_ix < 8) {                                   \
                        sprintf(_t, "< 0.%09llds", _ns);                \
                } else {                                                \
                        sprintf(_t, "< %lld.%.*ss", _ns / 1000000000ll, \
                            9 - (_ix - 8), "000000000");                \
                }                                                       \
                _n = ((tv[_ix][0] * 10000ll / _toc) + 5ll) / 10ll;      \
                _nl = _n / 10ll;                                        \


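Fixed by surrounding the offending include with the DONT_POISON_SPRINTF_YET 
trick from the perl plugin: define the macro before including collectd.h so 
that sprintf is not poisoned yet, then apply the poison pragma after the 
system headers have been included. I don't know if this is the correct fix, 
but it works.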

For example, for the interface plugin:

--- src/interface.c.orig        Wed Sep 10 15:26:45 2008
+++ src/interface.c     Wed Sep 10 15:27:27 2008
@@ -20,7 +20,9 @@
  *   Sune Marcher <sm at flork.dk>
  **/
 
+#define DONT_POISON_SPRINTF_YET 1
 #include "collectd.h"
+#undef DONT_POISON_SPRINTF_YET
 #include "common.h"
 #include "plugin.h"
 #include "configfile.h"
@@ -32,6 +34,10 @@
 #  include <sys/socket.h>
 #endif
 
+#if __GNUC__
+# pragma GCC poison sprintf
+#endif
+
 /* One cannot include both. This sucks. */
 #if HAVE_LINUX_IF_H
 #  include <linux/if.h>
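
To illustrate why moving the pragma helps, here is a minimal sketch, not 
taken from collectd. As far as I understand gcc's behaviour, a poisoned 
identifier is still accepted when it appears in the expansion of a macro that 
was defined before the pragma was seen. FORMAT_NS and format_ns are made-up 
names; the macro only stands in for the one in sys/strft.h.

```c
#include <stdio.h>

/* Stand-in for a system header such as sys/strft.h: a macro whose body
 * calls sprintf, defined BEFORE the identifier is poisoned.
 * (FORMAT_NS is a hypothetical name used only for this sketch.) */
#define FORMAT_NS(buf, ns) sprintf((buf), "< 0.%09llds", (long long)(ns))

/* From here on, writing sprintf directly is rejected by gcc. */
#if __GNUC__
# pragma GCC poison sprintf
#endif

/* Expanding the earlier macro is still accepted, because the macro was
 * defined before the poison pragma was seen. Returns the number of
 * characters written, as sprintf does. */
int format_ns(char *buf, long long ns)
{
    return FORMAT_NS(buf, ns);
}
```

If the pragma came before the macro definition instead, the definition itself 
would trigger the "attempt to use poisoned" error, which is what happens when 
collectd.h poisons sprintf ahead of the system headers.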


The postgresql plugin is missing a couple of PQclear() calls, so the query 
result is never freed on these paths:

--- src/postgresql.c.orig       2008-10-13 15:10:21.648228533 +1030
+++ src/postgresql.c    2008-10-13 17:26:08.884228200 +1030
@@ -417,8 +417,10 @@
        }
 
        rows = PQntuples (res);
-       if (1 > rows)
+       if (1 > rows) {
+               PQclear (res);
                return 0;
+       }
 
        cols = PQnfields (res);
        if (query->cols_num != cols) {
@@ -442,6 +444,7 @@
                                submit_gauge (db, col.type, col.type_instance, value);
                }
        }
+       PQclear (res);
        return 0;
 } /* c_psql_exec_query */
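
An alternative to adding a PQclear() before every return would be a single 
cleanup label that all paths go through. The following is only a sketch of 
that pattern: result_t, result_new and result_clear are made-up stand-ins for 
PGresult, PQexec and PQclear so that it builds without libpq, and exec_query 
is not the real c_psql_exec_query.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PGresult so the sketch builds without libpq. */
typedef struct { int rows; } result_t;

static int live_results = 0; /* results allocated but not yet released */

/* Plays the role of PQexec: allocates a result the caller must release. */
static result_t *result_new(int rows)
{
    result_t *res = malloc(sizeof(*res));
    if (res != NULL) {
        res->rows = rows;
        live_results++;
    }
    return res;
}

/* Plays the role of PQclear: releases a result. */
static void result_clear(result_t *res)
{
    if (res != NULL) {
        free(res);
        live_results--;
    }
}

/* Number of results that have not been released yet. */
int results_outstanding(void)
{
    return live_results;
}

/* Returns 0 on success (including "no rows") and -1 on failure.
 * Every path after a successful allocation leaves via the cleanup label,
 * so the result is released exactly once. */
int exec_query(int rows)
{
    int status = 0;
    result_t *res = result_new(rows);

    if (res == NULL)
        return -1;

    if (res->rows < 1)
        goto cleanup; /* nothing to report, but still release the result */

    /* ... values would be read from the result and submitted here ... */

cleanup:
    result_clear(res);
    return status;
}
```

Calling exec_query(0) and exec_query(3) in turn should leave 
results_outstanding() at 0, which is the property the patch above restores.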


On the subject of the postgresql plugin on Solaris, I have it working nicely 
on one machine (an 8/07 install of Solaris) but not so well on an earlier 
system (11/06). On the 11/06 system, collectd seems to reinitialize the 
postgresql plugin when it notices some sort of kstat-related change, while 
keeping the original database connections open.

That is, after enabling loglevel debug, I see the following in the log:

[2008-10-13 19:40:56] postgresql: Sucessfully connected to database template1 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19121)
[2008-10-13 19:40:56] postgresql: Sucessfully connected to database database1 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19122)
[2008-10-13 19:40:56] postgresql: Sucessfully connected to database database2 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19123)
[2008-10-13 19:40:56] postgresql: Sucessfully connected to database database3 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19124)
[2008-10-13 19:40:56] postgresql: Sucessfully connected to database database4 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19125)
[2008-10-13 19:43:46] kstat chain has been updated
[2008-10-13 19:43:46] postgresql: Sucessfully connected to database template1 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19305)
[2008-10-13 19:43:46] postgresql: Sucessfully connected to database database1 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19306)
[2008-10-13 19:43:46] postgresql: Sucessfully connected to database database2 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19307)
[2008-10-13 19:43:46] postgresql: Sucessfully connected to database database3 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19308)
[2008-10-13 19:43:46] postgresql: Sucessfully connected to database database4 
(user pgsql) at server localhost:5432 (server version: 8.3.1, protocol 
version: 3, pid: 19309)

After the kstat change message and the notification that the new database 
connections were made successfully, the old connections are still open. So 
every 2 to 5 minutes the daemon makes (in this case) an additional 5 
connections to the database.

This is the section of collectd.c where the message comes from:

#if HAVE_LIBKSTAT
static void update_kstat (void)
{
        if (kc == NULL)
        {
                if ((kc = kstat_open ()) == NULL)
                        ERROR ("Unable to open kstat control structure");
        }
        else
        {
                kid_t kid;
                kid = kstat_chain_update (kc);
                if (kid > 0)
                {
                        INFO ("kstat chain has been updated");
                        plugin_init_all ();
                }
                else if (kid < 0)
                        ERROR ("kstat chain update failed");
                /* else: everything works as expected */
        }

        return;
} /* static void update_kstat (void) */
#endif /* HAVE_LIBKSTAT */

I don't see the 'kstat chain has been updated' message on the 8/07 system, so 
it could be a Solaris bug. Still, since update_kstat() calls 
plugin_init_all() again, I wonder whether the postgresql plugin is missing 
something that tells it to drop the old connections (or whether multiple 
copies of the plugin are being initialised, or something like that).

Also on the subject of the postgresql plugin, would it be possible to add an 
option controlling whether the connections to the database are persistent or 
not?
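
To show what I mean by dropping the old connections, here is a rough sketch 
of an init function that can safely be called more than once. It is a guess 
at one possible approach, not how the plugin works today. conn_t, conn_open 
and conn_close are made-up stand-ins for PGconn, PQconnectdb and PQfinish, 
and db_init is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libpq connection handle (PGconn). */
typedef struct { int open; } conn_t;

static int open_connections = 0; /* connections currently held open */

/* Plays the role of PQconnectdb. */
static void conn_open(conn_t *c)
{
    c->open = 1;
    open_connections++;
}

/* Plays the role of PQfinish; does nothing if the handle is not open. */
static void conn_close(conn_t *c)
{
    if (c->open) {
        c->open = 0;
        open_connections--;
    }
}

#define NUM_DATABASES 5
static conn_t databases[NUM_DATABASES]; /* zero-initialised: all closed */

/* Init function that tolerates being called again, for instance after a
 * kstat chain update: any existing connection is closed before a new one
 * is opened, so the number of open connections stays constant. */
int db_init(void)
{
    for (size_t i = 0; i < NUM_DATABASES; i++) {
        conn_close(&databases[i]); /* no-op on the first call */
        conn_open(&databases[i]);
    }
    return 0;
}

/* Number of connections currently open. */
int connections_open(void)
{
    return open_connections;
}
```

With this shape, calling db_init() twice leaves 5 connections open rather 
than 10. Skipping the reconnect when a handle is already open would be 
another option and would avoid the churn.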

Thanks for your input

Admin.


