[collectd] New plugin for collectd : write_mysql

Mon May 7 19:44:17 CEST 2012

On 05/04/2012 07:07 AM, Cyril Feraudet wrote:
> Le 04.05.2012 08:52, Cyril Feraudet a écrit :
>
>> Le 03.05.2012 20:05, Daniel Hilst a écrit :
>>
>>     On 05/03/2012 03:46 PM, Daniel Hilst wrote:
>>
>>         On 02/14/2012 09:54 AM, Cyril Feraudet wrote:
>>
>>             Hi all, Here the write_mysql output plugin patch (from
>>             5.0.2) with modifications suggested by Octo. By the way,
>>             notification are handled, all memory used is freed on
>>             shutdown (thx Valgrind). Regards, Cyril Le 04.01.2012
>>             10:49, Cyril Feraudet a écrit :
>>
>>                 Hi Octo, Hi all, Thanks for your time. It will permit
>>                 to me to improve this plugin and my knowledge of C. Le
>>                 22.12.2011 01:06, Florian Forster aécrit :
>>
>>                     Hi Cyril, thank you very much for your patches!
>>                     I'm very sorry, but I can't give the plugin the
>>                     thorough review it deserves right now. But maybe
>>                     you find the following comments helpful. On Tue,
>>                     Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet
>>                     wrote:
>>
>>                         - with_mysql_libs=`$with_mysql_config --libs
>>                         2>/dev/null` +
>>                         with_mysql_libs=`$with_mysql_config --libs_r
>>                         2>/dev/null`
>>
>>                     This looks like a bug fix. Would it make sense to
>>                     add this change to the 4.10 branch? If so, would
>>                     you be willing to send a separate patch for this?
>>
>>                 You are right, I will submit a patch soon. write_mysql
>>                 is often called than mysql plugin and non thread safe
>>                 library make collectd crashing quickly.
>>
>>                         +CREATE TABLE `data` ( + `id` bigint(20) NOT
>>                         NULL auto_increment, + `timestamp` double NOT
>>                         NULL, + `host_id` int(11) NOT NULL, +
>>                         `plugin_id` int(11) NOT NULL, +
>>                         `plugin_instance` varchar(255) default NULL, +
>>                         `type_id` int(11) NOT NULL, + `typeinstance`
>>                         varchar(255) default NULL, + `dataset_id`
>>                         int(11) NOT NULL, + `value` double NOT NULL,
>>
>>                     One of the reasons why I didn't write a
>>                     "write_dbi" or "write_mysql" plugin before is
>>                     because I was uncertain about the schema to use.
>>                     Why did you chose to have a separate table for
>>                     host, plugin, plugin_instance, type, and
>>                     type_instance? I currently think that just having
>>                     an "identifier" table with five columns would be a
>>                     better trade- off between normalization and
>>                     complexity.
>>
>>                 The idea was to save space in `data` table and improve
>>                 querying in it. For now, I've 800MB `data` table
>>                 (MEMORY type) just for the last value of each metrics
>>                 (4207 servers / 462,395 metrics) using REPLACE instead
>>                 of INSERT statement. I think that SQL join are faster
>>                 on numeric id than varchar. But all of this things are
>>                 questionable and dependent of the final use. I think
>>                 about a customizable SQL statement in plugin
>>                 configuration.
>>
>>                         ATE TABLE `dataset` (
>>
>>                     Why do you store datasets in the database (in
>>                     addition to the types.db file)? Having the
>>                     definition in more than one place is bound to
>>                     create inconsistencies at some point
>>
>>                 This table is for readonly purpose and to make SQL
>>                 query independent of number of metrics in a type. It
>>                 also contain type added on the way by a plugin.
>>
>>                         diff --git a/src/write_mysql.c b/src/w
>>
>>                     /blockquote> typedef struct host_s host_t; +struct
>>                     host_s { + char
>>
>>                         name[DATA_MAX_NAME_LEN]; + int id; + host_t
>>                         *next_host; +};
>>
>>                     You will be doing a lot of lookups in this
>>                     cache(es). I think using a binary search tree (an
>>                     implementation if provided in src/utils_avltree.h)
>>                     would improve performance here.
>>
>>                 Thanks for the tips, Il will do that.
>>
>>                     int port =
>>
>>                         tatic int write_mysql_config (const char *key,
>>                         const char *value) { [...] + } else if
>>                         (strcasecmp ("Port", key) == 0) { + port =
>>                         value; You
>>
>>                     a char* to an int. This won't produce what you
>>                     expect. Use the function
>>                     service_name_to_port_number() from src/common.h.
>>                     +static int wri
>>
>>                 Your'e right I will fix it.
>>
>>                         t (void) { + conn = mysql_init(NULL); + if
>>                         (mysql_real_connect(conn, host, user, passwd,
>>                         database, port, NULL, 0) == NULL) { The conn
>>
>>                     be (re-)established in the write() callback, when
>>                     it is needed. Otherwise you will never recover
>>                     from connection failures. + if (!mysql_th
>>
>>                 Your'e right, I just issued it. I will fix it too.
>>
>>                         { + ERROR("write_mysql plugin: mysqlclient
>>                         Thread Safe OFF"); I wasn't
>>
>>                     s function. Good to know, thanks! +static int
>>                     add_host_id (
>>
>>                         me) { [...] + len = ssnprintf (query, sizeof
>>                         (query), "SELECT id FROM host WHERE name =
>>                         '%s'", hostname); I think we should
>>
>>                     statement with bound arguments (or whatever the
>>                     terminology). This way (a) MySQL doesn't have to
>>                     parse and optimize the query each time we need to
>>                     insert a host and (b) quoting is done for us by
>>                     the mysql library. +static int add_host_id (char *
>>
>>                 I will looking for. I never used it before.
>>
>>                         [...] + pthread_mutex_lock(&mutex); [...] + if
>>                         (row = mysql_fetch_row(result)) { + id =
>>                         atoi(row[0]); + mysql_free_result(result); +
>>                         pthread_mutex_unlock(&mutex); [...] +
>>                         pthread_mutex_unlock(&mutex); + return id; +}
>>                         Mutex is released twice
>>
>>                     +static int add_plugin_id (char
>>
>>                 I'll fix that.
>>
>>                         { +static int add_type_id (char *typename) { A
>>                         *lot* of duplicate code
>>
>>                     ite a function like: int add_string (const char
>>                     *table, const char *str); /* returns id on
>>                     success, less than zero on failure. */ +static int
>>                     write_mysql_write(con
>>
>>                         t *ds, const value_list_t *vl, + user_data_t
>>                         __attribute__((unused)) *user_data) { [...] +
>>                         len = ssnprintf (tmpquery, sizeof (tmpquery),
>>                         "INSERT INTO data " +
>>
>>                 "(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
>>
>>
>>                         + "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%lf)",
>>                         CDTIME_T_TO_DOUBLE (vl->time), host_id, +
>>                         plugin_id, vl->plugin_instance, type_id,
>>                         vl->type_instance, dataset_id );
>>
>>                     You really should prepare this statement. Parsing
>>                     this several thousand times per second will be a
>>                     major performance problem.
>>
>>                         +if (dso->type == DS_TYPE_GAUGE) { + len =
>>                         ssnprintf (query, sizeof (query), tmpquery,
>>                         vl->values[i].gauge); +} else { + if (rates ==
>>                         NULL) { + rates = uc_get_rate (ds, vl); + } +
>>                         if (isnan(rates[i])) { continue; } + len =
>>                         ssnprintf (query, sizeof (query), tmpquery,
>>                         rates[i]); +} +//INFO("toto: %d", toto);
>>                         +pthread_mutex_lock(&mutex);
>>                         +DEBUG("write_mysql plugin: %s", query);
>>                         +mysql_real_query(conn, query, len);
>>                         +pthread_mutex_unlock(&mutex);
>>
>>                     So you're inserting each data source in a element?
>>                     How can a client distinguish between the different
>>                     data sources? At least, you should add a
>>                     "ds_index" column or so. Thanks again and best
>>
>>                 There are dataset_id and type_id in `dataset` for
>>                 that. Here a query sample to get all metric in a type
>>                 in right order : select data.timestamp AS date,
>>                 dataset.name AS dataset_name, data.value AS value from
>>                 data, host, plugin, type, dataset where host.id =
>>                 data.host_id and plugin.id = data.plugin_id and
>>                 type.id = data.type_id and data.dataset_id =
>>                 dataset.id and host.name = 'eqds3pcold001' and
>>                 plugin.name = 'interface' and data.plugin_instance =
>>                 'eth0' and type.name = 'if_octets' In my case,
>>                 write_mysql is used to make, for example, sum of cpu
>>                 used on several thousand of server and re-inject it in
>>                 collectd to graph it. I've made a more flexible than
>>                 Ganglia application based on Collectd at work. My
>>                 compagny allow me to share all my work on collectd in
>>                 GPL like licence. I will Release a beta of my unamed
>>                 application soon a possible. Regards, Cyril Feraudet
>>                 _______________________________________________
>>                 collectd mailing list collectd at verplant.org
>>                 <mailto:collectd at verplant.org> collectd at verplant.org>
>>                 http://mailman.verplant.org/listinfo/collectd
>>
>>             _______________________________________________ collectd
>>             mailing list collectd at verplant.org
>>             <mailto:collectd at verplant.org>
>>             http://mailman.verplant.org/listinfo/collectd
>>
>>         Hello, I'm trying to compile collectd 5.0.2 with write_mysql
>>         pluing I've done this: http://pastebin.com/dPc6zgUH I'm on
>>         rhel 5.5 x86_64 Any idea?
>>
>>     I have to run libtoolize --force before compile
>>
>>     here is:
>>     tar vxf collectd-5.0.2.tar.bz2
>>     cd collectd-5.0.2
>>     patch -p1 -i ../0001-Adding-write_mysql-output-plugin.patch
>>     aclocal
>>     autoconf
>>     automake
>>     libtoolize  --force
>>     ./configure
>>     make
>>     make install all
>>
>>
>>     Now I'm facing this error:
>>     error] write_mysql plugin: Failed to bind param to statement : Statement
>>     not prepared / INSERT INTO data
>>     (date,host_id,plugin_id,plugin_instance,type_id,type_instance,dataset_id,value)VALUES
>>     (?,?,?,?,?,?,?,?)
>>
>>     What that means????
>>
>>     I tried Replace "false" without success
>>
>>     Thanks in advance,
>>
>>     Hilst
>>
>> This is an old patch, please use this one : git clone -b
>> cf/perfwatcher http://github.com/feraudet/collectd.git
>>
>> Regards,
>>
>> Cyril
>>
> Here a ready to build one
> https://github.com/downloads/feraudet/collectd/collectd-5.1.0-perfwatcher.tgz
>
> You have to re-create database from collectd/contrib/write_mysql.sql and
> take care to the new config block :
>
>   <Plugin write_mysql>
>       <Database master4example>
>            Host "localhost"
>            User "root"
>            Passwd ""
>            Database "collectd"
>            Port "3306"
>            Replace true
>        </Database>
>   </Plugin>
>
> Regards,
>
> Cyril
>

Hi, Cyril

The plugin is working with `Replace true' thanks!!

But when I change Replace to false it got this error:
[2012-05-07 17:04:31] [error] write_mysql plugin: Failed to execute 
re-prepared statement : Duplicate entry '1-1--1-free-1' for key 2 / 
INSERT INTO data 
(date,host_id,plugin_id,plugin_instance,type_id,type_instance,dataset_id,value) 
VALUES (?,?,?,?,?,?,?,?)

To by pass this I had to drop the data table and create it again with a 
foreign key in place of unique one, here is:

DROP TABLE IF EXISTS `data`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `data` (
   `id` bigint(20) NOT NULL AUTO_INCREMENT,
   `date` datetime NOT NULL,
   `host_id` int(11) NOT NULL,
   `plugin_id` int(11) NOT NULL,
   `plugin_instance` varchar(255) DEFAULT NULL,
   `type_id` int(11) NOT NULL,
   `type_instance` varchar(255) DEFAULT NULL,
   `dataset_id` int(11) NOT NULL,
   `value` double NOT NULL,
   PRIMARY KEY (`id`),
   FOREIGN KEY `host_id_3` (`host_id`) REFERENCES `host` (`id`),
   FOREIGN KEY `plugin_id_3` (`plugin_id`) REFERENCES `plugin` (`id`),
   FOREIGN KEY `type_id_3` (`type_id`) REFERENCES `type` (`id`),
   FOREIGN KEY `dataset_id_3` (`dataset_id`) REFERENCES `dataset` (`id`)
) ENGINE=MEMORY AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

Would that ruin something later?

Thanks in advance..

Hilst,

-- 
Follow the white rabbit!