[collectd] New plugin for collectd : write_mysql
Cyril Feraudet
collectd at feraudet.com
Mon May 7 22:55:51 CEST 2012
On 7 May 2012, at 19:44, Daniel Hilst wrote:
> On 05/04/2012 07:07 AM, Cyril Feraudet wrote:
>> On 04.05.2012 08:52, Cyril Feraudet wrote:
>>
>>> On 03.05.2012 20:05, Daniel Hilst wrote:
>>>
>>> On 05/03/2012 03:46 PM, Daniel Hilst wrote:
>>>
>>> On 02/14/2012 09:54 AM, Cyril Feraudet wrote:
>>>
>>> Hi all, Here is the write_mysql output plugin patch (based on
>>> 5.0.2) with the modifications suggested by Octo. By the way,
>>> notifications are handled, and all memory used is freed on
>>> shutdown (thanks, Valgrind). Regards, Cyril On 04.01.2012
>>> 10:49, Cyril Feraudet wrote:
>>>
>>> Hi Octo, Hi all, Thanks for your time. It will help me
>>> improve this plugin and my knowledge of C. On
>>> 22.12.2011 01:06, Florian Forster wrote:
>>>
>>> Hi Cyril, thank you very much for your patches!
>>> I'm very sorry, but I can't give the plugin the
>>> thorough review it deserves right now. But maybe
>>> you find the following comments helpful. On Tue,
>>> Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet
>>> wrote:
>>>
>>> - with_mysql_libs=`$with_mysql_config --libs 2>/dev/null`
>>> + with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`
>>>
>>> This looks like a bug fix. Would it make sense to
>>> add this change to the 4.10 branch? If so, would
>>> you be willing to send a separate patch for this?
>>>
>>> You are right, I will submit a patch soon. write_mysql
>>> is called more often than the mysql plugin, and a
>>> non-thread-safe library makes collectd crash quickly.
>>>
>>> +CREATE TABLE `data` (
>>> +  `id` bigint(20) NOT NULL auto_increment,
>>> +  `timestamp` double NOT NULL,
>>> +  `host_id` int(11) NOT NULL,
>>> +  `plugin_id` int(11) NOT NULL,
>>> +  `plugin_instance` varchar(255) default NULL,
>>> +  `type_id` int(11) NOT NULL,
>>> +  `typeinstance` varchar(255) default NULL,
>>> +  `dataset_id` int(11) NOT NULL,
>>> +  `value` double NOT NULL,
>>>
>>> One of the reasons why I didn't write a
>>> "write_dbi" or "write_mysql" plugin before is
>>> because I was uncertain about the schema to use.
>>> Why did you choose to have a separate table for
>>> host, plugin, plugin_instance, type, and
>>> type_instance? I currently think that just having
>>> an "identifier" table with five columns would be a
>>> better trade-off between normalization and
>>> complexity.
>>>
>>> The idea was to save space in the `data` table and improve
>>> querying in it. For now, I have an 800MB `data` table
>>> (MEMORY type) holding just the last value of each metric
>>> (4207 servers / 462,395 metrics), using REPLACE instead
>>> of INSERT statements. I think that SQL joins are faster
>>> on numeric ids than on varchars. But all of these things are
>>> questionable and depend on the final use. I am thinking
>>> about a customizable SQL statement in the plugin
>>> configuration.
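Florian's alternative above, a single "identifier" table with five columns, might look like the following sketch. This is a hypothetical schema for illustration, not part of the patch:

```sql
-- Hypothetical: one row per full collectd identifier, so the data
-- table would carry a single foreign key instead of five.
CREATE TABLE `identifier` (
  `id` int(11) NOT NULL auto_increment,
  `host` varchar(255) NOT NULL,
  `plugin` varchar(255) NOT NULL,
  `plugin_instance` varchar(255) default NULL,
  `type` varchar(255) NOT NULL,
  `type_instance` varchar(255) default NULL,
  PRIMARY KEY (`id`)
);
```

The trade-off Florian describes: joins touch one table instead of five, at the cost of repeating varchar values across identifiers that share a host or plugin.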
>>>
>>> +CREATE TABLE `dataset` (
>>>
>>> Why do you store datasets in the database (in
>>> addition to the types.db file)? Having the
>>> definition in more than one place is bound to
>>> create inconsistencies at some point.
>>>
>>> This table is for read-only purposes and makes SQL
>>> queries independent of the number of metrics in a type. It
>>> also contains types added on the fly by a plugin.
>>>
>>> diff --git a/src/write_mysql.c b/src/write_mysql.c
>>>
>>> +typedef struct host_s host_t;
>>> +struct host_s {
>>> +  char name[DATA_MAX_NAME_LEN];
>>> +  int id;
>>> +  host_t *next_host;
>>> +};
>>>
>>> You will be doing a lot of lookups in these
>>> caches. I think using a binary search tree (an
>>> implementation is provided in src/utils_avltree.h)
>>> would improve performance here.
>>>
>>> Thanks for the tips, I will do that.
>>>
>>> int port =
>>>
>>> +static int write_mysql_config (const char *key,
>>> +    const char *value) {
>>> [...]
>>> + } else if (strcasecmp ("Port", key) == 0) {
>>> +   port = value;
>>>
>>> You are assigning a char* to an int. This won't produce what you
>>> expect. Use the function
>>> service_name_to_port_number() from src/common.h.
>>>
>>> You're right, I will fix it.
>>>
>>> +static int write_mysql_init (void) {
>>> +  conn = mysql_init(NULL);
>>> +  if (mysql_real_connect(conn, host, user, passwd,
>>> +      database, port, NULL, 0) == NULL) {
>>>
>>> The connection should be (re-)established in the write() callback, when
>>> it is needed. Otherwise you will never recover
>>> from connection failures.
>>>
>>> + if (!mysql_thread_safe()) {
>>>
>>> You're right, I just noticed it; I will fix it too.
>>>
>>> +  ERROR("write_mysql plugin: mysqlclient Thread Safe OFF");
>>>
>>> I wasn't aware of this function. Good to know, thanks!
>>>
>>> +static int add_host_id (char *hostname) {
>>> [...]
>>> +  len = ssnprintf (query, sizeof (query),
>>> +      "SELECT id FROM host WHERE name = '%s'", hostname);
>>>
>>> I think we should use a prepared
>>> statement with bound arguments (or whatever the
>>> terminology is). This way (a) MySQL doesn't have to
>>> parse and optimize the query each time we need to
>>> insert a host and (b) quoting is done for us by
>>> the mysql library.
>>>
>>> I will look into it. I have never used it before.
>>>
>>> [...]
>>> +  pthread_mutex_lock(&mutex);
>>> [...]
>>> +  if (row = mysql_fetch_row(result)) {
>>> +    id = atoi(row[0]);
>>> +    mysql_free_result(result);
>>> +    pthread_mutex_unlock(&mutex);
>>> [...]
>>> +  pthread_mutex_unlock(&mutex);
>>> +  return id;
>>> +}
>>>
>>> The mutex is released twice here.
>>>
>>> I'll fix that.
>>>
>>> +static int add_plugin_id (char *plugin) {
>>> +static int add_type_id (char *typename) {
>>>
>>> A *lot* of duplicate code. I would write a function like:
>>>
>>> int add_string (const char *table, const char *str);
>>> /* returns id on success, less than zero on failure. */
>>>
>>> +static int write_mysql_write(const data_set_t *ds, const value_list_t *vl,
>>> +    user_data_t __attribute__((unused)) *user_data) {
>>> [...]
>>> +  len = ssnprintf (tmpquery, sizeof (tmpquery), "INSERT INTO data "
>>> +      "(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
>>> +      "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%lf)",
>>> +      CDTIME_T_TO_DOUBLE (vl->time), host_id,
>>> +      plugin_id, vl->plugin_instance, type_id,
>>> +      vl->type_instance, dataset_id );
>>>
>>> You really should prepare this statement. Parsing
>>> this several thousand times per second will be a
>>> major performance problem.
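The prepared-statement API being recommended here is mysql_stmt_prepare() with MYSQL_BIND parameters from the MySQL C API. The fragment below is a non-runnable sketch: it assumes an already established `conn`, shows only three of the columns, and omits all error handling.

```c
#include <mysql/mysql.h>
#include <string.h>

/* Prepare once, e.g. in the connect path ... */
MYSQL_STMT *stmt = mysql_stmt_init(conn);
const char *sql =
    "INSERT INTO data (timestamp, host_id, value) VALUES (?, ?, ?)";
mysql_stmt_prepare(stmt, sql, strlen(sql));

/* ... then bind and execute per value: no re-parsing, no manual quoting. */
double ts = 1336424151.0, value = 42.0;
int host_id = 1;
MYSQL_BIND bind[3];
memset(bind, 0, sizeof(bind));
bind[0].buffer_type = MYSQL_TYPE_DOUBLE; bind[0].buffer = &ts;
bind[1].buffer_type = MYSQL_TYPE_LONG;   bind[1].buffer = &host_id;
bind[2].buffer_type = MYSQL_TYPE_DOUBLE; bind[2].buffer = &value;
mysql_stmt_bind_param(stmt, bind);
mysql_stmt_execute(stmt);
```

The server parses and plans the statement once at prepare time; each execute only ships the bound values, which addresses the "several thousand times per second" concern.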
>>>
>>> +if (dso->type == DS_TYPE_GAUGE) {
>>> +  len = ssnprintf (query, sizeof (query), tmpquery,
>>> +      vl->values[i].gauge);
>>> +} else {
>>> +  if (rates == NULL) {
>>> +    rates = uc_get_rate (ds, vl);
>>> +  }
>>> +  if (isnan(rates[i])) { continue; }
>>> +  len = ssnprintf (query, sizeof (query), tmpquery, rates[i]);
>>> +}
>>> +//INFO("toto: %d", toto);
>>> +pthread_mutex_lock(&mutex);
>>> +DEBUG("write_mysql plugin: %s", query);
>>> +mysql_real_query(conn, query, len);
>>> +pthread_mutex_unlock(&mutex);
>>>
>>> So you're inserting each data source as a separate row?
>>> How can a client distinguish between the different
>>> data sources? At least, you should add a
>>> "ds_index" column or so. Thanks again and best regards
>>>
>>> There are dataset_id and type_id in `dataset` for
>>> that. Here is a sample query to get all metrics of a type
>>> in the right order:
>>>
>>> select data.timestamp AS date, dataset.name AS dataset_name,
>>>     data.value AS value
>>> from data, host, plugin, type, dataset
>>> where host.id = data.host_id
>>>     and plugin.id = data.plugin_id
>>>     and type.id = data.type_id
>>>     and data.dataset_id = dataset.id
>>>     and host.name = 'eqds3pcold001'
>>>     and plugin.name = 'interface'
>>>     and data.plugin_instance = 'eth0'
>>>     and type.name = 'if_octets'
>>>
>>> In my case, write_mysql is used, for example, to sum the CPU
>>> used across several thousand servers and re-inject the result into
>>> collectd to graph it. I've built an application based on collectd
>>> at work that is more flexible than Ganglia. My
>>> company allows me to share all my work on collectd under a
>>> GPL-like licence. I will release a beta of my unnamed
>>> application as soon as possible. Regards, Cyril Feraudet
>>>
>>> _______________________________________________
>>> collectd mailing list
>>> collectd at verplant.org
>>> http://mailman.verplant.org/listinfo/collectd
>>>
>>>
>>> Hello, I'm trying to compile collectd 5.0.2 with the write_mysql
>>> plugin. I've done this: http://pastebin.com/dPc6zgUH
>>> I'm on RHEL 5.5 x86_64. Any idea?
>>>
>>> I had to run libtoolize --force before compiling;
>>>
>>> here is:
>>> tar vxf collectd-5.0.2.tar.bz2
>>> cd collectd-5.0.2
>>> patch -p1 -i ../0001-Adding-write_mysql-output-plugin.patch
>>> aclocal
>>> autoconf
>>> automake
>>> libtoolize --force
>>> ./configure
>>> make
>>> make install all
>>>
>>>
>>> Now I'm facing this error:
>>> [error] write_mysql plugin: Failed to bind param to statement : Statement
>>> not prepared / INSERT INTO data
>>> (date,host_id,plugin_id,plugin_instance,type_id,type_instance,dataset_id,value)VALUES
>>> (?,?,?,?,?,?,?,?)
>>>
>>> What does that mean?
>>>
>>> I tried Replace "false" without success
>>>
>>> Thanks in advance,
>>>
>>> Hilst
>>>
>>> This is an old patch, please use this one : git clone -b
>>> cf/perfwatcher http://github.com/feraudet/collectd.git
>>>
>>> Regards,
>>>
>>> Cyril
>>>
>> Here is a ready-to-build one:
>> https://github.com/downloads/feraudet/collectd/collectd-5.1.0-perfwatcher.tgz
>>
>> You have to re-create the database from collectd/contrib/write_mysql.sql and
>> note the new config block:
>>
>> <Plugin write_mysql>
>> <Database master4example>
>> Host "localhost"
>> User "root"
>> Passwd ""
>> Database "collectd"
>> Port "3306"
>> Replace true
>> </Database>
>> </Plugin>
>>
>> Regards,
>>
>> Cyril
>>
>
> Hi, Cyril
>
> The plugin is working with `Replace true', thanks!
>
> But when I change Replace to false, I get this error:
> [2012-05-07 17:04:31] [error] write_mysql plugin: Failed to execute re-prepared statement : Duplicate entry '1-1--1-free-1' for key 2 / INSERT INTO data (date,host_id,plugin_id,plugin_instance,type_id,type_instance,dataset_id,value) VALUES (?,?,?,?,?,?,?,?)
>
> To bypass this, I had to drop the data table and create it again with foreign keys in place of the unique one; here it is:
>
> DROP TABLE IF EXISTS `data`;
> /*!40101 SET @saved_cs_client = @@character_set_client */;
> /*!40101 SET character_set_client = utf8 */;
> CREATE TABLE `data` (
> `id` bigint(20) NOT NULL AUTO_INCREMENT,
> `date` datetime NOT NULL,
> `host_id` int(11) NOT NULL,
> `plugin_id` int(11) NOT NULL,
> `plugin_instance` varchar(255) DEFAULT NULL,
> `type_id` int(11) NOT NULL,
> `type_instance` varchar(255) DEFAULT NULL,
> `dataset_id` int(11) NOT NULL,
> `value` double NOT NULL,
> PRIMARY KEY (`id`),
> FOREIGN KEY `host_id_3` (`host_id`) REFERENCES `host` (`id`),
> FOREIGN KEY `plugin_id_3` (`plugin_id`) REFERENCES `plugin` (`id`),
> FOREIGN KEY `type_id_3` (`type_id`) REFERENCES `type` (`id`),
> FOREIGN KEY `dataset_id_3` (`dataset_id`) REFERENCES `dataset` (`id`)
> ) ENGINE=MEMORY AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
>
> Would that break something later?
>
> Thanks in advance..
>
> Hilst,
>
Hi,
You're right, the unique key is needed only for the REPLACE statement.
I only use the REPLACE statement to have the last value of each metric and perform data aggregation.
Regards,
Cyril
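For context on the `Replace` option discussed above: REPLACE differs from INSERT only when a PRIMARY or UNIQUE key collides, in which case the old row is deleted and the new one written. A minimal illustration with a hypothetical table:

```sql
CREATE TABLE last_value (
  host_id int NOT NULL,
  value double NOT NULL,
  UNIQUE KEY (host_id)   -- without this, REPLACE behaves exactly like INSERT
);
REPLACE INTO last_value VALUES (1, 10.0);  -- no collision: plain insert
REPLACE INTO last_value VALUES (1, 20.0);  -- collision: row for host 1 is overwritten
```

This is why dropping the unique key (as Hilst did) makes `Replace false` work for history, while Cyril's last-value/aggregation use case needs the unique key kept.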
More information about the collectd mailing list