[collectd] New plugin for collectd : write_mysql

Wed Jan 4 09:49:27 CET 2012

Hi Octo, Hi all,

Thanks for your time. It will permit to me to improve this plugin and 
my
knowledge of C.

Le 22.12.2011 01:06, Florian Forster a écrit :

> Hi Cyril,
>
> thank you very much for your patches! I'm very sorry, but I can't 
> give
> the plugin the thorough review it deserves right now. But maybe you 
> find
> the following comments helpful.
>
> On Tue, Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet wrote:
>
>> - with_mysql_libs=`$with_mysql_config --libs 2>/dev/null`
>> + with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`
>
> This looks like a bug fix. Would it make sense to add this change to 
> the
> 4.10 branch? If so, would you be willing to send a separate patch for
> this?

You are right, I will submit a patch soon. write_mysql is often called 
than
mysql plugin and non thread safe library make collectd crashing 
quickly.

>
>> +CREATE TABLE `data` ( + `id` bigint(20) NOT NULL auto_increment, +
>> `timestamp` double NOT NULL, + `host_id` int(11) NOT NULL, +
>> `plugin_id` int(11) NOT NULL, + `plugin_instance` varchar(255) 
>> default
>> NULL, + `type_id` int(11) NOT NULL, + `typeinstance` varchar(255)
>> default NULL, + `dataset_id` int(11) NOT NULL, + `value` double NOT
>> NULL,
>
> One of the reasons why I didn't write a "write_dbi" or "write_mysql"
> plugin before is because I was uncertain about the schema to use. Why
> did you chose to have a separate table for host, plugin,
> plugin_instance, type, and type_instance? I currently think that just
> having an "identifier" table with five columns would be a better 
> trade-
> off between normalization and complexity.

The idea was to save space in `data` table and improve querying in it.
For now, I've 800MB `data` table (MEMORY type) just for the last value 
of
each metrics (4207 servers / 462,395 metrics) using REPLACE instead of 
INSERT
statement. I think that SQL join are faster on numeric id than varchar.
But all of this things are questionable and dependent of the final use.
I think about a customizable SQL statement in plugin configuration.

>> ATE TABLE `dataset` (
> Why do you store datasets in the database (in addition to the 
> types.db
> file)? Having the definition in more than one place is bound to 
> create
> inconsistencies at some point

This table is for readonly purpose and to make SQL query independent of 
number
of metrics in a type. It also contain type added on the way by a 
plugin.

>
>> diff --git a/src/write_mysql.c b/src/w
> /blockquote> typedef struct host_s host_t; +struct host_s { + char
>> name[DATA_MAX_NAME_LEN]; + int id; + host_t *next_host; +};
>
> You will be doing a lot of lookups in this cache(es). I think using a
> binary search tree (an implementation if provided in
> src/utils_avltree.h) would improve performance here.

Thanks for the tips, Il will do that.

> int port =
>
>> tatic int write_mysql_config (const char *key, const char *value) {
>> [...] + } else if (strcasecmp ("Port", key) == 0) { + port = value;
>>
>> You
> a char* to an int. This won't produce what you expect. Use the 
> function
> service_name_to_port_number() from src/common.h. +static int wri

Your'e right I will fix it.

>
>> t (void) { + conn = mysql_init(NULL); + if (mysql_real_connect(conn,
>> host, user, passwd, database, port, NULL, 0) == NULL) {
>>
>> The conn
> be (re-)established in the write() callback, when it is needed.
> Otherwise you will never recover from connection failures. + if
> (!mysql_th

Your'e right, I just issued it. I will fix it too.

>
>> { + ERROR("write_mysql plugin: mysqlclient Thread Safe OFF");
>>
>> I wasn't
> s function. Good to know, thanks! +static int add_host_id (
>
>> me) { [...] + len = ssnprintf (query, sizeof (query), "SELECT id 
>> FROM
>> host WHERE name = '%s'", hostname);
>>
>> I think we should
> statement with bound arguments (or whatever the terminology). This 
> way
> (a) MySQL doesn't have to parse and optimize the query each time we 
> need
> to insert a host and (b) quoting is done for us by the mysql library.
> +static int add_host_id (char *
>

I will looking for. I never used it before.

>> [...] + pthread_mutex_lock(&mutex); [...] + if (row =
>> mysql_fetch_row(result)) { + id = atoi(row[0]); +
>> mysql_free_result(result); + pthread_mutex_unlock(&mutex); [...] +
>> pthread_mutex_unlock(&mutex); + return id; +}
>>
>> Mutex is released twice
> +static int add_plugin_id (char
>

I'll fix that.

>> { +static int add_type_id (char *typename) {
>>
>> A *lot* of duplicate code
> ite a function like: int add_string (const char *table, const char 
> *str);
> /* returns id on success, less than zero on failure. */ +static int
> write_mysql_write(con
>
>> t *ds, const value_list_t *vl, + user_data_t __attribute__((unused))
>> *user_data) { [...] + len = ssnprintf (tmpquery, sizeof (tmpquery),
>> "INSERT INTO data " +
>>
>
"(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
>> + "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%lf)", CDTIME_T_TO_DOUBLE
>> (vl->time), host_id, + plugin_id, vl->plugin_instance, type_id,
>> vl->type_instance, dataset_id );
>>
>>
>
> You really should prepare this statement. Parsing this several 
> thousand
> times per second will be a major performance problem.
>
>> +if (dso->type == DS_TYPE_GAUGE) {
>> +	len = ssnprintf (query, sizeof (query), tmpquery, 
>> vl->values[i].gauge);
>> +} else {
>> +	if (rates == NULL) {
>> +		rates = uc_get_rate (ds, vl);
>> +	}
>> +	if (isnan(rates[i])) { continue; }
>> +	len = ssnprintf (query, sizeof (query), tmpquery, rates[i]);
>> +}
>> +//INFO("toto: %d", toto);
>> +pthread_mutex_lock(&mutex);
>> +DEBUG("write_mysql plugin: %s", query);
>> +mysql_real_query(conn, query, len);
>> +pthread_mutex_unlock(&mutex);
>
> So you're inserting each data source in a
> element? How can a client distinguish between the different data 
> sources?
> At least, you should add a "ds_index" column or so. Thanks again and 
> best

There are dataset_id and type_id in `dataset` for that. Here a query 
sample to get
all metric in a type in right order :

select data.timestamp AS date, dataset.name AS dataset_name, data.value 
AS value
from data, host, plugin, type, dataset
where host.id = data.host_id
and plugin.id = data.plugin_id
and type.id = data.type_id
and data.dataset_id = dataset.id
and host.name = 'eqds3pcold001'
and plugin.name = 'interface'
and data.plugin_instance = 'eth0'
and type.name = 'if_octets'

In my case, write_mysql is used to make, for example, sum of cpu used 
on several thousand
of server and re-inject it in collectd to graph it.

I've made a more flexible than Ganglia application based on Collectd at 
work. My compagny
allow me to share all my work on collectd in GPL like licence. I will 
Release a beta of
my unamed application soon a possible.

Regards,

Cyril Feraudet