[collectd] New plugin for collectd : write_mysql

Tue Feb 14 10:54:01 CET 2012

Hi all, 

Here the write_mysql output plugin patch (from 5.0.2)
with modifications suggested by Octo. 

By the way, notification are
handled, all memory used is freed on shutdown (thx Valgrind). 

Regards,

Cyril 

Le 04.01.2012 10:49, Cyril Feraudet a écrit : 

> Hi Octo, Hi
all,
> 
> Thanks for your time. It will permit to me to improve this
plugin and 
> my
> knowledge of C.
> 
> Le 22.12.2011 01:06, Florian
Forster a écrit :
> 
>> Hi Cyril, thank you very much for your patches!
I'm very sorry, but I can't give the plugin the thorough review it
deserves right now. But maybe you find the following comments helpful.
On Tue, Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet wrote: 
>> 
>>>
- with_mysql_libs=`$with_mysql_config --libs 2>/dev/null` +
with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`
>> This looks
like a bug fix. Would it make sense to add this change to the 4.10
branch? If so, would you be willing to send a separate patch for this?
>

> You are right, I will submit a patch soon. write_mysql is often
called 
> than
> mysql plugin and non thread safe library make collectd
crashing 
> quickly.
> 
>>> +CREATE TABLE `data` ( + `id` bigint(20) NOT
NULL auto_increment, + `timestamp` double NOT NULL, + `host_id` int(11)
NOT NULL, + `plugin_id` int(11) NOT NULL, + `plugin_instance`
varchar(255) default NULL, + `type_id` int(11) NOT NULL, +
`typeinstance` varchar(255) default NULL, + `dataset_id` int(11) NOT
NULL, + `value` double NOT NULL,
>> One of the reasons why I didn't
write a "write_dbi" or "write_mysql" plugin before is because I was
uncertain about the schema to use. Why did you chose to have a separate
table for host, plugin, plugin_instance, type, and type_instance? I
currently think that just having an "identifier" table with five columns
would be a better trade- off between normalization and complexity.
> 
>
The idea was to save space in `data` table and improve querying in it.
>
For now, I've 800MB `data` table (MEMORY type) just for the last value

> of
> each metrics (4207 servers / 462,395 metrics) using REPLACE
instead of 
> INSERT
> statement. I think that SQL join are faster on
numeric id than varchar.
> But all of this things are questionable and
dependent of the final use.
> I think about a customizable SQL statement
in plugin configuration.
> diff --git a/src/write_mysql.c b/src/w
/blockquote> typedef struct host_s host_t; +struct host_s { + char
"padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px;
width:100%">me) { [...] + len = ssnprintf (query, sizeof (query),
"SELECT id FROM host WHERE name = '%s'", hostname); I think we should
statement with bound arguments (or whatever the terminology). This way
(a) MySQL doesn't have to parse and optimize the query each time we need
to insert a host and (b) quoting is
> y the mysql library. +static int
add_host_id (char * 
> 
> I will looking for. I never used it before.
>
p;mutex); [...] + if (row = mysql_fetch_row(result)) { + id =
atoi(row[0]); + mysql_free_result(result); +
pthread_mutex_unlock(&mutex); [...] + pthread_mutex_unlock(&mutex); +
return id; +} Mutex is released twice +static int add_plugin_id (char

>> 
>> I'll fix that.
>> { +static int add_type_id (char *typename) { A
*lot* of duplicate code ite a function like: int add_string (const char
*table, const char *str); /* returns id on success, less than zero on
failure. */ +sta
> _mysql_write(con t *ds, const value_list_t *vl
> 
>>
INSERT INTO data " + 
>> 
>>
"(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
>>
+ "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%l
> _TO_DOUBLE (vl->time),
host_id, + plugin_id, vl->plugin_instance, type_id, vl->type_instance,
dataset_id ); You really should prepare this statement. Parsing this
several thousand times per second will be a major performance problem.
+if (dso->type == DS
> 
>> ates == NULL) { + rates = uc_get_rate (ds,
vl); + } + if (isnan(rates[i])) { continue; } + len = ssnprintf (query,
sizeof (query), tmpquery, rates[i]); +} +//INFO("toto: %d", toto);
+pthread_mutex_lock(&mutex); +DEBUG("write_mysql plugin: %s", query);
+mysql_real_query(conn, query, len); +pthread_mutex_unlock(&mutex); So
you're inserting each data source in a element? How can a client
distinguish between the different data sources? At least, you should add
a "ds_index" column or so. Thanks again and best 
>> 
>> There are
dataset_id and type_id in `dataset` for that. Here a query sample to get
all me
> e in right order : select data.timestamp AS date, dataset.name
AS dataset_name, data.value AS value from data, host, plugin, type,
dataset where host.id = data.host_id and plugin.id = data.plugin_id and
type.id = data.type_id and data.dataset_id = dataset.id and host.name =
'eqds3pcold001' and plugin.name = 'interface' and data.plugin_instance =
'eth0' and type.name = 'if_octets' In my case, write_mysql is 
> 
>>
raph it. I've made a more flexible than Ganglia application based on
Collectd at work. My compagny allow me to share all my work on collectd
in GPL like licence. I will Release a beta of my unamed application soon
a possible. Regards, Cyril Feraudet
_______________________________________________ collectd mailing list
collectd at verplant.org [1] http://mailman.verplant.org/listinfo/collectd
[2]

Links:
------
[1] mailto:collectd at verplant.org
[2]
http://mailman.verplant.org/listinfo/collectd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20120214/7d2ebd11/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-Adding-write_mysql-output-plugin.patch
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20120214/7d2ebd11/attachment-0001.ksh>