[collectd] New plugin for collectd : write_mysql

Cyril Feraudet collectd at feraudet.com
Tue Feb 14 10:54:01 CET 2012


Hi all, 

Here the write_mysql output plugin patch (from 5.0.2)
with modifications suggested by Octo. 

By the way, notification are
handled, all memory used is freed on shutdown (thx Valgrind). 



Le 04.01.2012 10:49, Cyril Feraudet a écrit : 

> Hi Octo, Hi
> Thanks for your time. It will permit to me to improve this
plugin and 
> my
> knowledge of C.
> Le 22.12.2011 01:06, Florian
Forster a écrit :
>> Hi Cyril, thank you very much for your patches!
I'm very sorry, but I can't give the plugin the thorough review it
deserves right now. But maybe you find the following comments helpful.
On Tue, Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet wrote: 
- with_mysql_libs=`$with_mysql_config --libs 2>/dev/null` +
with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`
>> This looks
like a bug fix. Would it make sense to add this change to the 4.10
branch? If so, would you be willing to send a separate patch for this?

> You are right, I will submit a patch soon. write_mysql is often
> than
> mysql plugin and non thread safe library make collectd
> quickly.
>>> +CREATE TABLE `data` ( + `id` bigint(20) NOT
NULL auto_increment, + `timestamp` double NOT NULL, + `host_id` int(11)
NOT NULL, + `plugin_id` int(11) NOT NULL, + `plugin_instance`
varchar(255) default NULL, + `type_id` int(11) NOT NULL, +
`typeinstance` varchar(255) default NULL, + `dataset_id` int(11) NOT
NULL, + `value` double NOT NULL,
>> One of the reasons why I didn't
write a "write_dbi" or "write_mysql" plugin before is because I was
uncertain about the schema to use. Why did you chose to have a separate
table for host, plugin, plugin_instance, type, and type_instance? I
currently think that just having an "identifier" table with five columns
would be a better trade- off between normalization and complexity.
The idea was to save space in `data` table and improve querying in it.
For now, I've 800MB `data` table (MEMORY type) just for the last value

> of
> each metrics (4207 servers / 462,395 metrics) using REPLACE
instead of 
> statement. I think that SQL join are faster on
numeric id than varchar.
> But all of this things are questionable and
dependent of the final use.
> I think about a customizable SQL statement
in plugin configuration.
> diff --git a/src/write_mysql.c b/src/w
/blockquote> typedef struct host_s host_t; +struct host_s { + char
"padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px;
width:100%">me) { [...] + len = ssnprintf (query, sizeof (query),
"SELECT id FROM host WHERE name = '%s'", hostname); I think we should
statement with bound arguments (or whatever the terminology). This way
(a) MySQL doesn't have to parse and optimize the query each time we need
to insert a host and (b) quoting is
> y the mysql library. +static int
add_host_id (char * 
> I will looking for. I never used it before.
p;mutex); [...] + if (row = mysql_fetch_row(result)) { + id =
atoi(row[0]); + mysql_free_result(result); +
pthread_mutex_unlock(&mutex); [...] + pthread_mutex_unlock(&mutex); +
return id; +} Mutex is released twice +static int add_plugin_id (char

>> I'll fix that.
>> { +static int add_type_id (char *typename) { A
*lot* of duplicate code ite a function like: int add_string (const char
*table, const char *str); /* returns id on success, less than zero on
failure. */ +sta
> _mysql_write(con t *ds, const value_list_t *vl
INSERT INTO data " + 
+ "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%l
> _TO_DOUBLE (vl->time),
host_id, + plugin_id, vl->plugin_instance, type_id, vl->type_instance,
dataset_id ); You really should prepare this statement. Parsing this
several thousand times per second will be a major performance problem.
+if (dso->type == DS
>> ates == NULL) { + rates = uc_get_rate (ds,
vl); + } + if (isnan(rates[i])) { continue; } + len = ssnprintf (query,
sizeof (query), tmpquery, rates[i]); +} +//INFO("toto: %d", toto);
+pthread_mutex_lock(&mutex); +DEBUG("write_mysql plugin: %s", query);
+mysql_real_query(conn, query, len); +pthread_mutex_unlock(&mutex); So
you're inserting each data source in a element? How can a client
distinguish between the different data sources? At least, you should add
a "ds_index" column or so. Thanks again and best 
>> There are
dataset_id and type_id in `dataset` for that. Here a query sample to get
all me
> e in right order : select data.timestamp AS date, dataset.name
AS dataset_name, data.value AS value from data, host, plugin, type,
dataset where host.id = data.host_id and plugin.id = data.plugin_id and
type.id = data.type_id and data.dataset_id = dataset.id and host.name =
'eqds3pcold001' and plugin.name = 'interface' and data.plugin_instance =
'eth0' and type.name = 'if_octets' In my case, write_mysql is 
raph it. I've made a more flexible than Ganglia application based on
Collectd at work. My compagny allow me to share all my work on collectd
in GPL like licence. I will Release a beta of my unamed application soon
a possible. Regards, Cyril Feraudet
_______________________________________________ collectd mailing list
collectd at verplant.org [1] http://mailman.verplant.org/listinfo/collectd


[1] mailto:collectd at verplant.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20120214/7d2ebd11/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-Adding-write_mysql-output-plugin.patch
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20120214/7d2ebd11/attachment-0001.ksh>

More information about the collectd mailing list