[collectd] New plugin for collectd : write_mysql
Cyril Feraudet
collectd at feraudet.com
Wed Jan 4 09:49:27 CET 2012
Hi Octo, Hi all,
Thanks for your time. It will permit to me to improve this plugin and
my
knowledge of C.
Le 22.12.2011 01:06, Florian Forster a écrit :
> Hi Cyril,
>
> thank you very much for your patches! I'm very sorry, but I can't
> give
> the plugin the thorough review it deserves right now. But maybe you
> find
> the following comments helpful.
>
> On Tue, Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet wrote:
>
>> - with_mysql_libs=`$with_mysql_config --libs 2>/dev/null`
>> + with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`
>
> This looks like a bug fix. Would it make sense to add this change to
> the
> 4.10 branch? If so, would you be willing to send a separate patch for
> this?
You are right, I will submit a patch soon. write_mysql is often called
than
mysql plugin and non thread safe library make collectd crashing
quickly.
>
>> +CREATE TABLE `data` ( + `id` bigint(20) NOT NULL auto_increment, +
>> `timestamp` double NOT NULL, + `host_id` int(11) NOT NULL, +
>> `plugin_id` int(11) NOT NULL, + `plugin_instance` varchar(255)
>> default
>> NULL, + `type_id` int(11) NOT NULL, + `typeinstance` varchar(255)
>> default NULL, + `dataset_id` int(11) NOT NULL, + `value` double NOT
>> NULL,
>
> One of the reasons why I didn't write a "write_dbi" or "write_mysql"
> plugin before is because I was uncertain about the schema to use. Why
> did you chose to have a separate table for host, plugin,
> plugin_instance, type, and type_instance? I currently think that just
> having an "identifier" table with five columns would be a better
> trade-
> off between normalization and complexity.
The idea was to save space in `data` table and improve querying in it.
For now, I've 800MB `data` table (MEMORY type) just for the last value
of
each metrics (4207 servers / 462,395 metrics) using REPLACE instead of
INSERT
statement. I think that SQL join are faster on numeric id than varchar.
But all of this things are questionable and dependent of the final use.
I think about a customizable SQL statement in plugin configuration.
>> ATE TABLE `dataset` (
> Why do you store datasets in the database (in addition to the
> types.db
> file)? Having the definition in more than one place is bound to
> create
> inconsistencies at some point
This table is for readonly purpose and to make SQL query independent of
number
of metrics in a type. It also contain type added on the way by a
plugin.
>
>> diff --git a/src/write_mysql.c b/src/w
> /blockquote> typedef struct host_s host_t; +struct host_s { + char
>> name[DATA_MAX_NAME_LEN]; + int id; + host_t *next_host; +};
>
> You will be doing a lot of lookups in this cache(es). I think using a
> binary search tree (an implementation if provided in
> src/utils_avltree.h) would improve performance here.
Thanks for the tips, Il will do that.
> int port =
>
>> tatic int write_mysql_config (const char *key, const char *value) {
>> [...] + } else if (strcasecmp ("Port", key) == 0) { + port = value;
>>
>> You
> a char* to an int. This won't produce what you expect. Use the
> function
> service_name_to_port_number() from src/common.h. +static int wri
Your'e right I will fix it.
>
>> t (void) { + conn = mysql_init(NULL); + if (mysql_real_connect(conn,
>> host, user, passwd, database, port, NULL, 0) == NULL) {
>>
>> The conn
> be (re-)established in the write() callback, when it is needed.
> Otherwise you will never recover from connection failures. + if
> (!mysql_th
Your'e right, I just issued it. I will fix it too.
>
>> { + ERROR("write_mysql plugin: mysqlclient Thread Safe OFF");
>>
>> I wasn't
> s function. Good to know, thanks! +static int add_host_id (
>
>> me) { [...] + len = ssnprintf (query, sizeof (query), "SELECT id
>> FROM
>> host WHERE name = '%s'", hostname);
>>
>> I think we should
> statement with bound arguments (or whatever the terminology). This
> way
> (a) MySQL doesn't have to parse and optimize the query each time we
> need
> to insert a host and (b) quoting is done for us by the mysql library.
> +static int add_host_id (char *
>
I will looking for. I never used it before.
>> [...] + pthread_mutex_lock(&mutex); [...] + if (row =
>> mysql_fetch_row(result)) { + id = atoi(row[0]); +
>> mysql_free_result(result); + pthread_mutex_unlock(&mutex); [...] +
>> pthread_mutex_unlock(&mutex); + return id; +}
>>
>> Mutex is released twice
> +static int add_plugin_id (char
>
I'll fix that.
>> { +static int add_type_id (char *typename) {
>>
>> A *lot* of duplicate code
> ite a function like: int add_string (const char *table, const char
> *str);
> /* returns id on success, less than zero on failure. */ +static int
> write_mysql_write(con
>
>> t *ds, const value_list_t *vl, + user_data_t __attribute__((unused))
>> *user_data) { [...] + len = ssnprintf (tmpquery, sizeof (tmpquery),
>> "INSERT INTO data " +
>>
>
"(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
>> + "VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%lf)", CDTIME_T_TO_DOUBLE
>> (vl->time), host_id, + plugin_id, vl->plugin_instance, type_id,
>> vl->type_instance, dataset_id );
>>
>>
>
> You really should prepare this statement. Parsing this several
> thousand
> times per second will be a major performance problem.
>
>> +if (dso->type == DS_TYPE_GAUGE) {
>> + len = ssnprintf (query, sizeof (query), tmpquery,
>> vl->values[i].gauge);
>> +} else {
>> + if (rates == NULL) {
>> + rates = uc_get_rate (ds, vl);
>> + }
>> + if (isnan(rates[i])) { continue; }
>> + len = ssnprintf (query, sizeof (query), tmpquery, rates[i]);
>> +}
>> +//INFO("toto: %d", toto);
>> +pthread_mutex_lock(&mutex);
>> +DEBUG("write_mysql plugin: %s", query);
>> +mysql_real_query(conn, query, len);
>> +pthread_mutex_unlock(&mutex);
>
> So you're inserting each data source in a
> element? How can a client distinguish between the different data
> sources?
> At least, you should add a "ds_index" column or so. Thanks again and
> best
There are dataset_id and type_id in `dataset` for that. Here a query
sample to get
all metric in a type in right order :
select data.timestamp AS date, dataset.name AS dataset_name, data.value
AS value
from data, host, plugin, type, dataset
where host.id = data.host_id
and plugin.id = data.plugin_id
and type.id = data.type_id
and data.dataset_id = dataset.id
and host.name = 'eqds3pcold001'
and plugin.name = 'interface'
and data.plugin_instance = 'eth0'
and type.name = 'if_octets'
In my case, write_mysql is used to make, for example, sum of cpu used
on several thousand
of server and re-inject it in collectd to graph it.
I've made a more flexible than Ganglia application based on Collectd at
work. My compagny
allow me to share all my work on collectd in GPL like licence. I will
Release a beta of
my unamed application soon a possible.
Regards,
Cyril Feraudet
More information about the collectd
mailing list