[collectd] New plugin for collectd : write_mysql

Florian Forster octo at collectd.org
Thu Dec 22 00:06:54 CET 2011


Hi Cyril,

thank you very much for your patches! I'm very sorry, but I can't give
the plugin the thorough review it deserves right now. But maybe you find
the following comments helpful.

On Tue, Dec 20, 2011 at 03:03:43PM +0200, Cyril Feraudet wrote:
> -	with_mysql_libs=`$with_mysql_config --libs 2>/dev/null`
> +	with_mysql_libs=`$with_mysql_config --libs_r 2>/dev/null`

This looks like a bug fix. Would it make sense to add this change to the
4.10 branch? If so, would you be willing to send a separate patch for
this?

> +CREATE TABLE `data` (
> +  `id` bigint(20) NOT NULL auto_increment,
> +  `timestamp` double NOT NULL,
> +  `host_id` int(11) NOT NULL,
> +  `plugin_id` int(11) NOT NULL,
> +  `plugin_instance` varchar(255) default NULL,
> +  `type_id` int(11) NOT NULL,
> +  `typeinstance` varchar(255) default NULL,
> +  `dataset_id` int(11) NOT NULL,
> +  `value` double NOT NULL,

One of the reasons why I didn't write a "write_dbi" or "write_mysql"
plugin before is because I was uncertain about the schema to use. Why
did you chose to have a separate table for host, plugin,
plugin_instance, type, and type_instance? I currently think that just
having an "identifier" table with five columns would be a better trade-
off between normalization and complexity.

> +CREATE TABLE `dataset` (

Why do you store datasets in the database (in addition to the types.db
file)? Having the definition in more than one place is bound to create
inconsistencies at some point ...

> diff --git a/src/write_mysql.c b/src/write_mysql.c

> +typedef struct host_s host_t;
> +struct host_s {
> +	char	name[DATA_MAX_NAME_LEN];
> +	int	id;
> +	host_t	*next_host;
> +};

You will be doing a lot of lookups in this cache(es). I think using a
binary search tree (an implementation if provided in
src/utils_avltree.h) would improve performance here.

> int port = 0;
> [...]
> +static int write_mysql_config (const char *key, const char *value) {
> [...]
> +	} else if (strcasecmp ("Port", key) == 0) {
> +		port = value;

You're assining a char* to an int. This won't produce what you expect.
Use the function service_name_to_port_number() from src/common.h.

> +static int write_mysql_init (void) {
> +	conn = mysql_init(NULL);
> +	if (mysql_real_connect(conn, host, user, passwd, database, port, NULL, 0) == NULL) {

The connection should be (re-)established in the write() callback, when
it is needed. Otherwise you will never recover from connection failures.

> +	if (!mysql_thread_safe()) {
> +		ERROR("write_mysql plugin: mysqlclient Thread Safe OFF");

I wasn't aware of this function. Good to know, thanks!

> +static int add_host_id (char *hostname) {
> [...]
> +	len = ssnprintf (query, sizeof (query), "SELECT id FROM host WHERE name = '%s'", hostname);

I think we should prepare that statement with bound arguments (or
whatever the terminology). This way (a) MySQL doesn't have to parse and
optimize the query each time we need to insert a host and (b) quoting is
done for us by the mysql library.

> +static int add_host_id (char *hostname) {
> [...]
> +	pthread_mutex_lock(&mutex);
> [...]
> +	if (row = mysql_fetch_row(result)) {
> +		id = atoi(row[0]);
> +		mysql_free_result(result);
> +		pthread_mutex_unlock(&mutex);
> [...]
> +	pthread_mutex_unlock(&mutex);
> +	return id;
> +}

Mutex is released twice in this path.

> +static int add_plugin_id (char *pluginname) {
> +static int add_type_id (char *typename) {

A *lot* of duplicate code. Can't we write a function like:

  int add_string (const char *table, const char *str);
  /* returns id on success, less than zero on failure. */

> +static int write_mysql_write(const data_set_t *ds, const value_list_t *vl,
> +		user_data_t __attribute__((unused)) *user_data) {
> [...]
> +		len = ssnprintf (tmpquery, sizeof (tmpquery), "INSERT INTO data "
> +		"(timestamp,host_id,plugin_id,plugin_instance,type_id,typeinstance,dataset_id,value)"
> +		"VALUES (%.3f,%d,%d,'%s',%d,'%s',%d,%%lf)", CDTIME_T_TO_DOUBLE (vl->time), host_id, 
> +		plugin_id, vl->plugin_instance, type_id, vl->type_instance, dataset_id );

You really should prepare this statement. Parsing this several thousand
times per second will be a major performance problem.

> +		if (dso->type == DS_TYPE_GAUGE) {
> +			len = ssnprintf (query, sizeof (query), tmpquery, vl->values[i].gauge);
> +		} else {
> +			if (rates == NULL) {
> +				rates = uc_get_rate (ds, vl);
> +			}
> +			if (isnan(rates[i])) { continue; }
> +			len = ssnprintf (query, sizeof (query), tmpquery, rates[i]);
> +		}
> +		//INFO("toto: %d", toto);
> +		pthread_mutex_lock(&mutex);
> +		DEBUG("write_mysql plugin: %s", query);
> +		mysql_real_query(conn, query, len);
> +		pthread_mutex_unlock(&mutex);

So you're inserting each data source in a separate statement? How can a
client distinguish between the different data sources? At least, you
should add a "ds_index" column or so.

Thanks again and best regards,
—octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x0C705A15
http://octo.it/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://mailman.verplant.org/pipermail/collectd/attachments/20111222/b587a778/attachment.pgp>


More information about the collectd mailing list