[collectd] patch for email stats from postfix and amavisd-new, dns stats from powerdns

Florian Forster octo at verplant.org
Mon Dec 3 18:56:55 CET 2007


Hi Luke,

sorry for replying so late - I was without internet connection over the
weekend.

On Thu, Nov 29, 2007 at 11:28:33AM -0800, Luke Heberling wrote:
> It's for decent table distribution in the face of lesser quality hash
> routines which often have a bias.

I don't want to delve into hashing functions now, I have too much stuff
going on for that :/ For such non-obvious stuff, please put a comment
there. It doesn't need to be long or explain a lot, something like this
would be perfectly okay:
 /*
  * Use primes as table length, because they perform better with many
  * hash functions.
  * [Optional:] See discussion on the ML:
  * http://mailman.verplant.org/pipermail/collectd/2007-December/001373.html
  */

> Yes, artificial limits are annoying. Larry Wall would be proud. We can
> maybe get a twofer by using power of two table lengths.

Another question: Please don't get me wrong - I really do apprechiate
all the work you've already put into this - but I want to know what's
going on:
Why do you use hash tables in the first place? There's an AVL-tree which
is used for caches in various places. The depth (i. e. steps it takes
for a negative answer, which equals the worst case positive answer) is
29 for 491e6 entries, i. e. not very much. Also, hash tables are not
sorted, but an MRU-cache implies some sorting, doesn't it? Is the
performance gain you get for a very large number of entries really worth
all the code and possible bugs? Or is this not the point?

> I like this approach because it could probably hide the FILE* pointer
> entirely.  Which brings to mind another method which goes a little
> further. We could abstract even the read loop using a function
> pointer.

I like that idea. The utils_tail code could start a new thread which
calls a callback with each new line read and the line reading, the
buffer allocating and all the other annoying details would be completely
hidden from the other code.. I'll look into that as soon as I have time
for it..

> from header:
>     cu_tail *create( char *name, int flags, int buffer_len );
>     void destroy( cu_tail *tail );
>     void tail ( cu_tail *tail, process_line_func func );
>     char *buffer( cu_tail *tail );
>     int buffer_len( cu_tail *tail );

I was more thinking along the lines of:
  typedef (int) (*cut_callback_t) (const char *, size_t, void *);
  cu_tail_t *cut_create (const char *name, cut_callback_t callback,
    int flags, void *user_data);
  int cut_close (cu_tail_t *, void **ret_user_data);

> By the way, what is the cu_ prefix to the cu_tail_* functions in your
> example? I'm presuming "common utility" or similar as a modifier to
> create a unique name in the case it is used in projects other than
> collectd.

Yeah, that's supposed to mean `collectd utils'. My implementation of the
AVL-trees used the same names as the implementation found in some
Solaris installations. So I started using a prefix to prevent this..

> Is this a prefix which would be well served on the mru_* functions?

As you can see in the `utils_llist.h' this isn't done entirely
consequent in collectd. Some prefix (such as `mru_') is a must, but you
don't have to use `cu_mru_' or `cmru_'. Changing the prefix later is
easy enough (think: sed -i ;)

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071203/5a22a6dd/attachment.pgp 


More information about the collectd mailing list