[collectd] patch for email stats from postfix and amavisd-new, dns stats from powerdns

Luke Heberling collectd at c-ware.com
Sat Dec 8 21:33:24 CET 2007


Florian Forster wrote:
> 
> why do you need to access the most recent element? The caches, as they
> are implemented in some plugins, search the whole AVL tree for ``old''
> elements every now or then and removes them. I guess you want to do
> something similar?

It's the least recently accessed element, so that it can be removed when
a new item is inserted.  In this case, I remove the eldest X entries
rather than all entries older than X. This would probably require two
traversals of the AVL tree if it were not ordered by access time.
Maintaining two AVL trees would, I think, be a good trade of memory for
performance if the values were shared and only the keys differed.
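
To illustrate why access-time ordering helps, here is a minimal sketch
of evicting the eldest X entries in a single pass. It is not code from
the patch: it swaps the second AVL tree for a plain doubly linked list
kept in access order, and every name in it (lru_entry_t, lru_touch,
lru_evict_eldest, ...) is hypothetical. In a real cache each entry
would also be linked into the key-ordered AVL tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry; a real one would also carry the value and
 * be referenced from the key-ordered AVL tree. */
typedef struct lru_entry {
    struct lru_entry *prev;
    struct lru_entry *next;
    int key;
} lru_entry_t;

typedef struct {
    lru_entry_t *head; /* least recently accessed */
    lru_entry_t *tail; /* most recently accessed */
    size_t size;
} lru_list_t;

/* Remove an entry from the list without freeing it. */
static void lru_unlink(lru_list_t *l, lru_entry_t *e)
{
    assert(l != NULL && e != NULL);
    if (e->prev != NULL) e->prev->next = e->next; else l->head = e->next;
    if (e->next != NULL) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
    l->size--;
}

/* Add an entry at the "most recently accessed" end. */
static void lru_append(lru_list_t *l, lru_entry_t *e)
{
    assert(l != NULL && e != NULL);
    e->prev = l->tail;
    e->next = NULL;
    if (l->tail != NULL) l->tail->next = e; else l->head = e;
    l->tail = e;
    l->size++;
}

/* Record an access: move the entry to the tail. */
static void lru_touch(lru_list_t *l, lru_entry_t *e)
{
    lru_unlink(l, e);
    lru_append(l, e);
}

/* Unlink up to n of the eldest entries and return how many were
 * removed. The caller would also drop them from the key-ordered tree
 * and free them. */
static size_t lru_evict_eldest(lru_list_t *l, size_t n)
{
    size_t removed = 0;
    while (removed < n && l->head != NULL) {
        lru_unlink(l, l->head);
        removed++;
    }
    return removed;
}
```

Because the head is always the eldest entry, removing X entries costs X
unlink operations and needs no search over the whole structure.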

>> Perhaps the thread creation could be an option, because in some cases 
>> such as mine, it's not really necessary.
> 
> It makes it much easier though: The thread could leave the file
> descriptor open and call select(2) on it and check if the inode has
> changed every few seconds. On Linux `inotify' could be used for an
> intelligent alternative to polling. If someone feels up to it, maybe
> some FAM (file activity/alteration monitor) support could be added at
> some point, too.
> 

If the amavis and postfix plugins were each tailing /var/log/syslog,
would two threads be needed? I imagine so, because if one plugin had a
less efficient callback routine or just a lot of data to parse, it
shouldn't hold up the plugin which has less data to parse. In this vein,
what happens in a situation where the read thread falls behind? The
module_read function would return the current values which could result
in incorrect readings. In the case of polling, it would just take longer
to get caught up, and would take advantage of collectd's well-tested
thread pool.

> I think keeping them up to date may actually be easier than handling the
> ``file was moved'' case each time it's called.. And using select,
> inotify or FAM this shouldn't be much overhead.
> 

The `file was moved' check is simply a stat call and a comparison of the
inode number. It's really pretty simple. See the attachment for an example.
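
For readers without the attachment, a minimal sketch of such a check
might look like the following. This is not the attached utils_tail.c;
the function names tail_file_was_moved and tail_reopen_if_moved are
hypothetical, and it assumes the plugin keeps a file descriptor open
between collection intervals.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if `path` now names a different file
 * than the open descriptor `fd` (for example after log rotation), 0 if
 * it is still the same file, and -1 if either stat call fails (for
 * example because the path does not exist right now). */
static int tail_file_was_moved(const char *path, int fd)
{
    struct stat by_path;
    struct stat by_fd;

    assert(path != NULL);

    if (stat(path, &by_path) != 0)
        return -1;
    if (fstat(fd, &by_fd) != 0)
        return -1;

    /* Inode numbers are only unique per device, so compare both. */
    if (by_path.st_dev != by_fd.st_dev || by_path.st_ino != by_fd.st_ino)
        return 1;
    return 0;
}

/* Hypothetical helper. If the file was moved, switch *fd to the new
 * file and return 1; otherwise leave *fd alone and return the result
 * of the check. A real tail would first read whatever is left on the
 * old descriptor before closing it. */
static int tail_reopen_if_moved(const char *path, int *fd)
{
    int moved = tail_file_was_moved(path, *fd);
    if (moved != 1)
        return moved;

    int new_fd = open(path, O_RDONLY);
    if (new_fd < 0)
        return -1;

    close(*fd);
    *fd = new_fd;
    return 1;
}
```

Called once per collection interval, this adds one stat and one fstat
call per tailed file.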

inotify, select or FAM may be more efficient than polling every
collection interval, but mostly for files which are rarely written to.
In this use case, we are tailing log files like syslog and mail.log,
which in most cases will be written to during nearly every single
collection interval. Are these features worth the extra lines of code
and the runtime overhead of an extra thread per plugin instance? There
is also the case where the file does not exist yet when you want to
start tailing it, and you want to start tailing it when/if it appears.
I'm not sure how or whether inotify or FAM deal with this; select
clearly would not handle it.
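
The missing-file case is one that polling handles without extra
machinery: if the open fails, the read simply reports no data and tries
again at the next interval. Below is a rough sketch of that idea using
stdio. It is not the attached module; poll_tail_t, poll_tail_read and
count_line are hypothetical names, and for brevity it ignores rotation
and may hand a partially written last line to the callback.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tail state; fh stays NULL until the file can be opened. */
typedef struct {
    const char *path;
    FILE *fh;
} poll_tail_t;

typedef void (*poll_tail_cb)(const char *line, void *user_data);

/* Example callback: counts lines into the int that user_data points to. */
static void count_line(const char *line, void *user_data)
{
    (void)line;
    ++*(int *)user_data;
}

/* Meant to be called once per collection interval. Returns the number
 * of lines handed to the callback; 0 means the file does not exist yet
 * or nothing new was written. */
static int poll_tail_read(poll_tail_t *t, poll_tail_cb cb, void *user_data)
{
    char buf[1024];
    int lines = 0;

    assert(t != NULL && cb != NULL);

    if (t->fh == NULL) {
        t->fh = fopen(t->path, "r");
        if (t->fh == NULL)
            return 0; /* not there yet; retry at the next interval */
    }

    while (fgets(buf, sizeof(buf), t->fh) != NULL) {
        cb(buf, user_data);
        lines++;
    }

    /* Clear the EOF indicator so data appended later can be read. */
    clearerr(t->fh);
    return lines;
}
```

A plugin's read function would call poll_tail_read with its own parsing
callback, so no extra thread is needed per tailed file.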

There may be a case where a plugin tails a configurable number of
files. In this case, there may end up being many instances of the
cu_tail object, maybe too many to want a thread per instance. It starts
to look like there should be one thread doing a select, inotify or FAM
loop, and dispatching reads to a thread pool. This starts to look
suspiciously like the behavior that is already functioning and tested in
collectd. At this point, it seems that polling is a good tradeoff, one
that looks even better if the files being tailed are expected to have
significant activity.

Attached is the tail module I've been working with that does polling.
I've kept it very simple, and the runtime overhead is minimal, probably
even close to optimal assuming that there are changes to be read after
most collection intervals.

In any case, I defer to your expertise in this area. My argument here
should not belie my willingness to help implement this in the way you
prefer.

Sebastian Harl wrote:
> As this sounds like a really interesting subject, I'm pretty sure I
> will come up with something as soon as I have some time for it ;-)
>

Glad to see that you're on it. I'll be watching to see what you come up
with.

Luke Heberling
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils_tail.c
Type: text/x-csrc
Size: 2817 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071208/baf15d20/attachment-0002.c 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_tail.c
Type: text/x-csrc
Size: 546 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071208/baf15d20/attachment-0003.c 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils_tail.h
Type: text/x-chdr
Size: 2190 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071208/baf15d20/attachment-0001.h 

