[collectd] tail plugin

Luke Heberling collectd at c-ware.com
Mon Feb 25 09:30:47 CET 2008


Florian Forster wrote:
> Hi Luke,
>
> I've looked at your patches yesterday and today and made some changes to
> them. I feel a bit bad for changing your code so much. While looking at
> the actual plugins I thought ``hm, using regular expressions this should
> be a lot simpler'' and started writing a `match' to simplify the
> handling of regular expressions. And then I got carried away from there
> :/ I hope you forgive me ;)
>
forgiven :)
> On Mon, Feb 18, 2008 at 03:24:04PM -0800, Luke Heberling wrote:
>> utils_tail.diff
>> A utility library for watching the end of a log file.
>
> I've basically kept `utils_tail.[ch]' as they were, except for some
> improvements in error handling.
> Maybe the logic could be improved a bit though: If the inode of a file
> changes, the old filehandle is closed and the file is reopened.
> Shouldn't we read the old filehandle to the end first?
Yes.  I'll fix that, test it in isolation, and send a patch.
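
Roughly what I have in mind, sketched in Python for brevity rather than in
the C of `utils_tail.c'. The class and method names are made up for
illustration and are not the utils_tail API. It also assumes a POSIX
filesystem, where a rotated file shows up under a new inode.

```python
import os


class Tail:
    """Follow a log file by path and survive rotation (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.fh = None
        self.inode = None

    def _open(self, seek_end):
        self.fh = open(self.path, "r")
        self.inode = os.fstat(self.fh.fileno()).st_ino
        if seek_end:
            # On the very first open, skip what is already in the file.
            self.fh.seek(0, os.SEEK_END)

    def _drain(self):
        # Read everything up to the current end of the open handle.
        # A line that is still being written may come back without "\n".
        return self.fh.readlines()

    def read_lines(self):
        """Return the lines written since the previous call."""
        if self.fh is None:
            self._open(seek_end=True)
        try:
            rotated = os.stat(self.path).st_ino != self.inode
        except FileNotFoundError:
            # Path is gone for the moment; keep reading the old handle.
            rotated = False
        # Always finish the old handle first, so nothing written before
        # the rotation is lost.
        lines = self._drain()
        if rotated:
            self.fh.close()
            self._open(seek_end=False)  # new file: read from the start
            lines += self._drain()
        return lines
```

The point is only the ordering: notice the inode change, read the old
handle to the end, and only then close it and reopen the path.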
>
>
>
> As you can see the tail/parsing stuff is _completely_ out of the plugin
> now. Now, since it comes down to a regular expression - why not let the
> user do this? This way he can parse logfiles we don't dream about. So
> I've written a new `tail' plugin which is basically a configuration
> frontend for the `tail_match_add_match_simple' function.
>
> The config looks like this (simply copied from the collectd.conf(5)
> manpage):
> <Plugin "tail">
>   <File "/var/log/exim4/mainlog">
>     Instance "exim"
>     <Match>
>       Regex "S=([1-9][0-9]*)"
>       DSType "CounterAdd"
>       Type "ipt_bytes"
>       Instance "total"
>     </Match>
>     <Match>
>       Regex "\\<R=local_user\\>"
>       DSType "CounterInc"
>       Type "email_count"
>       Instance "local_user"
>     </Match>
>   </File>
> </Plugin>
>
> Nice-to-have features would be:
> - Make the submatch to use configurable. Then you could do something
> like:
> "relay=(cyrus|imap|pop3), delay=([1-9][0-9]*)", use submatch #2
> - Use a submatch as the type-instance. E. g.:
> "R=([A-Za-z0-9_-]+)", ds_type = CounterInc, type_instance = #1
> (Automatically count how often each of the Exim `routers' was used;
> use its name as type-instance)
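
To check that I understand the two nice-to-have features, here is a small
sketch in Python (not the C `match' code). The `value_group' and
`instance_group' fields are hypothetical names for the options you
describe, the type name "delay" is invented, and the input lines are
made-up fragments rather than real log lines. Python's `re' also differs
from POSIX regexes in places (no `\<' and `\>', for instance), so only the
two simpler expressions from your list are used.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    regex: str
    ds_type: str                          # "CounterAdd" or "CounterInc"
    type: str
    instance: Optional[str] = None        # fixed type-instance
    value_group: int = 1                  # hypothetical: submatch holding the value
    instance_group: Optional[int] = None  # hypothetical: submatch used as type-instance


def apply_matches(matches, line, counters):
    """Update counters[(type, instance)] for every Match that hits the line."""
    for m in matches:
        found = re.search(m.regex, line)
        if found is None:
            continue
        if m.instance_group is not None:
            instance = found.group(m.instance_group)
        else:
            instance = m.instance
        key = (m.type, instance)
        if m.ds_type == "CounterAdd":
            counters[key] += int(found.group(m.value_group))
        elif m.ds_type == "CounterInc":
            counters[key] += 1


matches = [
    # Use submatch #2 as the value.
    Match(r"relay=(cyrus|imap|pop3), delay=([1-9][0-9]*)",
          "CounterAdd", "delay", "total", value_group=2),
    # Use submatch #1 as the type-instance.
    Match(r"R=([A-Za-z0-9_-]+)", "CounterInc", "email_count",
          instance_group=1),
]
counters = defaultdict(int)
for line in ["relay=imap, delay=12", "relay=pop3, delay=30",
             "R=local_user", "R=remote_user", "R=local_user"]:
    apply_matches(matches, line, counters)
```

With the second match, one config block would cover every router without
listing them by name.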
Looks like a very useful development. It might be nice to be
able to format it more like:

<Plugin "tail">
  <File "/var/log/exim4/mainlog">
    <Instance "total">
      <Match>
        Regex "S=([1-9][0-9]*)"
        DSType "CounterAdd"
        Type "ipt_bytes"
        Name "total"
      </Match>
    </Instance>
    <Instance "users">
      <Match>
        Regex "\\<R=local_user\\>"
        DSType "CounterInc"
        Type "email_count"
        Name "local_user"
      </Match>
      <Match>
        Regex "\\<R=remote_user\\>"
        Type "email_count"
        DSType "CounterInc"
        Name "remote_user"
      </Match>
    </Instance>
  </File>
</Plugin>

Let me know if this makes sense.  I'm not sure what to infer from the
use of the instance option under both the File and Match options in
your example.

>
>> {amavis,postfix,powerdns}.diff:
>> The main plugins.
>
> These plugins are only needed if the data cannot be collected using the
> `tail' plugin. Two situations come to mind, where this may become
> necessary:
> - The regex may match multiple times in one line. E. g. there are
> multiple file sizes in one line. This could be implemented in the
> `match' object by reapplying the regular expression to the remainder
> of the string until it doesn't match anymore.
> - You need to see multiple lines for one data point. For example the
> size of an email is in one line and the type, e. g. ham vs. spam, is
> on another.
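
On the first point, reapplying the expression to the remainder of the line
looks simple enough. A sketch in Python, assuming the value always sits in
one submatch; `add_all' is a made-up name and the sample line in the usage
note is invented:

```python
import re


def add_all(regex, line, group=1):
    """Sum the integer submatch of every non-overlapping match in the line."""
    total = 0
    pos = 0
    while pos <= len(line):
        m = regex.search(line, pos)
        if m is None:
            break
        total += int(m.group(group))
        # Continue after this match; step one character on an empty
        # match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return total


size_re = re.compile(r"S=([1-9][0-9]*)")
```

Here `add_all(size_re, "S=100 foo S=250")' adds both sizes, where a single
search would stop after the first.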
In amavis, the only reason you would need this is for the
total number of messages scanned. Amavis can log a "scan"
for the same message more than once (at least when users
are allowed to configure their own sensitivity settings) so
you need to count unique identifiers to get this.  I wouldn't
want to do without this number, but it's not the most
important counter.

In postfix, the size of the message is logged on receipt, so if
you want to know how many bytes you've delivered then you
have to remember the size of the message from when you
received it.

In my plugins, I accomplish this with a cache built from a
linked list and an AVL tree.  The linked list lets me quickly
remove the eldest entry when the cache grows too big, and the
AVL tree lets me quickly find an entry by identifier.
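
To illustrate the idea without the C code, here is a sketch in Python. It
swaps in `collections.OrderedDict' (a hash table that remembers insertion
order) for the linked list plus AVL tree, so it shows the same behaviour
but not my implementation. The class, the method names, and the queue ids
are all made up.

```python
from collections import OrderedDict


class SizeCache:
    """Bounded id -> size cache that evicts the eldest entry first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.delivered_bytes = 0

    def on_receipt(self, queue_id, size):
        # Remember the size logged when the message was received.
        self.entries.pop(queue_id, None)
        self.entries[queue_id] = size
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # drop the eldest entry

    def on_delivery(self, queue_id):
        # Look the size up by identifier and count it as delivered.
        # An entry that was already evicted is silently skipped.
        size = self.entries.get(queue_id)
        if size is not None:
            self.delivered_bytes += size


cache = SizeCache(max_entries=2)
cache.on_receipt("A1", 100)
cache.on_receipt("B2", 200)
cache.on_receipt("C3", 300)   # cache is full, so "A1" is evicted
cache.on_delivery("B2")
cache.on_delivery("B2")       # a second recipient of the same message
cache.on_delivery("A1")       # unknown by now, not counted
```

The size limit is the trade-off: too small and slow deliveries are missed,
too big and the cache eats memory on a busy machine.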

>
> Unfortunately I don't have amavis or postfix running anywhere, so I
> can't tell if such a special plugin is necessary. Judging from the
> sources the plugins could be turned into sample config files, though,
> which I would of course love to include in contrib/.
I've got many postfix and several amavis machines, and can find
a test bed for whatever you come up with.

I've attached some Perl modules I was using (about two years
ago it seems) to do my rrd work.  They include some regular
expressions and might help by giving another perspective
on this.

I also implemented something very similar to these modules
in C using PCRE, but was not able to meet what I thought was a
reasonable performance standard and ditched it.  It could
barely keep up with the log file under peak load on my busier
domains and kept the CPU above 50% most of the time. (On
admittedly less than stellar hardware, even for the time.)

Luke



