[collectd] Generalized ignorelist functionality

Mon Nov 27 16:54:50 CET 2006

Hi Florian,
this can also serve as a starting point for the 'regex' documentation.

POSIX 'EXTENDED' syntax bits:
RE_CONTEXT_INDEP_ANCHORS - '^' and '$' are special characters
RE_CONTEXT_INDEP_OPS - '*', '+', '?' and '{' are repetition operators
regardless of where they appear in a regular expression
RE_CHAR_CLASSES - use character classes ([:class:])
RE_DOT_NEWLINE - match-any-character matches newline
RE_DOT_NOT_NULL - match-any-character does not match a null character
RE_INTERVALS - regex recognizes interval operators
RE_NO_EMPTY_RANGES - range must be valid
RE_NO_BK_BRACES - '{', '}' are interval operators
RE_NO_BK_PARENS - '(', ')' are group operators
RE_NO_BK_VBAR - '|' is alternation operator
RE_UNMATCHED_RIGHT_PAREN_ORD - an unmatched ')' is treated as an
ordinary character

We can also use REG_ICASE (the counterpart of strcasecmp()) for case
insensitivity. The current version is case sensitive for both regex and
string (strcmp()) matching.
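
To illustrate the REG_ICASE option, here is a minimal sketch using the
standard POSIX regcomp()/regexec() calls. The helper name entry_matches()
and its 'ignore_case' parameter are made up for this example and are not
taken from the collectd sources.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of collectd.
 * Returns 1 if 'value' matches the POSIX extended regex 'pattern',
 * 0 if it does not match, and -1 if the pattern fails to compile. */
static int entry_matches(const char *pattern, const char *value,
                         int ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB; /* only a yes/no answer is needed */
    int status;

    assert(pattern != NULL && value != NULL);

    if (ignore_case)
        flags |= REG_ICASE; /* case-insensitive matching */

    if (regcomp(&re, pattern, flags) != 0)
        return -1;

    status = regexec(&re, value, 0, NULL, 0);
    regfree(&re);

    return (status == 0) ? 1 : 0;
}
```

With ignore_case set to 0 this behaves like the current case-sensitive
comparison; setting it to 1 would be the REG_ICASE variant.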

The 'string' conversion then amounts to prefixing each of these
characters with a backslash wherever it occurs in the 'string': '^', '$',
'.', '*', '+', '?', '{', '}', '[', ']', '(', ')', '|' and '\', and
wrapping the result in '^' and '$' for a full match.
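
The conversion described here can be sketched roughly as follows. This is
only an illustration under my own assumptions: the function name
string_to_regex() is hypothetical, and the caller is assumed to free() the
returned buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of collectd.
 * Converts a plain 'string' entry into an anchored POSIX extended regex
 * by escaping every special character with a backslash.
 * Returns a malloc()ed string, or NULL if allocation fails. */
static char *string_to_regex(const char *str)
{
    static const char special[] = "^$.*+?{}[]()|\\";
    size_t len;
    char *out;
    char *p;

    assert(str != NULL);
    len = strlen(str);

    /* Worst case: every character escaped, plus '^', '$' and the NUL. */
    out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    p = out;
    *p++ = '^'; /* anchor at the start for a full match */
    for (; *str != '\0'; str++) {
        if (strchr(special, *str) != NULL)
            *p++ = '\\';
        *p++ = *str;
    }
    *p++ = '$'; /* anchor at the end for a full match */
    *p = '\0';

    return out;
}
```

For example, the string 'eth0.1' would become the regex '^eth0\.1$', so
that '.' only matches a literal dot.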

It looks very simple, doesn't it?

Florian Forster wrote:
> Hi Lubos,
> 
> On Fri, Nov 24, 2006 at 12:29:36PM +0100, Luboš Staněk wrote:
>> The current version is satisfactory. The users simply continue using
>> 'string' identification and use '/regex/' when they need it. The regex
>> entry is intuitive enough due to the '//' delimiters as it is used by
>> many other tools.
> 
> I guess so. I'm not so happy that slashes inside the regex don't need to
> be escaped, but I think they _may_ be escaped, which is good enough ;)
> 

;)
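
For reference, the '/regex/' convention can be sketched as a check of the
first and last character only, which is why slashes inside the expression
need no escaping. The name entry_is_regex() is hypothetical, and I am
assuming here that an entry counts as a regex only when it both starts and
ends with '/'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of collectd.
 * Returns 1 if 'entry' is written as /regex/, 0 if it is a plain string.
 * Only the outermost characters are inspected, so slashes inside the
 * expression are left alone. */
static int entry_is_regex(const char *entry)
{
    size_t len;

    assert(entry != NULL);
    len = strlen(entry);

    return len >= 2 && entry[0] == '/' && entry[len - 1] == '/';
}
```

Note that under this assumption a plain string such as '/var/log/' would
also be taken as a regex, which is one of the corner cases worth
documenting.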

>> Release a new version with the current implementation and we will see
>> the feedback.
> 
> This is exactly what I don't want to do. There's hardly anything more
> annoying than an interface that's changing regularly. And I don't like
> keeping bad things in good interfaces just to avoid breaking backwards
> compatibility, either. So I'd like to do it right, right from the start ;)
> 

OK, we can discuss it more.

>> I do not want to stand for other users in our two man show. :)
> 
> It's always better to get at least one other opinion than to simply do
> it somehow and then learn that you missed an (important) aspect ;)
> 

Is there any other developer who is able to comment on the current draft
for using POSIX regular expressions for string comparison?

>> I have thought about parameter checking tool that would load config
>> file and plugins the same way the daemon would do and run one step
>> without calling plugins' write(). The one step would report all on
>> 'stderr'. All means every collectable entry, result of ignorelist
>> match, all errors and so on.
> 
> I don't know. This sounds like a lot of work. Doing a syntactic check
> shouldn't be a problem, but actually checking the meaning of the
> configuration is probably next to impossible.
> 
> If I run `apache2ctl -t' they only do a syntactic check, too. I have to
> take care of the correctness of `<FilesMatch ...>' sections myself.
> 

I was not thinking of a syntactic check.
I meant something like '-dryrun' or '--verbose', or an extended 'mode Log',
for one COLLECTD_STEP loop. The user could see which values are collected
into which RRDs, which ignorelist entry matches, and so on. It would be
very simple to capture this output and build a proper configuration from
it. It could work well with your intent to provide several 'log' plugins.

Best regards,
Lubos