Re[2]: [yaala] So, lets actually do something...

Thu, 4 Dec 2003 13:02:02 +0600

On Sunday, December 01, 2002, 7:47:13 AM, Florian wrote:
FF> Sorry, what do you mean by "chose your branch end as start point"??
I mean that we (well, I) should throw (almost) all of my code away,
then take your yaala 0.5.2 release and modify it feature by feature
to make a new usable release.

>> The setup module should then parse those options and create internal
>> data structure, other then main::config.
>> IMHO main::config should hold only string presentation,
>> and other structure required to hold parsed parameters.
>> This structure avoids need of alot of functions like
>> setup::datalabels, setup::datatypes, etc (from yaala-qmax).

FF> I think there should be a subroutine to access configuration options and
FF> an interface for parser modules to register with that module. Other
FF> modules (data-storage, report-generation) should then be able to query
FF> that subroutine for configuration settings.
Making such a subroutine and registration interface is much the same
problem as building another data pool.
I guess a simple $setup::conf{$parameter} will be enough.
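
A minimal sketch of what I mean, assuming a "name: value" line syntax
for the config file. The parse_lines helper and the option names are
made up for illustration; nothing like this exists in the code yet.

```perl
use strict;
use warnings;

package setup;

# Parsed options live in a plain package hash that any module can read.
our %conf;

# Hypothetical helper: parse "name: value" lines into %conf.
sub parse_lines {
    my @lines = @_;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        if ($line =~ /^\s*([\w.-]+)\s*:\s*(.*?)\s*$/) {
            $conf{$1} = $2;
        }
    }
    return scalar keys %conf;
}

package main;

setup::parse_lines('# sample options', 'report: top10', 'limit : 25');
print "$setup::conf{report}\n";
```

Other modules would then read $setup::conf{limit} directly, without
any registration step.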

FF> Also, there should be a public subroutine for parsing (i.e. "importing")
FF> the config file, so we can easily parse a secondary config file, which
FF> hold special settings for that particular module (parser modules mostly I
FF> guess).
The parser itself has no interface at all: it should only provide data
labels and types, and then call data::store().
In general there is nothing to configure.
The parser::extra data can be handled with the usual aggregations
(reporting period, count of days); the rest can be solved with
extended aggregations.
These expressions just have to be grouped BY the whole report.
E.g.:
count(records) where destination =~ /nimda_attack_match/ AS "Number of Nimda attacks"
etc...
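
To illustrate, here is a rough sketch of such a report-wide count in
Perl. The sample records and the pattern are invented (the pattern only
stands in for nimda_attack_match), and the %report_wide table is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records, as a parser might hand them to data::store().
my @records = (
    { destination => '/scripts/root.exe?/c+dir',   bytes => 320 },
    { destination => '/index.html',                bytes => 5120 },
    { destination => '/scripts/..%255c../cmd.exe', bytes => 290 },
);

# Stand-in pattern; the real nimda_attack_match is not defined here.
my $nimda = qr/(?:root|cmd)\.exe/;

# Report-wide aggregations: label => filter applied to each record.
my %report_wide = (
    'Number of Nimda attacks' => sub { $_[0]{destination} =~ $nimda },
);

# No keys are involved: every record is counted into one value per label.
my %result;
for my $rec (@records) {
    for my $label (keys %report_wide) {
        $result{$label}++ if $report_wide{$label}->($rec);
    }
}
print "$_: $result{$_}\n" for sort keys %result;
```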

>> The data module should be able to store tree-structured data pool -
>> to allow specifyng aggregation for any key combination.
>> I suggested [ \@keysequence, \%tree, \$exprdef ] and going to
>> implement this soon.

FF> Ok, cool :) I wonder how good this structure could be:
FF>   $hash->{$aggr}{"$key1:$val1;$key2:$val2;...keyN:valN;"}
FF> I put "$aggr" first because I assume that accessing a very small hash
FF> first, and then a big one is more efficient than accessing a huge hash
FF> first and then a (just as) very small one.
Back then I tried
 $hash->{'key'.'val'}{'key'.'val'}...
and
 $hash->{'key'.'val'.'key'.'val'}...
and some other variants.
All of them consume a lot of memory,
and the longer the hash keys are, the more time is spent.

I guess the optimal way is:
  $hash->{$keysequence_def}{$val1}{$val2}{$val3}{$expr_def}
$expr_def comes last because, when there are several expressions for
the same key sequence, data::ptr should return a reference to a hash
for further indexing:
  $ref = data::ptr( [qw(foo bar baz)], [$fooval, $barval, $bazval] );
  $ref->{$expr_1} = evaluate;
  $ref->{$expr_2} = evaluate;
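
A rough sketch of how data::ptr could walk that tree. The comma-joined
form of $keysequence_def and the sample expression strings are
assumptions; only the nesting order is taken from the layout above.

```perl
use strict;
use warnings;

package data;

our %pool;    # $pool{$keysequence_def}{$val1}{$val2}...{$expr_def}

# Walk the tree for one key sequence and one set of values, creating
# missing nodes on the way, and return the leaf hash that is indexed
# by expression definition.
sub ptr {
    my ($keys, $vals) = @_;
    my $keysequence_def = join ',', @$keys;    # assumed encoding
    my $node = $pool{$keysequence_def} ||= {};
    for my $val (@$vals) {
        $node = $node->{$val} ||= {};
    }
    return $node;
}

package main;

my @keys = qw(foo bar baz);
for my $vals ([qw(a b c)], [qw(a b c)], [qw(a x c)]) {
    my $ref = data::ptr(\@keys, $vals);
    $ref->{'count(records)'}++;       # one lookup, several expressions
    $ref->{'sum(bytes)'} += 100;
}
print $data::pool{'foo,bar,baz'}{a}{b}{c}{'count(records)'}, "\n";
```

The tree is walked once per record, and every expression for that key
sequence is then updated through the same reference.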

This hash indexed by expr-defs is slightly inefficient, and the
inefficiency multiplies because it is repeated on every tree leaf.
But it speeds up the referencing process a lot.

>> Later i'll post my suggestions about new selection expr.

FF> For some random thoughts of mine see above. It's just some brainstorming
FF> ideas that I got at 2:45 at night, so if it's all crap don't mind it
FF> please ;)

<from above:>
FF> One thing that would really make things easier was, if keyfield-selection
FF> and selection of aggregations would be on one line. Something like:
FF>   select: field0, field1 by sum(aggr);
Vice versa :)

select <aggregations> by <keys> where <filters>;
similar to SQL, that is:
<aggregation> is a function like SUM(foo) or an arbitrary expression like
    SUM(foo)/COUNT(bar), and may be suffixed with "AS" to specify
    the label and formatting:
    select SUM(foo)/COUNT(bar) AS "Average foo per bar":number
<key> is the label of a countable data field in the log record,
    and may also be suffixed with AS "label":format,
    OR (something like) an arbitrary expression evaluating to countable
    values, e.g. a boolean: (http_result =~ /^4../ ) AS "Fail"
<filter> is any expression over aggregations or keys that includes or
    excludes particular data from the report (excluded data is counted
    in the "More N skipped" row).

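As a rough illustration of the three clauses, the sketch below only
splits a statement at "by" and "where". split_select is a hypothetical
helper, the key name and filter in the sample are invented, and a label
containing " by " would confuse it, so this is no substitute for a real
grammar parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split one statement into its three clauses.
# It only finds the clause boundaries; the expressions inside each
# clause are left unparsed.
sub split_select {
    my ($stmt) = @_;
    $stmt =~ /^\s*select\s+(.+?)\s+by\s+(.+?)(?:\s+where\s+(.+?))?\s*;\s*$/is
        or return;
    return { aggregations => $1, keys => $2, filters => $3 };
}

my $q = split_select(
    'select SUM(foo)/COUNT(bar) AS "Average foo per bar":number '
  . 'by client where COUNT(bar) > 10;'
);
print "$_ = $q->{$_}\n" for sort keys %$q;
```
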
This requires a full-featured grammar parser.
My Polish-notation calculus with simple precedence should be suitable for this.
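
For illustration, the usual way to get from infix to reverse Polish
notation with a simple precedence table is the shunting-yard algorithm.
This is a generic sketch, not the code mentioned above; the precedence
values and the treatment of SUM(foo) as a single operand token are
assumptions.

```perl
use strict;
use warnings;

# Assumed precedence table; a higher number binds tighter.
my %prec = ('*' => 2, '/' => 2, '+' => 1, '-' => 1);

# Convert a list of infix tokens to reverse Polish notation.
sub to_rpn {
    my (@out, @ops);
    for my $tok (@_) {
        if (exists $prec{$tok}) {
            # Left-associative: pop operators of equal or higher precedence.
            push @out, pop @ops
                while @ops && exists $prec{ $ops[-1] }
                   && $prec{ $ops[-1] } >= $prec{$tok};
            push @ops, $tok;
        }
        elsif ($tok eq '(') {
            push @ops, $tok;
        }
        elsif ($tok eq ')') {
            push @out, pop @ops while @ops && $ops[-1] ne '(';
            pop @ops;    # discard the matching '('
        }
        else {
            push @out, $tok;    # operand, e.g. SUM(foo) or a number
        }
    }
    push @out, reverse @ops;    # flush the remaining operators
    return @out;
}

my @rpn = to_rpn(qw{ SUM(foo) / ( COUNT(bar) + 1 ) });
print "@rpn\n";
```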

FF> Ok, good night, have fun hacking and I'll dig into a config-module
FF> tomorrow ;)
I still have no time to code :(
But I hope I'll find some resources,
cos I really need those features for work.

-- 
qMax