<html><body>
<p>Hi everyone,</p>
<p>As already announced on IRC, I'd like to bring the percentile calculation available in the new statsd module to the tail module.</p>
<p>Reading through the source, there seems to be one main difference between how tail.so and statsd work (if I understand it correctly):</p>
<p>The tail infrastructure does each calculation one by one, and it seems that only one gauge can be configured per regex, so if one wants Average and Max, the regex has to be listed twice?</p>
<p>In contrast, utils_latency adds values to a data series and can run several calculations on that one dataset once the aggregation timespan has passed and the values are to be delivered.</p>
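<p>To illustrate the "collect once, report several gauges" idea, here is a minimal C sketch. It is not the utils_latency code: sample_set_t and all sample_set_* functions are hypothetical names, and the sketch simply stores every sample and uses a nearest-rank percentile, which may differ from what utils_latency does internally.</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a latency counter: it simply stores every sample. */
typedef struct {
  double *values;
  size_t num;
  size_t cap;
} sample_set_t;

/* Append one sample; returns 0 on success, -1 if memory runs out. */
static int sample_set_add(sample_set_t *s, double v) {
  if (s->num == s->cap) {
    size_t new_cap = (s->cap == 0) ? 16 : s->cap * 2;
    double *tmp = realloc(s->values, new_cap * sizeof(*tmp));
    if (tmp == NULL)
      return -1;
    s->values = tmp;
    s->cap = new_cap;
  }
  s->values[s->num++] = v;
  return 0;
}

static int cmp_double(const void *a, const void *b) {
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Sort once per interval; min, max and percentile then become lookups. */
static void sample_set_sort(sample_set_t *s) {
  if (s->num > 1)
    qsort(s->values, s->num, sizeof(*s->values), cmp_double);
}

/* The getters below return 0.0 for an empty set. */
static double sample_set_min(const sample_set_t *s) {
  return s->num ? s->values[0] : 0.0; /* requires a sorted set */
}

static double sample_set_max(const sample_set_t *s) {
  return s->num ? s->values[s->num - 1] : 0.0; /* requires a sorted set */
}

static double sample_set_average(const sample_set_t *s) {
  double sum = 0.0;
  for (size_t i = 0; i < s->num; i++)
    sum += s->values[i];
  return s->num ? sum / (double)s->num : 0.0;
}

/* Nearest-rank percentile, percent in 1..100; requires a sorted set. */
static double sample_set_percentile(const sample_set_t *s, unsigned percent) {
  if (s->num == 0)
    return 0.0;
  size_t rank = (percent * s->num + 99) / 100; /* ceil(percent * n / 100) */
  if (rank < 1)
    rank = 1;
  if (rank > s->num)
    rank = s->num;
  return s->values[rank - 1];
}

/* Start a new aggregation interval, keeping the allocated buffer. */
static void sample_set_reset(sample_set_t *s) { s->num = 0; }
```

<p>In this sketch, a match would call sample_set_add() for every hit; at the end of the interval one would call sample_set_sort(), read out as many gauges as are wanted, and then call sample_set_reset().</p>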
<p> </p>
<p>So I started knitting everything together the way I thought was right (see patch) and did some tests; however, are the results correct?</p>
<p>My test config was:</p>
<p> &lt;File "/home/willi/test.log"&gt;<br /> Instance "blarg"<br /> &lt;Match&gt;<br /> Regex "S=([1-9][0-9]*)"<br /> DSType "GaugePercentile"<br /> Type "gauge"<br /> Instance "percentile"<br /> &lt;/Match&gt;<br /> &lt;/File&gt;</p>
<p>and I generated a logfile using:</p>
<p>for z in `seq 1 1000000`; do sleep 0.1; for j in `seq 1 10`; do for i in `seq 1 10`; do printf "S=1${i}000\n" >> test.log; done; touch test.log ; sleep 1; done; done</p>
<p>I hacked in some test code to print out the calculated gauges: </p>
<p> values[0].gauge = CDTIME_T_TO_DOUBLE (latency_counter_get_min(match_value->Counter));<br /> fprintf(stderr, "Min: %f\n", values[0].gauge);<br /> values[0].gauge = CDTIME_T_TO_DOUBLE (latency_counter_get_max(match_value->Counter));<br /> fprintf(stderr, "Max: %f\n", values[0].gauge);<br /> values[0].gauge = CDTIME_T_TO_DOUBLE (latency_counter_get_average(match_value->Counter));<br /> fprintf(stderr, "Avg: %f\n", values[0].gauge);<br /> values[0].gauge = CDTIME_T_TO_DOUBLE (latency_counter_get_percentile(match_value->Counter, 95));<br /> fprintf(stderr, "perc: %f\n", values[0].gauge);</p>
<p>and it would always give me something like this:</p>
<p>Min: 0.000010<br />Max: 0.000102<br />Avg: 0.000023<br />perc: 0.001000</p>
<p>while the input data looks like this:</p>
<p>S=11000<br />S=12000<br />S=13000<br />S=14000<br />S=15000<br />S=16000<br />S=17000<br />S=18000<br />S=19000<br />S=110000</p>
<p>So for the Min value, one can clearly see that it should be something with '11' in it, regardless of where the float puts the decimal point, yet the debug print reads Min: 0.000010?</p>
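<p>One possible explanation, sketched in C below, rests on two assumptions: that the matched number is handed to the latency counter as a raw integer, and that cdtime_t is a fixed-point value counting in units of 2^-30 seconds, so that CDTIME_T_TO_DOUBLE divides by 2^30. The names fixed_time_t, fixed_time_to_double and double_to_fixed_time are hypothetical stand-ins, not collectd identifiers.</p>

```c
#include <stdint.h>

/* Assumption: mirrors a 64-bit fixed-point time value in units of 2^-30 s. */
typedef uint64_t fixed_time_t;

/* 2^30, the assumed number of fixed-point ticks per second. */
#define TICKS_PER_SECOND 1073741824.0

/* Fixed-point ticks to seconds (what the read-out side is assumed to do). */
static double fixed_time_to_double(fixed_time_t t) {
  return (double)t / TICKS_PER_SECOND;
}

/* Seconds to fixed-point ticks (what the feed-in side would have to do
 * so that both conversions cancel out). */
static fixed_time_t double_to_fixed_time(double d) {
  return (fixed_time_t)(d * TICKS_PER_SECOND);
}
```

<p>Under these assumptions, fixed_time_to_double(11000) is roughly 0.0000102, which "%f" prints as 0.000010; 110000 gives roughly 0.000102 and the average 24500 gives roughly 0.000023, matching the Min, Max and Avg lines above. Converting on the way in, as in fixed_time_to_double(double_to_fixed_time(11000.0)), returns 11000 again. The perc value of 0.001000 is not explained by this conversion alone.</p>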
<p> </p>
<p>So, my questions are:</p>
<p> - do I feed in the data correctly?</p>
<p> - do I read out the data correctly?</p>
<p> - am I right that the data series can be used to calculate several gauges in a row?</p>
<p> - is my debug printf flawed? how should it look?</p>
<p> - to approach a final state, should all (or most) calculations of tail be replaced by the latency ones?</p>
<p> - if the old ones should live alongside the new ones, should I add a DSType 'Gauges' plus an additional setting listing the gauges to calculate?</p>
<p>TIA,</p>
<p>Wilfried Goesgens</p>
</body></html>