[collectd] patch for email stats from postfix and amavisd-new, dns stats from powerdns

Luke Heberling collectd at c-ware.com
Sun Dec 2 09:28:21 CET 2007


Florian Forster wrote:
> Why primes? 2^k-1 I would have understood, basically anything that
> has to with padding and avoiding extra bytes, but primes? What do
> they have to do with it?
I've done some experimenting. Here are some results.
I've abstracted the calc_table_len code into a function that is passed 
into the creating of the hash table.  I then added two more hash 
algorithms and pulled the hash related functions into their own source 
file, utils_hash.{c,h}.  Then did some testing of the hash distribution 
for each hash/table_length algorithm with the 98569 words in the 
american english list from the scowl project 
(http://wordlist.sourceforge.net). The mru cache uses a chained hash 
table so it's easy to look at the number of collisions.

                               Collisions
Hash/Table_Length Algo      0     1     2    3   4   5  6
sdbm/prime                  60883 14826 2293 257 23  2
sdbm/power_two              46313 17353 4528 815 125 10 3
sdbm/power_two_minus_1      46180 17615 4348 839 129 19
djb2/prime                  60846 14606 2411 288 24  1
djb2/power_two              46913 17408 4354 787 114 10
djb2/power_two_minus_1      46684 17470 4351 805 110 18 2
jenkins/prime               60949 14593 2369 290 31  2
jenkins/power_two           46254 17602 4292 846 148 15 3
jenkins/power_two_minus_1   60883 14826 2293 257 23  2

Clearly the prime table length provides the best distribution among all 
the hash algorithms. But the jenkins hash performed pretty much as well 
with the power-of-two-minus-one table length, indicating that it is a 
preferred hash function, at least with this data set.

I like the idea of a separate utils_hash file, as well as the pluggable 
table length calculation functions.  If you have the time, let me know 
anything that jumps out at you about that organization or the utils_hash 
and utils_mrucache modules. Meantime, I'm continuing to test the 
rewritten amavis/postfix/powerdns plugins.

Luke
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tests.patch
Type: text/x-diff
Size: 9843 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071202/3e089517/attachment-0003.patch 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils_hash.patch
Type: text/x-diff
Size: 14497 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071202/3e089517/attachment-0004.patch 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils_mrucache.patch
Type: text/x-diff
Size: 17177 bytes
Desc: not available
Url : http://mailman.verplant.org/pipermail/collectd/attachments/20071202/3e089517/attachment-0005.patch 


More information about the collectd mailing list