[collectd-changes] collectd, the system statistics collection daemon: Changes to 'master'
Florian Forster
octo at verplant.org
Tue Aug 18 21:40:50 CEST 2009
src/collectd.conf.pod | 8 ++++++++
src/rrdtool.c | 36 ++++++++++++++++++++++++++++++++++--
2 files changed, 42 insertions(+), 2 deletions(-)
New commits:
commit c35203c82560eba66bb901aa22c5170fb8c389fb
Merge: 4151d975ca93af4570a1ca97a0408ba446ea7485 2bca2a511c1636bf448112d081d115ebc01b5ed4
Author: Florian Forster <octo at leeloo.lan.home.verplant.org>
Date: Tue Aug 18 21:39:03 2009 +0200
Merge branch 'mg/jitter'
commit 2bca2a511c1636bf448112d081d115ebc01b5ed4
Author: Florian Forster <octo at leeloo.lan.home.verplant.org>
Date: Tue Aug 18 21:38:01 2009 +0200
collectd.conf(5): Document the new `RandomTimeout' option.
commit d278a40cab2bcb6bb0387176d087ce13cd3e843b
Author: Florian Forster <octo at leeloo.lan.home.verplant.org>
Date: Tue Aug 18 21:23:21 2009 +0200
rrdtool plugin: Optimize away the `random_timeout_mod' variable.
commit bdcac4078f8052b8e4f425a1e5aea3957551e0d3
Author: Mariusz Gronczewski <xani666 at gmail.com>
Date: Tue Aug 18 21:18:06 2009 +0200
rrdtool plugin: Call rand(3) less often.
2009/8/18 Florian Forster <octo at verplant.org>:
> Hi Mariusz,
>
> On Mon, Aug 17, 2009 at 02:20:29AM +0200, Mariusz Gronczewski wrote:
>> I was thinking about how to "spread out" writes to the RRD files a
>> bit, because right now there is a big spike every CacheTimeout, or a
>> slightly smaller "square" on the graph if you use WritesPerSecond.
>
> in general I like your patch, thank you very much for posting it :)
> I have some doubts about calling rand() in such a busy place though,
> since getting random numbers is potentially costly. Also, rand(3) is not
> thread-safe, though I don't think that's really an issue for us.
Yeah, good point, but that would probably only be noticeable on very slow
machines (PIII-800 slow) with tons of RRD files, and such a machine would
run out of disk bandwidth first.
> Maybe a solution would be to add a `random_timeout' member to the
> `rrd_cache_t' struct, too. This member is then set when creating the
> entry and set again right after the values have been removed. That way
> rand(3) is only called once for each write instead of once for every
> check.
Yeah, very good idea, I didn't think of that (well, to be honest, I
hadn't looked much into the "interiors" of the rrdtool plugin). I've
implemented it in the attached patch; I've been testing it for about an
hour so far and it works pretty well.
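A minimal sketch of the approach described above, with hypothetical names
(the real rrd_cache_t in src/rrdtool.c has more fields): the per-entry
offset is drawn once when the cache entry is created and again right
after a flush, so rand(3) runs once per write instead of once per check.

    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical names; the actual struct in src/rrdtool.c differs. */
    static int cache_timeout  = 600; /* CacheTimeout  (seconds) */
    static int random_timeout = 300; /* RandomTimeout (seconds) */

    typedef struct rrd_cache_s
    {
      time_t first_value;  /* time of the oldest cached value */
      int    random_delta; /* set on creation and again after each flush */
      /* ... cached values, flags, ... */
    } rrd_cache_t;

    /* Draw a uniform offset in [-random_timeout, +random_timeout].
     * rand(3) is not thread-safe; if that mattered, rand_r(3) with a
     * per-thread seed would be the drop-in alternative. */
    static int rrd_cache_random_delta (void)
    {
      if (random_timeout <= 0)
        return (0);
      return ((rand () % (2 * random_timeout + 1)) - random_timeout);
    }

    /* The per-check test is then a plain comparison, no rand(3) call. */
    static int rrd_cache_too_old (const rrd_cache_t *rc, time_t now)
    {
      return ((now - rc->first_value) >= (cache_timeout + rc->random_delta));
    }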
> As an interesting sidenote: With the above approach, the random write
> times are distributed `uniformly', i.e. every delay from 0 to max-1
> seconds has the same probability. With your code, I think the actual
> time a value is written follows a `normal' distribution (you know, that
> famous bell curve). So I'd expect the above approach to spread the
> values more quickly.
Yup, exactly as you said, it's much quicker that way.
I'm wondering what the config variable should be called; the name
"RandomTimeout" doesn't say anything useful on its own ("random timeout
of what?"). Maybe TimeoutSpread? RandomizeTimeout?
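For reference, the option ended up keeping the name `RandomTimeout' (see
the documentation commit above). A configuration sketch using the values
from the example below; DataDir and the path are illustrative, and
collectd.conf(5) has the exact semantics:

    <Plugin rrdtool>
      DataDir       "/var/lib/collectd/rrd"
      CacheTimeout  600
      RandomTimeout 300
    </Plugin>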
commit fd48357ddeb1b58d5795015e845f3105a7ba3103
Author: Mariusz Gronczewski <xani666 at gmail.com>
Date: Mon Aug 17 02:20:29 2009 +0200
Random write timeout for rrdtool plugin
Hi,
I was thinking about how to "spread out" writes to the RRD files a bit,
because right now there is a big spike every CacheTimeout, or a slightly
smaller "square" on the graph if you use WritesPerSecond. So I've written
a little patch which "spreads out" the writes by changing the cache
timeout every time the rrdtool plugin looks for data to save. Basically,
instead of moving data older than CacheTimeout to the write queue, it
moves data older than CacheTimeout +- RandomTimeout (see the sketch after
the examples below). What does this change?
Without it, the gathered data is "synchronised", for example
(CacheTimeout = 600):
1. collectd starts.
2. After 10 minutes, the data from all plugins becomes "too old" at the
   same time, gets pushed into the write queue, and is saved.
3. After another 10 minutes, the same thing happens: all data "ages" at
   the same time and is saved in one big chunk.
With it (RandomTimeout = 300) it works like this:
1. collectd starts.
2. After 5 minutes, some data (let's call it A) starts to go into the
   write queue.
3. Ten minutes after the start, about 50% of the data (on average) has
   been saved (let's call it B).
4. Finally, after 15 minutes, all the "leftover" data gets saved (let's
   call it C).
5. The next "cycle" begins.
6. Data A ages first (because it was written to disk first) and, like
   before, some of it gets written earlier and some later.
7. After that, data B ages, and again its writes are spread over 10
   minutes.
8. The same goes for C.
So the first cycle (looking at the I/O) is shaped like a sine wave; the
next 10-minute cycle is the same sine wave flattened a bit, and so on (a
fading sine wave). After a few cycles this levels out to pretty much the
same number of writes per second, with no ugly spikes.
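A rough sketch of that check, with illustrative names only (note that
this first version draws the jitter on every scan; the follow-up commits
above move the rand(3) call to once per write):

    #include <stdlib.h>
    #include <time.h>

    /* Jitter the effective timeout on every cache scan, so each entry is
     * flushed somewhere in CacheTimeout +- RandomTimeout. */
    static int value_too_old (time_t first_value, time_t now,
        int cache_timeout, int random_timeout)
    {
      int jitter = 0;
      if (random_timeout > 0)
        jitter = (rand () % (2 * random_timeout + 1)) - random_timeout;
      return ((now - first_value) >= (cache_timeout + jitter));
    }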
The effect looks like this:
http://img24.imageshack.us/img24/7294/drrawcgi.png
(after a few more hours it will be smoother)
Regards
Mariusz
Signed-off-by: Florian Forster <octo at huhu.verplant.org>