[collectd] long sleeps when using collectd and ntpd

Yves Mettier ymettier at free.fr
Tue Sep 24 13:24:50 CEST 2013


Hello,

I have noticed this problem too.
Forget about ntpd, otherwise people will try to understand what ntpd
has to do with it and suggest ntpd-specific solutions. The real problem
is the system time on the machine.

How to reproduce:
0/ Disable ntpd (so you can see that it has nothing to do with ntpd)
1/ Set the date&time on your server to the future (the current time is
"t0"; set it to t1, for example t1 = "t0 + 2 hours")
2/ Wait 2 or 3 minutes so that Collectd collects and sends data to the
main Collectd server (which breaks your rrd files)
3/ Set the date&time back to the correct value (t0 + the time elapsed
since step 1)
4/ No data will be written to the rrd files until the real date&time
reaches t1 (in our example, we lose 2 hours of data).

The problem is inside the rrd files.
You can run "rrdtool last <your rrd file>.rrd" to see the timestamp of
the last value written.
As far as I know from working with rrd files and rrdtool, you cannot
write data with timestamps earlier than that value.
So if you set a wrong date&time in the future, it will continue to work.
But when you then set the date&time back into the past (or, as in the
example, back from the future), updates are rejected.
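
To make the reproduction steps above concrete, here is a minimal
simulation of that one rule. ToyRRD and replay are hypothetical names
invented for this sketch; it does not use rrdtool or Collectd, and it
assumes a 10-second collection interval and a clock left wrong for
3 minutes.

```python
class ToyRRD:
    """Toy model of one rule only: an update must be newer than the last one."""

    def __init__(self):
        self.last = 0  # stands in for what "rrdtool last" would report

    def update(self, timestamp):
        # Refuse anything that is not strictly newer than the last update.
        if timestamp <= self.last:
            return False
        self.last = timestamp
        return True


def replay(t0=1_000_000, jump=7200, interval=10, wrong_clock_for=180):
    """Replay steps 1 to 4 and return (seconds of data lost, resume offset)."""
    rrd = ToyRRD()

    # Steps 1 and 2: the clock reads t1 = t0 + jump and data keeps arriving.
    for elapsed in range(0, wrong_clock_for, interval):
        rrd.update(t0 + jump + elapsed)

    # Steps 3 and 4: the clock is correct again, but updates are refused
    # until the real time passes the last timestamp that was stored.
    rejected = 0
    now = t0 + wrong_clock_for
    while not rrd.update(now):
        rejected += 1
        now += interval

    return rejected * interval, now - t0


if __name__ == "__main__":
    lost, resumed_after = replay()
    print(lost, resumed_after)
```

With these assumed values the first tuple element is the amount of
data lost in seconds, and the second is how long after t0 the first
update is accepted again.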

It is an even bigger problem when the date&time goes crazy and jumps to
a date years later. When this happens, you can consider your rrd file
corrupted.
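
A rough way to check whether a given rrd file is in that state is to
compare the output of "rrdtool last" with the current time. The sketch
below assumes the rrdtool binary is in PATH; the function names
rrd_last_update and seconds_blocked are made up for this example.

```python
import subprocess
import time


def rrd_last_update(path):
    """Return the timestamp printed by "rrdtool last <path>" as an int."""
    result = subprocess.run(
        ["rrdtool", "last", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def seconds_blocked(last_update, now=None):
    """Seconds until an update stamped with the current time is newer
    than last_update; 0 means updates are not blocked."""
    if now is None:
        now = int(time.time())
    return max(0, last_update - now + 1)
```

For example, seconds_blocked(rrd_last_update("/path/to/file.rrd"))
(the path is a placeholder) returns 0 for a healthy file and a very
large number when the last update is years in the future.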


Well, that is my version of the description of the problem.
If anybody has an idea on how to fix it, I'm interested too.

Regards,
Yves


On 2013-09-24 10:53, danta wrote:
> We have some problems using collectd and ntpd.
> 
> Whenever ntpd sets the clock in the past (because of a clock drift),
> collectd sleeps until it is back at its 'normal' time.
> To give an example:
> - Suppose it's 10:00am,
> - ntpd sees a large clock drift and sets the clock to 09:45am
> - collectd will sleep till 10:00am
> 
> When trying to debug the problem, we found that the do_loop function
> in collectd.c determines the amount of time to sleep based on times
> obtained from the "gettimeofday" function. Was this a design choice?
> Wouldn't it be better to use a monotonic clock?
> 
> Greetz,
> Lode
> 
> _______________________________________________
> collectd mailing list
> collectd at verplant.org
> http://mailman.verplant.org/listinfo/collectd
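
The monotonic-clock idea from the quoted message could look roughly
like the sketch below. It is not Collectd's do_loop; run_loop and its
parameters are hypothetical, and it only illustrates computing the
sleep time from a clock that is not affected by changes to the system
date&time. It changes only how long the loop sleeps, not the
timestamps that end up in the rrd files.

```python
import time


def run_loop(read_once, interval, iterations,
             clock=time.monotonic, sleep=time.sleep):
    """Call read_once() every `interval` seconds, `iterations` times.

    Deadlines come from a monotonic clock, so setting the wall clock
    backwards or forwards does not change how long the loop sleeps.
    """
    deadline = clock() + interval
    for _ in range(iterations):
        read_once()
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        # Schedule relative to the previous deadline to avoid drift.
        deadline += interval
```

The clock and sleep parameters exist only so the loop can be driven
by a fake clock when experimenting.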

-- 
- Homepage       - http://ymettier.free.fr                             -
- GPG key        - http://ymettier.free.fr/gpg.txt                     -
- C en action    - http://ymettier.free.fr/livres/C_en_action_ed2.html -
- Guide Survie C - http://www.pearson.fr/livre/?GCOI=27440100673730    -


