[collectd] ntpd plugin complaining [not FIXED yet]

Luboš Staněk lubek at users.sourceforge.net
Wed Nov 1 21:00:07 CET 2006


Hi,

Florian Forster wrote:
> Hi Lubos,
> 
> sorry for my slow reply :/
> 

Acknowledged.
:)


> On Tue, Oct 31, 2006 at 07:16:35PM +0100, Luboš Staněk wrote:
>> When I modified the function parameters to sockaddr_in6, I got
>> unresolved numeric addresses, e.g. ::2f70:726f:632f:3130:3635:342f, and
>> .rrd files. And the famous "ntpd plugin: getnameinfo failed: ai_family
>> not supported" was history.
>> I did not test it, but I am convinced that on a working IPv6-only
>> network the unmodified call returns an empty peername.
> 
> Though you were a little mystic here, I got the point: We don't pass a
> `struct sockaddr_in6' here, but a `struct in6_addr'. Since the first
> four bytes of this structure don't match either `AF_INET' or `AF_INET6'
> the routine returns with an appropriate error. The compiler did not
> catch this, since you need to explicitly cast the `sockaddr_in6' to a
> `sockaddr'.
> 
> I've written a patch which copies the `in6_addr' into a `sockaddr_in6'
> and then passes this, hopefully resolving the problem.
> 

Good.
You interpreted my remarks correctly.
But you were probably too quick to release the fixed version; you will
have to do it again. The ntpd plugin patch leads to another problem,
possibly a more serious one.


>> The test should be whether the system supports and works with IPv6, and
>> only after a successful test should we try to resolve the IPv6 address
>> returned in the ntpd response. In all other cases we should use only
>> IPv4.
> 
> `getnameinfo' is defined in RFC2553, named `Basic Socket Interface
> Extensions for IPv6', thus such a test is not reasonable. Besides, a
> system without IPv6 addresses should still be able to resolve IPv6
> addresses - it's just another QType being set.
> 

You are right.
My evil system is capable of resolving IPv6 addresses.
I verified this because my preferred stratum 1 server, tik.cesnet.cz,
provides an IPv6 address.


Yesterday I made the same modification to the ntpd plugin as you did.
The rest of my remark was that I got rid of the famous "getnameinfo
failed: ai_family not supported", but I found another problem.
My .rrd directory contains files like these (the same goes for the
time_dispersion and time_offset .rrd files):
delay-::.rrd
delay-::c021:0:100:0:9a99:9999.rrd
delay-0:6e73::6e65:7400:7265:6e74.rrd
delay-0.0.0.0.rrd
delay-100::.rrd
delay-119.121.0.0.rrd
delay-::145.2.0.0.rrd

Moreover, the number of files keeps increasing.

The values "::" and "0.0.0.0" probably mean that the structure is
zero-filled. Some of the names look like reference clocks. The rest seems
to be garbage; I would bet that some of it is text.

So the conclusion is that you must validate the fields before
processing an IPv6 address.
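
One possible form of such a check, as a rough sketch: skip peer entries
whose address is all zeros before resolving them or creating .rrd files.
The function names are hypothetical, and the assumption that a
zero-filled address marks an unused entry is a guess from the file names
above, not documented ntpd behavior.

```c
#include <stdbool.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical filters: report whether a peer address is worth
 * processing. Assumption: an all-zero address ("::" or "0.0.0.0")
 * means the entry was never filled in. */
static bool peer_addr6_is_usable(const struct in6_addr *addr)
{
    /* IN6_IS_ADDR_UNSPECIFIED is true only for the address "::" */
    return !IN6_IS_ADDR_UNSPECIFIED(addr);
}

static bool peer_addr4_is_usable(const struct in_addr *addr)
{
    /* INADDR_ANY is 0.0.0.0; s_addr is stored in network byte order */
    return addr->s_addr != htonl(INADDR_ANY);
}
```

This would only catch the "::" and "0.0.0.0" entries. The other garbage
names would need a different test, for example a validity flag in the
response if one exists.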

At first I thought that one of the servers was returning the garbage.
I checked all of them one by one: no trouble with any of the more than 30.
Later I went back to the previous ntp.conf and the garbage appeared again.

Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/time_offset-::.rrd: illegal attempt to update using time 1162408699
when last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/time_dispersion-::.rrd: illegal attempt to update using time
1162408699 when last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/delay-::.rrd: illegal attempt to update using time 1162408699 when
last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/time_offset-0.0.0.0.rrd: illegal attempt to update using time
1162408699 when last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/time_dispersion-0.0.0.0.rrd: illegal attempt to update using time
1162408699 when last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/delay-0.0.0.0.rrd: illegal attempt to update using time 1162408699
when last update time is 1162408699 (minimum one second step)
Nov  1 20:18:20 ls collectd[31163]: rrd_update failed:
ntpd/time_offset-::.rrd: illegal attempt to update using time 1162408699
when last update time is 1162408699 (minimum one second step)


You are probably curious what is in my ntp.conf that is causing these
troubles.

server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org

These entries are part of the standard ntpd installation in the Fedora
Core distribution, at least since FC3 (I have FC4 and FC5 on my
systems). Although I have been using preferred servers near my location
for many years, I have left them there.
"A man with a watch knows what time it is. A man with two watches is
never sure" - http://www.pool.ntp.org/join/configuration.html

It seems that the problem is caused by the combination of the
round-robin DNS servers of pool.ntp.org and the ntpd server.
I have not found any information about ntpd's behavior in such a case,
but it seems to return invalid information in the query response. Perhaps
it is preparing to switch servers and returns partially filled structures.
There must be some flag (like REFCLOCK_MASK) that indicates the invalid
content.

I am sorry, but I have not found a solution so far. I am glad that I have
at least found the source of the problem.


Best regards,
Lubos


