[collectd] collectd 4.2.4 network issues with Solaris 8

Eric LeBlanc eleblanc at taleo.com
Thu Feb 14 18:10:06 CET 2008


Hi,

There you go:

I opened gdb:
---------------------
Core was generated by `./collectd'.
Program terminated with signal 10, Bus error.
#0  0xffffffff7ae026fc in write_part_number (ret_buffer=0xffffffff796090a0, 
ret_buffer_len=0xffffffff796090a8, type=1, value=1202937442) at network.c:340
340             pn.head->type = htons (type);


I did a backtrace:
---------------------------
(gdb) bt
#0  0xffffffff7ae026fc in write_part_number (ret_buffer=0xffffffff796090a0, 
ret_buffer_len=0xffffffff796090a8, type=1, value=1202937442) at network.c:340

#1  0xffffffff7ae04c20 in add_to_buffer (buffer=0xffffffff7af07335 "", 
buffer_size=1011, vl_def=0xffffffff7af07160, type_def=0xffffffff7af07738 "",
    ds=0x100184f90, vl=0xffffffff796092b0) at network.c:1098

#2  0xffffffff7ae05160 in network_write (ds=0x100184f90, 
vl=0xffffffff796092b0) at network.c:1178

#3  0x000000010000cf74 in plugin_dispatch_values 
(name=0xffffffff7bc01ab8 "df", vl=0xffffffff796092b0) at plugin.c:686

#4  0xffffffff7bc014d8 in df_submit (df_name=0xffffffff796098f0 "root", 
df_used=1489756160, df_free=1591100416) at df.c:133

#5  0xffffffff7bc01898 in df_read () at df.c:200
#6  0x000000010000b644 in plugin_read_thread (args=0x0) at plugin.c:184
#7  0xffffffff7d61ece0 in _thread_start () from /usr/lib/64/libthread.so.1
#8  0xffffffff7d61ece0 in _thread_start () from /usr/lib/64/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
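
A note on the crash itself. This is one plausible reading, not a confirmed diagnosis. Signal 10 on Solaris is SIGBUS, which on SPARC usually means a misaligned memory access. In frame #1 the buffer pointer is 0xffffffff7af07335, an odd address. Line 340 stores a 16-bit value through `pn.head`, which presumably points into that buffer. If so, building the header in an aligned local variable and copying its bytes with memcpy() should avoid the trap.

The sketch below assumes a header made of two 16-bit fields. `struct part_header_s` and `write_header_unaligned()` are hypothetical names, not the actual network.c code:

```c
#include <arpa/inet.h>  /* htons() */
#include <stdint.h>
#include <string.h>     /* memcpy(), size_t */

/* Hypothetical stand-in for the network plugin's part header; the real
 * definition lives in network.c and may differ. */
struct part_header_s {
    uint16_t type;
    uint16_t length;
};

/* Write a part header into buf at any byte offset without dereferencing a
 * possibly misaligned uint16_t pointer. Returns the number of bytes
 * written, or -1 if the buffer is too small. */
static int write_header_unaligned(char *buf, size_t buf_len,
                                  uint16_t type, uint16_t length)
{
    struct part_header_s head;

    if (buf_len < sizeof(head))
        return -1;

    /* Build the header in an aligned local variable (network byte order)... */
    head.type = htons(type);
    head.length = htons(length);

    /* ...then copy the raw bytes; memcpy() has no alignment requirement. */
    memcpy(buf, &head, sizeof(head));
    return (int) sizeof(head);
}
```

On x86 a misaligned store like the one on line 340 is tolerated, only more slowly. That may explain why the problem only shows up on this box, which is presumably SPARC given the 64-bit 0xffffffff7... addresses.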


The output of collectd built with --enable-debug:
-----------------
[xxxxxxxxx at rootasp]: ./collectd
[2008-02-14 11:43:12] type = memory
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/memory.so
[2008-02-14 11:43:12] type = network
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/network.so
[2008-02-14 11:43:12] type = ntpd
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/ntpd.so
[2008-02-14 11:43:12] type = swap
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/swap.so
[2008-02-14 11:43:12] type = syslog
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/syslog.so
[2008-02-14 11:43:12] type = users
[2008-02-14 11:43:12] file = /opt/collectd/lib/collectd/users.so
[2008-02-14 11:43:12] type = network, key = Server, value = xxxxxxxxx 23826
[2008-02-14 11:43:12] node = qcrrd1.asp.rsft.net, service = 23826
[2008-02-14 11:43:12] return (0)
[2008-02-14 11:43:12] type = network, key = TimeToLive, value = 128
[2008-02-14 11:43:12] return (0)
[2008-02-14 11:43:12] type = network, key = Forward, value = false
[2008-02-14 11:43:12] return (0)
[2008-02-14 11:43:12] type = network, key = CacheFlush, value = 1800.000000
[2008-02-14 11:43:12] return (0)
[2008-02-14 11:43:12] hostname_g = xxxxx;
[2008-02-14 11:43:12] interval_g = 10;
[xxxxxxxx at rootasp]:

Do you need anything else?

Thanks!

E.

On Wednesday 13 February 2008 15:34, Florian Forster wrote:
> Hi Eric,
>
> On Tue, Feb 12, 2008 at 11:37:16AM -0500, Eric LeBlanc wrote:
> > I got three errors that I easily fixed by modifying the source code:
>
> thanks for the hints/fixes, I'll apply them shortly. :)
>
> > The parameter of the isspace() function *really* wants an integer on
> > Solaris...  It seems that we must cast it explicitly.
>
> Okay, weird, but what the heck, it doesn't break anything ;)
>
> > unixsock.c: In function `us_handle_client':
> > unixsock.c:615: warning: control reaches end of non-void function
>
> It's a bit of a catch-22 really - if you add a `return (NULL);' there
> other compilers (the Sun CC for example) complain that ``a statement is
> never reached''.. *sigh*
>
> > Here the output of the pstack of the core file:
> > =====================================================================
> > [xxxxx1 at rootasp]: pstack core
> > -----------------  lwp# 4 / thread# 5  --------------------
> >  ffffffff79d024c4 write_part_number (ffffffff786090c0, ffffffff786090c8,
> > 1, 47b1c628, 8, ff0000) + c8
> >  ffffffff79d04694 add_to_buffer (ffffffff79e068a8, 400, ffffffff79e06680,
> > ffffffff79e06cb8, 10012bb90, ffffffff786092b0) + 108
> >  ffffffff79d04ba0 network_write (10012bb90, ffffffff786092b0, 0,
> > ffffffff7aa014d0, 0, 0) + d8
> >  000000010000c9cc ???????? (ffffffff7aa01ab8, ffffffff786092b0,
> > ffffffffffffffff, 0, 6466, ffffffff7860938c)
> >  ffffffff7aa014d0 df_submit (ffffffff786098f0, 100181520,
> > ffffffffffffffff, 0, 2f000001, ffffffff786098f0) + 1dc
> >  ffffffff7aa01890 df_read (0, 0, ffffffff7d720000, 0, 0, 0) + 3ac
> >  000000010000b1c4 ???????? (0, ffffffff7d5093a1, 0, 0, 0, 1000)
> >  ffffffff7d61ecd8 _thread_start (0, 0, 0, 0, 0, 0) + 40
>
> Since all other processes are either sleeping or waiting on some mutex
> it's likely that the problem is in the network plugin, but I'm afraid I
> can hardly tell what's going on here. Would you re-compile with
> ``--enable-debug'' to get a backtrace with debugging symbols? Also adding
> `-O0' to the CFLAGS would help - `write_part_number' only takes four
> arguments, so apparently the compiler is passing something else on
> the stack here..
>
> > If needed, I can provide you a core file.
>
> Sure, go ahead - though I doubt I can use it much.. But it's worth a try
> ;)
>
> Regards,
> -octo
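
On the isspace() cast mentioned in the quoted thread: the C standard only defines isspace() for arguments that are EOF or representable as unsigned char. The Solaris <ctype.h> typically implements it as a macro that indexes a lookup table, so GCC warns when it is handed a plain char. Casting to int silences the warning, but when char is signed it still passes negative values for bytes >= 0x80. Casting through unsigned char is the portable form.

A minimal sketch, where `skip_space()` is a hypothetical helper rather than a function from the collectd source:

```c
#include <ctype.h>

/* Hypothetical helper: skip leading whitespace in a NUL-terminated string.
 * The unsigned char cast keeps bytes >= 0x80 from turning into negative
 * values (and negative table indices) where plain char is signed. */
static const char *skip_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char) *s))
        s++;
    return s;
}
```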

-- 
Eric LeBlanc <eleblanc at taleo.com>
Unix System Administrator
Taleo inc.


