[collectd] Issues with network buffers with Solaris

Eric LeBlanc eleblanc at taleo.com
Wed Jun 4 21:53:46 CEST 2008


Hi,

I installed collectd on a Linux server (which creates the RRD files) and on 
a Solaris machine. Both run version 4.4.1.

There seems to be a buffer limit in the network plugin on the Solaris 
client. Or at least, if we send more than 1024 characters, the remainder 
appears to be discarded (this is purely a guess on my part).

First of all, here are the custom types.db entries:

sol_disk_iops                   read_iops:GAUGE:0:4294967295, write_iops:GAUGE:0:4294967295, total_iops:GAUGE:0:4294967295
sol_disk_throughput             read_throughput:GAUGE:0:4294967295, write_throughput:GAUGE:0:4294967295, total_throughput:GAUGE:0:4294967295
sol_disk_avg_svctime            total_iowait:GAUGE:0:4294967295, total_ioactive:GAUGE:0:4294967295
sol_disk_avg_iops               wait_iops_avg:GAUGE:0:4294967295, active_iops_avg:GAUGE:0:4294967295, total_iops_avg:GAUGE:0:4294967295
sol_tcp_stats                   tcpActiveOpen:GAUGE:0:4294967295, tcpPassiveOpen:GAUGE:0:4294967295, tcpACurrEstab:GAUGE:0:4294967295, tcpInSeg_rate:GAUGE:0:4294967295, tcpOutSeg_rate:GAUGE:0:4294967295
sol_udp_stats                   InDatagrams_rate:GAUGE:0:4294967295, OutDatagrams_rate:GAUGE:0:4294967295, InErrors:GAUGE:0:4294967295
sol_iface_stats                 rx_rate:GAUGE:0:4294967295, rxerrs:GAUGE:0:4294967295, tx_rate:GAUGE:0:4294967295, txerrs:GAUGE:0:4294967295
sol_cpu_process                 running:GAUGE:0:4294967295, blocked:GAUGE:0:4294967295, waiting:GAUGE:0:4294967295
sol_cpu_usage                   user:GAUGE:0:100, system:GAUGE:0:100, idle:GAUGE:0:100
sol_avail_swap                  free_swap:GAUGE:0:4294967295
sol_free_list                   free_list:GAUGE:0:4294967295
sol_swaping                     swapout:GAUGE:0:4294967295
sol_scanrate                    scanrate:GAUGE:0:4294967295
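
As a sanity check on definitions like these, the number of data sources in 
each type should match the number of values sent for it. Below is a minimal 
sketch in Python, assuming each entry sits on a single line in the form 
"name ds:TYPE:min:max, ..." and that values are sent as "N:v1:v2:...". The 
helper names parse_type_line and value_count are made up for illustration 
and are not part of collectd.

```python
def parse_type_line(line):
    """Split one types.db entry into (type_name, [(ds, kind, min, max), ...])."""
    name, spec = line.split(None, 1)  # name, then the data-source list
    sources = []
    for item in spec.split(","):
        ds_name, ds_kind, ds_min, ds_max = item.strip().split(":")
        sources.append((ds_name, ds_kind, ds_min, ds_max))
    return name, sources


def value_count(value_field):
    """Count the values in a field like 'N:93:7:0' (the first part is the time)."""
    return len(value_field.split(":")) - 1


if __name__ == "__main__":
    entry = ("sol_disk_avg_svctime            "
             "total_iowait:GAUGE:0:4294967295, total_ioactive:GAUGE:0:4294967295")
    name, sources = parse_type_line(entry)
    # The two numbers printed should agree for a well-formed update.
    print(name, len(sources), value_count("N:0.0:9.1"))
```

A mismatch between the two counts would point at the definitions or the 
sending script rather than at the network plugin.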


Here is an example of what is printed to the collectd daemon:


hostXXXXX/cpu_usage/sol_cpu_usage interval=2 N:93:7:0
hostXXXXX/mem_stats/sol_avail_swap interval=2 N:26754864
hostXXXXX/mem_stats/sol_free_list interval=2 N:10950552
hostXXXXX/mem_stats/sol_swaping interval=2 N:0
hostXXXXX/mem_stats/sol_scanrate interval=2 N:0
hostXXXXX/disk_stats_md10/sol_disk_iops interval=2 N:0.0:0.0:0
hostXXXXX/disk_stats_md10/sol_disk_throughput interval=2 N:0.0:0.0:0
hostXXXXX/disk_stats_md10/sol_disk_avg_svctime interval=2 N:0.0:0.0
hostXXXXX/disk_stats_md10/sol_disk_avg_iops interval=2 N:0.0:0.0:0
hostXXXXX/disk_stats_md30/sol_disk_iops interval=2 N:0.0:1.5:1.5
hostXXXXX/disk_stats_md30/sol_disk_throughput interval=2 N:0.0:16.0:16
hostXXXXX/disk_stats_md30/sol_disk_avg_svctime interval=2 N:0.0:9.1
hostXXXXX/disk_stats_md30/sol_disk_avg_iops interval=2 N:0.0:0.0:0
hostXXXXX/disk_stats_md40/sol_disk_iops interval=2 N:0.0:42.9:42.9
hostXXXXX/disk_stats_md40/sol_disk_throughput interval=2 N:0.0:343.2:343.2
hostXXXXX/disk_stats_md40/sol_disk_avg_svctime interval=2 N:2.6:7.2
hostXXXXX/disk_stats_md40/sol_disk_avg_iops interval=2 N:0.1:0.3:0.4
hostXXXXX/iface_stats/sol_iface_stats interval=2 N:317:0:230:0
hostXXXXX/network_stats/sol_tcp_stats interval=2 N:6:0:1165:61:60
hostXXXXX/network_stats/sol_udp_stats interval=2 N:0:0:0
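
A rough way to test the 1024-character guess is to add up the length of 
these lines and see which ones end beyond that offset. The Python sketch 
below does this under the assumption that the text form shown above is what 
fills the buffer; the network plugin may encode values differently on the 
wire, so the byte counts are only a proxy. The function names are 
hypothetical.

```python
def cumulative_offsets(lines):
    """Yield (end_offset, line), counting one newline byte after each line."""
    offset = 0
    for line in lines:
        offset += len(line.encode("utf-8")) + 1
        yield offset, line


def lines_past_limit(lines, limit=1024):
    """Return the lines whose last byte falls beyond `limit` bytes."""
    return [line for end, line in cumulative_offsets(lines) if end > limit]


if __name__ == "__main__":
    # Two lines from the example above; paste in the full list to test the guess.
    sample = [
        "hostXXXXX/cpu_usage/sol_cpu_usage interval=2 N:93:7:0",
        "hostXXXXX/mem_stats/sol_avail_swap interval=2 N:26754864",
    ]
    for end, line in cumulative_offsets(sample):
        print(end, line)
    print(lines_past_limit(sample))
```

If the lines reported by lines_past_limit do not correspond to the RRD 
files that stop updating, the 1024-character theory is probably wrong.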

Now look at the modification times (starting at the 6th column) of the RRD 
files on the Linux server, which receives all the statistics from the 
Solaris box:

./cpu_usage:
-rw-r--r--  1 root root 745344 Jun  4 15:36 sol_cpu_process.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:36 sol_cpu_usage.rrd
./disk_stats_md10:
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_avg_iops.rrd
-rw-r--r--  1 root root 497384 Jun  4 15:35 sol_disk_avg_svctime.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_iops.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_throughput.rrd
./disk_stats_md30:
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_avg_iops.rrd
-rw-r--r--  1 root root 497384 Jun  4 15:35 sol_disk_avg_svctime.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_iops.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:35 sol_disk_throughput.rrd
./disk_stats_md40:
-rw-r--r--  1 root root 745344 Jun  4 15:21 sol_disk_avg_iops.rrd
-rw-r--r--  1 root root 497384 Jun  4 15:21 sol_disk_avg_svctime.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:21 sol_disk_iops.rrd
-rw-r--r--  1 root root 745344 Jun  4 15:21 sol_disk_throughput.rrd
./iface_stats:
-rw-r--r--  1 root root 993304 Jun  4 15:36 sol_iface_stats.rrd
./mem_stats:
-rw-r--r--  1 root root 249424 Jun  4 15:36 sol_avail_swap.rrd
-rw-r--r--  1 root root 249424 Jun  4 15:36 sol_free_list.rrd
-rw-r--r--  1 root root 249424 Jun  4 15:35 sol_scanrate.rrd
-rw-r--r--  1 root root 249424 Jun  4 15:35 sol_swaping.rrd
./network_stats:
-rw-r--r--  1 root root 1241264 Jun  4 15:35 sol_tcp_stats.rrd
-rw-r--r--  1 root root  745344 Jun  4 15:35 sol_udp_stats.rrd

Sometimes it's the RRD files from network_stats that aren't updated, and 
sometimes it's those from disk_stats_md40 (which is currently the case).
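
Spotting which files have stopped updating can be automated instead of 
reading the listing by eye. The following Python sketch walks a directory 
and reports .rrd files whose modification time is older than a threshold. 
The function name, the directory path and the 60-second threshold are 
assumptions for illustration; adjust them to the actual data directory and 
interval.

```python
import os
import time


def stale_rrd_files(root, max_age_seconds, now=None):
    """Return sorted (path, age_in_seconds) for .rrd files older than the limit."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical location of the RRD files for the Solaris host.
    for path, age in stale_rrd_files("/var/lib/collectd/rrd/hostXXXXX", 60):
        print("%8.0f s  %s" % (age, path))
```

Running this periodically would show whether the same plugin instances go 
stale each time or whether the affected set changes.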

If you want the debug log of the collectd daemon or need any other 
details, just ask me.

Thank you very much for your help!

E.
-- 
Eric LeBlanc <eleblanc at taleo.com>
Unix System Administrator
Taleo inc.


