[collectd] 5.4 tcp write_graphite gets stuck in closed_wait

ryanL ryan.landry at gmail.com
Fri Oct 18 21:01:22 CEST 2013


i managed to solve this. i don't know exactly why, but loading only the
snmp, syslog, and write_graphite plugins was the culprit. loading at
least one of the following has stopped the CLOSE_WAIT situation, even
though i am not actually using any of these plugins.

+LoadPlugin syslog
+LoadPlugin cpu
+LoadPlugin interface
+LoadPlugin load
+LoadPlugin memory
+LoadPlugin network

can anyone explain that to me?
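
for context on the state itself (this says nothing about why the plugin
list matters): CLOSE_WAIT on the collectd side means the remote end, here
the graphite collector, has already closed its half of the connection, and
the local process has not yet called close() on its socket. the sketch
below reproduces that on localhost with plain python sockets. it only
illustrates the TCP behaviour and is not collectd's write_graphite code;
peer_has_closed and the local "collector" socket are names made up for the
example.

```python
import socket


def peer_has_closed(sock: socket.socket, timeout: float = 1.0) -> bool:
    """Return True if the remote end has closed its side of the connection.

    A recv() that returns b"" means the peer sent FIN. Until we call
    close() ourselves, the kernel keeps our socket in CLOSE_WAIT.
    MSG_PEEK leaves any pending data in the receive buffer.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except socket.timeout:
        return False  # nothing to read yet, peer has not closed


# hypothetical stand-in for the graphite collector, on an ephemeral local port
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

open_before = peer_has_closed(client, timeout=0.2)  # peer still connected
server_side.close()                                 # the "collector" hangs up
closed_after = peer_has_closed(client)              # client side is now in CLOSE_WAIT

client.close()    # closing our end is what moves the socket out of CLOSE_WAIT
listener.close()
```

a writer that never notices the peer's close and never closes and
reconnects would stay in CLOSE_WAIT indefinitely, which matches the lsof
output quoted below. whether that is what happens inside collectd 5.4 is
an assumption, not something this thread confirms.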

On Thu, Oct 17, 2013 at 5:49 PM, ryanL <ryan.landry at gmail.com> wrote:
> heya. i've compiled 5.4 for linux (centos) at commit 0a161fcfd, and
> seem to be having a problem that does not exist at 5.1.
>
> my collectd is pretty barebones, just doing snmp polling against
> network devices every 60s. when first starting it up, i get an
> established TCP connection to my graphite collector and values get
> written. then, we get stuck. i can see in tcpdump that collectd is
> polling the network and getting values, but can't write to graphite.
>
> i see this:
>
> # while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
> collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
>
> it stays in this state forever until i restart collectd. upon doing so
> i'll get one initial blast of collected data, and then we're jammed
> again.
>
> my relevant collectd config:
>
> <Plugin write_graphite>
>   <Carbon>
>     Host "graphite-collector"
>     Port "2003"
>     Protocol "tcp"
>     Prefix "collectd."
>     StoreRates false
>     AlwaysAppendDS false
>     Postfix ""
>     EscapeCharacter "_"
>   </Carbon>
> </Plugin>
>
> on the collectd 5.1 box, it remains like this:
>
> $ while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
>
> any ideas, or further info i can give you guys?
>
> thanks!
>
> ryan
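
the lsof loop in the quoted message can also be summarised by a small
script instead of being read by eye. a minimal sketch in python, assuming
`lsof -Pn` style output with one socket per line (so not wrapped by a mail
client); tcp_states and the sample text are made up for illustration, with
the two sample lines copied from the output above.

```python
import re
from collections import Counter

# matches "TCP local->remote (STATE)" at the end of an lsof line
STATE_RE = re.compile(r"\bTCP\s+(\S+)->(\S+)\s+\((\w+)\)\s*$")


def tcp_states(lsof_output: str) -> Counter:
    """Count TCP sockets per (remote endpoint, state) pair."""
    states = Counter()
    for line in lsof_output.splitlines():
        match = STATE_RE.search(line)
        if match:
            remote, state = match.group(2), match.group(3)
            states[(remote, state)] += 1
    return states


sample = """\
collectd 7996 produser   10u  IPv4           36198298      0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 5638 root    9u  IPv4          561435650      0t0       TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
"""

counts = tcp_states(sample)
for (remote, state), n in sorted(counts.items()):
    print(f"{remote} {state} {n}")
```

a non-zero CLOSE_WAIT count towards the carbon port that persists across
several runs is the symptom described in this thread.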


