[collectd] 5.4 tcp write_graphite gets stuck in closed_wait
ryanL
ryan.landry at gmail.com
Fri Oct 18 21:01:22 CEST 2013
i managed to solve this. i don't know why, exactly, but loading only
the snmp, syslog, and write_graphite plugins was the culprit. adding
the following (at least one of them) has stopped the CLOSE_WAIT
situation, even though i am not using any of them.
+LoadPlugin syslog
+LoadPlugin cpu
+LoadPlugin interface
+LoadPlugin load
+LoadPlugin memory
+LoadPlugin network
can anyone explain that to me?
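for anyone hitting the same thing: CLOSE_WAIT means the remote end (here the graphite collector) has closed the connection, but the local process has not yet closed its own socket. below is a minimal sketch for tallying socket states from the same kind of lsof output shown in the quoted message. the function name tcp_state_summary is made up for illustration, and it assumes lsof's default column layout, with TCP in the eighth column and the state in parentheses at the end of the line.

```shell
#!/bin/sh
# Hypothetical helper (not part of collectd or lsof): summarize TCP socket
# states from `lsof -Pn` style output read on stdin.
# Assumes the default lsof columns, where field 8 (NODE) is "TCP" and the
# last field is the connection state in parentheses, e.g. "(CLOSE_WAIT)".
# Prints one "STATE count" line per state, sorted by state name.
tcp_state_summary() {
  awk '$8 == "TCP" && $NF ~ /^\(.*\)$/ {
         state = substr($NF, 2, length($NF) - 2)   # strip the parentheses
         count[state]++
       }
       END { for (s in count) print s, count[s] }' | sort
}
```

it can be fed from the lsof invocation in the original report, e.g. `pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | tcp_state_summary`. if the same socket keeps showing up under CLOSE_WAIT on every run, that would suggest the connection is never being closed and reopened on the collectd side.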
On Thu, Oct 17, 2013 at 5:49 PM, ryanL <ryan.landry at gmail.com> wrote:
> heya. i've compiled 5.4 for linux (centos) at commit 0a161fcfd, and
> seem to be having a problem that does not exist at 5.1.
>
> my collectd is pretty barebones, just doing snmp polling against
> network devices every 60s. when first starting it up, i get an
> established TCP connection to my graphite collector and values get
> written. then, we get stuck. i can see in tcpdump that collectd is
> polling the network and getting values, but can't write to graphite.
>
> i see this:
>
> # while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
> collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
> collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
>
> it stays in this state forever until i restart collectd. upon doing so
> i'll get one initial blast of collected data, and then we're jammed
> again.
>
> my relevant collectd config:
>
> <Plugin write_graphite>
>   <Carbon>
>     Host "graphite-collector"
>     Port "2003"
>     Protocol "tcp"
>     Prefix "collectd."
>     StoreRates false
>     AlwaysAppendDS false
>     Postfix ""
>     EscapeCharacter "_"
>   </Carbon>
> </Plugin>
>
> on the collectd 5.1 box, it remains like this:
>
> $ while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
> collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
>
> any ideas, or further info i can give you guys?
>
> thanks!
>
> ryan