[collectd] 5.4 tcp write_graphite gets stuck in closed_wait
Michael Hart
michael.hart at arcticwolf.com
Fri Oct 18 15:00:55 CEST 2013
This is similar to, but not quite the same as, the bug I've got open, which also involves write_graphite and TCP. Mine is that write_graphite doesn't recover if the graphite carbon-cache is restarted (https://github.com/collectd/collectd/issues/430).
I know this isn't the answer, but switching to UDP is a workaround. I don't like the perceived unreliability though, as I'm quite dependent on the metrics being available.
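To illustrate the trade-off behind the UDP workaround: a datagram sender keeps no connection state, so there is nothing to get stuck or to re-establish after the receiver restarts, but it also gets no confirmation that a metric arrived. The sketch below is a minimal Python illustration of that, not the plugin's code. It assumes the receiver accepts Graphite's plaintext line format ("path value timestamp") over UDP; the helper names format_metric and send_udp and the metric path are hypothetical. In the write_graphite config quoted below, the workaround would presumably amount to changing the Protocol setting from "tcp" to "udp".

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Build one line of Graphite's plaintext format: 'path value timestamp'."""
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


def send_udp(host, port, payload):
    """Fire-and-forget send. Returns the number of bytes handed to the kernel.

    A successful return says nothing about delivery: if the receiver is
    down or the datagram is dropped, the sender is not told.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name; host and port are taken from the thread.
    line = format_metric("collectd.example_host.load", 0.5, time.time())
    send_udp("graphite-collector", 2003, line)
```

Because every send is independent, a restart of the receiving side only loses the datagrams sent while it was down; the sender needs no reconnect logic.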
cheers
mike
--
Michael Hart
Arctic Wolf Networks
M: 226.388.4773
On 2013-10-17, at 8:49 PM, ryanL <ryan.landry at gmail.com> wrote:
heya. i've compiled 5.4 for linux (centos) at commit 0a161fcfd, and
seem to be having a problem that does not exist at 5.1.
my collectd is pretty barebones, just doing snmp polling against
network devices every 60s. when first starting it up, i get an
established TCP connection to my graphite collector and values get
written. then, we get stuck. i can see in tcpdump that collectd is
polling the network and getting values, but can't write to graphite.
i see this:
# while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser 10u IPv4 36198298 0t0 TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
it stays in this state forever until i restart collectd. upon doing so
i'll get one initial blast of collected data, and then we're jammed
again.
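For context, CLOSE_WAIT means the remote end (here, the graphite collector) has already closed its side of the connection, while the local process still holds its socket open. The following is a minimal Python sketch of that situation on loopback, showing one way a client can notice the condition. It is not collectd's code, and the names peer_has_closed and demo are hypothetical.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the remote end has closed the connection.

    Polls the socket without blocking, then peeks without consuming data.
    A readable socket whose peek returns no bytes has received the peer's
    FIN; until the local side calls close(), it sits in CLOSE_WAIT.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # connection open, nothing pending
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True  # reset or otherwise broken


def demo():
    """Open a loopback connection, close the server side, and check twice."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    before = peer_has_closed(client)

    server_side.close()  # the peer goes away, as the collector did
    select.select([client], [], [], 2.0)  # wait for the FIN to arrive
    after = peer_has_closed(client)  # client is now in CLOSE_WAIT

    client.close()  # closing the local socket is what leaves CLOSE_WAIT
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

A sender that runs a check like this before each write (or that treats a failed write as a signal) can close the stale socket and reconnect, rather than holding it in CLOSE_WAIT indefinitely.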
my relevant collectd config:
<Plugin write_graphite>
  <Carbon>
    Host "graphite-collector"
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
    StoreRates false
    AlwaysAppendDS false
    Postfix ""
    EscapeCharacter "_"
  </Carbon>
</Plugin>
on the collectd 5.1 box, it remains like this:
$ while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root 9u IPv4 561435650 0t0 TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
any ideas, or further info i can give you guys?
thanks!
ryan
_______________________________________________
collectd mailing list
collectd at verplant.org
http://mailman.verplant.org/listinfo/collectd