[collectd] 5.4 tcp write_graphite gets stuck in closed_wait

Michael Hart michael.hart at arcticwolf.com
Fri Oct 18 15:00:55 CEST 2013


This is similar to, but not quite the same as, the bug I have open, which also involves write_graphite over TCP. In my case, write_graphite doesn't recover if the graphite carbon-cache is restarted (https://github.com/collectd/collectd/issues/430).

I know this isn't the answer, but switching to UDP is a workaround. I don't like the perceived unreliability though, as I'm quite dependent on the metrics being available.
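
To illustrate why UDP sidesteps the stuck-connection problem: Graphite's plaintext protocol is one "path value timestamp" line per metric, and a UDP sender keeps no connection open that could go stale (carbon's UDP listener has to be enabled on the receiving side). Below is a minimal Python sketch of that idea, not collectd's actual code. The function names, the metric path and the loopback stand-in receiver are all made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


def send_metric_udp(host, port, path, value, timestamp=None):
    """Send one metric as a single UDP datagram; nothing stays connected."""
    payload = format_metric(path, value, timestamp)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical stand-in for the collector: a UDP socket on loopback.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    host, port = receiver.getsockname()

    send_metric_udp(host, port, "collectd.example.load", 0.42, 1382101255)
    print(receiver.recv(1024))
    receiver.close()
```

The flip side is the one already mentioned: a lost datagram is simply gone, and the sender never finds out.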

cheers
mike

--
Michael Hart
Arctic Wolf Networks
M: 226.388.4773

On 2013-10-17, at 8:49 PM, ryanL <ryan.landry at gmail.com> wrote:

heya. i've compiled 5.4 for linux (centos) at commit 0a161fcfd, and
seem to be having a problem that does not exist at 5.1.

my collectd is pretty barebones, just doing snmp polling against
network devices every 60s. when first starting it up, i get an
established TCP connection to my graphite collector and values get
written. then, we get stuck. i can see in tcpdump that collectd is
polling the network and getting values, but can't write to graphite.

i see this:

# while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
collectd 7996 produser   10u  IPv4           36198298      0t0  TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser   10u  IPv4           36198298      0t0  TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser   10u  IPv4           36198298      0t0  TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser   10u  IPv4           36198298      0t0  TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)
collectd 7996 produser   10u  IPv4           36198298      0t0  TCP 10.1.12.2:53798->10.101.3.213:2003 (CLOSE_WAIT)

it stays in this state forever until i restart collectd. upon doing so
i'll get one initial blast of collected data, and then we're jammed
again.
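
For context, CLOSE_WAIT means the remote end (here the graphite collector) has already closed its side of the connection, while the local process has not yet closed its own socket. Below is a rough Python sketch of how a client could notice that state before writing, so it can close and reconnect. It is an illustration only, not what write_graphite does, and `peer_has_closed` is a made-up helper name.

```python
import select
import socket


def peer_has_closed(sock, timeout=0.0):
    """Return True if the remote end has closed its side of the connection.

    A socket that is readable but yields b"" on a peek has received the
    peer's FIN. Until the local side calls close(), tools such as lsof
    report that socket as CLOSE_WAIT.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing pending, so the connection looks alive
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True  # a reset also means the connection is unusable


if __name__ == "__main__":
    # Hypothetical stand-in for the collector: a listener on loopback.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    print("closed before hangup:", peer_has_closed(client))

    server_side.close()  # the "collector" hangs up
    print("closed after hangup:", peer_has_closed(client, timeout=2.0))

    client.close()  # closing our end is what takes the socket out of CLOSE_WAIT
    listener.close()
```

A writer that runs a check like this before each flush could close the dead socket and open a fresh connection instead of holding on to it.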

my relevant collectd config:

<Plugin write_graphite>
 <Carbon>
   Host "graphite-collector"
   Port "2003"
   Protocol "tcp"
   Prefix "collectd."
   StoreRates false
   AlwaysAppendDS false
   Postfix ""
   EscapeCharacter "_"
 </Carbon>
</Plugin>

on the collectd 5.1 box, it remains like this:

$ while sleep 1; do pgrep collectd | xargs sudo /usr/sbin/lsof -Pnp | grep TCP; done
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)
collectd 5638 root    9u  IPv4          561435650      0t0  TCP 10.101.3.9:51249->10.101.3.213:2003 (ESTABLISHED)

any ideas, or further info i can give you guys?

thanks!

ryan

_______________________________________________
collectd mailing list
collectd at verplant.org
http://mailman.verplant.org/listinfo/collectd
