[collectd] Max number of UDP sockets per collectd-process (NOT the file descriptor limit)

Teet Talviste teet.talviste at elion.ee
Sun Apr 15 10:51:08 CEST 2012


On Sunday 15 April 2012 00:29:14 David Halko wrote:
> Do you know if your patch has been added to the collectd tree?
No, not yet at least.
> I guess the bottleneck is I/O going to the disk (have you tried
> caching RRD?) or the network connection (have you tried bulkwalk?)
Of course I use rrdcached, but the sheer number of RRD files I have (over 450k
in one installation)...
Network connection speed generally isn't going to be a problem, not for me
at least. One should watch out for devices which are just slow to respond,
though.
One more thing I've observed, and that one should consider, is how the
interval is implemented: [start SNMP requests] -> [done with SNMP] -> [wait
interval seconds] -> [start SNMP again]. So when some device takes a really
long time to return all the data, the actual time between data points is much
larger than the interval.
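The drift described above can be compared with a fixed-rate schedule in a few lines of Python. This is a simplified model, not collectd's scheduler code: the function names and the 50 s poll duration are hypothetical, and each poll's duration is taken as given.

```python
from typing import List, Sequence


def wait_after_poll_starts(poll_durations: Sequence[float], interval: float) -> List[float]:
    """Start times when the loop polls, then sleeps `interval` seconds
    (the behaviour described above): every slow poll delays all later ones."""
    starts, t = [], 0.0
    for duration in poll_durations:
        starts.append(t)
        t += duration + interval
    return starts


def fixed_rate_starts(poll_durations: Sequence[float], interval: float) -> List[float]:
    """Hypothetical alternative: polls start on a fixed grid of `interval`
    seconds; a poll that overruns its slot skips to the next free slot."""
    starts, t = [], 0.0
    for duration in poll_durations:
        starts.append(t)
        finished = t + duration
        t += interval
        while t < finished:  # still busy with this poll: skip grid slots
            t += interval
    return starts


if __name__ == "__main__":
    durations = [50.0, 50.0, 50.0]  # hypothetical: a slow device needing 50 s per poll
    print(wait_after_poll_starts(durations, 300.0))  # [0.0, 350.0, 700.0]
    print(fixed_rate_starts(durations, 300.0))       # [0.0, 300.0, 600.0]
```

With a 300 s interval and a 50 s poll, the first loop spaces samples 350 s apart; the fixed-rate variant keeps 300 s spacing as long as each poll finishes within one interval.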
> 
> Thanks for your insight!
> 
> On 4/14/12, Teet Talviste <teet.talviste at elion.ee> wrote:
> > Depends. If you use mostly SNMP polling with a 5 min interval, the performance
> > impact should be negligible, if any. I use it with 6600+ switches and the
> > bottleneck is still I/O.
> > Timing the threads would be rather difficult. Collectd actually uses a read
> > thread per polled host... So be sure to increase the read-thread setting
> > (ReadThreads) if you have slow SNMP hosts...
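Why slow hosts call for more read threads can be illustrated with a toy model: each host's poll is handed to whichever read thread becomes idle first, and a round ends when the last poll finishes. This is a sketch under that assumption, not collectd's actual scheduler; the function name and the timings are hypothetical.

```python
import heapq
from typing import Sequence


def round_duration(poll_seconds: Sequence[float], read_threads: int) -> float:
    """Time for one polling round when polls are handed, in order, to the
    first idle thread in a pool of `read_threads` threads (toy model)."""
    if read_threads < 1:
        raise ValueError("need at least one read thread")
    free_at = [0.0] * read_threads  # min-heap: time at which each thread goes idle
    for cost in poll_seconds:
        start = heapq.heappop(free_at)  # earliest idle thread takes the next host
        heapq.heappush(free_at, start + cost)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical: 4 slow hosts (60 s each) among 100 fast ones (1 s each).
    hosts = [60.0] * 4 + [1.0] * 100
    for threads in (1, 5, 20):
        print(threads, round_duration(hosts, threads))
```

In this model, each slow host occupies a whole thread for its full poll time, so a small pool lets a few slow hosts delay everyone else; once there are enough threads, a round can never be shorter than the slowest single poll.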
> >> Hi Teet,
> >>
> >> That's a nice little patch!
> >>
> >> What is the performance impact of adding all of those open/close
> >> sessions per device poll?
> >>
> >> Hi Stian,
> >>
> >> Does this work for you, without breaking up the collection into
> >> smaller polling groups?
> >>
> >> Can you "time" the multiple threads and "time" the single unified
> >> thread, so we can see the user/real/sys time of each scenario?
> >>
> >> Thanks - Dave
> >> http://netmgt.blogspot.com/
> >>
> >> On 4/14/12, Teet Talviste <teet.talviste at elion.ee> wrote:
> >> > You can take a look at this, maybe it helps you
> >> >
> >> >
> >> > https://github.com/frogmaster/collectd/commit/67c4863e0aaadaa103ee07e49a17a1510e8d4eaf
> >> >
> >> >> Found this handy note on
> >> >> http://collectd.org/wiki/index.php/Plugin:SNMP
> >> >>
> >> >> "Maximum number of hosts
> >> >> While collectd and the SNMP plugin don't have any limitation on the
> >> >> number of hosts you can configure, the library used by the SNMP
> >> >> plugin, libnetsnmp, uses the select(2) system call. This system call
> >> >> uses a fixed-size bitfield to hold file descriptors. On many systems
> >> >> this limits the number of hosts you can query with the SNMP plugin to
> >> >> 1024 (for example when using the GNU libc).
> >> >>
> >> >> To solve this issue, the netsnmp library must be changed. A solution
> >> >> would be to switch to the poll(2) system call which doesn't have a
> >> >> static limit on the largest file descriptor it can handle."
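The limit described in the wiki note can be reproduced from Python on Linux: `select.select()` wraps select(2) and rejects descriptors numbered FD_SETSIZE (1024 with glibc) or higher, while `select.poll()` wraps poll(2) and accepts them. A sketch, assuming the process may raise its soft RLIMIT_NOFILE above the chosen descriptor number; `HIGH_FD` and the function name are hypothetical.

```python
import os
import resource
import select
from typing import Optional, Tuple

HIGH_FD = 1100  # hypothetical descriptor number, above glibc's FD_SETSIZE of 1024


def select_vs_poll(high_fd: int = HIGH_FD) -> Optional[Tuple[bool, bool]]:
    """Return (select_rejected, poll_accepted) for a pipe reachable through
    descriptor `high_fd`, or None if RLIMIT_NOFILE forbids such a descriptor."""
    need = high_fd + 1
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and hard < need:
        return None
    raised = soft != resource.RLIM_INFINITY and soft < need
    if raised:  # temporarily lift the soft limit so high_fd can exist
        resource.setrlimit(resource.RLIMIT_NOFILE, (need, hard))
    r, w = os.pipe()
    try:
        os.dup2(r, high_fd)  # same pipe, now also reachable as high_fd
        try:
            select.select([high_fd], [], [], 0)  # uses a fixed-size fd_set
            select_rejected = False
        except ValueError:  # "filedescriptor out of range in select()"
            select_rejected = True
        poller = select.poll()  # array of pollfd entries, no fixed bitfield
        poller.register(high_fd, select.POLLIN)
        poller.poll(0)  # returns immediately; the pipe is empty
        poll_accepted = True
    finally:
        for fd in (high_fd, r, w):
            try:
                os.close(fd)
            except OSError:  # high_fd was never created if dup2 failed
                pass
        if raised:
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return select_rejected, poll_accepted


if __name__ == "__main__":
    print(select_vs_poll())
```

In other words, raising `ulimit -n` lets a process open more descriptors, but code that still calls select(2) cannot watch any descriptor numbered 1024 or higher.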
> >> >>
> >> >> So my current workaround of using several collectd processes seems
> >> >> to be a permanent one :-)
> >> >>
> >> >> Brgds
> >> >> Stian Øvrevåge
> >> >>
> >> >> On Thu, Apr 12, 2012 at 11:21, Stian Øvrevåge <sovrevage at gmail.com> wrote:
> >> >> > Hi list,
> >> >> >
> >> >> > Banging my head against the wall for weeks now trying to get a
> >> >> > medium-scale collectd installation working...
> >> >> >
> >> >> > I thought I had fixed the max number of sockets/connections by
> >> >> > tuning /etc/security/limits.conf. Running ulimit -n now gives:
> >> >> >
> >> >> >    ulimit -n
> >> >> >    32768
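Limits in /etc/security/limits.conf are applied by pam_limits to login sessions, so a daemon started from an init script may not inherit the value that `ulimit -n` shows in an interactive shell. One way to check what a running collectd process actually got is to read the "Max open files" row of /proc/<pid>/limits. A minimal sketch, assuming Linux's /proc layout; the function name is hypothetical.

```python
import os
from typing import Optional, Tuple

_ROW = "Max open files"


def max_open_files(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) open-file limits of process `pid` as listed in
    /proc/<pid>/limits; None means "unlimited"."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith(_ROW):
                # Row layout: name, soft limit, hard limit, units ("files").
                soft, hard = line[len(_ROW):].split()[:2]
                return (None if soft == "unlimited" else int(soft),
                        None if hard == "unlimited" else int(hard))
    raise LookupError(f"no {_ROW!r} row in /proc/{pid}/limits")


if __name__ == "__main__":
    # Pass the collectd daemon's PID instead to see what it actually inherited.
    print(max_open_files(os.getpid()))
```

If the daemon's soft value is lower than the 32768 seen in the shell, the limits.conf change did not reach that process.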
> >> >> >
> >> >> > I have three instances of collectd now. One of them is set to poll
> >> >> > 2300 hosts, of which an unknown number are offline at any time. I'm
> >> >> > watching strace as well as netstat, and everything seems fine:
> >> >> > "netstat -anop udp|wc -l" counts the number of UDP sockets created
> >> >> > until the number hits about 1092. Here it stalls, and syslog logs
> >> >> > thousands of lines of
> >> >> >
> >> >> >    "Apr 12 11:07:41 collectd-new collectd[1488]: snmp plugin: host
> >> >> > x.y.z: snmp_sess_synch_response failed:"
> >> >> >
> >> >> > within a few seconds. From then on, the number of UDP sockets is stable.
> >> >> >
> >> >> > If I also start the other two instances, the number of sockets grows
> >> >> > to 1292, which leads me to believe that there is a per-process (or
> >> >> > per-thread?) limit somewhere.
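To test whether the ceiling is per process, UDP sockets can be counted per collectd instance rather than system-wide with netstat. A minimal sketch, assuming Linux's /proc layout: it matches the `socket:[inode]` links in /proc/<pid>/fd against the inode column of /proc/net/udp and /proc/net/udp6. The function names are hypothetical, and a UDP socket that never got a local port (never bound, never used to send) is not listed in those tables.

```python
import os
import re
import socket
import sys
from typing import Set

_SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def udp_socket_inodes() -> Set[str]:
    """Inode numbers of all UDP sockets listed in /proc/net/udp and /proc/net/udp6."""
    inodes: Set[str] = set()
    for table in ("/proc/net/udp", "/proc/net/udp6"):
        try:
            with open(table) as f:
                next(f, None)  # skip the header row
                for line in f:
                    fields = line.split()
                    if len(fields) > 9:
                        inodes.add(fields[9])  # 10th column is the socket inode
        except FileNotFoundError:  # e.g. IPv6 disabled
            continue
    return inodes


def count_udp_sockets(pid: int) -> int:
    """Number of UDP sockets held open by process `pid`."""
    udp = udp_socket_inodes()
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:  # descriptor closed while we were scanning
            continue
        match = _SOCKET_LINK.fullmatch(target)
        if match and match.group(1) in udp:
            count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(count_udp_sockets(int(sys.argv[1])))  # e.g. a collectd PID
    else:
        # Self-check: a bound UDP socket in this process must be counted.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
            probe.bind(("127.0.0.1", 0))
            print(count_udp_sockets(os.getpid()))
```

Running it against each instance's PID shows whether every process stalls near the same per-process count.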
> >> >> >
> >> >> > Information on the internet about the issue is scarce, other than the
> >> >> > file descriptor limit, which I believe is unrelated.
> >> >> >
> >> >> > Regards,
> >> >> > Stian Øvrevåge
> >>
> >> _______________________________________________
> >> collectd mailing list
> >> collectd at verplant.org
> >> http://mailman.verplant.org/listinfo/collectd