[collectd] Max number of UDP sockets per collectd-process (NOT the file descriptor limit)

David Halko davidhalko at gmail.com
Sat Apr 14 23:29:14 CEST 2012


Do you know if your patch has been added to the collectd tree?

I guess the bottleneck is I/O going to the disk (have you tried
caching RRD?) or the network connection (have you tried bulkwalk?)

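For what it's worth, the select(2) limit described in the wiki text quoted
further down applies to the descriptor number itself, not to how many
descriptors are open, which would explain why raising "ulimit -n" alone does
not help. Below is a minimal sketch in Python of waiting on UDP sockets with
poll(2) instead. It is not collectd or libnetsnmp code: the 1024 value is an
assumption taken from the glibc figure in the quote, and the helper names are
made up for illustration.

```python
import select
import socket

# Assumption: 1024 is the usual glibc FD_SETSIZE; Python does not expose it.
ASSUMED_FD_SETSIZE = 1024


def usable_with_select(fd, setsize=ASSUMED_FD_SETSIZE):
    """select(2) can only track descriptors numbered below FD_SETSIZE."""
    return 0 <= fd < setsize


def wait_readable(socks, timeout_ms):
    """Return the sockets that are readable, using poll(2) rather than select(2)."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        # poll(2) takes a list of descriptors, so high fd numbers are fine.
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    ready = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in ready if event & select.POLLIN]
```

With something like this, a descriptor numbered 1024 or higher is handled the
same way as any other, whereas usable_with_select() would report it as out of
range for a select(2)-based loop.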
Thanks for your insight!

On 4/14/12, Teet Talviste <teet.talviste at elion.ee> wrote:
> Depends, if you use mostly snmp polling with 5min interval, the performance
> impact should be negligible, if any. I use it with 6600+ switches and the
> bottleneck is still IO.
> Timing the threads would be rather difficult. Collectd actually uses a read
> thread per host polled... So, be sure to increase the read-thread variable
> if you have slow snmp hosts...
>> Hi Teet,
>>
>> That's a nice little patch!
>>
>> What is the performance impact to adding all of those open/close
>> sessions, per device poll?
>>
>> Hi Stian,
>>
>> Does this work for you, without breaking up the collection into
>> smaller polling groups?
>>
>> Can you "time" the multiple threads and "time" the single unified
>> thread, so we can see the user/real/sys time of each scenario?
>>
>> Thanks - Dave
>> http://netmgt.blogspot.com/
>>
>> On 4/14/12, Teet Talviste <teet.talviste at elion.ee> wrote:
>> > You can take a look at this, maybe it helps you
>> >
>> >
>> > https://github.com/frogmaster/collectd/commit/67c4863e0aaadaa103ee07e49a17a1510e8d4eaf
>> >
>> >> Found this handy anecdote on
>> >> http://collectd.org/wiki/index.php/Plugin:SNMP
>> >>
>> >> "Maximum number of hosts
>> >> While collectd and the SNMP plugin don't have any limitation on the
>> >> number of hosts you can configure, the library used by the SNMP
>> >> plugin, libnetsnmp, uses the select(2) system call. This system call
>> >> uses a fixed-size bitfield to hold file descriptors. On many systems
>> >> this limits the number of hosts you can query with the SNMP plugin to
>> >> 1024 (for example when using the GNU libc).
>> >>
>> >> To solve this issue, the netsnmp library must be changed. A solution
>> >> would be to switch to the poll(2) system call which doesn't have a
>> >> static limit on the largest file descriptor it can handle."
>> >>
>> >> So my current work-around of using several collectd processes seems
>> >> to be a permanent one :-)
>> >>
>> >> Brgds
>> >> Stian Øvrevåge
>> >>
>> >> On Thu, Apr 12, 2012 at 11:21, Stian Øvrevåge <sovrevage at gmail.com> wrote:
>> >> > Hi list,
>> >> >
>> >> > Banging my head against the wall for weeks now trying to get a
>> >> > medium scale collectd-installation working...
>> >> >
>> >> > I thought I had fixed the max number of sockets/connections when
>> >> > tuning /etc/security/limits.conf. It now reads:
>> >> >
>> >> >    ulimit -n
>> >> >    32768
>> >> >
>> >> > I have three instances of collectd now. One of them is set to poll 2300
>> >> > hosts, of which an unknown number are offline at any time. I'm
>> >> > watching strace as well as netstat and everything seems fine, and
>> >> > "netstat -anop udp|wc -l" counts the number of udp sockets created
>> >> > until the number hits about 1092. Here it stalls and syslog logs
>> >> > thousands of lines of
>> >> >
>> >> >    "Apr 12 11:07:41 collectd-new collectd[1488]: snmp plugin: host
>> >> > x.y.z: snmp_sess_synch_response failed:"
>> >> >
>> >> > within a few seconds. The number of UDP sockets is stable from then on.
>> >> >
>> >> > If I also start the other two instances the number of sockets grows
>> >> > to 1292, which leads me to believe that there is a per-process (or
>> >> > per-thread?) limit somewhere.
>> >> >
>> >> > Information on the internet on the issue is scarce, other than on the
>> >> > file descriptor limit, which I believe is unrelated.
>> >> >
>> >> > Regards,
>> >> > Stian Øvrevåge
>>
>> _______________________________________________
>> collectd mailing list
>> collectd at verplant.org
>> http://mailman.verplant.org/listinfo/collectd


