[collectd] network plugin: Missing values when under heavy load

Florian Forster octo at verplant.org
Tue Jan 22 16:05:27 CET 2008


Hello everybody,

I've heard complaints about this once in a while but was never able to
reproduce the problem - until just now: When under very heavy load (>200
RRD-files updated each second) some data points appear to go missing
which results in gaps in the generated graphs.

The reason, at least for me, is that the socket buffer is too small, so
further incoming packets are discarded. A quick and dirty fix for this
is to increase the default buffer size. Under Linux (and potentially
other UNIXes as well) this can be done using sysctl(8):
 # sysctl -w net.core.rmem_max=16777216
 # sysctl -w net.core.rmem_default=1048576
This will increase the maximum allowed buffer size to 16 MByte and the
default buffer size to 1 MByte, which was big enough for me.
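
To check that the new default is actually in effect, one can ask the
kernel which receive buffer size a freshly created UDP socket gets. The
following is only a minimal sketch in C, not code taken from collectd;
the function name `default_udp_rcvbuf' is made up for illustration. On
Linux the reported value should match net.core.rmem_default.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of collectd): returns the receive
 * buffer size in bytes that the kernel assigned to a newly created
 * UDP socket, or -1 on error. */
int default_udp_rcvbuf(void)
{
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;

  int size = 0;
  socklen_t len = sizeof(size);
  /* SO_RCVBUF reports the buffer size currently set for this socket. */
  int status = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);
  close(fd);

  return (status == 0) ? size : -1;
}
```

Calling this function before and after the sysctl commands above and
printing the result should show whether the change took effect.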

The problem is that there is one thread in the network plugin to receive
the data _and_ dispatch it to write plugins. If this thread takes too
long to dispatch values, the buffer fills up.
Since the rrdtool plugin and the unixsock plugin were both loaded when
this problem occurred, I cannot say which one of them is too slow or if
it's the combination of the two. I'll investigate that, though.

I see the following options to solve this problem in a more elegant way
than to fiddle with (global) system parameters. Of course, these options
should not be seen as mutually exclusive:
- Use one or more separate threads in the network plugin to dispatch the
  values to write functions. This would effectively move the buffer from
  the OS to the daemon where it could grow dynamically.
  Alternatively one could think about multiple receive threads which
  dispatch values themselves, or a combination of the two ideas. I think
  this would target the problem at hand best.
- Move that buffer into the global `dispatch' function, i.e. have a
  write thread which reads values from a queue and dispatches them to
  write plugins.
- Use the `SO_RCVBUF' (and/or `SO_RCVBUFFORCE', Linux specific) socket
  option. This should be a user option, of course. This would require
  that the administrator sets the upper bound for this value high
  enough, but I assume that people with this kind of problem are capable
  of that. The Linux version could use `SO_RCVBUFFORCE' to override the
  maximum value without further administrative interaction.
- Improve the speed of the write functions in the rrdtool and/or
  unixsock plugins.
  Possibly this could be done by using some kind of `locked' flag
  instead of locking the entire cache for the entire time of the update.
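
The socket option idea could look roughly like the following. This is a
sketch under assumptions, not a patch against the network plugin: the
function name `request_rcvbuf' and its interface are hypothetical, and
the requested size would come from a configuration option that does not
exist yet. `SO_RCVBUFFORCE' is only tried where the headers define it
and only succeeds for sufficiently privileged processes (CAP_NET_ADMIN
on Linux); otherwise the code falls back to `SO_RCVBUF', which the
kernel caps at the system-wide maximum.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of collectd): asks the kernel for a
 * receive buffer of `wanted' bytes on socket `fd'. Returns the size
 * reported by the kernel afterwards, or -1 on error. */
int request_rcvbuf(int fd, int wanted)
{
  int status = -1;

#ifdef SO_RCVBUFFORCE
  /* Linux specific: may exceed net.core.rmem_max, but requires
   * the CAP_NET_ADMIN capability. */
  status = setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
                      &wanted, sizeof(wanted));
#endif
  if (status != 0)
    /* Portable variant: the request is limited to the maximum the
     * administrator has configured. */
    status = setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                        &wanted, sizeof(wanted));
  if (status != 0)
    return -1;

  /* Read the value back, because the kernel may have adjusted it. */
  int actual = 0;
  socklen_t len = sizeof(actual);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
    return -1;
  return actual;
}
```

Note that Linux doubles the value passed to setsockopt(2) to leave room
for bookkeeping overhead and returns the doubled value from
getsockopt(2), so the result is not expected to equal `wanted' exactly.
Comparing the two is still useful for warning the user when the request
was capped.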

I'd be grateful if someone with the above problem could report briefly
whether the above ``fix'' eliminated the gaps in their case and which
values they chose for the buffer size.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/