[collectd] Processes plugin

Thu Jan 14 02:55:04 CET 2010

Since the 4.9.0 upgrade, I see this popping up on all of my boxes:

Jan 13 20:35:39 server collectd[8501]: rrdtool plugin: rrd_update_r
(/var/lib/collectd/rrd/server/processes-httpd/ps_disk_octets.rrd)
failed: not a simple integer: '-1719325917'

It doesn't happen every collectd interval, more like once every 3-7
intervals, and it always seems to be the ps_disk_octets metric.

Here's a grab of the non-"nan" rows for "processes-httpd/ps_disk_octets.rrd MAX":

1263426420: 8.9397714667e+06 8.8867349333e+06
1263426480: 8.9397714667e+06 8.8867349333e+06
1263426540: 2.8722361756e+06 2.8611358122e+06
1263426600: 2.8722361756e+06 2.8611358122e+06
1263429840: 6.9840988933e+06 6.9629671933e+06
1263429900: 6.9840988933e+06 6.9629671933e+06
1263429960: 4.1813118933e+06 4.1651611100e+06
1263430020: 2.4956140633e+06 2.4853952000e+06
1263432180: 1.5469198067e+07 1.5460964333e+07
1263432240: 3.5285845300e+06 3.5101065467e+06

There are big holes there and the 'nan' rows are about 60% of the
file. The biggest recorded value is 28858203.767. The values reported
in the error messages range from -2147483522 all the way up to -5, so
they all fit inside a signed 32-bit integer. Presumably something's
overflowing :)
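
To illustrate what I suspect is going on, here's a rough sketch,
assuming the per-process I/O byte counter gets squeezed into a signed
32-bit integer somewhere along the way. I haven't checked the collectd
source to confirm this, and the helper names below are made up for the
example. It just shows that the negative numbers from the error
messages are what you'd get from counters slightly above 2^31:

```python
# Illustration only: assumes a byte counter is truncated to 32 bits and
# then read back as a signed value. Helper names are hypothetical.

def as_signed_32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement int."""
    low = value & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low


def as_unsigned_32(value: int) -> int:
    """Recover the unsigned 32-bit pattern behind a (negative) value."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # Values taken from the error messages quoted above.
    for reported in (-1719325917, -2147483522, -5):
        print(reported, "<-", as_unsigned_32(reported))
```

If that guess is right, a counter would only show up negative while its
low 32 bits are at or above 2^31, which would fit with the updates
failing on some intervals but not on others.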

Other background: These are all Debian Etch, running collectd 4.9.0.
They're all 32-bit boxes, all running fairly new Linux kernels, all
with "CONFIG_TASK_IO_ACCOUNTING=y". The example above is from a box
running 2.6.32.3, but I see this happening on other boxes regardless
of the kernel (down to 2.6.29.x and earlier).

The above example is a pretty heavily loaded web server. Though it's
serving *only* read-only web traffic, it does write a good deal of
logs out, so it's not impossible for it to have very high IO numbers.

This isn't a big deal, just a minor annoyance, but I figured I'd mention it.