[collectd] disk_time broken for Linux

Mon May 30 02:20:07 CEST 2011

Hi,

When trying to find out what exactly disk_time is supposed to represent, I
stumbled upon this post.
Because there were no replies, I looked at the code to figure it out
myself.

Thorsten von Eicken wrote:
> But under Linux, the code very clearly calculates the average time
> spent per I/O read and write, which is nice, but which must be
> represented as a GAUGE, not DERIVE. Am I missing something here or is
> it indeed broken?

It seems like it's kind of both:
if (diff_read_ops != 0)
  ds->avg_read_time += (diff_read_time + (diff_read_ops / 2))
      / diff_read_ops;

The part to the right of "+=" calculates the average time per read
operation in the last update interval: it's basically
diff_read_time/diff_read_ops, but rounded to the nearest integer instead
of truncated (fractions of .5 and above round up).
But because of the "+=", this value is *added* to the "global"
avg_read_time counter, so the counter is a running sum of per-interval
averages: a DERIVE and not a GAUGE.
Dunno if this really makes sense, because, unlike the other disk-related
values (disk_octets etc.), elapsed time no longer plays a role in the
values a write plugin receives.
E.g. for disk_octets the write plugin can take one value and its
timestamp (v_1 and t_1) and a later value with its timestamp (v_i and
t_i), not necessarily from the very next update, and calculate the
average number of octets/second as (v_i-v_1)/(t_i-t_1). This kind of
calculation doesn't make any sense for disk_time, because dividing the
difference of its values by the elapsed time doesn't produce a
meaningful value, unless I missed something.
You can only use v_i-v_{i-1} to get the most recent "average time per
read operation" value; t_i and t_{i-1} won't help you.
(v_i-v_{i-j})/j gives the (unweighted) mean of the per-interval averages
over the last j update intervals, but this implies that you know j
(which, unlike the time, isn't saved in the value list).

I hope this made sense and was helpful.

Cheers,
- Daniel