[collectd] [PATCH] lpar plugin: use pool_idle_time to account for cpu pool usage

Sun Sep 26 14:38:40 CEST 2010

Hi Florian,

> > The current implementation uses pool_busy_time (expressed in ns) but
> > experience shows this metric isn't accurate: It shows lower cpu usage
> > for the entire pool than the sum of the participating lpars.
> > Using pool_idle_time (expressed in clock ticks) in contrast is almost
> > a perfect match.
> 
> thanks for the update! :) So what you're saying is that "busy + idle"
> may not be equal to "max"?

Not quite, just that the calculations that the plugin did with the
pool_busy_time parameter did not give the expected result. This may be
because the calculations are somehow wrong or because the parameter
itself accounts for something we are not aware of (maybe power saving,
as you suggested).

I suspect the calculations: pool_busy_time and pool_idle_time are
expressed in different units, yet the calculations applied to them are
the same...

> If so, what happens to the missing CPU
> cycles? Would it make sense to keep track of this separately? Something
> like "missing = max - (idle + busy)" could be used, for example.
> 
> I think I remember something about ticks varying in the time they
> consume, due to power-saving facilities built into the CPUs. This would
> explain why the (physical) CPU time available to the cluster is measured
> in nanoseconds rather than ticks. Also, if there are more and shorter
> ticks in the same wallclock time due to power-saving measures, this
> would explain the perceived lower CPU usage when converting the ns back
> to ticks using a larger "ns per tick" constant. So maybe the "missing"
> metric above could be named "power_save". What do you think?
> 
> Regarding the patch, I'd like to propose one tweak:
> 
> > -#define NS_TO_TICKS(ns) ((ns) / XINTFRAC)
> > [...]
> > +		pool_idle_cpus = (double) (lparstats.pool_idle_time - lparstats_old.pool_idle_time) / XINTFRAC / (double) ticks;
> 
> I'd really like to keep this macro: "diff / XINTFRAC / ticks" doesn't do
> a good job at describing to the reader what's going on. With the macro
> this becomes "NS_TO_TICKS (diff) / ticks": you can see without looking
> at the macro's implementation that "diff" is converted from nanoseconds
> to ticks and then divided by ticks, which results in a ratio.

I totally agree with you. I removed the macro not because I didn't like
it but because I didn't know how to name it properly: according to
libperfstat.h, pool_idle_time is in 'clock ticks' (which differ from
physical processor ticks by a factor of XINTFRAC). I didn't want to
name the macro TICKS_TO_TICKS()...
By the way, this seems to indicate that the calculation I used to
convert ns to processor ticks was wrong, which in turn could explain why
the graphs didn't match.
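
To make the naming question concrete, here is a rough sketch of how the
macro could come back under a different name. It is only an
illustration, not the final patch: the name CLOCKTICKS_TO_TICKS, the
helper pool_idle_cpus() and the fallback value for XINTFRAC are
assumptions of mine. On AIX the real XINTFRAC would be used; the
placeholder is there only so the snippet compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: placeholder value so the sketch builds outside AIX.
 * On AIX, XINTFRAC is expected to be provided by the system headers. */
#ifndef XINTFRAC
# define XINTFRAC 8.0
#endif

/* Hypothetical name: converts libperfstat "clock ticks" to physical
 * processor ticks, using the same division as in the patch. */
#define CLOCKTICKS_TO_TICKS(cticks) ((double) (cticks) / XINTFRAC)

/* Hypothetical helper: idle CPUs in the pool over one interval.
 * idle_now / idle_old are two pool_idle_time samples (clock ticks),
 * ticks is the number of processor ticks elapsed between them. */
static double pool_idle_cpus (uint64_t idle_now, uint64_t idle_old,
		uint64_t ticks)
{
	assert (ticks > 0); /* avoid dividing by zero on an empty interval */
	return CLOCKTICKS_TO_TICKS (idle_now - idle_old) / (double) ticks;
}
```

The arithmetic is unchanged from "diff / XINTFRAC / ticks"; the only
point is that the name states which unit goes in and which comes out.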

Regards,

Aurélien Reynaud