[collectd] [PATCH] New plugin - lpar

Aurélien Reynaud collectd at wattapower.net
Tue Sep 7 11:50:09 CEST 2010


Hi Florian,



I saw how you modified my code, and I indeed find it much more elegant to
directly use the raw counters as they are calculated and reported by the
system.

> Many people expect the CPU usage to be in percent. You can easily
> calculate that in the front-end as
> 
>   percent busy = 100.0 * busy / (busy + idle + <other states>)

However, I beg to differ here. I think what most people want to know is
what fraction of the available processing power is being used at a given
moment. In the standard case a percentage fits perfectly, with 100%
usage meaning "the processor I am considering is fully used". It is,
however, implicit that we are considering the processing power of ONE
processor. This is what the standard cpu plugin already does, and the fact
that this assumption does not hold for an LPAR is the whole raison d'être
of the lpar plugin.

The processing power of an LPAR is neither one processor nor even an
integer number of processors, so a percentage won't do: 30% usage of
2.1 CPUs is not the same as 30% usage of 0.3 CPUs...
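To make this concrete, here is a tiny sketch converting a usage
percentage into an absolute number of CPUs for a given capacity. The
function name is a hypothetical illustration, not part of collectd:

```python
def cpus_used(percent: float, capacity_cpus: float) -> float:
    """Convert a usage percentage of a given processing capacity into CPUs.

    Hypothetical helper for illustration only.
    """
    return capacity_cpus * percent / 100.0


# The same 30% represents very different amounts of processing power:
print(round(cpus_used(30.0, 2.1), 2))  # 0.63
print(round(cpus_used(30.0, 0.3), 2))  # 0.09
```

The percentage alone cannot tell these two cases apart; only the capacity
expressed in CPUs can.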

What I am saying here is that we cannot just compute a ratio from the
raw counters; we need the final result expressed in CPUs. For this we
need one metric whose value we know both in CPUs and in counters, so
that we can compute a ratio in the front-end and scale the graphs
accordingly.

The only metric I can think of is the entitled processor capacity. It can
be reported directly as a gauge, and we can compute the corresponding
number of processor ticks in the plugin.
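A minimal sketch of that plugin-side computation, assuming the
entitlement is known as a (possibly fractional) number of CPUs and that
processor time is accounted in ticks at a fixed rate per CPU-second.
`TICKS_PER_CPU_SECOND` and `entitled_ticks` are hypothetical names, not
taken from the lpar plugin or from AIX; the real tick rate depends on how
the system reports its counters:

```python
# ASSUMPTION: hypothetical tick rate, i.e. how many ticks one fully busy
# CPU accumulates per second. Not a value taken from AIX or the plugin.
TICKS_PER_CPU_SECOND = 100


def entitled_ticks(entitled_cpus: float, elapsed_seconds: float,
                   ticks_per_cpu_second: int = TICKS_PER_CPU_SECOND) -> float:
    """Processor ticks the partition was entitled to over one interval."""
    return entitled_cpus * elapsed_seconds * ticks_per_cpu_second


# An LPAR entitled to 0.3 CPU, sampled every 10 seconds:
print(entitled_ticks(0.3, 10.0))
```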

> 
> It makes sense with your explanation above. I'd track it in the way
> described above, though, i.e. in terms of "processor ticks not available
> to the partition" rather than in its absolute form. Does that make
> sense to you?

As I tried to explain above, if we report the entitlement as a gauge, we
will also need to report it somehow as a counter.

Instead of reporting it directly, we could assume:

entitled = syst + user + wait + idle + unav

... but this is not always true. Because "uncapped" LPARs can consume
more than they are entitled to, this sum cannot be considered constant.
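The problem can be illustrated with a short sketch. The function name is
hypothetical and the tick values are made up purely for illustration:
deriving unav as the remainder of the entitlement only works while the
partition stays within its entitlement.

```python
def unavailable_ticks(entitled: float, syst: float, user: float,
                      wait: float, idle: float) -> float:
    """Derive unav from the identity entitled = syst + user + wait + idle + unav.

    Hypothetical helper; the identity only holds for capped partitions.
    """
    return entitled - (syst + user + wait + idle)


# Capped LPAR: the states never exceed the entitlement, so unav >= 0.
print(unavailable_ticks(300, 100, 80, 20, 60))   # 40
# Uncapped LPAR using spare cycles: the states exceed the entitlement,
# and the "unavailable" remainder becomes negative, which is meaningless.
print(unavailable_ticks(300, 200, 150, 10, 40))  # -100
```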

We could report cpu-entitled directly as a counter (calculated in the
plugin as ent_proc_cap * time_diff, as you already did), but IMHO there
would be little added value: its only justification would be to help
compute a ratio together with the "entitled" gauge. Why not report the
ratio directly, then?
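Reporting the ratio directly could look like the following sketch. It
assumes, as proposed further down, that consumed ticks are the sum of
syst, user, wait and idle over the interval, and that the entitled ticks
were computed as ent_proc_cap * time_diff. The function name is
hypothetical:

```python
def percent_entitled(syst: float, user: float, wait: float, idle: float,
                     entitled_ticks: float) -> float:
    """Consumed processor ticks as a percentage of the entitled ticks.

    ASSUMPTION: consumed = syst + user + wait + idle.
    The result can exceed 100 for an uncapped LPAR using spare cycles.
    """
    if entitled_ticks <= 0:
        raise ValueError("entitled_ticks must be positive")
    consumed = syst + user + wait + idle
    return 100.0 * consumed / entitled_ticks


print(round(percent_entitled(100, 80, 20, 60, 300), 2))   # 86.67 (within entitlement)
print(round(percent_entitled(200, 150, 10, 40, 300), 2))  # 133.33 (uncapped, over entitlement)
```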

What I propose instead is to report percent-entitled and entitled as
gauges. Then we could show in the graphs:

- entitled(CPUs)
- consumed(CPUs) = entitled(CPUs) * percent-entitled / 100

(which in itself is already very useful) and then, assuming

consumed(ticks) = syst(ticks) + user(ticks) + wait(ticks) + idle(ticks)

go into further detail with:

- syst(CPUs) = syst(ticks) * consumed(CPUs) / consumed(ticks)
- user(CPUs) = user(ticks) * consumed(CPUs) / consumed(ticks)
- wait(CPUs) = wait(ticks) * consumed(CPUs) / consumed(ticks)
- idle(CPUs) = idle(ticks) * consumed(CPUs) / consumed(ticks)
- unav(CPUs) = max(0, entitled(CPUs) - consumed(CPUs))
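The front-end computation described above can be sketched as follows. It
assumes the two gauges (entitled, percent-entitled) and the four
per-state tick counts have already been reduced to per-interval values;
`Breakdown` and `breakdown_in_cpus` are hypothetical names, not part of
collectd or its front-ends:

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    """Per-state processor usage expressed in CPUs (hypothetical type)."""
    syst: float
    user: float
    wait: float
    idle: float
    unav: float


def breakdown_in_cpus(entitled_cpus: float, percent_entitled: float,
                      syst: float, user: float, wait: float,
                      idle: float) -> Breakdown:
    """Scale per-state tick counts into CPUs using the two gauges."""
    consumed_cpus = entitled_cpus * percent_entitled / 100.0
    # ASSUMPTION: consumed(ticks) = syst + user + wait + idle.
    consumed_ticks = syst + user + wait + idle
    # CPUs per tick; guard against an interval with no accounted ticks.
    scale = consumed_cpus / consumed_ticks if consumed_ticks else 0.0
    return Breakdown(
        syst=syst * scale,
        user=user * scale,
        wait=wait * scale,
        idle=idle * scale,
        # Only a capped partition below its entitlement leaves capacity unused.
        unav=max(0.0, entitled_cpus - consumed_cpus),
    )


# 0.3 CPU entitled, 80% of it consumed, ticks split 120/60/0/60:
print(breakdown_in_cpus(0.3, 80.0, 120, 60, 0, 60))
```

Because every state is scaled by the same CPUs-per-tick factor, the
stacked graph of syst, user, wait, idle and unav adds up to the
entitlement for a capped LPAR, and exceeds it (with unav at zero) for an
uncapped one.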

I will send you patches against ar/lpar implementing this shortly (I
need some time to code, compile and test on production machines).


Please let me know whether this suits you, and share any suggestions you
may have.


Regards,

Aurélien Reynaud



