[collectd] RFC: Changes to data sources and naming schema

Mon Sep 23 21:42:23 CEST 2013

Hi list,

First of all, a shout-out to octo and Jeremy Katz for making the hackathon
happen: great stuff and a great opportunity to meet you all.

Here are my answers and comments:

1) OK. Fully in favor. I don't think the extra disk space will be much of a
problem, and it will greatly simplify the API.
2) OK. My gut initially said no, but only because I hadn't wrapped my
head around the fact that gauge would still be there and provide all the
necessary information. I struggled to find use cases where raw counter
values would be interesting to have.
3) I am strongly in favor of solution 2, because it is the one that would
allow the most flexible interaction with outputs other than rrdtool
and graphite. Resolving to something resembling a path name is a task that
concerns mostly:
  - the csv output plugin
  - the rrd output plugin
  - the write_graphite output plugin

I think there is a way to make this work for these plugins as well, as
discussed on Saturday.

The proposed way of doing it was to have plugins hint at the way a name
could be constructed. The clear advantage of this approach is that an
internal mangling DSL could use the fields, and it would ease interop with
tools such as riemann, logstash or librato.
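
To make the hinting idea concrete, here is a minimal sketch in Python (not
collectd's C). Everything in it is an assumption for illustration: the
function name `render_path`, the idea that a hint is an ordered list of field
names, and the "_" escape character. The field names are borrowed from octo's
key-value example quoted below, with the metric name shortened to "cpu".

```python
def render_path(identity, hint, sep="/", escape="_"):
    """Build a path-like name from key-value fields.

    identity: dict of string fields identifying a metric.
    hint:     ordered list of field names supplied by the plugin (hypothetical).
    sep:      separator the output wants ("/" for files, "." for graphite).
    escape:   replacement for separator characters inside a field value.
    """
    parts = []
    for key in hint:
        if key in identity:  # optional fields may be absent
            parts.append(identity[key].replace(sep, escape))
    return sep.join(parts)


identity = {
    "source": "example.com",
    "metric": "cpu",
    "cpu-id": "0",
    "cpu-state": "idle",
}
hint = ["source", "metric", "cpu-id", "cpu-state"]

file_style = render_path(identity, hint, sep="/")
graphite_style = render_path(identity, hint, sep=".")
```

Outputs that need a path (csv, rrd, write_graphite) would apply the hint,
while outputs that understand key-value pairs could ignore it and send the
fields as they are.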

Serialisation is another debate :)

Cheers,
  - pyr

On Mon, Sep 23, 2013 at 8:12 PM, Florian Forster <octo at collectd.org> wrote:

> [TLDR: Do you have a use-case for raw counter values?]
>
> Good morning everybody,
>
> we had a great time at the Hackathon [0] in Berlin yesterday. Thanks
> again to everyone!
>
> Amongst the ideas we discussed were some fundamental changes to the way
> metrics are represented. These ideas might eventually result in a
> collectd version 6, but don't hold your breath just yet – no actual coding has
> been done in that direction, we're just collecting design ideas at the
> moment.
>
>
> 1) Get rid of multiple "data sources" per metric.
>
> Some metrics, e.g. the "if_octets" metrics from the "interface" plugin
> and the "load" metric from the "load" plugin have multiple "data
> sources". The "if_octets" metric has data sources "rx" and "tx" for
> received and transmitted bytes.
>
> We would like to remove this functionality altogether. Rather than one
> metric with two values, we would like the "interface" plugin to create
> two metrics with one value each. Since version 5.0 this is mostly how
> metrics are defined and only a few cases are left; now we would like to
> actually remove the functionality. We reached a consensus on this so
> it's essentially a done deal.
>
> Pro:
>
>   * A lot of collectd code becomes a lot easier (fewer bugs)
>   * A lot of front-end and graphing code becomes a lot easier (more
>     and better front-ends)
>   * Mapping of collectd metrics to names used by other systems,
>     e.g. Graphite, is easier / more consistent
>   * Splitting up existing RRD files by "data source" is a solved
>     problem; writing a migration script is fairly simple
>   * A point which causes much confusion for new users is resolved
>
> Contra:
>
>   * Building a backwards compatibility layer for this is going to be
>     hard
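
As an illustration of (1), the split could be sketched like this in Python.
The dictionary layout, the "data_source" key and the helper name
`split_data_sources` are assumptions made for the example and do not reflect
collectd's internal structures; only "if_octets" and its "rx" / "tx" data
sources come from the text above.

```python
def split_data_sources(metric):
    """Turn one metric carrying several named values into one metric per value."""
    # Copy every identifying field except the multi-value payload.
    base = {k: v for k, v in metric.items() if k != "values"}
    return [
        dict(base, data_source=name, value=value)
        for name, value in metric["values"].items()
    ]


# Hypothetical "old style" metric with two data sources.
old_style = {
    "host": "example.com",
    "plugin": "interface",
    "plugin_instance": "eth0",
    "type": "if_octets",
    "values": {"rx": 1024, "tx": 2048},
}

new_style = split_data_sources(old_style)
for m in new_style:
    print(m["type"], m["data_source"], m["value"])
```

Each resulting metric carries exactly one value, which is what lets
front-ends and output plugins drop their per-data-source handling.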
>
>
> 2) Calculate the rate of counters / DERIVEs early on and after that only
>    handle gauge values.
>
> Right now, values come in four flavors: GAUGE and DERIVE, and two more
> special cases which are hardly ever used. These numbers are passed
> through the daemon as they are, i.e.:
>   * The CPU plugin gets a counter of how many ticks / jiffies the CPU
>     has spent in user mode since some unspecified time in the past.
>   * This number is "dispatched" as a DERIVE type value.
>   * The output plugins will write this absolute number.
>
> However, in the case of DERIVE (and COUNTER) values these actual
> absolute numbers are meaningless. In order to do anything meaningful
> with them, the difference between two values (and their respective
> times) is calculated, which results in the averaged _rate_ of change.
> This is what output plugins do if they have an enabled "StoreRates"
> setting. But not only there: Threshold checking, scaling, aggregation;
> all of these operate on the _rate_ rather than the absolute number.
>
> We would like to change the way DERIVEs are handled within collectd:
> Instead of keeping the original absolute values, we would like to
> calculate the rate as early as possible, possibly within the read
> plugins, and only handle the rate from there on.
>
> We only came up with one use case where having the raw counter values is
> beneficial: If you want to calculate the average rate over arbitrary
> time spans, it's easier to look up the raw counter values for those
> points in time and go from there. However, you can also sum up the
> individual rates to reach the same result. Finally, when handling
> counter resets / overflows within this interval, integrating over /
> summing rates is trivial by comparison.
>
> Do you have any other use-case for raw counter values?
>
> Pro:
>
>   * Handling of values becomes easier.
>   * The rate is calculated only once, in contrast to potentially several
>     times, which might be more efficient (currently each rate conversion
>     involves a lookup call).
>   * Together with (1), this removes the need for having the "types.db",
>     which could be removed then. We were in wild agreement that this
>     would be a worthwhile goal.
>
> Contra:
>
>   * Original raw value is lost. It can be reconstructed except for a
>     (more or less) constant offset, though.
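
The arithmetic behind (2) can be sketched as follows. This is a Python
illustration under simplifying assumptions, not collectd code: the sample
data is made up, the function names are hypothetical, and counter resets or
overflows are not handled. It shows that summing rate times interval gives
the same increase as subtracting raw counter values, and that the counter
can be rebuilt from rates up to a constant offset.

```python
def rates(samples):
    """samples: list of (time_in_seconds, counter_value), sorted by time.

    Returns a list of (t_start, t_end, rate_per_second).
    """
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((t0, t1, (v1 - v0) / (t1 - t0)))
    return out


def total_increase(rate_list):
    """Integrate the rates: sum of rate * interval length."""
    return sum(rate * (t1 - t0) for t0, t1, rate in rate_list)


def reconstruct(rate_list, offset=0):
    """Rebuild counter values from rates; the true offset is unknown."""
    values = [offset]
    for t0, t1, rate in rate_list:
        values.append(values[-1] + rate * (t1 - t0))
    return values


# Made-up raw counter readings taken every 10 seconds.
samples = [(0, 1000), (10, 1500), (20, 1800), (30, 2600)]

r = rates(samples)
from_rates = total_increase(r)                 # increase computed from rates
from_raw = samples[-1][1] - samples[0][1]      # increase from raw counters
average_rate = from_rates / (samples[-1][0] - samples[0][0])
rebuilt = reconstruct(r)                       # original values minus 1000
```

Here `from_rates` and `from_raw` agree, and `rebuilt` differs from the
original readings only by the first reading, which is the constant offset
mentioned above.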
>
>
> 3) Changes to the naming schema.
>
> This we discussed the most, and with the most diverse opinions. Currently, collectd has
> a very static naming schema consisting of host, plugin, type and two
> optional fields, "plugin instance" and "type instance". This works well
> in many cases, but has some drawbacks and limitations. For example, the
> Varnish plugin puts the Varnish server and the subcomponent into the
> "plugin instance", which is not ideal.
>
> We discussed two alternatives:
>
>   * Use a path (or, expressed more sciency, an ordered list of strings)
>     to identify metrics. A CPU metric could look like this:
>
>       "/example.com/cpu/0/idle"
>
>   * Use an (unordered) set of key-value pairs to identify metrics. You
>     can think of this as a JSON object that only has string members, if
>     you like. We would likely make at least two fields mandatory, for
>     example "source" (or "host") and "metric" (or "name"). A CPU metric
>     could look like this, for example:
>
>       {
>         "source": "example.com", // required
>         "metric": "cpu usage",   // required
>         "cpu-id": "0",           // optional
>         "cpu-state": "idle"      // optional
>       }
>
> When filtering or aggregating, the first option would require using
> indexes, for example "get metrics where index 0 is 'example.com'". Here,
> "index 0" refers to the "source". The second alternative would allow us
> to use names (rather than indexes) to refer to a specific part of the
> name, e.g. "get metrics where 'source' is 'example.com'".
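
The difference between the two lookups can be sketched in Python. The data
and the helper names `filter_by_index` and `filter_by_key` are hypothetical,
and the second host "other.example.org" is invented for the example; the
fields mirror the CPU metric shown above.

```python
# Alternative 1: a metric identity is an ordered list of strings.
path_metrics = [
    ["example.com", "cpu", "0", "idle"],
    ["example.com", "cpu", "1", "idle"],
    ["other.example.org", "cpu", "0", "idle"],
]

# Alternative 2: a metric identity is an unordered set of key-value pairs.
kv_metrics = [
    {"source": "example.com", "metric": "cpu usage",
     "cpu-id": "0", "cpu-state": "idle"},
    {"source": "example.com", "metric": "cpu usage",
     "cpu-id": "1", "cpu-state": "idle"},
    {"source": "other.example.org", "metric": "cpu usage",
     "cpu-id": "0", "cpu-state": "idle"},
]


def filter_by_index(metrics, index, wanted):
    """'get metrics where index N is X': the caller must know the position."""
    return [m for m in metrics if len(m) > index and m[index] == wanted]


def filter_by_key(metrics, key, wanted):
    """'get metrics where KEY is X': the caller names the field directly."""
    return [m for m in metrics if m.get(key) == wanted]


by_index = filter_by_index(path_metrics, 0, "example.com")
by_key = filter_by_key(kv_metrics, "source", "example.com")
```

Both calls select the same two metrics, but the index-based version only
works as long as every plugin puts the source in position 0.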
>
> I'd love to hear what people think about the entire topic of naming.
> There are good reasons for either schema and there are also good reasons
> for staying with the current concept and living with its flaws. Which
> schema would meet your needs best and why? What are those needs?
>
> Best regards,
> —octo
>
>
> [0] <https://collectd.org/wiki/index.php/Hackathon_2013_Berlin>
> --
> collectd – The system statistics collection daemon
> Website: http://collectd.org
> Google+: http://collectd.org/+
> GitHub:  https://github.com/collectd
> Twitter: http://twitter.com/collectd