[collectd] Version 6: update and call for help

Fri Sep 25 20:11:40 CEST 2020

Hi all,

I have made some progress towards a new major version, collectd 6, and wanted
to give an update. It is now at the point where individual plugins need to be
migrated, and I'm relying on community contributions for this.

The new major version has two fundamental changes from version 5, breaking
backwards compatibility:

1.  Metrics are identified by a name and a set of labels, instead of the
    host/plugin/type schema used by version 5.

    For example, the version 5 metric `example.com/cpu-0/percent-user` of the
    "cpu" plugin will be reported as:
    `cpu_usage_percent{instance="example.com",cpu="0",state="user"}`

2.  Support for distribution metrics has been added by a team of Google interns
    this summer. Distribution metrics make it possible to aggregate high-variance
    scalar metrics, such as response latency, with high precision. They are
    often used to measure latency-based Service Level Indicators (SLIs).

    This code will be merged into the "collectd-6.0" branch once all remaining
    pull requests of this project have been merged.
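
For readers new to distribution metrics: the general idea is to count
observations in buckets instead of keeping every sample, so that percentiles
can still be estimated afterwards. Below is a minimal, self-contained sketch in
C, assuming fixed, exponentially growing bucket boundaries. `distribution_t`
and the `dist_*` functions are hypothetical names for illustration only; they
are not the API from the "google-interns-2020" branch.

```c
#include <math.h>
#include <stddef.h>

#define NUM_BUCKETS 8

/* Hypothetical distribution type (not collectd's): bucket i counts the
 * observations <= bounds[i]; the last bucket is unbounded (+Inf). */
typedef struct {
  double bounds[NUM_BUCKETS - 1];
  unsigned long counts[NUM_BUCKETS];
  double sum;
  unsigned long total;
} distribution_t;

/* Sets bucket upper bounds to base, base*factor, base*factor^2, ... */
static void dist_init_exponential(distribution_t *d, double base,
                                  double factor) {
  double bound = base;
  for (size_t i = 0; i < NUM_BUCKETS - 1; i++) {
    d->bounds[i] = bound;
    bound *= factor;
  }
  for (size_t i = 0; i < NUM_BUCKETS; i++)
    d->counts[i] = 0;
  d->sum = 0.0;
  d->total = 0;
}

/* Records one observation in the first bucket whose bound is >= value. */
static void dist_update(distribution_t *d, double value) {
  size_t i = 0;
  while (i < NUM_BUCKETS - 1 && value > d->bounds[i])
    i++;
  d->counts[i]++;
  d->sum += value;
  d->total++;
}

/* Returns the upper bound of the bucket containing the q-quantile
 * (0 < q <= 1), i.e. a conservative estimate; NAN if empty. */
static double dist_quantile_upper_bound(const distribution_t *d, double q) {
  if (d->total == 0)
    return NAN;
  double want = q * (double)d->total;
  unsigned long target = (unsigned long)want;
  if ((double)target < want)
    target++; /* round up to a whole number of observations */
  if (target == 0)
    target = 1;
  unsigned long seen = 0;
  for (size_t i = 0; i < NUM_BUCKETS; i++) {
    seen += d->counts[i];
    if (seen >= target)
      return i < NUM_BUCKETS - 1 ? d->bounds[i] : INFINITY;
  }
  return INFINITY; /* only reached for q > 1 */
}
```

With bounds 1, 2, 4, up to 64 (say, milliseconds), the samples 0.5, 1.5, 3, 3,
5, 7, 9, 20, 30 and 100 give a median estimate of at most 8 and a 90th
percentile of at most 32. Exponential buckets keep the relative error per
bucket bounded across several orders of magnitude, which is why they suit
high-variance values such as latency.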

The code for 1) is in the "collectd-6.0" branch; the code for 2) is in the
"google-interns-2020" branch. I will merge the distribution metrics into the
"collectd-6.0" branch once the remaining pull requests are merged, hopefully
around the end of September.

## Call to action

The next step is to migrate the ~170 plugins to the new API so that they can
make use of the new features. This is a lot of work and I have to rely on the
community to make meaningful progress here. There is a GitHub project (similar
to a Kanban board) at [0], which has a tracking bug for each plugin. There is
also a spreadsheet giving an overview at [1].

For "read plugins", this mostly means choosing good metric names and labels.
Some plugins, for example, encode multiple pieces of information into a single
"instance" field, which should be split into multiple labels. Ideally, the
metrics produced by a collectd plugin would be identical to those produced by
the corresponding Prometheus exporter, if one exists. You can use the "cpu"
plugin as a reference.
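
As an illustration of splitting a combined "instance" field into labels, here
is a small C sketch. The instance format "sda-read", the metric name
`example_io_bytes_total`, and the helpers `split_instance()` and
`format_identity()` are all hypothetical; a migrated plugin would presumably
set labels through the metric API in the "collectd-6.0" branch (see the "cpu"
plugin) rather than formatting strings itself.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits an instance such as "sda-read" at the first
 * '-' into two label values. Returns 0 on success, -1 if there is no
 * separator or a buffer is too small. */
static int split_instance(const char *instance, char *first,
                          size_t first_size, char *second,
                          size_t second_size) {
  const char *sep = strchr(instance, '-');
  if (sep == NULL)
    return -1;
  size_t first_len = (size_t)(sep - instance);
  size_t second_len = strlen(sep + 1);
  if (first_len + 1 > first_size || second_len + 1 > second_size)
    return -1;
  memcpy(first, instance, first_len);
  first[first_len] = '\0';
  memcpy(second, sep + 1, second_len + 1); /* includes the '\0' */
  return 0;
}

/* Hypothetical helper: formats name{key="value",...} in the style of the
 * Prometheus text format. Label values are NOT escaped here. */
static int format_identity(char *buf, size_t size, const char *name,
                           const char *keys[], const char *values[],
                           size_t n) {
  int status = snprintf(buf, size, "%s{", name);
  if (status < 0 || (size_t)status >= size)
    return -1;
  size_t off = (size_t)status;
  for (size_t i = 0; i < n; i++) {
    status = snprintf(buf + off, size - off, "%s%s=\"%s\"",
                      i > 0 ? "," : "", keys[i], values[i]);
    if (status < 0 || (size_t)status >= size - off)
      return -1;
    off += (size_t)status;
  }
  status = snprintf(buf + off, size - off, "}");
  if (status < 0 || (size_t)status >= size - off)
    return -1;
  return 0;
}
```

Splitting "sda-read" yields the label values "sda" and "read", which can then
become separate `device` and `direction` labels. Feeding the "cpu" example from
above into `format_identity()` with the labels `instance`, `cpu` and `state`
produces `cpu_usage_percent{instance="example.com",cpu="0",state="user"}`.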

For "write plugins", this means changing the write callback to accept a
`metric_family_t*` instead of a `data_set_t*` and a `value_list_t*`, and
adding support for the new distribution metrics. The distinction between
DERIVE and COUNTER no longer exists, so code handling it can be removed. You
can use the "write_stackdriver" plugin as a reference.
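
To make the shape of the new write callback a bit more concrete, here is a
simplified, self-contained sketch. The `ex_*` types are hypothetical stand-ins
loosely modeled on `metric_family_t`; the real structures and the exact
callback signature live in the "collectd-6.0" branch, and "write_stackdriver"
remains the reference. What the sketch tries to show is the structural change:
the callback receives a whole metric family (one name, one type, many label
sets) and switches on the type, with a single counter type in place of DERIVE
and COUNTER.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; NOT collectd's real definitions. */
typedef enum { EX_GAUGE, EX_COUNTER, EX_DISTRIBUTION } ex_type_t;

typedef struct {
  const char *labels; /* pre-formatted label set, e.g. "cpu=\"0\"" */
  union {
    double gauge;
    unsigned long long counter; /* one counter type, no DERIVE/COUNTER */
    struct {
      unsigned long count;
      double sum;
    } distribution;
  } value;
} ex_metric_t;

typedef struct {
  const char *name;
  ex_type_t type; /* all metrics in a family share name and type */
  const ex_metric_t *metrics;
  size_t num;
} ex_metric_family_t;

/* Hypothetical user data: an in-memory text buffer. */
typedef struct {
  char buf[512];
  size_t len;
} ex_sink_t;

/* Write callback sketch: called once per metric family. */
static int ex_write(const ex_metric_family_t *fam, void *user_data) {
  ex_sink_t *sink = user_data;
  for (size_t i = 0; i < fam->num; i++) {
    const ex_metric_t *m = &fam->metrics[i];
    char *out = sink->buf + sink->len;
    size_t avail = sizeof(sink->buf) - sink->len;
    int n;
    switch (fam->type) {
    case EX_GAUGE:
      n = snprintf(out, avail, "%s{%s} %g\n", fam->name, m->labels,
                   m->value.gauge);
      break;
    case EX_COUNTER:
      n = snprintf(out, avail, "%s{%s} %llu\n", fam->name, m->labels,
                   m->value.counter);
      break;
    case EX_DISTRIBUTION:
      n = snprintf(out, avail, "%s_count{%s} %lu\n%s_sum{%s} %g\n",
                   fam->name, m->labels, m->value.distribution.count,
                   fam->name, m->labels, m->value.distribution.sum);
      break;
    default:
      return -1;
    }
    if (n < 0 || (size_t)n >= avail) {
      *out = '\0'; /* drop the truncated line */
      return -1;
    }
    sink->len += (size_t)n;
  }
  return 0;
}
```

The output loosely follows the Prometheus text format, emitting `_count` and
`_sum` lines for distributions; a real write plugin would most likely export
the bucket counts as well.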

I think it's likely that not all plugins will be migrated by the time we want
to release version 6 (around the end of the year). Dropping some unmaintained
plugins along the way is probably a good thing overall.

Long story short: if you care about a specific plugin and want to see it in
6.0, please set aside some time in Q4 to help migrate it.

Best regards and happy hacking :)
—octo

[0] https://github.com/orgs/collectd/projects/1
[1] https://docs.google.com/spreadsheets/d/1ss4NJeJ00CwAmGgIRGQ2EJM9LFifPPrJcq_c0tGN_vY/edit
-- 
collectd – The system statistics collection daemon
Website: https://collectd.org/
GitHub:  https://github.com/collectd
Twitter: https://twitter.com/collectd