[collectd] On rrdtool CacheTimeout

Trent W. Buck twb-mailman-collectd at cyber.com.au
Thu Jul 7 06:15:52 CEST 2011


I rolled out collectd 4.8, running in each of about 20 LXC jails.  The
I/O load was crippling the server, so I reduced the polling rate by
setting "Interval 60" in each jail.

However, I would prefer to:

 - poll every 10s (the default);

 - batch write RRDs, such that any given RRD is only written once
   every ten minutes; and

 - distribute these writes more or less evenly over time, i.e. avoid a
   huge I/O spike every tenth minute.

IIUC, that means I should use this config:

    Interval 10
    <Plugin rrdtool>
      CacheTimeout  600
      RandomTimeout 300
    </Plugin>
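
A rough model of the intended effect, as a minimal Python sketch. It
assumes (not checked against the collectd source) that each RRD file's
flush deadline is CacheTimeout plus a uniform random offset in
[-RandomTimeout, +RandomTimeout], re-drawn after every flush, and that
all caches start empty at the same moment. N_FILES is a hypothetical
figure, not a count from these jails, and the model ignores the
10-second Interval granularity.

```python
import random
from collections import Counter

# Hypothetical model of the rrdtool plugin's write cache. ASSUMPTION: after
# each flush, a file's next deadline is drawn uniformly from
# [CacheTimeout - RandomTimeout, CacheTimeout + RandomTimeout].
CACHE_TIMEOUT = 600   # seconds, from the config above
RANDOM_TIMEOUT = 300  # seconds, from the config above
N_FILES = 2000        # hypothetical number of RRD files across all jails
DURATION = 3600       # simulate one hour
BUCKET = 60           # count flushes per minute


def simulate(cache_timeout, random_timeout, n_files, duration, seed=0):
    """Return a Counter mapping minute index -> number of RRD flushes."""
    rng = random.Random(seed)
    writes = Counter()
    for _ in range(n_files):
        # Every cache starts filling at t=0 (e.g. right after startup).
        t = 0.0
        while True:
            t += rng.uniform(cache_timeout - random_timeout,
                             cache_timeout + random_timeout)
            if t >= duration:
                break
            writes[int(t // BUCKET)] += 1
    return writes


if __name__ == "__main__":
    for rt in (0, RANDOM_TIMEOUT):
        counts = simulate(CACHE_TIMEOUT, rt, N_FILES, DURATION)
        print(f"RandomTimeout={rt}: peak {max(counts.values())} flushes/min "
              f"over {len(counts)} busy minutes")
```

With RandomTimeout 0 the model puts every flush into minutes 10, 20, 30,
40 and 50, i.e. the spike described above; with RandomTimeout 300 the
first flushes already spread over minutes 5 to 14.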

But this doesn't seem to be improving matters.  Here is iostat output
for dm-25 (the device holding the filesystem mounted at
/var/lib/collectd), sampled every ten seconds, before the change:

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    dm-25           163.70        16.80      1292.80        168      12928
    dm-25            88.90         4.00       707.20         40       7072
    dm-25           279.20         8.80      2224.80         88      22248
    dm-25           302.40         8.80      2410.40         88      24104
    dm-25           122.40         7.20       972.00         72       9720
    dm-25           386.90         7.20      3088.00         72      30880

After the change:

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    dm-25            33.20        11.20       254.40        112       2544
    dm-25           292.10        12.00      2324.80        120      23248
    dm-25           297.50        15.20      2364.80        152      23648
    dm-25            41.90         8.80       326.40         88       3264
    dm-25           428.30         4.00      3422.40         40      34224
    dm-25           283.60         4.80      2264.00         48      22640
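
For a rough comparison, the two tables can be averaged. This Python
sketch only re-types the numbers above; the one outside fact it uses is
that iostat's "Blk" unit is a 512-byte sector. Six ten-second samples
cover one minute, far shorter than the 600-second cache cycle, so the
means are indicative at best.

```python
from statistics import mean

# iostat samples copied from the tables above: (tps, Blk_wrtn/s)
BEFORE = [
    (163.70, 1292.80), (88.90, 707.20), (279.20, 2224.80),
    (302.40, 2410.40), (122.40, 972.00), (386.90, 3088.00),
]
AFTER = [
    (33.20, 254.40), (292.10, 2324.80), (297.50, 2364.80),
    (41.90, 326.40), (428.30, 3422.40), (283.60, 2264.00),
]
SECTOR_BYTES = 512  # iostat "Blk" = one 512-byte sector


def summarize(samples):
    """Return (mean tps, mean Blk_wrtn/s, mean write rate in KiB/s)."""
    tps = mean(s[0] for s in samples)
    blk_wrtn = mean(s[1] for s in samples)
    return tps, blk_wrtn, blk_wrtn * SECTOR_BYTES / 1024


for label, samples in (("before", BEFORE), ("after", AFTER)):
    tps, blk, kib = summarize(samples)
    print(f"{label:6s} tps={tps:7.2f} write={blk:8.2f} blk/s ({kib:6.1f} KiB/s)")
```

The means come out at about 223.9 tps and 1782.5 blk/s (roughly
891 KiB/s) before, versus 229.4 tps and 1826.1 blk/s (roughly 913 KiB/s)
after, i.e. essentially unchanged over these samples.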

Is my configuration wrong, or am I just expecting too much of buffered
RRD writes?