[collectd] Finding corrupted RRD files (was: Collect scalability)

Florian Forster octo at verplant.org
Thu Jan 8 09:37:46 CET 2009


Hi,

On Wed, Jan 07, 2009 at 01:31:41PM -0600, Jason wrote:
> We have recently been experiencing crashes which, though I have not
> yet thoroughly investigated, may be similar to this.  Do you have a
> script handy which would validate rrd files in mass and determine
> which ones may be the offenders?

I looked at the logs to find out the precise time when collectd started
crashing. I then searched for a file that had not been updated since
about that time.
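
A minimal sketch of that search, assuming a POSIX shell and find(1). The
function name `list_not_updated_since' and the reference-file trick are
only illustrative; the timestamp has to come from your own logs:

```shell
# Hypothetical helper, not part of collectd or rrdtool.
# Usage: list_not_updated_since DIR STAMP
# STAMP uses the `touch -t' format, [[CC]YY]MMDDhhmm.
list_not_updated_since() {
  # Create a reference file carrying the time of the first crash.
  ref=$(mktemp) || return 1
  touch -t "$2" "$ref"
  # Print RRD files whose modification time is not newer than the reference.
  find "$1" -name '*.rrd' ! -newer "$ref"
  rm -f "$ref"
}
```

For example, `list_not_updated_since /var/lib/collectd/rrd 200901040300'
lists the files last written at or before Jan 4, 03:00 (a made-up time).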

On Thu, Jan 08, 2009 at 02:32:49PM +1100, Lindsay Holmwood wrote:
> Running file against corrupted RRDs has worked for me in the past.
> Valid RRDs will show up with "RRDTool DB version 0003", while
> corrupted ones will show "data".

I didn't try that, but `rrdtool info' did suffer from the same problem. If
you want to be sure, `rrdtool fetch' is probably the most thorough way
to test an RRD file's integrity.

So what I did was something like:

  # Try to read each candidate file; print the names of those that fail.
  find /var/lib/collectd/rrd -name '*.rrd' -mtime 4 | while IFS= read -r FILE
  do
    rrdtool fetch "$FILE" AVERAGE >/dev/null || echo "$FILE"
  done

`-mtime 4' only narrows down the files to be examined: it matches files
last modified between four and five days ago, `4' being the value I
learned from reading the logs. If you don't know this value, you should
at least exclude all files that have been modified during the last
`2 * $CacheTimeout' seconds - hopefully that'll be the vast majority of
files.
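
A rough sketch of that fallback, assuming a find(1) that supports
`-mmin' (GNU and BSD do; it is not in POSIX). The function names are
made up for illustration, and the CacheTimeout value must be taken from
your own collectd.conf:

```shell
# Hypothetical helpers, not part of collectd or rrdtool.

# Convert 2 * CacheTimeout (seconds) into whole minutes, rounded up.
stale_minutes() {
  echo $(( ($1 * 2 + 59) / 60 ))
}

# Usage: list_stale_rrds DIR CACHE_TIMEOUT
# Print RRD files last modified more than 2 * CacheTimeout ago.
list_stale_rrds() {
  find "$1" -name '*.rrd' -mmin +"$(stale_minutes "$2")"
}

# Usage: check_stale_rrds DIR CACHE_TIMEOUT
# Run `rrdtool fetch' on each stale file; print the ones that fail.
check_stale_rrds() {
  list_stale_rrds "$1" "$2" | while IFS= read -r FILE
  do
    rrdtool fetch "$FILE" AVERAGE >/dev/null 2>&1 || echo "$FILE"
  done
}
```

With a CacheTimeout of 120 seconds (an assumed value), this would be
called as `check_stale_rrds /var/lib/collectd/rrd 120'.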

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/