[collectd] Detecting drive failure with collectd and the SMART plugin?
Andrew J. Leer
helpdeskaleer at gmail.com
Tue Dec 5 16:43:00 CET 2017
I set up an InfluxDB instance that receives SMART data for several drives
on a machine running collectd and its smart plugin.
I started looking at the influxdb database to see what measurements /
series (SMART attributes) showed up...
I narrowed it down to the following for each drive on each host:
airflow-temperature-celsius
command-timeout
current-pending-sector
end-to-end-error
hardware-ecc-recovered
head-flying-hours
high-fly-writes
offline-uncorrectable
power-cycle-count
power-on-hours
raw-read-error-rate
reallocated-sector-count
reported-uncorrect
runtime-bad-block-total
seek-error-rate
spin-retry-count
spin-up-time
start-stop-count
temperature-celsius-2
total-lbas-read
total-lbas-written
udma-crc-error-count
I also found a list of critical SMART attributes on Wikipedia
<https://en.wikipedia.org/wiki/S.M.A.R.T.> (the table can be sorted to show
them), but I'm uncertain how they relate to what I'm receiving from collectd.
I'd like to be able to detect when a drive is about to go bad, and
Wikipedia says that I have about a 50% chance of detecting it if I use
such attributes to graph changes in something like Grafana.
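To make concrete what I mean by watching for changes, here is a minimal sketch of the kind of check I have in mind. It is not tied to any InfluxDB client: it assumes the raw values for one drive have already been pulled into plain Python lists, oldest first. The watch list, the function name rising_attributes, and the sample numbers are all my own assumptions for illustration, not something taken from collectd or Wikipedia.

```python
# Hypothetical watch list: the names come from the series listed above, but
# which of them actually predict failure on a given drive is an assumption.
WATCHED = (
    "reallocated-sector-count",
    "current-pending-sector",
    "offline-uncorrectable",
    "reported-uncorrect",
    "command-timeout",
)


def rising_attributes(samples, watched=WATCHED):
    """Return watched attributes whose raw value rose from first to last sample.

    samples: dict mapping attribute name -> list of raw values, oldest first.
    """
    rising = []
    for name in watched:
        values = samples.get(name, [])
        # Need at least two samples to talk about a change at all.
        if len(values) >= 2 and values[-1] > values[0]:
            rising.append(name)
    return rising


# Made-up sample data for one drive, purely for illustration.
example = {
    "reallocated-sector-count": [0, 0, 8],
    "current-pending-sector": [0, 0, 0],
    "power-on-hours": [100, 124, 148],  # rises normally, so it is not watched
}
print(rising_attributes(example))
```

Counters such as power-on-hours rise on every healthy drive, which is why the sketch only looks at an explicit watch list rather than at every series.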
Unfortunately, I'm also uncertain which attributes might be nonsense (on
my drives) and which ones are real, and where to find such information.
There also doesn't seem to be much documentation about how the attributes
in InfluxDB relate to those listed on Wikipedia (though I would think it has
something to do with the source code for the plugin and the identifier
listed on Wikipedia).
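My working guess at the mapping is sketched below. It assumes the series names are the attribute names assigned by libatasmart (the library the smart plugin is built on) and that the drives follow the commonly documented ID assignments. Vendors reuse IDs for different things, so every entry should be treated as unverified until checked against the drive's own SMART output; the names SMART_IDS and smart_id are mine.

```python
# Assumed mapping from the collectd series name to the decimal SMART ID used
# in the Wikipedia table. Unverified for any particular drive model.
SMART_IDS = {
    "raw-read-error-rate": 1,
    "spin-up-time": 3,
    "start-stop-count": 4,
    "reallocated-sector-count": 5,
    "seek-error-rate": 7,
    "power-on-hours": 9,
    "spin-retry-count": 10,
    "power-cycle-count": 12,
    "runtime-bad-block-total": 183,
    "end-to-end-error": 184,
    "reported-uncorrect": 187,
    "command-timeout": 188,
    "high-fly-writes": 189,
    "airflow-temperature-celsius": 190,
    "temperature-celsius-2": 194,
    "hardware-ecc-recovered": 195,
    "current-pending-sector": 197,
    "offline-uncorrectable": 198,
    "udma-crc-error-count": 199,
    "head-flying-hours": 240,
    "total-lbas-written": 241,
    "total-lbas-read": 242,
}


def smart_id(series_name):
    """Return the assumed SMART ID for a series name, or None if unknown."""
    return SMART_IDS.get(series_name)


print(smart_id("reallocated-sector-count"))
print(smart_id("not-a-real-series"))
```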
Please let me know if you know anything about where to start on this.
--
Thank you,
Andrew J. Leer
Git Hub: http://bit.ly/aleer_github
Stack Exchange: http://bit.ly/aleer_stk_exch
Linked-In: http://bit.ly/2d5D1DF