[collectd] Detecting drive failure with collectd and the SMART plugin?

Andrew J. Leer helpdeskaleer at gmail.com
Tue Dec 5 16:43:00 CET 2017


I set up an influxdb that receives input from several drives on a machine
running collectd and its smart plugin.

I started looking at the influxdb database to see what measurements / 
series (SMART attributes) showed up...

I narrowed it down to the following for each drive on each host:

airflow-temperature-celsius
command-timeout
current-pending-sector
end-to-end-error
hardware-ecc-recovered
head-flying-hours
high-fly-writes
offline-uncorrectable
power-cycle-count
power-on-hours
raw-read-error-rate
reallocated-sector-count
reported-uncorrect
runtime-bad-block-total
seek-error-rate
spin-retry-count
spin-up-time
start-stop-count
temperature-celsius-2
total-lbas-read
total-lbas-written
udma-crc-error-count

I also found a list of Critical SMART attributes on Wikipedia 
<https://en.wikipedia.org/wiki/S.M.A.R.T.> (you can sort them as such), 
but I'm uncertain of how they relate to what I'm receiving from collectd.

I'd like to be able to detect when a drive is about to go bad, and
Wikipedia says that I have about a 50% chance of detecting it if I use
such attributes to graph changes in something like grafana.
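
To make "graph changes" concrete, here is a minimal sketch of flagging every point at which a counter-style attribute grows. It assumes the samples for one series (for example reallocated-sector-count) have already been fetched from influxdb as (timestamp, value) pairs and that the stored values are raw counts; the function name and the sample data are hypothetical, and no influxdb client calls are shown.

```python
def counter_increases(samples):
    """Return (timestamp, delta) for each sample that rose above the previous one.

    `samples` is an iterable of (timestamp, value) pairs in time order.
    Assumption: values are raw counts that should stay flat on a healthy drive.
    """
    events = []
    previous = None
    for timestamp, value in samples:
        # Only a rise relative to the preceding sample counts as an event.
        if previous is not None and value > previous:
            events.append((timestamp, value - previous))
        previous = value
    return events


# Hypothetical readings for one drive: flat, then two jumps.
readings = [("t0", 0), ("t1", 0), ("t2", 8), ("t3", 8), ("t4", 24)]
events = counter_increases(readings)
```

With the hypothetical readings above, `events` holds one entry for the jump at "t2" and one for the jump at "t4". Any increase at all is treated as worth a look here; a real alert rule would probably want a per-attribute threshold.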

Unfortunately, I'm also unsure which attributes are meaningless on my
drives and which ones are real, and where to find that information.
There also doesn't seem to be much documentation about how the
attribute names in influxdb relate to those listed on Wikipedia (though
I would guess it has something to do with the source code for the
plugin and the attribute ID listed on Wikipedia).
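
One way to line the two up is a lookup table from series name to attribute ID. The sketch below is only a guess at that mapping: the IDs follow the commonly used ATA attribute numbering, and the set of "critical" IDs is my reading of the Wikipedia table, so both are assumptions to be checked against `smartctl -A` output for the actual drives. The names `SMART_IDS`, `CRITICAL_IDS` and `critical_series` are hypothetical.

```python
# Assumed mapping: series name as seen in influxdb -> SMART attribute ID.
# Vendors may use some IDs differently; verify per drive model.
SMART_IDS = {
    "raw-read-error-rate": 1,
    "spin-up-time": 3,
    "start-stop-count": 4,
    "reallocated-sector-count": 5,
    "seek-error-rate": 7,
    "power-on-hours": 9,
    "spin-retry-count": 10,
    "power-cycle-count": 12,
    "runtime-bad-block-total": 183,
    "end-to-end-error": 184,
    "reported-uncorrect": 187,
    "command-timeout": 188,
    "high-fly-writes": 189,
    "airflow-temperature-celsius": 190,
    "temperature-celsius-2": 194,
    "hardware-ecc-recovered": 195,
    "current-pending-sector": 197,
    "offline-uncorrectable": 198,
    "udma-crc-error-count": 199,
    "head-flying-hours": 240,
    "total-lbas-written": 241,
    "total-lbas-read": 242,
}

# Assumption: IDs flagged as critical in the Wikipedia table.
CRITICAL_IDS = {5, 10, 184, 187, 188, 196, 197, 198, 201}


def critical_series(names):
    """Return, sorted by name, the series whose assumed ID is flagged critical."""
    return sorted(n for n in names if SMART_IDS.get(n) in CRITICAL_IDS)


watch_list = critical_series(SMART_IDS)
```

Under these assumptions, `watch_list` contains seven of the 22 series listed earlier, which would be the first candidates to graph.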

Please let me know if you know anything about where to start on this.

-- 
Thank you,

Andrew J. Leer

Git Hub: http://bit.ly/aleer_github

Stack Exchange: http://bit.ly/aleer_stk_exch

Linked-In: http://bit.ly/2d5D1DF