Hi Florian,<br><br>Many thanks for your help. I think your comments have got it sorted out. So, here is the story:<br><br>There is no hardware RAID here - sorry for forgetting to state this clearly. I've got 2 SATA drives connected to normal SATA headers on the motherboard, with regular Linux software RAID providing the mirroring. I believe the SATA controller is a VIA chipset, and according to dmesg the driver being used is:
<br><br>---paste---<br>scsi0 : sata_via<br>scsi1 : sata_via<br>---endpaste---<br><br>hddtemp is installed and running in daemon mode, and is configured to recognize these drives correctly and report their temperatures. Collectd is configured to use the hddtemp module to poll this data.
<br><br>Now that I test hddtemp, it begins to tell the story:<br><br>(1) verify that /var/log/messages is clear of scsi error messages - yup<br>(2) connect to hddtemp daemon to check output<br>(3) repeat step one - bingo, we have errors that look familiar logged to the messages file.
<br><br>thus:<br><br>---paste---<br>[root@docs log]# nc 127.0.0.1 7634<br>|/dev/sda|ST3250824AS|32|C|<br><br>[root@docs log]# tail messages<br>...<br>Mar 22 09:39:07 docs crond(pam_unix): session closed for user root
<br>Mar 22 09:39:40 docs kernel: SCSI error : <0 0 0 0> return code = 0x8000002<br>Mar 22 09:39:40 docs kernel: Invalid sda: sense key No Sense<br>Mar 22 09:39:40 docs kernel: Additional sense: Filemark detected<br>
<br>---endpaste---<br><br>After waiting patiently for 5 minutes, with collectd running but the hddtemp module disabled, no further messages are thrown.<br><br>So - it seems the culprit is indeed the hddtemp program, or rather the way it pulls the temperature data via SMART.
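<br><br>For anyone wanting to repeat step (2) without nc, here is a minimal Python sketch of the same query. The port (7634) and the sample reply are taken from the session above; the function names are made up for illustration, and the assumption that several drives come back concatenated with "||" between records is mine and not verified on this box.

```python
import socket


def parse_hddtemp(raw):
    """Parse hddtemp daemon output such as '|/dev/sda|ST3250824AS|32|C|'."""
    drives = []
    # Assumption: records for several drives are concatenated, so they
    # end up separated by '||' once the outer '|' characters are removed.
    for record in raw.strip().strip("|").split("||"):
        fields = record.split("|")
        if len(fields) != 4:
            continue  # not a device|model|temperature|unit record
        device, model, temp, unit = fields
        drives.append({
            "device": device,
            "model": model,
            # Keep non-numeric values as text instead of guessing.
            "temperature": int(temp) if temp.isdigit() else temp,
            "unit": unit,
        })
    return drives


def query_hddtemp(host="127.0.0.1", port=7634, timeout=5.0):
    """Read the daemon's whole reply; it closes the connection when done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling parse_hddtemp(query_hddtemp()) should give one entry per drive. Note that each query makes hddtemp poll the disk, so on this machine I would expect it to log the same SCSI error as the nc test did.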
<br><br>As an interesting aside/test: using "smartctl" to evaluate the SMART status of the HDD, I can see all the correct SMART monitoring data, but looking into the messages file afterwards there is NO error logged (unlike when hddtemp polls the HDD). Clearly the two tools must query the drive in somewhat different ways.
<br><br>Anyhow. My problem is thus solved - it seems I have to abandon the hddtemp monitoring for now :-)<br><br>Many thanks for the help! <br><br>---Tim Chipman<br><br><br>---------- Forwarded message ----------<br>From: Florian Forster <
<a href="mailto:firstname.lastname@example.org">email@example.com</a>><br>To: "The system statistics collection daemon &quot; collectd&quot; ' list." <<a href="mailto:firstname.lastname@example.org">email@example.com
</a>><br>Date: Tue, 21 Mar 2006 21:42:03 +0100<br>Subject: Re: [collectd] Query: Odd "scsi err" messages to /var/log/messages while collectd is running<br>Hello Tim,<br><br>On Tue, Mar 21, 2006 at 04:23:14PM -0400, Tim Chipman wrote:
<br>> (CentOS 4.2 x86_64, sempron3300+ /64bit with 1gig ram, mirrored 250gb<br>> SATA HDDs)<br><br>what RAID controller/which kernel module are you using for SCSI access<br>to the drives? Which plugins do you use in collectd? Are the messages
<br>really appearing every two minutes or is syslog summarizing anything?<br><br>> If I stop the collectd daemon, the messages stop piling up. If I<br>> restart the daemon, the messages resume. Hence my belief they are
<br>> related.<br><br>That sounds weird.. collectd itself doesn't access the SCSI bus at any<br>time. However, it may be configured to query `hddtempd' which, I<br>_think_, uses special SCSI commands to get to the SMART data in the
<br>hardware/disk. If you're using the `hddtemp' plugin try to disable it<br>and check if the problem persists.<br>If that didn't change anything, or if you never had the `hddtemp' plugin<br>in the first place, try disabling all but one plugin (I'd suggest
<br>something very easy for that one, like the `load' plugin). Does that<br>change anything?<br><br>Right now I don't have any more ideas, but maybe the diagnostic steps<br>above reveal something.. Good luck :)<br><br>Regards,
<br>-octo<br>--<br>Florian octo Forster<br>Hacker in training<br>GnuPG: 0x91523C3D<br><a href="http://verplant.org/">http://verplant.org/</a><br>