[collectd] Can you help me find my bottleneck?

Fri Nov 2 17:06:49 CET 2012

All,
We're collecting a fair amount of data and using Cacti to graph it. It's on
a really beefy machine with lots of memory, and the box seems to be loping
along just fine. The problem is that we're seeing a delay of at least an
hour before graphs are updated, and our customers are very unhappy. We're
running on a dual 6-core machine with HT turned on and 48G of memory, and
writing RRD data to a gig-attached NetApp. The box is effectively idle:

08:54:31 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:01:01 AM     all      0.39      0.00      0.56      3.59      0.00     95.46
09:01:31 AM     all      0.89      0.00      2.36      3.72      0.00     93.03
09:02:01 AM     all      0.36      0.00      0.48      3.68      0.00     95.48
09:02:31 AM     all      0.35      0.00      0.38      3.78      0.00     95.48
09:03:01 AM     all      0.79      0.00      0.47      3.77      0.00     94.97
09:03:31 AM     all      0.36      0.00      0.44      3.84      0.00     95.36

Where should I start looking to diagnose this? I've written a small script
to force a flush, and though it succeeds, it appears to have no effect at
all. Am I in I/O hell and just don't know it because I've got so much CPU?

Here's my script:

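#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(cluck);
use Collectd::Unixsock ();

# Connect to collectd's unixsock plugin at the module's default socket path.
my $sock = Collectd::Unixsock->new ();
if (!$sock) {
    exit 1;    # "return" is not valid outside a subroutine
}

# Ask collectd to flush its cached values; on failure the reason is
# left in $sock->{'error'}.
my $status = $sock->flush (timeout => -1);
if (!$status) {
    cluck ("FLUSH failed: " . $sock->{'error'});
    $sock->destroy ();
    exit 1;
}
$sock->destroy ();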
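
On the I/O question, a rough back-of-the-envelope check is to turn the
averaged %iowait figure back into "CPUs' worth" of waiting. This is only a
sketch: it assumes the "all" row in the sar output averages over every
logical CPU, and that a dual 6-core box with HT exposes 2 x 6 x 2 = 24 of
them. The helper name iowait_cpu_equivalents is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sar's "all" row averages %iowait over every logical CPU, so scaling by
# the CPU count gives the equivalent number of CPUs sitting idle while
# I/O is outstanding. (Hypothetical helper, not part of any library.)
sub iowait_cpu_equivalents {
    my ($iowait_percent, $logical_cpus) = @_;
    return $iowait_percent / 100 * $logical_cpus;
}

# Assumption: 2 sockets x 6 cores x 2 hyperthreads = 24 logical CPUs.
my $logical_cpus = 2 * 6 * 2;

# %iowait samples copied from the sar output above.
my @iowait = (3.59, 3.72, 3.68, 3.78, 3.77, 3.84);

my $sum = 0;
$sum += $_ for @iowait;
my $mean = $sum / @iowait;

printf "mean %%iowait: %.2f\n", $mean;
printf "CPU equivalents waiting on I/O: %.2f of %d\n",
    iowait_cpu_equivalents ($mean, $logical_cpus), $logical_cpus;
```

Under those assumptions the result comes out at roughly 0.9 of one CPU.
That would be consistent with a single thread spending nearly all of its
time blocked on I/O, which a 95% idle average would hide, though the
figure alone does not prove it.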

Thanks,

Mark