[collectd] Strange SNMP collection glitches

Florian Forster octo at verplant.org
Tue Jan 19 09:10:55 CET 2010


Hi Mirko,

On Mon, Jan 18, 2010 at 10:20:20AM +0100, Mirko Buffoni wrote:
> Time,RX,TX
> 1263460943,1909957261,4284020628
> 1263460958,1910070689,4285672398
> 1263460973,1910121416,8511173

This looks like a half-normal overflow. Half-normal, because the
difference between the first two values is roughly 1.652 million, while
the difference between the second and third values (assuming a 32-bit
wrap-around) is 17.806 million, i.e. an order of magnitude higher.
Still, 17.8 MByte in 15 seconds looks reasonable to me.
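
To make the arithmetic reproducible, here is a minimal Python sketch of
the wrap-around difference. The helper name counter_delta is made up for
illustration, and it assumes the counter wrapped at most once between
two readings:

```python
WRAP_32 = 2**32  # a 32-bit counter wraps back to zero after 2^32 - 1


def counter_delta(prev, curr, wrap=WRAP_32):
    """Difference between two counter readings, assuming at most one wrap."""
    if curr >= prev:
        return curr - prev
    # The counter ran up to the wrap point and then started again from zero.
    return (wrap - prev) + curr


# TX values from the first three quoted samples.
tx = [4284020628, 4285672398, 8511173]
deltas = [counter_delta(a, b) for a, b in zip(tx, tx[1:])]
print(deltas)  # [1651770, 17806071]
```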

> 1263461033,1910459026,15791119
> 1263461048,1910530275,994360

This, on the other hand, is a problem. Because you're using COUNTER, the
RRD library will assume an overflow and calculate the TX-rate as:

  ((2^32 - 15791119) + 994360) / (1263461048 - 1263461033)

That's roughly 285 MByte/s or 2.28 GBit/s.
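
The same calculation as a small Python sketch. This is only a simplified
model of the formula above, not the RRD library's actual code, and the
function name counter_rate is hypothetical:

```python
def counter_rate(t0, v0, t1, v1, wrap=2**32):
    """Rate per second between two samples, treating any decrease as a wrap."""
    delta = v1 - v0
    if delta < 0:
        # A decreasing COUNTER value is interpreted as a 32-bit overflow.
        delta += wrap
    return delta / (t1 - t0)


# The two quoted TX samples where the counter drops without a real overflow.
rate = counter_rate(1263461033, 15791119, 1263461048, 994360)
print(f"{rate / 1e6:.1f} MByte/s, {rate * 8 / 1e9:.2f} GBit/s")
# 285.3 MByte/s, 2.28 GBit/s
```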

> 1263632003,2244968360,4283484312
> 1263632018,2245014854,4283679452
> 1263632033,2246634960,11923400
> 1263632048,2248276888,1382390
> 1263632063,2249221976,5028316
> 1263632078,2249312207,5370610

Interesting, this looks like a normal overflow and a reset right
afterwards.

> As you can see, there is no router reset (rx is still a valid value),
> while TX counter goes nuts for some time after overflowing.

From the values you provided, it looks a bit like the problem always
appears shortly after a regular 32-bit overflow.

> I have three different Zyxel SHDSL routers  (Different firmwares) but
> they have this same behavior.  Could it be a bug in the firmware?

I doubt that this behavior is normal. So, yes, I'd say blame the
hardware ;)

> For now I solved by fixing a maximum range to the rrd database, so I
> have blanks in place of peaks.  I'd be glad to hear if this behavior
> can be corrected in some other way or not.

Well, from collectd's point of view you can do only two things:

 1) Set the maximum value to the actual speed of the link (plus some
    percent for safety). Pro: Genuine overflows (which appear to happen
    regularly for you) are handled gracefully. Con: You need to maintain
    the maximum value.

 2) Use DERIVE instead of COUNTER and set the minimum value to zero.
    Pro: Works with arbitrary link speeds. Con: You'll lose the values
    at the resets AND at the overflows.

Unfortunately, making sense out of bogus data is simply impossible, so
keeping the data out of the RRD archives is the only thing we can do
here.
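
To illustrate how the two options differ, here is a rough Python model
applied to the last set of TX samples you sent. It is a sketch under
assumptions, not what the RRD library literally does: the function name
rates is made up, and MAX_RATE (2,000,000 bytes/s) is a hypothetical
link speed chosen only for this example. None stands for an unknown
value, i.e. a blank in the graph.

```python
def rates(samples, ds_type, minimum=None, maximum=None, wrap=2**32):
    """Per-interval rates; values outside [minimum, maximum] become None."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if ds_type == "COUNTER" and delta < 0:
            delta += wrap  # COUNTER treats any decrease as an overflow
        rate = delta / (t1 - t0)
        too_low = minimum is not None and rate < minimum
        too_high = maximum is not None and rate > maximum
        out.append(None if too_low or too_high else rate)
    return out


# (time, TX) pairs: a genuine overflow followed directly by a reset.
samples = [
    (1263632003, 4283484312),
    (1263632018, 4283679452),
    (1263632033, 11923400),
    (1263632048, 1382390),
    (1263632063, 5028316),
    (1263632078, 5370610),
]

MAX_RATE = 2_000_000  # hypothetical link speed in bytes/s (assumption)

option1 = rates(samples, "COUNTER", minimum=0, maximum=MAX_RATE)
option2 = rates(samples, "DERIVE", minimum=0)

for name, result in (("COUNTER + max", option1), ("DERIVE + min 0", option2)):
    unknown = [i for i, r in enumerate(result) if r is None]
    print(name, "-> unknown intervals:", unknown)
# COUNTER + max -> unknown intervals: [2]
# DERIVE + min 0 -> unknown intervals: [1, 2]
```

In this model option 1 keeps the interval with the genuine overflow and
blanks only the bogus one, while option 2 blanks both.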

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/