[collectd] memcached timeout (Was: Problems with collectd and KVM)

Joost Cassee joost at cassee.net
Mon Aug 25 17:20:01 CEST 2008


On 24-08-08 22:26, Florian Forster wrote:

> On Sat, Aug 23, 2008 at 06:47:40PM +0200, Joost Cassee wrote:
> 
>> while :
>> do
>>  echo quit | nc localhost 11211
>>  echo -n '.'
>>  sleep 1
>> done
> 
> This isn't exactly the same: The plugin sends a command, then waits for
> a response. It's this waiting for the response that's timing out. So
> it'd be necessary to send a command that is followed by a response and
> check if this response is actually received - and after which period..

I have rewritten the checking command into something more collectd-like;
the script is included below this message. I left it running for a while
and there are no hiccups in its output, while the collectd log shows
several timeouts over the same period. If you have better tests I would
be glad to try them.

I also followed the network traffic with tcpdump. This log shows two
successful checks followed by a failed one:

17:08:35.759686 IP localhost.34851 > localhost.11211: S 2456209718:2456
17:08:35.759835 IP localhost.11211 > localhost.34851: S 2456904633:2456
17:08:35.759872 IP localhost.34851 > localhost.11211: . ack 1 win 257 <
17:08:35.761300 IP localhost.34851 > localhost.11211: P 1:8(7) ack 1 wi
17:08:35.761407 IP localhost.11211 > localhost.34851: . ack 8 win 256 <
17:08:35.761554 IP localhost.11211 > localhost.34851: P 1:508(507) ack
17:08:35.761614 IP localhost.34851 > localhost.11211: . ack 508 win 265
17:08:35.761633 IP localhost.34851 > localhost.11211: F 8:8(0) ack 508
17:08:35.761674 IP localhost.11211 > localhost.34851: F 508:508(0) ack
17:08:35.761694 IP localhost.34851 > localhost.11211: . ack 509 win 265
17:08:55.744909 IP localhost.34853 > localhost.11211: S 2771690040:2771
17:08:55.745094 IP localhost.11211 > localhost.34853: S 2770373795:2770
17:08:55.745178 IP localhost.34853 > localhost.11211: . ack 1 win 257 <
17:08:55.745637 IP localhost.34853 > localhost.11211: P 1:8(7) ack 1 wi
17:08:55.745761 IP localhost.11211 > localhost.34853: . ack 8 win 256 <
17:08:55.745922 IP localhost.11211 > localhost.34853: P 1:508(507) ack
17:08:55.747676 IP localhost.34853 > localhost.11211: . ack 508 win 265
17:08:55.814877 IP localhost.34853 > localhost.11211: F 8:8(0) ack 508
17:08:55.815018 IP localhost.11211 > localhost.34853: F 508:508(0) ack
17:08:55.815073 IP localhost.34853 > localhost.11211: . ack 509 win 265
17:09:05.743724 IP localhost.34854 > localhost.11211: S 2928641327:2928
17:09:05.743783 IP localhost.11211 > localhost.34854: S 2930081841:2930
17:09:05.743806 IP localhost.34854 > localhost.11211: . ack 1 win 257 <
17:09:05.743952 IP localhost.34854 > localhost.11211: P 1:8(7) ack 1 wi
17:09:05.744007 IP localhost.11211 > localhost.34854: . ack 8 win 256 <
17:09:05.746539 IP localhost.11211 > localhost.34854: P 1:508(507) ack
17:09:05.806702 IP localhost.34854 > localhost.11211: . ack 508 win 265
17:09:05.813521 IP localhost.34854 > localhost.11211: F 8:8(0) ack 508
17:09:05.813573 IP localhost.34854 > localhost.11211: R 9:9(0) ack 508

Notice the failed tear-down of the last TCP session? Why would the
client send a RESET? It does not look like a timeout problem.
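
One rough way to compare the sessions is to compute the gap between
consecutive packets from the tcpdump timestamps. The sketch below does
that for the tail of the failed session; the helper name gaps_us is made
up for illustration, and it assumes every line starts with an
HH:MM:SS.ffffff timestamp and that the capture does not cross midnight.

```python
from datetime import datetime, timedelta


def gaps_us(lines):
    """Return the gap in microseconds between consecutive tcpdump lines."""
    stamps = [
        datetime.strptime(line.split()[0], "%H:%M:%S.%f")
        for line in lines
        if line.strip()
    ]
    # Integer division by one microsecond gives an exact integer result.
    return [(b - a) // timedelta(microseconds=1)
            for a, b in zip(stamps, stamps[1:])]


# Tail of the failed session, copied from the capture above.
failed = [
    "17:09:05.744007 IP localhost.11211 > localhost.34854: . ack 8 win 256 <",
    "17:09:05.746539 IP localhost.11211 > localhost.34854: P 1:508(507) ack",
    "17:09:05.806702 IP localhost.34854 > localhost.11211: . ack 508 win 265",
    "17:09:05.813521 IP localhost.34854 > localhost.11211: F 8:8(0) ack 508",
    "17:09:05.813573 IP localhost.34854 > localhost.11211: R 9:9(0) ack 508",
]

print(gaps_us(failed))  # [2532, 60163, 6819, 52]
```

In the failed session the client acknowledges the 507-byte response
about 60 ms after it was sent, against 60 and 1754 microseconds in the
two successful sessions. Whether that delay has anything to do with the
RESET is not something the capture alone can show.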

> I have to admit, though, that all things considered this feels more and
> more like a problem in memcached.. Have you checked the memcached
> mailing list for reports of a similar problem?

The memcached list only has reports of timeouts caused by routing and
DNS problems. I did find some blog posts about problems from Rails [1],
but those probably point to problems in the Ruby client library.

Regards,

Joost

[1] http://geekblog.vodpod.com/?p=80

---

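#!/usr/bin/python

import time

import pexpect

while True:
    t = time.time()
    # Open a connection to memcached through netcat.
    e = pexpect.spawn('nc localhost 11211')
    # Request the statistics and wait for the terminating END line.
    e.sendline('stats')
    e.expect('END')
    e.sendline('quit')
    # Print the wall-clock time and how long this check took, in seconds.
    print(time.strftime("%H:%M:%S", time.localtime()), time.time() - t)
    time.sleep(1)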
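
A variant that talks to the socket directly, without nc and pexpect in
between, may be closer to what the plugin does: send a command, wait for
the response, and time it. This is only a sketch and not the plugin's
actual code. The function names probe and run, and the one-second
timeout, are assumptions for illustration. The 7-byte "stats\r\n"
request matches the 7-byte client payload seen in the capture.

```python
import socket
import time


def probe(host="localhost", port=11211, timeout=1.0):
    """Send 'stats' and return the seconds until the final END line arrives."""
    start = time.time()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        # A memcached stats response is terminated by the line "END".
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection early")
            data += chunk
        sock.sendall(b"quit\r\n")
    return time.time() - start


def run(count, interval=1.0):
    """Probe `count` times, printing the time of day and the result."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S", time.localtime())
        try:
            print(stamp, probe())
        except OSError as exc:
            # Timeouts and refused connections are both OSError subclasses.
            print(stamp, "failed:", exc)
        time.sleep(interval)
```

Calling run(600) would probe once a second for about ten minutes; a
timeout then shows up as a "failed" line rather than being hidden by the
pty and nc layers.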

-- 
Joost Cassee
http://joost.cassee.net
