[collectd] collectd 4.10.1 stops writing and high CPU

Jesse Reynolds jesse at bulletproof.net
Thu Jan 17 16:12:58 CET 2013

On 17/01/2013, at 12:56 PM, Jesse Reynolds <jesse at bulletproof.net> wrote:

> On 16/01/2013, at 11:53 PM, Florian Forster <octo at collectd.org> wrote:

>> If this happens again, can you record collectd's I/O, especially which
>> files it opens? Something along these lines should do the trick:
>> # strace -ttt -e trace=open -o collectd.strace -p $COLLECTD_PID -s 2048

OK, so it just happened again. I ran this strace for about five minutes and nothing at all was logged, so the traced process is not making any open() calls.

I then ran it without the -e trace=open, like this:

strace -ttt -o collectd.strace -p $COLLECTD_PID -s 2048

Resulting in nothing very exciting:

$ cat collectd.strace
1358431235.605828 restart_syscall(<... resuming interrupted call ...>) = 0
1358431245.525278 nanosleep({9, 999965000}, 0x7fffb2c5c5b0) = 0
1358431255.525596 nanosleep({9, 999957000}, 0x7fffb2c5c5b0) = 0
1358431265.525787 nanosleep({9, 999966000}, 0x7fffb2c5c5b0) = 0
1358431275.526013 nanosleep({9, 999962000}, 0x7fffb2c5c5b0) = 0
1358431285.526236 nanosleep({9, 999962000}, 0x7fffb2c5c5b0) = 0
1358431295.526480 nanosleep({9, 999962000}, 0x7fffb2c5c5b0) = 0
1358431305.526725 nanosleep({9, 999962000}, 0x7fffb2c5c5b0) = 0
1358431315.527015 nanosleep({9, 999959000},  <unfinished ...>

I didn't leave that one running for very long - just over a minute. 
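The spacing between the calls in that trace can be checked mechanically. This is a minimal sketch, assuming the log was written with -ttt so that the first field is an epoch timestamp; strace_gaps is a hypothetical helper name, not part of strace or collectd.

```shell
# Print the gap in seconds between consecutive lines of an "strace -ttt" log,
# followed by the name of the syscall on the later line.
# Usage: strace_gaps collectd.strace
strace_gaps() {
    awk 'NR > 1 {
             # Syscall name is everything in field 2 before the first "(".
             name = substr($2, 1, index($2, "(") - 1)
             printf "%.3f %s\n", $1 - prev, name
         }
         { prev = $1 }' "$1"
}
```

On the log above this reports gaps of about 10 seconds between the nanosleep calls, which suggests the traced thread is simply idling between intervals rather than spinning.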

... I have noticed that in both occurrences of this problem there has been one big burst of disk writes about 25 minutes after the problem starts, but it's short-lived. I'm not sure what that's about.

disk writes last six hrs: http://f.cl.ly/items/2x0v1C1V32182P3r373W/Screen%20Shot%202013-01-18%20at%201.10.03%20AM.png
disk writes last three days: http://f.cl.ly/items/3t3T3I2Z1a2a3s0W0h0r/Screen%20Shot%202013-01-18%20at%201.35.44%20AM.png

Perhaps the strace needs to be run for the full duration of an occurrence of the problem to see what it's trying to open. That might be a bit tricky to organise, but perhaps leaving the strace running continuously won't be too dangerous?
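Another possibility, which is an assumption and not something confirmed in this thread: strace -p without -f attaches only to the thread whose ID equals the PID, and collectd runs its read and write plugins in separate threads, so the main thread's nanosleep loop may be all that was captured. A sketch of a trace that follows every thread is below; trace_all_threads is a hypothetical wrapper name, and openat is added on the assumption that the libc in use may open files through it.

```shell
# Attach strace to every thread of a running process and log file opens.
# Usage: trace_all_threads PID OUTPUT_FILE
trace_all_threads() {
    pid=$1
    out=$2
    # -f        follow threads/children (with -p, attach to all existing threads)
    # -ttt      epoch timestamps with microseconds, as in the earlier runs
    # -e trace= limit output to open/openat so a long-running trace stays small
    strace -f -ttt -e trace=open,openat -s 2048 -o "$out" -p "$pid"
}
```

For example, trace_all_threads "$COLLECTD_PID" collectd.strace could be left running until the problem recurs. Restricting the trace to open and openat should keep the log small, though the overhead of tracing a busy daemon for days has not been measured here.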

