[collectd] Bug#422208: /etc/init.d/collectd doesn't stop all the daemons

Wed Aug 8 23:06:22 CEST 2007

Hi,

On Thu, Jun 14, 2007 at 09:43:31PM +0200, Bas Zoetekouw wrote:
> > > > Hmm... I cannot reproduce this. It works fine on all machines I'm running
> > > > collectd on. Please note that it might take some time (a couple of seconds)
> > > > for collectd to shut down cleanly. Could you please verify this?
> > > 
> > > Ah, that indeed seems to be the case.  It's a bit confusing though.  Are
> > > you sure this isn't going to lead to weird race problems?
> > 
> > I cannot think of any race conditions. Have you run into any problems so far?
> > What kind of problems are you thinking about?
> 
> Well, maybe if the user wants to restart the daemon before the old one
> has exited?  Won't the port still be taken then?

Well, a socket might still be in use if the restart takes place before the
former process has terminated. However, the appropriate plugin should retry
opening the connection for a couple of iterations in that case. If it does not
do so, a separate bug should be filed and the plugin should be fixed.
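
The retry behaviour expected of a plugin here can be sketched as follows. This is a minimal Python illustration, assuming a plugin that listens on a TCP port; collectd's plugins are written in C and their actual retry logic may differ. `bind_with_retry` and its `attempts`/`delay` parameters are hypothetical names chosen for the example.

```python
import errno
import socket
import time


def bind_with_retry(host, port, attempts=5, delay=1.0):
    """Bind and listen on (host, port), retrying while the address is in use.

    Hypothetical helper for illustration; not taken from any collectd plugin.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Lets bind() succeed over TIME_WAIT connections left behind by an
        # earlier process; it does not help while that process still listens.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError as exc:
            sock.close()
            # Only "address already in use" is worth retrying, and only until
            # the last attempt; any other error is raised immediately.
            if exc.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

With the defaults (`attempts=5, delay=1.0`), a restarted daemon would tolerate roughly four seconds of overlap with its predecessor before giving up with the original error.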

> > The shutdown time should have been decreased a fair amount in version 3.10.4.
> > Can you estimate the amount of time it takes for you? I'm going to
> > investigate if it should be further decreased. Any other opinions on this?
> 
> Actually, since I set up collectd on my machines, I've never noticed it
> anymore.  It's just when you tinker with the config files and start and
> stop the daemon lots of times that it gets noticeable.
> 
> As the behaviour is not what most users would expect when running an
> init.d "stop" script, maybe it would be better to avoid it altogether
> by letting the init script wait for the daemon to exit?  Squid seems to
> do it like that...

Starting with version 4, the code which writes to the RRD files caches updates
to the files. In large setups the cache might get quite big, and a large
amount of data might have to be flushed when shutting down the daemon. As the
time needed might vary a lot depending on your settings and the number of
"clients", I cannot think of any reasonable timeout to wait for the daemon to
stop. Waiting indefinitely is probably not a good idea, imho, and exiting with
a non-zero status is not really what you want in that case either, as, e.g., a
restart action would then fail as well.
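
To make the trade-off concrete, here is a minimal Python sketch of the "let the stop action wait for the daemon to exit" idea, assuming the caller already knows the daemon's PID (for instance from a pidfile). `stop_and_wait` and `process_alive` are hypothetical helpers, not part of collectd or of any existing init script. The timeout is left as a parameter precisely because no single value suits every setup; on expiry the function only reports that the daemon is still running and does not kill it, leaving the caller to decide whether to fail, warn, or keep waiting.

```python
import os
import signal
import time


def process_alive(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs only the existence/permission check and delivers
    nothing. An exited but not yet reaped child (a zombie) still counts
    as existing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists, but we are not allowed to signal it.
        return True
    return True


def stop_and_wait(pid, timeout, poll_interval=0.1):
    """Send SIGTERM to pid and poll for up to `timeout` seconds.

    Hypothetical helper. Returns True once the process has exited, or False
    if it is still running (e.g. still flushing cached updates) when the
    timeout expires; the process is never killed forcibly here.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not process_alive(pid):
            return True
        time.sleep(poll_interval)
    return not process_alive(pid)
```

Debian's `start-stop-daemon` offers a comparable built-in through its `--retry` schedule (for example `--retry TERM/30`); it exits with a non-zero status if the process is still running when the schedule runs out, which is exactly the restart problem described above.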

Can anybody think of a good solution? I'd really appreciate some more
opinions.

Cheers,
Sebastian

-- 
Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety.         -- Benjamin Franklin
