[collectd] Python plugin threading behavior question
Sven Trenkel
collectd at semidefinite.de
Fri Dec 28 06:12:32 CET 2012
On 20.12.2012 12:59, Jevgenij Tsoi wrote:
> Looking at documentation, there is an explicit warning that the python
> plugins must be thread safe. Also,
> several plugins written in python (one being collectd-carbon) explicitly
> use a threading.Lock() to do some operations.
>
> Looking at logs, for read-only plugins, it seems that only one thread is
> calling the read callback at a time. (Even with multiple read-threads
> and explicit sleep in read callback longer than the interval)
Yes, collectd itself does not create any *Python* threads. But while
there is only one Python thread by default there are a bunch of
different OS threads. As a Python programmer you don't have to care
about this as your Python code will *always* own the GIL when it is run.
But it does mean that if you have multiple callbacks they might be run
in parallel*. When this happens, Python creates dummy thread objects
(named Dummy-N) to represent the OS threads it did not start itself.
Here's an example to demonstrate: the log output of a plugin that
registers 3 read callbacks, each of which logs some text, sleeps and
logs some more text:
[2012-12-28 04:49:24] import
[2012-12-28 04:49:24] config('Module'<collectd.Config root node >,)
[2012-12-28 04:49:24] init()
[2012-12-28 04:49:24] Initialization complete, entering read-loop.
[2012-12-28 04:49:24] read1 begin (5 sec sleep) (thread Dummy-1) ()
[2012-12-28 04:49:24] read3 begin (3 sec sleep) (thread Dummy-2) ()
[2012-12-28 04:49:24] read2 begin (2 sec sleep) (thread Dummy-3) ()
[2012-12-28 04:49:26] read2 end()
[2012-12-28 04:49:27] read3 end()
[2012-12-28 04:49:29] read1 end()
[2012-12-28 04:49:34] read1 begin (5 sec sleep) (thread Dummy-1) ()
[2012-12-28 04:49:34] read2 begin (2 sec sleep) (thread Dummy-3) ()
[2012-12-28 04:49:34] read3 begin (3 sec sleep) (thread Dummy-2) ()
[2012-12-28 04:49:36] read2 end()
[2012-12-28 04:49:37] read3 end()
[2012-12-28 04:49:39] read1 end()
As you can see, the 3 callbacks are run in parallel. They are not even
run in the same order in every loop. And while it is true here that
every read callback always gets the same thread assigned to it, this is
an implementation detail of the collectd plugin dispatcher and might or
might not change at any time.
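
For reference, a minimal sketch of what such a test plugin might look like. This is not the exact plugin that produced the log above; the helper names make_read and log are made up for illustration, and the fallback for a missing collectd module is only there so the file can be imported outside the daemon. It omits the config and init callbacks that appear in the log.

```python
import threading
import time

try:
    import collectd  # provided by the collectd daemon at runtime
except ImportError:
    collectd = None  # assumption: lets the file be imported outside collectd


def log(msg):
    """Log through collectd when available, otherwise print (hypothetical helper)."""
    if collectd is not None:
        collectd.info(msg)
    else:
        print(msg)


def make_read(name, delay):
    """Build a read callback that logs, sleeps for `delay` seconds, and logs again."""
    def read(data=None):
        thread_name = threading.current_thread().name
        log("%s begin (%d sec sleep) (thread %s)" % (name, delay, thread_name))
        time.sleep(delay)
        log("%s end" % name)
    return read


if collectd is not None:
    # Three separate registrations, so the dispatcher may run them concurrently.
    collectd.register_read(make_read("read1", 5), name="read1")
    collectd.register_read(make_read("read2", 2), name="read2")
    collectd.register_read(make_read("read3", 3), name="read3")
```

Inside collectd the thread name should come out as Dummy-N, because the calling OS thread was not started through Python's threading module.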
> So my 2 questions are:
>
> In practice, only write plugins need proper locking and I can still
> count on the GIL in all cases?
> (Meaning, only complex non atomic operations need to be synchronized)
You will always hold the GIL when your code is run, but as seen above,
multiple callbacks might be run in parallel. The collectd dispatcher
makes only one guarantee here: A callback that was registered only once
will not be called again before the previous call has returned.
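
To illustrate what that means for shared state, here is a small sketch, assuming a plugin keeps a dict that more than one callback touches. The names record, snapshot_and_reset and run_demo are hypothetical, and plain threading.Thread workers stand in for collectd's OS threads so the example runs on its own.

```python
import threading

_lock = threading.Lock()
_totals = {}  # hypothetical plugin state shared by several callbacks


def record(key, amount):
    """Read-modify-write on shared state; the lock keeps it from interleaving."""
    with _lock:
        _totals[key] = _totals.get(key, 0) + amount


def snapshot_and_reset():
    """Swap the dict out under the lock so no update is lost in between."""
    global _totals
    with _lock:
        current, _totals = _totals, {}
    return current


def run_demo(workers=4, updates=1000):
    """Stand-in for concurrent callbacks: several threads calling record()."""
    def worker():
        for _ in range(updates):
            record("requests", 1)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshot_and_reset()
```

A single callback that only touches its own local variables needs none of this; the lock matters only for state that two different callbacks (for example a read and a write callback) both modify.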
> And finally, the python module seems to be imported twice?
> (At first I thought it once per read-thread, because i had 2, but it is
> regardless of threads)
> The module is loaded twice, the configure callback is called twice, BUT
> the initialize-callback is called only once. Why is it behaving this way?
This doesn't happen to me, as the output above shows. Are you sure your
config file does not contain that block twice? Or if it's in a separate
file that's imported from the main config file, is that config file
imported twice?
*Note: Of course Python code can never be truly executed in parallel,
thanks to the GIL. But the instructions of your code might be executed
in an unpredictable order. Also note that this will not appear in the
interactive mode, as it puts the Python interpreter in charge of control
flow and it will neatly serialize stuff in a very single-threaded manner.