[collectd] Python plugin threading behavior question

Sven Trenkel collectd at semidefinite.de
Fri Dec 28 06:12:32 CET 2012


On 20.12.2012 12:59, Jevgenij Tsoi wrote:

> Looking at documentation, there is an explicit warning that the python
> plugins must be thread safe. Also,
> several plugins written in python (one being collectd-carbon) explicitly
> use a threading.Lock() to do some operations.
>
> Looking at logs, for read-only plugins, it seems that only one thread is
> calling the read callback at a time. (Even with multiple read-threads
> and explicit sleep in read callback longer than the interval)

Yes, collectd itself does not create any *Python* threads. But while 
there is only one Python thread by default, there are a number of 
different OS threads. As a Python programmer you mostly don't have to 
care about this, as your Python code will *always* hold the GIL while it 
runs. But it does mean that if you have multiple callbacks, they might 
be run in parallel*. When that happens, Python creates dummy thread 
objects (the "Dummy-N" names in the log below) to represent the OS 
threads it did not start itself.
Here's an example to demonstrate: a plugin that registers 3 read 
callbacks that log some text, sleep and log some more text:

[2012-12-28 04:49:24] import
[2012-12-28 04:49:24] config('Module'<collectd.Config root node >,)
[2012-12-28 04:49:24] init()
[2012-12-28 04:49:24] Initialization complete, entering read-loop.
[2012-12-28 04:49:24] read1 begin (5 sec sleep) (thread Dummy-1) ()
[2012-12-28 04:49:24] read3 begin (3 sec sleep) (thread Dummy-2) ()
[2012-12-28 04:49:24] read2 begin (2 sec sleep) (thread Dummy-3) ()
[2012-12-28 04:49:26] read2 end()
[2012-12-28 04:49:27] read3 end()
[2012-12-28 04:49:29] read1 end()
[2012-12-28 04:49:34] read1 begin (5 sec sleep) (thread Dummy-1) ()
[2012-12-28 04:49:34] read2 begin (2 sec sleep) (thread Dummy-3) ()
[2012-12-28 04:49:34] read3 begin (3 sec sleep) (thread Dummy-2) ()
[2012-12-28 04:49:36] read2 end()
[2012-12-28 04:49:37] read3 end()
[2012-12-28 04:49:39] read1 end()

As you can see, the 3 callbacks are run in parallel. They are not even 
started in the same order in every loop. And while it is true here that 
every read callback always gets the same thread assigned to it, this is 
an implementation detail of the collectd plugin dispatcher and might 
change at any time.
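
For reference, a plugin along the lines of the one that produced the log 
above could look roughly like the sketch below. This is not the exact 
test plugin; make_read, log and READERS are names made up for 
illustration, and the fallback to print() is only there so the file can 
also be run outside of collectd.

```python
import threading
import time

try:
    import collectd  # provided by the collectd daemon at runtime
except ImportError:
    collectd = None  # assumption: fall back to print() outside collectd


def log(message):
    """Send a message to collectd's log, or to stdout outside the daemon."""
    if collectd is not None:
        collectd.info(message)
    else:
        print(message)


def make_read(name, seconds):
    """Build a read callback that logs, sleeps, and logs again."""
    def read(data=None):
        # Shows which (dummy) thread object the callback is running under.
        thread_name = threading.current_thread().name
        log("%s begin (%d sec sleep) (thread %s)" % (name, seconds, thread_name))
        time.sleep(seconds)
        log("%s end" % name)
    return read


# Hypothetical names and sleep times, chosen to match the log above.
READERS = [("read1", 5), ("read2", 2), ("read3", 3)]

if collectd is not None:
    for reader_name, sleep_seconds in READERS:
        # An explicit name keeps the three registrations distinct.
        collectd.register_read(make_read(reader_name, sleep_seconds),
                               name=reader_name)
```

Logging threading.current_thread().name from inside the callback is what 
makes the Dummy-N thread objects visible.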

> So my 2 questions are:
>
> In practice, only write plugins need proper locking and I can still
> count on the GIL in all cases?
> (Meaning, only complex non atomic operations need to be synchronized)

You will always hold the GIL when your code is run, but as seen above, 
multiple callbacks might be run in parallel. The collectd dispatcher 
makes only one guarantee here: A callback that was registered only once 
will not be called again before the previous call has returned.

> And finally, the python module seems to be imported twice?
> (At first I thought it once per read-thread, because i had 2, but it is
> regardless of threads)
> The module is loaded twice, the configure callback is called twice, BUT
> the initialize-callback is called only once.  Why is it behaving this way?

This doesn't happen to me, as the output above shows. Are you sure your 
config file does not contain that block twice? Or if it's in a separate 
file that's imported from the main config file, is that config file 
imported twice?
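
A quick way to check for this is to scan the config text for repeated 
Import lines. The helper below is only a rough sketch: duplicate_imports 
is a made-up name, it looks at a single string of config text, and it 
does not follow Include directives, so included files have to be 
checked separately.

```python
import re
from collections import Counter

# Matches lines such as:    Import "mymodule"
# Commented-out lines (starting with '#') are not matched.
IMPORT_RE = re.compile(r'^[ \t]*Import[ \t]+"?([^"\s]+)"?[ \t]*$', re.MULTILINE)


def duplicate_imports(config_text):
    """Return the module names that appear in more than one Import line."""
    counts = Counter(IMPORT_RE.findall(config_text))
    return sorted(name for name, seen in counts.items() if seen > 1)
```

Calling duplicate_imports(open(path).read()) on the main config file 
and on each included file should return an empty list if every module 
is imported only once per file.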





*Note: Of course Python code can never truly be executed in parallel, 
thanks to the GIL. But the instructions of your callbacks might be 
interleaved in an unpredictable order. Also note that this will not 
happen in the plugin's interactive mode, as that puts the Python 
interpreter in charge of control flow, and it will neatly serialize 
everything in a single-threaded manner.


