This page describes potential support for parallel execution of code blocks in Cython and unPython based on OpenMP.
Note that this is different from parallel execution of Python code in threads or using the multiprocessing module, which can be easily achieved in both Python and Cython using decorated functions.
Parallel loops are useful in many contexts, but they are particularly important for numerical applications. Consider the following loop:
    for i in xrange(m):
        # C loop body goes here
If the programmer wishes to parallelize the above loop, there are currently no mechanisms to do so short of writing C code by hand.
The design constraints are:
- It should have well-defined serial semantics, so that code still compiles and runs on systems without OpenMP support.
- Ideally it should match Python syntax and be executable in CPython. Since this deals with non-Python code, however, this is not a strict requirement.
- The construct should be extensible to include future enhancements such as thread-local variables or reduction variables.
- Nested parallelism should be easy.
Ideally, the implementation would accept any iterator and loop over it in parallel: a producer thread iterates over the iterator and keeps producing values, while multiple worker threads consume them. (This producer-consumer scenario was proposed by Stefan Behnel.) It is also possible to have a more restricted proposal that supports only parallel C-style for loops.
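The producer-consumer scheme can be sketched in plain Python as follows. This is only an illustration of the structure, not a proposed implementation: the names parallel_for_each, body and num_workers are hypothetical, the calling thread plays the role of the producer, and error handling is omitted. In CPython the GIL prevents pure-Python loop bodies from actually running concurrently, so the sketch shows the control flow only.

```python
import queue
import threading

_DONE = object()  # sentinel telling a worker that no more items will arrive


def parallel_for_each(iterable, body, num_workers=4):
    """Run body(item) for every item: one producer, several consumers.

    Hypothetical helper for illustration; no error handling.
    """
    items = queue.Queue(maxsize=2 * num_workers)

    def consume():
        while True:
            item = items.get()
            if item is _DONE:
                break
            body(item)

    workers = [threading.Thread(target=consume) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Producer: only this thread ever advances the iterator.
    for item in iterable:
        items.put(item)
    # One sentinel per worker so that every worker terminates.
    for _ in workers:
        items.put(_DONE)
    for w in workers:
        w.join()


results = []
results_lock = threading.Lock()


def record_square(i):
    # Shared state must be protected explicitly.
    with results_lock:
        results.append(i * i)


parallel_for_each(iter(range(10)), record_square)
```

Because only the producer touches the iterator, any iterator works, including ones that are not thread-safe. The order in which items are processed is not defined.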
The following are the proposals so far :
Proposal 1 :
Due to Rahul Garg
    "pragma parallel for"
    for i in xrange(m):
        # parallel body goes here
The upside is that it is easy to understand and use, particularly for people familiar with OpenMP. It is also easy to extend, and it does not affect semantics when running on the interpreter.
The downside is that representing annotations as strings does not look very Pythonic and follows its own mini-grammar.
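To illustrate why the string form leaves interpreter semantics untouched: a bare string literal used as a statement is evaluated and discarded by CPython. The snippet below is a minimal check of that behaviour, using range in place of xrange so that it runs on Python 3; the variable names are illustrative only.

```python
m = 3
visited = []

"pragma parallel for"  # expression statement: evaluated, then discarded
for i in range(m):
    visited.append(i)
```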
Proposal 2 :
    #pragma: parallel for
    for i in xrange(m):
        # parallel body follows
The upside is that it is easy to understand, use, and extend.
The downside is that it is not Pythonic: annotations should not be comments.
Proposal 3 :
Adapted from IPython1 and "with nogil" from Cython.
    with parallel():
        for i in xrange(m):
            # parallel body follows
Could nogil() also be used here?
The upside is that it is easy to understand, and it can be extended by introducing keyword arguments to parallel().
The downside is that it requires an implementation of parallel() that returns an object with empty __enter__ and __exit__ methods. It also adds one level of nesting: if you want to parallelize n nested loops, you end up with 2n levels of indentation.
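A serial stand-in for parallel() could look roughly like this. It is a sketch under the assumption that keyword arguments are simply accepted and ignored when running on the interpreter; the proposal does not specify any of these details. range is used in place of xrange so that the snippet runs on Python 3.

```python
class parallel:
    """Serial stand-in: entering and leaving the block does nothing."""

    def __init__(self, **options):
        # Options are stored but ignored (assumption: the interpreter
        # always runs the block serially).
        self.options = options

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions raised in the block


m = 5
total = 0
with parallel():
    for i in range(m):
        total += i
```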
Proposal 4 :
Due to Rahul Garg
    for i in prange(m):
        # parallel body goes here
The upside is that it is easy to use and understand, and extensible through keyword arguments. The downside is that it requires an implementation of prange(). It will also work only with xrange()-style loops; no consideration is given to other iterators.
Note from Rahul : prange will be implemented in unPython.
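For execution on the interpreter, prange could fall back to an ordinary serial range. The sketch below assumes that prange takes the same positional arguments as xrange and that any keyword options can be ignored in serial mode; neither assumption is stated in the proposal. It uses range, the Python 3 equivalent of xrange.

```python
def prange(*args, **options):
    """Serial stand-in for a parallel range.

    Assumption: same positional arguments as range; options are ignored.
    """
    return range(*args)


m = 4
squares = [0] * m
for i in prange(m):
    squares[i] = i * i
```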
Proposal 5 :
Due to Stefan Behnel.
    with thread_each(iterator,threadlocal=...):
        # parallel body
It only requires one level of nesting and makes it clear that what is supposed to happen is not a sequential loop but a parallel operation on the code block.
The downside is that this would be hard to support in CPython in a serialised form.
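One way to see the difficulty: the body of a with block runs exactly once in CPython, so a context manager cannot replay it for each item. The naive attempt below uses contextlib.contextmanager to illustrate this. The "as item" binding is an assumption added for the demonstration and is not part of the proposal as written.

```python
from contextlib import contextmanager


@contextmanager
def thread_each(iterable, threadlocal=None):
    # Naive attempt: hand each item to the with block in turn.
    for item in iterable:
        yield item


seen = []
error = None
try:
    with thread_each([1, 2, 3]) as item:
        seen.append(item)  # the block body executes only once
except RuntimeError as exc:
    # contextlib rejects a generator that yields a second time.
    error = exc
```

Only the first item reaches the block, and leaving the block raises a RuntimeError because the generator tries to yield again.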
Proposal 6 :
Due to Rahul Garg
What about merging proposals 4 and 5 into:
    for i in thread_each(iterator,threadlocal=...):
        # parallel body goes here
Compared to proposal 5, this makes it less obvious that things happen in parallel. The thing that is iterated over influences the way the loop is executed, i.e. it changes the semantics of the "for" keyword.
Proposal 7 :
Use decorated inline functions.
    @parallel(type="OpenMP")
    cdef inline doit(item) nogil:
        # parallel body goes here
    doit(iterable)
The main advantage is that this would work without special syntax support and still make it clear what happens.
The downside is that calling a function requires passing all required state into that function. Closures would alleviate this disadvantage, but they would add overhead to code execution.
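A pure-Python analogue of the decorator approach can be sketched with a thread pool. It is an illustration only: the decorator below is hypothetical, it uses concurrent.futures rather than OpenMP, and its type argument is accepted only to mirror the proposal and is otherwise unused. The extra positional arguments show how state has to be passed into the function explicitly.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def parallel(type="threads", max_workers=4):
    """Hypothetical decorator: apply the function to each item of an iterable.

    'type' mirrors the keyword in the proposal and is ignored here.
    """
    def decorate(func):
        @functools.wraps(func)
        def run(iterable, *state):
            # All state needed by the body is passed in explicitly.
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                return list(pool.map(lambda item: func(item, *state), iterable))
        return run
    return decorate


@parallel(type="threads")
def scale(item, factor):
    return item * factor


scaled = scale(range(5), 3)
```

The results come back in input order because the pool's map preserves the order of the iterable, even though the calls themselves may run in any order.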