CEP 508 - Compile time unrolling

This is a proposal for creating a code tree transform that will take a normal code-tree and perform compile-time unrolling, calculations etc. The purpose is not to be an optimizing compiler, but have a quick way of allowing some of the same programming techniques that C++ templates allow for.

The creative use C++'s templates has gotten over the years should demonstrate that there are good uses this (see http://ubiety.uwaterloo.ca/~tveldhui/papers/Expression-Templates/exprtmpl.html, Blitz++ etc.).

This attempts to get some of the advantages without introducing a full template compiler. It also allows using the regular Python syntax rather than a function-style side-effect of the template compilation.

Examples of use

These examples may seem useless. The main usecase for now is the operators for NumPy arrays, which will contain lots of tests on access methods, contiguous mode etc. declared compile-time; and where it is not acceptable to insert run-time tests.

Make it possible to use compile-time constants in new ways:

DEF STRATEGY = 2

@cython.unroll
def allocate_node():
    if STRATEGY = 1:
       ..lots of code ..
    elif STRATEGY == 2:
       .. lots of code ..

This will generate C code that only contains the second block, while the first block will never make it past Cython.

A small modification gives:

@cython.unroll("strategy")
def allocate_node(strategy):
    if strategy == 1:
       ..lots of code ..
    elif strategy == 2:
       .. lots of code ..


a = allocate_node(1)
b = allocate_node(2)
c = allocate_node(1 if after_midnight() else 2)

This will generate three different C functions:

Nothing is done to prevent the user from hanging oneself -- the following will output 10000 seperate functions to the C file:

@cython.unroll("i")
def fibonacci(i):
  if i == 0 or i == 1: return 1
  return fibonacci(i-1) + fibonacci(i-2)

a = fibonacci(10000)

Ie, @cython.unroll will only be used by people who know what they do.

Details

All the expressions in a function are classified to either be known compile-time or run-time. A lot of cases are obvious, other can demand developing a full-fledged optimizing compiler. To avoid the latter some simple restrictions are put on this. Some guidelines:

Uncertain about this one:

Constructs that will be supported:

Constructs that might be supported but not a priority:

Not sure about whether internal functions/closures should be supported, at least they should be supported in Cython first :-) Anyway, they would then inherit .

How to implement

I'll introduce an example:

t = False
cdef int s = 0
if t:
  a = [3, -2, 7]
else:
  a = [1, 3, 2]
for i in range(1, len(a)+1):
  s += a[i-1] ** 2
a = "something else"

The chances of the C translation of the above to be effectively inlined by GCC is effectively none at all: There will be loads of Python objects created, dictionary accesses and so on. Still the code can in principle be inlined to:

cdef int s = 14 
a = "something else"

(This seems useless -- the really useful cases are when you have code that cannot be inlined at code-writing-time but can be inlined at code-compilation-time - because it gets different type arguments, or is simply in a "compile-time library" like mentioned above.)

This might seem like magic, but in reality I don't believe it is all that difficult. One important principle first: Everything that happens happen on a best-effort basis. I.e., if a case turns out to be difficult, simply leave it non-inlined!

This basically requires a "Python interpreter" transform that can be run on any fragments of our parse tree, and evaluates any compile-time-evaluateable expressions using the full Python machinery while leaving run-time-only expressions alone. This simplistic interpreter will:

In the example above, one would:

How much time?

A week of hard work? All the cases that should be handled are already nicely declared as node types in our tree, it's just about writing a small piece of interpretive action for each of them (in a Transform descendant of course, not in the nodes themselves). A day or so for the basic structure that will simply pass things through

More if it is found that the code tree of Cython relies heavily on "cross-references", which the cloning approach would break. But this will tend to break many transforms anyway and must be dealt with.

enhancements/inlining (last edited 2009-03-01 01:02:24 by localhost)