Project for improved NumPy integration
This document is out of date, as the feature has progressed since then. See Cython documentation for current status.
NumPy support is already present in Cython (by using a proper numpy include file), however one currently has to make a choice between an interface that is easy to use but lacks runtime performance, and an interface that performs as well as C but is much harder to learn and use. This document explores how one should be able to use NumPy from Cython code. The resulting features will likely be implemented this summer as part of a Google Summer of Code project.
Array indexing / the buffer interface
It is possible to write code like this:
1 import numpy as np 2 def myfunc(n): 3 result = np.zeros((n,n), dtype=np.float64) 4 for i in xrange(n): 5 for j in xrange(n): 6 result[i, j] = 1 + i + j 7 return result
However, the loops and the access to the array will be very slow.
Typing the variables is a first step towards increased speed. However, knowing that result is a NumPy ndarray doesn't help that much, because without knowing the number of dimensions and datatype it's impossible to generate efficient code for the indexing operation.
One proposal for the syntax will make the above example look like this (the syntax under discussion here is the brackets after ndarray, everything else is existing Cython syntax):
1 import numpy as np 2 def myfunc(int n): 3 cdef np.ndarray[ndim=2, dtype=np.float64] result = np.zeros((n,n), dtype=np.float64) 4 cdef int i, j 5 for i in range(n): 6 for j in range(n): 7 result[i, j] = 2.4 * i ** 2 + 3.2 * i * j 8 return result
What the -syntax means is that a constraint is set on the result variable so that anything assigned to result will satisfy result.ndim == 2 and result.dtype == np.float64. It's perfectly ok to re-assign to arr as long as this constraint is satisfied, while assigning an array object with different type and/or number of dimensions will result in a run-time error.
The syntax above seems to be the preferred one currently; though it is not settled whether or not the keywords should be optional and a default order imposed, so that one could write np.ndarray(2, np.float64). Thoughts on this, as well as which order the arguments should have, is welcome.
Other syntax options include:
np.ndarray(ndim=2, dtype=np.float64). While visually "cleaner" and more in line with general Python code, this potentially looks confusing as it looks like the constructor of ndarray is being called.
Angle brackets, np.ndarray<2, np.float64>. This will be confusing to C++ and Java users since placing assumptions is not the same thing as creating a specialized type of a generic/template.
- A seperate keyword, something like
1 cdef np.ndarray arr = np.zeros(...) assume(dtype=np.float64, ndim=2)
1 cdef assume(np.ndarray, ndim=2, dtype=float.64) arr = np.zeros(...)
All of the suggestions above could also work in function argument lists, meaning that one can conveniently declare ones assumptions about function arguments directly in the function declaration. If one lets go of this, there are other options:
- Some form of seperate calls to a special function, i.e. something like
1 from cython import assume 2 ... 3 arr = assume(np.zeros((n, n), dtype=np.float64), np.ndarray, ndim=2, dtype=np.float64)
This has the advantage that it is parseable with plain Python (which could simply do the checks, but of course not do the optimizations). The disadvantage is that it is very difficult/impossible for Cython to make this match Python semantics exactly, because Python variables are not scoped in the same way as in C (consider placing the code above inside an if-test...)
What will be optimized?
The goal of this project is optimizing single-item array indexing. All NumPy indexing operations will be supported as usual, but slicing etc. will be handled differently and fall back to regular NumPy code rather than any direct buffer access. Figuring out whether one indexes a single item or not will of course happen compile-time.
This is the important case; and more than this is unlikely in the short run. On the "idea" account, it could be possible to optimize this:
1 for (x, y, z), value in ndenumerate(arr[n:m, p:q, r:s]): 2 ...
at compile-time into a triple for-loop in C going from n to m, p to q, and r to s. If contiguous-information/dimension ordering was provided one could even figure out the optimal nesting of the loops at compile-time.
Slices, picking etc.
It is proposed that indexing operations that return new array objects (slicing, picking) doesn't try to do anything smart -- they simply construct new ndarray objects (full Python objects). One can either fall back to the Python implementation of the indexing operators, or reimplement them by calls to the appropriate internal NumPy code (in the latter example one might be able to do some tricks to avoid the tuple packing and unpacking, I'm not sure and it's not a big issue).
The alternative to this would be to provide for generating plain C for loops to create from examples like this:
1 for cols in A[,:]: 2 for cell in cols: 3 print cell
but it seems to be beyond our reach for some time. Once automatic type inference is in place, one can return a subclass of ndarray containing compile-time information about the slicing one is performing, but this is very advanced. (Of course, a manual parse tree transform doing this specific case is possible but hardly worth it either.)