SIMD array operations
This proposal is about adding SIMD behaviour/elementwise behaviour to the arrays proposed in CEP 517, in arithmetic operators and function calls.
Noone stands ready to implement this at the moment, it mainly serves as a potential future development direction in order to discuss the merits of CEP 517. Discussing minute details in this proposal should therefore probably wait.
What will this make possible that is currently not possible?
The idea is to turn a longer expression into a single pass over the data. In many situations this can reduce memory bus traffic over the common Python approach (which is to evaluate each subexpression one by one into temporary buffers).
There seems to be only two possible ways of achieving this at compile-time. One is building it into the language (proposed here). The other would be using the powerful metaprogramming capabilities of C++ and wrap a C++ library. However to get the nice interopability with Python, such a C++ library would need to be pretty much custom-written for use by Cython.
cdef extern from "math.h": double sqrt(double) cdef double[:] radius(double[:] x, double[:] y): return sqrt(x*x + y*y)
This would correspond to a loop containing roughly tmp[idx] = sqrt(x[idx]*x[idx] + y[idx]*y[idx]) (though implementations could use faster methods).
- The operations are element-wise, so that the result is the same length as x and y. x and y must in general have the same length (but see "broadcasting" below).
- Element-wise operations have Cython semantics (Cython semantics for integer division; one cannot add two pointers; etc.)
For function calls like sqrt above, functions must have arguments typed as the base-type of the array. If they take object this is interpreted as the entire array coerced to an object, not element-wise behaviour.
- Side-effects in functions calls do (of course) happen once for each element in the array -- this is just part of the convention
The NumPy broadcasting rules will apply. This deals with how situations is handled for array with different sizes. The basic idea is that arrays with lower dimensionality gets added 1-length dimensions, and dimensions of length 1 can be repeated. See e.g.:
Doing arithmetic with one array and one scalar is seen as a special case of this; the scalar operates with every element in the array.
Because the way numerical computations like this can vary depending on usecase and available compilers/CPU instructions/libraries we could
- Implement a reference implementation creating naive C loops, not caring too much about efficiency
- Make it possible to develop plugins with other backends for higher performance.
- For instance we could generate small snippets of Fortran code and link them in.
Mark Wiebe recently wrote a new and better iteration API for NumPy: https://github.com/numpy/numpy/blob/master/doc/neps/new-iterator-ufunc.rst