C arrays deserve better language support
Note: This CEP should possibly be merged into the Cython array CEP.
Original proposal by Brian Granger.
Things that could be improved:
- dynamically allocated and deallocated arrays. Example:
    def func(size):
        cdef int a[size]
- array iteration and Python type coercion. Example:
    cdef int a[10]     # any fixed size; 10 is only illustrative
    # fill a
    l = list(a)

    def foo(L):
        cdef int a[len(L)]
        a = L              # unpacking code generated
        for x in a:        # for loop and indexing generated
            print x
        L = a              # python list created and filled

  StefanBehnel: while I like the rest, the last line should rather be L = list(a).
- in functions that hold the GIL (i.e. that are not declared "nogil") this could use Python's own malloc functions instead of a plain system malloc(), as it is usually faster for small amounts of memory.
- Arrays declared in this manner are allocated at declaration time (which currently means not inside a block) and cannot be resized.
- The behavior of such arrays is as if they were allocated on the stack. This makes life much simpler, and is consistent with constant-sized arrays, but means they cannot be returned, assigned, etc.
The general consensus is that, especially as we support constant-sized arrays with this syntax, this will be a good thing to have.
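The intended semantics (a length chosen at run time, unpacking from a list, iteration, and explicit coercion back to a list) can be tried out today from plain Python with ctypes. This is only an analogy, not the proposed implementation: ctypes allocates on the heap and is much slower than generated C code, and the function name below is made up for illustration.

```python
import ctypes

def func(values):
    size = len(values)
    IntArray = ctypes.c_int * size   # C int array type with a run-time length
    a = IntArray(*values)            # counterpart of "a = L": unpack into the array
    total = 0
    for x in a:                      # iteration yields Python ints
        total += x
    return list(a), total            # explicit coercion, as in list(a)

as_list, total = func([1, 2, 3])
```

As in the proposal, the array length is fixed once the array exists; only the elements can be changed.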
It is natural to want to pass arrays around, assign them to variables, etc. as if they were atomic objects rather than pointers. Currently one has to manually use malloc and free (which is often done incorrectly, especially in the case of error recovery). The obvious way to do this is to tie into the Python memory management system. An example of this is given by Lisandro Dalcin:
    cdef extern from "Python.h":
        object PyString_FromStringAndSize(char*, Py_ssize_t)
        char* PyString_AS_STRING(object)

    cdef inline object pyalloc_i(int size, int **i):
        if size < 0: size = 0
        cdef Py_ssize_t n = size * sizeof(int)
        cdef object ob = PyString_FromStringAndSize(NULL, n)
        i[0] = <int*> PyString_AS_STRING(ob)  # hand the buffer back through the out-parameter
        return ob

    def foo(sequence):
        cdef int size = len(sequence)
        cdef int *buf = NULL
        cdef object tmp = pyalloc_i(size, &buf)  # buf stays valid as long as tmp is alive
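The same ownership idea, a buffer that belongs to a refcounted Python object and is released automatically even on error paths, can be sketched in plain Python. This is a rough analogue rather than the code above: it assumes a bytearray as the owning object and a memoryview cast to the native int format as the typed view, and it reuses the names pyalloc_i and foo only for illustration.

```python
import struct

def pyalloc_i(size):
    """Return (owner, view): a Python-owned buffer and an int view onto it."""
    if size < 0:
        size = 0
    itemsize = struct.calcsize("i")        # size of a native C int
    owner = bytearray(size * itemsize)     # zero-filled, freed by refcounting
    view = memoryview(owner).cast("i")     # fixed-length view of C ints
    return owner, view

def foo(sequence):
    owner, buf = pyalloc_i(len(sequence))
    for k, value in enumerate(sequence):
        buf[k] = value                     # stored as C ints in the buffer
    return sum(buf)                        # no explicit free is needed

result = foo([10, 20, 30])
```

No cleanup code is written anywhere: when the last reference to the buffer goes away, the memory is released, which is the property the proposal wants to build into the language.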
It may be worthwhile to add such functionality to the language itself, or to offer it via a provided extension type (with possible syntactic sugar). The returned object would be a special array type, assignable only to arrays of the same type, refcounted, with fast (1-dimensional) indexing, iteration, etc. It would be much more lightweight (and potentially faster, with fewer dependencies) than using, for example, NumPy. Maybe we could support a fast append too.
What syntax would we use? In light of the recent buffer interface, what about carray[int] for the type and carray[int](size) as a constructor? Another alternative is [int].
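To get a feel for how carray[int] and carray[int](size) would read in user code, here is a pure-Python mock-up. Everything in it is hypothetical: the class, its element-type table, and the choice of array.array as storage are assumptions made for illustration, and a real implementation would be generated C code rather than a Python class.

```python
from array import array

class carray:
    """Hypothetical stand-in for the proposed type; not part of Cython."""

    _typecodes = {int: "i", float: "d"}   # assumed element-type mapping
    _bound = {}                           # cache so carray[int] is a single type

    def __class_getitem__(cls, item):
        # carray[int] returns a subclass bound to one element type
        if item not in carray._bound:
            carray._bound[item] = type(
                "carray[%s]" % item.__name__, (carray,),
                {"typecode": carray._typecodes[item]},
            )
        return carray._bound[item]

    def __init__(self, size):
        # zero-filled storage of C values, owned by this refcounted object
        self._data = array(self.typecode, [0]) * size

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value

    def __iter__(self):
        return iter(self._data)

    def append(self, value):
        self._data.append(value)


a = carray[int](3)   # constructor syntax from the proposal
a[0] = 5
a.append(7)          # the "fast append" mentioned above
```

Because carray[int] always evaluates to the same type object, a check such as isinstance(other, carray[int]) would be one way to enforce the rule that such arrays are assignable only to arrays of the same type.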
Would it be possible to define

    cdef inline void* __malloc__(int bytes): return sage_malloc(bytes)
    cdef inline void __free__(void* ptr): sage_free(ptr)

at the top-level Cython module scope in order to override the allocator and use sage_malloc instead?