CEP 517: Typed memory views
- Status: Main idea accepted; details open for tuning
Started on implemented in Kurt Smith's Google Summer of Code: https://github.com/cython/cython/tree/gsoc-kurt-regraft.
PEP-3118 introduces a standard way of accessing binary array-like data of Python objects in a variety of layouts: Strided data, indirect data (pointers-to-arrays), and so on.
We provide native language support to access such buffers. This also provides a nice builtin array type in Cython, as well as a unified way of accessing data in Python objects, C pointers, and builtin arrays.
cdef int[:, :] x x = np.zeros((10, 10), dtype=np.intc) # acquire from Python object x = int[:10, :10]() # create instance of builtin array type; we do our own non-GIL refcounting for this
We also allow casting from C pointers, although we need to insert a bit of shape/stride information:
x = <int[:10:1, :10:10]>malloc(10 * 10 * sizeof(int)) ... free(x.data); x = None # both needed!
No matter how x is assigned to, it can be accessed through the same native code. Under the hood, x consists of a struct with information about pointer, strides, shape, access mode, etc.
- Unifies access to array-like data across C and Python -- same compiled code can operate on both
- Convenient way to get a native array type and get rid of malloc/free for native code
Cleaner interface for computational work than the current np.ndarray[int]
Necessary for efficient slicing and SIMD operations
Note: In the whole document we will use int as a placeholder for any primitive type + structs.
Can be assigned to None in the same way as cdef classes.
Fast indexing: x[3, 4]
Fast non-GIL slicing: y = x[3:10, None, :]. Passing a None slice has the same effect as in NumPy; a 1-length dimension is added.
Convenient copying emits optimal loop: x[...] = y. In SIMD we could go beyond this.
Fields: x.ndim, x.shape, x.strides, x.suboffsets export information from PEP-3118. Strides are given in number of bytes.
x.object returns the PEP-3118 object that holds a refcount to memory. However, x may be a sub-slice of the buffer returned; or None in the case of pointing to a C pointer or the builtin array type.
Access to base data pointer: x.data. It has the type char*. The reason for making this char* is that strides are in bytes and does not have to be a multiple of the size of the base type, so pointer arithmetic is in general a bad idea and one needs to think.
If the buffer has a corresponding "sane" C pointer type, x.typeddata is available. Examples:
cdef int[::1] x # x.typeddata has type int* cdef int[::1, ::follow] x # x.typeddata has type int* (strides guaranteed to be a multiple of sizeof(int)) cdef int[::indirect, ::1] x # x.typeddata has type int** # But neither of these have typeddata, due to presence of strides: cdef int[:] x cdef int[::indirect, :]
All operations honor @wraparound and @boundscheck directives. They can operate in nogil mode unless otherwise noted; if bounds-checking needs to raise an exception then the GIL will be acquired using with gil.
Resizing might never happen, as PEP 3118 does not support it (though we could make custom extensions to the API...). But we could provide utility functions to concatenate arrays etc. into new buffers.
It seems clear that we want to have full flexibility in passing typed memory views around the code. One should be free to say
cdef class A: cdef int[:] x def f(self, obj): cdef int[:] buf = obj self.x = buf[:10]
and this should be very efficient:
cdef f(int[:} x): ...
To make slices efficient, these don't enter Python space -- obj will never know that the slice is taken. Since we need to call PyBuffer_Release once our last reference disappears, we need our own reference counting -- or acquisition counting, just to keep them apart.
The levels are then:
An object containing the Py_buffer as well as an acquisition count. This is allocated on the heap, acquisition counted, and PyBuffer_Release is called when done. In the current implementation this is a Cython cdef class.
A stack/field-allocated struct __pyx_memviewslice that contains the unpacked shape, stride and buffer pointer, possibly re-sliced since acquisition. This is copied by value (taking care to update acquisition count in base object) when passing it to somebody else.
What about the GIL?
This remains to be hashed out. We want to support reassigning slice objects in situations such as
cdef int[:] = obj cdef int[:] curslice with nogil: for i in ... curslise = obj[i:i+10]
Since curslice could hold our sole reference to a Py_buffer, it may be necesarry to call PyBuffer_Release. One option is to automatically acquire the GIL in such cases, which are rather unlikely to crop up in user code. Another is to only allow this if we can infer that it is GIL-safe (because curslice is assigned to None above).
If we forget about Python buffers and only consider the builtin Cython array type, it would be very nice to allow using it in nogil mode without restrictions. We are then forced to do our own locking (one way or another) for the acquisition count; for instance it may be common for multiple threads to take seperate slices off the same master view. Still, a custom lock for this purpose is probably a lot faster than acquiring the GIL.
Coercion to Python object
A typed memory view should coerce to either a Python memoryview, or a more powerful Cython equivalent. We should not coerce back to the original object, because we may have resliced it. E.g., one is required to do np.asarray(mytypedview) to get a NumPy array.
The dimension specifiers
The number of dimensions are, at least for the time being, hard-coded (we could use ellipsis, int[...], to support arbitrary-dimensional arrays). Each dimension is given by a slice, which can specify data access mode.
Assume from cython.view import contig, indirect, strided, follow.
First dimension is contiguous: int[::contig, :] or int[::1, :]
Fortran-contiguous data (that is, x.strides == x.shape * sizeof(int)): int[::1, ::follow]
int**-style "ragged" arrays with contiguous pointer-array and contiguous data-array (although the shape must be rectangular, at least according to PEP-3118): int[::indirect & contig, ::1]
int**-style arrays where both the pointer-array and the data-array is strided: int[::indirect, ::1]
A 2D strided array containing pointers to single values: int[:, ::indirect].
There are two orthogonal questions for each dimension:
- Does the array (along the dimension in question) contain the data (/the rest of the array), or a pointer to the data?
- Is the array strided or not?
We can, e.g., have a strided array of pointers, a strided array of values, and so on.
Current state: There's contig, strided, follow, direct, indirect, full. Of these, full means to determine direct vs. indirect at runtime, while strided is a superset of contig.
These can then be combined using the & operator.
In some ways, this is less than ideal. Perhaps we can make sane constants for every possible combination...
Pure Python mode
The syntax is chosen partly because cython.int[:, :] can work in pure Python mode. This could, e.g., be a wrapper around a NumPy array, or we could rather easily write something in Cython to do the same if we can accept a Cython-compiled shadow module. Obviously, there would be no speed benefits, but it would be nice if one could write optionally compiled pure Python code.
What about current buffer support
The np.ndarray[int] support could either be kept fully or deprecated with a rather long deprecation cycle. Either way, it is better reimplemented as syntax candy on top of passing around both a typed memory view and an associated object. For cdef functions we could turn an argument of this type into two arguments.