Proposal for a new buffer syntax

This proposal is obsolete; please see the Cython array type CEP.

What is the problem?

The current syntax is e.g.

cdef object[int, mode="fortran"] x
cdef np.ndarray[double, ndim=2] y

This comes from viewing the buffer syntax as an optimization of the Python [] operator in certain special cases. Any operation that cannot be optimized is passed to the underlying object. In addition, the type name controls the default access mode ("strided" vs. "indirect").

Advantages:

Disadvantages:

cdef object[int] a = np.arange(10)
cdef object[int] b
b = a[5:] # made efficient
print b[0] # prints 5

Now, which Python object does b refer to? The same one as a?

print b[<object>(0)] # huh, prints 0?

Or, perhaps None?

print b.foo() # crashes program
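
For comparison only, and not as part of this proposal: Python's built-in memoryview faces the same question when a view is sliced. The sketch below assumes Python 3.3 or later, where memoryview exposes its exporter as .obj. The array.array exporter and the helper name slice_view are illustrative choices, not anything defined by Cython.

```python
from array import array


def slice_view(exporter, start):
    """Return a view of the exporter's buffer that begins at element `start`."""
    return memoryview(exporter)[start:]


a = array('i', range(10))
b = slice_view(a, 5)

print(b[0])        # element 0 of the view is element 5 of a
print(b.obj is a)  # the sliced view still refers to the original exporter
```

Here the view and the owning object are kept apart: indexing goes through the sliced buffer, while the reference to the owner is unchanged by slicing.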

Proposed solution

The proposed solution is to introduce buffers as a first-class native type with a new syntax.

cdef int[:] buf = obj
print buf[2] # fast access
print obj.some_method()
print buf.some_method() # NOT ALLOWED!

The syntax would embed everything needed to optimize PEP 3118 buffer access, without knowing anything about the type of the underlying object (such as a NumPy array) and without allowing operations directly on the object that owns the buffer.
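
The separation between a buffer and its owner can be seen in plain Python as well. This is only an analogy using the built-in memoryview and an array.array owner (both chosen here for illustration), not the proposed Cython type:

```python
from array import array

obj = array('i', [10, 20, 30])
buf = memoryview(obj)  # acquire a PEP 3118 view of obj's buffer

print(buf[2])                  # element access goes through the buffer
print(hasattr(obj, "append"))  # the owner has its own methods
print(hasattr(buf, "append"))  # the view does not forward them
```

The view offers element access and layout information, but methods that belong to the owning object have to be called on that object.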

PEP 3118 allows a very wide class of buffer layouts. This class can be restricted in many ways, and almost any restriction can give a substantial speedup.
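
To make "layout" concrete, the following sketch prints the PEP 3118 fields that Python's memoryview exposes for a contiguous buffer and for a strided slice of it. The helper name describe and the array.array exporter are assumptions made for this example.

```python
from array import array


def describe(buf):
    """Collect the layout fields that a memoryview exposes."""
    return {
        "format": buf.format,
        "itemsize": buf.itemsize,
        "ndim": buf.ndim,
        "shape": buf.shape,
        "strides": buf.strides,
        "c_contiguous": buf.c_contiguous,
    }


data = array('i', range(10))
full_view = memoryview(data)   # contiguous: stride equals the item size
every_other = full_view[::2]   # strided: stride is twice the item size

print(describe(full_view))
print(describe(every_other))
```

Code that is known to receive only the contiguous case can step through memory with a fixed item size, whereas the general strided case has to multiply by a stride read at run time.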

It could work like this. Assume from cython import strided, contig, full, ptr:

Of course, not all of this needs to be supported at once.

The existing use case

An alternative to the existing syntax would be code like this:

from cython import shape # or cython.buffer
def mysum(int[:] arr):
    cdef int s = 0  # must be initialized before accumulating
    for i in range(shape(arr, 0)):
        s += arr[i]
    return s

Here int[:] is an alternative to object[int, ndim=1].
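
As a rough pure-Python analogue of what an int[:] argument implies, the sketch below acquires a view and checks the element type and dimensionality at run time. It uses the built-in memoryview; the strict comparison against the format string 'i' is a simplifying assumption, since exporters may report equivalent formats with a byte-order prefix.

```python
from array import array


def mysum(obj):
    """Sum a one-dimensional buffer of C ints through a memoryview."""
    buf = memoryview(obj)  # raises TypeError if obj exports no buffer
    if buf.ndim != 1 or buf.format != 'i':
        raise TypeError("expected a one-dimensional buffer of C ints")
    s = 0
    for i in range(buf.shape[0]):
        s += buf[i]
    return s


print(mysum(array('i', [1, 2, 3, 4])))
```

In the proposed syntax these checks would happen once, when the object is coerced to int[:], and the loop body would then compile to direct memory access.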

int[:,:,:] is three-dimensional, and so on. This makes a clear distinction from the C array syntax and looks more Pythonic. It also stays within the Python grammar.
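
For a sense of what a three-dimensional buffer looks like at the PEP 3118 level, here is a small sketch with Python's memoryview, assuming Python 3.5 or later for tuple indexing. The shape (2, 3, 4) is arbitrary, and the detour through a byte view is needed because memoryview.cast only reshapes to or from a byte format.

```python
from array import array

flat = array('i', range(24))

# Reinterpret the flat buffer as a C-contiguous 2 x 3 x 4 block of ints.
cube = memoryview(flat).cast('B').cast('i', (2, 3, 4))

print(cube.ndim)
print(cube.shape)
print(cube.strides)
print(cube[1, 2, 3])  # last element of the block
```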

int[:] accesses only the buffer, not the corresponding Python object. Coercion from an object acquires a buffer view. Coercion to an object is disallowed in earlier Python versions and gives a standard Python memoryview in newer versions. (Backports could also be done, though something like a __frombuffer__ operator in numpy.pxd, for efficient numpy.ndarray(buf) construction, would likely work better with less effort.)

Main differences from today, in the context of NumPy:

def f():
    cdef int[:] a = ..., b = ..., c
    c = a + b    # would not work before new features are implemented
    c = a[2:110] # would not work before new features are implemented
    print a.flags # nope

So, int[:] represents only the buffer and not the NumPy array object. Slicing and arithmetic on these

enhancements/buffersyntax (last edited 2009-06-17 15:41:13 by DagSverreSeljebotn)