CEP 506 - Parametrized types

Description

Parametrizing types has many potential uses; a few are outlined in the examples later in this document.

Syntax user side

The type name is followed by parentheses containing the arguments. Each argument must be either a compile-time expression (as documented on the Pyrex website) that resolves to a value of a primitive type, or a Cython type.

All arguments are named and have an order (this is forward-compatible with keyword-only arguments etc.), and can be optional or mandatory.

    DEF ND = 4
    def foo(numpy.ndarray(float, ND) bar1):
        cdef cpp.map(str, int) my_cppmap

See also "Syntax discussion" below.

Syntax declarator side

By adding new keywords liberally, one can end up with something like the following, although it is only a suggestion. The main point is that it is declarative. It is also supposed to be specified in the .pxd part of the declaration. Any declaration that leads to declaring a new C type (i.e. not simply an "object") can take arguments.

    cdef class Foo:
        cdef int objectvar

        typearguments:
            # Providing a default makes the argument optional
            cython.type dtype
            int strategy = 0

    ctypedef Foo Bar # Bar takes the same arguments

Subclasses can only append arguments, not remove or override anything in the parent list. This restriction might be relaxed over time if needed (for example, it may become possible to change the ordering so that new arguments come first).
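The append-only rule can be checked mechanically. The following is a minimal Python sketch of such a check, not part of any patch; the function name, the (type name, argument name) pair representation and the `ndim` argument are all assumptions made for illustration.

```python
def appended_arguments(parent_params, child_params):
    """Return the arguments a subclass adds, or raise if it alters the parent's.

    Each parameter is a (type_name, argument_name) pair; the child list must
    start with the parent list, unchanged and in the same order.
    """
    parent = list(parent_params)
    child = list(child_params)
    if child[:len(parent)] != parent:
        raise TypeError("subclass may only append type arguments")
    return child[len(parent):]


# Hypothetical parameter lists mirroring the Foo declaration.
FOO_PARAMS = [("cython.type", "dtype"), ("int", "strategy")]
SUB_PARAMS = FOO_PARAMS + [("int", "ndim")]  # "ndim" is an invented example

added = appended_arguments(FOO_PARAMS, SUB_PARAMS)
```

Reordering or dropping a parent argument makes the prefix comparison fail, so it is reported as an error.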

Effect

The effect is that the parser expects all mandatory arguments to be specified, and raises an error otherwise. The arguments are stored in the type (which will be a subtype of a special "unparametrized" root type) and can be retrieved in different ways throughout Cython's compilation process.
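As an illustration of the binding step described here, the Python sketch below matches positional and named arguments against a declared parameter list and rejects a missing mandatory argument. It is only a model of the intended behaviour: `TypeParam` and `bind_type_arguments` are hypothetical names, not Cython internals.

```python
from dataclasses import dataclass

_MISSING = object()  # sentinel marking a parameter without a default


@dataclass(frozen=True)
class TypeParam:
    """One declared type argument (hypothetical model)."""
    name: str
    default: object = _MISSING


def bind_type_arguments(params, args=(), kwargs=None):
    """Map each declared parameter name to a value, or raise TypeError."""
    kwargs = dict(kwargs or {})
    if len(args) > len(params):
        raise TypeError("too many type arguments")
    bound = {}
    # Positional arguments fill the declared parameters in order.
    for param, value in zip(params, args):
        if param.name in kwargs:
            raise TypeError(f"type argument {param.name!r} given twice")
        bound[param.name] = value
    # Remaining parameters come from named arguments or from defaults.
    for param in params[len(args):]:
        if param.name in kwargs:
            bound[param.name] = kwargs.pop(param.name)
        elif param.default is not _MISSING:
            bound[param.name] = param.default
        else:
            raise TypeError(f"missing mandatory type argument {param.name!r}")
    if kwargs:
        raise TypeError(f"unexpected type arguments: {sorted(kwargs)}")
    return bound


# Parameter list modelled on the Foo declaration: dtype is mandatory,
# strategy defaults to 0.
FOO_PARAMS = [TypeParam("dtype"), TypeParam("strategy", 0)]
bound = bind_type_arguments(FOO_PARAMS, (float,))
```

The resulting dictionary is what a compiler stage could attach to the parametrized type for later lookup.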

Type compatibility

Some rules must be made about how types can be converted. The following are the default rules; they can always be overridden through overloaded coercion operators. Suggestion:

    def myfunc(foo(a=2,b=4) arg): ...

    cdef foo(a=2) myfoo
    myfoo = ...
    myfunc(myfoo) # we probably want this to produce an error, not automatic conversion
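One way to read the suggestion is that two parametrized types are interchangeable only when the base type and every bound argument match. The Python sketch below models that strict reading; `ParamType`, `make_type` and `check_assignable` are hypothetical names, and the strict-equality rule is an assumption rather than a settled part of this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamType:
    """A base type name plus its bound type arguments (hypothetical model)."""
    base: str
    args: tuple  # sorted (name, value) pairs, so equal bindings compare equal


def make_type(base, **args):
    return ParamType(base, tuple(sorted(args.items())))


def check_assignable(declared, actual):
    """Raise TypeError unless both types have the same base and arguments."""
    if declared != actual:
        raise TypeError(f"cannot pass {actual} where {declared} is expected")


param_type = make_type("foo", a=2, b=4)  # parameter type of myfunc
myfoo_type = make_type("foo", a=2)       # declared type of myfoo

try:
    check_assignable(param_type, myfoo_type)
    outcome = "accepted"
except TypeError:
    outcome = "rejected"
```

An overloaded coercion operator would correspond to an explicit exception to this check.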

Use cases

Specific use cases are considered outside the scope of this spec itself; however, here are some examples and ideas for usage:

Overloaded methods

Since self is a parameter to member functions in Cython, implementing a form of function overloading would make it possible to provide different functionality depending on the argument type, while making it perfectly clear that the methods operate on instances of the same run-time class.

    cdef class Allocator:
        typearguments:
            int strategy

        def __init__(self, name): self.name = name

        cdef object newobj(Allocator(1) self):
            print "Strategy 1", self.name
            ...

        cdef object newobj(Allocator(2) self):
            print "Strategy 2", self.name
            ...

    >>> cdef Allocator(1) x
    >>> x = Allocator("instance A")
    >>> x.newobj()
    Strategy 1 instance A
    >>> cdef Allocator(2) y = x
    >>> y.newobj()
    Strategy 2 instance A
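In this example the overload is selected by the declared type of the variable, not by the run-time class, which is the same for x and y. A rough Python model of that dispatch is sketched below; the overload table and the `TypedRef` wrapper (standing in for a cdef variable with a declared type) are assumptions made for illustration only.

```python
class Allocator:
    def __init__(self, name):
        self.name = name


# Hypothetical overload table: declared type argument -> implementation.
_NEWOBJ_OVERLOADS = {}


def overload(strategy):
    """Register a newobj implementation for one value of the type argument."""
    def register(func):
        _NEWOBJ_OVERLOADS[strategy] = func
        return func
    return register


@overload(1)
def _newobj_strategy1(self):
    return f"Strategy 1 {self.name}"


@overload(2)
def _newobj_strategy2(self):
    return f"Strategy 2 {self.name}"


class TypedRef:
    """Stands in for a cdef variable: an object plus its declared strategy."""

    def __init__(self, obj, strategy):
        self.obj = obj
        self.strategy = strategy

    def newobj(self):
        # Dispatch on the declared type argument, not on the object itself.
        return _NEWOBJ_OVERLOADS[self.strategy](self.obj)


instance = Allocator("instance A")
x = TypedRef(instance, 1)  # cdef Allocator(1) x
y = TypedRef(x.obj, 2)     # cdef Allocator(2) y = x
```

Both references wrap the same object; only the declared strategy differs, so in a real implementation the choice could be made entirely at compile time.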

This could also be combined with method templates.

C++ template support

The type arguments would take the role of providing a way to instantiate the templates Cython-side. For outputting C++ code using templates one would need special support for it in the Cython compiler and extra syntax to "use" the type arguments as C++ template arguments.

Discussion

Stefan Behnel: Why not use Py3k type annotations instead of introducing yet another new syntax?

http://www.python.org/dev/peps/pep-3107/

Dag Sverre: The two are separate things. In fact, if anything, this proposal is a prerequisite for good use of that PEP in Cython. It specifies what constitutes a type, not in what position the type is declared. It would not make sense to try to cram every possible future feature into this spec, and so I use example code with the currently supported syntax: one spec per change, and PEP-3107 belongs to another spec (in fact, the code lines that need to change for the two won't overlap at all).

I totally agree with making use of PEP-3107 (while keeping the old syntax for backwards compatibility).

Hierarchy

Robertwb: Note that there are different levels of type specification. For example, if I have a numpy array, I may know just the type, or the type and dimension, or the type and dimension and size at compile time. We should be able to handle all of these cases. (Essentially, we can think of numpy.ndarray(numpy.uint8), numpy.ndarray(numpy.uint8, dim=2), and numpy.ndarray(numpy.uint8, 2, 10) as distinct types.)

Syntax discussion

Parametrized types with () can potentially be confusing. Does numpy.ndarray(2) mean an array with 2 dimensions, or an attempt to pass 2 to the ndarray constructor? It depends on context. And what should one do to pass something to the constructor of a parametrized type?

In most cases considered, this is not a problem for Cython, as there is a difference between "type context" and "runtime context". However, it could become a problem if, for instance, C++ templates are wrapped: how would one call the constructor of a C++ vector of ints, specifying 10 elements? cpp.vector(int)(10)?

Also, from a pure usability perspective, perhaps the () is more difficult to learn because it looks like constructor syntax? When the () syntax was used in an example for the NumPy community, the initial response was that the numpy.ndarray constructor should take the same arguments as usual (though after an explanation that it appears in a type context, the person in question was perfectly OK with it).

Options:

  1. Use double (), like this: myarray(int)(4, 4). The first () resolves the type, and the second () goes to the constructor. For empty type parameter list, one still has to call (), i.e. type_with_optional_params()(4,4).

  2. Use [] instead for type arguments. This is in line with discussions in the Python community on generic types (and comes from Guido using [] in a syntax example on his blog; however, this is not an official Python direction, see below). So one could allocate using myarray[int](4,4). A consequence is that cdef variables use [] instead:

    cdef numpy.ndarray[numpy.int64, 2] myarr = numpy.ndarray([2,2])

Guido's blog post: http://www.artima.com/weblogs/viewpost.jsp?thread=86641 followed up with http://www.artima.com/weblogs/viewpost.jsp?thread=87182
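For comparison, plain Python (3.7 and later) can already express the shape of option 2 through the `__class_getitem__` hook. The sketch below only illustrates how [] can resolve a type while () calls the constructor; `MyArray`, its `dtype` attribute and the cache are hypothetical and do not describe how Cython would implement this.

```python
_PARAMETRIZED = {}  # cache so that MyArray[int] always yields the same type


class MyArray:
    """Toy container acting as the unparametrized root type."""

    dtype = None  # no element type until the class is parametrized

    def __init__(self, *shape):
        # Runtime context: ordinary constructor arguments.
        self.shape = shape

    def __class_getitem__(cls, dtype):
        # Type context: build (or reuse) a subtype carrying the argument.
        key = (cls, dtype)
        if key not in _PARAMETRIZED:
            name = f"{cls.__name__}[{dtype.__name__}]"
            _PARAMETRIZED[key] = type(name, (cls,), {"dtype": dtype})
        return _PARAMETRIZED[key]


arr = MyArray[int](4, 4)  # [] selects the type, () calls its constructor
```

Here the parametrized type is a subclass of the unparametrized one, so `isinstance(arr, MyArray)` still holds.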

DagSverreSeljebotn: I must say I very much prefer the latter one -- it is always clear what is a type, what is a call to a constructor, and what it means to declare something of a type vs. calling a type for construction.

robertwb: Hmm... the type(params) seems more natural to me, but in this context it does have issues. I'll have to give this more thought.

Work so far

A primitive patch can be downloaded from http://heim.ifi.uio.no/dagss/cython-typeargs1.diff ; it makes some small changes to the parser etc.

Known issues:

enhancements/typeparameters (last edited 2008-04-13 11:23:29 by DagSverreSeljebotn)