Contents
Cython FAQ
- What is the relation between Cython and Pyrex? Are the barriers between the two based on technical direction? Differing goals?
- Can Cython generate C code for classes?
- Is it possible to make a cdef'd class that derives from a builtin Python type such as list?
- Can I place the output under the BSD licence, or does it have to be the Python licence as well?
- Why does ** on int literals not work (as it seems to do in Pyrex)?
- How do I pass string buffers that may contain null bytes to Cython?
- Can Cython create objects or apply operators to locally created objects as pure C code?
- Why does Cython not always give errors for uninitialized variables?
- How well is Unicode supported?
- How do I pass a Python string parameter on to a C library?
- Can I use builtins like len() with the C type char *?
- How can I interface numpy arrays using Cython?
- Is it possible to call my Python code from C?
- What is the difference between a .pxd and .pxi file? When should either be used?
- How do I access native Python file objects?
- How do I declare a global variable?
- How do I use 'const'?
- How do I implement a single class method in a Cython module?
- How can I run doctests in Cython code (pyx files)?
Cython FAQ
What is the relation between Cython and Pyrex? Are the barriers between the two based on technical direction? Differing goals?
Answer: Somewhat. Cython is much more open to extensions than Pyrex. Greg Ewing, the author of Pyrex, usually says that he is still "designing" Pyrex as a language, so he sometimes rejects, for design reasons, patches that solve practical problems in a practical way; such patches then find (or found) their way into Cython. Eventually these features may still make it into Pyrex in one form or another, but that usually means Greg refactors or rewrites them his own way, which first requires him to find the time to do so.
So Cython can afford to be more agile and advanced, although it is not always in line with future Pyrex versions. However, both Greg Ewing and the Cython developers make a reasonable effort to maintain compatibility.
Today, Cython is an advanced version of Pyrex that integrates several additions that never made it into mainline Pyrex, including:
- Conditional expressions (a if blah else b)
- List/set/dict comprehensions
- Optimized looping (for x in blah: is much faster in Cython)
- Compatibility with Python 3 (as well as Python 2.3 or later) without regenerating the C code
- Support for the new buffer protocol (PEP 3118), featuring efficient access to data structures in NumPy or PIL
The intention is for Cython to be, for the most part, a drop-in replacement for existing Pyrex code, though some changes to that existing code may be necessary. The immediate speed-up is generally worth the switch.
For you as a user, this means that with Cython you can write much cleaner and simpler code, because you can rely on Cython to optimise it for you in many ways that you do not have to care about. However, if you use Cython-specific syntax features (i.e. syntax elements that are not described in the documentation of Pyrex or Python), you may have to make minor syntactic changes to your code in the near or distant future if you want to go back to a future Pyrex version, or to support a future Cython version that follows a syntactic change made in Pyrex.
Cython used to follow a 4-digit versioning scheme that kept the corresponding Pyrex version in the first three digits. As most of Cython's development is now completely independent of what is going on with Pyrex, we have broken with this scheme.
Can Cython generate C code for classes?
Answer: Yes, these classes become fully fledged Python classes.
Is it possible to make a cdef'd class that derives from a builtin Python type such as list?
Answer: Yes it is. The only exception is the type PyStringObject (str), which can only be subtyped by Python classes (not cdef classes). This is considered a bug. However, you can subtype PyUnicodeObject instead.
Can I place the output under the BSD licence, or does it have to be the Python licence as well?
Answer: You can use the output of Pyrex/Cython however you like, and license it however you like (BSD, public domain, GPL, all rights reserved, or anything else).
More details: The Python License differs from, for example, the GPL used for GCC. GCC needs a special exception clause for its output because that output is *linked* against the library part of GCC, i.e. against GPL software, which would otherwise trigger the GPL restrictions.
Pyrex does nothing similar, and linking against Python is not restricted by the Python License, so the output belongs to the user, with no other rights or restrictions involved.
Also, all of the copyright holders of Pyrex/Cython have stated on the mailing list that people may use the output of Pyrex/Cython however they like.
Why does ** on int literals not work (as it seems to do in Pyrex)?
Answer: In Pyrex, ** on two integer operands returned a float, which was counter-intuitive (compared both to every other kind of binary operation in C and to the "expected" behavior from Python). We discovered it because it was causing errors (e.g. in functions that expected an integer value but got a float), and after much discussion decided that disabling this behavior was better than keeping it. Also, a**b will (silently) overflow as an int, or be inexact as a double, for all but very small values of b. If you *want* the old behavior, you can always write e.g. 13.0**5, where it is much clearer what is going on. In Pyrex you would have to write <int>(13**5) anyway, which looks rather strange.
This is the only case I can think of where valid Pyrex is not valid Cython.
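A minimal Python 3 sketch of the two points above: Python's own ** on ints stays exact, a float literal states the floating-point intent explicitly, and a fixed-width C int cannot hold large results. `ctypes.c_int32` is used only to mimic a 32-bit C int, and the helper name `as_c_int32` is hypothetical. Since signed overflow is undefined behavior in C, the wraparound shown is merely what common platforms do.

```python
import ctypes

# Two int operands: Python returns an exact, arbitrary-size int.
exact = 13 ** 5
# A float literal makes the floating-point intent explicit (the 13.0**5 idiom).
approx = 13.0 ** 5


def as_c_int32(value):
    """Hypothetical helper: keep only the low 32 bits, like a 32-bit C int."""
    return ctypes.c_int32(value).value


print(exact, approx)        # 371293 371293.0
print(as_c_int32(2 ** 31))  # -2147483648: 2**31 does not fit in 32 bits
```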
How do I pass string buffers that may contain null bytes to Cython?
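Answer: There is currently no way to pass such a buffer directly as a C type. A char * alone carries no length, so C code (and any conversion back to a Python string) treats the first null byte as the end of the data. You have to accept the string buffer as a Python object instead. To work with it efficiently as a char *data, Py_ssize_t size pair, declare the relevant functions of the Python C/API.
See the following example code: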
cdef extern from "Python.h":
    # Python 2 C-API; in Python 3 the equivalents are PyBytes_FromStringAndSize
    # and PyBytes_AsStringAndSize.
    object PyString_FromStringAndSize(char *v, Py_ssize_t len)
    int PyString_AsStringAndSize(object obj, char **buffer, Py_ssize_t* length) except -1

# Copy a string by converting it to C values and back.
cdef copyString(object input_string):
    cdef char *buffer
    cdef Py_ssize_t length
    # Fills in buffer and length; null bytes inside the data are kept.
    PyString_AsStringAndSize(input_string, &buffer, &length)
    return PyString_FromStringAndSize(buffer, length)  # return a new string object
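You can see why the length has to travel with the pointer without using Cython at all. The following plain Python 3 sketch uses ctypes (not the Cython API above) to read the same buffer twice: once as a C string and once as a (pointer, length) pair. The sample data is arbitrary.

```python
import ctypes

data = b"ab\x00cd"  # binary data with an embedded null byte

# A C buffer holding the bytes plus a terminating null byte.
buf = ctypes.create_string_buffer(data)
addr = ctypes.addressof(buf)

# Read as a C string (no length): stops at the first null byte.
as_c_string = ctypes.string_at(addr)
# Read as a (pointer, length) pair: every byte survives.
as_pair = ctypes.string_at(addr, len(data))

print(as_c_string)  # b'ab'
print(as_pair)      # b'ab\x00cd'
```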
Can Cython create objects or apply operators to locally created objects as pure C code?
Answer: No, you can't. For methods like __init__ and __getitem__, the Python calling convention is mandatory and identical for all objects. There is no way to avoid a Python call when constructing an object, except by calling __new__ and doing the initialisation manually as a workaround.
If the speed of object creation or access matters, it is recommended to use a struct together with free functions declared cpdef that operate on it. The code should be equivalent, but it avoids the Python calls completely.
To get an equivalent of the __new__() technique, you can define this PY_NEW macro in a C header file ("theheader.h"):
/* in FILE "theheader.h" */
/* __pyx_empty_tuple is internal to Cython-generated C code and may change between Cython versions */
#define PY_NEW(T) \
    (((PyTypeObject*)(T))->tp_new( \
        (PyTypeObject*)(T), __pyx_empty_tuple, NULL))
and then declare it as a Cython function as follows:
cdef class ExampleClass:
    cdef int _value
    def __init__(self):
        raise TypeError("This class cannot be instantiated from Python")

cdef extern from "theheader.h":
    # macro call to 't->tp_new()' for fast instantiation
    ExampleClass NEW_EXAMPLE_CLASS "PY_NEW" (object t)

cdef ExampleClass _factory():
    cdef ExampleClass instance
    instance = NEW_EXAMPLE_CLASS(ExampleClass)
    instance._value = 1  # __init__ was skipped, so set the C attributes here
    return instance
This is the fastest way to instantiate an extension type, and it does not call __init__. Note: while all Python object attributes are initialised to None, you have to take care of initialising the C attributes yourself. A factory function is a good place to do so.
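For comparison, here is the same factory pattern as a pure Python 3 sketch. This is only an analogue. `ExampleClass.__new__(ExampleClass)` skips `__init__` in the same way, but it goes through normal Python calls and brings none of the C-level speed of the macro above.

```python
class ExampleClass:
    def __init__(self):
        raise TypeError("This class cannot be instantiated from Python")


def _factory():
    # __new__ allocates the instance without running __init__.
    instance = ExampleClass.__new__(ExampleClass)
    # __init__ never ran, so the factory must initialise every attribute.
    instance._value = 1
    return instance


obj = _factory()
print(obj._value)  # 1
```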
Why does Cython not always give errors for uninitialized variables?
Answer: Cython performs some static checks for initialization before use at compile time, but these are very basic, as Cython has no definite knowledge of which code paths will be taken at runtime.
Consider the following:
def testUnboundedLocal1():
    if False:
        c = 1
    print c

def testUnboundedLocal2():
    print c
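With CPython, both functions raise a NameError. The first raises UnboundLocalError (a subclass of NameError), because the assignment makes c a local variable even though it never runs. The second raises:
NameError: global name 'c' is not defined
With Cython, the first variant prints "None" and the second variant leads to a compile-time error. Both behaviours differ from CPython's.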
This is considered a BUG and will change in the future.
How well is Unicode supported?
Answer: Unicode support is as good as CPython's, as long as you work with Python unicode objects. However, there is no C type for Unicode strings, and treating them as char * will not be correct.
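This short Python 3 illustration shows why treating Unicode text as char * goes wrong. Once the text is encoded (UTF-8 here, chosen only as an example), lengths count bytes rather than characters. Cutting the bytes at an arbitrary position can also split a character in half.

```python
text = "na\u00efve"           # "naïve": 5 characters
utf8 = text.encode("utf-8")   # "ï" takes 2 bytes in UTF-8

print(len(text), len(utf8))   # 5 6

# Truncating the bytes as if they were one char per character
# cuts "ï" in half, leaving invalid UTF-8.
try:
    utf8[:3].decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False
print(truncation_ok)          # False
```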
How do I pass a Python string parameter on to a C library?
Answer: It depends on the semantics of the string. Imagine you have this C function:
cdef extern from "something.h":
    int c_handle_data(char* data, int length)
For binary data, you can simply require byte strings at the API level, so that this will work:
def work_with_binary_data(binary_data):
    c_handle_data(binary_data, len(binary_data))
It will raise an error (with a message that may or may not suit your use case) if users pass anything other than a byte string.
For textual data, however, you have to handle unicode data input. What you do with it depends on what your C function accepts. For example, if it requires UTF-8 encoded byte sequences, this might work:
def work_with_text_data(text):
    if not isinstance(text, unicode):
        raise ValueError("requires text input, got %s" % type(text))
    utf8_data = text.encode('UTF-8')
    c_handle_data(utf8_data, len(utf8_data))
Note that this also accepts subtypes of the Python unicode type. Declaring the argument as "unicode text" in the function signature would not cover this case.
The above is the right thing to do in Python 3. However, some (not all, just some) module APIs may become more user-friendly in Python 2.x if you additionally allow well-defined byte strings. For example, it may make sense to allow plain ASCII strings in some cases, as they are often used for textual data in Python 2.x programs. This could be done as follows:
# Older Cython versions: from python_version cimport PY_MAJOR_VERSION
from cpython.version cimport PY_MAJOR_VERSION

def work_with_text_data(text):
    if isinstance(text, unicode):  # most common case first
        utf8_data = text.encode('UTF-8')
    elif (PY_MAJOR_VERSION < 3) and isinstance(text, str):
        text.decode('ASCII')  # trial decoding, or however you want to check for plain ASCII data
        utf8_data = text
    else:
        raise ValueError("requires text input, got %s" % type(text))
    c_handle_data(utf8_data, len(utf8_data))
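In Python 3 the unicode type is called str, so the text-only variant reduces to the sketch below. It is plain Python rather than Cython. `c_handle_data` here is a hypothetical stand-in that only records its arguments, so the behaviour can be checked without a C library.

```python
def c_handle_data(data, length):
    """Hypothetical stand-in for the C function: records what it receives."""
    c_handle_data.calls.append((data, length))


c_handle_data.calls = []


def work_with_text_data(text):
    # isinstance() also accepts subclasses of str.
    if not isinstance(text, str):
        raise ValueError("requires text input, got %s" % type(text))
    utf8_data = text.encode("UTF-8")
    c_handle_data(utf8_data, len(utf8_data))


class Label(str):
    """A str subclass, accepted like any other text."""


work_with_text_data("h\u00e9llo")  # 5 characters, 6 UTF-8 bytes
work_with_text_data(Label("ok"))
print(c_handle_data.calls)  # [(b'h\xc3\xa9llo', 6), (b'ok', 2)]
```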
Can I use builtins like len() with the C type char *?
Answer: Yes you can. But the generated code will be inefficient, as len() only works on Python strings, so a temporary object needs to be created. It's better to use strlen() instead.
How can I interface numpy arrays using Cython?
Answer: Follow the example: http://wiki.cython.org/WrappingNumpy
Is it possible to call my Python code from C?
Answer: Yes, easily. Follow the example in Demos/callback/ in the Pyrex sources.
What is the difference between a .pxd and .pxi file? When should either be used?
SHORT Answer: One should (almost) always use .pxd files.
MEDIUM Answer: .pxd files are lists of declarations, while .pxi files are textually included; using .pxi files for declarations is a historical artifact of the way common declarations were shared before .pxd files existed.
LONG Answer: A .pxd file is a declaration file, and is used to declare classes, methods, etc. in a C extension module (typically as implemented in a .pyx file of the same name). It can contain declarations only, i.e. no executable statements. One can cimport things from .pxd files just as one would import things in Python. Two separate modules cimporting from the same .pxd file will receive identical objects.
A .pxi file is an include file. It is textually included (similar to the C #include directive) and may contain any Cython code that is valid at the point of inclusion. It may contain implementations (e.g. common cdef inline functions), which will be copied into every file that includes it. For example, if a class A is declared in a.pxi and both b.pyx and c.pyx include a.pxi, there will be two distinct classes, b.A and c.A. Interfaces to C libraries (including the Python/C API) have usually been declared in .pxi files, as they are not associated with a specific module. A .pxi file is also re-parsed every time it is included.
Now that "cimport *" can be used, there is no reason to use .pxi files for external declarations.
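The practical difference can be mimicked in plain Python 3, as an analogy only; this is not how Cython implements either mechanism. Textual inclusion behaves like executing the same source text in each module's namespace. Cimporting behaves like referring to one shared module. `SHARED_SOURCE` and the namespace names are hypothetical.

```python
import types

# Hypothetical contents of a shared a.pxi / a.pxd, as source text.
SHARED_SOURCE = "class A:\n    pass\n"

# Textual inclusion: b.pyx and c.pyx each compile their own copy.
b_ns, c_ns = {}, {}
exec(SHARED_SOURCE, b_ns)
exec(SHARED_SOURCE, c_ns)
print(b_ns["A"] is c_ns["A"])  # False: two distinct classes, b.A and c.A

# Shared module: both users refer to the same object.
shared = types.ModuleType("a")
exec(SHARED_SOURCE, shared.__dict__)
b_A = shared.A
c_A = shared.A
print(b_A is c_A)  # True
```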
How do I access native Python file objects?
Answer: See this small example of how to access Python file objects (Python 2 only, as PyFile_AsFile() no longer exists in Python 3):
# Idiom for accessing Python files.
# First, declare the C stdio pieces that are needed:
cdef extern from "stdio.h":
    ctypedef struct FILE
    int fprintf(FILE* stream, char* format, ...)

# Next, declare the Python C-API function that returns a file's FILE*:
cdef extern from "Python.h":
    FILE* PyFile_AsFile(object)

# Next, enter the builtin file class into the namespace:
cdef extern from "fileobject.h":
    ctypedef class __builtin__.file [object PyFileObject]:
        pass

# Now declare the C function that requires a file:
cdef void c_printSomething(FILE* outFile, char* text):
    fprintf(outFile, "%s", text)

# Now create a class or some other definition that uses the function:
cdef class ExampleUsingFile:
    def printSomething(self, file outFile, char* text):
        c_printSomething(PyFile_AsFile(outFile), text)
A simple test, assuming the module above is compiled as file_example:
import sys
import file_example
x = file_example.ExampleUsingFile()
x.printSomething(sys.stdout, "hello world!\n")
How do I declare a global variable?
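Answer: As in Python, use the global statement inside any function that assigns to the module-level variable:
global variable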
How do I use 'const'?
Answer: Older Cython versions do not support const directly (newer versions do), but you can still get it into the generated C code:
cdef extern from *:
    ctypedef char* const_char_ptr "const char*"

cdef public void foo_c(const_char_ptr s):
    print s
This will generate the following C declaration:
__PYX_EXTERN_C DL_EXPORT(void) foo_c(const char* __pyx_v_s);
How do I implement a single class method in a Cython module ?
Answer: Cython-defined functions are always unbound, regardless of where they are referenced; assigning one to a class attribute does not make it a method. Because of this, the following does not work:
import cython_module

class A(object):
    method = cython_module.optimized_method
The method is unbound, and trying to call it results in an error:
>>> a = A()
>>> a.method()
exceptions.TypeError: optimized_method() takes exactly one argument (0 given)
You have to create a bound method explicitly (note that this three-argument form of types.MethodType exists only in Python 2):
import types
import cython_module

class A(object):
    pass

A.method = types.MethodType(cython_module.optimized_method, None, A)
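Here is a Python 3 sketch of two alternatives. `repr` stands in for `cython_module.optimized_method` because builtin functions, like the Cython functions described above, do not bind to instances. The stand-in and the class name are assumptions made for illustration.

```python
import types

# Stand-in for cython_module.optimized_method (hypothetical): a builtin
# function, which does not turn into a bound method on attribute access.
optimized_method = repr


class A(object):
    broken = optimized_method  # a.broken() calls repr() with no argument

    def method(self, *args, **kwargs):
        # A Python function does bind, so self is forwarded explicitly.
        return optimized_method(self, *args, **kwargs)


a = A()
print(a.method() == repr(a))  # True

# Alternative: bind the function to one particular instance.
bound = types.MethodType(optimized_method, a)
print(bound() == repr(a))     # True
```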
How can I run doctests in Cython code (pyx files)?
Answer: A problem with doctest is that it uses inspect.isfunction to check whether something is a function. That check fails for Cython functions, for which inspect.isbuiltin returns True instead.
This module (let's call it "cydoctest") offers a Cython-compatible workaround.
"""
Cython-compatible wrapper for doctest.testmod().

Usage example, assuming a Cython module mymod.pyx is compiled.
This is run from the command line, passing a command to Python:
python -c "import cydoctest, mymod; cydoctest.testmod(mymod)"

(This still won't let a Cython module run its own doctests
when called with "python mymod.py", but it's pretty close.
Further options can be passed to testmod() as desired, e.g.
verbose=True.)
"""

import doctest
import inspect
import sys

def _from_module(module, obj):
    """
    Return true if the given object is defined in the given module.
    """
    if module is None:
        return True
    elif inspect.getmodule(obj) is not None:
        return module is inspect.getmodule(obj)
    elif inspect.isfunction(obj):
        return module.__dict__ is obj.__globals__  # func_globals in old Python 2
    elif inspect.isclass(obj):
        return module.__name__ == obj.__module__
    elif hasattr(obj, '__module__'):
        return module.__name__ == obj.__module__
    elif isinstance(obj, property):
        return True  # [XX] no way to be sure.
    else:
        raise ValueError("object must be a class or function")

def fix_module_doctest(module):
    """
    Extract docstrings from Cython functions, which doctest would
    otherwise skip.
    """
    module.__test__ = {}
    for name in dir(module):
        value = getattr(module, name)
        if (inspect.isbuiltin(value) and isinstance(value.__doc__, str)
                and _from_module(module, value)):
            module.__test__[name] = value.__doc__

def testmod(m=None, *args, **kwargs):
    """
    Fix a Cython module's doctests, then call doctest.testmod().

    All other arguments are passed directly to doctest.testmod().
    """
    if m is None:
        m = sys.modules['__main__']  # same default as doctest.testmod()
    fix_module_doctest(m)
    return doctest.testmod(m, *args, **kwargs)
To accompany the bug tracker: https://answers.launchpad.net/cython/
