Contents
- Cython FAQ
- What is the relation between Cython and Pyrex? Are the barriers between the two based on technical direction? Differing goals?
- Is Cython a Python implementation?
- Is Cython faster than CPython?
- Can Cython generate C code for classes?
- Is it possible to make a cdef'd class that derives from a builtin Python type such as list?
- Can I place the output under the BSD license, or does it have to be the python-license as well?
- Why does ** on int literals not work (as it seems to do in Pyrex)?
- How to pass string buffers that may contain 0 bytes to Cython?
- Can Cython create objects or apply operators to locally created objects as pure C code?
- Why does Cython not always give errors for uninitialized variables?
- How well is Unicode supported?
- How do I pass a Python string parameter on to a C library?
- Can I use builtins like len() with the C type char *?
- How can I interface numpy arrays using Cython?
- Is it possible to call my Python code from C?
- What is the difference between a .pxd and .pxi file? When should either be used?
- How do I access native Python file objects?
- How do I declare a global variable?
- How do I assign to a global variable?
- How do I use 'const'?
- How do I implement a single class method in a Cython module?
- How can I run doctests in Cython code (pyx files)?
- How to wrap C code that uses the restrict qualifier?
- Why does a function with cdef'd parameters accept None?
- How do I work around the "unable to find vcvarsall.bat" error when using MinGW as the compiler (on Windows)?
- How do I work around the -Wno-long-double error when installing on OS X?
- How do I use variable args?
Cython FAQ
What is the relation between Cython and Pyrex? Are the barriers between the two based on technical direction? Differing goals?
Answer: Somewhat. Cython is much more open to extensions than Pyrex. Greg usually says that he's still "designing" Pyrex as a language, so he will sometimes reject patches for design reasons that solve practical problems in a practical way, and that therefore find (or found) their way into Cython. Eventually, these features might still make it into Pyrex in one way or another, but that usually means that Greg refactors or rewrites them his own way, which implies that he first has to find the time to do so.
So Cython can afford to be more agile and advanced, but is not always in line with future Pyrex versions. However, both Greg Ewing and the Cython developers make a reasonable effort to maintain compatibility.
Today, Cython is an advanced version of Pyrex that has several additions already integrated that never made it into mainline Pyrex, including:
- Conditional expressions (a if blah else b)
- List/set/dict comprehensions
- Optimized looping (for x in blah: is much faster in Cython)
- Compatibility with Python 3 (as well as Python 2.3 or later) without regenerating the C code
- Support for the new buffer protocol (PEP 3118), featuring efficient access to data structures in NumPy or PIL
The intention is to make it for the most part a drop-in replacement for existing Pyrex code, though some changes to that existing code may have to be made. The immediate speed-up is generally worth the switch.
To you as a user, this means that if you use Cython today, you can write much cleaner and simpler code, as you can rely on Cython to optimise it in many ways that you do not have to care about. But if you use Cython-specific syntax features (i.e. syntax elements that are not described in the documentation of Pyrex or Python), you may have to make minor syntactic changes to your code at some point if you want to go back to a future Pyrex version. In general, however, both Pyrex and Cython try to adhere to the existing Python syntax as closely as possible, so these cases should be rare.
In early versions, Cython used to follow a 4-digit versioning scheme that keeps the corresponding Pyrex version in the first three digits. As most of the development in Cython is now completely independent from what is going on with Pyrex, we have broken with this scheme, so Cython versions are now unrelated to Pyrex versions.
Is Cython a Python implementation?
Not officially, no. However, it compiles quite a lot of normal Python code, which gets it pretty close to a real Python implementation. In any case, it is an official goal for Cython 1.0 to compile regular Python code and run (most of) the normal Python test suite - obviously faster than CPython.
Is Cython faster than CPython?
For most things, yes. For example, Cython can compile most of pybench by now, and runs it more than 30% faster in total, while being 60-90% faster on control structures like if-elif-else and for-loops.
However the main advantage of Cython is that it scales very well to even greater performance requirements. For numerical code, speed-ups of 100-1000 times compared to CPython are not unusual, and are achieved by simply adding static type declarations to performance critical parts of the code, thus trading Python's dynamic typing for speed. As this can be done at any granularity in the code, Cython makes it easy to write simple Python code that is fast enough, and just tune the critical 5% of your code into maximum performance by using static C types in just the right places.
Can Cython generate C code for classes?
Answer: Yes, these classes become fully fledged Python classes.
Is it possible to make a cdef'd class that derives from a builtin Python type such as list?
Answer: Yes it is. The only exception is the type PyStringObject (str), which can only be subtyped by Python classes (not cdef classes). This is considered a bug. However, you can subtype PyUnicodeObject instead.
Can I place the output under the BSD license, or does it have to be the python-license as well?
Answer: You can use the output of Pyrex/Cython however you like (and license it how you like - be it BSD, public domain, GPL, all rights reserved, whatever).
More details: The Python License is different from the GPL used for GCC, for example. GCC requires a special exception clause for its output as it is *linked* against the library part of GCC, i.e. against GPL software, which triggers the GPL restrictions.
Pyrex doesn't do anything similar, and linking against Python is not restricted by the Python License, so the output belongs to the user, with no other rights or restrictions involved.
Also, all of the copyright holders of Pyrex/Cython have stated on the mailing list that people are allowed to use the output of Pyrex/Cython however they would like.
Why does ** on int literals not work (as it seems to do in Pyrex)?
It works as expected in recent versions of Cython.
In older versions, it was considered counter-intuitive that a binary operation on two integer types returned a float (both compared to every other kind of binary operation in C and to the "expected" behavior from Python). We discovered it because it was causing errors (e.g. in functions that were expecting an integer value but getting a float), and after much discussion decided that disabling this behavior was better than letting it go. Also, a**b will (silently) overflow as an int or be inexact as a double except for very small values of b. If one *wants* the old behavior, one can always write, e.g., 13.0**5, where it is much clearer what's going on. One would have to write <int>(13**5) in Pyrex anyway, which looks kind of strange.
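The overflow and precision concerns above can be seen from plain Python. This is only a sketch: it uses the standard ctypes module to mimic what a 32-bit C int would hold, and the particular numbers (13**9 and 3**40) are chosen here purely for illustration.

```python
import ctypes

# Python ints have arbitrary precision, so this value is exact.
exact = 13 ** 9

# A 32-bit C int cannot hold it; ctypes silently truncates, much like C would.
as_c_int32 = ctypes.c_int32(exact).value

# A C double has a 53-bit mantissa, so large integer powers become inexact.
big = 3 ** 40
as_double = float(big)

print(exact, as_c_int32)      # the two values differ
print(big, int(as_double))    # the two values differ
print(13.0 ** 5)              # a float result, requested explicitly
```

Writing the float literal, as in the last line, makes the intended result type visible to the reader.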
How to pass string buffers that may contain 0 bytes to Cython?
You need to use either a Python byte string object or a char*/length pair of variables.
The normal way to convert a char* to a Python byte string is as follows:
cdef char* s = "a normal C byte string"
a_python_byte_string = s
However, this will not work for C strings that contain 0 bytes, as a 0 byte is the normal C way of terminating a string. So the above method will cut the string at the first 0 byte. To handle this case correctly, you have to specify the total length of the string that you want to convert:
cdef char* s = "an unusual \0 containing C byte string"
a_python_byte_string = s[:21]    # take the first 21 bytes of the string, including the \0 byte
Note that this will not handle the case that the specified slice length is longer than the actual C string.
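The difference between "read up to the first 0 byte" and "read an explicit number of bytes" can be tried out without compiling anything. The following sketch uses the standard ctypes module as a stand-in for a C buffer; it is an illustration of the C string semantics, not of the code that Cython generates.

```python
import ctypes

payload = b"an unusual \0 containing C byte string"

# create_string_buffer() copies the bytes into a C char array.
buf = ctypes.create_string_buffer(payload)

# .value applies C string semantics: it stops at the first 0 byte.
as_c_string = buf.value

# .raw exposes the whole array, so an explicit length keeps the 0 byte.
with_length = buf.raw[:21]

print(as_c_string)
print(with_length)
```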
Since Cython 0.12, there is also support for decoding a C string slice efficiently into a Python unicode string. Just do this:
# -*- coding: ISO8859-15
cdef char* s = "an ISO8859-15 encoded C string with fünny chäräctörs"
cdef Py_ssize_t byte_length = 52    # ISO8859-15 stores each character as one byte

a_python_unicode_string = s[:byte_length].decode('ISO8859-15')
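Note that the slice length is counted in bytes, not in characters, so it depends on the encoding of the C string. A small plain-Python sketch (the sample text is made up for illustration) shows how the same text has different byte lengths in different encodings:

```python
text = "fünny chäräctörs"

# ISO8859-15 uses exactly one byte per character.
latin9_bytes = text.encode("iso8859-15")

# UTF-8 needs two bytes for each of the accented characters used here.
utf8_bytes = text.encode("utf-8")

# Slicing by bytes and decoding with the matching codec works ...
prefix = latin9_bytes[:5].decode("iso8859-15")

# ... but a UTF-8 slice that ends inside a multi-byte character
# (e.g. utf8_bytes[:2], which splits the "ü") cannot be decoded.
print(len(text), len(latin9_bytes), len(utf8_bytes), prefix)
```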
Can Cython create objects or apply operators to locally created objects as pure C code?
For methods like __init__ and __getitem__ the Python calling convention is mandatory and identical for all objects, so Cython cannot provide a major speed-up for them.
To instantiate an extension type in Cython 0.12, however, the fastest way is to actually use the normal Python idiom of calling the __new__() method of a type:
cdef class ExampleClass:
    cdef int _value
    def __init__(self):
        # calling "__new__()" will not call "__init__()" !
        raise TypeError("This class cannot be instantiated from Python")

cdef ExampleClass _factory():
    cdef ExampleClass instance = ExampleClass.__new__(ExampleClass)
    instance._value = 1
    return instance
Note that this has similar restrictions as the normal Python code: it will not call the __init__() method (which makes it quite a bit faster). Also, while all Python class members will be initialised to None, you have to take care to initialise the C members. Either the __cinit__() method or a factory function like the one above are good places to do so.
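The same idiom works in plain Python, which makes it easy to try out. The sketch below is a pure-Python analogue, not the compiled extension type; the helper name make_example is chosen here for illustration.

```python
class ExampleClass:
    def __init__(self):
        # Direct instantiation is refused, as in the extension type above.
        raise TypeError("This class cannot be instantiated directly")


def make_example():
    # __new__() allocates the object without running __init__().
    instance = ExampleClass.__new__(ExampleClass)
    # The factory is responsible for initialising the attributes itself.
    instance._value = 1
    return instance


example = make_example()
print(example._value)
```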
In Cython 0.11 and older versions, you had to use the following C-ish hack in an external header file:
/* in FILE "theheader.h" */
#define PY_NEW(T) \
    (((PyTypeObject*)(T))->tp_new( \
        (PyTypeObject*)(T), __pyx_empty_tuple, NULL))
and then define it as a Cython function as follows:
cdef extern from "theheader.h":
    # macro call to 't->tp_new()' for fast instantiation
    cdef ExampleClass NEW_EXAMPLE_CLASS "PY_NEW" (object t)

cdef ExampleClass _factory():
    cdef ExampleClass instance = NEW_EXAMPLE_CLASS(ExampleClass)
    instance._value = 1
    return instance
Why does Cython not always give errors for uninitialized variables?
Answer: Cython does some static checks for variable initialization before use during compile time, but these are very basic, as Cython has no definite knowledge what paths of code will be taken at runtime:
Consider the following:
def testUnboundedLocal1():
    if False:
        c = 1
    print c

def testUnboundedLocal2():
    print c
With CPython, both functions raise a NameError at runtime. The second one fails with:
NameError: global name 'c' is not defined
while the first one raises UnboundLocalError (a subclass of NameError), because c is a local variable there.
With Cython, the first variant prints "None", the second variant leads to a compile time error. Both behaviours differ from CPython's.
This is considered a BUG and will change in the future.
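For reference, the CPython side of this comparison can be checked directly. The sketch below is written for Python 3 and uses deliberately unusual variable names (an assumption made here so that they are not defined anywhere else); it only reports which exception type each function raises.

```python
def unbound_local():
    if False:
        missing_local_value = 1
    # The assignment above makes the name local, even though it never runs.
    return missing_local_value


def unknown_global():
    # No assignment anywhere, so this is looked up as a global name.
    return missing_global_value


def exception_name(func):
    """Call func() and return the name of the NameError (sub)class it raises."""
    try:
        func()
    except NameError as exc:
        return type(exc).__name__
    return None


print(exception_name(unbound_local))
print(exception_name(unknown_global))
```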
How well is Unicode supported?
Answer: The support for Unicode is as good as CPython's, as long as you are using the Python unicode string type. But there is no equivalent C type available for Unicode strings. To prevent user errors, Cython also disallows any implicit conversion to char*, as this is not going to be correct.
How do I pass a Python string parameter on to a C library?
Answer: It depends on the semantics of the string. Imagine you have this C function:
cdef extern from "something.h":
    cdef int c_handle_data(char* data, int length)
For binary data, you can simply require byte strings at the API level, so that this will work:
def work_with_binary_data(bytes binary_data):
    c_handle_data(binary_data, len(binary_data))
It will raise an error (with a message that may or may not be appropriate for your use case) if users pass other things than a byte string.
For textual data, however, you must handle Unicode data input. What you do with it depends on what your C function accepts. For example, if it requires UTF-8 encoded byte sequences, this might work:
def work_with_text_data(text):
    if not isinstance(text, unicode):
        raise ValueError("requires text input, got %s" % type(text))
    utf8_data = text.encode('UTF-8')
    c_handle_data(utf8_data, len(utf8_data))
Note that this also accepts subtypes of the Python unicode type. Typing the "text" parameter as "unicode" will not cover this case.
The above is the right thing to do in Py3. However, some (not all, just some) module APIs may become more user friendly in Python 2.x if you additionally allow well defined byte strings. For example, it may make sense to allow plain ASCII strings in some cases, as they are often used for textual data in Python 2.x programs. This could be done as follows:
from python_version cimport PY_MAJOR_VERSION

def work_with_text_data(text):
    if isinstance(text, unicode):    # most common case first
        utf8_data = text.encode('UTF-8')
    elif (PY_MAJOR_VERSION < 3) and isinstance(text, str):
        text.decode('ASCII')    # trial decoding, or however you want to check for plain ASCII data
        utf8_data = text
    else:
        raise ValueError("requires text input, got %s" % type(text))
    c_handle_data(utf8_data, len(utf8_data))
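The input normalisation step can be tested on its own, separately from the C call. The following is a plain Python 3 sketch of the same idea; the helper name to_utf8 is hypothetical, and the choice to accept ASCII-only byte strings is just one possible policy.

```python
def to_utf8(text):
    """Return UTF-8 encoded bytes for text input, or raise an error."""
    if isinstance(text, str):              # most common case first
        return text.encode("UTF-8")
    if isinstance(text, (bytes, bytearray)):
        data = bytes(text)
        data.decode("ASCII")               # trial decoding: rejects non-ASCII bytes
        return data                        # plain ASCII is already valid UTF-8
    raise ValueError("requires text input, got %s" % type(text))


utf8_data = to_utf8("fünny")
print(utf8_data, len(utf8_data))
```

The resulting byte string and its length are what would then be handed to the C function.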
Can I use builtins like len() with the C type char *?
Answer: Yes you can. Cython 0.12.1 and later map len(char*) directly to strlen(). Similarly, (char*).decode(...) is optimised into a C-API call since 0.12.
For other Python operations on char*, the generated code may be inefficient, as a temporary object may have to get created. If you notice this for your code and think that Cython can do better, please speak up on the mailing list.
How can I interface numpy arrays using Cython?
Answer: Follow the example: http://wiki.cython.org/WrappingNumpy
Is it possible to call my Python code from C?
Answer: Yes, easily. Follow the example in Demos/callback/ in the Cython source distribution.
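To get a feeling for what a callback from C into Python looks like, here is a small sketch using the standard ctypes module. This is not the mechanism used by the Cython demo; it only illustrates the general idea of wrapping a Python callable in a C function pointer. The names CALLBACK_TYPE and add are made up for this example.

```python
import ctypes

# C signature of the callback: int (*)(int, int)
CALLBACK_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)


def add(a, b):
    # Ordinary Python code that C is allowed to call.
    return a + b


# Wrap the Python function in a C-callable function pointer.
c_callback = CALLBACK_TYPE(add)

# Invoking the pointer goes through the C calling convention
# and ends up back in the Python function.
result = c_callback(2, 3)
print(result)
```

A C library would receive c_callback as a function pointer argument and call it in the same way.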
What is the difference between a .pxd and .pxi file? When should either be used?
SHORT Answer: One should (almost) always use .pxd files.
MEDIUM Answer: .pxd files are lists of declarations, whereas .pxi files are textually included; using .pxi files for declarations is a historical artifact of the way common declarations were shared before .pxd files existed.
LONG Answer: A .pxd file is a declaration file, and is used to declare classes, methods, etc. in a C extension module, (typically as implemented in a .pyx file of the same name). It can contain declarations only, i.e. no executable statements. One can cimport things from .pxd files just as one would import things in Python. Two separate modules cimporting from the same .pxd file will receive identical objects.
A .pxi file is an include file and is textually included (similar to the C #include directive) and may contain any valid Cython code at the given point in the program. It may contain implementations (e.g. common cdef inline functions) which will be copied into both files. For example, this means that if I have a class A declared in a.pxi, and both b.pyx and c.pyx do include a.pxi then I will have two distinct classes b.A and c.A. Interfaces to C libraries (including the Python/C API) have usually been declared in .pxi files (as they are not associated to a specific module). It is also re-parsed at every invocation.
Now that "cimport *" can be used, there is no reason to use .pxi files for external declarations.
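The "two distinct classes" effect of textual inclusion has a close analogue in plain Python, which may help to see why sharing via cimport is preferable. In this sketch, exec() plays the role of include and a regular import plays the role of cimport; the module name shared_decls is hypothetical.

```python
import sys
import types

SOURCE = "class A:\n    pass\n"

# "include": the same source text is executed separately in each namespace.
b_namespace, c_namespace = {}, {}
exec(SOURCE, b_namespace)
exec(SOURCE, c_namespace)

# "cimport": the source lives in one module that both users import from.
shared = types.ModuleType("shared_decls")
exec(SOURCE, shared.__dict__)
sys.modules["shared_decls"] = shared

from shared_decls import A as A_seen_by_b
from shared_decls import A as A_seen_by_c

print(b_namespace["A"] is c_namespace["A"])   # distinct classes
print(A_seen_by_b is A_seen_by_c)             # one shared class
```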
How do I access native Python file objects?
Answer: See this small example of how to access Python file objects:
# Idiom for accessing Python files.
# First, declare the Python macro to access files:
cdef extern from "Python.h":
    ctypedef struct FILE
    FILE* PyFile_AsFile(object)
    void fprintf(FILE* f, char* format, char* s)
# Next, enter the builtin file class into the namespace:
cdef extern from "fileobject.h":
    ctypedef class __builtin__.file [object PyFileObject]:
        pass
# Now declare the C function that requires a file:
cdef void c_printSomething(FILE* outFile, char* str):
    fprintf(outFile, "%s", str)
# Now create a class or some other definition that uses the function:
cdef class ExampleUsingFile:
    def printSomething(self, file outFile, char* str):
        c_printSomething(PyFile_AsFile(outFile), str)
with a simple test:
import sys
import file_example
x = file_example.ExampleUsingFile()
x.printSomething(sys.stdout, "hello world!\n")
How do I declare a global variable?
Answer: Use the global statement inside the function, just as in Python:
global variable
How do I assign to a global variable?
You need to declare the variable to be global (see above) before trying to assign to it. Often this occurs when one has code like
cdef int *data

def foo(n):
    data = <int*>malloc(n * sizeof(int))
This will result in an error "Cannot convert 'int *' to Python object." This is because, as in Python, assignment declares a local variable. Instead, you must write
cdef int *data

def foo(n):
    global data
    data = <int*>malloc(n * sizeof(int))
See http://docs.python.org/tutorial/classes.html#python-scopes-and-name-spaces for more details.
How do I use 'const'?
Answer: Cython doesn't support const directly, but you can get it to compile it into the C source code:
cdef extern from *:
    ctypedef char* const_char_ptr "const char*"

cdef public void foo_c(const_char_ptr s):
    print s
This will textually replace the type const_char_ptr by const char* and generate this C code:
__PYX_EXTERN_C DL_EXPORT(void) foo_c(const char* __pyx_v_s);
How do I implement a single class method in a Cython module?
Answer: Cython-defined methods are always unbound, regardless of where they are referenced. Because of this, the following does not work:
import cython_module

class A(object):
    method = cython_module.optimized_method
method is unbound and trying to call it will result in an error:
>>> a = A()
>>> a.method()
exceptions.TypeError: optimized_method() takes exactly one argument (0 given)
You have to explicitly create a bound method:
import types
import cython_module

class A(object):
    pass

A.method = types.MethodType(cython_module.optimized_method, None, A)
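The three-argument form of types.MethodType shown above is specific to Python 2. The binding problem itself can be reproduced in plain Python 3 with any builtin function, since builtins do not bind to instances either. In the sketch below, len stands in for the compiled function (an assumption made for illustration), and two possible workarounds are shown.

```python
import types


class Sized:
    def __len__(self):
        return 3


class Unbound(Sized):
    # A builtin function stored on a class is NOT turned into a method.
    method = len


class Wrapped(Sized):
    # Workaround 1: a thin Python-level method that passes self explicitly.
    def method(self):
        return len(self)


# Workaround 2: bind the function to one particular instance.
# In Python 3, types.MethodType takes (function, instance).
instance = Sized()
instance.method = types.MethodType(len, instance)

print(Wrapped().method(), instance.method())
```

Calling Unbound().method() raises a TypeError, because no instance is passed to len().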
How can I run doctests in Cython code (pyx files)?
Answer: A problem with doctest is that it uses inspect.isfunction to check whether something is a function, which fails for Cython functions (which instead answer to inspect.isbuiltin).
This module (let's call it "cydoctest") offers a Cython-compatible workaround.
"""
Cython-compatible wrapper for doctest.testmod().

Usage example, assuming a Cython module mymod.pyx is compiled.
This is run from the command line, passing a command to Python:
python -c "import cydoctest, mymod; cydoctest.testmod(mymod)"

(This still won't let a Cython module run its own doctests
when called with "python mymod.py", but it's pretty close.
Further options can be passed to testmod() as desired, e.g.
verbose=True.)
"""

import doctest
import inspect

def _from_module(module, object):
    """
    Return true if the given object is defined in the given module.
    """
    if module is None:
        return True
    elif inspect.getmodule(object) is not None:
        return module is inspect.getmodule(object)
    elif inspect.isfunction(object):
        return module.__dict__ is object.func_globals
    elif inspect.isclass(object):
        return module.__name__ == object.__module__
    elif hasattr(object, '__module__'):
        return module.__name__ == object.__module__
    elif isinstance(object, property):
        return True # [XX] no way not be sure.
    else:
        raise ValueError("object must be a class or function")

def fix_module_doctest(module):
    """
    Extract docstrings from cython functions, that would be skipped by doctest
    otherwise.
    """
    module.__test__ = {}
    for name in dir(module):
        value = getattr(module, name)
        if inspect.isbuiltin(value) and isinstance(value.__doc__, str) and _from_module(module, value):
            module.__test__[name] = value.__doc__

def testmod(m=None, *args, **kwargs):
    """
    Fix a Cython module's doctests, then call doctest.testmod()

    All other arguments are passed directly to doctest.testmod().
    """
    fix_module_doctest(m)
    doctest.testmod(m, *args, **kwargs)
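The workaround relies on a documented doctest feature: if a module defines a __test__ dictionary, doctest.testmod() also runs the docstrings stored in it. The following self-contained sketch shows that mechanism in isolation; the module name fake_cython_module and its single test entry are made up for illustration.

```python
import doctest
import types

# Stand-in for a compiled module whose functions doctest would skip.
fake_module = types.ModuleType("fake_cython_module")

# Map a name to a docstring, as fix_module_doctest() does for each function.
fake_module.__test__ = {
    "add_one": ">>> 1 + 1\n2\n",
}

# testmod() picks up the examples stored in __test__.
results = doctest.testmod(fake_module)
print(results.attempted, results.failed)
```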
How to wrap C code that uses the restrict qualifier?
Answer: There is currently no way of doing this directly, as Cython does not understand the restrict qualifier. However, you can wrap your way around it by keeping the affected calls in a small C helper file.
See the following example code:
slurp.h
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <Python.h>
int th_match(char *, char *);
cslurp.c
#include "slurp.h"
int th_match(char *string, char *pattern) {
    int status;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) { return 0; }
    status = regexec(&re, string, (size_t)0, NULL, 0);
    regfree(&re);
    if (status != 0)
        return 0;
    return 1;
}
slurp.pyx
cdef extern from "slurp.h":
    int th_match(char *st, char *pt)

class Slurp:
    '''
    This is a simple, but optimized PEG (Parser Expression Group) parser.
    It will parse through anything you hand it provided what you hand it
    has a readline() method.

    Example:
        import sys
        from thci.ext import slurp
        o = slurp.Slurp()
        o.register_trigger('^root:.*:.*:.*:.*$', sys.stdout.write)
        o.process(open('/etc/passwd', 'r'))
    '''
    def __init__(self):
        ''' __init__(self) '''
        self.map = {}
        self.idx = 0

    def register_trigger(self, patt=None, cback=None, args=None):
        ''' register_trigger(self, patt=None, cback=None, args=None) '''
        if patt is None or cback is None:
            return False
        if args is None: args = False
        self.map[self.idx] = (patt, cback, args)
        self.idx += 1    # advance the index so that earlier triggers are not overwritten
        return True

    def process(self, fp=None):
        ''' process(self, fp=None) '''
        if fp is None:
            return False
        while True:
            buf = fp.readline()
            if not buf: break
            for patt, cback, args in self.map.values():
                if th_match(buf, patt) == True:
                    if args == False:
                        cback(buf.strip())
                    else:
                        cback(buf.strip(), args)
This avoids the problems with the restrict qualifier (as needed for the functions declared in regex.h on FreeBSD [at least 7.X]) by letting the C compiler handle the calls that go from C to C. Cython's support for this, even using the "const trick", doesn't seem to behave properly (at least as of 0.12). The following commands will generate your compiled module from the above source:
cython -o slurp.c slurp.pyx
cc -shared -I/usr/include -I./ -I/usr/local/include/python2.5 -L/usr/local/lib -lpthread -lpython2.5 cslurp.c slurp.c -o slurp.so
It is also possible to use distutils by adding the file cslurp.c (or your file's name) to the list of files to be compiled for the extension.
Why does a function with cdef'd parameters accept None?
Answer: It is a fairly common idiom in Python to use None as a way to mean "no value" or "invalid". This doesn't play well with C, as None is not the same type as anything else. To accommodate this, the default behavior is for functions with cdef'd parameters also to accept None. This behavior was inherited from Pyrex, and while it has been proposed that it be changed, it will likely stay (at least for a while) for backwards compatibility.
You have three choices for how to handle None in your code:
- If you do want None to be usable to mean invalid input, then you need to write code that checks for it and raises an appropriate exception.
- If you want Cython to raise an exception if None is passed in, you can use the not None declaration:
def foo(MyClass a not None): <...>
which is a short-hand for
def foo(MyClass a):
    if a is None: raise <...>
    <...>
- You can also put #cython: nonecheck=True at the top of your file and all access will be checked for None, but it will slow things down, as it adds a check on every access, rather than once on function call.
How do I work around the "unable to find vcvarsall.bat" error when using MinGW as the compiler (on Windows)?
Answer: This may be fixed in the latest version of Cython, so first try downloading and installing the latest version.
If this is unsuccessful, try the following workarounds:
If no Python libraries are imported, define the compiler by adding the following option:
--compiler=mingw32
Therefore, the line should read:
python pyprog.py build_ext --compiler=mingw32 --inplace
This, however, does not solve the issue when using the pyximport method (see the tutorial).
Alternatively, the following patch can be applied. NOTE: This is untested.
Open the file pyximport/pyxbuild.py and add the four lines marked with "+" at the appropriate place.
diff -r 7fbe931e5ab7 pyximport/pyxbuild.py
--- a/pyximport/pyxbuild.py Wed Sep 16 15:50:00 2009 +0200
+++ b/pyximport/pyxbuild.py Fri Sep 18 12:39:51 2009 -0300
@@ -55,6 +55,11 @@
     build = dist.get_command_obj('build')
     build.build_base = pyxbuild_dir
 
+    config_files = dist.find_config_files()
+    try: config_files.remove('setup.cfg')
+    except ValueError: pass
+    dist.parse_config_files(config_files)
+
     try:
         ok = dist.parse_command_line()
     except DistutilsArgError:
Finally, if this does not work, create a file called "pydistutils.cfg" in notepad and give it the contents:
[build_ext]
compiler=mingw32
Save this to the home directory, which can be found by typing at the Python prompt:
>>> import os
>>> os.path.expanduser('~')
How do I work around the -Wno-long-double error when installing on OS X?
Answer:
This is a known issue in OS X with some Python installs. It has nothing to do with Cython, and you will run into the same trouble every time you want to build a C extension module.
This is the most sane (if not the only) way to fix it:
1) Enter the Python prompt, and type this:
>>> from distutils import sysconfig
>>> sysconfig.get_makefile_filename()
That should output the full path of a 'Makefile'. Open that file with any text editor and remove all occurrences of the '-Wno-long-double' flag.
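If you prefer to script the edit rather than do it by hand, the removal step could look roughly like the following sketch. It uses the standard library sysconfig module (which also provides get_makefile_filename()) instead of distutils; the helper names strip_flag and find_makefile are hypothetical. The sketch deliberately does not write the file back, so keep a backup and check the result before replacing the real Makefile.

```python
import sysconfig

FLAG = "-Wno-long-double"


def strip_flag(makefile_text, flag=FLAG):
    """Return makefile_text with every occurrence of flag removed."""
    # Remove the flag together with its trailing space first, so that
    # no double spaces are left behind; then remove any remaining ones.
    return makefile_text.replace(flag + " ", "").replace(flag, "")


def find_makefile():
    """Return the path of the Makefile that Python was built with."""
    return sysconfig.get_makefile_filename()


sample = "CFLAGS = -O2 -Wno-long-double -g"
print(strip_flag(sample))
```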
How do I use variable args?
It can't be done cleanly yet, but the code below works:
cdef extern from "stdarg.h":
    ctypedef struct va_list:
        pass
    ctypedef struct fake_type:
        pass
    void va_start(va_list, void* arg)
    void* va_arg(va_list, fake_type)
    void va_end(va_list)
    fake_type int_type "int"

cdef int foo(int n, ...):
    print "starting"
    cdef va_list args
    va_start(args, <void*>n)
    while n != 0:
        print n
        n = <int>va_arg(args, int_type)
    va_end(args)
    print "done"

def call_foo():
    foo(1, 2, 3, 0)
    foo(1, 2, 0)
