How to work on the Cython compiler.
See also DevDocs
Bug/Feature Tracking and Project Culture
The Cython project is strongly driven by interest and has a rather free and open development culture. There are a couple of main developers and regular committers from various different backgrounds, but we are very happy to receive input and patches from everyone.
In order to keep the intervals between releases short, however, there are a couple of restrictions that we impose on ourselves, especially when working on bug-fix (third digit) releases.
Every patch that goes into a bug-fix release of Cython must be backed by a trac ticket. To get a trac account, please send a htpasswd entry to Robert Bradshaw (you are reading the mailing list, aren't you?). Changes sent to the mailing list are welcome as well, but opening a trac ticket right away helps prevent valuable fixes from getting lost in the shuffle.
All changes and bug fixes that go into the development branch of Cython and that have a certain level of interest should also be backed by a ticket. This makes it easier to keep track of features and bug fixes that become part of the next release.
Every trac ticket must have a bug test case associated with it. Fairly often, users who report a problem add an example to the ticket description anyway, but it definitely makes the life of the developers easier when they do so in the form of a readily usable test case. Otherwise, the developers have to write it up themselves, in addition to fixing the bug. Please see the section about the test suite below to find out how to write a good test.
Either way, before a ticket gets fixed or assigned a milestone, there must be a failing test case in the appropriate tests/ directory (preferably tests/run) that is named "nicely_descriptive_name_here_Txyz.pyx" (where 'xyz' is the ticket number). Please try to do this even for the tricky cases that feel like there isn't a good test case. Reproducing a bug is critical for fixing it, and having a test case is critical for knowing when it's fixed and for not breaking it in the future. Broken examples are listed in the tests/bugs.txt file, and are skipped during normal testing (this makes it easier to detect regressions when doing other work).
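The naming rule above is easy to check mechanically. The following is a minimal sketch, not part of the Cython tooling: the regular expression and the helper name ticket_of are made up for illustration and simply assume the "descriptive_name_Txyz.pyx" scheme described above.

```python
import re

# Hypothetical pattern for "<descriptive_name>_T<ticket number>.pyx".
TICKET_TEST_RE = re.compile(r"^(?P<name>\w+)_T(?P<ticket>\d+)\.pyx$")


def ticket_of(filename):
    """Return the ticket number encoded in a test file name, or None."""
    match = TICKET_TEST_RE.match(filename)
    return int(match.group("ticket")) if match else None


print(ticket_of("nicely_descriptive_name_here_T123.pyx"))  # a ticket test
print(ticket_of("no_ticket_here.pyx"))                     # not a ticket test
```

A check like this could be used to list which files in tests/run belong to which ticket.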
A working patch attached to the ticket will definitely accelerate the mainline bug fixing. Please export the patch from a local commit in this case to make sure you attach name, date and commit comment. If you are unsure where to get started, it's usually best to ask on the mailing list before getting lost. The source distribution does not contain the entire repository for size reasons, but running make repo will download the revision history and allow you to commit and export any changes in place.
Feel free to assign priorities and milestones to trac tickets, though they may be revised. Tickets with working patches will usually be handled with priority and should at least receive a timely review. It's often useful to make a note of the patch on the developer mailing list as well.
When a bug is fixed, the test case will be removed from the tests/bugs.txt list.
The Git Project Repositories
The obvious place to start is the source repository of the development branch on github. The Cython source is kept under git control. In case you don't want to use git, you can also use Mercurial. If you know Subversion or CVS, the most important difference that you need to know about is that the repository does not just reside on a server with which you have to interact. Instead, you get the entire repository when you do a checkout. So you can easily work on your local copy and commit changes (git commit) as you see fit, pull updates from the main repository (git pull/fetch, possibly with rebase, see below) and then collect or select your changes to send them to the mailing list for approval. You can also clone the repository on github and trigger a pull request to let us review and merge your changes.
Note: Do not pull somebody's branch if the name starts with an underscore. See the "Feature branches" section below. In general, only pull somebody's "private" branch if you really have to, and always try to develop off the upstream master.
cython is the main development repository, where all the major development work happens. Changes that go into this repository should never make the branch unreleasable (even if minor breakages can happen from time to time). The latest developer documentation from the docs directory is available from our build server.
There are other Cython repositories on github. For example, we find it useful for Google Summer of Code students to have their own repositories, which periodically get merged into main when they are ready. Since git is a distributed revision control system, anyone is free to host a personal repository elsewhere as well.
A new release typically comes every third month or so.
For non-trivial new features that may require a long list of commits and a final review in a pull request, it may be a good idea to use a rebasable feature branch (in Mercurial this is known as a "patch queue"). It is like a normal branch, but instead of merging, you rebase:
- Start a feature branch. It is a good idea to keep the name "master" for Cython's master and not make your own changes there.
git checkout -b _myfeature
- Develop and commit...
- You want to merge in upstream master. Now, do a rebase instead:
    git checkout master
    git pull upstream master        # should fast-forward if you stay off master for your own development
    git checkout _myfeature
    git rebase master               # leads you through a sequence of merging individual patches
    git push -f dagss _myfeature    # do a force-push to your own remote (named "dagss" here)
Finally, when merging feature branches into master, use the --no-ff flag, so that the feature branch stands out in history.
The advantage of this approach is that merges happen commit by commit. This makes it appear as though you developed starting from the freshest upstream/master, instead of developing on an older revision and merging afterwards.
It is imperative that nobody else has fetched the commits you are rebasing. A rebase recreates all the commits, and if others also have those commits, there will end up being duplicate commits: commits that are really the same, but which Git sees as distinct commits.
We are using the convention that branch names should start with an underscore, like _mybranch, to say "keep off this branch, I may rebase it any moment".
Put this in your .bashrc to get a blue branch name in front of your prompt:
    # make my prompt "(master) ~/code/cython $ "
    export PS1='\[\e[0;34;49m\]$(__git_ps1 "(%s) ")\[\e[0;0m\]\w $ '
Magit (https://github.com/magit/magit) is a nice Emacs plugin for git. The key feature is that it allows you very easily to stage and commit sets of individual hunks, so that you can leave debug code in your working tree without any hassle. Some other Git GUIs have the same feature.
- or: where does that C code come from?
Most often, when you are new to Cython development, you have an idea about the Cython code you want to debug. So, looking at the generated C code, your main question will be: »where is that C code generated?«.
Luckily, the Cython compiler has a couple of debug features that you can use to pin-point the relevant code sections. They can be enabled in the module Cython.Compiler.DebugFlags. Read the comments in that file, enable the relevant debug features (usually debug_trace_code_generation to get started), and then read the C code that Cython generates to find out what is going on.
It's usually best to write a test case for the code you want to debug. See the next section on how to do this.
The test suite
A very good place to start understanding Cython is the test suite, which lives in the "tests" directory of the source repository. The tests (collected and run by runtests.py) use the doctest module of Python. They contain lots of little examples that Cython can compile, so if you want to understand a specific part of Cython or make a new feature work, it is a very good idea to look out for a related test case or to write one yourself, and then run Cython in a code coverage tool or debugger to see what happens. You run the test suite with this command:
python runtests.py -vv
To select a specific test (or a set of tests), just pass the name(s) as parameters. You can also pass regular expressions or tags in the form tag:value. The testrunner takes many options, see python runtests.py --help.
Another useful thing to know is that setting CFLAGS to -O0 can nearly halve the runtime of the tests, as it disables all costly optimisations done by the C compiler.
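As an illustration of the last two paragraphs, the sketch below assembles a test run that selects tests by tag and sets CFLAGS to -O0. The helper name build_test_run is hypothetical, and the code only builds the command line and environment; it does not start the runner, so it can be tried outside of a Cython checkout. Note that it overrides any CFLAGS value that is already set in the environment.

```python
import os
import sys


def build_test_run(selectors=(), cflags="-O0", runner="runtests.py"):
    """Return (argv, env) for a test run; nothing is executed here."""
    # Test names, regular expressions or tag:value selectors are passed as plain arguments.
    argv = [sys.executable, runner, "-vv", *selectors]
    # Copy the current environment and set CFLAGS for the C compiler.
    env = dict(os.environ, CFLAGS=cflags)
    return argv, env


argv, env = build_test_run(["tag:cpp"])
print(argv[1:], env["CFLAGS"])

# To actually start the run from the top of a Cython checkout:
#     import subprocess
#     subprocess.call(argv, env=env)
```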
Tests can be tagged for easy filtering and running. A tag is simply a comment, which must occur before any other non-whitespace non-comment lines, of the form #tag: value. Some tags have special meaning, for example tag:cpp tests are only compiled in C++. Multiple values for a single tag can be separated by commas or given in repeated tag lines.
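To make the tag rules concrete, here is a minimal sketch of a tag reader. It is not the code used by runtests.py; the function name parse_tags and the regular expression are assumptions chosen to follow the description above (tag comments before the first code line, comma-separated values, repeated tag lines).

```python
import re

# A tag line looks like "# name: value1, value2".
TAG_LINE = re.compile(r"#\s*(\w+)\s*:(.*)$")


def parse_tags(source):
    """Collect tag comments that precede the first non-comment line."""
    tags = {}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                      # blank lines are skipped
        if not stripped.startswith("#"):
            break                         # first code line ends the tag header
        match = TAG_LINE.match(stripped)
        if match:
            name, values = match.groups()
            tags.setdefault(name, []).extend(
                value.strip() for value in values.split(",") if value.strip())
    return tags


example = "# mode: run\n# tag: cpp, numpy\n# tag: werror\n\ndef f(): pass\n# tag: ignored\n"
tags = parse_tags(example)
print(tags)
```

In this sketch, the tag comment after the function definition is ignored because it follows a code line.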
There are three different kinds of tests:
- compile: tests that are only compiled, not run
- errors: tests that check for compile errors
- run: tests that are compiled and run using doctest
These are distinguished by a mode tag comment at the top of the file, which defaults to run if not given.
A test consists of a .pyx file that Cython compiles, possibly accompanied by a couple of .pxd or header files in the same directory. Error tests additionally contain an error description, as in this example:
    # mode: error

    cdef extern from *:
        void foo(void)

    _ERRORS = u"""
    4:13:Use spam() rather than spam(void) to declare a function with no arguments.
    """
This is regular Cython code, so you can compile this file yourself to see the error. However, the test runner splits the source file at the line starting with "_ERRORS" and parses the rest of the file for error output. That includes all lines that follow a "LINE:COLUMN:error message" scheme. The error lines are then compared against the actual output of the compiler run.
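The splitting step described above can be sketched in a few lines of Python. This is an illustration under the stated "LINE:COLUMN:error message" scheme, not the test runner's actual implementation; the name split_error_test and the regular expression are hypothetical.

```python
import re

# "LINE:COLUMN:error message"
ERROR_LINE = re.compile(r"^(\d+):(\d+):\s*(.*)$")


def split_error_test(source):
    """Split a test file into its code part and the expected (line, column, message) errors."""
    code_lines, expected = [], []
    in_errors = False
    for line in source.splitlines():
        if not in_errors and line.startswith("_ERRORS"):
            in_errors = True              # everything after this line is error output
            continue
        if in_errors:
            match = ERROR_LINE.match(line.strip())
            if match:
                expected.append((int(match.group(1)), int(match.group(2)), match.group(3)))
        else:
            code_lines.append(line)
    return "\n".join(code_lines), expected


EXAMPLE = '''\
# mode: error

cdef extern from *:
    void foo(void)

_ERRORS = u"""
4:13:Use spam() rather than spam(void) to declare a function with no arguments.
"""
'''

code, expected = split_error_test(EXAMPLE)
print(expected)
```

The expected errors can then be compared against the errors reported by the compiler for the code part.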
Runnable tests in the "tests/run/" directory use doctests, as in the following example:
    # mode: run

    def f():
        """
        This doctest runs in plain Python:

        >>> 1 + 2
        3
        >>> f()
        3
        >>> b
        3
        """
        # this is code that Cython executes when the doctest calls the function:
        a = 1 + 2
        return a

    # this is code that Cython executes at module initialisation time:
    b = 1 + 2
The important thing to know here is that the doctest will be executed by Python, while the rest of the file will be compiled to C code by Cython and only called by the doctest when run by the test runner. So you can directly compare results that Python delivers with results that you get from Cython.
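The doctest side of this can be tried without Cython. The sketch below is only an approximation of what happens in a test run: it executes the example module in plain Python (which works here because the example is also valid Python) instead of compiling it, and then lets the standard doctest module run the docstring against it. The module name doctest_example is made up.

```python
import doctest
import types

SOURCE = '''
def f():
    """
    >>> 1 + 2
    3
    >>> f()
    3
    >>> b
    3
    """
    a = 1 + 2
    return a

b = 1 + 2
'''

# Stand-in for the compiled extension module: here the code is simply executed by Python.
module = types.ModuleType("doctest_example")
exec(SOURCE, module.__dict__)

# doctest looks up f and b in the module namespace, as it would for a compiled module.
results = doctest.testmod(module)
print(results.failed, results.attempted)
```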
Parse tree assertions
Since Cython 0.12, you can add assertions for the parse tree. This is required when testing for optimisations that only impact the performance and do not change the behaviour. Otherwise, it would be impossible to tell whether an optimisation was applied or not, which would render the test useless if the optimisation ever fails to apply for some reason.
You can express assertions using a simple XPath-like language called TreePath that traverses the parse tree. Nodes are referred to by their type name (inheritance is not considered). For example, to make sure that a Python function call "foo()" gets replaced by a C-API call to "c_foo()", you can write a test as follows:
    # mode: run

    cimport cython

    @cython.test_fail_if_path_exists("//SimpleCallNode//NameNode[@name = 'foo']")
    @cython.test_assert_path_exists("//SimpleCallNode//NameNode[@name = 'c_foo']")
    def f():
        foo()
As known from XPath, you can use
- NodeName for a Node of type 'NodeName'
- * for a Node of any type
- @name for an attribute value
- // to descend into a subtree
- / to access a direct child
- . to refer to the current node
- [ ... ] to evaluate a predicate (which itself is a TreePath expression) at the current node
- [@name = value] to compare an attribute value (integer values, "string", 'string' and boolean True/False are supported)
- [... and ...] to connect two predicates with a boolean 'and'
The test suite contains some examples of accepted path expressions.
To test for more than one path, you can pass multiple path strings to each decorator. It is good practice to add partial paths before the complete test path, as this leads to better error messages if a subtree exists but does not fulfill the entire expression - especially if there is overlap with a fail-if path. Example:
    # mode: run

    cimport cython

    @cython.test_fail_if_path_exists("//SimpleCallNode//NameNode[@name = 'foo']")
    @cython.test_assert_path_exists("//SimpleCallNode//NameNode",
                                    "//SimpleCallNode//NameNode[@name = 'c_foo']")
    def f():
        foo()
These assertions can only be applied to functions, as it is good practice to split tests into test functions anyway. The test runner script (see above) enables the tree assertions in the test run, which are otherwise disabled in the normal compiler runs.
Note that the TreePath language is not a complete XPath implementation, so conditions are restricted to node/attribute tests and simple string comparisons for attribute values.
Running the CPython test suite
To test the compatibility with CPython (the standard Python distribution), you can copy the directory Lib/test from the Python source distribution to tests/pyregr in the Cython source tree (so that it becomes that directory itself, not a subdirectory of it). The test runner will then compile all unit test modules with Cython and run them.
To avoid doing this over and over for different CPython versions, there is an option --sys-pyregr that you can pass to the test runner. If the installation of the running Python version contains the regression test package (simply called 'test'), the test runner will pick it up from the standard library automatically. However, note that many Python distributions do not include this package.
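Whether the running Python installation ships the regression test package can be checked up front. This is a small sketch using only the standard library; it is not how the test runner itself detects the package, and the helper name has_regression_tests is hypothetical.

```python
import importlib.util


def has_regression_tests():
    """Return True if the 'test' package of the standard library can be found."""
    return importlib.util.find_spec("test") is not None


print(has_regression_tests())
```

If this prints False, the --sys-pyregr option has nothing to pick up and the tests/pyregr copy described above is needed instead.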
Tip to create doctest scripts
The doctest scripts have executable statements and output interleaved. It is possible to type the test program directly into python and copy/paste the output but when the sequence of statements is more than a few lines, it can be convenient to use a text editor to prepare them.
One useful technique to aid in this is to use the "screen" program to run a text file with the doctest snippet to be run. Screen can be instructed to read the text file and send it to python. The output can then be captured and placed into the doctest file. See the man page for screen on your system.
Some example steps to do this:
- Use your favorite text editor to create a file, say "t", with the code to run.
- Start screen in the same directory.
- Start an interactive python session by typing it on a line.
- To read the file into a screen buffer, type the command line: <ctl-a>:readbuf t<ENTER>
- Paste the buffer into python by typing the characters: <ctl-a>]
- Save the screen "hardcopy output" to a file named "hardcopy.0" by typing the characters: <ctl-a>h
- Exit screen
- Edit the output of hardcopy.0 and paste the appropriate script into your doctest.
- Alternatively, just turn on logging for your window to a file "screenlog.0" by typing the characters: <ctl-a>H
- By using logging, you can reuse the session iteratively and just look at the bottom of the log file for the current output. Further, you can rerun the readbuf command quickly from the screen window history (if it is the last command) by typing: <ctl-a>:<ctl-p><ENTER>
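If screen is not available, a similar transcript can be produced from within Python. The following sketch is a different technique from the screen recipe above: it feeds the statements line by line to code.InteractiveConsole from the standard library and records the prompts together with whatever is printed to standard output. The function name make_transcript is made up, and tracebacks (which go to standard error) are not captured.

```python
import code
import contextlib
import io


def make_transcript(source):
    """Run the statements in 'source' and return them interleaved with their output."""
    console = code.InteractiveConsole()
    lines = []
    more = False
    for line in source.splitlines():
        prompt = "... " if more else ">>> "
        lines.append((prompt + line).rstrip())
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            # push() returns True while a multi-line statement is still incomplete.
            more = console.push(line)
        output = captured.getvalue()
        if output:
            lines.extend(output.rstrip("\n").splitlines())
    return "\n".join(lines)


print(make_transcript("x = 1 + 2\nx"))
```

The printed text can be pasted into a docstring after checking that the output is what the test should expect.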
The Cython buildbot can be found at https://sage.math.washington.edu:8091/hudson/ .