This page lists suggestions for smaller projects involving Cython, suitable for university assignments or similar. If you are interested, mail the Cython developer mailing list or Dag Seljebotn: firstname.lastname@example.org
All tasks will provide experience with Python, test-driven development, and working within and interacting with an open source community.
1. Test framework for directory tree compilation
- Difficulty: Normal
- Type of task: Utility scripting, testing
- Skills required: Basic Python, carefulness
Cython has lots of tests, but they focus on the Cython language and compiler themselves. Additions to the test framework must be made to regression test things like:
- Cython being able to find include files in their proper location (with directories searched in the appropriate order according to different sets of include paths).
- That different compilation methods (single file vs. multiple file) all give correct results.
- That one module written in Cython is callable from another one.
The target objective is a test framework, meaning convenience utilities for writing the actual tests. The problem with the current test framework in this area is that each single test would need a full directory tree constructed manually, and a small variation would then need a duplicate of that tree. This is too much of a hassle, so such tests just aren't written. What is needed is a simple file format for specifying all the necessary directories and file variations for a test suite in a single file (Dag Seljebotn has some ideas for the contents if anybody is interested).
- Think about and describe (in cooperation with Cython developers) how tests should be written using the framework.
- Write such a test, testing something simple.
- Write a script which generates an appropriate temporary directory structure from the test definition file.
- Integrate this with the existing test runner (simply feed the directory structure to it, and it takes care of compilation, running any doctests etc.)
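The script step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the header syntax (`######## path ########`) and the names `parse_tree_spec`, `unpack_tree` and `EXAMPLE_SPEC` are hypothetical and are not an agreed format. The actual file format should be worked out with the Cython developers.

```python
import os
import re
import tempfile

# Hypothetical header syntax: a line such as "######## pkg/a.pyx ########"
# starts a new file; everything up to the next header is its content.
HEADER = re.compile(r"^#{8} (?P<path>\S+) #{8}$")


def parse_tree_spec(text):
    """Return a dict mapping relative paths to file contents."""
    files = {}
    current = None
    for line in text.splitlines(keepends=True):
        match = HEADER.match(line.rstrip("\r\n"))
        if match:
            current = match.group("path")
            files[current] = ""
        elif current is not None:
            files[current] += line
        # Lines before the first header are ignored (room for a description).
    return files


def unpack_tree(text, root=None):
    """Write the files described by `text` under `root` and return `root`."""
    if root is None:
        root = tempfile.mkdtemp(prefix="cython_tree_test_")
    for rel_path, content in parse_tree_spec(text).items():
        # Refuse paths that would escape the temporary directory.
        if os.path.isabs(rel_path) or ".." in rel_path.split("/"):
            raise ValueError("unsafe path in test definition: %r" % rel_path)
        target = os.path.join(root, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(content)
    return root


EXAMPLE_SPEC = """\
Two modules, one cimporting from an include directory.
######## include/shared.pxd ########
cdef int answer()
######## src/main.pyx ########
from shared cimport answer
"""
```

Calling `unpack_tree(EXAMPLE_SPEC)` creates a fresh temporary directory containing `include/shared.pxd` and `src/main.pyx`; that directory could then be handed to the existing test runner. Keeping parsing separate from writing makes the parser easy to unit test without touching the file system.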
2. Support for complex floating point datatype
- Difficulty: Normal
- Type of task: Integrating a new feature in an existing program (50 000 lines)
- Skills required: Good Python and C knowledge, ability to understand a large program structure without taking in all the details
Cython recently got more features for numerical computations. Unfortunately, there's a big hole in Cython's abilities: There's no convenient builtin support for complex datatypes (as in complex numbers). One can use the Python complex objects, but they are way too slow for numerical purposes.
The end-goal is that code like this:
    cdef complex double x = 3.0 + 4j, y = 1 - 1j, z
    z = x * y
results in efficient C code.
- Write some very simple C code defining complex datatype structs (containing two floats) and some C macros for doing the arithmetic operations with these. Note that we want to support non-C99-compilers so you cannot use the C complex type (though you are welcome to add this in addition).
- Write a test-case (like the above code), using perhaps only + at first. It will of course fail to compile.
- Add a complex datatype to PyrexTypes.py.
- Add support for the complex keyword to Parsing.py.
- Have a look at ImagNode in ExprNodes.py. It should no longer construct a Python object directly; instead it should construct an object of the complex type added to PyrexTypes.py. This should mostly follow the pattern in FloatNode.
- To do this, one must add "type coercion" to correctly coerce the new complex type to Python complex float objects.
- Modify BinaryOpNode in ExprNodes.py to add complex types as a case and call the appropriate arithmetic macros.
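As a reference for what the arithmetic macros have to compute, here is a small Python model of a two-field complex value. It is only a sketch: the names `Complex`, `c_add`, `c_sub`, `c_mul` and `to_python` are hypothetical stand-ins for the C struct, the macros and the coercion to Python complex objects, and none of them exist in Cython. A model like this can serve as an oracle when writing the first test cases.

```python
from collections import namedtuple

# Stand-in for the C struct: two floating-point fields.
Complex = namedtuple("Complex", ["real", "imag"])


def c_add(a, b):
    """Component-wise addition."""
    return Complex(a.real + b.real, a.imag + b.imag)


def c_sub(a, b):
    """Component-wise subtraction."""
    return Complex(a.real - b.real, a.imag - b.imag)


def c_mul(a, b):
    """(a.real + i*a.imag) * (b.real + i*b.imag), expanded using i*i = -1."""
    return Complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)


def to_python(a):
    """Model of the coercion from the struct to a Python complex object."""
    return complex(a.real, a.imag)


# The example from the text: x = 3.0 + 4j, y = 1 - 1j, z = x * y
x = Complex(3.0, 4.0)
y = Complex(1.0, -1.0)
z = c_mul(x, y)
```

Here `z` comes out as `Complex(real=7.0, imag=1.0)`, which agrees with Python's own `(3.0 + 4j) * (1 - 1j)`.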
3. Possible variable value analysis
- Difficulty: Ambitious (but fun!)
- Type of task: Create elegant and isolated algorithm and code
- Skills required: Good Python knowledge, good problem solving skills
(Also known as control flow analysis.)
Several optimizations can be done automatically if one infers certain things from the code. Below we will focus on whether a variable can be known not to be set to None -- if so, a check for this can be dropped in the C code on attribute access.
The only statements such an analysis makes are of the form "it is known that this variable is not None at this location". If nothing is known at a certain location, no harm is done (the generated code just runs a little bit slower). Therefore all such analysis is done on a best-effort basis.
Here's a code example, and the comments indicate what could be inferred by your code:
    def f(arg):
        if arg is not None:
            print arg.x # arg cannot be None
        arg = some()
        try:
            print arg.x # here arg can be None again as it was assigned to
            print arg.y # arg cannot be None (as that would raise an exception on previous line)
        except:
            print arg.x # arg can be None
        print arg.x # arg cannot be None, whether or not an exception was raised
The kind of code you need to write is a recursive algorithm working on a tree representing the code. It should work "from the top and downwards", in some ways simulating an interpreter, and record what it knows along the way (e.g. "what does this statement mean for the next line to be executed", "what does this statement mean for the next except block" and so on).
You should probably ask for more details and have an idea of how to proceed before picking this task! There are also existing algorithms for this kind of thing which can be adapted (though we would rather have something simple that covers 70% of the cases now than something perfect that covers everything in a year!)
- Write a very simple unit test. You write some code like the above, and give the existing test framework the test code and your algorithm as input. Then you write code to validate that the right things were detected about variables by your (not yet existing) algorithm, by inspecting the returned tree.
- Write an algorithm which makes the unit test case succeed.
- Try to break it with a new difficult testcase. It is ok for the algorithm to gracefully give up in a lot of cases and say "we don't know", but it should never guarantee things which cannot be guaranteed!
- Repeat with more sophisticated cases (with try-except statements, more complex if-expressions, more than one block inside an if-test and merge what is known when completing the blocks, and so on)
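To make the idea concrete, here is a minimal sketch of such a top-down walk. It makes several assumptions that do not hold for the real task: it uses Python's standard `ast` module as a stand-in for Cython's own parse tree, it treats every name as a function-local variable, it counts any successful attribute access as proof that the variable is not None (as the example above does), and the names `analyze`, `analyze_function` and `SOURCE` are made up for illustration. Anything it does not understand makes it give up and forget what it knew.

```python
import ast

# Node types whose sub-expressions are always evaluated. Anything else
# (and/or, conditional expressions, comparisons, ...) might skip part of the
# expression, so nothing is learned from it.
_UNCONDITIONAL = (ast.Name, ast.Attribute, ast.Call, ast.Constant, ast.BinOp,
                  ast.UnaryOp, ast.Tuple, ast.keyword, ast.expr_context,
                  ast.operator, ast.unaryop)


def _reads(expr, known, report):
    """Record each `name.attr` in `expr`; return the names it proves not None."""
    nodes = list(ast.walk(expr))
    bases = [(n.lineno, n.value.id) for n in nodes
             if isinstance(n, ast.Attribute) and isinstance(n.value, ast.Name)]
    for lineno, name in bases:
        report.setdefault((lineno, name), name in known)
    if all(isinstance(n, _UNCONDITIONAL) for n in nodes):
        return {name for _, name in bases}
    return set()


def _stores(node):
    """Names assigned to or deleted anywhere inside `node`."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and not isinstance(n.ctx, ast.Load)}


def _not_none_name(test):
    """Return `x` if `test` is exactly `x is not None`, else None."""
    if (isinstance(test, ast.Compare) and isinstance(test.left, ast.Name)
            and len(test.ops) == 1 and isinstance(test.ops[0], ast.IsNot)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value is None):
        return test.left.id
    return None


def analyze(stmts, known, report):
    """Walk `stmts` in order; return the names known to be not None afterwards.

    `report[(lineno, name)]` is True where the None check could be dropped.
    """
    known = set(known)
    for stmt in stmts:
        if isinstance(stmt, (ast.Expr, ast.Assign)):
            known |= _reads(stmt.value, known, report)
            known -= _stores(stmt)  # assignment targets become unknown again
        elif isinstance(stmt, ast.If):
            known |= _reads(stmt.test, known, report)
            known -= _stores(stmt.test)
            tested = _not_none_name(stmt.test)
            extra = {tested} if tested else set()
            # Only facts that hold after both branches survive the merge.
            known = (analyze(stmt.body, known | extra, report)
                     & analyze(stmt.orelse, known, report))
        elif isinstance(stmt, ast.Try):
            # An exception can interrupt the body anywhere, so handlers may
            # rely only on facts that nothing in the statement invalidates.
            entry = known - _stores(stmt) - {h.name for h in stmt.handlers}
            after = analyze(stmt.orelse, analyze(stmt.body, known, report), report)
            for handler in stmt.handlers:
                after &= analyze(handler.body, entry, report) - {handler.name}
            known = analyze(stmt.finalbody, entry if stmt.finalbody else after, report)
        else:
            known = set()  # unsupported statement: give up, claim nothing
    return known


def analyze_function(source):
    """Analyze the first function in `source` and return the report."""
    report = {}
    analyze(ast.parse(source).body[0].body, set(), report)
    return report


SOURCE = """\
def f(arg):
    if arg is not None:
        print(arg.x)
    arg = some()
    try:
        print(arg.x)
        print(arg.y)
    except:
        print(arg.x)
    print(arg.x)
"""
REPORT = analyze_function(SOURCE)
```

For `SOURCE`, the report marks the accesses on lines 3, 7 and 10 as safe and those on lines 6 and 9 as unknown, matching the comments in the example above. The important design choice is that every fallback is the empty set: the sketch may miss opportunities, but it should never claim more than it can guarantee.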
4. Unit test and replace the command line parser
- Difficulty: Easy
- Type of task: Code cleanup and testing
- Skills required: Python, perhaps some API design skills
There are two things that could use some cleaning up:
- Currently Main.py calls CmdLine.py to parse a command line under some circumstances, and under other circumstances (when used as a library) it does not. This API is a bit unclean. Instead one should improve the API for using Cython as a library (i.e. on a "build this file and this file with these options" level) and then write CmdLine.py as a standalone, isolated client of this library.
- Furthermore, CmdLine.py does not use the Python optparse module for parsing command line arguments, while it probably should.
In order to do this safely (and not have hundreds of users suddenly discover that their favorite command line no longer works) one should first write unit tests which test all of the existing command line parsing code thoroughly.
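A rough sketch of what an optparse-based parser might look like is given below. The function names `build_parser` and `parse_command_line` are hypothetical, and the three options shown are only an illustrative subset; the real option set and its exact behaviour must be taken from the existing CmdLine.py and pinned down by unit tests first.

```python
from optparse import OptionParser


def build_parser():
    """Create the option parser (illustrative subset of options only)."""
    parser = OptionParser(usage="%prog [options] sourcefile.pyx ...")
    parser.add_option("-I", "--include-dir", dest="include_path",
                      action="append", metavar="DIR",
                      help="search for include files in DIR (repeatable)")
    parser.add_option("-o", "--output-file", dest="output_file",
                      metavar="FILE", help="write the generated C code to FILE")
    parser.add_option("-v", "--verbose", dest="verbose",
                      action="count", default=0,
                      help="be more verbose (repeatable)")
    return parser


def parse_command_line(args):
    """Parse `args` (a list of strings) and return (options, sources).

    Takes the argument list explicitly instead of reading sys.argv, so it can
    be called directly from unit tests.
    """
    parser = build_parser()
    options, sources = parser.parse_args(args)
    if options.include_path is None:
        options.include_path = []  # no -I given
    if not sources:
        parser.error("no source files given")  # prints usage, exits with 2
    return options, sources
```

For instance, `parse_command_line(["-I", "inc", "-o", "out.c", "-vv", "mod.pyx"])` returns options with `include_path == ["inc"]`, `output_file == "out.c"` and `verbose == 2`, plus the source list `["mod.pyx"]`. Because the function never compiles anything, the same calls can serve as the unit tests that are written against the old parser before it is replaced.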