CEP 905 - Phase separation
- Community status: Idea (dagss, robertwb and stefan has noted approval of the general goal, AFAIK)
What should be done?
The idea is to split up the control flow of the Cython compiler slightly. Currently,
- Phases do not happen on a module-level basis. Instead, for each function all phases are run locally in isolation and code output. This is done because function bodies should have access to (fully typed) module-level symbols. See case 1.
- Type analysis and coercion happens in the same phase. See case 2 and 3.
Case study 1: Inner functions
In order to implement inner functions, one appealing strategy is to use tree transformation to lift out inner functions to global module scope (as well as making the necessary changes to the function (ie make it a class) and the references to it in the parent function (roughly, make "def foo" into "foo = FooClosure()")); that way using transforms to make the tree incrementally more similar to the output C structure.
This cannot currently be done, because in order to generate an efficient context for closures one should know the type of the variables referenced in the parent scope, however the types of the variables within functions aren't known until in the code generation phase of the function, at which point it is too late (or at least, kludgy) to lift out the function.
Case study 2: NumPy support
1 cdef numpy.ndarray(2) arr = numpy.zeros([2,2]) 2 cdef object val = arr[1,1]
Whatever happens, it is clear that a) one needs to know the type of arr before one can invoke a NumPy-specific -operator, b) the result of the -operator must be subject to coercion. So whatever one does to speed up NumPy, it needs to happen in between those phases. QED.
Case study 3: Type inference
One needs to do type inference after type analysis is done but before coercion nodes are inserted.
The most important change is that ModuleNode.process_implementation should drive the entire process. In general, for each phase transition, the tree first obeys a certain spec ("all type attributes set to None"), a phase is run on the tree ("do type analysis"), and then all nodes in the tree obeys another spec ("all type attributes are correctly set").
(Note: process_implementation might not be the "perfect spot" for this. The idea is that that method is "sufficiently far up" to allow refactoring of stuff beneath it in the call chain; after things are ok there, one can think about moving the phase control further up, perhaps all the way to Main.py).
Other than that, changes should be kept at a minimum.
Split up generate_function_definitions to its component phases, each callable entirely from ModuleNode. This affects about 15 functions in Nodes.py.
Split up analyse_declarations in the same way. This change goes deeper in the tree and affects ExprNodes.py.
As one adds new phases, it can be an idea to create a recursive tree visitor for applying the changes rather than making new recursive functions, so that it is easier.
Any problems we foresee coming from this, and their solutions.
Problem 1: Functions needs type information of other functions prior to type analysis
The problem is with this code snippet:
1 def bar(): 2 cdef int x = foo() 3 print x 4 5 cdef int foo(): 6 return 42
Here, the type analysis phase inside bar() should know that foo() returns an int, however analysis hasn't reached this point yet. Luckily, this problem doesn't "recurse": Inner functions and classes has to be declared before they are used.
Also, if inner class support gets added to Cython (I don't think it is now? but not 100% sure), then this presents a possible problem:
1 cdef int foo(): 2 return A.B.something 3 4 cdef class A: 5 cdef class B: 6 cdef int field = foo()
However, Python specifies that code within class definitions is run at class definition time, meaning there isn't a problem here: It would be illegal to move the definition of foo() to below class A. (However, current Cython does not follow Python rules here).
Still, this example points to how complicated this can be -- from 10 minutes of experimentation, a working ruleset for this seems to be:
- In module scope, handle functions breadth-first but recurse depth-first into classes.
- Deal with the functions, doing depth-first of inner functions and classes.
Solution 1: The naive approach: Have two type analysis passes, one at module level and one at function level. These can be seperate phases, both controlled from the top level. Basically that is the same approach as today, but more explicitly and with better control over phases.
Solution 2: (Strategy 2 or 3 below is mandatory for this): In the type analysis controller visitor, one can embed custom rules for the flow. So basically one starts off visiting the tree breadth-first rather than depth-first in module scope; while within functions, one should do depth-first to properly process inner functions and classes.
dagss: +1 for 2. Since this is a bit complicated to get right, solution 2 seems to allow more explicitly programming the right control flow for type analysis, and do necesarry tuning etc.
Problem 2: Local variables in recursive functions
Some of the methods in the current recursive functions carry state between what we want to make different phases (for instance, FuncDefNode.generate_function_definitions generate a scope for the function which is passed to the function sub-phases).
The way this will be handled is to have a phase (either the first phase after the split, or a new phase) add more information to the tree. So for instance one could have a ScopeTransform that is run prior to the phases, which goes through the tree and sets a "env" field on every module and function; and a "scope" field on every statement set to the module node for global statements and the enclosing function node for other statements.
(Just to be clear; this way of working (gradually annotating the tree with more information, passing information between phases through the tree) is tightly linked with this whole proposal, so odds are, if you didn't like the sound of this you should vote against the entire proposal.)
The important thing is to find the balance between future flexibility and allowing the codebase to slowly evolve...
Strategy 1 - Add new recursive phases directly in the nodes
Then have ModuleNode.generate_c_code call these.
This is very in line with current code.
Strategy 2 - Newly added phases are controlled by visitors, but live in nodes
Remove recursive calls to analyse_control_flow, analyse_declarations, analyse_expressions. When called on a node, these functions should take care of that node itself, and can assume that the children are analysed at the right time (probably before, see note)
- Add tree visitors that visits the tree and calls these functions on the nodes in the right order where applicable.
Have ModuleNode.generate_c_code initiate the visitor.
Note: If one cannot be consistent with calling the phase functions in preorder or postorder traversal, this is more complicated; especially if some node requires infix traversal. But stuff like this isn't too hard to get around.
This seems to give a lot more flexibility: If one decides that one has to split a refactored phase even more, one doesn't have to worry about control flow in the compiler, only splitting the phase in a very "local" scope. I would really prefer this prior to doing anything about splitting analyse_declarations.
Note: The newly written analyse_control_flow is has too complex flow for this approach (or rather, it must record information which is deeply coupled with the recursive structure) and so that phase is not a candidate for strategy 2 (must take 1 or 3).
1 def analyse_expressions(self, env): 2 for declarator in self.declarators: 3 declarator.analyse_expressions(env)
1 def analyse_types(self, env): 2 self.dup = self.create_dup_node(env) # re-assigns lhs to a shallow copy 3 self.rhs.analyse_types(env) 4 self.lhs.analyse_target_types(env) 5 if Options.incref_local_binop and self.dup.type.is_pyobject: 6 self.dup = self.dup.coerce_to_temp(env)
are modified so that self.rhs.analyse_types is removed; it is simply assumed to be called by the visitor.
- Move the phases that are refactored entirely into visitors.
Perhaps avoid some duplicate work if one moves C generation to visitors, but in the meantime, code will be spread in two places and it won't be obvious where to look.
Use visitors to the first function-level, and from there, invoke the existing recursive structure. This means no new in-tree methods are added, while minimal changes happen.
A problem here is with phases that modify the scope, because having scope creating both in a visitor and in the in-tree methods would be a problem. One can eitherdo 2 for "scope-creating" phases (analyse_declarations), or try to refactor out the scope-creation but leave the rest.
For analyse_control_flow this proved to be a good solution.
This is notes as I explore in more detail a strategy 2/strategy 4 mix. (I'm setting aside about half an hour a day on average, filling my in-between gaps, for this, so progress will be a bit slow. It seems like it is the kind of job where having to break off often keeps me on track and lets me avoid too extensive refactoring...)
The goal is to get rid of the first few lines of FuncDefNode.generate_function_definitions (everything that has to do with creating the scope and analyse_*). This should instead be done before the function is called; with the scope being available at self.env. From there everything should be as before.
A CreateScopes transform was created that allocates function scopes. Probably other kinds of scopes will be added as I discover them and need to refactor them...
This was simply handled by having a visitor invoke the existing recursive structure from 1st level function nodes:
class AnalyseControlFlow(VisitorTransform): def pre_FuncDefNode(self, node): lenv = node.env node.body.analyse_control_flow(lenv) return False # do not recurse beyond first function level
It didn't affect regression but then again I'm not sure if this is tested anywhere...