Generic tree visitors

Overview

In the future, we might have both an Abstract Syntax Tree (AST) and a bit later in the compilation phase, a tree that contains the (remains of the) current Cython compiler which I'll call the Pyrex tree (since it is mostly a legacy from the Pyrex implementation).

Common for both such trees (and perhaps other components in the program) is that it is often considered a good idea to seperate the data/structure representation from the algorithms operating on them. The leading paragraphs of http://en.wikipedia.org/wiki/Visitor_pattern contains some good philosophy on this.

This is then about what conveniences we should supply to make it easy to operate on trees (and possibly graphs) with algorithms that are external to data structure.

One might filter it (input and output have same basic format but some changes are made), make in-place modifications (same but doesn't make copies), do a conversion (input and output has different formats), build auxiliary data and so on, that's not important.

Efficient tree transform processes

This is a bit complicated (impossible?) to follow, I need to braindump it for now.

In many places it is likely to have many small, decoupled algorithms/changes to the tree. For instance, consider the post-parsing phase: Lots of small changes are likely to happen all over the tree.

One is left with three possibilities:

The rest is about doing the last one.

Some of the changes might trigger simply on node type, which is what Fabrizio's examples for the "with" transform etc. does. However other changes, such as "for in => for from" has a lot more complex criteria. Therefore the second option above might mean that one should look into basically have a node match state machine traversal of the tree, namely:

Keeping track of CT rules out infinite cycles (a with transform changes to a for changes to a with changes to a ...).

The result should be idistinguishable from running the transforms in order on the tree, but one only needs to traverse it once. (This also means than rather than actually implementing the mess above, one can stub it by applying the transforms one after another, but make sure the API is up to the job.)

If our position indicates a match, it will contain a list of transforms to apply. The

I believe that heavy hacking of webpath mentioned below can provide such a state machine. If not, one might simply drop this.

Visitor pattern

This is a standard way of doing this in OOP.

   1 
   2 class NameNode(Node):
   3    ...
   4    def visit(self, visitor, *args, **kwargs): return visitor.accept_NameNode(self, *args, **kwargs)

Accepting on superclasses

One can have an AbstractVisitor like this:

class B(A): ...

class AbstractVisitor:
    def accept_B(self, *args, **kwargs):
        # if not overriden, call accept_B
        self.accept_A(*args, **kwargs)

It is also possible to do stuff like this automatically by type introspection and getattr overloading but that is perhaps too convoluted...

Queries

Some sort of query language decorating member functions (we can type the decorators out for Python 2.3 support, but using Python 2.4 syntax below for brevity).

For instance, find a decent Python XPath engine, then throw a standard W3C DOM on top of our structures, and we're ready for action. Proposed DOM:

if a: pass

would be represented by the DOM API as (the "XML" will never hit string representation at any time of course):

#!xml
<p:if xmlns:p="cython:parsetree">
  <condition>
    <p:name name="a"/>
  </condition>
  <body>
    <p:statlist>
        <stats>
            <p:pass>
        </stats>
    </p:statlist>
  </body>
</p:if>

Then one can do:

   1 import cython.dom as d
   2 
   3 class MyTransform(TemplateMatchTransform):
   4   @template("p:module/body/p:if")
   5   def if_tests_in_module_scope(self, node, name):
   6       ...
   7 
   8 
   9   @template("p:funcdef//p:funcdef")
  10   def inner_functions(self, node, name):
  11       ...

Real example: The for-in to for-from transformation example has this "selection":

   1         if isinstance(node, Nodes.ForInStatNode) and \
                isinstance(node.iterator, ExprNodes.IteratorNode) and \
                isinstance(node.iterator.sequence, ExprNodes.SimpleCallNode) and \
                isinstance(node.iterator.sequence.function, ExprNodes.NameNode) and \
                node.iterator.sequence.function.name == "range":

which would be this in XPath (playing with the idea of letting functions acts as transforms as well):

   1 @template('p:forin[iterator/p:iterator/sequence/p:simplecall/function/p:name[@name = "range"]]', nses={"p": "cython:pyrextree"})
   2 def forin_to_forfrom(node):
   3   ...

while a more decent implementation would be something like (note that the = operator is generalized to set intersection in XPath!):

   1 @template(
   2   'p:forin[iterator/p:iterator/sequence/p:simplecall[@fully_resolved_name = $rangefuncs]]',
   3   nses={"p": "cython:pyrextree"},
   4   vars={"rangefuncs" : ["__builtin__.range", "__builtin__.xrange"]}
   5 )
   6 def forin_to_forfrom(node):
   7   ...

(BTW, to use the XPaths above, // is always prepended. Really, they are XSLT matches, which are not the same thing as XPath but very similar.)

Pros:

This is a naive XPath 2 implementation written entirely in Python: http://sourceforge.net/projects/webpath

Some quick testing makes me very confident that making this work with any tree we come up with is very little work, the "only" problem would be performance. In order to hack something like XSLT "match", one would select a nodeset and check on node id on traversal (possibly setting manual priorities, or extend the XPath lib to compute expression complexity according to XSLT spec).

In order for this to work more efficiently though it would be necesarry to implement "parallell XPaths" (working through many XPaths in parallell in a search) and an XPath combiner; giving something like a state machine that's traversed during tree traversal. Attractive, but perhaps overkill for the simple problem one is trying to solve here.

Finding an XSLT implementation written in Python (or at least able to operate on any foreign DOM) could be better, but finding that is not likely as efficient XPath implementations probably must cooperate closely with the DOM it operates on.

Discussion

DagSverreSeljebotn: I think one should look into XPaths for when the power is needed, but make it very optional and have visitor patterns be the most common choice.

DagSverreSeljebotn: Update: Consensus on mailing list seems to be that XPath is too complicated. I guess one should at least push visitor patterns to their limit first.

enhancements/treevisitors (last edited 2008-04-08 13:13:24 by DagSverreSeljebotn)