Enhancement proposal: Metaprogramming support

Description

The rise of C++ template metaprogramming has brought widespread acceptance of something Lisp programmers have known for a long time: the usefulness of arbitrary computation at compile time that generates code. However, C++ template metaprogramming is a pure functional programming language with syntax and semantics vastly different from C++ itself. In contrast, this proposal allows arbitrary Python code to be run at compile time that produces Cython code.

Basics

This proposal adds a new type of function, called a transform, using the keyword deftrans. This function is executed at compile time. Rather than receiving the values of its arguments, as a normal function does, it receives the parse trees of the arguments. The return value should also be a parse tree, which the compiler inserts in place of the call to the transform.

Examples

The current proof-of-concept implementation can properly handle these examples.

1. Symbolic Differentiation

The following is a transform implements the sum and product rules of calculus, along with two base cases:

deftrans deriv (myexpr, var):
   if isinstance(myexpr, ExprNodes.AddNode):
      # (f + g)' = f' + g'
      return ExprNodes.AddNode(myexpr.pos,
        operator = '+',
        operand1 = deriv(myexpr.operand1, var),
        operand2 = deriv(myexpr.operand2, var))
   elif isinstance(myexpr, ExprNodes.MulNode):
      #(f * g)' = f * g' + f' * g
      return ExprNodes.AddNode(myexpr.pos,
         operator = '+',
         operand1 = ExprNodes.MulNode(myexpr.pos,
            operator='*',
            operand1 = myexpr.operand1,
            operand2 = deriv(myexpr.operand2, var)),
         operand2 = ExprNodes.MulNode(myexpr.pos,
            operator = '*',
            operand1 = deriv(myexpr.operand1, var),
            operand2 = myexpr.operand2))
   elif isinstance(myexpr, ExprNodes.NameNode):
      if myexpr.name == var.name:
         return ExprNodes.IntNode(myexpr.pos, value = '1')
      else:
         return ExprNodes.IntNode(myexpr.pos, value = '0')
   elif isinstance(myexpr, ExprNodes.ConstNode):
      return ExprNodes.IntNode(myexpr.pos, value = '0')

When used like this:

def eggs(a, b):
   return deriv(5*a+b, a)

it is translated at compile time to:

def eggs(a, b):
   return (((5 * 1) + (0 * a)) + 0)

2. Symbolic Differentiation with Templates

Generating nodes by explicitly calling the constructor of Node subclasses is verbose and tedious, and therefore error prone. All existing metaprogramming facilities that I'm aware of (C++, Common Lisp, Scheme, Dylan, R, Boo) allow the programmer to write a template where expressions can be substituted.

The proof-of-concept implementation has a simple template mechanism: each node is represented as a list, where the first element represents the node type, and the remaining ones the children. A better idea is the one proposed by Boo, but the one here is easier to implement, so its used in the proof-of-concept implementation. Using this mechanism, the above transform can be written:

deftrans deriv2 (myexpr, var):
   import Cython.Compiler.ExprNodes as ExprNodes
   if isinstance(myexpr, ExprNodes.AddNode):
      # (f + g)' = f' + g'
      return ['+', deriv2(myexpr.operand1, var), deriv2(myexpr.operand2, var)]
   elif isinstance(myexpr, ExprNodes.MulNode):
      #(f * g)' = f * g' + f' * g
      return ['+',
         ['*', myexpr.operand1, deriv2(myexpr.operand2, var)],
         ['*', deriv2(myexpr.operand1, var), myexpr.operand2]]
   elif isinstance(myexpr, ExprNodes.NameNode):
      if myexpr.name == var.name:
         return '1'
      else:
         return '0'
   elif isinstance(myexpr, ExprNodes.ConstNode):
      return '0'

3. Newton's Method

Given a way to compute a function and its derivative, Newton's method is a way to numerically approximate a root. We can use the above symbolic differentiator to help us.

First, we define the core Newton's Method function. This is a standard Cython function that doesn't use any metaprogramming. It takes a function and an initial estimate of the root. The function that's passed in must return a tuple of length 2: the value of the function, and the value of its derivative.

def newtons(f, x):
   while True:
      (val, deriv) = f(x)
      print "f(", x, ") =", val, ", f'(", x, ") =", deriv
      if abs(val) < 1e-10:
         return x
      x -= val / deriv

To produce a suitable function, we can use our earlier deftrans, with added support for subtraction and exponentiation by a constant (not shown):

deftrans exprAndDeriv(expr, var):
   return ['tuple', expr, deriv2(expr, var)]

We can then define a function:

def myfunct(x):
   return exprAndDeriv(5*x**3 - 10, x)

which is transformed to:

def myfunct(x):
   return (((5 * (x ** 3)) - 10), (((5 * ((3 * (x ** (3 - 1))) * 1)) + (0 * (x ** 3))) - 0))

If we Cython this and load it into a Python interpreter, we get:

>>> import TestTrans
>>> TestTrans.myfunct(10.0)
(4990.0, 1500.0)
>>> TestTrans.newtons(TestTrans.myfunct, 10.0)
f( 10.0 ) = 4990.0 , f'( 10.0 ) = 1500.0
f( 6.67333333333 ) = 1475.93037185 , f'( 6.67333333333 ) = 668.000666667
f( 4.46385893383 ) = 434.735082042 , f'( 4.46385893383 ) = 298.890548717
f( 3.00936301916 ) = 126.267956666 , f'( 3.00936301916 ) = 135.843986716
f( 2.07985587115 ) = 34.9852072625 , f'( 2.07985587115 ) = 64.8870066716
f( 1.54068464016 ) = 8.28568621842 , f'( 1.54068464016 ) = 35.6056374064
f( 1.3079774954 ) = 1.18847303518 , f'( 1.3079774954 ) = 25.6620769269
f( 1.26166506953 ) = 0.0415843883547 , f'( 1.26166506953 ) = 23.8769812152
f( 1.25992345957 ) = 5.73769235945e-05 , f'( 1.25992345957 ) = 23.8111068596
f( 1.2599210499 ) = 1.09734443754e-10 , f'( 1.2599210499 ) = 23.8110157797
f( 1.25992104989 ) = 0.0 , f'( 1.25992104989 ) = 23.8110157795
1.2599210498948732

4. Indexing an arbitrary-dimension array

Metaprogramming has many applications to speeding execution, by doing computation at compile time that would normally be done at run time. This example performs compile time loop unrolling for a hypothetical arbitrary-dimension array:

deftrans at(array, *indexes):
   expr = indexes[0]
   for i in xrange(1, len(indexes)):
      expr = ['+', ['*', expr, ['strides', array, str(i)]], indexes[i]]
   return ['array_get', array, expr]

Then

def array_example(myarray, i, j):
    return at(myarray, i, j, 55)

is transformed to

def array_example(myarray, i, j):
   return array_get(myarray, ((((i * strides(myarray, 1)) + j) * strides(myarray, 2)) + 55))

Use Cases

The various Lisp communities have developed a number of use cases. No doubt, if this proposal is accepted, many more will develop that we can't currently forsee. What follows is a list of existing use cases drawn from the Paul Graham book On Lisp, the standard libraries of the programming language R., as well as from the 400,000 line Lisp program QPX by ITA Software.

1. Parsing expressions

2. Alternate representations of classes

3. Moving computation from runtime to compile time

4. Simplifying syntax

If there's some idiom you find yourself using, rather than typing it out over and over again, you can create a transform to insert it for you.

This assumes some features I haven't described yet, like the ability for compile time variables to be saved then restored at runtime, and the ability to return statements when a transform is called as the top level of a statement expression.

5. Other optimization

To be continued.

Design Issues

1. Template format

I think the best way to go is the way proposed by Boo.

2. How to make this as natural as possible by avoiding explicit trees as much as possible

Related to template format, and perhaps the solution to "evaluating arguments more than once," if we go for "using the arg inserts its value; only if we do unevaluated(arg) do we get the tree" or something like that.

3. How to implement

4. How will this work with method?

The examples are global functions. How does compile time dispatch work with methods in a dynamic language, where the type of the object isn't known until runtime? Do we only allow this for typed variables?

5. Whether/how to unify compiler macros and SBCL's transforms with this enhancement

6. When you want to insert statements inside an expression

7. Defining new statements, function definitions, class definitions, etc.

8. Variable capture and hygienic macros

9. Evaluating arguments more than once

Related Systems

Sources of inspiration for design decisions can come from the macro systems in various Lisps, namely Common Lisp, Scheme, and Dylan. Scheme and Dylan macros, like C++ templates, are purely substitution based. Thus there are no loops; loops must be implemented using recursion.

Common Lisp has two "gotchas" that catch both those new to macros and experts alike. The first is variable capture. The second is evaluating arguments more than once.

Scheme solves the variable capture problem with what it calls "hygienic macros." Identifiers that are passed in as part of the arguments are resolved in the lexical environment where the macro is called & expanded. Variables generated in the macro itself are resolved in the lexical environment where the macro was defined.

The R programming language doesn't have macros per say, but all functions get their arguments unevaluated.

enhancements/metaprogramming (last edited 2009-03-01 01:02:24 by localhost)