Towards Full Support of Python Syntax in Cython

Abstract

The Cython language makes writing C extensions for the Python language as easy as writing in Python itself, but it currently has some shortcomings.

I propose to improve Cython's compiler design by developing and integrating:

These enhancements will allow:

This is not a from-scratch implementation. I worked on similar projects before and many of the proposed ideas have been already tested in the CPython compiler and in pypy, another open source python compiler. (I participated in a pypy sprint in 2006 and worked on an AST peephole optimizer)

By modifying and adapting existing open source code and leveraging from my experience, I shall accomplish this project in the available 12 weeks.

The ability to maintain Cython in the long term will be significantly enhanced by these modifications.

Detailed description

Intro

The Python language has been demonstrated to be successful for its extendibility and his clean C API, besides at its intuitive and simple syntax. Many of the existing extensions are developed using the C language and this can be error prone and tedious which means longer development cycles or the rejection of developing the extension. Cython, as Pyrex before, focuses on solving this problem.

What is the Objective ?

What is Cython ? How it is related to Python?

What about the proposal ?

The current Cython implementation has some shortcomings. After studying Cython's design, I propose to improve Cython's compiler design by developing and integrating:

  1. a new Parser
  2. an Abstract Syntax Tree
  3. a Flow Graph

These enhancements will allow full parsing of all valid Python 2.5 code.

The central idea in my design is the construction of an Abstract Syntax Tree from the input source. The benefit of this approach are further underlined by noticing that Python itself adopted it during the 2.4 -> 2.5 transition PEP 0339

(More in the "How it will be done?" section)

Why the need of this project ? Why should be the project approved?

What will the accomplished project deliver? What is in it for the Cython project ?

These enhancements will allow:

  1. Simpler support of new features
  2. Full parsing of Python 2.5 syntax
  3. Better reasoning about the code for optimizations
  4. Pave the way to support Python 3000 features
  5. Improve performance: many well know optimizing techniques could be implemented, boosting extension speed execution

What is in it for the Python community ?

  1. Better usability: Faster prototyping, development and code re-use
  2. Better module support for the different Python API versions
  3. Easier porting for existing C code, giving the python community more extensions to use and leverage
  4. Improve performance: existing python modules could be translated to Cython

What is in it for the Open Source community ?

  1. Easier way of integrating python with existing C/C++ applications
  2. C Programs could leverage from the power of python in a faster way

What is in it for me?

  1. This work will be the base of my master's thesis on
    • "automatic code generation from dynamic language for high performance computers"
  2. Gain a better understanding of Cython and Python internals
  3. Improve my skill in writing optimizing python applications
  4. Learn more on Python and C / C++ integration
  5. Have fun with the community of Cython
  6. Be sponsored by my favorite company (Google)

How it will be done ?

Introduction

   1 def p_simple_expr(s):
   2     pos = s.position()
   3     expr = p_or_test(s)
   4     if s.sy == 'if':
   5         s.next()
   6         test = p_or_test(s)
   7         if s.sy == 'else':
   8             s.next()
   9             other = p_test(s)
  10             return ExprNodes.CondExprNode(pos, test=test, true_val=expr, false_val=other)
  11         else:
  12             s.error("Expected 'else'")
  13     else:
  14         return expr

   1 ast = parser.parse (source)
   2 ast.visit(AstVisitor())
   3 
   4 class AstVisitor:
   5     ...
   6     def visitIfStmt (self, node):
   7        node.test.accept (self)
   8        node.then.accept (self)
   9        node.else_.accept (self)

= Does the Project have sub-tasks ?=

Parser


Task
Adapt existing implementation of python parsers to Cython:
Goal
The Parser is able to parse the full Cython Test Suite (~200 tests, ~4176 line of code)
Estimated Time
1 - 2 weeks
Summary Description
  • creating the grammar
  • parsing the grammar Task Page: I can adapt this from a previous experience in a pypy sprint (see further section: Why me?) The parser will be generated from the standard Python grammar file, with additions for Cython keywords.
  • Abstract Syntax Tree


    Task
    Use the Parser to generate an Abstract Syntax Visitor
    Goal
    The Visitor runs and passes the entire Python and Cython test Suites (~200 tests, ~4176 line of code)
    Estimated Time
    3 - 6 weeks
    Description
    From the parsed structure
    Further informations

    http://wiki.Cython.org/enhancements/Ast/Mutator http://wiki.Cython.org/enhancements/Ast http://wiki.Cython.org/enhancements/Ast/Status

    Flow Graph


    Task
    Adapt the Flow Model to implement a Builder from the AST
    Goal

    The FlowGraph Logic is able to resolve the Locals test

    Estimated Time
    6+ weeks
    Description
    At this phase we have functional AST. To build a function's flow graph you create a Visitor, that hooks when he enter in a visitFuncDef Node:

    class FlowGraphGen(AstVisitor):
    
       def visitFuncDef(self,node):
           graph = FlowGraph(node)
           #...

    How do you know that your project is feasible?

    Methodology

    What is the Project Estimate ?

    Schedule

    http://wiki.Cython.org/FabrizioM/soc2008/Schedule

    Why Me ?

    1. I have done this before
    2. I am an experienced Python and C/C++ programmer
    3. I am competent about project's topic
    4. I am passionate
    1. I have done this before:
      • Many of the ideas of the Flow Graph and the base code comes from the pypy project.

        In Geneve 2006, I participated in the EuroPython Conference's pypy sprint and worked on the stackless back-end support. During the sprint I proposed to implement a peephole optimizer for the pypy AST. For evidence see http://codespeak.net/pipermail/pypy-svn/2006-July.txt and search for "misto" (my nick)

    2. I am an experienced Python / C/C++ programmer
      • Cython is written in python and I have 3+ years of experiance as a python programmer. The task also involves knowledge of C/C++ that I have. Finally, I have experience in writing compilers.
    3. I am competent about project's topic
      • I'm graduating with a master's degree in "Computer Science" from the University of Rome "La Sapienza". I am writing a thesis on "automatic code generation from dynamic language for high performance computers", where I dealt with and studied many of the project pitfalls. I presented a Talk at the first Python Conference in Italy.

        Title: Python & C/C++/Java/.Net : How To http://www.pycon.it/pycon1/schedule/speaker/1/ I described the preferred ways of integrating python with existing applications. I study the process of agile development with focus on the creative stage.

        I was speaker at the Italian Agile Day 2007 (http://www.agileday.it/index.php?page=speakers) and at the European Summer School of Agile Programming 2007 with the title: Software Creativity

        (http://essap.dicom.uninsubria.it/pmwiki.php?n=Main.FabrizioMiloSoftwareCreativity)

        and many other public conferences (Linux Day, BarCamp, Java Meetings, etc)

    4. I am passionate
      • I am passionate and highly motivated to accomplish this project, have strong communication skills and an innate drive to continue improve myself and the world.

    References

    FabrizioM/soc2008 (last edited 2009-03-01 01:02:25 by localhost)