[ANN] A PureScript Python Backend

thautwarm · February 16, 2020, 1:25pm

Hi community,

This is an announcement of a new Python backend for PureScript.

TL;DR: just read the bold text in the following paragraphs.

The main compiler repo is PureScript Python. In it, I reused most of the codegen components of the JavaScript backend, including its JS optimizations, by slightly modifying CodeGen.JS and writing my own code emitter.

I’ve learned a lot of programming languages, and created quite a few too. I’m also a student studying under the supervision of some of the best researchers in the field of programming languages. Python is special to me, partly because it’s the first language I dug deep into; however, as time goes by, I find writing Python more and more frustrating.

I tried many approaches: Mypy, PyCharm, making statically typed programming languages, and making Python code generators for functional programming languages like Idris. All of these workarounds have unsolvable problems, and I have suffered a lot from them over the years. For instance:

Mypy: not strong enough; no higher-kinded types, so monads cannot be expressed; verbose annotations…
PyCharm: even weaker than Mypy, only better when it comes to real-time checking.
Making programming languages: type inference for higher-rank types turned out to be really horrible for me, and I think it is too difficult to maintain a language by myself (it seems no one in the Python community cares about these problems).
Idris/Agda code generators: they lack source code positions (hence I cannot redirect runtime errors to the original source code), and the restricted FFI is annoying, though it might be safer.

As for better languages like Haskell, they usually do not provide a codegen backend interface.

I did suffer a lot, I really did. However! I finally found PureScript. This is not the first time I’ve tried PureScript: I did 2 years ago, but at that time I was in my honeymoon with Haskell (now its horrible IDE support is driving me away…).

PureScript solved all the problems my previous workarounds had! And PureScript has row polymorphism, which is one of my favorite language features!

So I think PureScript is the answer! This feeling grew stronger and stronger as I enjoyed PureScript’s IDE support in VSCode.

So I feel responsible for making an actively maintained and highly available Python backend for PureScript.

I’d say I’m an expert in Python: I’m one of the very rare people who fully understand Python’s execution model and compiler internals, such as the mechanisms of runtime error reporting and bytecode instructions. All of this helps in developing a good PureScript Python; for instance, I can redirect runtime errors to the PureScript source code instead of the generated Python code.
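The error-redirection idea can be sketched with nothing but the standard library. This is not how PureScript Python or PySExpr does it (PySExpr emits bytecode directly); it is a minimal sketch assuming a code generator that knows which PureScript line each top-level Python statement came from. The generated source, the file name `src/Main.purs`, the line offsets, and the helper `compile_with_purs_positions` are all hypothetical. Because the file name and line numbers are stored in the code object rather than in Python source text, tracebacks and debuggers report the `.purs` location:

```python
import ast
import traceback

# Hypothetical generated Python for a PureScript function `head` and a call to it.
GENERATED = "def head(xs):\n    return xs[0]\n\nhead([])\n"

def compile_with_purs_positions(source, purs_file, line_offsets):
    """Hypothetical helper: compile `source`, shifting line numbers per statement.

    `line_offsets[i]` is added to every line number inside the i-th top-level
    statement, standing in for a position table a code generator would emit.
    """
    tree = ast.parse(source)
    for stmt, offset in zip(tree.body, line_offsets):
        ast.increment_lineno(stmt, offset)  # shifts lineno and end_lineno (3.9+)
    # The filename stored in the code object is what tracebacks and pdb show.
    return compile(tree, purs_file, "exec")

# Pretend `head` starts at line 10 of src/Main.purs and the call is at line 20.
code = compile_with_purs_positions(GENERATED, "src/Main.purs", [9, 16])

try:
    exec(code, {})
except IndexError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    for frame in frames[1:]:  # skip this script's own frame (the `exec` call)
        print(frame.filename, frame.lineno, frame.name)
```

Running it prints `src/Main.purs 20 <module>` followed by `src/Main.purs 11 head`, so the failing FFI-style lookup is reported against the PureScript file.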

So this is why I think I’m suited to this work: I’m a PL researcher, a CPython expert, a former Haskeller, and someone who’s not satisfied with workarounds. I then started PureScript Python, and within a week I had it working.

Currently I’m working on supporting PureScript libraries that depend on foreign JavaScript implementations, and on testing the PureScript Python compiler at Testing-PureScript-Python.

There’s been some progress; for instance, in purescript-python we can now specify foreign implementation files in separate directories. I also have some ideas about distribution for multiple backends, and would like to discuss them if anyone is interested.

Besides, I’m looking for contributors and collaborators, such as people who are familiar with the PureScript ecosystem, who use Python, and who find Python painful.

Finally, I appreciate this community and the developers of the PureScript compiler. If you visit Japan around Tokyo in the future, please let me know and I’d treat you to a meal.

The way your compiler developers designed PureScript’s support for alternative backends is virtually a kind of art.

natefaubion · February 16, 2020, 2:50pm

Good work! I’ve always felt like Python would make a great, straightforward backend. I had never heard of pysexpr before (haven’t used Python in a while), so I was a little surprised by the emitted code. It seems like it has some advantages, though. Have you considered having it output more standard-looking Python as an option?

thautwarm · February 16, 2020, 4:03pm

Hi, thanks for your kind words!

Have you considered having it output more standard looking python as an option?

This is an interesting question, and the answer is no. I have strong reasons for using pysexpr, in the following two respects:

First, we can already use existing techniques to generate Python-like source code from Python code objects; for example, the variable res here is a Python code object. Generating standard Python code is not a hard task.
Second, generating Python source code would lead to some unnecessary problems, such as:
- Name collisions: Python keywords, and names containing $ or ., cannot be used as Python identifiers at the source-code level. We can see that PureScript does a lot of work to deal with name collisions, but this Python backend doesn’t have to, which makes maintenance easier.
- Lack of expression-first: Python is statement-first, which is to say statements can never occur inside expressions; hence Python has no block expressions, and no assignment expressions before Python 3.8. As a result, code generation for constructs like while loops or blocks becomes very difficult, because we have to transform programs into A-normal form (ANF) to translate expression-first programs into statement-first ones. This workaround has further drawbacks; see the section on drawbacks of ANF transformations below.
- Cross-version support of Python. PySExpr is actually cross-version, supporting all Python 3 versions used industrially (>= Python 3.5). To make generated Python source code cross-version, we would have to restrict ourselves to Python 3.5 syntax, which means some optimized Python VM instructions could not be leveraged, such as FORMAT_VALUE for building interpolated strings much faster, and indirect jump instructions that greatly optimize pattern matching.
- Runtime error reporting linked to PureScript source code, which is impossible if you directly generate Python source code. This is what I consider most important.
  It gives you very intuitive feedback when implementing FFIs (as FFI code can be unsafe and produce runtime errors), and supports many wonderful debuggers (like PyCharm, pdb, etc.). A major reason Haskell annoyed me is that no industrial-strength debugger is available, which forces verbose use of Debug.Trace. I want to use debugger tools to step over/in/out, etc.
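The name-collision point can be checked directly in CPython: global names are stored as plain strings in a code object’s `co_names`, and the interpreter uses them as dictionary keys without checking that they are valid identifiers. The sketch below assumes CPython 3.8+ (for `CodeType.replace`); the helper `rename_names` and the name `Data.Maybe.$just` are hypothetical, and PySExpr builds its code objects its own way rather than by patching a compiled template like this.

```python
# Sketch (CPython 3.8+): global names are plain strings in a code object's
# co_names tuple; STORE_NAME/LOAD_NAME use them as dict keys and never check
# that they are valid Python identifiers.

def rename_names(code, mapping):
    """Hypothetical helper: copy `code`, renaming entries of co_names via `mapping`."""
    new_names = tuple(mapping.get(name, name) for name in code.co_names)
    return code.replace(co_names=new_names)

# Compile ordinary source with placeholder names...
template = compile("a = 1\nb = a + 41", "<sketch>", "exec")

# ...then swap in names that no Python source file could spell:
# a dotted name containing "$", and the keyword "class".
code = rename_names(template, {"a": "Data.Maybe.$just", "b": "class"})

namespace = {}
exec(code, namespace)
print(namespace["Data.Maybe.$just"], namespace["class"])
```

This prints `1 42`: no escaping scheme is needed, because the restriction on identifiers only exists in Python’s surface syntax.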

P.S.: drawbacks of ANF transformations

ANF transformations produce unavoidable performance hits. See the code comparing expression-first and statement-first programs in this post of mine; the key comments are extracted here:

…and you have to do optimizations, including register reallocation, to avoid using too many extra variables (variables not defined by users, but needed to emulate expression-first code). Further, even if you apply many awesome optimizations, the weakness of Python ASTs means you still cannot avoid redundant register allocations, whereas PySExpr’s bytecode approach ends this problem entirely.
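To make the extra-variable problem concrete, here is a toy ANF pass over a hypothetical four-node expression language (`lit`, `var`, `add`, `let`) that emits statement-first Python source. It is a sketch for illustration, assuming let-bound names are unique (so no renaming is needed to avoid capture); it is not PureScript Python’s code generator, which avoids this step by emitting bytecode through PySExpr.

```python
import itertools

# Toy expression language (hypothetical, for illustration only):
#   ("lit", value) | ("var", name) | ("add", e1, e2) | ("let", name, bound, body)
# Assumes let-bound names are unique, so no renaming is needed to avoid capture.

def flatten(expr, stmts, fresh):
    """Append statements for `expr` to `stmts`; return an atomic Python expression."""
    kind = expr[0]
    if kind == "lit":
        return repr(expr[1])
    if kind == "var":
        return expr[1]
    if kind == "add":
        left = flatten(expr[1], stmts, fresh)
        right = flatten(expr[2], stmts, fresh)
        tmp = f"_t{next(fresh)}"  # compiler-introduced variable the user never wrote
        stmts.append(f"{tmp} = {left} + {right}")
        return tmp
    if kind == "let":
        _, name, bound, body = expr
        atom = flatten(bound, stmts, fresh)
        stmts.append(f"{name} = {atom}")  # a block expression becomes a statement
        return flatten(body, stmts, fresh)
    raise ValueError(f"unknown node kind: {kind!r}")

def to_statements(expr):
    """Translate an expression-first program into statement-first Python source."""
    stmts = []
    result = flatten(expr, stmts, itertools.count())
    stmts.append(f"result = {result}")
    return "\n".join(stmts)

# 1 + (let x = 2 + 3 in x + x): a block expression nested inside an operand,
# which Python before 3.8 cannot write as a single expression.
program = ("add", ("lit", 1),
           ("let", "x", ("add", ("lit", 2), ("lit", 3)),
                   ("add", ("var", "x"), ("var", "x"))))

source = to_statements(program)
print(source)
namespace = {}
exec(source, namespace)
print(namespace["result"])
```

The generated program is `_t0 = 2 + 3`, `x = _t0`, `_t1 = x + x`, `_t2 = 1 + _t1`, `result = _t2`, which evaluates to 11. The temporaries `_t0` to `_t2` and the copy `x = _t0` never appear in the source program; removing them is exactly the register-reallocation work described above.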

kritzcreek · February 16, 2020, 8:11pm

Great work! One thing you might want to know is that you don’t need the extra .js file in the hello-world project for the IDE if you change the codegen setting in the plugin to corefn. It’s set to js by default, which makes the compiler check for JavaScript FFI definitions.

I don’t know if there is a way to set project-specific plugin settings in VSCode though, so maybe keeping the .js file is the more robust solution for now.

thautwarm · February 17, 2020, 12:15am

Wow, thanks a lot for this information!

swizzard · February 20, 2020, 3:34am

Have you considered something like RPython as an IR?

thautwarm · February 20, 2020, 8:05am

You mean a Python backend with a JIT?
I’d say I’m very willing to do that once a fully CPython-compatible one is finished.