Exploit custom codecs to write inline C in Python (github.com/georgek42)
190 points by ssize_t on Feb 2, 2020 | 48 comments



Wait... looking at this, doesn't this mean that by deliberately misusing the `codecs` module, compile-time metaprogramming (somewhat like Lisp macros) is possible in Python?

By using a custom parser, people can add custom syntax to Python (which is... undoubtedly unpythonic) with just a decorator (or a magic comment) up front. I can't wait to start abusing this... :-)

Edit: Looks like a pattern matching syntax that hooks in pampy would be my first hacking project :-)

    # coding: patmatch
    import patmatch
    
    input = [1, 2, 3]
    patmatch input:
        case [1, 2, x]:
            print(f"{x} is matched!") #=> 3 is matched!
    
    def fibonacci(n):
        patmatch n:
            case 1: return 1
            case 2: return 1
            case _: return fibonacci(n - 1) + fibonacci(n - 2)


Yes, and you can do it with import hooks as well.
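As a rough sketch of the import-side approach (not a full hook: `UnlessLoader`, the `unless` keyword, and `demo` are all made up for illustration), a loader can rewrite the source text before compiling it. A real import hook would also register a finder on `sys.meta_path` or `sys.path_hooks`, which is left out here:

```python
import importlib.machinery
import importlib.util
import os
import tempfile


class UnlessLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that rewrites a made-up `unless` keyword."""

    def source_to_code(self, data, path="<string>"):
        # Naive text substitution; a real tool would use a proper parser.
        source = bytes(data).decode("utf-8").replace("unless ", "if not ")
        return compile(source, path, "exec", dont_inherit=True)


def load_transformed(name, path):
    """Load the file at `path` as module `name` through UnlessLoader."""
    loader = UnlessLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def demo():
    """Write a tiny module that uses `unless`, load it, and call it."""
    source = (
        "def check(x):\n"
        "    unless x:\n"
        "        return 'falsy'\n"
        "    return 'truthy'\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "unless_demo.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        module = load_transformed("unless_demo", path)
        return module.check(0), module.check(1)
```

The text replacement is deliberately crude (it would also hit string literals), which is kind of the point about how easy it is to get this sort of magic wrong.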

The reason you don't see it in Python much is more a matter of culture: we don't like too much magic or too many DSLs.

With the new generation of coders incoming, it may change though.

Hope not, as most devs are terrible at designing magic or languages and equally terrible at admitting they are.

It took a decade for the trend of monkey patching things in Ruby and JS to disappear, and I don't miss that time one bit. Bad habits die hard.


Probably not so much a matter of culture, but more that the mechanism is a) garbage b) high effort.

I bet if either of those two weren't true, you'd see a lot more of it.


I think you might be putting the cart before the horse so to speak. The culture of python doesn't value these things so the only mechanism is high effort garbage.


There is truth in that, but also a bit of "he who can't do what he wants must want what he can do". The only reason python still matters (more than ruby) is because of its dominance in ML/datascience. And this ecosystem is full of nasty syntactic hacks.


I can't speak for other people, but the only reason I still use Python is its syntax (though the "one way to do it" is slowly changing). You get much better concurrency with Golang and even better parallelism with Julia.

It is dominant in data science but still some people use it for other things because the syntax is so simple (Scheme is a close second for me).


If you mean the pseudocode-like quality that (sadly) remains a distinguishing feature, I completely agree.

Simplicity in terms of specifying the grammar, though? Python's grammar is exceedingly gnarly by now, and I'm pretty sure Go, Lua, Prolog, Smalltalk and pretty much every Wirth language are massively simpler, syntactically.


Yes, I meant the pseudocode quality of Python (I wish Wikipedia would use Python as its pseudocode :)).

I completely agree that Python is pretty hard to implement. Perhaps you have heard that simple doesn't mean easy? I could easily implement a Forth (or assembly) interpreter. But it isn't that easy to understand a big complex Forth program (compared to Python). Simple languages like Forth and assembly are too unstructured for me.


I'm sceptical of the idea that the good things about python make it hard to implement. Compared to forth – sure! But most of the non-simplicity beyond what's attributable to not being an ultra-bare-bones language comes from organic growth and bad design decisions.

Off the top of my head, here's a list of things that were awesome in python's design:

1. Clean and concise syntax (whitespace, slices, rest and kwargs, multiline and raw strings, which, although both badly designed, were ahead of their time)

2. good default builtin datatypes, with decent API (cf garbage like cons cells or STL)

3. in particular strings as immutable vector of bytes, no separate character type (this got screwed up in python3 of course)

4. comparison on compound datatypes works out of the box

5. relative uniformity: no value vs reference types; objects are mostly just dicts, and modules as well

6. repr and repl (both gimped, compared to lisp, but good enough to be super helpful in development), good tracebacks

7. anti-footgun conventions (notably: mutating functions return None convention; mutable -> no __hash__ convention; slicing copies convention)
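A few of the conventions listed above (items 4 and 7) can be seen in a short snippet. This is only an illustrative sketch; the variable names are made up:

```python
nums = [3, 1, 2]

# Mutating methods return None, so you can't mistake them for copies.
sort_result = nums.sort()

# Slicing makes a new list; changing the copy leaves the original alone.
copied = nums[:]
copied.append(4)

# Mutable builtins like list are unhashable, so they can't be dict keys.
try:
    hash(nums)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Comparison on compound values works element by element.
nested_less_than = (1, [2, 3]) < (1, [2, 4])
```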

Nothing in this list seems particularly hard on the implementor, but it's of course hardly exhaustive or unbiased. What things that you really like about python do you think are rather hard to implement?


Implementing a minimal Python is pretty easy (even with the huge standard library), but a complete Python implementation is a herculean task. And even after that there is no guarantee that every Python library will run on your implementation (see PyPy).

I don't know about you, but I would rather make hundreds of Scheme and Forth implementations than one Python interpreter. But I don't really see that as a downside. I mean, who decides on a language based on how easy it is to implement?


Well, all things being equal a cleaner and more understandable language is going to be easier to implement, so I don't think difficulty of implementation should be dismissed so easily. Especially when the difficulty of implementation is not due to advanced features such as a fast runtime, resumable exceptions, proper tail calls, multi-method dispatch, pattern matching, powerful concurrency abstractions, SMP...


Absolutely :) I "discovered" this when researching implementing a new language on top of python. The decorator is completely unnecessary, so you can have a complete language with custom features built entirely on top of the python interpreter.


Speaking of building on top of CPython, you can also go another route by compiling whatever crazy syntax you can come up with into its bytecode. See e.g. https://pyos.github.io/dg/


I've been toying with the idea of implementing some sort of DSL on top of python but wouldn't really know where to start.

Would you have any pointers (heh) given the discovery you made?


I think the `#coding:` approach is promising. You have to fit your parsing/transpilation into a pretty small time budget (which is a fun challenge on its own) to keep the startup time hit negligible. The source code gets reparsed several times in certain cases, e.g. when printing a stack trace, so it's a good idea to have some sort of caching mechanism.
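To make that concrete, here is a minimal sketch of the moving parts, not the code from the linked project. The codec name `unless`, the toy `transpile` pass, and the use of `functools.lru_cache` as the cache are all assumptions for illustration. It only exercises `codecs.decode`; a codec meant for real `# coding:` lines would most likely also need incremental decoder and stream reader classes, which are omitted:

```python
import codecs
import functools

CODEC_NAME = "unless"  # hypothetical codec name, not a real encoding


@functools.lru_cache(maxsize=128)
def transpile(source):
    """Toy transform: rewrite a made-up `unless` keyword into `if not`."""
    return source.replace("unless ", "if not ")


def decode(data, errors="strict"):
    """Decode UTF-8 bytes, then run the (cached) source transform."""
    raw = bytes(data)
    return transpile(raw.decode("utf-8", errors)), len(raw)


def search(name):
    """Codec search function: answer only for our hypothetical name."""
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(name=CODEC_NAME, encode=utf8.encode, decode=decode)


codecs.register(search)
```

With that registered, `codecs.decode(b"unless x:\n    pass\n", "unless")` hands back ordinary Python source, and decoding the same text again is served from the cache.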

As far as parsing goes, if you want to stick with python I've had good success with pyparsing [1], otherwise I have a strong preference for doing language-related things in OCaml with menhir [2]. I've toyed with wrapping the Python parser in an OCaml library with decent success [3]. But, of course, unless you're optimizing for fun, it's probably a good idea to stick with Python.

Another, weirder, approach is creating a language that happens to be able to be parsed by the same grammar as python, then using a decorator (or similar) to get the source code or the ast to reparse and transpile. As an example you could have something like `a <- b` be the syntax for an actor model message receive dsl (which is valid python).
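A quick sketch of why `a <- b` works: Python parses it as the comparison `a < (-b)`, so the pattern can be picked out of the AST. The helper name `find_receives` is made up, and it takes a source string directly; a decorator version would presumably get the string from `inspect.getsource`:

```python
import ast


def find_receives(source):
    """Return (target, sender) name pairs for every `a <- b` in `source`."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
            continue
        right = node.comparators[0]
        if (
            isinstance(node.ops[0], ast.Lt)
            and isinstance(right, ast.UnaryOp)
            and isinstance(right.op, ast.USub)
            and isinstance(node.left, ast.Name)
            and isinstance(right.operand, ast.Name)
        ):
            pairs.append((node.left.id, right.operand.id))
    return pairs
```

From there a transpiler could replace each matched node with whatever call the DSL needs.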

[1] https://github.com/pyparsing/pyparsing [2] http://gallium.inria.fr/~fpottier/menhir/ [3] https://github.com/georgek42/pyparser


Thanks a mil for the tips.


Syntactic is a module specifically for doing this.

https://github.com/metatooling/syntactic


i would give anything to have a modern switch statement added to python. the justification for not having one does not hold water anymore. we even have enums in the standard library but nobody uses them -- afaik -- because without a switch statement, enums are not really that useful...


A dispatch dictionary covers many of the typical usecases? https://alysivji.github.io/quick-hit-dictionary-dispatch.htm...

Though I have missed switch statements a few times in Python, when implementing state machines - for parsing or sensor/actuator logic. Though these tended to be quite simple and not performance-critical, so an if/elif sequence was just fine.
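For anyone who hasn't seen the pattern, a dispatch dictionary looks roughly like this (the command names and handlers are made up for illustration):

```python
def handle_start():
    return "starting"


def handle_stop():
    return "stopping"


def handle_unknown():
    return "unknown command"


# Map each command to the function that handles it.
HANDLERS = {
    "start": handle_start,
    "stop": handle_stop,
}


def dispatch(command):
    """Look up the handler for `command`, falling back to a default."""
    handler = HANDLERS.get(command, handle_unknown)
    return handler()
```

The `.get` with a fallback plays the role of the `default:` branch.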


Why would switch be useful? The interpreter would still need to evaluate in order like if/elif/else.


For much the same reason that loops are useful even though we could just jump to addresses.


No, those aren't similar reasons. In this case you're just changing the word `elif` to `case`, with no semantic difference.


Someone can finally implement multiline lambdas!


Python DOES have multiline lambdas! It's a common misconception that lambdas have to all be on one line. They don't. What lambdas can't have are statements. Lambdas are restricted to a single expression due to their implicit return statement, but that expression can have any number of line breaks in it. I still shake my head when I hear this complaint about Python, because when coding in the functional style (which is when you need lambdas), expressions are all you need! Haskell and friends don't even have statements, just expressions. My Drython experiment (though unpythonic) demonstrates just how far you can stretch Python's lambdas. Check out its readme for examples: https://github.com/gilch/drython
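A small illustration of a lambda spread over several lines (the name `classify` is just for the example): it is still one expression, wrapped in parentheses so the line breaks are allowed.

```python
# One expression, several lines: a chained conditional inside parentheses.
classify = lambda n: (
    "negative" if n < 0
    else "zero" if n == 0
    else "positive"
)
```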

Hissp, my Lisp-on-Python project, uses multiline lambdas to simplify its compilation target to a functional subset of Python: https://github.com/gilch/hissp

Hebigo is a skin built on Hissp with a more Python-like syntax and macros based on Drython. It's what Python would be like if its statements were composable like its expressions are: https://github.com/gilch/hebigo


Is there some way of simulating expression-based "let" in Python? Without that lambdas are very weak.


Recall that Python has assignment expressions `:=` now: https://www.python.org/dev/peps/pep-0572/

Even before that, I had a "let" in Drython: https://github.com/gilch/drython/blob/master/drython/stateme...

In Haskell let and lambda have a subtle distinction due to the type system, but in Lisp and dynamically typed languages like Python, "let" can simply be a lambda definition that is called immediately.
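Both forms in a minimal sketch (variable names are made up; the second one needs Python 3.8+ for `:=`):

```python
# "let x = 2, y = 5 in x * y + x" as an immediately called lambda.
let_result = (lambda x, y: x * y + x)(2, 5)

# The same idea with an assignment expression binding a local name.
walrus_result = (lambda n: (sq := n * n) + sq)(3)
```

Here `let_result` is 12 and `walrus_result` is 18; note the assignment expression has to be parenthesized inside a lambda.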


You can actually just use a regular function declaration in the middle of arbitrary Python code if you want. Does that count as a multi-line lambda?


No.


not trying to shitpost - I'm legit curious. What's the difference between the two? Like in this snippet, are the two resulting functions different from each other in some way that I haven't considered?

    def parent_function():
        output_function_a = lambda x,y: (x + 23) * y

        def output_function_b(x, y):
            return (x + 23) * y 
        
        return output_function_a, output_function_b


The true comparison is

    def parent_function():
        def output_function_b(x, y):
            return (x + 23) * y 
        
        return lambda x,y: (x + 23) * y, output_function_b


Python lambdas don't have names, and can be defined in expression context (as in, 'map(lambda foo: bar, baz)').
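A short sketch of both points (function names here are illustrative only):

```python
def named(x):
    return x + 1


anonymous = lambda x: x + 1

# A def records its own name; a lambda only gets a placeholder.
def_name = named.__name__
lambda_name = anonymous.__name__

# A lambda can appear right inside another expression, with no prior statement.
doubled = list(map(lambda item: item * 2, [1, 2, 3]))
```

Binding the lambda to `anonymous` doesn't change its `__name__`, which is why tracebacks show `<lambda>`.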


I wrote a similar piece of code called MagicCodec back in 2008, including a zhpy codec (Chinese Python) and a Lua codec, which translates Lua source code to Python. (Blog content in Mandarin Chinese, with sample code http://weijr-note.blogspot.com/2008/02/python-magiccodec-01....) There was a time when I was slightly interested in Chinese Python for educational purposes. Another interesting approach is using ctypes to replace keywords in the CPython implementation, http://weijr-note.blogspot.com/2011/06/python-32-keyword.htm... (Also a blog post in Mandarin Chinese with sample code, which is also in Mandarin Chinese)


Oh never heard of the # codec declaration

relies on https://www.python.org/dev/peps/pep-0263/ it seems (this is from 2001, maybe there's more recent)


I envy you py3 kids...

encoding is a huge problem in py2


> encoding is a huge problem in py2

Yes...exactly! Good thing that it's dead now, every meaningful library has either been updated or replaced and nobody is using it anymore, right? Like Windows 7! /s

Seriously tho, what's keeping you from updating? Genuinely curious at this point...


> what's keeping you from updating? Genuinely curious at this point...

My boss doesn't pay the upgrade bills. Only new features get billed. Sad.

And there are thousands of real-time users on the system and their tasks cannot be interrupted. Also, if there is any py2 and py3 mismatch, the outcome would be quite costly.


Interesting feature abuse. Perl's had various Inline modules for some time, but via a deliberate feature. https://www.perl.com/pub/2001/02/inline.html/


Indeed it is. Guido himself maintained a project called Pyxl3 [1] for a while (forked from one at Dropbox during his tenure there) that abused this codec feature to implement something similar to JSX within Python source.

[1] https://github.com/gvanrossum/pyxl3


Kinda sad when you think that CL could implement this feature in a pretty simple macro.


Python syntax can be heavily and easily manipulated via the ast module. An example of this is hylang https://github.com/hylang/hy
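As a tiny sketch of what that looks like (the transformer and helper names are made up, and this is not how hy itself works internally), an `ast.NodeTransformer` can rewrite a tree before it is compiled:

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Toy rewrite: turn every `a + b` into `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def run_transformed(expression):
    """Parse an expression, rewrite its AST, then compile and evaluate it."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(AddToMult().visit(tree))
    return eval(compile(tree, "<ast>", "eval"))
```

So `run_transformed("3 + 4")` evaluates `3 * 4`. The catch is that the input still has to parse as Python in the first place, which is where the codec trick differs.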


This has got to be the least-pythonic thing I've ever seen.


I read it and immediately thought "Hey, I remember doing this in Perl in the late '90s or early 2000s!" - very nostalgic...


I have also used this ability to make a dsl with https://noseofyeti.readthedocs.io (rspec style "describe", "it" for writing python tests).

It's a cool technique, but quite a lot of effort and can break a bunch of tooling (getting my codec to work with black for example was... fun)


I guess one could also use a lightweight compiler like TCC and write a python module with it. That's a good idea for a side project.


https://github.com/metatooling/syntactic

is a framework for creating custom Python syntax with the codec trick.


Guess I'm adding Python to my resume. Nice work


nice job


thanks!



