Hacker News new | past | comments | ask | show | jobs | submit login

First off, thank you for all your contributions to Python!

I completely take your point that there are many places where this approach won't fit. It was a surprise for me to trace the performance issue to allocations and GC, specifically because it is rare.

WRT boxing and unboxing, I'd imagine it depends on access patterns primarily - given I was extracting a small portion of data from the AST only once each, it was a good fit. But I can imagine that the boxing and unboxing could be a net loss for more read-heavy use cases.




You could create a custom C type that wrapped an arbitrary AST node and dynamically created values for attributes when you accessed them. The values would also be wrappers around the next AST node, and they could generate new AST nodes on writes. Python objects would be created on traversal, but each one would be smaller. It wouldn’t use Python lists to handle repeated fields It seems like a non-trivial implementation, but not fundamentally hard.

The analogy with numpy doesn’t seem quite right, as Raymond observes, because numpy depends on lots of builtin operations that operate on the underlying data representation. We don’t have any such code for the AST. You’ll still want to write Python code to traverse, inspect, and modify the AST.


Very fair points. For general purpose ASTs from Python your design should be more efficient while essentially keeping the existing interface.

When I referenced numpy, I was thinking of a query layer which could push traversal into the extension as well. Something that could have given me “.select(ast.Import).all()”, which in my head is kind of like doing a filtered sum in numpy.

Very cool to get your thoughts on this, thanks for making an account :)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: