Would be interested to see a similar comparison for a problem where the Python s...

oefrha · on March 15, 2021

> if the Python solution were unable to leverage its rich and well optimised standard-library

collections.Counter is a very straightforward collection backed by dict. The code isn't specially optimized, you can write the same code yourself with dict and even avoid (probably a negligible amount of) overhead.

Now, dict is optimized, but that's a builtin type. You don't need PSL.

https://github.com/python/cpython/blob/2fe408497e6838b6fb761...

MaxBarraclough · on March 15, 2021

The article provides profiling data of the optimised Python solution. If I'm reading it right, 90% of the time (1148 milliseconds out of a total of 1280) is spent executing the _collections._count_elements and str.split methods, both of which are Python standard-library methods presumably implemented in optimized C.

Of course, they're right to optimize this way, and it's to Python's credit that it's possible to leverage this approach so successfully using only the standard-library, but I think my earlier point stands.

oefrha · on March 15, 2021

I practically linked to _collections._count_elements already (linked Counter.update instead because I thought linking to the call site is clearer), but here it is: https://github.com/python/cpython/blob/2fe408497e6838b6fb761...

Again, very straightforward, based on dict. Python itself does not function at all without the dict implementation since it's the underpinning of the object model.

str is a builtin type. Calling str.split() a standard library method is... okay.

benhoyt · on March 15, 2021

That link shows that yes, "_count_elements" is implemented in Python, but right after it pulls in the C version if available (which it presumably is in CPython):

   try:                                    # Load C helper function if available
       from _collections import _count_elements
   except ImportError:
       pass

Here is the C version (with a comment a little way down describing the "fast path advantages": https://github.com/python/cpython/blob/93d33b47af70ede473f82...

MaxBarraclough · on March 15, 2021

Well spotted, thanks.

MaxBarraclough · on March 15, 2021

Thanks for the link, interesting that _count_elements is written in Python. At a glance it looks like the kind of thing that might benefit from being written in C, but presumably that's not the case. Perhaps because of the dynamic typing of both arguments?

> Python itself does not function at all without the dict implementation since it's the underpinning of the object model.

Sure, but I don't see the point here.

> Calling str.split() a standard library method is... okay.

For our purposes there's no reason to draw a distinction between standard-library functionality closely integrated into the language, and standard-library functionality that isn't. The relevant point is that computational work is being handed off to optimised C code.

Again, if you're optimising Python, the smart move is indeed to make maximal use of the optimised machinery in the standard-library (or for that matter some other dependable library). My point still stands: it would be interesting to look at a problem that isn't amenable to this approach, where the performance of the Python interpreter itself would be brought to the fore by necessity.

How easy it is to find such a problem will be a function of how good a job Python does of providing applicable optimised functionality in its standard-library. Perhaps something like computing matrix determinants? I don't know enough Python to say.

acdha · on March 15, 2021

It can be interesting – especially if you also compare it to something like the PyPy JIT – but it runs afoul of the idiomatic angle since Python has had decades of recommended practice being to write code in Python and optimize the hotspots in C, which has led to very popular libraries like NumPy, PIL/Pillow, etc. becoming almost universal in community recommendations.

MaxBarraclough · on March 15, 2021

Right. They don't emphasise improving the performance of the Python interpreter precisely because they prefer to handle computational heavy-lifting in C instead. Python has been very successful with this approach. Java takes the opposite approach, promoting 'purity' and investing very heavily in JVM performance.

It would still be interesting to see for benchmark purposes.

acdha · on March 15, 2021

Agreed — and especially so if you could see how common it is that CPython performance tanks but PyPy (or Jython?) does not after the JIT kicks in.

heavyset_go · on March 15, 2021

> Perhaps because of the dynamic typing of both arguments?

It wouldn't matter, you'd be dealing with PyObject pointers in C anyway.

MaxBarraclough · on March 15, 2021

Right, that's what I meant. If the C code has to cope with dynamic typing anyway, there might be little scope to speed things up.

The same isn't true for, say, string operations, where there might be plenty of opportunity to move tight loops from Python to C and speed things up considerably.

heavyset_go · on March 15, 2021

Truthfully, I don't think optimization was a motivation for writing _count_elements() in Python or not. The CPython project doesn't prioritize optimization of the language or standard library unless there are clear performance gains to be made in the face of slow code. What most likely happened is someone wrote some code on a mailing list or bug report, and a maintainer included it as is.

Reading the bug report that introduced Counter[1], the author states it is the simplest implementation they came up with, and Guido and other maintainers prioritized simplicity.

[1] https://bugs.python.org/issue1696199

MaxBarraclough · on March 15, 2021

Interesting, thanks.

MaxBarraclough · on March 15, 2021

(Too late to edit) benhoyt's comment points out that _count_elements has two implementations, one in Python and one in C. The Python version is used only if the C function is not found.

musingsole · on March 15, 2021

I often wonder how fast I'd run a race if I shot my foot too.