I recently took a computer vision course taught in both Python and C++; students chose the language for their assignments. I signed up twice and did both. The C++ versions of the exact same tasks were not only significantly faster, they also used far less memory, and in a few tasks I could trivially introduce C++ threads for jaw-dropping speed improvements, while Python's options for that kind of optimization are dismally convoluted. After taking the class twice, it really looks like there are serious gains to be had from an optimization pass via C++/Rust/Go for many systems that could benefit from multi-core implementations.
Python is a glue language. It's one thing to use it to string together a bunch of algorithms written in C, but somebody got the genius idea of using it as a general-purpose language, and now we have backend frameworks that are two orders of magnitude slower than they should be.
A popular approach to Python web serving is to launch a number of "workers" (e.g. via gunicorn) that hang around waiting to serve requests.
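For readers who haven't seen this model, it's usually driven by a config file like the following. This is a hedged sketch of a `gunicorn.conf.py`, not taken from the poster's setup; the worker count and bind address are illustrative. `bind`, `workers`, and `preload_app` are real gunicorn settings:

```python
# gunicorn.conf.py -- a typical prefork configuration (illustrative numbers)
bind = "0.0.0.0:8000"
workers = 4          # one OS process per worker, each a full CPython interpreter
preload_app = True   # import the app in the master before forking,
                     # so workers can share those pages copy-on-write
```

With `preload_app = False` (the default), every worker re-imports the application and its dependencies independently, which is one way you end up with ~250MB of non-shared memory per worker.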
Each of these workers, in code I was recently running (here), idled using ~250MB of non-shared memory, with about 40 workers needed to handle some fairly basic load. :(
I rewrote the code in Go. No workers needed (just goroutines), and the whole thing idles at about 20MB of memory, completely replacing all those Python workers. o_O
This doesn't seem to be all that unusual for Python either.
In a forking model that shouldn't be the case; I'd guess the workers are loading and initializing things post-fork that likely could have been done pre-fork?
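The pre-fork idea can be sketched directly with `os.fork`. This is a minimal illustration, not anyone's production code: it assumes a POSIX system (`os.fork` doesn't exist on Windows), and `BIG_TABLE`/`lookup` are made-up names standing in for models, caches, or config loaded at startup:

```python
import os

# Heavy, read-only data loaded ONCE in the parent, before forking.
# Under copy-on-write fork semantics the children share these pages
# until something writes to them.
BIG_TABLE = list(range(1_000_000))

def lookup(i):
    # Workers only read the pre-loaded data.
    return BIG_TABLE[i]

pids = []
for i in range(4):
    pid = os.fork()
    if pid == 0:
        # Child "worker": use the shared data, then exit cleanly.
        os._exit(0 if lookup(i) == i else 1)
    pids.append(pid)

failures = 0
for pid in pids:
    _, status = os.waitpid(pid, 0)
    if os.WEXITSTATUS(status) != 0:
        failures += 1
```

If each worker instead builds `BIG_TABLE` after the fork, you pay the full memory cost once per worker rather than once per machine. (How well the sharing holds up in CPython specifically is another question, as the sibling comment about refcounting notes.)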
That said, Python devs are some of the worst engineers I encounter, so it’s not surprising things are being implemented incorrectly.
Last I heard, forking wasn't a very effective memory-sharing technique on CPython because of the way it does reference counting: if you load things in before you fork, then when the children start doing work they update the refcounts on all those pre-loaded objects and scribble all over that memory, forcing most of the pages to be copied anyway.
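For what it's worth, CPython ships a partial mitigation: `gc.freeze()` (added in 3.7, reportedly motivated by Instagram's prefork deployment) moves all currently tracked objects into a "permanent generation" that the cyclic collector never touches, so GC passes in the children stop dirtying those pages. It doesn't stop ordinary refcount updates from copying pages, though newer CPythons also make some core objects immortal (PEP 683), which helps at the margins. A minimal sketch, again assuming a POSIX system with `os.fork`:

```python
import gc
import os

# Parent: load the long-lived, shared objects first.
cache = [str(i) for i in range(100_000)]

# Per the gc docs: disable collection, then freeze, so the collector
# won't rewrite GC headers in the shared pages after forking.
gc.disable()
gc.freeze()
frozen = gc.get_freeze_count()  # objects now in the permanent generation

pid = os.fork()
if pid == 0:
    # Child: reads of the frozen cache won't trigger GC traffic on it.
    # (Refcount writes on objects the child actually touches still
    # copy those pages -- freeze only removes the collector's share.)
    os._exit(0 if len(cache) == 100_000 else 1)
_, status = os.waitpid(pid, 0)
```

Children that allocate heavily can call `gc.enable()` for their own new objects; the frozen parent objects stay exempt.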
I worked on a facial recognition system written in C++ that used SIMD optimizations to achieve 25M facial compares per core per second. On that same system I wrote an optimized ffmpeg player library that consumed only one core while doing a few hundred frames per second of 4K video.