> the point of this post isn’t to compare highly-optimized Python to highly-opti...

wenc · on Oct 21, 2023

Pandas can be pretty fast, but DuckDB and Polars are both faster than Pandas. DuckDB supports vectorized and parallelized operations on Pandas dataframes, while Polars is written in Rust.

I feel though the killer is that inner loop where dataframe operations are being performed across a large number of iterations, and there's significant overhead there.

For-loops are usually not the most performant solution in Python.

I could be wrong, I feel that there's a SQL way to answer that question, in which case DuckDB might be able to exploit vectorization, parallelization, indexing and query optimization across the dataset in one fell swoop.

twelvechairs · on Oct 21, 2023

Numba is another excellent Python option for speedier python solutions

eskaytwo · on Oct 21, 2023

DuckDB can also operate on Polars data frames, and return the results to Polars data frames.

shoo · on Oct 21, 2023

as a rule of thumb, pure python code is often 1000x slower than naively written unoptimized native code.

code in pandas can be very slow (standard pure-python speed) or "fast-for-python" depending on if you are going with or against the grain. a pandas dataframe is basically a bunch of numpy arrays, one array per column. if you do columnar calculations that can be reduced to numpy operations, like summing over a column, then numpy will execute the operation in native code, and it will be fast-for-python, but there will still be some overhead due to wrapping things in python, as well as perhaps temporary array allocation etc.

if you do something against the grain, such as expressing all your pandas calculations as on row-wise operations instead of column-wise operations, or using "apply(lambda x: pure_python_expression_of(x))", then pandas cannot execute it efficiently, as the operations are going against the grain of how things are stored in memory, and the operations cannot be reduced to native code primitives on the column arrays that are implemented in numpy.

another alternative to switching to rust is using cython to define a native python module. by starting with python code and using many of the same optimization rules of thumb in this post (static typing! avoid frequent tiny allocation, preallocate stuff! avoid hashing complex things, prefer arrays with indexes!), you can translate idiomatic (and very slow) pure python code into simple code that looks closer to array-oriented fortran-in-C, that runs very fast and compiles to a native python module that is easy to integrate.

IshKebab · on Oct 21, 2023

1000x?

I think it's more like 50-200x in my experience. Which is still crazy slow, but not 1000x slow.

eskaytwo · on Oct 21, 2023

Pandas is ok for numerical, but Polars (rust-based) is absolutely the way to go for big datasets. The article is fascinating if you're a Python developer and you need to stray off the path of things Polars can do.

iwonthecase · on Oct 21, 2023

Yeah, Pandas would provide a pretty big improvement, but I wonder how much of an improvement one could get even by just replacing the c-style loop:

> for qs in combinations(all_qs, K):

> > ...

> > corrs.append({'qs': qs, 'r': r})

>

> corrs.sort_values(...)

with a python style list comprehension:

> def build_q(combination):

> > ...

> > return {'qs': qs, 'r': r}

>

> max(build_q(c) for c in combinations(all_qs, K), key = lambda v: v['r'])

georgeecollins · on Oct 21, 2023

I like Python too but let's face it, it is not performant. Why do we have to pretend otherwise?

wenc · on Oct 21, 2023

I think everybody agrees and is not pretending otherwise.

But my thing is, I think the problem could be solved in a different way in Python where we can use performant libraries to get an answer in a reasonable time.

The author says the naive Python implementation with a for-loop takes 36 milliseconds per iteration, and the problem requires 2.5 billion iterations (= 2.9 years, which is unreasonable) while optimized Rust takes 8 mins (corrected).

I believe we can solve the problem in Python in a reasonable amount of time (not 2.9 years) by expressing it differently. And I believe we can do it in Python without trying to optimize Python operations like the author is doing with Rust.

Imagine your boss came up to you and said I need the answer by this week and that you could only use Python, you would need to come up with a way to solve it. I wouldn't start by trying to optimizing Python's for-loop -- I would break out of the loop paradigm altogether and use arrays, database indices, optimized dataframe libraries (probably written in C++ or Rust) to get there. Because Python is not fast -- everyone knows this -- Python programmers will often think of other ways (generally reaching for libraries) to solve the problem.

bibmbop · on Oct 21, 2023

8 minutes.

tetsuhamu · on Oct 21, 2023

The industry agreed to standardize on Python for the task of describing compute-graphs that get executed by compute engines implemented in something other than Python. Python is not meant to be used for the computation itself.

FridgeSeal · on Oct 21, 2023

And then other people came along, using other languages, and realised we could do the same compute, with a fraction of the hardware, in about the same time, in the languages they’re already using, which really diminishes the appeal of Python, which boils down to:

“Setup some infra, and get someone who knows how to operate it, and then write new code in python, and babysit it as it ossifies into a core piece of infrastructure”

_or_ optimise it in a language you’re already using, or is semi-common in your stack already, and then move on with things.

bsder · on Oct 21, 2023

Yeah, this seems like really unoptimized Python. It would be far more useful to see tricks for torturing Python/Pandas to make things faster and how far you can take it.

The author wrote in a strongly typed, ahead of time compiled language, ran a profiler to help wipe out hotspots for optimization, and the program is now dramatically faster than unoptimized Python.

Yay? I guess?

firecraker · on Oct 21, 2023

I suppose there are already many articles showing how to speed calculations by avoiding/optimizing pandas.

It does feel a little unfair a comparison. Everyone knows that for loops are slow in python.. as is much of the core library. But pushing analysis to c using pythonic APIs (numpy/numba/pytorch) is fairly trivial

hobs · on Oct 21, 2023

I would also be curious given the Rust implementation gets a decent amount of iteration - 30x faster natively though aint bad.