
Dask is actually similar to Spark in that it lets you distribute computation across multiple machines, beyond your own computer.

Matt Rocklin (the Dask creator) has put some effort into benchmarking across all of these recently. You can see the presentation here: https://youtu.be/5YXiRl41DKA?si=5Pt9XQHp5Y1cG7tM

If you don't feel like watching a whole presentation, the main takeaway is: local is faster, until it isn't.

Polars outperforms Spark by a wide margin, until the data grows large enough that the ranking flips.

Some operations can be performance-sensitive without having scaling issues. So I guess those kinds of jobs are great candidates for something like Polars.




An analogy would be whether I should buy a two-seater sports car or a family car when I know I'll have to drive more than one person around.

I think I'm the kind of guy who prefers reliability and versatility over speed.

(When I was a kid you'd say Alfa Romeo vs Volvo)


I'm right there with you!

Full disclosure: I'm a minor contributor to Dask, so I'm probably a little biased.

I guess one side I probably haven't put forward, though, is that the memory footprint of something like Dask/Spark is higher because of their overheads. If you don't have scalable resources, then a Polars/DuckDB option would probably be your most reliable choice (i.e. the one that'll hit the fewest memory errors on the given architecture).


Nice video, thank you. It's interesting to see how each technology behaves depending on the scale of the dataset. DuckDB is definitely killing it.



