Hacker News new | past | comments | ask | show | jobs | submit login

Dask doesn't solve that problem since it's a wrapper around pandas functions.

If you can't make the core pandas code decently fast, dask won't save you.




dask.dataframe might not help but dask.distributed could in that case.

I've had success using it on non vanilla stuff (i.e. code that could not get converted to play natively with numpy/pandas structures)

As a bonus, the nice profiling tools (built within dask) have also helped me improve the performance of the code.

See https://distributed.readthedocs.io/en/latest/


There’s dask.array which works on numpy arrays instead of dataframes. Otherwise, your argument holds.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: