There are DuckDB, Polars, and similar tools now, which can finally outperform the venerable Unix tools. Once the data is on the laptop, you can get an order of magnitude faster execution than Spark.
The unsolved problems are 1) what happens when the data, and what you do with it, suddenly doesn't fit on a laptop (giving everyone 64 GB RAM laptops, for example, seems like a waste), and 2) how do you deliver the relevant subset of the data from the petabyte-scale place where you store it to the laptop.
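For what it's worth, DuckDB's httpfs extension already chips away at problem 2: you can point it at Parquet files in object storage and it uses HTTP range requests to pull down only the row groups and columns the query touches, rather than the whole dataset. A rough sketch (the bucket path and column names here are made up):

```sql
-- hypothetical bucket/path and schema, just to show the shape of it
INSTALL httpfs;
LOAD httpfs;

-- Only the footer, the ts/user_id columns, and row groups whose
-- statistics overlap the filter actually get fetched over the network.
SELECT user_id, count(*)
FROM read_parquet('s3://warehouse/events/*.parquet')
WHERE ts >= TIMESTAMP '2024-01-01'
GROUP BY user_id;
```

It's not a full answer — you still pay network latency per query, and it only works if the data is laid out (partitioned, sorted) so that predicate pushdown has something to prune — but it's a lot closer than shipping everything to the laptop first.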
If someone could solve that, Spark could finally go to hell.