Simple, Efficient, and Robust Hash Tables for Join Processing (cedardb.com)
40 points by mau 7 months ago | 3 comments



Does the (build-side) hash table have to fit entirely in RAM with the current implementation?


Yes, the way it is currently implemented, the build side has to fit into RAM. There is no inherent reason we couldn't also spool to disk, but we haven't implemented that yet.


Thanks for confirming. (I deliberately worded my question that way, as it makes sense to roll such features out in phases, just as plenty of others have done; off the top of my head, DuckDB and Apache Impala, for example.)

Edit: In the post you mentioned that you optimized the hot path under the assumption that a probe record will likely not find a match. Sometimes, with well-designed partition-wise joins, most of the records actually do match and survive the join. I guess in such (estimated or detected) cases you could switch to an alternative path where a match is the likely branch in the hot path (rough sketch below)…
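
To make that concrete, here is a minimal C++ sketch of what I have in mind. It is purely hypothetical and has nothing to do with CedarDB's actual code: HashTable, Entry, probeLowHitRate, probeHighHitRate, and estimatedMatchRate are all names I made up. The idea is simply to compile two variants of the probe loop and let the planner pick one from the estimated match rate:

    // Hypothetical sketch: a chaining hash table with two specializations
    // of the probe hot loop, selected by the optimizer's estimated match rate.
    #include <cstdint>
    #include <vector>

    struct Entry {
        uint64_t hash;
        uint64_t key;
        Entry* next;   // collision chain
    };

    struct HashTable {
        std::vector<Entry*> directory; // size is a power of two
        uint64_t mask;                 // directory.size() - 1

        Entry* lookup(uint64_t hash) const { return directory[hash & mask]; }
    };

    // Variant for when most probe tuples are expected to find no partner:
    // the "no match" continuation stays on the fall-through path.
    template <class Consumer>
    void probeLowHitRate(const HashTable& ht, uint64_t hash, uint64_t key,
                         Consumer&& emit) {
        for (Entry* e = ht.lookup(hash); e; e = e->next)
            if (e->hash == hash && e->key == key) [[unlikely]]
                emit(e);
    }

    // Variant for when most probe tuples are expected to match,
    // e.g. in a well-partitioned join.
    template <class Consumer>
    void probeHighHitRate(const HashTable& ht, uint64_t hash, uint64_t key,
                          Consumer&& emit) {
        for (Entry* e = ht.lookup(hash); e; e = e->next)
            if (e->hash == hash && e->key == key) [[likely]]
                emit(e);
    }

    // In a real engine this decision would be made once per pipeline, not
    // per tuple; it is inlined here only to keep the sketch self-contained.
    template <class Consumer>
    void probe(const HashTable& ht, double estimatedMatchRate,
               uint64_t hash, uint64_t key, Consumer&& emit) {
        if (estimatedMatchRate > 0.5)
            probeHighHitRate(ht, hash, key, emit);
        else
            probeLowHitRate(ht, hash, key, emit);
    }

The [[likely]]/[[unlikely]] hints only steer code layout and branch prediction; the lookup logic is identical in both variants, so switching based on a cardinality estimate (or a runtime counter) should be cheap.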



