Simple, Efficient, and Robust Hash Tables for Join Processing (cedardb.com)
40 points by mau 7 months ago | 3 comments



Does the (build-side) hash table have to fit entirely in RAM with the current implementation?


Yes, the way it is currently implemented, the build side has to fit into RAM. There is no inherent reason we couldn't also spool to disk, but we haven't implemented that yet.


Thanks for confirming. (I deliberately worded my question that way, as it makes sense to roll such features out in phases, just as plenty of others have done; off the top of my head, DuckDB and Apache Impala, for example.)

Edit: In the post you mentioned that you optimized the hot path under the assumption that a probe record will likely not find a match. Sometimes, with well-designed partition-wise joins, most of the records actually do match and survive the join. I guess in such (estimated or detected) cases you could switch to an alternative path where a match is the likely branch in the hot path (rough sketch below)…
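
To make that concrete, here is a minimal C++ sketch of what I have in mind. It is purely hypothetical and has nothing to do with CedarDB's actual code: HashTable, Entry, probeLowHitRate, probeHighHitRate, and estimatedMatchRate are all names I made up. The idea is simply to compile two variants of the probe loop and let the planner pick one from the estimated match rate:

    // Hypothetical sketch: a chaining hash table with two specializations
    // of the probe hot loop, selected by the optimizer's estimated match rate.
    #include <cstdint>
    #include <vector>

    struct Entry {
        uint64_t hash;
        uint64_t key;
        Entry* next;   // collision chain
    };

    struct HashTable {
        std::vector<Entry*> directory; // size is a power of two
        uint64_t mask;                 // directory.size() - 1

        Entry* lookup(uint64_t hash) const { return directory[hash & mask]; }
    };

    // Variant for when most probe tuples are expected to find no partner:
    // the "no match" continuation stays on the fall-through path.
    template <class Consumer>
    void probeLowHitRate(const HashTable& ht, uint64_t hash, uint64_t key,
                         Consumer&& emit) {
        for (Entry* e = ht.lookup(hash); e; e = e->next)
            if (e->hash == hash && e->key == key) [[unlikely]]
                emit(e);
    }

    // Variant for when most probe tuples are expected to match,
    // e.g. in a well-partitioned join.
    template <class Consumer>
    void probeHighHitRate(const HashTable& ht, uint64_t hash, uint64_t key,
                          Consumer&& emit) {
        for (Entry* e = ht.lookup(hash); e; e = e->next)
            if (e->hash == hash && e->key == key) [[likely]]
                emit(e);
    }

    // In a real engine this decision would be made once per pipeline, not
    // per tuple; it is inlined here only to keep the sketch self-contained.
    template <class Consumer>
    void probe(const HashTable& ht, double estimatedMatchRate,
               uint64_t hash, uint64_t key, Consumer&& emit) {
        if (estimatedMatchRate > 0.5)
            probeHighHitRate(ht, hash, key, emit);
        else
            probeLowHitRate(ht, hash, key, emit);
    }

The [[likely]]/[[unlikely]] hints only steer code layout and branch prediction; the lookup logic is identical in both variants, so switching based on a cardinality estimate (or a runtime counter) should be cheap.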



