Group-By from Scratch (jakevdp.github.io)
2 points by keewee7 on Sept 6, 2022 | 3 comments


From embedded C programming to web development in JavaScript (and everything in between), I often find myself implementing operations over multiple data structures that would have been an elegant group-by/join query in SQL.

Are there any standard idioms or design patterns for implementing relational operations (like group-by and join) elegantly and correctly in procedural languages like C?
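For what it's worth, two idioms map directly onto C: hash aggregation (one pass, a table from key to accumulator) and sort-then-scan (sort the records by key, e.g. with `qsort`, then close a group whenever the key changes). An equijoin follows the same hash pattern: build a table from one side, probe it with the other. Below is a minimal sketch, written in Python with explicit loops so that each step corresponds to a C for-loop. The `(key, value)` row layout and the names `group_sum_hash`, `group_sum_sorted` and `hash_join` are illustrative assumptions, and the `dict` stands in for whatever hash table you would use in C.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) row layout


def group_sum_hash(rows: List[Record]) -> Dict[str, int]:
    # Hash aggregation: one pass, mapping each key to a running sum.
    # The dict stands in for whatever hash table you would use in C.
    totals: Dict[str, int] = {}
    for key, value in rows:
        if key not in totals:
            totals[key] = 0  # first time we see this key: create its accumulator
        totals[key] += value
    return totals


def group_sum_sorted(rows: List[Record]) -> List[Tuple[str, int]]:
    # Sort-then-scan: sort by key (qsort() in C), then close a group
    # whenever the key changes. Needs no hash table, only a comparator.
    ordered = sorted(rows, key=lambda r: r[0])
    out: List[Tuple[str, int]] = []
    i = 0
    while i < len(ordered):
        key, acc = ordered[i][0], 0
        while i < len(ordered) and ordered[i][0] == key:  # consume the run
            acc += ordered[i][1]
            i += 1
        out.append((key, acc))
    return out


def hash_join(left: List[Record], right: List[Record]) -> List[Tuple[str, int, int]]:
    # Equijoin on key: build a key -> values table from one side,
    # then probe it with each row of the other side.
    index: Dict[str, List[int]] = {}
    for key, value in left:  # build phase
        index.setdefault(key, []).append(value)
    out: List[Tuple[str, int, int]] = []
    for key, rvalue in right:  # probe phase
        for lvalue in index.get(key, []):
            out.append((key, lvalue, rvalue))
    return out


rows = [("b", 2), ("a", 1), ("b", 3), ("c", 7), ("a", 10)]
print(group_sum_hash(rows))    # {'b': 5, 'a': 11, 'c': 7}
print(group_sum_sorted(rows))  # [('a', 11), ('b', 5), ('c', 7)]
print(hash_join(rows, [("a", 100), ("d", 0), ("b", 200)]))
# [('a', 1, 100), ('a', 10, 100), ('b', 2, 200), ('b', 3, 200)]
```

The hash versions are single-pass but need a hash table. The sort-based version only needs an array and a comparator, which is often simpler in embedded C, and it yields the groups in key order.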


If you are comfortable translating monadic Haskell code into C for-loops[0], you may be interested in the line of academic work that wound up as LINQ in C#. It starts with Grust[1], but along the way to industry you should be able to find some nice papers dealing with group-by[2] and joins[3].

[0] Compare how the Python VM implements list comprehensions to see the general strategy.

[1] https://db.inf.uni-tuebingen.de/publications/2003/grust/mona...

[2] The tricky part of group-by is fiddling with the bindings; cf. https://www.microsoft.com/en-us/research/wp-content/uploads/...

[3] If you squint enough, joins (especially equijoins) are just really fancy zips.
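To make footnotes [0] and [3] a little more concrete, here is a minimal sketch. It assumes rows are `(key, payload)` tuples, and the function names are hypothetical. A join written as a comprehension desugars into nested for-loops with a filter. When both inputs are already sorted on the key, the same join becomes a lockstep walk over the two lists, like `zip`, except that only the side with the smaller key advances. This is a standard sort-merge join; it illustrates the "fancy zip" reading and is not taken from the papers cited above.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical (key, payload) layout
Joined = Tuple[int, str, str]


def join_comprehension(xs: List[Row], ys: List[Row]) -> List[Joined]:
    # Comprehension form: for every x, for every y, keep pairs whose keys match.
    return [(kx, px, py) for kx, px in xs for ky, py in ys if kx == ky]


def join_loops(xs: List[Row], ys: List[Row]) -> List[Joined]:
    # The same query desugared into the nested for-loops you would write in C.
    out: List[Joined] = []
    for kx, px in xs:
        for ky, py in ys:
            if kx == ky:
                out.append((kx, px, py))
    return out


def merge_join(xs: List[Row], ys: List[Row]) -> List[Joined]:
    # Both inputs must already be sorted by key. Walk them in lockstep like
    # zip, but advance only the side whose current key is smaller.
    out: List[Joined] = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        kx, ky = xs[i][0], ys[j][0]
        if kx < ky:
            i += 1
        elif kx > ky:
            j += 1
        else:
            # Equal keys: find the run of this key on each side,
            # emit the cross product of the two runs, then skip past both.
            i_end = i
            while i_end < len(xs) and xs[i_end][0] == kx:
                i_end += 1
            j_end = j
            while j_end < len(ys) and ys[j_end][0] == kx:
                j_end += 1
            for _, px in xs[i:i_end]:
                for _, py in ys[j:j_end]:
                    out.append((kx, px, py))
            i, j = i_end, j_end
    return out


xs = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
ys = [(2, "x"), (3, "y"), (4, "z")]
print(merge_join(xs, ys))  # [(2, 'b', 'x'), (2, 'c', 'x'), (4, 'd', 'z')]
```

With unique keys on both sides, each equal-key step emits exactly one pair, which is where the "zip" reading is most literal. The nested-loop form costs O(len(xs) * len(ys)). Once the inputs are sorted, the merge form costs O(len(xs) + len(ys)) plus the size of the output.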


I haven't looked at numpy/Numeric in about two decades, but back then there were a few undocumented functions that were suitable for nested data-parallel aggregation. I had assumed someone would surely have documented them in the meantime, but TFA makes no attempt to use anything of the sort...
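As an illustration of the kind of vectorized aggregation this comment alludes to: current NumPy documents ufunc methods such as `reduceat`, which reduce contiguous segments of an array in a single call. It is an assumption, not something the comment states, that these correspond to the Numeric functions the commenter remembers. Below is a minimal sketch of a grouped sum that assumes sortable keys. `group_sum` is a hypothetical name.

```python
import numpy as np


def group_sum(keys: np.ndarray, values: np.ndarray):
    """Sum `values` per distinct key with no Python-level loop over groups."""
    if keys.size == 0:
        return keys[:0], values[:0]
    # Stable sort so that equal keys become contiguous runs.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_vals = values[order]
    # On a sorted array, return_index gives the start offset of each run.
    uniq, starts = np.unique(sorted_keys, return_index=True)
    # reduceat sums each segment [starts[i], starts[i+1]) in one vectorized call.
    sums = np.add.reduceat(sorted_vals, starts)
    return uniq, sums


keys = np.array([3, 1, 3, 2, 1])
vals = np.array([10, 1, 20, 5, 2])
uniq, sums = group_sum(keys, vals)
print(uniq.tolist(), sums.tolist())  # [1, 2, 3] [3, 5, 30]
```

For small non-negative integer keys, `np.bincount(keys, weights=vals)` produces the same per-key sums directly, as floats, with zeros for keys that never occur.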



